data
collate_replace_corrupted(batch, dataset, default_collate_fn=None, max_retries=100)
Collate function that handles corrupted examples in a batch by replacing them with valid ones.
This function is designed to prevent training interruptions due to data corruption. It logs a warning to alert the user about the number of corrupted samples found.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `batch` | `Any` | The batch of data from the DataLoader. | required |
| `dataset` | `Dataset` | The dataset being used, which should return | required |
| `default_collate_fn` | `Callable \| None` | The default collate function to use once the batch is clean. | `None` |
| `max_retries` | `int` | Maximum number of retry iterations to prevent infinite loops when replacements are also corrupted. Defaults to 100. | `100` |
Returns:
| Type | Description |
|---|---|
| `Any` | A batch with corrupted examples replaced by valid ones. |
Raises:
| Type | Description |
|---|---|
| `RuntimeError` | If `max_retries` is reached and corrupted samples still remain, indicating a high corruption rate in the dataset. |
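
The behaviour described above (drop corrupted examples, draw replacements, retry up to `max_retries`, then collate) can be illustrated with a minimal pure-Python sketch. This is not the library's implementation. The name `replace_corrupted_sketch` and the `rng` parameter are hypothetical, and the sketch assumes a corrupted example is signalled by `None`, which the parameter description above does not confirm. It also returns a plain list when no collate function is given, whereas the real function may fall back to a different default.

```python
import logging
import random

logger = logging.getLogger(__name__)


def replace_corrupted_sketch(batch, dataset, collate_fn=None, max_retries=100, rng=None):
    """Illustrative sketch only; assumes corrupted examples are `None`."""
    rng = rng or random  # hypothetical hook so the sampling can be seeded
    original_size = len(batch)

    # Keep only the valid examples and count what was dropped.
    batch = [example for example in batch if example is not None]
    missing = original_size - len(batch)
    if missing:
        logger.warning("Found %d corrupted sample(s) in batch; replacing.", missing)

    retries = 0
    while len(batch) < original_size:
        if retries >= max_retries:
            raise RuntimeError(
                f"Still missing {original_size - len(batch)} sample(s) after "
                f"{max_retries} retries; the dataset may be heavily corrupted."
            )
        # Draw as many random replacements as are still needed; any that
        # turn out to be corrupted are discarded and retried next iteration.
        needed = original_size - len(batch)
        candidates = [dataset[rng.randrange(len(dataset))] for _ in range(needed)]
        batch.extend(c for c in candidates if c is not None)
        retries += 1

    # Hand the clean batch to the collate function, if one was supplied.
    return collate_fn(batch) if collate_fn is not None else batch


# Usage: a toy "dataset" where two of five items are corrupted.
toy_dataset = [1, None, 3, None, 5]
clean = replace_corrupted_sketch([1, None, 3], toy_dataset, rng=random.Random(0))
```

With a PyTorch `DataLoader`, a function of this shape is typically bound to the dataset with `functools.partial` and passed as `collate_fn`, since the loader calls the collate function with the batch as its only argument.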