Reset DataLoader workers instead of creating new ones (#35795)
Summary:
This PR needs discussion, as it changes the behavior of `DataLoader`. It can be closed if it's not considered good practice.

Currently, the `DataLoader` spawns a new `_BaseDataLoaderIter` object every epoch.
In the case of the multiprocess `DataLoader`, this means the worker processes are re-created every epoch, and each one makes a fresh copy of the original `Dataset` object.
If users want to cache data or do some tracking in their datasets, all of that state is wiped out every epoch. Notice that this doesn't happen when the number of workers is 0, which creates an inconsistency between the multiprocess and serial data loaders.

This PR keeps the `_BaseDataLoaderIter` object alive and just resets it between epochs, so the workers remain active, and so do their own `Dataset` objects. Users seem to file issues about this often.
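To make the three behaviors concrete, here is a minimal stdlib-only model. It assumes that a worker's copy of the dataset can be approximated with `copy.deepcopy` (real workers receive a pickled copy in a separate process). `CachingDataset` and `run_epochs` are hypothetical names used only for illustration, not part of PyTorch.

```python
import copy


class CachingDataset:
    """Hypothetical dataset that memoizes items it has already loaded."""

    def __init__(self, n):
        self.n = n
        self.cache = {}

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        hit = idx in self.cache
        if not hit:
            self.cache[idx] = idx * idx  # stand-in for an expensive load
        return self.cache[idx], hit


def run_epochs(dataset, epochs, mode):
    """Return the number of cache hits observed in each epoch (a model, not PyTorch code).

    mode="serial":     items come from `dataset` itself (like num_workers=0).
    mode="respawn":    a fresh copy is made every epoch (workers re-created).
    mode="persistent": one copy is made and then reused (workers kept alive).
    """
    hits_per_epoch = []
    worker_copy = None
    for _ in range(epochs):
        if mode == "serial":
            source = dataset
        elif mode == "respawn" or worker_copy is None:
            worker_copy = copy.deepcopy(dataset)  # what a newly started worker receives
            source = worker_copy
        else:
            source = worker_copy  # a persistent worker keeps its state
        hits_per_epoch.append(sum(source[i][1] for i in range(len(source))))
    return hits_per_epoch


print(run_epochs(CachingDataset(4), epochs=3, mode="serial"))      # [0, 4, 4]
print(run_epochs(CachingDataset(4), epochs=3, mode="respawn"))     # [0, 0, 0]
print(run_epochs(CachingDataset(4), epochs=3, mode="persistent"))  # [0, 4, 4]
```

In this model, the serial and persistent modes agree, while re-creating the copy every epoch discards the cache, which is the inconsistency described above.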

Pull Request resolved: pytorch/pytorch#35795

Reviewed By: ailzhang

Differential Revision: D23426612

Pulled By: VitalyFedyunin

fbshipit-source-id: e16950036bae35548cd0cfa78faa06b6c232a2ea
Emilio Castillo authored and facebook-github-bot committed Sep 1, 2020
1 parent db6bd9d commit 5472426
Showing 4 changed files with 301 additions and 129 deletions.
3 changes: 2 additions & 1 deletion docs/source/data.rst
@@ -22,7 +22,8 @@ These options are configured by the constructor arguments of a
 DataLoader(dataset, batch_size=1, shuffle=False, sampler=None,
 batch_sampler=None, num_workers=0, collate_fn=None,
 pin_memory=False, drop_last=False, timeout=0,
-worker_init_fn=None, *, prefetch_factor=2)
+worker_init_fn=None, *, prefetch_factor=2,
+persistent_workers=False)

 The sections below describe in details the effects and usages of these options.

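The documented option is passed as the `persistent_workers` constructor argument. The following is a minimal sketch, assuming a PyTorch version that includes this change. `CachingDataset` and `hits_per_epoch` are hypothetical names; each item reports whether the worker's copy of the dataset had already cached it. Note that `persistent_workers=True` requires `num_workers > 0`.

```python
import torch
from torch.utils.data import DataLoader, Dataset


class CachingDataset(Dataset):
    """Hypothetical dataset whose items report whether they were already cached."""

    def __init__(self, n):
        self.n = n
        self.cache = {}

    def __len__(self):
        return self.n

    def __getitem__(self, idx):
        hit = idx in self.cache  # looked up in this worker's own copy
        self.cache[idx] = idx
        return torch.tensor(int(hit))


def hits_per_epoch(persistent, epochs=2):
    loader = DataLoader(
        CachingDataset(8),
        batch_size=4,
        num_workers=1,  # a single worker keeps the item-to-worker assignment trivial
        persistent_workers=persistent,
    )
    # Each `for batch in loader` is one epoch; only a persistent worker keeps its cache.
    return [int(sum(batch.sum() for batch in loader)) for _ in range(epochs)]


if __name__ == "__main__":
    print(hits_per_epoch(persistent=False))  # [0, 0]
    print(hits_per_epoch(persistent=True))   # [0, 8]
```

With `persistent_workers=False`, every epoch starts a new worker that receives an empty cache, so no hits are seen. With `True`, the second epoch reuses the same worker, and all 8 items are hits. The `__main__` guard keeps the example safe under the `spawn` start method.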