From efa101a2af3a22e152f004b00f7d9449e778cb84 Mon Sep 17 00:00:00 2001
From: Quentin Lhoest
Date: Mon, 3 Jun 2024 20:16:36 +0200
Subject: [PATCH] add note

---
 docs/source/stream.mdx | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/docs/source/stream.mdx b/docs/source/stream.mdx
index ec553552b28..8b1ff745041 100644
--- a/docs/source/stream.mdx
+++ b/docs/source/stream.mdx
@@ -412,3 +412,9 @@ This can be used with the `StatefulDataLoader` from `torchdata`:
 >>> # resume from checkpoint
 >>> dataloader.load_state_dict(state_dict) # uses iterable_dataset.load_state_dict() under the hood
 ```
+
+
+
+Resuming restarts exactly where the checkpoint was saved, except in two cases: 1) examples held in shuffle buffers when the checkpoint was saved are lost on resume, and the buffers are refilled with new data; 2) combining `.with_format("arrow")` with a batched `.map()` may skip one batch.
+
+
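
The first exception in the note (shuffle buffers) can be illustrated with a toy model. The sketch below is not the `datasets` implementation: `ToyShuffledStream` and `run_with_resume` are hypothetical names, and the sketch assumes the checkpoint records only how many source examples have been read, not the contents of the buffer.

```python
import random


class ToyShuffledStream:
    """Hypothetical stand-in for a streamed dataset with a shuffle buffer."""

    def __init__(self, source, buffer_size, seed=0):
        self.source = list(source)
        self.buffer_size = buffer_size
        self.seed = seed
        self.position = 0  # number of source examples read so far

    def state_dict(self):
        # Assumption: only the read position is checkpointed, not the buffer.
        return {"position": self.position}

    def load_state_dict(self, state):
        self.position = state["position"]

    def __iter__(self):
        rng = random.Random(self.seed)
        buffer = []  # always starts empty, including after a resume
        while self.position < len(self.source):
            example = self.source[self.position]
            self.position += 1
            if len(buffer) < self.buffer_size:
                buffer.append(example)  # still filling the buffer
                continue
            # Buffer is full: emit a random slot and put the new example there.
            i = rng.randrange(self.buffer_size)
            yield buffer[i]
            buffer[i] = example
        # Source exhausted: flush what is left in the buffer.
        rng.shuffle(buffer)
        yield from buffer


def run_with_resume(n_examples=10, buffer_size=3, stop_after=2):
    """Consume `stop_after` examples, checkpoint, then resume in a fresh stream."""
    stream = ToyShuffledStream(range(n_examples), buffer_size)
    seen = []
    for example in stream:
        seen.append(example)
        if len(seen) == stop_after:
            state = stream.state_dict()
            break

    resumed = ToyShuffledStream(range(n_examples), buffer_size)
    resumed.load_state_dict(state)
    seen.extend(resumed)  # the buffer is refilled from position onward
    return seen
```

With the defaults, `run_with_resume()` returns 7 of the 10 source examples with no duplicates. The 3 missing ones had already been read from the source but were still waiting in the buffer when the checkpoint was taken, so the resumed stream never sees them.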