Add bpb and n_bytes to metric logging #41
Conversation
bytelatent/metrics.py
Outdated
if self.fs is None:
    self.jsonl_writer = open(self.outdir, "a")
else:
    self.jsonl_writer = self.fs.open(self.outdir, "a")
What functionality does this add?
Eventually (once we fix torch.distributed.checkpointing.save to work with fsspec), all of dump_dir should be compatible with writing to NFS or S3/blob store.
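For context, a minimal sketch of the pattern being discussed, assuming the logger is handed an optional fsspec filesystem when dump_dir points at a remote store (class and method names here are illustrative, not the actual bytelatent API):

import fsspec
from typing import Optional


class JsonlMetricWriter:
    def __init__(self, outdir: str, fs: Optional[fsspec.AbstractFileSystem] = None):
        self.outdir = outdir
        self.fs = fs

    def open(self):
        if self.fs is None:
            # Local path: the builtin open in append mode is enough.
            self.jsonl_writer = open(self.outdir, "a")
        else:
            # Remote path (NFS mount, s3://... bucket, blob store): route the
            # same append through the fsspec filesystem handle.
            self.jsonl_writer = self.fs.open(self.outdir, "a")
        return self.jsonl_writer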
bytelatent/train.py
Outdated
@@ -403,6 +419,24 @@ def train(args: TrainArgs):
        batch_patch_lengths = torch.from_numpy(batch.patch_lengths).cuda()
        mask = None if batch.mask is None else torch.from_numpy(batch.mask).cuda()

        if args.data.tokenizer_args.name in ["bytes", "blt"]:
            if mask is None:
                n_bytes += batch_y.numel()
n_bytes += mask.sum() if mask else batch_y.numel()
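One nuance with the one-liner above: truth-testing a multi-element tensor ("if mask") raises a RuntimeError in PyTorch, so the check presumably still needs to be against None. A small self-contained sketch of the intended counting (names mirror the diff, but the helper itself is illustrative):

import torch
from typing import Optional


def count_loss_bytes(batch_y: torch.Tensor, mask: Optional[torch.Tensor]) -> int:
    # All target elements count as bytes when there is no mask;
    # otherwise only the unmasked positions do.
    # Note: "if mask" would raise for a multi-element tensor, so test None explicitly.
    if mask is None:
        return batch_y.numel()
    return int(mask.sum().item())


batch_y = torch.randint(0, 256, (2, 8))
mask = torch.ones_like(batch_y, dtype=torch.bool)
n_bytes = 0
n_bytes += count_loss_bytes(batch_y, mask)  # 16
n_bytes += count_loss_bytes(batch_y, None)  # 32 total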
bytelatent/train.py
Outdated
f" grad: {grad_norm:.2e}" | ||
f" flops: {FLOPS:.2e}" | ||
f" wps: {wps:.2e}" | ||
f" iter: {curr_iter_time:>7}" | ||
f" data: {data_load_time:>5}" | ||
f" lr: {curr_lr:.2e}" | ||
f" n_bytes={total_n_bytes}" |
This is confusing; we need a way to indicate per-GPU vs. across all GPUs.
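One possible way to make that distinction explicit (a sketch only, not what this PR does) is to keep the per-GPU counter and all-reduce a copy just for logging, assuming torch.distributed is initialized in the usual way:

import torch
import torch.distributed as dist


def bytes_local_and_global(n_bytes_local: int, device: torch.device) -> tuple[int, int]:
    # Per-GPU count stays as-is; the global count is the sum across ranks.
    t = torch.tensor([n_bytes_local], dtype=torch.long, device=device)
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(t, op=dist.ReduceOp.SUM)
    return n_bytes_local, int(t.item())


# n_bytes_gpu, n_bytes_total = bytes_local_and_global(total_n_bytes, device)
# logger.info(f" n_bytes/gpu: {n_bytes_gpu} n_bytes/total: {n_bytes_total}")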
bytelatent/train.py
Outdated
logger.info(
    f"step: {train_state.step}"
    f" acc: {train_state.acc_step}"
-   f" loss: {round(loss.item(),4):>7}"
+   f" loss: step={round(loss.item(),4):>7} avg={avg_loss}"
+   f" bpb: {avg_bpb:3f}"
This is inconsistent; we normally report bpb per GPU.
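For reference, a hedged sketch of how bits-per-byte would be computed per GPU, assuming the accumulated loss is summed cross-entropy in nats over the same interval as the byte count (the PR's exact accumulation isn't shown here):

import math


def bits_per_byte(total_loss_nats: float, n_bytes: int) -> float:
    # Convert nats to bits (divide by ln 2) and normalize by the bytes predicted.
    return total_loss_nats / (n_bytes * math.log(2))


# Per-GPU bpb uses this rank's loss and byte count; a global bpb would
# all-reduce both the numerator and the denominator before dividing.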
Branch updated from 4e2ed0a to b6396eb.
bytelatent/train.py
Outdated
logger.info(
    f"step: {train_state.step}"
    f" acc: {train_state.acc_step}"
-   f" loss: {round(loss.item(),4):>7}"
+   f" loss: [step_local={round(step_loss_per_gpu, 4):>7} interval_local={round(interval_loss_per_gpu, 4):>7} step_global={round(step_loss_across_gpus, 4):>7} interval_global={round(interval_loss_across_gpus, 4):>7}]"
This is too many things; I think interval_local and interval_global are good enough.
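A sketch of what the trimmed log line could look like if only the two interval averages are kept (variable names follow the diff above, but the helper itself is illustrative):

import logging

logger = logging.getLogger(__name__)


def log_interval_loss(step: int, acc_step: int, interval_local: float, interval_global: float) -> None:
    # Keep only the per-GPU and across-GPU interval averages.
    logger.info(
        f"step: {step}"
        f" acc: {acc_step}"
        f" loss: [interval_local={round(interval_local, 4):>7}"
        f" interval_global={round(interval_global, 4):>7}]"
    )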
Summary:
Test Plan: