Add per-sample gradient norm computation as a functionality (#724)
Summary: The per-sample gradient norm is already computed for Ghost Clipping, but it is useful more generally. This change exposes it as a standalone feature:

```
...
loss.backward()
per_sample_norms = model.per_sample_gradient_norms
```

Reviewed By: iden-kalemaj

Differential Revision: D68634969
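For context on what the exposed quantity is: the per-sample gradient norm of sample *i* is the L2 norm of that sample's gradient taken over all model parameters jointly. The sketch below illustrates the math in plain Python, independent of the library; the function name and the list-based gradient layout are illustrative, not part of the actual API.

```python
import math

def per_sample_gradient_norms(per_sample_grads):
    """Compute the L2 norm of each sample's gradient.

    per_sample_grads: one entry per model parameter; each entry is a
    per-sample list, where each item is that sample's flat gradient
    (a list of floats) for that parameter.
    """
    n_samples = len(per_sample_grads[0])
    norms = []
    for i in range(n_samples):
        # Sum squared gradient entries across every parameter for sample i,
        # then take the square root: a single L2 norm over all parameters.
        sq = sum(g ** 2 for param in per_sample_grads for g in param[i])
        norms.append(math.sqrt(sq))
    return norms

# Two samples, two parameters (a 2-element weight and a 1-element bias).
grads_w = [[3.0, 4.0], [0.0, 0.0]]   # per-sample gradients for the weight
grads_b = [[0.0], [2.0]]             # per-sample gradients for the bias
print(per_sample_gradient_norms([grads_w, grads_b]))  # → [5.0, 2.0]
```

In the library itself, these norms are produced as a by-product of the backward pass (as in the summary snippet), so no extra per-sample bookkeeping is needed on the caller's side.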