Dear Authors,
Line 97 of bilevel.py calculates the implicit gradient with the following code:
implicit_gradient = -args.lr2 * mask_grad_vec * param_grad_vec
Here, because the model is in pruning mode, mask_grad_vec holds non-zero gradients for the masks and zeros for the weights.
However, Line 5 of Algorithm A1 (BiP) in the paper multiplies by the gradient with respect to the weights, ∇θ tr(m * θ), rather than the gradient with respect to the mask.
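To make the distinction concrete, here is a minimal sketch on a toy problem. The quadratic loss, the helper name toy_grads, and the sample values are all assumptions chosen for illustration; none of them come from bilevel.py or the paper. It only shows that, by the chain rule, the gradient with respect to θ and the gradient with respect to m of a loss on m * θ are generally different vectors, so swapping one for the other changes the result.

```python
# Toy loss (an assumption, not the paper's objective):
#   loss(m, theta) = 0.5 * sum_i (m_i * theta_i * x_i - y_i) ** 2
def toy_grads(m, theta, x, y):
    # g_i is the derivative of the loss w.r.t. the masked weight w_i = m_i * theta_i
    g = [(mi * ti * xi - yi) * xi for mi, ti, xi, yi in zip(m, theta, x, y)]
    # Chain rule: d loss / d theta_i = m_i * g_i
    grad_theta = [mi * gi for mi, gi in zip(m, g)]
    # Chain rule: d loss / d m_i = theta_i * g_i
    grad_m = [ti * gi for ti, gi in zip(theta, g)]
    return grad_theta, grad_m

m = [1.0, 0.0, 1.0]        # binary mask; the second weight is pruned
theta = [0.5, -2.0, 3.0]   # weights
x = [1.0, 1.0, 2.0]        # inputs
y = [0.0, 1.0, 1.0]        # targets

grad_theta, grad_m = toy_grads(m, theta, x, y)
print("gradient w.r.t. theta:", grad_theta)
print("gradient w.r.t. m:    ", grad_m)
```

In this toy case the weight gradient is zero at the pruned position while the mask gradient is not, so an elementwise product that uses one in place of the other gives a different vector.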
Can you help me understand this or is this possibly a bug?
Best
I'm having the same problem. I changed switch_to_prune to switch_to_bilevel, but I'm not sure if that's correct.
me too