Question about calculating implicit gradients? #3

yunzhuangshen · 2023-10-23T04:00:46Z

Dear Authors,

Line 97 of bilevel.py computes the implicit gradient with the following code:

implicit_gradient = -args.lr2 * mask_grad_vec * param_grad_vec

Here, mask_grad_vec contains non-zero gradients for the mask and zero gradients for the weights, because the model is in pruning mode.

However, in Line 5 of Algorithm A1 (BiP) in the paper, the product involves the gradient with respect to the weights, ∇θ tr(m * θ), rather than the gradient with respect to the mask.
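To make the distinction concrete: for any loss that depends on the masked weights z = m ⊙ θ, the chain rule gives ∇_m ℓ = θ ⊙ ∇_z ℓ and ∇_θ ℓ = m ⊙ ∇_z ℓ. The two vectors therefore coincide only where m and θ agree, and a pruned entry (m_i = 0) gets a zero weight gradient but, in general, a non-zero mask gradient. The NumPy sketch below checks this on a hypothetical linear least-squares model. X, y, loss, grad_z, other and lr2 are illustrative names, `other` is only a placeholder for param_grad_vec, and nothing here is taken from bilevel.py. It shows why the choice of factor changes the implicit-gradient term, not which one the authors intended.

```python
import numpy as np

# Hypothetical toy problem, not the repository's code: a linear model whose
# effective weights are z = m * theta (element-wise), with the least-squares
# loss l(z) = 0.5 * ||X @ z - y||^2.
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 4))
y = rng.normal(size=8)
theta = rng.normal(size=4)            # model weights
m = np.array([1.0, 0.0, 1.0, 0.5])    # relaxed mask scores; entry 1 is pruned


def loss(z):
    r = X @ z - y
    return 0.5 * r @ r


def grad_z(z):
    # Analytic gradient of the loss with respect to z.
    return X.T @ (X @ z - y)


def numeric_grad(f, v, eps=1e-6):
    # Central finite differences, used only to sanity-check the chain rule.
    g = np.zeros_like(v)
    for i in range(v.size):
        e = np.zeros_like(v)
        e[i] = eps
        g[i] = (f(v + e) - f(v - e)) / (2 * eps)
    return g


g = grad_z(m * theta)
grad_m = theta * g        # d loss / d m: the mask-gradient part of mask_grad_vec
grad_theta = m * g        # d loss / d theta: the weight gradient Line 5 refers to

# Check both analytic gradients against finite differences.
assert np.allclose(grad_m, numeric_grad(lambda mm: loss(mm * theta), m), atol=1e-5)
assert np.allclose(grad_theta, numeric_grad(lambda th: loss(m * th), theta), atol=1e-5)

# Placeholder for the second factor (param_grad_vec in bilevel.py); its actual
# contents in the repository are not reproduced here.
other = rng.normal(size=4)
lr2 = 0.01                # hypothetical stand-in for args.lr2

ig_mask = -lr2 * grad_m * other        # product using the mask gradient
ig_weight = -lr2 * grad_theta * other  # product using the weight gradient

# Pruned entry: the mask gradient is generally non-zero, the weight gradient is 0.
print(grad_m[1], grad_theta[1])
print(np.allclose(ig_mask, ig_weight))
```

Swapping one factor for the other rescales each entry of the implicit-gradient term by m_i / θ_i, so the two versions agree only in degenerate cases.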

Could you help me understand this, or is it possibly a bug?

Best

qingsenchen · 2023-10-27T02:26:05Z

I'm having the same problem. I changed switch_to_prune to switch_to_bilevel, but I'm not sure whether that's correct.

NKUShaw · 2024-12-25T19:01:04Z

me too
