Hi friends, this is Qiong Liu, a Ph.D. student from Tsinghua University.
Thank you for providing such valuable code. I have some questions about the SAC algorithm.
In `class Actor`, `def forward()`:

```python
# get std
log_std = self.log_std_layer(x).tanh()
log_std = self.log_std_min + 0.5 * (
    self.log_std_max - self.log_std_min
) * (log_std + 1)
```
1. I have seen other examples use `log_std = torch.clamp(log_std, min=LOG_SIG_MIN, max=LOG_SIG_MAX)`. I agree with your choice of `tanh` over `clamp`; perhaps the reason is that `tanh` provides a better (non-vanishing) gradient near the bounds. What is your opinion?
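To make the comparison concrete, here is a small self-contained sketch in plain Python (the bound values are the common SAC defaults, assumed here). It shows that `clamp` has a zero gradient once the raw output leaves the bounds, while the `tanh` rescaling keeps a small but non-zero gradient everywhere:

```python
import math

LOG_STD_MIN, LOG_STD_MAX = -20.0, 2.0  # assumed bounds (common SAC defaults)

def rescale_tanh(raw):
    """Map an unbounded raw output into [LOG_STD_MIN, LOG_STD_MAX] via tanh."""
    t = math.tanh(raw)
    return LOG_STD_MIN + 0.5 * (LOG_STD_MAX - LOG_STD_MIN) * (t + 1)

def clamp(raw):
    """Hard clipping, as in torch.clamp."""
    return max(LOG_STD_MIN, min(LOG_STD_MAX, raw))

def numeric_grad(f, x, eps=1e-5):
    """Central finite-difference derivative of f at x."""
    return (f(x + eps) - f(x - eps)) / (2 * eps)

# For a raw output well outside the bounds, clamp is flat (zero gradient),
# so no learning signal reaches the layer; the tanh rescaling still passes
# a small non-zero gradient through.
print(numeric_grad(clamp, 5.0))         # exactly 0.0
print(numeric_grad(rescale_tanh, 5.0))  # small but non-zero
```

So beyond numerical taste, `clamp` kills the gradient of the std head whenever the raw output saturates, whereas the `tanh` rescaling only attenuates it, which is one plausible reason to prefer it.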
In `class SACagent`, `def update_model()`:
2.1 Why do you train the actor first and then the Q functions? Wouldn't training the Q functions first provide better guidance for training the actor?
2.2 In `advantage = q_pred - v_pred.detach()`, the `detach()` on `v_pred` removes its gradient information, so it should not affect the actor's update through backpropagation. Could we simply write `advantage = q_pred`?
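For what it's worth, the two expressions are not interchangeable even though no gradient flows through the detached `v_pred`: the baseline still scales the score-function term of the actor loss, so it changes the per-sample gradient (it reduces variance without changing the expected gradient). A toy sketch with a 1-D Gaussian policy and hypothetical critic values illustrates this; all numbers and helper names here are made up for illustration:

```python
import math

def log_prob(theta, a):
    """Log-density of a 1-D Gaussian policy with mean theta and std 1."""
    return -0.5 * (a - theta) ** 2 - 0.5 * math.log(2 * math.pi)

def actor_grad(theta, a, advantage, eps=1e-5):
    """d/dtheta of the actor loss -(advantage * log_prob).

    The advantage is treated as a constant, mimicking detach(): no
    derivative is taken through it, only through log_prob.
    """
    dlp = (log_prob(theta + eps, a) - log_prob(theta - eps, a)) / (2 * eps)
    return -advantage * dlp

q_pred, v_pred = 3.0, 2.5   # hypothetical critic outputs for one sample
theta, action = 0.0, 0.7    # hypothetical policy parameter and sampled action

g_with_baseline = actor_grad(theta, action, q_pred - v_pred)  # detached baseline
g_without = actor_grad(theta, action, q_pred)                 # advantage = q_pred

# The baseline carries no gradient of its own, yet the two actor gradients
# differ, because the (constant) advantage multiplies the score function.
print(g_with_baseline, g_without)
```

So dropping `v_pred.detach()` leaves the expected policy gradient unbiased but gives a noisier per-sample estimate; the detached value acts purely as a variance-reducing baseline.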
Looking forward to your reply!
Thanks very much.