ask question about algorithm SAC #34

Open
ikelq opened this issue Dec 4, 2021 · 0 comments
ikelq commented Dec 4, 2021

Hi, friends. This is Qiong Liu, a Ph.D. student from Tsinghua University.

Thanks for providing such valuable code. I have some questions about the SAC algorithm.

  1. In class Actor(), def forward():
     code:
     # get std
     log_std = self.log_std_layer(x).tanh()
     log_std = self.log_std_min + 0.5 * (
         self.log_std_max - self.log_std_min
     ) * (log_std + 1)

I found that other examples use "log_std = torch.clamp(log_std, min=LOG_SIG_MIN, max=LOG_SIG_MAX)". I agree with you that using "tanh" rather than the "clamp" function is better. Maybe the reason is that "tanh" provides a better gradient. What is your opinion? (A small comparison sketch follows below.)
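For illustration only (my own toy sketch, not code from this repository), here is a minimal comparison of the two ways of bounding log_std: clamp gives exactly zero gradient once the raw value leaves [LOG_SIG_MIN, LOG_SIG_MAX], while the tanh rescaling keeps a small but nonzero gradient everywhere.

    import torch

    LOG_SIG_MIN, LOG_SIG_MAX = -20.0, 2.0

    # raw network output, deliberately far outside the allowed range
    raw = torch.tensor([5.0], requires_grad=True)

    # clamp: hard cut-off, gradient is exactly zero beyond the bounds
    clamped = torch.clamp(raw, min=LOG_SIG_MIN, max=LOG_SIG_MAX)
    clamped.backward()
    print(raw.grad)  # tensor([0.]) -- no learning signal for log_std here

    raw.grad = None

    # tanh rescaling: smooth squashing into [LOG_SIG_MIN, LOG_SIG_MAX]
    squashed = LOG_SIG_MIN + 0.5 * (LOG_SIG_MAX - LOG_SIG_MIN) * (raw.tanh() + 1)
    squashed.backward()
    print(raw.grad)  # small but nonzero gradient, so learning can still proceed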

  2. In class SACagent(), def update_model():
     2.1 Why train the actor first and then the Q functions? Maybe training the Q functions first would provide better guidance for training the actor?
     2.2 In "advantage = q_pred - v_pred.detach()", after "detach()" is applied to "v_pred", it carries no gradient information, so it should not affect the update of the actor. Could we simply write "advantage = q_pred"? (See the small gradient-check sketch after the code below.)

     # v function loss
     v_pred = self.vf(state)
     q_pred = torch.min(
         self.qf_1(state, new_action), self.qf_2(state, new_action)
     )
     v_target = q_pred - alpha * log_prob
     vf_loss = F.mse_loss(v_pred, v_target.detach())

     if self.total_step % self.policy_update_freq == 0:
         # actor loss
         advantage = q_pred - v_pred.detach()
         actor_loss = (alpha * log_prob - advantage).mean()

         # train actor
         self.actor_optimizer.zero_grad()
         actor_loss.backward()
         self.actor_optimizer.step()

         # target update (vf)
         self._target_soft_update()
     else:
         actor_loss = torch.zeros(1)

     # train Q functions
     # (qf_1_loss and qf_2_loss are computed earlier in update_model; omitted from this excerpt)
     self.qf_1_optimizer.zero_grad()
     qf_1_loss.backward()
     self.qf_1_optimizer.step()

     self.qf_2_optimizer.zero_grad()
     qf_2_loss.backward()
     self.qf_2_optimizer.step()
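To make question 2.2 concrete, here is a self-contained toy check I wrote (the names actor, q_pred, v_pred, alpha, and log_prob below are stand-ins, not the repository's objects): subtracting a detached baseline changes the loss value but leaves the actor gradient unchanged.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    actor = nn.Linear(4, 2)                         # stand-in for the policy network
    state = torch.randn(8, 4)
    q_pred = actor(state).sum(dim=1, keepdim=True)  # pretend Q-value of the new action
    v_pred = torch.randn(8, 1)                      # stand-in for self.vf(state)
    alpha, log_prob = 0.2, torch.randn(8, 1)

    # actor loss with the detached baseline, as in the snippet above
    loss_with_baseline = (alpha * log_prob - (q_pred - v_pred.detach())).mean()
    grad_with = torch.autograd.grad(loss_with_baseline, actor.weight, retain_graph=True)[0]

    # actor loss without the baseline ("advantage = q_pred")
    loss_no_baseline = (alpha * log_prob - q_pred).mean()
    grad_without = torch.autograd.grad(loss_no_baseline, actor.weight)[0]

    print(torch.allclose(grad_with, grad_without))             # True: identical actor gradient
    print(loss_with_baseline.item(), loss_no_baseline.item())  # loss values differ by a constant

At least in this toy case the two losses give the same actor gradient, which is exactly what question 2.2 is asking about.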
    

Looking forward to your reply!
Thanks very much.
