ask question about algorithm SAC #34

Open
ikelq opened this issue Dec 4, 2021 · 0 comments
ikelq commented Dec 4, 2021

Hi, friends. This is Qiong Liu, a Ph.D. student from Tsinghua University.

Thanks for providing such valuable code. I have some questions about the SAC algorithm.

  1. In class Actor(), def forward():
     code:
     # get std
     log_std = self.log_std_layer(x).tanh()
     log_std = self.log_std_min + 0.5 * (
         self.log_std_max - self.log_std_min
     ) * (log_std + 1)

I found that other examples use "log_std = torch.clamp(log_std, min=LOG_SIG_MIN, max=LOG_SIG_MAX)". I agree with you that using "tanh" rather than the "clamp" function is better. Maybe the reason is that "tanh" provides a better gradient. What is your opinion? (A small comparison sketch follows below.)
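For illustration only (my own toy sketch, not code from this repository), here is a minimal comparison of the two ways of bounding log_std: clamp gives exactly zero gradient once the raw value leaves [LOG_SIG_MIN, LOG_SIG_MAX], while the tanh rescaling keeps a small but nonzero gradient everywhere.

    import torch

    LOG_SIG_MIN, LOG_SIG_MAX = -20.0, 2.0

    # raw network output, deliberately far outside the allowed range
    raw = torch.tensor([5.0], requires_grad=True)

    # clamp: hard cut-off, gradient is exactly zero beyond the bounds
    clamped = torch.clamp(raw, min=LOG_SIG_MIN, max=LOG_SIG_MAX)
    clamped.backward()
    print(raw.grad)  # tensor([0.]) -- no learning signal for log_std here

    raw.grad = None

    # tanh rescaling: smooth squashing into [LOG_SIG_MIN, LOG_SIG_MAX]
    squashed = LOG_SIG_MIN + 0.5 * (LOG_SIG_MAX - LOG_SIG_MIN) * (raw.tanh() + 1)
    squashed.backward()
    print(raw.grad)  # small but nonzero gradient, so learning can still proceed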

  2. In class SACagent(), def update_model():
     2.1 Why train the actor first and then the Q functions? Maybe training the Q functions first would provide better guidance for training the actor?
     2.2 In "advantage = q_pred - v_pred.detach()", after "detach()" is applied to "v_pred", it carries no gradient information, so it should not affect the update of the actor. Could we simply write "advantage = q_pred"? (See the small gradient-check sketch after the code below.)

     # v function loss
     v_pred = self.vf(state)
     q_pred = torch.min(
         self.qf_1(state, new_action), self.qf_2(state, new_action)
     )
     v_target = q_pred - alpha * log_prob
     vf_loss = F.mse_loss(v_pred, v_target.detach())

     if self.total_step % self.policy_update_freq == 0:
         # actor loss
         advantage = q_pred - v_pred.detach()
         actor_loss = (alpha * log_prob - advantage).mean()

         # train actor
         self.actor_optimizer.zero_grad()
         actor_loss.backward()
         self.actor_optimizer.step()

         # target update (vf)
         self._target_soft_update()
     else:
         actor_loss = torch.zeros(1)

     # train Q functions
     # (qf_1_loss and qf_2_loss are computed earlier in update_model; omitted from this excerpt)
     self.qf_1_optimizer.zero_grad()
     qf_1_loss.backward()
     self.qf_1_optimizer.step()

     self.qf_2_optimizer.zero_grad()
     qf_2_loss.backward()
     self.qf_2_optimizer.step()
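To make question 2.2 concrete, here is a self-contained toy check I wrote (the names actor, q_pred, v_pred, alpha, and log_prob below are stand-ins, not the repository's objects): subtracting a detached baseline changes the loss value but leaves the actor gradient unchanged.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    actor = nn.Linear(4, 2)                         # stand-in for the policy network
    state = torch.randn(8, 4)
    q_pred = actor(state).sum(dim=1, keepdim=True)  # pretend Q-value of the new action
    v_pred = torch.randn(8, 1)                      # stand-in for self.vf(state)
    alpha, log_prob = 0.2, torch.randn(8, 1)

    # actor loss with the detached baseline, as in the snippet above
    loss_with_baseline = (alpha * log_prob - (q_pred - v_pred.detach())).mean()
    grad_with = torch.autograd.grad(loss_with_baseline, actor.weight, retain_graph=True)[0]

    # actor loss without the baseline ("advantage = q_pred")
    loss_no_baseline = (alpha * log_prob - q_pred).mean()
    grad_without = torch.autograd.grad(loss_no_baseline, actor.weight)[0]

    print(torch.allclose(grad_with, grad_without))             # True: identical actor gradient
    print(loss_with_baseline.item(), loss_no_baseline.item())  # loss values differ by a constant

At least in this toy case the two losses give the same actor gradient, which is exactly what question 2.2 is asking about.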
    

Looking forward to your reply!
Thanks very much.
