First, thanks for your implementation of so many CQL variants. Some of the questions below are about your implementation, and some are about CQL itself.
1. Why does the function `_compute_policy_values` in CQL-SAC return `qs1 - log_pis.detach(), qs2 - log_pis.detach()` with `log_pis` detached? I think it should not be detached.
2. What do `self.temp` and `self.cql_weight` mean in CQL-SAC? I think `self.cql_weight` is redundant, since `cql_alpha` has a similar meaning.
3. Is it essential to use two q states in CQL?
4. In CQL-SAC-Discrete, I think the `q1` inside `cql1_scaled_loss = torch.logsumexp(q1, dim=1).mean() - q1.mean()` should be an expectation over all available q(s,a), not just the best one. Am I wrong?
5. In CQL-SAC, why is `retain_graph=True` used for the Lagrange and critic optimizers?
6. The most important question: according to p. 29 of the paper, for continuous actions, both the q values from uniformly sampled actions and the q values from pi are used to compute the `logsumexp` objective. But why are actions from pi also used here? I also asked here, but I am still at a loss.
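To make question 4 concrete, here is a minimal sketch of the discrete CQL regularizer as I understand it from the paper. It is not taken from this repo: the function name `discrete_cql_regularizer`, the tensor names `q_all` and `actions`, and the shapes are my own assumptions. The idea is that `logsumexp` runs over the Q-values of all actions, while the subtracted term is the Q-value of the action stored in the dataset.

```python
import torch


def discrete_cql_regularizer(q_all: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper, not from the repo.

    q_all:   (batch, num_actions) Q-values for every action in each state.
    actions: (batch,) long tensor of the actions stored in the dataset.
    """
    # Soft maximum over ALL actions in each state.
    lse = torch.logsumexp(q_all, dim=1)                          # (batch,)
    # Q-value of the action that was actually taken in the dataset.
    q_data = q_all.gather(1, actions.unsqueeze(1)).squeeze(1)    # (batch,)
    return (lse - q_data).mean()


q_all = torch.tensor([[1.0, 2.0, 3.0],
                      [0.0, 0.0, 0.0]])
actions = torch.tensor([2, 0])
reg = discrete_cql_regularizer(q_all, actions)

# Assumed failure mode: if q1 already holds a single value per state,
# shape (batch, 1), then logsumexp over dim=1 returns that same value
# and the regularizer collapses to zero.
q_single = q_all.gather(1, actions.unsqueeze(1))                 # (batch, 1)
degenerate = torch.logsumexp(q_single, dim=1).mean() - q_single.mean()
```

In this sketch `reg` is always non-negative, because the log-sum-exp over a row is at least as large as any single entry of that row. Whether `q1` in the repo holds all action values or only one per state is exactly what I am unsure about.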
I know some of these CQL questions should be asked in the original repo, but the author is no longer active there.