Issues training Trajectory Conditioned SAC policy #3

I am having trouble getting the trajectory-conditioned SAC policy to learn. I am using `TC-Driver/TC_Driver/train/train.py` to train the policy with the parameters below. I left essentially all parameters at their default values and cross-referenced them against this [paper](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=10755062) as best I could. My mode is `Frenet_trajectory`, with `ep_len = 10000`, `params_noise = True`, and `use_trajectory = True`, and I trained for a total of 500,000 steps. I am not running a wandb parameter sweep over the reward penalty; instead I am using the heuristic values suggested in the paper.

**Would it be possible to point me to a correct config file for the training hyperparameters, or even some pretrained model weights?**

```python
    env_conf = {
        "mode": mode,
        "arch": arch,
        "map_name": map,
        "map": os.path.join(configs_dir, "{}".format(map)),
        "map_ext": conf.map_ext,
        "random_init": True,
        "sx": conf.sx,
        "sy": conf.sy,
        "stheta": conf.stheta,
        "num_agents": 1,
        "ep_len": ep_len,  # this sets the maximum episode length, it is ~1.5 times the best time for a lap, so it changes from track to track
        "obs_type": mode,
        "params_noise": params_noise,
        "var_mu": (0.075 / 2) ** 2,  # if all are set to 0 no noise is applied
        "var_Csf": 0,
        "var_Csr": 0,
        "redraw_upon_reset": True,
        "angle_limit": ang_deg * np.pi / 180,  # 30 deg
        "use_trajectory": use_trajectory,
        "max_vel": max_vel,
        "display_video": display_video,  # TODO not used should be removed
        "curriculum": False,  # no curriculum velocity for now
        "policy_type": "MlpPolicy",
        "total_timesteps": 5e5,
        "gamma": 0.99,
        "env_id": "f110_gym:f110rl-v0",
        "use_lidar": True,
        "action_pen": 0.01,
        "params": params,
        "output_reg": np.diag([0.0, 0.0]),  # steer action, throttle action
    }
```
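
For reference, the two derived values in the config work out as follows. This is only an arithmetic sketch: `ang_deg = 30` is taken from the inline `# 30 deg` comment, and I am assuming `var_mu` is a variance (as its name suggests), so the corresponding standard deviation is its square root. `std_mu` is a name I introduce here for illustration.

```python
import math

# Assumed from the "# 30 deg" comment next to "angle_limit" in the config.
ang_deg = 30
angle_limit = ang_deg * math.pi / 180  # degrees -> radians, roughly 0.524 rad

# "var_mu" as written in the config; assumed to be a variance.
var_mu = (0.075 / 2) ** 2
std_mu = math.sqrt(var_mu)  # hypothetical helper value: the matching std dev

print(f"angle_limit = {angle_limit:.4f} rad")
print(f"var_mu = {var_mu:.6f}, std_mu = {std_mu:.4f}")
```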


These are the post-training results.

```
Eval num_timesteps=500000, episode_reward=-1.00 +/- 0.00
Episode length: 1.00 +/- 0.00
---------------------------------
| eval/              |          |
|    mean_ep_length  | 1        |
|    mean_reward     | -1       |
| time/              |          |
|    total timesteps | 500000   |
| train/             |          |
|    actor_loss      | 1.04     |
|    critic_loss     | 1.85e-07 |
|    ent_coef        | 3.52e-07 |
|    ent_coef_loss   | 3.9      |
|    learning_rate   | 0.0003   |
|    n_updates       | 499899   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1        |
|    ep_rew_mean     | -1.04    |
| time/              |          |
|    episodes        | 500000   |
|    fps             | 166      |
|    time_elapsed    | 3004     |
|    total timesteps | 500000   |
---------------------------------
```
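
One thing I notice in the log: `episodes` equals `total timesteps` (500000 each) and `ep_len_mean` is 1, so every episode appears to end after a single step. Below is a minimal sketch of a check I could run to count how many steps pass before an episode ends and to look at the final `info`. The names `steps_until_done` and `ImmediateDoneEnv` are hypothetical; the stub class only stands in for the real `f110_gym:f110rl-v0` environment, and I am assuming its `step` returns either the 4-tuple or the 5-tuple Gym convention.

```python
def steps_until_done(env, action, max_steps=1000):
    """Reset env, repeat one fixed action, return (steps taken, last info)."""
    env.reset()
    steps, info = 0, {}
    while steps < max_steps:
        out = env.step(action)
        steps += 1
        if len(out) == 5:
            # Newer convention: obs, reward, terminated, truncated, info
            _, _, terminated, truncated, info = out
            done = terminated or truncated
        else:
            # Older convention: obs, reward, done, info
            _, _, done, info = out
        if done:
            break
    return steps, info


class ImmediateDoneEnv:
    """Hypothetical stub that reproduces the symptom seen in the log."""

    def reset(self):
        return [0.0]

    def step(self, action):
        # Ends the episode on the first step with reward -1.
        return [0.0], -1.0, True, {"reason": "hypothetical"}


steps, info = steps_until_done(ImmediateDoneEnv(), action=[0.0, 0.0])
mean_ep_len = 500000 / 500000  # total timesteps / episodes from the log
print(steps, info, mean_ep_len)
```

If the real environment also returns `steps == 1` for a neutral action, the problem is probably in the reset or termination logic rather than in the SAC hyperparameters.
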
Thank you!
