Issues training Trajectory Conditioned SAC policy #3

@nkepling

I am having trouble getting the trajectory-conditioned SAC policy to learn. I am training with "TC-Driver/TC_Driver/train/train.py" using the parameters below. I left essentially all parameters at their default values and cross-referenced them against this paper as best I could. My settings are mode = Frenet_trajectory, ep_len = 10000, params_noise = True, and use_trajectory = True, and I trained for a total of 500,000 steps. I am not running a wandb parameter sweep over the reward penalty; instead I am using the heuristic values suggested in the paper.

Would it be possible to point me to a correct config file for the training hyperparameters, or even to some pretrained model weights?

    env_conf = {
        "mode": mode,
        "arch": arch,
        "map_name": map,
        "map": os.path.join(configs_dir, "{}".format(map)),
        "map_ext": conf.map_ext,
        "random_init": True,
        "sx": conf.sx,
        "sy": conf.sy,
        "stheta": conf.stheta,
        "num_agents": 1,
        "ep_len": ep_len,  # this sets the maximum episode length, it is ~1.5 times the best time for a lap, so it changes from track to track
        "obs_type": mode,
        "params_noise": params_noise,
        "var_mu": (0.075 / 2) ** 2,  # if all are set to 0 no noise is applied
        "var_Csf": 0,
        "var_Csr": 0,
        "redraw_upon_reset": True,
        "angle_limit": ang_deg * np.pi / 180,  # 30 deg
        "use_trajectory": use_trajectory,
        "max_vel": max_vel,
        "display_video": display_video,  # TODO: not used, should be removed
        "curriculum": False,  # no curriculum velocity for now
        "policy_type": "MlpPolicy",
        "total_timesteps": 5e5,
        "gamma": 0.99,
        "env_id": "f110_gym:f110rl-v0",
        "use_lidar": True,
        "action_pen": 0.01,
        "params": params,
        "output_reg": np.diag([0.0, 0.0]),  # steer action, throttle action
    }
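
For reference, the two derived values in the dict work out as follows. This is only a sketch: it assumes `ang_deg = 30`, as the inline comment suggests, and it reads `var_mu` as a variance (so its square root is the noise standard deviation for the `mu` parameter). Neither assumption is confirmed by the repository.

```python
import math

# Assumption: ang_deg is 30, taken from the "# 30 deg" comment in the config.
ang_deg = 30
angle_limit = ang_deg * math.pi / 180  # radians, roughly 0.5236

# Assumption: var_mu is a variance, so the standard deviation is its square root.
var_mu = (0.075 / 2) ** 2
std_mu = math.sqrt(var_mu)  # 0.0375 up to floating-point rounding

print(f"angle_limit = {angle_limit:.4f} rad")
print(f"var_mu = {var_mu:.6f}, std_mu = {std_mu:.4f}")
```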

These are the post-training results.

Eval num_timesteps=500000, episode_reward=-1.00 +/- 0.00
Episode length: 1.00 +/- 0.00
---------------------------------
| eval/              |          |
|    mean_ep_length  | 1        |
|    mean_reward     | -1       |
| time/              |          |
|    total timesteps | 500000   |
| train/             |          |
|    actor_loss      | 1.04     |
|    critic_loss     | 1.85e-07 |
|    ent_coef        | 3.52e-07 |
|    ent_coef_loss   | 3.9      |
|    learning_rate   | 0.0003   |
|    n_updates       | 499899   |
---------------------------------
---------------------------------
| rollout/           |          |
|    ep_len_mean     | 1        |
|    ep_rew_mean     | -1.04    |
| time/              |          |
|    episodes        | 500000   |
|    fps             | 166      |
|    time_elapsed    | 3004     |
|    total timesteps | 500000   |
---------------------------------
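
Both the evaluation and rollout statistics report an episode length of 1 (500,000 episodes over 500,000 timesteps), so every episode appears to end on its first step. One way to narrow this down is to step the environment by hand, independent of SAC, and look at the episode length and the final `info`. The sketch below is not taken from the TC-Driver code: `rollout_length` and `OneStepEnv` are hypothetical names, and the helper only assumes a Gym-style `reset()`/`step()` interface (it accepts both the older 4-tuple and the newer 5-tuple `step` return).

```python
def rollout_length(env, policy, max_steps=1000):
    """Run one episode and return (steps, total_reward, last_info)."""
    out = env.reset()
    # Newer Gym/Gymnasium return (obs, info); older Gym returns obs only.
    if isinstance(out, tuple) and len(out) == 2 and isinstance(out[1], dict):
        obs = out[0]
    else:
        obs = out

    steps, total_reward, info = 0, 0.0, {}
    for _ in range(max_steps):
        result = env.step(policy(obs))
        if len(result) == 5:
            obs, reward, terminated, truncated, info = result
            done = terminated or truncated
        else:
            obs, reward, done, info = result
        steps += 1
        total_reward += float(reward)
        if done:
            break
    return steps, total_reward, info


class OneStepEnv:
    """Hypothetical stand-in that mimics the symptom: done on the first step."""

    def reset(self):
        return [0.0, 0.0]

    def step(self, action):
        return [0.0, 0.0], -1.0, True, {"reason": "terminated immediately"}


# Zero action (no steering, no throttle) as a trivial hand-written policy.
steps, total_reward, info = rollout_length(OneStepEnv(), lambda obs: [0.0, 0.0])
print(steps, total_reward, info)
```

If the real environment also returns after a single step with a fixed action, the problem is likely in the reset or termination logic (for example the start pose or map configuration) rather than in the SAC hyperparameters.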

Thank you!
