feat(rl): PPO training pipeline with smoke-train verification by thanhndv212 · Pull Request #6 · thanhndv212/drones-sim

thanhndv212 · 2026-06-21T11:58:34Z

M2b — RL Training Pipeline (COMPLETES M2)

New: `training/` package

| File | Purpose |
| --- | --- |
| `train_ppo.py` | SB3 PPO entry point (CLI, SubprocVecEnv, VecNormalize) |
| `eval_policy.py` | Deterministic evaluation → `pos_rmse`, `success_rate`, `crash_rate`, `mean_reward` |
| `configs/ppo_hover.yaml` | YAML hyperparameters |
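
For readers who want a feel for what the four evaluation metrics could mean, here is a minimal sketch in plain Python. It is not the code in `eval_policy.py`: the `Episode` record, the `summarize` helper, the `success_radius` threshold, and the definition of success (final position within that radius, no crash) are all assumptions made for illustration. `mean_reward` is taken here to be the mean episode return.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Episode:
    """One deterministic rollout (hypothetical record layout)."""
    positions: List[Vec3]  # drone position at each step
    targets: List[Vec3]    # reference position at each step
    rewards: List[float]   # per-step reward
    crashed: bool          # True if the episode ended in a crash


def summarize(episodes: Sequence[Episode], success_radius: float = 0.1) -> Dict[str, float]:
    """Aggregate rollouts into the four metrics named in the table above."""
    if not episodes:
        raise ValueError("need at least one episode")

    sq_err_sum, n_steps = 0.0, 0
    successes, crashes = 0, 0
    returns = []
    for ep in episodes:
        # Accumulate squared position error over every step of every episode.
        for pos, tgt in zip(ep.positions, ep.targets):
            sq_err_sum += sum((a - b) ** 2 for a, b in zip(pos, tgt))
            n_steps += 1
        if ep.crashed:
            crashes += 1
        elif math.dist(ep.positions[-1], ep.targets[-1]) <= success_radius:
            # Assumed success criterion: ends near the target without crashing.
            successes += 1
        returns.append(sum(ep.rewards))

    n = len(episodes)
    return {
        "pos_rmse": math.sqrt(sq_err_sum / n_steps),
        "success_rate": successes / n,
        "crash_rate": crashes / n,
        "mean_reward": sum(returns) / n,
    }
```

Pooling the squared error over all steps before taking the root weights long episodes more heavily than short ones; averaging per-episode RMSE values would be an equally reasonable choice.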

New examples

examples/07_rl_hover.py — end-to-end: train PPO 50k steps, eval, print RMSE
examples/08_rl_vs_pid.py — head-to-head comparison vs cascaded PID on circular trajectory

pyproject.toml — optional RL extras

[project.optional-dependencies]
rl  = ["torch>=2.2", "stable-baselines3>=2.3", "gymnasium>=0.29", "tensorboard>=2.15", "pyyaml>=6.0"]
rl-dev = ["drones-sim[rl]", "wandb", "optuna>=3.5", "moviepy"]

Smoke-train verified ✅

50,000 timesteps in 25 s at ~2,000 fps (CPU, 2 envs)
Model checkpoint saved to training/checkpoints/final.zip
`python -m training.eval_policy --path training/checkpoints/final.zip` loads and evaluates the checkpoint
Pipeline: train → save → load → eval — verified end-to-end
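
The train → save → load → eval loop above can be sketched with public Stable-Baselines3 calls. This is an illustration under stated assumptions, not the contents of `train_ppo.py`: the `smoke_train` function is hypothetical, `Pendulum-v1` stands in for the drone hover env (whose id is not given here), and saving the `VecNormalize` statistics to `vecnormalize.pkl` is an assumed detail. SB3 is imported lazily so the module still imports without the optional `rl` extras.

```python
from pathlib import Path


def smoke_train(env_id: str = "Pendulum-v1", total_timesteps: int = 1_000,
                n_envs: int = 2, out_dir: str = "training/checkpoints"):
    """Train PPO briefly, save it, reload it, and evaluate deterministically."""
    # Lazy imports: only needed when the function is actually called.
    from stable_baselines3 import PPO
    from stable_baselines3.common.env_util import make_vec_env
    from stable_baselines3.common.evaluation import evaluate_policy
    from stable_baselines3.common.vec_env import SubprocVecEnv, VecNormalize

    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    stats_path = str(out / "vecnormalize.pkl")  # assumed file name

    # Train: parallel worker processes with running obs/reward normalization.
    venv = VecNormalize(make_vec_env(env_id, n_envs=n_envs, vec_env_cls=SubprocVecEnv))
    model = PPO("MlpPolicy", venv, verbose=0)
    model.learn(total_timesteps=total_timesteps)

    # Save: SB3 appends ".zip" to the model path.
    model.save(out / "final")
    venv.save(stats_path)
    venv.close()

    # Load: restore normalization statistics and freeze them for evaluation.
    eval_env = VecNormalize.load(stats_path, make_vec_env(env_id, n_envs=1))
    eval_env.training = False     # do not update running statistics
    eval_env.norm_reward = False  # report raw rewards
    model = PPO.load(out / "final", env=eval_env)

    # Eval: deterministic actions, a handful of episodes.
    mean_reward, std_reward = evaluate_policy(
        model, eval_env, n_eval_episodes=5, deterministic=True
    )
    eval_env.close()
    return mean_reward, std_reward
```

Because `SubprocVecEnv` starts worker processes, a script calling `smoke_train()` should do so under an `if __name__ == "__main__":` guard, which matters on platforms that spawn rather than fork.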

Tests — 2 new in `tests/test_rl_training.py`

Smoke-train 1000 steps + save/reload/predict
Self-contained eval metrics test (tiny model → metrics dict)
All guarded by @pytest.mark.skipif(not _HAS_SB3)

.gitignore

Added training/checkpoints/ and tb/ (TensorBoard logs)

============================== 92 passed ==============================

M1 + M2 — COMPLETE 🎉

6 PRs, 35+ files, 2,000+ lines of code, 92 tests passing:

✅ Quaternion dynamics plant (no gimbal lock)
✅ CI workflow (ruff + pytest, Python 3.10–3.12)
✅ Disturbance framework (wind, ground effect, payload drop, motor failure)
✅ Telemetry logging (CSV + JSON Lines)
✅ RL Gymnasium env (modular action/obs/reward/task)
✅ RL training pipeline (PPO smoke-trained, verified)

Adds a complete RL training stack built on Stable-Baselines3 + PyTorch:

training/
  train_ppo.py: SB3 PPO CLI entry point (SubprocVecEnv + VecNormalize)
  eval_policy.py: load checkpoint → deterministic evaluation metrics
  configs/ppo_hover.yaml: YAML hyperparameter config

examples/
  07_rl_hover.py: end-to-end: train PPO 50k steps, eval, print RMSE
  08_rl_vs_pid.py: head-to-head comparison vs cascaded PID on circular trajectory

pyproject.toml [project.optional-dependencies]
  rl = [torch, stable-baselines3, gymnasium, tensorboard, pyyaml]
  rl-dev = [..., wandb, optuna, moviepy]

.gitignore: added training/checkpoints/ and tb/ (TensorBoard logs)

Smoke-train results:
  50,000 timesteps, 2 envs, ~2000 fps, 25 s wall-clock
  Model saves + loads successfully via eval_policy

Tests (tests/test_rl_training.py):
  Smoke-train 1000 steps + save/reload/predict
  Self-contained eval_policy test (tiny model → metrics dict)
  All guarded by @pytest.mark.skipif(not _HAS_SB3)

Full suite: 92 tests passed (ruff clean).

thanhndv212 merged commit 0c88ec7 into master Jun 21, 2026
4 checks passed

thanhndv212 deleted the feat/rl-training branch June 21, 2026 11:58
