feat(rl): Gymnasium QuadcopterEnv with modular action/obs/reward/task system by thanhndv212 · Pull Request #5 · thanhndv212/drones-sim

thanhndv212 · 2026-06-21T11:54:38Z

M2a — RL Gymnasium Environment

New package `drones_sim.rl` — fully modular RL environment

Files:

| Module | Purpose |
|---|---|
| `env.py` | `QuadcopterEnv(gym.Env)`, wraps `QuadcopterDynamics` |
| `actions.py` | `MotorSpeedAction`, `ThrustBodyRatesAction` (default) |
| `observations.py` | `RelativeStateObs` (17-D) |
| `reward.py` | `RewardConfig` + `reward()`: dense position tracking + sparse reach bonus |
| `tasks.py` | `HoverTask`, `WaypointTask`, `TrackingTask` |
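The 17 dimensions of `RelativeStateObs` add up from the components named in the commit message on this PR: [pos_err, vel, quat, omega, prev_action] = 3 + 3 + 4 + 3 + 4. The sketch assembles such a vector with NumPy. The ordering, the `target - pos` sign convention, the [w, x, y, z] quaternion layout, and the helper name `relative_state_obs` are assumptions for illustration, not the actual `RelativeStateObs` implementation.

```python
import numpy as np

# Assumed layout, following the component order given in the commit message.
# The real RelativeStateObs may order, scale, or normalize these differently.
OBS_LAYOUT = {
    "pos_err": 3,      # target minus current position [m] (sign convention assumed)
    "vel": 3,          # linear velocity [m/s]
    "quat": 4,         # attitude quaternion, assumed [w, x, y, z]
    "omega": 3,        # body angular rates [rad/s]
    "prev_action": 4,  # last action, e.g. [thrust_N, wx, wy, wz]
}
OBS_DIM = sum(OBS_LAYOUT.values())  # 3 + 3 + 4 + 3 + 4 = 17


def relative_state_obs(pos, vel, quat, omega, target, prev_action):
    """Hypothetical helper: flatten the relative state into a float32 vector."""
    parts = [
        np.asarray(target, dtype=np.float64) - np.asarray(pos, dtype=np.float64),
        np.asarray(vel, dtype=np.float64),
        np.asarray(quat, dtype=np.float64),
        np.asarray(omega, dtype=np.float64),
        np.asarray(prev_action, dtype=np.float64),
    ]
    # Validate each component against the layout so shape bugs fail loudly.
    for (name, size), part in zip(OBS_LAYOUT.items(), parts):
        if part.shape != (size,):
            raise ValueError(f"{name}: expected shape ({size},), got {part.shape}")
    return np.concatenate(parts).astype(np.float32)


obs = relative_state_obs(
    pos=[0.0, 0.0, 1.0], vel=[0.0, 0.0, 0.0], quat=[1.0, 0.0, 0.0, 0.0],
    omega=[0.0, 0.0, 0.0], target=[0.0, 0.0, 2.0], prev_action=[0.0, 0.0, 0.0, 0.0],
)
print(obs.shape)  # (17,)
```

Returning `float32` matches what a `gymnasium.spaces.Box(..., dtype=np.float32)` observation space would typically expect, which is the kind of dtype check an Env-contract test performs.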

Design highlights:

- All components (action, observation, reward, task) are injected via the constructor, so each one is swappable
- Motor-lag pre-fill in `reset()` prevents the first-step altitude dip that would otherwise crash the drone
- Graceful import fallback when gymnasium is not installed (`pip install gymnasium`)
- Seeding: seeds both the global `np.random` state (for `SensorNoiseModel` compatibility) and an isolated `default_rng` generator
- Crash detection: ground contact, excessive tilt (>75°), out-of-bounds (>50 m)
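A minimal sketch of two of these ideas follows, assuming a z-up world frame with the ground at z = 0, a unit quaternion in [w, x, y, z] order, out-of-bounds measured as Euclidean distance from the origin, and a first-order motor-lag model with thrust proportional to the square of motor speed. All names, the lag model, and the constants (`k_thrust`, `tau`, `dt`, the 1 kg mass) are hypothetical, not taken from `QuadcopterDynamics`. With the lag state starting at zero, a hover command yields only a fraction of hover thrust on the first step, which is the kind of dip the pre-fill avoids.

```python
import numpy as np

GROUND_Z = 0.0                    # assumed ground height, z-up frame
MAX_TILT_RAD = np.deg2rad(75.0)   # tilt threshold from the PR description
MAX_DISTANCE_M = 50.0             # out-of-bounds radius from the PR description


def tilt_angle(quat_wxyz):
    """Angle between body z-axis and world z-axis for a unit quaternion [w, x, y, z]."""
    _, x, y, _ = quat_wxyz
    cos_tilt = 1.0 - 2.0 * (x * x + y * y)  # R[2, 2] of the rotation matrix
    return float(np.arccos(np.clip(cos_tilt, -1.0, 1.0)))


def crash_reason(pos, quat_wxyz):
    """Return a string naming the crash condition, or None if still flying."""
    pos = np.asarray(pos, dtype=float)
    if pos[2] <= GROUND_Z:
        return "ground_contact"
    if tilt_angle(quat_wxyz) > MAX_TILT_RAD:
        return "excessive_tilt"
    if np.linalg.norm(pos) > MAX_DISTANCE_M:
        return "out_of_bounds"
    return None


def hover_motor_speed(mass_kg, k_thrust, g=9.81):
    """Per-motor speed [rad/s] at which 4 motors with thrust k*w^2 balance gravity."""
    return float(np.sqrt(mass_kg * g / (4.0 * k_thrust)))


def step_motor_lag(w_actual, w_cmd, dt, tau):
    """First-order lag: actual speed moves toward the command by dt/tau per step."""
    return w_actual + (dt / tau) * (w_cmd - w_actual)


w_hover = hover_motor_speed(mass_kg=1.0, k_thrust=1e-6)  # about 1566 rad/s
cmd = np.full(4, w_hover)

# Cold start: motors at rest reach only 10% of hover speed (1% of hover thrust) after one step
cold = step_motor_lag(np.zeros(4), cmd, dt=0.002, tau=0.02)
# Pre-filled reset: lag state already at hover speed, so thrust balances gravity immediately
warm = step_motor_lag(np.full(4, w_hover), cmd, dt=0.002, tau=0.02)
```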

Tests (5 new):

- gymnasium `Env` contract: shapes, dtypes, and finiteness across 100 steps
- Reward bounds O(±20) at near-hover
- Hover baseline does not crash (`ThrustBodyRatesAction`, 200 steps)
- Deterministic seeding (observations and trajectory)
- All tests are marked `@pytest.mark.skipif` when gymnasium is absent
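The determinism and contract checks can be illustrated without gymnasium or the real dynamics. The sketch uses a hypothetical `ToyNoisyEnv` that mimics the gymnasium `reset(seed=...)` and five-tuple `step()` conventions and reseeds both the global `np.random` state and an isolated `np.random.default_rng`, as the design highlights describe. It shows the pattern only; it is not the contents of `tests/test_rl_env.py`.

```python
import numpy as np


class ToyNoisyEnv:
    """Hypothetical stand-in for QuadcopterEnv, used only to show seeding."""

    obs_dim = 17

    def __init__(self):
        self._rng = np.random.default_rng()
        self._state = np.zeros(self.obs_dim)

    def reset(self, *, seed=None, options=None):
        if seed is not None:
            np.random.seed(seed)                    # global state, for code that calls np.random directly
            self._rng = np.random.default_rng(seed)  # isolated generator owned by this env
        self._state = self._rng.normal(0.0, 0.1, self.obs_dim)
        return self._state.astype(np.float32), {}

    def step(self, action):
        # "Sensor" noise from the global state, "process" noise from the isolated generator
        noise = 0.01 * np.random.randn(self.obs_dim) + self._rng.normal(0.0, 0.01, self.obs_dim)
        self._state = self._state + noise
        obs = self._state.astype(np.float32)
        reward = -float(np.linalg.norm(self._state[:3]))
        return obs, reward, False, False, {}


def rollout(env, seed, n_steps=100):
    """Run one episode, checking the Env contract at every step."""
    obs, _ = env.reset(seed=seed)
    traj = [obs]
    for _ in range(n_steps):
        obs, reward, terminated, truncated, _ = env.step(np.zeros(4, dtype=np.float32))
        assert obs.shape == (env.obs_dim,) and obs.dtype == np.float32
        assert np.all(np.isfinite(obs)) and np.isfinite(reward)
        traj.append(obs)
        if terminated or truncated:
            break
    return np.stack(traj)


a = rollout(ToyNoisyEnv(), seed=0)
b = rollout(ToyNoisyEnv(), seed=0)
print(np.array_equal(a, b))  # True
```

Because the global `np.random` state is shared, two environments stepped in an interleaved fashion would consume each other's draws; only randomness taken from a per-env `default_rng` stays reproducible in that setting. That is a general NumPy property, and presumably part of why the isolated generator sits alongside the global seed.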

Dependency: gymnasium is NOT required for the core package; it is an optional, test-time dependency installed via `pip install gymnasium`.

Next: PR6 — training scripts + smoke-train (PPO) 🚀

… system

Adds the `drones_sim.rl` package: a standard `gymnasium.Env` wrapper around `QuadcopterDynamics`. The design is fully modular.

Actions (`drones_sim/rl/actions.py`)
- `MotorSpeedAction`: raw speeds for the 4 motors, in [0, 4000] rad/s
- `ThrustBodyRatesAction`: [thrust_N, wx, wy, wz] (recommended default)

Observations (`drones_sim/rl/observations.py`)
- `RelativeStateObs`: 17-D, [pos_err, vel, quat, omega, prev_action]

Reward (`drones_sim/rl/reward.py`)
- `RewardConfig` dataclass with per-term weights
- `reward()` function: dense position tracking + sparse reach bonus

Tasks (`drones_sim/rl/tasks.py`)
- `HoverTask`: fixed target position
- `WaypointTask`: sequential waypoints with auto-advance
- `TrackingTask`: trajectory reference follower

Env (`drones_sim/rl/env.py`)
- `QuadcopterEnv(gym.Env)`: `reset`/`step`/`render`/`close`
- Motor-lag pre-fill in `reset()` to prevent the first-step altitude dip
- Crash detection: ground contact, excessive tilt, out-of-bounds
- Optional viser live rendering

Tests (5 new in `tests/test_rl_env.py`)
- gymnasium `Env` contract (shapes, dtypes, finiteness)
- Reward bounds at near-hover (±20)
- Hover baseline does not crash (`ThrustBodyRatesAction`)
- Determinism with seed (obs + trajectory)
- All tests skipped gracefully if gymnasium is not installed

Dependency: gymnasium is an optional extra (`pip install gymnasium`). The package's imports fall back gracefully if gymnasium is absent.
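A minimal sketch of a weighted dense-plus-sparse reward and a waypoint task with auto-advance, in the spirit of `RewardConfig`/`reward()` and `WaypointTask`. The field names, default weights, and reach radius are hypothetical (the PR does not list the actual per-term weights), so the classes carry a `Sketch` prefix to avoid being mistaken for the real API. With these defaults a drone sitting on the target scores +10, which is within the O(±20) range the tests check at near-hover.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SketchRewardConfig:
    """Hypothetical per-term weights; the real RewardConfig fields may differ."""

    w_pos: float = 1.0          # dense penalty per metre of position error
    w_omega: float = 0.05       # dense penalty on squared body rates
    reach_radius: float = 0.1   # [m] distance that counts as "reached"
    reach_bonus: float = 10.0   # sparse bonus while inside reach_radius


def sketch_reward(pos_err, omega, cfg):
    """Dense position-tracking term plus a sparse reach bonus."""
    dist = float(np.linalg.norm(pos_err))
    r = -cfg.w_pos * dist - cfg.w_omega * float(np.dot(omega, omega))
    if dist < cfg.reach_radius:
        r += cfg.reach_bonus
    return r


class SketchWaypointTask:
    """Sequential waypoints; the target advances once the current one is reached."""

    def __init__(self, waypoints, reach_radius=0.1):
        self.waypoints = np.asarray(waypoints, dtype=float)
        self.reach_radius = reach_radius
        self.index = 0

    @property
    def target(self):
        return self.waypoints[self.index]

    def update(self, pos):
        """Return True if the current waypoint is reached; advance unless it is the last."""
        dist = np.linalg.norm(self.target - np.asarray(pos, dtype=float))
        reached = dist < self.reach_radius
        if reached and self.index < len(self.waypoints) - 1:
            self.index += 1
        return bool(reached)


cfg = SketchRewardConfig()
print(sketch_reward(np.zeros(3), np.zeros(3), cfg))                # 10.0
print(sketch_reward(np.array([1.0, 0.0, 0.0]), np.zeros(3), cfg))  # -1.0
```

Returning the reach flag from `update()` lets one signal feed both the sparse bonus and a termination rule; how the real `WaypointTask` reports progress is not described in the PR.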

thanhndv212 merged commit b65fa29 into master on Jun 21, 2026 (1 commit from feat/rl-env)
4 checks passed

thanhndv212 deleted the feat/rl-env branch June 21, 2026 11:54

