feat(rl): Gymnasium QuadcopterEnv with modular action/obs/reward/task system #5

Merged
thanhndv212 merged 1 commit into master from feat/rl-env on Jun 21, 2026
@thanhndv212

Owner

M2a — RL Gymnasium Environment

New package drones_sim.rl — fully modular RL environment

Files:

| Module | Purpose |
| --- | --- |
| env.py | QuadcopterEnv(gym.Env): wraps QuadcopterDynamics |
| actions.py | MotorSpeedAction, ThrustBodyRatesAction (default) |
| observations.py | RelativeStateObs (17-D) |
| reward.py | RewardConfig + reward(): dense position term + sparse reach bonus |
| tasks.py | HoverTask, WaypointTask, TrackingTask |

Design highlights:

  • All components injected via constructor — fully swappable
  • Motor lag pre-fill in reset() prevents a crash caused by the first-step altitude dip
  • Graceful import fallback when gymnasium is not installed (install with pip install gymnasium)
  • Seed handling: seeds both the global np.random state (for SensorNoiseModel compatibility) and an isolated default_rng generator
  • Crash detection: ground contact, excessive tilt (> 75°), out-of-bounds (> 50 m)
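
The crash conditions in the last bullet can be sketched as a single check. This is a minimal illustration, not the code in env.py: the function name `crash_reason`, the z-up frame with the ground at z = 0, the (w, x, y, z) quaternion order, and measuring out-of-bounds as distance from the origin are all assumptions. Only the 75° and 50 m thresholds come from the description above.

```python
import math


def crash_reason(pos, quat, max_tilt_deg=75.0, max_dist_m=50.0, ground_z=0.0):
    """Return a crash label, or None if the state looks safe.

    Assumptions (not confirmed by the PR): z-up world frame, unit quaternion
    in (w, x, y, z) order, bounds measured as distance from the origin.
    """
    x, y, z = pos
    _, qx, qy, _ = quat

    # Ground contact: vehicle at or below the assumed ground plane.
    if z <= ground_z:
        return "ground_contact"

    # Tilt is the angle between the body z-axis and the world z-axis.
    # For a unit quaternion, cos(tilt) = 1 - 2 * (qx^2 + qy^2).
    cos_tilt = max(-1.0, min(1.0, 1.0 - 2.0 * (qx * qx + qy * qy)))
    if math.degrees(math.acos(cos_tilt)) > max_tilt_deg:
        return "excessive_tilt"

    # Out of bounds: too far from the origin.
    if math.sqrt(x * x + y * y + z * z) > max_dist_m:
        return "out_of_bounds"

    return None
```

Returning a label rather than a bare boolean makes it easy to report the cause of termination, for example in the `info` dictionary returned by `step()`.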

Tests — 5 new:

  • gymnasium Env contract (shapes, dtypes, finiteness across 100 steps)
  • Reward bounds O(±20) at near-hover
  • Hover baseline no-crash (ThrustBodyRatesAction, 200 steps)
  • Deterministic seed (obs + trajectory)
  • All tests are marked @pytest.mark.skipif so they are skipped when gymnasium is absent

Dependency: gymnasium is NOT required for the core package; it is a test-time dependency, installed via pip install gymnasium.
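
One common way to keep a dependency optional is a guarded import with a fallback base class. The sketch below shows the general pattern only. The names `HAS_GYMNASIUM` and `OptionalGymEnv` are hypothetical, and the actual fallback in drones_sim.rl may be structured differently.

```python
try:
    import gymnasium as gym
    HAS_GYMNASIUM = True
except ImportError:  # gymnasium is optional
    gym = None
    HAS_GYMNASIUM = False

# Fall back to a plain object base so the module still imports without gymnasium.
_EnvBase = gym.Env if HAS_GYMNASIUM else object


class OptionalGymEnv(_EnvBase):
    """Hypothetical stand-in for an env class with an optional dependency."""

    def __init__(self):
        # Fail at construction time, not import time, with an actionable message.
        if not HAS_GYMNASIUM:
            raise ImportError(
                "gymnasium is required for the RL environment: pip install gymnasium"
            )
```

A test module can reuse the same flag, for example `@pytest.mark.skipif(not HAS_GYMNASIUM, reason="gymnasium not installed")`.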

Next: PR6 — training scripts + smoke-train (PPO) 🚀

… system

Adds drones_sim.rl package — a standard gymnasium.Env wrapper around
QuadcopterDynamics.  The design is fully modular:

Actions (drones_sim/rl/actions.py)
  - MotorSpeedAction         — raw 4-motor speeds [0, 4000] rad/s
  - ThrustBodyRatesAction    — [thrust_N, wx, wy, wz] (recommended default)
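
A motor-speed action has to respect the [0, 4000] rad/s range stated above. The sketch below shows clipping to that range, plus an affine map from a [-1, 1] policy output. The normalized input is an assumption (the commit message describes the action as raw speeds), and both function names are hypothetical.

```python
MOTOR_SPEED_MIN = 0.0     # rad/s, lower bound from the range stated above
MOTOR_SPEED_MAX = 4000.0  # rad/s, upper bound from the range stated above


def clip_motor_speeds(raw):
    """Clamp four raw motor speeds to the valid range."""
    return [min(MOTOR_SPEED_MAX, max(MOTOR_SPEED_MIN, float(w))) for w in raw]


def denormalize_motor_speeds(action):
    """Map a policy output in [-1, 1] to motor speeds (assumed convention)."""
    span = MOTOR_SPEED_MAX - MOTOR_SPEED_MIN
    raw = [MOTOR_SPEED_MIN + 0.5 * (float(a) + 1.0) * span for a in action]
    # Clip afterwards so out-of-range policy outputs stay physically valid.
    return clip_motor_speeds(raw)
```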

Observations (drones_sim/rl/observations.py)
  - RelativeStateObs   — 17-D: [pos_err, vel, quat, omega, prev_action]
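
The 17 dimensions follow from the listed components: 3 (pos_err) + 3 (vel) + 4 (quat) + 3 (omega) + 4 (prev_action). A minimal sketch of assembling such a vector is below. The function name and the sign convention pos_err = target - position are assumptions, not taken from observations.py.

```python
def relative_state_obs(pos, target, vel, quat, omega, prev_action):
    """Concatenate the observation components into a flat 17-element list."""
    expected = {"pos": 3, "target": 3, "vel": 3, "quat": 4, "omega": 3, "prev_action": 4}
    given = {"pos": pos, "target": target, "vel": vel, "quat": quat,
             "omega": omega, "prev_action": prev_action}
    for name, size in expected.items():
        if len(given[name]) != size:
            raise ValueError(f"{name} must have {size} elements")

    # Assumed sign convention: error points from the vehicle to the target.
    pos_err = [t - p for t, p in zip(target, pos)]
    obs = [*pos_err, *vel, *quat, *omega, *prev_action]
    return [float(v) for v in obs]  # 3 + 3 + 4 + 3 + 4 = 17
```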

Reward (drones_sim/rl/reward.py)
  - RewardConfig dataclass with per-term weights
  - reward() function — dense pos tracking + sparse reach bonus
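
A reward that combines a dense position term with a sparse reach bonus could look like the following sketch. The field names (`w_pos`, `reach_radius`, `reach_bonus`) and their values are hypothetical. The real RewardConfig may define different terms and weights.

```python
import math
from dataclasses import dataclass


@dataclass
class RewardConfigSketch:
    w_pos: float = 1.0         # hypothetical weight on position error
    reach_radius: float = 0.1  # hypothetical reach threshold, metres
    reach_bonus: float = 10.0  # hypothetical one-off bonus inside the radius


def reward_sketch(pos, target, cfg=None):
    """Dense penalty proportional to distance, plus a sparse bonus near the target."""
    cfg = cfg or RewardConfigSketch()
    dist = math.dist(pos, target)
    dense = -cfg.w_pos * dist                                   # shapes behaviour everywhere
    sparse = cfg.reach_bonus if dist <= cfg.reach_radius else 0.0  # fires only near the target
    return dense + sparse
```

The dense term gives a learning signal at every step, while the sparse term rewards actually arriving. Keeping the weights in a dataclass lets each term be tuned or disabled without touching the function.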

Tasks (drones_sim/rl/tasks.py)
  - HoverTask    — fixed target position
  - WaypointTask — sequential waypoints with auto-advance
  - TrackingTask — trajectory reference follower
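
Auto-advance through sequential waypoints can be sketched as below. The class name, the `reach_radius` parameter, and the choice to hold the final waypoint once it is reached are assumptions for illustration, not the behaviour of the real WaypointTask.

```python
import math


class WaypointTaskSketch:
    """Hypothetical waypoint sequencer with auto-advance."""

    def __init__(self, waypoints, reach_radius=0.2):
        if not waypoints:
            raise ValueError("at least one waypoint is required")
        self.waypoints = [tuple(w) for w in waypoints]
        self.reach_radius = reach_radius  # metres, assumed threshold
        self.index = 0

    @property
    def target(self):
        """Waypoint the vehicle is currently heading for."""
        return self.waypoints[self.index]

    def update(self, pos):
        """Advance to the next waypoint when the current one is reached.

        Returns True if the current waypoint was reached on this call.
        """
        reached = math.dist(pos, self.target) <= self.reach_radius
        # Hold the last waypoint instead of running off the end of the list.
        if reached and self.index < len(self.waypoints) - 1:
            self.index += 1
        return reached
```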

Env (drones_sim/rl/env.py)
  - QuadcopterEnv(gym.Env) — reset/step/render/close
  - Motor lag pre-fill in reset() to prevent first-step altitude dip
  - Crash detection: ground contact, excessive tilt, out-of-bounds
  - Optional viser live rendering
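
The motor lag pre-fill can be illustrated with a toy model. The sketch assumes a first-order lag on motor speed and thrust proportional to speed squared. The constants (mass, thrust coefficient, time constant, step size) are hypothetical, and QuadcopterDynamics may model the motors differently.

```python
import math

MASS = 1.0    # kg, hypothetical
G = 9.81      # m/s^2
K_F = 1.0e-6  # N/(rad/s)^2, hypothetical thrust coefficient
TAU = 0.05    # s, hypothetical motor time constant
DT = 0.01     # s, hypothetical simulation step


def hover_speed():
    """Per-motor speed at which four motors exactly balance gravity."""
    return math.sqrt(MASS * G / (4.0 * K_F))


def lag_step(omega, omega_cmd, dt=DT, tau=TAU):
    """One explicit-Euler step of d(omega)/dt = (omega_cmd - omega) / tau."""
    return omega + (omega_cmd - omega) * dt / tau


def total_thrust(omega):
    """Total thrust of four identical motors spinning at omega."""
    return 4.0 * K_F * omega ** 2


w_hover = hover_speed()
cold = total_thrust(lag_step(0.0, w_hover))         # motors start at rest
prefilled = total_thrust(lag_step(w_hover, w_hover))  # motors start at hover speed
print(f"weight={MASS * G:.2f} N  cold={cold:.2f} N  prefilled={prefilled:.2f} N")
```

With motors starting at rest, the first step produces only a small fraction of the thrust needed to hold altitude, so the vehicle drops. Initializing the motor state at hover speed removes that transient.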

Tests (5 new in tests/test_rl_env.py)
  - gymnasium Env contract (shapes, dtypes, finiteness)
  - Reward bounds at near-hover (±20)
  - Hover baseline no-crash (ThrustBodyRatesAction)
  - Determinism with seed (obs + trajectory)
  - All tests skipped gracefully if gymnasium not installed

Dependency: gymnasium is an optional extra (pip install gymnasium).
The package's imports fall back gracefully if gymnasium is absent.
@thanhndv212 thanhndv212 merged commit b65fa29 into master Jun 21, 2026
4 checks passed
@thanhndv212 thanhndv212 deleted the feat/rl-env branch June 21, 2026 11:54