A reinforcement learning agent that learns to control a 4-way traffic intersection using tabular Q-learning. By responding to real-time queue imbalances, the agent reduces average waiting cost by about 47% compared with a naive fixed-timer controller (6548.35 vs 3493.34 over 300 evaluation episodes).
Traditional traffic lights switch on fixed cycles regardless of actual traffic conditions. This project trains a Q-learning agent to adaptively control signal phases at a single 4-way intersection, minimising total vehicle waiting time.
| Controller | Strategy |
|---|---|
| Fixed Timer | Switches phase every 10 steps |
| RL Agent | Learns optimal switching policy |
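The fixed-timer baseline in the table can be sketched in a few lines. This is a minimal illustration, not the code in evaluate.py: the function name `fixed_timer_action`, the zero-based step counter, and the choice to start on N/S green are all assumptions.

```python
def fixed_timer_action(step: int, period: int = 10) -> int:
    """Return 0 (N/S green) or 1 (E/W green), alternating every `period` steps.

    Assumes steps are counted from 0 and the cycle starts on N/S green.
    """
    return (step // period) % 2


# Steps 0-9 give N/S green, steps 10-19 give E/W green, then the cycle repeats.
phases = [fixed_timer_action(t) for t in range(30)]
```

Because the rule ignores the queues entirely, it serves as a reference point for how much a queue-aware policy gains.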
The intersection has 4 queues (North, South, East, West). At each time step:
- The agent picks an action: 0 = N/S green or 1 = E/W green
- Up to 10 cars per active direction clear the intersection
- New cars arrive via a Poisson process (λ=2 per direction)
- A pressure-based reward is computed
reward = (cars_cleared × 3)
- (total_queue_length × 1.5)
- (|NS_queue − EW_queue| × 2)
This incentivises throughput, penalises congestion, and reduces directional imbalance.
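The reward formula above translates directly into code. The sketch below assumes that NS_queue is the sum of the North and South queues and EW_queue the sum of East and West, and that queues are passed in the order (N, S, E, W); the function name `pressure_reward` is hypothetical and may not match environment.py.

```python
def pressure_reward(cars_cleared: int, queues: tuple) -> float:
    """Pressure-based reward: throughput bonus minus congestion and imbalance penalties.

    `queues` is (north, south, east, west). Summing each axis is an assumption.
    """
    north, south, east, west = queues
    total_queue = north + south + east + west
    imbalance = abs((north + south) - (east + west))
    return 3.0 * cars_cleared - 1.5 * total_queue - 2.0 * imbalance


# 6 cars cleared, 10 cars still queued, both axes balanced at 5 cars each:
# 3*6 - 1.5*10 - 2*0
print(pressure_reward(6, (2, 3, 4, 1)))  # 3.0
```

With these weights, the reward is positive only when the cars cleared in a step outweigh the remaining congestion, which is consistent with the mostly negative Q-values reported in the Q-table summary.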
To reflect real-world constraints, the agent cannot switch phase until at least 5 consecutive steps have elapsed under the current phase.
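The step rules and the minimum-green constraint can be sketched as follows. This is an illustration under stated assumptions rather than the contents of environment.py: the function names, the order of operations (clear first, then add arrivals), and clipping at the 30-car queue cap are all assumptions.

```python
import numpy as np

MIN_GREEN = 5     # steps a phase must be held before a switch is allowed
MAX_CLEAR = 10    # cars cleared per active direction per step
ARRIVAL_RATE = 2  # Poisson mean per direction per step
MAX_QUEUE = 30    # queue cap per direction


def apply_min_green(requested, current_phase, steps_in_phase, min_green=MIN_GREEN):
    """Return (phase actually applied, steps spent in that phase including this one)."""
    if requested != current_phase and steps_in_phase >= min_green:
        return requested, 1          # switch accepted; new phase starts its count
    return current_phase, steps_in_phase + 1  # switch refused or not requested


def step_queues(queues, phase, rng):
    """Advance queues [N, S, E, W] by one step; phase 0 = N/S green, 1 = E/W green."""
    queues = np.asarray(queues, dtype=int).copy()
    active = [0, 1] if phase == 0 else [2, 3]
    cleared = np.minimum(queues[active], MAX_CLEAR)
    queues[active] -= cleared                        # assumed: clearing happens first
    arrivals = rng.poisson(ARRIVAL_RATE, size=4)     # then new cars arrive
    queues = np.minimum(queues + arrivals, MAX_QUEUE)
    return queues, int(cleared.sum())


rng = np.random.default_rng(0)
phase, held = apply_min_green(requested=1, current_phase=0, steps_in_phase=3)
new_queues, n_cleared = step_queues([12, 3, 4, 5], phase, rng)
```

In this example the switch request is refused because only 3 steps have elapsed, so N/S stays green and clears 10 + 3 = 13 cars.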
smart-traffic-control-rl/
├── environment.py # Intersection simulation & reward function
├── agent.py # Q-learning agent (select, update, save/load)
├── train.py # Training entry point
├── evaluate.py # Evaluation & Fixed vs RL comparison
├── inspect_qtable.py # Visualise & export the saved Q-table
├── test_environment.py # Unit tests (pytest)
├── results/ # Saved Q-tables & plots (gitignored)
├── requirements.txt
├── .gitignore
└── README.md
# 1. Clone the repository
git clone https://github.com/<your-username>/smart-traffic-control-rl.git
cd smart-traffic-control-rl
# 2. Create a virtual environment (recommended)
python3 -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# 3. Install dependencies
pip3 install -r requirements.txt

python3 train.py

Optional flags:
| Flag | Default | Description |
|---|---|---|
| --episodes | 2500 | Number of training episodes |
| --max-steps | 80 | Steps per episode |
| --output | results/q_table.npy | Path to save the Q-table |
| --no-plot | (off) | Skip saving learning curve |
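For readers extending the command-line interface, the flags in the table could be declared with Python's standard `argparse` module as sketched below. Whether train.py actually uses `argparse` is an assumption, as are the function name `build_parser` and the description string; only the flag names, defaults, and help texts come from the table.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the training flags listed in the table (hypothetical helper)."""
    parser = argparse.ArgumentParser(description="Train the Q-learning traffic agent")
    parser.add_argument("--episodes", type=int, default=2500,
                        help="Number of training episodes")
    parser.add_argument("--max-steps", type=int, default=80,
                        help="Steps per episode")
    parser.add_argument("--output", default="results/q_table.npy",
                        help="Path to save the Q-table")
    parser.add_argument("--no-plot", action="store_true",
                        help="Skip saving learning curve")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["--episodes", "5000", "--max-steps", "120"])
print(args.episodes, args.max_steps, args.output, args.no_plot)
```

`argparse` converts `--max-steps` to the attribute `args.max_steps`, and `--no-plot` defaults to `False` when omitted.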
# Example: longer training run
python3 train.py --episodes 5000 --max-steps 120

python3 evaluate.py

Optional flags:
| Flag | Default | Description |
|---|---|---|
| --q-table | results/q_table.npy | Path to a saved Q-table |
| --trials | 300 | Number of evaluation episodes |
| --max-steps | 80 | Steps per episode |
| --no-plot | (off) | Skip saving bar chart |
The Q-table is saved as a .npy binary file. To view it in a human-readable format:
python3 inspect_qtable.py

This generates 3 files in results/:
| File | Description |
|---|---|
| q_table.csv | All Q-values as a CSV (open in Excel or Google Sheets) |
| q_table_actions.png | Bar chart of preferred actions across all states |
| q_table_heatmap.png | Heatmap of max Q-values across queue states |
It also prints a summary in the terminal:
========== Q-TABLE SUMMARY ==========
Shape : (8, 8, 8, 8, 2)
Total states : 4096
Max Q-value : 4.4330
Min Q-value : -301.8664
Mean Q-value : -11.9761
======================================
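The same summary statistics can be reproduced from any Q-table array with NumPy. The sketch below is an assumption about how such a summary might be computed, not the code in inspect_qtable.py; the helper name `summarise_q_table` is hypothetical, and a zero-filled array stands in for the saved table so that the example runs without the results/ directory.

```python
import numpy as np


def summarise_q_table(q: np.ndarray) -> dict:
    """Compute shape, state count, and basic statistics of a Q-table.

    Assumes the last axis indexes actions and all earlier axes index state bins.
    """
    return {
        "shape": q.shape,
        "total_states": int(np.prod(q.shape[:-1])),
        "max": float(q.max()),
        "min": float(q.min()),
        "mean": float(q.mean()),
    }


# Stand-in table; a trained one would be loaded with np.load("results/q_table.npy").
q = np.zeros((8, 8, 8, 8, 2))
summary = summarise_q_table(q)
print(summary["shape"], summary["total_states"])
```

For the shape shown above, the state count is 8 × 8 × 8 × 8 = 4096, with two Q-values per state.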
pytest test_environment.py -v

After 2500 training episodes the RL agent consistently outperforms the fixed-timer baseline:
| Metric | Fixed Timer | RL Agent |
|---|---|---|
| Avg waiting cost | 6548.35 (±458.62) | 3493.34 (±315.20) |
| Improvement | — | 46.65% |
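The improvement figure is the relative reduction in average waiting cost. As a quick check of the arithmetic, using only the two means from the table:

```python
fixed_cost = 6548.35  # average waiting cost, fixed timer
rl_cost = 3493.34     # average waiting cost, RL agent

# Relative reduction, expressed as a percentage of the fixed-timer cost.
improvement_pct = (fixed_cost - rl_cost) / fixed_cost * 100
print(f"{improvement_pct:.2f}%")  # 46.65%
```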
Generated plots are saved to results/:
- learning_curve.png: reward vs episode (50-episode moving average)
- comparison.png: bar chart of Fixed vs RL waiting cost
- q_table_actions.png: preferred action distribution
- q_table_heatmap.png: Q-value heatmap across states
| Component | Value / Method |
|---|---|
| Algorithm | Tabular Q-learning |
| State space | 4 queues × 8 bins each = 8⁴ = 4,096 states |
| Action space | 2 (N/S green, E/W green) |
| Learning rate α | 0.1 |
| Discount factor γ | 0.95 |
| Exploration | ε-greedy, ε: 1.0 → 0.05 (decay 0.993) |
| Min green time | 5 steps |
| Arrivals | Poisson (λ=2 per direction) |
| Max queue depth | 30 cars per direction |
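The hyperparameters above fit together as in the following minimal sketch of tabular Q-learning. It is not the code in agent.py: the function names are hypothetical, equal-width binning of queue lengths is an assumption, and applying the ε decay once per episode is also an assumption.

```python
import numpy as np

N_BINS, MAX_QUEUE, N_ACTIONS = 8, 30, 2
ALPHA, GAMMA = 0.1, 0.95
EPS_START, EPS_MIN, EPS_DECAY = 1.0, 0.05, 0.993


def discretise(queues):
    """Map four queue lengths (0..MAX_QUEUE) to bin indices (0..N_BINS-1).

    Equal-width bins are an assumption; the project may bin differently.
    """
    q = np.clip(np.asarray(queues), 0, MAX_QUEUE)
    return tuple(int(b) for b in (q * N_BINS) // (MAX_QUEUE + 1))


def select_action(q_table, state, epsilon, rng):
    """Epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(q_table[state]))


def update(q_table, state, action, reward, next_state):
    """One Q-learning step: move Q(s, a) toward r + gamma * max_a' Q(s', a')."""
    td_target = reward + GAMMA * np.max(q_table[next_state])
    q_table[state + (action,)] += ALPHA * (td_target - q_table[state + (action,)])


def decay_epsilon(epsilon):
    """Multiplicative decay with a floor (assumed to run once per episode)."""
    return max(EPS_MIN, epsilon * EPS_DECAY)


rng = np.random.default_rng(0)
q_table = np.zeros((N_BINS,) * 4 + (N_ACTIONS,))
state = discretise([0, 30, 15, 7])
update(q_table, state, action=1, reward=10.0, next_state=state)
greedy_action = select_action(q_table, state, epsilon=0.0, rng=rng)
```

Under the per-episode assumption, ε reaches its 0.05 floor after roughly 430 episodes (0.993^n = 0.05 gives n ≈ 426), so most of a 2500-episode run would be spent near-greedy.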