Project Submitted By - Rishit Aggarwal (23BAI10329) Lakshya Mangla (23BAI10814) Tanmay Singh (23BAI10328) by Rishitagg1 · Pull Request #110 · MK-25source/ReRLCode-_26

Rishitagg1 · 2026-04-08T17:48:56Z

DDPG Robot path planning code

This project focuses on designing an intelligent robot navigation system using Reinforcement Learning (RL) in a continuous environment. The objective is to enable a robot to learn the optimal path from a starting position to a target location while avoiding obstacles.

The problem is modeled as a Markov Decision Process (MDP), where the robot acts as an agent that interacts with its environment by taking actions and receiving rewards. Unlike traditional path-planning algorithms, this approach allows the robot to learn through experience rather than relying on predefined rules.

To handle continuous state and action spaces, the project uses the Deep Deterministic Policy Gradient (DDPG) algorithm. DDPG is an off-policy actor-critic method: an actor network outputs a deterministic continuous action for each state, and a critic network estimates the value of that action and supplies the gradient used to improve the actor.

Copilot

Pull request overview

Adds a beginner-friendly, single-file DDPG example for continuous-space robot navigation and documents how to run it.

Changes:

Added ddpg_robot.py implementing a minimal continuous 2D environment + DDPG agent training loop.
Added README.md describing the environment, prerequisites, and usage.

Reviewed changes

Copilot reviewed 1 out of 2 changed files in this pull request and generated 7 comments.

File	Description
README.md	Introduces project overview, environment details, and run instructions.
ddpg_robot.py	Implements RobotEnv, replay buffer, actor/critic networks, DDPG training loop.


Copilot · 2026-04-08T17:51:08Z

+        self.target_actor = Actor(state_dim, action_dim)
+        self.target_critic = Critic(state_dim, action_dim)
+


Target networks are instantiated but never initialized with the online actor/critic weights. This makes the TD target start from unrelated random networks and can destabilize learning. After creating target_actor/target_critic, copy the actor/critic state into them (hard update) before training begins.
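A minimal sketch of the hard update described above. The Actor and Critic classes here are hypothetical stand-ins (the layer sizes and the 4/2 dimensions are assumptions, not the PR's definitions); only the two load_state_dict calls are the point.

```python
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Hypothetical stand-in for the PR's Actor; layer sizes are assumed."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )

    def forward(self, s):
        return self.net(s)


class Critic(nn.Module):
    """Hypothetical stand-in for the PR's Critic; layer sizes are assumed."""

    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, s, a):
        return self.net(torch.cat([s, a], dim=1))


state_dim, action_dim = 4, 2  # assumed dimensions, for illustration only

actor = Actor(state_dim, action_dim)
critic = Critic(state_dim, action_dim)
target_actor = Actor(state_dim, action_dim)
target_critic = Critic(state_dim, action_dim)

# Hard update: copy the online weights into the targets once, before training.
target_actor.load_state_dict(actor.state_dict())
target_critic.load_state_dict(critic.state_dict())
```

In the PR this would likely go in the agent's __init__, right after the two target networks are created.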

Copilot · 2026-04-08T17:51:08Z

+        # Actor loss
+        actor_loss = -self.critic(s, self.actor(s)).mean()
+
+        self.actor_opt.zero_grad()
+        actor_loss.backward()
+        self.actor_opt.step()


Actor update backpropagates through the critic parameters as well, even though the critic optimizer step isn’t run here. This adds unnecessary compute/memory and leaves critic grads populated until the next critic update. Freeze critic parameters (or temporarily disable requires_grad) during the actor loss backward pass so gradients flow only through the action path.
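One possible way to do this is to toggle requires_grad on the critic parameters around the actor update. The sketch below uses small stand-in networks (the dimensions, learning rate, and the critic helper are assumptions); gradients still flow through the critic's input to the actor, but no critic parameter gradients are accumulated.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
state_dim, action_dim = 4, 2  # assumed dimensions

# Stand-in networks; the PR's Actor/Critic architectures are not reproduced here.
actor = nn.Sequential(nn.Linear(state_dim, action_dim), nn.Tanh())
critic_net = nn.Linear(state_dim + action_dim, 1)


def critic(s, a):
    """Q(s, a) on the concatenated state-action input."""
    return critic_net(torch.cat([s, a], dim=1))


actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-3)  # assumed learning rate
s = torch.randn(8, state_dim)  # dummy batch of states

# Freeze critic parameters so backward() only accumulates actor gradients.
for p in critic_net.parameters():
    p.requires_grad_(False)

actor_loss = -critic(s, actor(s)).mean()
actor_opt.zero_grad()
actor_loss.backward()
actor_opt.step()

# Unfreeze so the next critic update can train the critic again.
for p in critic_net.parameters():
    p.requires_grad_(True)
```

A lighter alternative is to leave the critic unfrozen and rely on the critic optimizer's zero_grad() before its next update; that keeps the code shorter but still pays for the extra gradient computation.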

Copilot · 2026-04-08T17:51:08Z

+    episodes = 200
+
+    for ep in range(episodes):
+        state = env.reset()
+        total_reward = 0
+
+        for _ in range(100):
+            action = agent.select_action(state)


The training loop hardcodes 100 steps per episode, duplicating env.max_steps. This can drift if max_steps is changed and makes the environment config less authoritative. Use env.max_steps in the episode loop to keep these consistent.
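A small sketch of the suggested loop shape. StubEnv and run_episode are hypothetical; the stub only mimics the reset()/step() interface and the max_steps attribute mentioned in the review, and it ignores the action.

```python
class StubEnv:
    """Hypothetical stand-in for RobotEnv; only max_steps matters here."""

    def __init__(self, max_steps=100):
        self.max_steps = max_steps
        self.t = 0

    def reset(self):
        self.t = 0
        return (0.0, 0.0)

    def step(self, action):
        # Returns (next_state, reward, done), matching the PR's training loop.
        self.t += 1
        done = self.t >= self.max_steps
        return (0.0, 0.0), 0.0, done


def run_episode(env):
    """Run one episode, taking the step limit from the environment itself."""
    state = env.reset()
    steps = 0
    for _ in range(env.max_steps):  # single source of truth for episode length
        state, reward, done = env.step(None)
        steps += 1
        if done:
            break
    return steps
```

Changing max_steps on the environment now changes the loop length without touching the training code.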

Copilot · 2026-04-08T17:51:09Z

+    def select_action(self, state):
+        state = torch.FloatTensor(state).unsqueeze(0)
+        action = self.actor(state).detach().numpy()[0]
+        return action


select_action performs inference without a torch.no_grad() context, which incurs autograd overhead even though the result is detached. Wrap the forward pass in torch.no_grad() (and optionally set the actor to eval mode during action selection) to reduce overhead.
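A sketch of select_action with the forward pass under torch.no_grad(). It is written as a free function over a stand-in actor (the network and dimensions are assumptions), so the PR's method would use self.actor instead.

```python
import numpy as np
import torch
import torch.nn as nn

state_dim, action_dim = 4, 2  # assumed dimensions
actor = nn.Sequential(nn.Linear(state_dim, action_dim), nn.Tanh())  # stand-in actor


def select_action(actor, state):
    """Return the actor's action for one state as a NumPy array."""
    state_t = torch.as_tensor(state, dtype=torch.float32).unsqueeze(0)
    with torch.no_grad():  # no autograd graph is built for inference
        action = actor(state_t)
    return action.squeeze(0).numpy()


action = select_action(actor, np.zeros(state_dim))
```

Because no graph is recorded inside the context, the explicit detach() from the original is no longer needed.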

Copilot · 2026-04-08T17:51:09Z

+    def __init__(self, capacity=10000):
+        self.buffer = []
+        self.capacity = capacity
+
+    def add(self, s, a, r, s2, d):
+        if len(self.buffer) >= self.capacity:
+            self.buffer.pop(0)
+        self.buffer.append((s, a, r, s2, d))
+


ReplayBuffer evicts items with list.pop(0), which is O(n) per eviction and will slow down once capacity is reached. Consider using collections.deque with a maxlen (or a ring buffer) to make appends/evictions O(1).
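A minimal sketch of the deque-based variant. The add signature follows the diff above; the sample and __len__ methods are assumptions, since the PR's own versions are not shown in this review.

```python
import random
from collections import deque


class ReplayBuffer:
    def __init__(self, capacity=10000):
        # A deque with maxlen drops the oldest item automatically on append.
        self.buffer = deque(maxlen=capacity)

    def add(self, s, a, r, s2, d):
        self.buffer.append((s, a, r, s2, d))

    def sample(self, batch_size):
        # Hypothetical sampler: uniform sampling without replacement.
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```

Indexing into the middle of a deque is not O(1), so if sampling cost ever dominates, a preallocated ring buffer over arrays is the usual next step.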

Copilot · 2026-04-08T17:51:09Z

+        # Critic loss
+        target_a = self.target_actor(s2)
+        target_q = self.target_critic(s2, target_a)
+        y = r + self.gamma * target_q * (1 - d)
+
+        critic_loss = nn.MSELoss()(self.critic(s, a), y.detach())


Target Q computation is part of the target value and doesn’t need gradients, but it currently builds a graph and then detaches later. Wrap target_actor/target_critic forward passes used for TD targets in torch.no_grad() to avoid unnecessary graph construction and reduce memory use.
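The TD target from the diff above can be sketched with the target forward passes inside torch.no_grad(). The stand-in networks, the q helper, the batch shapes, and gamma = 0.99 are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
state_dim, action_dim, batch = 4, 2, 8  # assumed sizes
gamma = 0.99  # assumed discount factor

# Stand-in networks; the PR's architectures are not reproduced here.
target_actor = nn.Sequential(nn.Linear(state_dim, action_dim), nn.Tanh())
critic_net = nn.Linear(state_dim + action_dim, 1)
target_critic_net = nn.Linear(state_dim + action_dim, 1)


def q(net, s, a):
    """Q(s, a) on the concatenated state-action input."""
    return net(torch.cat([s, a], dim=1))


# Dummy batch; r and d are shaped (batch, 1) to match the critic output.
s = torch.randn(batch, state_dim)
a = torch.randn(batch, action_dim)
r = torch.randn(batch, 1)
s2 = torch.randn(batch, state_dim)
d = torch.zeros(batch, 1)

with torch.no_grad():  # the target is a constant with respect to the critic loss
    target_a = target_actor(s2)
    y = r + gamma * q(target_critic_net, s2, target_a) * (1 - d)

critic_loss = nn.functional.mse_loss(q(critic_net, s, a), y)
critic_loss.backward()
```

With this form y never requires gradients, so the y.detach() call in the original becomes redundant.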

Copilot · 2026-04-08T17:51:09Z

+        for _ in range(100):
+            action = agent.select_action(state)
+            next_state, reward, done = env.step(action)
+
+            agent.buffer.add(state, action, reward, next_state, done)
+            agent.train()


DDPG typically requires explicit exploration noise (e.g., Gaussian or Ornstein-Uhlenbeck noise) added to the selected action during training. Because the actor is deterministic, without added noise the agent explores very little and learning may stall. Consider adding configurable action noise in the training loop while keeping evaluation noise-free.
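A sketch of Gaussian action noise that could be applied only during training. The helper name add_exploration_noise, the default sigma, and the [-1, 1] action bounds are assumptions; the PR's actual action range is not shown in this review.

```python
import numpy as np


def add_exploration_noise(action, sigma=0.1, low=-1.0, high=1.0, rng=None):
    """Add zero-mean Gaussian noise to an action and clip it to the action bounds."""
    rng = rng if rng is not None else np.random.default_rng()
    noisy = action + rng.normal(0.0, sigma, size=np.shape(action))
    return np.clip(noisy, low, high)
```

In the training loop this would wrap the result of agent.select_action(state) before env.step(action); evaluation runs would call select_action directly, which is equivalent to sigma = 0.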

Upload DDPG Robot path planning code

2fc141b

Copilot AI review requested due to automatic review settings April 8, 2026 17:48

Copilot started reviewing on behalf of Rishitagg1 April 8, 2026 17:49

Rishitagg1 changed the title ~~DDPG Robot path planning code~~ Project Submitted By - Rishit Aggarwal (23BAI10329) Lakshya Mangla (23BAI10814) Tanmay Singh (23BAI10328) Apr 8, 2026

Copilot AI reviewed Apr 8, 2026

Rishitagg1 wants to merge 1 commit into MK-25source:main from Rishitagg1:main
