Improving Reward Function #75
Conversation
Negative reward for high yaw rate, to prevent high yaw while training.
LGTM, could you make a version bump of the environments here? Thanks so much for the help!

I see you opened a number of other PRs. I don't think that's necessary; you can place all the changes in this one PR if you'd like.

Hey, I am sorry for opening a number of PRs, I am not very experienced with repos. I am also still not clear on what it means to make a version bump of the environments. Does it mean creating a new environment version to avoid conflicts with the previous environment?

I have made the updates in the same PR. Please let me know if any changes are needed.
jjshoots left a comment:
Thanks for the changes; a few more minor changes and I think we are good to go.
In particular:
- The comments that I have left: essentially, we only want to add the continuous reward in the non-sparse reward setting.
- Could you update the tests to use the version-bumped environments here? (See the sketch after this list for what a version bump might look like.)
- Could you fix pre-commit by running pip3 install -e .[dev] && pre-commit run --all-files?
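For reference, a minimal sketch of what a version bump of the environments could look like, assuming the environments are registered through Gymnasium's registry; the environment ID and entry-point path below are illustrative, not taken from this repository:

from gymnasium.envs.registration import register

# Hypothetical version bump: register the updated environment under a new version
# suffix so the previous reward behaviour stays reachable under the old ID.
register(
    id="PyFlyt/QuadX-Hover-v1",  # illustrative ID, bumped from "...-v0"
    entry_point="PyFlyt.gym_envs.quadx_envs.quadx_hover_env:QuadXHoverEnv",  # assumed module path
)

Tests would then instantiate the new ID, e.g. gymnasium.make("PyFlyt/QuadX-Hover-v1").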
# Negative reward for high yaw rate, to prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2])  # assuming the z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2  # add penalty for high yaw rate
self.reward -= yaw_rate_penalty  # the coefficient (0.01) can be adjusted as needed
Could you move this into the above if not self.sparse_reward conditional?
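A minimal sketch of the requested change, assuming the surrounding reward computation already branches on self.sparse_reward (the method it lives in and any other shaping terms are not shown here):

if not self.sparse_reward:
    # ... existing continuous (shaped) reward terms ...

    # negative reward for high yaw rate, to discourage high yaw while training
    yaw_rate = abs(self.env.state(0)[0][2])  # z-axis angular rate
    yaw_rate_penalty = 0.01 * yaw_rate**2  # quadratic penalty on yaw rate
    self.reward -= yaw_rate_penalty  # the 0.01 coefficient can be tuned as needed

With this placement the sparse-reward setting keeps its unshaped reward, while the dense setting pays, for example, a penalty of 0.01 per step at a yaw rate of 1 rad/s.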
# Negative reward for high yaw rate, to prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2])  # assuming the z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2  # add penalty for high yaw rate
self.reward -= yaw_rate_penalty  # the coefficient (0.01) can be adjusted as needed
Could you move this into the below if not self.sparse_reward conditional?
# Negative reward for high yaw rate, to prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2])  # assuming the z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2  # add penalty for high yaw rate
self.reward -= yaw_rate_penalty  # the coefficient (0.01) can be adjusted as needed
Could you move this into the above if not self.sparse_reward conditional?
# Negative reward for high yaw rate, to prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2])  # assuming the z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2  # add penalty for high yaw rate
self.reward -= yaw_rate_penalty  # the coefficient (0.01) can be adjusted as needed
Could you move this into the below if not self.sparse_reward conditional?
# Negative reward for high yaw rate, to prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2])  # assuming the z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2  # add penalty for high yaw rate
self.reward -= yaw_rate_penalty  # the coefficient (0.01) can be adjusted as needed
Could you move this into the below if not self.sparse_reward conditional?
I have updated the required steps, kindly check.

LGTM, will merge in the morning for me when the CI passes. :)

Expect a version bump to PyPI in ~16 hours.

Seems like it's still failing due to the comments, let me fix that.

There were minor issues with extra spaces while editing, I've fixed them.

I was facing issues with pre-commit; I have fixed the problems, now there won't be an issue.

That's ok, I can fix it later. :) Rest assured, this will be merged soon. Thanks for the contribution!

Alright.