Skip to content

Conversation

@NishantChandna1403
Copy link
Contributor

Improving Reward Function, Adding negative reward for yaw rates to prevent high yaw while training.

Negative Reward For High Yaw rate, To prevent high yaw while training
Negative Reward For High Yaw rate, To prevent high yaw while training
Negative Reward For High Yaw rate, To prevent high yaw while training
Negative Reward For High Yaw rate, To prevent high yaw while training
Negative Reward For High Yaw rate, To prevent high yaw while training
@jjshoots
Copy link
Owner

jjshoots commented Feb 27, 2025

LGTM, could you make a version bump of the environments here?
Also, updating from master to fix CI. :)

Thanks so much for the help!

@jjshoots
Copy link
Owner

I see you opened a number of other PRs. I don't think that's necessary, you can place all the changes in this one PR if you'd like.

@NishantChandna1403
Copy link
Contributor Author

Hey, I am sorry for opening number of PRs, I am not quite experienced with repos, I am still not clear with what it means to make version bumps of environments, Does it mean to create a new environment version to avoid conflicts with previous environment?

@NishantChandna1403
Copy link
Contributor Author

I have made the updates in the same PR. please let me know if any changes are needed

Copy link
Owner

@jjshoots jjshoots left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the changes, a few more minor changes, and I think we are good to go.

In particular:

  1. The comments that I have left, essentially, we only want to add continuous reward for the non-sparse reward setting.
  2. Could you update tests to use the version bumped environments here?
  3. Could you fix precommit by doing pip3 install -e .[dev] && pre-commit run --all-files?

Comment on lines 186 to 191

#Negative Reward For High Yaw rate, To prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2]) # Assuming z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2# Add penalty for high yaw rate
self.reward -= yaw_rate_penalty # You can adjust the coefficient (0.01) as needed

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this into the above if not self.sparse_reward conditional?

Comment on lines 228 to 233

#Negative Reward For High Yaw rate, To prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2]) # Assuming z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2# Add penalty for high yaw rate
self.reward -= yaw_rate_penalty # You can adjust the coefficient (0.01) as needed

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this into the below if not self.sparse_reward conditional?

Comment on lines 190 to 195

#Negative Reward For High Yaw rate, To prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2]) # Assuming z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2# Add penalty for high yaw rate
self.reward -= yaw_rate_penalty # You can adjust the coefficient (0.01) as needed

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this into the above if not self.sparse_reward conditional?

Comment on lines 121 to 125
#Negative Reward For High Yaw rate, To prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2]) # Assuming z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2# Add penalty for high yaw rate
self.reward -= yaw_rate_penalty # You can adjust the coefficient (0.01) as needed

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this into the below if not self.sparse_reward conditional?

Comment on lines 259 to 263
#Negative Reward For High Yaw rate, To prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2]) # Assuming z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2# Add penalty for high yaw rate
self.reward -= yaw_rate_penalty # You can adjust the coefficient (0.01) as needed

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you move this into the below if not self.sparse_reward conditional?

@NishantChandna1403
Copy link
Contributor Author

I have updated the required steps, kindly check

@jjshoots
Copy link
Owner

LGTM, will merge in the morning for me when the CI passes. :)

@jjshoots
Copy link
Owner

Expect a version bump to pypi in ~16 hours

@NishantChandna1403
Copy link
Contributor Author

Seems like its still failing due to comments, let me fix that,

@NishantChandna1403
Copy link
Contributor Author

There were minor issues with extra spaces while editing, I've fixed them.

@NishantChandna1403
Copy link
Contributor Author

I was facing issues with pre-commit, I have fixed the problems, now there wont be an issue

@jjshoots
Copy link
Owner

jjshoots commented Mar 1, 2025

That's ok, I can fix it later. :)

Rest assured, this will be merged soon.

Thanks for the contribution!

@NishantChandna1403
Copy link
Contributor Author

Alright,
Thanks !

@jjshoots jjshoots merged commit 552fa42 into jjshoots:master Mar 1, 2025
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants