Improving Reward Function #75
Conversation
Negative reward for high yaw rate, to prevent high yaw while training.
LGTM, could you make a version bump of the environments here? Thanks so much for the help!

I see you opened a number of other PRs. I don't think that's necessary; you can place all the changes in this one PR if you'd like.

Hey, I am sorry for opening a number of PRs, I am not very experienced with repos. I am also still not clear on what it means to make a version bump of the environments. Does it mean creating a new environment version to avoid conflicts with the previous environment?

I have made the updates in the same PR. Please let me know if any changes are needed.
jjshoots left a comment:
Thanks for the changes; a few more minor changes and I think we are good to go.
In particular:
- The comments that I have left: essentially, we only want to add the continuous reward in the non-sparse reward setting.
- Could you update the tests to use the version-bumped environments here? (See the sketch after this list for what a version bump might look like.)
- Could you fix pre-commit by running pip3 install -e .[dev] && pre-commit run --all-files?
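For reference, a minimal sketch of what a version bump of the environments could look like, assuming the environments are registered through Gymnasium's registry; the environment ID and entry-point path below are illustrative, not taken from this repository:

from gymnasium.envs.registration import register

# Hypothetical version bump: register the updated environment under a new version
# suffix so the previous reward behaviour stays reachable under the old ID.
register(
    id="PyFlyt/QuadX-Hover-v1",  # illustrative ID, bumped from "...-v0"
    entry_point="PyFlyt.gym_envs.quadx_envs.quadx_hover_env:QuadXHoverEnv",  # assumed module path
)

Tests would then instantiate the new ID, e.g. gymnasium.make("PyFlyt/QuadX-Hover-v1").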
# Negative reward for high yaw rate, to prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2])  # assuming the z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2  # add penalty for high yaw rate
self.reward -= yaw_rate_penalty  # the coefficient (0.01) can be adjusted as needed
Could you move this into the above if not self.sparse_reward conditional?
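A minimal sketch of the requested change, assuming the surrounding reward computation already branches on self.sparse_reward (the method it lives in and any other shaping terms are not shown here):

if not self.sparse_reward:
    # ... existing continuous (shaped) reward terms ...

    # negative reward for high yaw rate, to discourage high yaw while training
    yaw_rate = abs(self.env.state(0)[0][2])  # z-axis angular rate
    yaw_rate_penalty = 0.01 * yaw_rate**2  # quadratic penalty on yaw rate
    self.reward -= yaw_rate_penalty  # the 0.01 coefficient can be tuned as needed

With this placement the sparse-reward setting keeps its unshaped reward, while the dense setting pays, for example, a penalty of 0.01 per step at a yaw rate of 1 rad/s.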
# Negative reward for high yaw rate, to prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2])  # assuming the z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2  # add penalty for high yaw rate
self.reward -= yaw_rate_penalty  # the coefficient (0.01) can be adjusted as needed
Could you move this into the below if not self.sparse_reward conditional?
# Negative reward for high yaw rate, to prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2])  # assuming the z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2  # add penalty for high yaw rate
self.reward -= yaw_rate_penalty  # the coefficient (0.01) can be adjusted as needed
Could you move this into the above if not self.sparse_reward conditional?
# Negative reward for high yaw rate, to prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2])  # assuming the z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2  # add penalty for high yaw rate
self.reward -= yaw_rate_penalty  # the coefficient (0.01) can be adjusted as needed
Could you move this into the below if not self.sparse_reward conditional?
# Negative reward for high yaw rate, to prevent high yaw while training
yaw_rate = abs(self.env.state(0)[0][2])  # assuming the z-axis is the last component
yaw_rate_penalty = 0.01 * yaw_rate**2  # add penalty for high yaw rate
self.reward -= yaw_rate_penalty  # the coefficient (0.01) can be adjusted as needed
Could you move this into the below if not self.sparse_reward conditional?
I have updated the required steps, kindly check.

LGTM, will merge in the morning for me when the CI passes. :)

Expect a version bump to PyPI in ~16 hours.

Seems like it's still failing due to the comments, let me fix that.

There were minor issues with extra spaces while editing, I've fixed them.

I was facing issues with pre-commit; I have fixed the problems, now there won't be an issue.

That's ok, I can fix it later. :) Rest assured, this will be merged soon. Thanks for the contribution!

Alright.