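Suggestions
- Do not use machine learning or neural networks; solve the task directly with a hand-written, fixed policy.
- Make sure you understand the Observation and Action definitions before you start writing code.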
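As a quick orientation, the sketch below names the four observation fields and the two actions as described on the Gymnasium CartPole page. The names `CartPoleObs`, `unpack`, `PUSH_LEFT`, and `PUSH_RIGHT` are hypothetical helpers introduced only for this illustration; they are not part of Gymnasium.

```python
from typing import NamedTuple, Sequence

# Hypothetical helper names; only the field order and action codes
# come from the CartPole documentation.
PUSH_LEFT = 0   # action 0: push the cart to the left
PUSH_RIGHT = 1  # action 1: push the cart to the right


class CartPoleObs(NamedTuple):
    cart_position: float          # 0 is the centre of the track
    cart_velocity: float
    pole_angle: float             # radians, 0 means upright
    pole_angular_velocity: float


def unpack(observation: Sequence[float]) -> CartPoleObs:
    """Turn the 4-element observation into named fields."""
    return CartPoleObs(*(float(x) for x in observation))


obs = unpack([0.01, -0.02, 0.03, 0.04])
print(obs.pole_angle, obs.pole_angular_velocity)
```

Naming the fields this way makes a hand-written rule easier to read than indexing `observation[2]` directly.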
References
- https://gymnasium.farama.org/environments/classic_control/cart_pole/
- cartpole_human_run.py
import gymnasium as gym

env = gym.make("CartPole-v1", render_mode="human")  # "human" opens a window and draws each step
# env = gym.make("CartPole-v1", render_mode="rgb_array")  # alternative: render to arrays, no window
observation, info = env.reset(seed=42)
for _ in range(100):
    env.render()
    action = env.action_space.sample()  # replace this with your own formula and see how long you last
    observation, reward, terminated, truncated, info = env.step(action)
    print('observation=', observation)
    if terminated or truncated:  # add code here to record how long each episode lasted
        observation, info = env.reset()
        print('done')
env.close()
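One possible way to fill in the two placeholders above is sketched here: a fixed rule replaces the random action, and a counter records the length of each episode. This is only an illustration, not the intended solution. The names `fixed_policy` and `run_episodes` are hypothetical, and the weight `0.5` on the angular velocity is an arbitrary starting value to experiment with.

```python
def fixed_policy(observation, velocity_weight=0.5):
    """Push toward the side the pole is falling to.

    velocity_weight is an arbitrary illustrative value, not a tuned constant.
    """
    _, _, pole_angle, pole_angular_velocity = observation
    leaning = pole_angle + velocity_weight * pole_angular_velocity
    return 1 if leaning > 0 else 0  # 1 = push right, 0 = push left


def run_episodes(n_episodes=5, seed=42):
    """Run several episodes without rendering and return their lengths in steps."""
    # Imported here so fixed_policy can be tried without gymnasium installed.
    import gymnasium as gym

    env = gym.make("CartPole-v1")
    lengths = []
    observation, info = env.reset(seed=seed)
    for _ in range(n_episodes):
        steps = 0
        while True:
            action = fixed_policy(observation)
            observation, reward, terminated, truncated, info = env.step(action)
            steps += 1
            if terminated or truncated:
                break
        lengths.append(steps)  # how long this episode lasted
        observation, info = env.reset()
    env.close()
    return lengths
```

Calling `print(run_episodes())` prints one step count per episode, so you can compare different rules or weights by their episode lengths.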