NPC Maze Solver with Reinforcement Learning

This project demonstrates how a Non-Player Character (NPC) can learn to find the shortest path through a randomly generated maze using the Q-learning reinforcement learning algorithm. The maze and the learning process are visualized in a web browser using HTML, CSS, and JavaScript.

How to Run the Project

Save the Files: Ensure you have saved the provided code into three separate files in the same directory:
- index.html (for the HTML structure)
- style.css (for the CSS styling)
- script.js (for the JavaScript logic)
Open in Browser: Simply open the index.html file in any modern web browser (Chrome, Firefox, Safari, etc.).
Start Learning: Once the page is loaded, you will see a randomly generated 10x10 maze with an 'S' (Start) and an 'F' (Finish) marker. Click the "Start Learning" button below the maze.
Observe the Learning Process:
- The NPC (represented by a yellow circle) will start exploring the maze.
- The "Episode" counter will increment from 0 to 100, indicating the number of training iterations.
- "Total Reward (Episode)" shows the reward obtained by the NPC in the current episode.
- "Langkah (Episode)" (Indonesian for "Steps (Episode)") displays the number of steps taken by the NPC in the current episode.
- "Epsilon" shows the current exploration-exploitation rate. It will decrease over time, encouraging the NPC to exploit its learned knowledge.
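- "Rata-rata Reward (50 Episode Terakhir)" and "Rata-rata Langkah (50 Episode Terakhir)" (Indonesian for "Average Reward (Last 50 Episodes)" and "Average Steps (Last 50 Episodes)") display the average reward and average step count over the last 50 training episodes, giving an indication of learning progress.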
View the Best Path: After 100 training episodes, the "Start Learning" button will change to "Learning Finished". The maze will then display the best path found by the NPC in yellow. Information about the best episode (based on the highest total reward achieved) will also be shown below the statistics.

How the Project Works

Maze Generation: At the start of each learning session, a 10x10 maze with random obstacles is generated. The 'S' marks the starting position (0, 0), and 'F' marks the goal position (9, 9).
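The generation step could be sketched roughly as follows. This is an illustration only, not the code in script.js: the names `generateMaze` and `hasPath`, the cell symbols other than 'S', 'F', and '#', the 20% wall probability, and the injectable `rng` parameter are all assumptions. The description above does not say whether the project checks that the goal is reachable, so the breadth-first check is shown as an optional extra.

```javascript
// Hypothetical helper: builds a size x size grid of '.', '#', 'S' and 'F'.
// wallProbability and the injectable rng are assumptions for illustration.
function generateMaze(size = 10, wallProbability = 0.2, rng = Math.random) {
  const maze = [];
  for (let row = 0; row < size; row++) {
    const cells = [];
    for (let col = 0; col < size; col++) {
      cells.push(rng() < wallProbability ? '#' : '.');
    }
    maze.push(cells);
  }
  maze[0][0] = 'S';               // start at (0, 0)
  maze[size - 1][size - 1] = 'F'; // goal at (9, 9) for a 10x10 maze
  return maze;
}

// Hypothetical helper: breadth-first search from 'S' to see if 'F' is reachable.
function hasPath(maze) {
  const size = maze.length;
  const seen = maze.map(row => row.map(() => false));
  const queue = [[0, 0]];
  seen[0][0] = true;
  while (queue.length > 0) {
    const [r, c] = queue.shift();
    if (r === size - 1 && c === size - 1) return true;
    for (const [dr, dc] of [[-1, 0], [1, 0], [0, -1], [0, 1]]) {
      const nr = r + dr;
      const nc = c + dc;
      if (nr < 0 || nr >= size || nc < 0 || nc >= size) continue;
      if (seen[nr][nc] || maze[nr][nc] === '#') continue;
      seen[nr][nc] = true;
      queue.push([nr, nc]);
    }
  }
  return false;
}
```

One possible use is to call `generateMaze()` repeatedly until `hasPath(maze)` returns true, so that every training session starts with a solvable maze.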
Q-Learning Algorithm: The NPC learns to navigate the maze using the Q-learning algorithm, a type of reinforcement learning:
- States: The NPC's position (row and column) in the maze.
- Actions: The possible movements the NPC can take (up, down, left, right).
- Rewards: The NPC receives rewards for its actions:
  - A small negative reward (-0.1) for each step taken to encourage finding the shortest path.
  - A large positive reward (+10) for reaching the goal ('F').
  - A negative reward (-1) for hitting a wall ('#') or making an invalid move.
- Q-Table: The NPC maintains a Q-table, which stores the expected future reward for taking a specific action from a specific state. Initially, all Q-values are zero.
- Exploration vs. Exploitation (Epsilon-Greedy): During training, the NPC uses an epsilon-greedy strategy. With a probability of epsilon, it explores the maze by taking random actions. With a probability of 1-epsilon, it exploits its learned knowledge by choosing the action with the highest Q-value for the current state. Epsilon decreases over time to favor exploitation as the NPC learns.
- Q-Value Updates: After each action, the Q-value for the previous state-action pair is updated based on the reward received and the maximum Q-value of the new state.
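The pieces listed above can be sketched together in a few small functions. This is a minimal illustration under stated assumptions, not the project's actual code: the function names, the `{ row, col }` state shape, the learning rate `alpha = 0.1`, and the discount factor `gamma = 0.9` are hypothetical, and the sketch assumes the NPC stays in place after an invalid move. The reward values are the ones listed above, and the update is the standard Q-learning rule Q(s, a) ← Q(s, a) + alpha * (reward + gamma * max Q(s', a') - Q(s, a)).

```javascript
const ACTIONS = [[-1, 0], [1, 0], [0, -1], [0, 1]]; // up, down, left, right

// Q-table: four Q-values (one per action) for every cell, all starting at zero.
function createQTable(size) {
  return Array.from({ length: size }, () =>
    Array.from({ length: size }, () => new Array(ACTIONS.length).fill(0)));
}

// Applies an action and returns the next state, the reward, and a done flag.
function step(maze, state, action) {
  const size = maze.length;
  const nr = state.row + ACTIONS[action][0];
  const nc = state.col + ACTIONS[action][1];
  const outOfBounds = nr < 0 || nr >= size || nc < 0 || nc >= size;
  if (outOfBounds || maze[nr][nc] === '#') {
    // Assumption: an invalid move leaves the NPC where it was.
    return { next: state, reward: -1, done: false };
  }
  const next = { row: nr, col: nc };
  if (maze[nr][nc] === 'F') return { next, reward: 10, done: true };
  return { next, reward: -0.1, done: false };
}

// Epsilon-greedy: random action with probability epsilon, else the best known one.
function chooseAction(qTable, state, epsilon, rng = Math.random) {
  if (rng() < epsilon) return Math.floor(rng() * ACTIONS.length);
  const values = qTable[state.row][state.col];
  return values.indexOf(Math.max(...values));
}

// Standard Q-learning update; alpha and gamma defaults are assumed values.
function updateQ(qTable, state, action, reward, next, done, alpha = 0.1, gamma = 0.9) {
  const current = qTable[state.row][state.col][action];
  const bestNext = done ? 0 : Math.max(...qTable[next.row][next.col]);
  qTable[state.row][state.col][action] =
    current + alpha * (reward + gamma * bestNext - current);
}
```

Treating the goal as terminal (`bestNext = 0` when `done` is true) keeps the +10 reward from being inflated by Q-values stored for the goal cell itself.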
Learning Process:
- The NPC goes through 100 episodes of exploring the maze.
- In each episode, the NPC starts at 'S' and tries to reach 'F'.
- It takes actions, receives rewards, and updates its Q-table.
- Over time, the Q-values for actions that lead to the goal will increase, and the NPC will learn the optimal policy (strategy) to reach the goal quickly.
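The outer training loop and its bookkeeping might look something like the sketch below. It assumes a hypothetical `runEpisode(epsilon)` callback that plays one episode from 'S' and returns `{ reward, steps }`. The multiplicative decay schedule and its constants (`epsilonStart`, `epsilonMin`, `epsilonDecay`) are also assumptions, since the description only says that epsilon decreases over time. The 100 episodes, the 50-episode averages, and the choice of the best episode by highest total reward follow the description.

```javascript
// Mean of the last `windowSize` entries (or of all entries if fewer exist).
function rollingAverage(values, windowSize = 50) {
  const recent = values.slice(-windowSize);
  if (recent.length === 0) return 0;
  return recent.reduce((sum, v) => sum + v, 0) / recent.length;
}

// runEpisode(epsilon) is a hypothetical callback: it plays one episode
// starting at 'S' and returns { reward, steps } for that episode.
function train(runEpisode, options = {}) {
  const {
    episodes = 100,
    epsilonStart = 1.0,  // assumed: start fully exploratory
    epsilonMin = 0.05,   // assumed: never stop exploring entirely
    epsilonDecay = 0.97, // assumed: multiplicative decay per episode
  } = options;

  let epsilon = epsilonStart;
  const rewards = [];
  const steps = [];
  let best = null;

  for (let episode = 1; episode <= episodes; episode++) {
    const result = runEpisode(epsilon);
    rewards.push(result.reward);
    steps.push(result.steps);
    // The best episode is the one with the highest total reward.
    if (best === null || result.reward > best.reward) {
      best = { episode, reward: result.reward, steps: result.steps };
    }
    epsilon = Math.max(epsilonMin, epsilon * epsilonDecay);
  }

  return {
    epsilon,
    best,
    averageReward: rollingAverage(rewards),
    averageSteps: rollingAverage(steps),
  };
}
```

Passing the episode logic in as a callback keeps the loop independent of how the maze and Q-table are stored, which also makes the statistics easy to check with a stub.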
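Finding the Best Path: After the training is complete, the findBestPath() function uses the learned Q-table to trace a path from 'S' to 'F' by always choosing the action with the highest Q-value at each step. This greedy path is the best route the NPC has learned; with only 100 training episodes it is not guaranteed to be the shortest possible path.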
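A greedy walk over a Q-table can be sketched as below. This is an illustration of the idea, not the project's findBestPath() implementation, which may differ: the name `greedyPath`, the `qTable[row][col][action]` layout, the `maxSteps` cap, and the loop guard are all assumptions. The guard is there because a partially trained Q-table can point into a wall or send the NPC in circles.

```javascript
const MOVES = [[-1, 0], [1, 0], [0, -1], [0, 1]]; // up, down, left, right

// Hypothetical helper: follows the highest Q-value from 'S' at (0, 0).
// Returns the list of [row, col] cells visited, or null if the greedy
// policy walks into a wall, leaves the grid, revisits a cell, or runs
// out of steps before reaching 'F'.
function greedyPath(maze, qTable, maxSteps = 100) {
  const size = maze.length;
  let row = 0;
  let col = 0;
  const path = [[row, col]];
  const visited = new Set(['0,0']);

  for (let i = 0; i < maxSteps; i++) {
    if (maze[row][col] === 'F') return path;
    const values = qTable[row][col];
    const action = values.indexOf(Math.max(...values));
    const nr = row + MOVES[action][0];
    const nc = col + MOVES[action][1];
    const blocked =
      nr < 0 || nr >= size || nc < 0 || nc >= size || maze[nr][nc] === '#';
    if (blocked || visited.has(`${nr},${nc}`)) return null; // stuck or looping
    row = nr;
    col = nc;
    visited.add(`${row},${col}`);
    path.push([row, col]);
  }
  return maze[row][col] === 'F' ? path : null;
}
```

Returning null instead of a partial path makes it explicit when training has not yet produced a usable route, so the caller can decide whether to train longer or show nothing.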
Visualization: The HTML, CSS, and JavaScript work together to display the maze and the NPC's movement during training. The best path found after training is highlighted in yellow. The statistics displayed provide insights into the learning progress of the NPC.

Example visualization:

WhatsApp.Video.2025-06-04.at.07.35.16_cb228eff.mp4
