[//]: # (Image References)

[video_random]: https://github.com/vivekthota16/Project-Navigation-Udacity-Deep-Reinforcement-Learning/blob/master/Training-Results/random_agent.gif "Random Agent"

[video_trained]: https://github.com/vivekthota16/Project-Navigation-Udacity-Deep-Reinforcement-Learning/blob/master/Training-Results/trained_agent.gif "Trained Agent"

# Project 1: Navigation - Udacity Deep Reinforcement Learning

### Introduction

For this project, you will train an agent to navigate (and collect bananas!) in a large, square world.

| Random agent | Trained agent |
:-------------------------:|:-------------------------:
![Random Agent][video_random] | ![Trained Agent][video_trained] |

A reward of +1 is provided for collecting a yellow banana, and a reward of -1 is provided for collecting a blue banana. Thus, the goal of your agent is to collect as many yellow bananas as possible while avoiding blue bananas.

The state space has 37 dimensions and contains the agent's velocity, along with ray-based perception of objects around the agent's forward direction. Given this information, the agent has to learn how to best select actions. Four discrete actions are available, corresponding to:
- **`0`** - move forward.
- **`1`** - move backward.
- **`2`** - turn left.
- **`3`** - turn right.

The task is episodic, and in order to solve the environment, your agent must get an average score of +13 over 100 consecutive episodes.
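
The solve criterion is simply a 100-episode rolling average. As an illustrative sketch (not the notebook's exact code), the check can be written as:

```python
from collections import deque
import numpy as np

scores_window = deque(maxlen=100)   # scores of the 100 most recent episodes

def record_and_check(episode_score):
    """Record an episode score and report whether the environment is solved."""
    scores_window.append(episode_score)
    return len(scores_window) == 100 and np.mean(scores_window) >= 13.0
```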

### Getting Started

1. Download the environment from one of the links below. You need only select the environment that matches your operating system:
    - Linux: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux.zip)
    - Mac OSX: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana.app.zip)
    - Windows (32-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86.zip)
    - Windows (64-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Windows_x86_64.zip)

    (_For Windows users_) Check out [this link](https://support.microsoft.com/en-us/help/827218/how-to-determine-whether-a-computer-is-running-a-32-bit-version-or-64) if you need help determining whether your computer is running a 32-bit or 64-bit version of the Windows operating system.

    (_For AWS_) If you'd like to train the agent on AWS (and have not [enabled a virtual screen](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-on-Amazon-Web-Service.md)), then please use [this link](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/Banana_Linux_NoVis.zip) to obtain the environment.

2. Place the file in this folder, unzip (or decompress) it, and then point the `file_name` argument to the unzipped environment when creating the environment in the notebook `Navigation_solution.ipynb`:

```python
from unityagents import UnityEnvironment

env = UnityEnvironment(file_name="Banana.app")
```
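
As a quick sanity check, you can reset the environment and step through a few random actions. This is a minimal sketch assuming the standard `unityagents` API used in the DRLND course notebooks:

```python
import numpy as np

# the Banana environment exposes a single "brain" that the agent controls
brain_name = env.brain_names[0]
brain = env.brains[brain_name]

env_info = env.reset(train_mode=False)[brain_name]    # reset the environment
state = env_info.vector_observations[0]               # the 37-dimensional state
score = 0
for _ in range(300):
    action = np.random.randint(brain.vector_action_space_size)  # random action in {0, 1, 2, 3}
    env_info = env.step(action)[brain_name]           # send the action to the environment
    state = env_info.vector_observations[0]
    score += env_info.rewards[0]
    if env_info.local_done[0]:                        # episode finished
        break
print("Random-agent score:", score)
```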

### Description

- `dqn_agent.py`: code for the agent used in the environment
- `model.py`: code containing the Q-Network used as the function approximator by the agent
- `dqn.pth`: saved model weights for the original DQN model
- `ddqn.pth`: saved model weights for the Double DQN model
- `dddqn.pth`: saved model weights for the Dueling Double DQN model
- `Navigation_solution.ipynb`: notebook containing the solution

### Instructions

Follow the instructions in `Navigation_solution.ipynb` to get started with training your own agent!
To watch a trained smart agent, follow the instructions below (a short loading sketch follows this list):

- **DQN**: To run the original DQN algorithm, load the trained model from the checkpoint `dqn.pth`. When defining the agent, set the parameter `qnetwork` to `QNetwork` and the parameter `update_type` to `dqn`.
- **Double DQN**: To run the Double DQN algorithm, load the trained model from the checkpoint `ddqn.pth`. When defining the agent, set `qnetwork` to `QNetwork` and `update_type` to `double_dqn`.
- **Dueling Double DQN**: To run the Dueling Double DQN algorithm, load the trained model from the checkpoint `dddqn.pth`. When defining the agent, set `qnetwork` to `DuelingQNetwork` and `update_type` to `double_dqn`.
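
For example, here is a minimal sketch of watching the Double DQN agent, reusing `env` and `brain_name` from the Getting Started snippet. The `Agent` constructor arguments and the `qnetwork_local`/`act` attributes follow the usual DRLND DQN template and are assumptions; check `dqn_agent.py` for the exact names:

```python
import torch
from dqn_agent import Agent
from model import QNetwork

# hypothetical constructor signature -- adjust to match dqn_agent.py
agent = Agent(state_size=37, action_size=4, seed=0,
              qnetwork=QNetwork, update_type='double_dqn')

# load the saved weights into the local Q-network
agent.qnetwork_local.load_state_dict(torch.load('ddqn.pth'))

env_info = env.reset(train_mode=False)[brain_name]
state = env_info.vector_observations[0]
while True:
    action = agent.act(state)                  # greedy action from the trained network
    env_info = env.step(action)[brain_name]
    state = env_info.vector_observations[0]
    if env_info.local_done[0]:                 # episode finished
        break
```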

### Enhancements

Several enhancements to the original DQN algorithm have also been incorporated (the sketch after this list illustrates the Double DQN update):

- Double DQN [[Paper](https://arxiv.org/abs/1509.06461)] [[Code](https://github.com/dalmia/udacity-deep-reinforcement-learning/blob/master/2%20-%20Value-based%20methods/Project-Navigation/dqn_agent.py#L94)]
- Prioritized Experience Replay [[Paper](https://arxiv.org/abs/1511.05952)] [[Code]()] (To be worked out)
- Dueling DQN [[Paper](https://arxiv.org/abs/1511.06581)] [[Code](https://github.com/dalmia/udacity-deep-reinforcement-learning/blob/master/2%20-%20Value-based%20methods/Project-Navigation/model.py)]
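
To make the difference between the DQN and Double DQN targets concrete, here is an illustrative, self-contained sketch (toy networks and a dummy batch, not the repository's actual training code):

```python
import torch
import torch.nn as nn

# toy local and target Q-networks over the 37-dimensional state and 4 actions
qnetwork_local = nn.Linear(37, 4)
qnetwork_target = nn.Linear(37, 4)

# a dummy batch of transitions
batch_size, gamma = 8, 0.99
next_states = torch.randn(batch_size, 37)
rewards = torch.randn(batch_size, 1)
dones = torch.zeros(batch_size, 1)

# DQN target: the target network both selects and evaluates the next action
q_next_dqn = qnetwork_target(next_states).detach().max(1)[0].unsqueeze(1)

# Double DQN target: the local network selects the action, the target network
# evaluates it, which reduces the overestimation bias of vanilla DQN
best_actions = qnetwork_local(next_states).detach().max(1)[1].unsqueeze(1)
q_next_ddqn = qnetwork_target(next_states).detach().gather(1, best_actions)

# TD target used in the loss (shown with the Double DQN estimate)
q_targets = rewards + gamma * q_next_ddqn * (1 - dones)
```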

### Results

The plots below show the score per episode over the course of training. The environment was solved in **361** episodes, i.e., the agent reached an average score of +13 over 100 consecutive episodes (with Double DQN).

| DQN | Double DQN | Dueling DQN |
:-------------------------:|:-------------------------:|:-------------------------:
![DQN](Training-Results/scores_dqn.png) | ![Double DQN](Training-Results/scores_ddqn.png) | ![Dueling DQN](Training-Results/scores_dddqn.png)

### Challenge: Learning from Pixels

In the project, your agent learned from information such as its velocity, along with ray-based perception of objects around its forward direction. A more challenging task would be to learn directly from pixels!

To solve this harder task, you'll need to download a new Unity environment. This environment is almost identical to the project environment, where the only difference is that the state is an 84 x 84 RGB image, corresponding to the agent's first-person view. (**Note**: Udacity students should not submit a project with this new environment.) A convolutional Q-network sketch is given at the end of this section.

You need only select the environment that matches your operating system:
- Linux: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/VisualBanana_Linux.zip)
- Mac OSX: [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/VisualBanana.app.zip)
- Windows (32-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/VisualBanana_Windows_x86.zip)
- Windows (64-bit): [click here](https://s3-us-west-1.amazonaws.com/udacity-drlnd/P1/Banana/VisualBanana_Windows_x86_64.zip)

Then, place the file in this folder, and unzip (or decompress) the file. Next, open `Navigation_Pixels.ipynb` and follow the instructions to learn how to use the Python API to control the agent.

(_For AWS_) If you'd like to train the agent on AWS, you must follow the instructions to [set up X Server](https://github.com/Unity-Technologies/ml-agents/blob/master/docs/Training-on-Amazon-Web-Service.md), and then download the environment for the **Linux** operating system above.
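
For the pixel-based task, the fully connected Q-network in `model.py` would have to be replaced by a convolutional one. Below is a minimal sketch only; the layer sizes, the lack of frame stacking, and the input scaling are assumptions, not the reference architecture for this repository:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConvQNetwork(nn.Module):
    """Maps an 84x84 RGB frame to Q-values for the 4 discrete actions."""

    def __init__(self, action_size=4, in_channels=3):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, 32, kernel_size=8, stride=4)  # 84 -> 20
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)           # 20 -> 9
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)           # 9 -> 7
        self.fc1 = nn.Linear(64 * 7 * 7, 512)
        self.fc2 = nn.Linear(512, action_size)

    def forward(self, x):
        # x: (batch, 3, 84, 84), pixel values scaled to [0, 1]
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        x = F.relu(self.fc1(x.flatten(start_dim=1)))
        return self.fc2(x)

# quick shape check with a dummy batch of frames
q_values = ConvQNetwork()(torch.rand(2, 3, 84, 84))
print(q_values.shape)  # torch.Size([2, 4])
```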

### Dependencies

Use the `requirements.txt` file (in the [main](https://github.com/vivekthota16/Project-Navigation-Udacity-Deep-Reinforcement-Learning) folder) to install the required dependencies via `pip`.

```
pip install -r requirements.txt
```