Week 13 Homework: Reinforcement Learning with Neural Nets

  due date: Tue Apr 18 by 9:00 PM
  email to: mcb419@gmail.com
  subject: hw13
  email contents:
    1) jsbin.com link to your project code
    2) answer all the questions at the bottom of this page in the email
  

Introduction

This week we combine reinforcement learning with neural networks. The important changes relative to last week are that
(a) the state space is now continuous rather than discrete, and
(b) Q values are estimated using a neural network rather than a table.
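
For reference, the network is trained toward the standard Q-learning target used by the deep Q-network in the reading, with last week's table lookup replaced by a network evaluation:

    Q(s, a) \leftarrow r + \gamma \max_{a'} Q(s', a')

where r is the immediate reward, gamma is the discount factor, and s' is the next sensor state.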

For this exercise, we will use a simple video game scenario in which the goal is to move a paddle left and right to catch green pellets and avoid red pellets. We follow the general approach given in this week's reading assignment: Mnih V, et al. (2015) Human-level control through deep reinforcement learning. Nature 518, 529-533. We will use the neural net library and reinforcement learning module documented here: ConvNetJS: Deep Q Learning Demo

Pellets: red value = -1; green value = +1
Left/Right Sensors: at each end of the paddle; provide information about distance to red and green pellets
Sensor values: computed as (30/dist); e.g., an object 30 pixels away has a value of 1
Actions: the bot has 3 actions: 0 = move left, 1 = move right, 2 = stop
Network input: the four sensor values [leftRed, leftGreen, rightRed, rightGreen]
Network output: estimated Q values for the 3 possible actions
Network internal layers: you decide (see the interface sketch following this list)
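
As a concrete picture of the interface, here is a minimal sketch of one simulation step, assuming the deepqlearn.Brain API from the linked ConvNetJS demo (variable names are illustrative, not taken from the starter code):

    // one simulation step: sensors in, action out, reward back in
    var state = [leftRed, leftGreen, rightRed, rightGreen]; // the four sensor values
    var action = brain.forward(state);  // returns 0, 1, or 2 (epsilon-greedy over the predicted Q values)
    // ... move the paddle, check for caught pellets ...
    brain.backward(reward);             // reward from pellets just caught (+1 green, -1 red); drives the Q-learning update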
 

Instructions

First, select "randAction" and click the "run series" button. In the results table you should see a fitness value around 8-9; this is the random-action baseline that your trained network needs to beat.

Before you can use reinforcement learning, you have to design your neural network by editing the code in the function resetBrain, found in sketch.js. In particular, you'll need to specify one or more hidden layers, the number of neurons per layer, and the type of activation function in each layer ('relu', 'sigmoid', or 'tanh'). You may also want to change the associated learning parameters (epsilon, gamma, learning rate, batch_size, l2_decay, etc.). Note that batch_size has a big effect on the update rate: if you set it too large, your simulation will run very slowly. A sketch of resetBrain is given below. You'll probably want to repeat the training series multiple times, until performance is no longer improving. Then select "testing" and click "run series" to measure the final performance.
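Here is a minimal sketch of what resetBrain might look like, assuming the deepqlearn.Brain constructor and option names documented in the linked ConvNetJS demo (loaded via script tags); the layer sizes, activations, and opt values shown are placeholder starting points, not recommended settings:

    function resetBrain() {
      var num_inputs = 4;   // [leftRed, leftGreen, rightRed, rightGreen]
      var num_actions = 3;  // 0 = move left, 1 = move right, 2 = stop

      // network architecture: input layer, hidden layer(s) you choose, regression output of Q values
      var layer_defs = [];
      layer_defs.push({type: 'input', out_sx: 1, out_sy: 1, out_depth: num_inputs});
      layer_defs.push({type: 'fc', num_neurons: 20, activation: 'relu'}); // hidden layer; size and activation are up to you
      layer_defs.push({type: 'regression', num_neurons: num_actions});    // one Q value per action

      // learning parameters (the "opt" values); batch_size strongly affects simulation speed
      var opt = {
        temporal_window: 0,        // use only the current sensor reading as input
        experience_size: 10000,    // replay memory size
        gamma: 0.9,                // discount factor
        epsilon_min: 0.05,         // exploration rate after annealing
        layer_defs: layer_defs,
        tdtrainer_options: {learning_rate: 0.01, momentum: 0.0, batch_size: 16, l2_decay: 0.001}
      };

      brain = new deepqlearn.Brain(num_inputs, num_actions, opt); // 'brain' is assumed to be the sketch's global agent
    }

The demo's Brain object also exposes brain.learning and brain.epsilon_test_time, which is presumably what the "testing" controller toggles so that the agent stops exploring while you measure its final performance.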

NOTES:
Your target performance should be above 20.
Save your best network directly into your HTML file: first click the "Save Network" button below, then copy and paste the network description from the textbox into the corresponding textarea section near the bottom of your HTML file.

Questions:

(provide answers in the body of your email, or in the HTML file, whichever you prefer)
  1. Did you remember to save your best network into your HTML file before submission?
  2. What was the average fitness that you achieved for your best network?
  3. How many training trials did it take to achieve this performance?
  4. Briefly describe the network architecture that you found to be most effective (e.g., number of layers, neurons per layer, activation functions).
  5. Describe any changes that you made to the other learning parameters (the "opt" values).

Results Table

Controller | Fitness: mean (std dev)

Load / Save network

These buttons will load/save the network architecture temporarily using the text box below. To save this information permanently, you would need to copy and paste the textbox contents into the appropriate section of the index.html file.
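
Under the hood, the save and load operations presumably serialize the value network to and from JSON; a minimal sketch, assuming the ConvNetJS API (convnetjs.Net provides toJSON() and fromJSON(), and the Brain's network is brain.value_net in the linked demo; the textbox id here is illustrative):

    // save: network weights -> JSON string in the textbox
    document.getElementById('networkBox').value = JSON.stringify(brain.value_net.toJSON());

    // load: JSON string in the textbox -> network weights
    brain.value_net.fromJSON(JSON.parse(document.getElementById('networkBox').value));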