Training locally for AWS DeepRacer and a Udacity Challenge with rewards


August started in the world of AWS DeepRacer with a high C: the friendly folks at Udacity have created a challenge that you might want to get your hands dirty with, the AWS DeepRacer Scholarship Challenge. TL;DR: race in the Virtual League, and if you end up in the top 200 of the overall standings across August, September and October, you get a free Udacity Nanodegree course in Machine Learning.

[Image: Frequently Asked Questions graphic]

Since as a community we have doubled our member count in the last week, many eager members are dreaming of a career in autonomous learning and are asking questions. I decided to go through two channels, getting-started and local-training, to assemble a bit of an FAQ and a few hints on getting started with local training. This post will be quite chaotic; its goal is to preserve some things, explain some others and form a base to be reused once the community page is here.

What is it all about?

A quick reminder:

  • AWS DeepRacer is a model car taught to drive around a track using reinforcement learning,
  • AWS DeepRacer League is a series of events where one can take part in either physical or virtual races, and the best performers win their place in the league finals which take place at AWS re:Invent 2019 in Las Vegas, NV in December 2019,
  • I am a racer with some level of success in the League and one of the first members of the AWS DeepRacer Community,
  • AWS DeepRacer Console is a service which makes starting your adventure with reinforcement learning effortless, but it has its drawbacks,
  • The AWS DeepRacer Community is a group of racers (including winners) who contribute to the whole idea while learning from each other and competing for the top spots,
  • One of the great outcomes is a local training environment which is still being actively developed; while it's at a very early stage, it gives additional power and can limit your costs,
  • Udacity is a very popular online learning platform that gives you a chance to win one of their courses through competing in the AWS DeepRacer League.

Udacity

The course from Udacity is a slightly modified version of the course that AWS published in April. It's nice material to watch, and Udacity have made it a bit more engaging with questions and exercises. Also, having met Blaine Sundrud, who is a cool guy, watching it again makes it even more enjoyable.

The course focuses on getting started using the AWS DeepRacer Console. If you want to see how much it changed the game, compare times at AWS Summits before AWS London with the rest. It's not that it gives you superpowers - it gives you the simplicity you need to get started and enough context to switch over if you want more advanced solutions. The console makes it very easy to start another training.

Just a bit too easy, in fact. So I really recommend the Udacity challenge, but with some awareness and caution.

Costs

The pricing page for AWS DeepRacer linked in the course shows some nice maths to explain the costs, and it mentions that more complexity means higher costs. It's all carefully crafted. On top of that you get 10 hours of training in the Free Tier, how awesome! So... how many hours do you need to finish training?

The answer would be: many. You might be able to get around the track with a slow and stable function after a couple of hours of training, but it might not be enough for competitive racing. I'm not even sure how much an hour of training costs this month in the console. The last time I trained in the console was in June on Kumo Torakku, right before it turned out that a single hour could cost as much as $10. At that point I had done 12 hours of training, which had got me a decent time already but blocked me from further improvements.

Just to be clear: I know it's less now. The team have improved the setup to reduce the costs, but 10 hours is still not much, even though I have heard of people completing the evaluation after 90 minutes of training. Udacity have kind of tried to address it by putting an extra exercise in there: Lesson 6 Exercise III Task II involves using the Jupyter notebook to set up SageMaker, RoboMaker and a VPC. The VPC was the pricey bit in the old setup. Also, I don't know how much has changed, but the notebook has been around since December. A word of caution: when I first tried using it I didn't know it cost money to leave it hanging in AWS. $30 in a month is not much of a pain, but it is some money. If you use it through AWS, remember to shut it down.

I'm still racing but haven't run a single training through the console in the last two months. This is thanks to...

The Community

The AWS DeepRacer Community that has grown around the toy and races has many members who contribute to improve cost effectiveness, training progress and overall fun.

Chris Rhodes, third at the AWS Summit Sydney, prepared a set of tools for AWS DeepRacer local training that one can use to train on either the CPU or the GPU. Having a box prepared for training means much lower costs and training around the clock. Just today I've spent 12 hours training my model just to see what happens, and had no bill from AWS at all. What's more, the models can be submitted to the race without issues. Chris uses a wiki for instructions.

Alex Schultz has built upon Chris' repo and created a reasonably user-friendly set of scripts, called DeepRacer for dummies, to start it all up and do the training. They are currently cooperating to integrate some of the work Alex' scripts are doing into Chris' repo so that less needs to be done.

Furthermore, the folks at the Autonomous Race Car Community are building a windowed application on top of Alex' repo. It's at an early stage, but stay tuned.

New-starters

We've had so many people join our Slack with so many more or less basic questions, some of them very challenging, others very repetitive. I decided to try and assemble something to read so that we can help you get started with local training.

Important advice: please take notes of what you're doing. At first you will not be sure what it was that worked; you will do many pretty random things to get everything up and running. Your notes will help you understand, but they will also help the community prepare a good written knowledge base and improve the tools. Remember this is a new thing. It's so new that most of this setup didn't exist three months ago. It's all done after hours and every little contribution will move it towards some maturity.

There are many reasons to prepare a local training environment:

  • costs - as mentioned above. You can do what the AWS DeepRacer Console lets you do, but cheaper,
  • power - local training lets you configure more things than remote one,
  • flexibility - you get to see more of the training artifacts and you can get back to them more easily.

This article/guide/manual/FAQs sucks

I could reorganize, clean up etc. but there is no time. This is needed now. This will act as a base for reuse on the community website part about local training.

How local is your local?

We are referring to local training most of the time, but it doesn't have to be very local. Interestingly, running an EC2 or a Spot instance is already cheaper than using the AWS DeepRacer Console. Many environments have been tried already, some worked, some didn't:

  • Ubuntu Linux 18.04 - this is the default one. Things just work. Mostly. I use Ubuntu and am happy with it; I've set the training up three times so far. I'll share some preparation advice below, but Tony Markham from the community Slack has created a great [step by step guide from installing Ubuntu to local DeepRacer training](https://github.com/TonyMarkham/LocalDeepRacerUbuntuInstallation) for the less experienced users,
  • Other Linux distros - I believe I've heard people make it work on Arch, Manjaro and CentOS. I don't think there are any notes for that. Ask if you face problems, take notes, share them,
  • AWS EC2 Instances - there's a number of folks running them, so you can be fairly confident you'll get some help for them on the channel. Jarrett Jordaan (jarrett on the community Slack) is your go-to guy for an AMI. His repo is a work in progress; the more you ask, the more you help get it ready. Demeanour has also reported that the Ubuntu version of the Deep Learning AMI has all the required tools, with just gnome-terminal missing - there may be some updates and consolidation of information in this area soon. I have no experience with those, so I'll leave you with the expert,
  • AWS EC2 Spot Instances - they come with greater savings, but you can lose one every now and then in the middle of the training. You don't lose the data though. You can get back to training later. Some instances are difficult to get a hold of. Talk to Jarrett, ask in the channel on the community Slack
  • Windows - it does not work. People have tried and there were issues. RayG has assembled his own notes with a description of the problems he faced: look for the file SettingUpDeepRacerLocalTrainingOnWin10.txt in search on the community Slack. You can team up with RayG and progress this. I know he went for dual boot and is very happy with his Ubuntu. I read that with the most recent WSL2 more can be done with Windows and Docker, but the GPU is not yet available and no one has tried it yet. Be the first,
  • Macs - some people started the training effortlessly, some are banging their heads against the wall. I've seen these notes on setting up AWS DeepRacer local training on a Mac - could be useful. Kevin from the community Slack has also prepared a reviewed and cleaned-up version - he will appreciate all feedback. Ask in the local-training channel, maybe there has been progress. Many people switched to Ubuntu. Also, RayG says Apple no longer supports Nvidia drivers in the latest macOS; AMD is possibly still supported,
  • Nvidia Jetson - RichardFan (I think) got this nice toy from Nvidia and tried to set it up. It has an ARM processor, which meant hardly any dependencies were available straight away. After a lot of fighting he switched to EC2, I believe,
  • Google Cloud - Shivam Garg and, since then, many others have managed to get it to work. Under the free account the GPU is not available; when you upgrade your account you keep the credits and get to use the GPU, but double-check that you will get to use the credits on that (possibly yes). Finlay Mcrae from the community wrote a thing to get GPU training going on GCP, and Paul M. Reese from the Udacity Challenge Slack wrote a very detailed article on setting up local DeepRacer training on Google Cloud Platform which you might find handy if you need more details,
  • Google Colab - f-racer is wondering if it can be used. Team up with him and you can try. Make notes please.

If you decide to go for the local-local setup, you might want to read this:

  • you don't need a very powerful machine. You can go slow with your CPU - it will take a long time to train something usable, but it may be fine. You're probably here to learn, so it's fine,
  • if you have a graphics card, there is a large group of Nvidia cards users and a group of one using AMD. While there are more folks with Nvidia, the AMD guy is your very favourite Chris, the author of deepracer local training repo, so there is some chance for support for either one,
  • if you have an Nvidia card, check its compute capability - the tensorflow magic won't work easily on cards with compute capability < 3.1. Jouni from the community Slack (the winner of the Stockholm Summit race) prepared himself a setup for the other cards, but wasn't able to complete a single training due to a lack of RAM on the card (if I understand properly). RAM sounds important; my card has 6GB,
  • box PC > laptop. While you can do local training on a laptop, it will sound like a hairdryer. I like my laptop and it helped me do awesome stuff; I didn't want it to overheat and get damaged, so I magicked up a box PC. Even CPU-only, it has much more efficient cooling,
  • If you already have a computer, it's probably good to just work with it,
  • if you have a disk, it may not be enough. The training can generate as much as 200 GB per day. Sometimes more, sometimes less.

AWS EC2

I don't know much about it, but I started gathering people's experiences with the instances:

  • c4.2xls - Nick Kuhl said it's got the CPU power that the cheaper GPUs are missing
  • there are no 2xls that are GPU accelerated and the 4xls are just too expensive (just a comment I've seen)
  • Jarrett got GPU working on g3s.xlarge, had to use some AWS guide to optimize the GPU. He uses a spot instance at $0.25/h
  • Jochem Lugtenburg used p2.xlarge. He tried g3.4xlarge and it ran gazebo in real time for $0.4324/h (spot instance; the regular one is pricier). For Jarrett it kept stopping as they were in high demand. Jochem also tried g2.2xlarge
  • Bobby Stenly runs a c5.2xlarge instance and the policy training takes him 3-10 minutes (slowing down over time)

People use Spot instances, add some 500GB EBS and set it not to be deleted on termination. See Info about EC2 above for more details on the setup.

Ubuntu prerequisites (or any for that matter)

I made myself some notes about what needs to be set up on a box for Alex' or ARCC's repo to work:

  1. Docker - do NOT install from default repositories. Instead go to https://docs.docker.com/install/linux/docker-ce/ubuntu/ (other distros described as well) and configure it that way. Remember to follow steps from "Post-installation steps for Linux" in there. You can test with docker run hello-world
  2. Docker Compose - Alex used the latest syntax of docker-compose.yml, so it's preferable that you have a recent version. I just followed this: https://docs.docker.com/compose/install/. You should be able to run docker-compose version now; my version is 1.24.0. There may have been changes in the meantime
  3. Nvidia drivers (if you have and wish to use an Nvidia GPU) - I followed instructions from MVPS and verified using nvidia-smi
  4. Nvidia docker (if you have and wish to use an Nvidia GPU) - this is a tool that installs a runtime for your Docker which utilizes your local GPU. A small but important note here: this is likely to change soon. The latest Docker Engine supports GPUs out of the box, but docker-compose does not support that yet, so there isn't an easy way to set it up and use it. Luckily the setup just works with this integration. I followed the deprecated section of the nvidia-docker2 readme (important: remove --upgrade-only from the command) and verified by running docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi

If you use AMD, you will need to read Chris' repo to find advice. If you want to use Chris' repo directly, you'll have to set up some Python virtual envs that Alex packaged into another Docker container or two. It's all in the readme there.

What's in the local training setup

There are a couple things in there:

  • Minio is local storage implementing the S3 communication protocol. You can use tools from AWS to talk to it (there's a sketch of this after this list). In Chris' config you start up a binary on your system; in Alex' config it's a Docker container,
  • SageMaker is a python utility that starts up a Redis to store the training data and a Tensorflow-based software that does the training for you - this one uses your GPU if you have one, builds new models based on the data in Redis and saves them in your S3 bucket,
  • RoboMaker is a set of tools that involve a Gazebo simulation engine where the simulated training is executed. It reads data from the bucket, performs a simulation and stores the results in Redis, from which SageMaker takes them

This is pretty much it. Chris uses python virtualenv to run Sagemaker on your system, Alex wraps around it with another container with all the tools needed.
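Since Minio speaks the S3 protocol, standard AWS tooling can be pointed at it. Here's a minimal sketch using boto3; the endpoint, credentials and bucket name are assumptions, take yours from the .env file of the repo you're using:

```python
import boto3

# Point a regular S3 client at the local Minio instance instead of AWS.
# Endpoint, credentials and bucket name are assumptions - take yours
# from the .env file of the repo you are using.
s3 = boto3.client(
    "s3",
    endpoint_url="http://localhost:9000",
    aws_access_key_id="minio",
    aws_secret_access_key="miniokey",
)

# List what the training has written to the bucket so far.
response = s3.list_objects_v2(Bucket="bucket")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```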

An added extra is the logs analysis tool that AWS have provided and I have expanded slightly. I used virtualenv to run it, Alex wrapped around it with a container.

If you use ARCC's clone of Alex' repo, you will also have a chance to run a windowed application that is intended to act as a local console. I haven't used it yet, so I cannot tell you much about it. It looks promising.

FAQs

I decided to go through the history of the Slack channels to find the more interesting or repetitive questions and their answers, for your convenience. I will try to expand the answers or write my own where I feel they could be different.

While we're talking about questions, a couple community guidelines here:

  • use search in Slack. It works well, might find you an answer faster and will preserve more history in Slack (we're on free offering that limits the history),
  • read the readme and wiki in the project that you're using for the training. Have a look at https://arcc-race.github.io/deepracer-wiki/#/
  • if you're in doubt whether your question is stupid or not, here's how I measure it: the only stupid questions are the ones not asked. You might, however, be asked to improve your question (which is fine),
  • do your best to provide as much useful info as possible for others so that you can get the best response possible,
  • even as a beginner you may know something others don't. Do your best to respond to others' questions.

Troubleshooting

If things can go wrong, things will go wrong. It will be much easier to investigate if you follow some basic steps to help narrow down the problem.

Check if you have everything installed
Run the basic checks (execute them in your terminal; there's also a script to run them all after this list):

  • docker run hello-world - it may show some info about pulling the image and then should end with a message "Hello from Docker! (...)". If you're new to Docker, you can have a read of this message to understand what just happened
  • docker-compose version - it should print out something like docker-compose version 1.24.1, build 4667896b. The version may differ. If it's 1.20 or lower, I think Alex' files won't start for you because of an unsupported version
  • nvidia-smi - it should print out a table from nvidia with info about your gpu. This is only needed if you intend to use a GPU for training. This means you have a GPU up and running on your computer
  • docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi - this should print the same thing as above, but through a docker container. This is only needed if you intend to use a GPU for training. If you get info about unrecognized runtime nvidia, nvidia-docker2 is missing
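If you'd rather run all of these in one go, here's a small convenience script of mine (a sketch, not part of any of the repos) that runs each check and reports pass/fail:

```python
import subprocess

# Each entry: (description, command). The two GPU checks only matter
# if you plan to train on an Nvidia card.
CHECKS = [
    ("Docker", "docker run --rm hello-world"),
    ("Docker Compose", "docker-compose version"),
    ("Nvidia drivers (GPU only)", "nvidia-smi"),
    ("Nvidia Docker runtime (GPU only)",
     "docker run --runtime=nvidia --rm nvidia/cuda:9.0-base nvidia-smi"),
]

for name, cmd in CHECKS:
    result = subprocess.run(cmd, shell=True,
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    print(f"{name}: {'OK' if result.returncode == 0 else 'FAILED'}")
```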

If any of the above fail, scroll up to "Ubuntu prerequisites" and review them.

Try to narrow down where the issue occurs
The most common case nowadays is that the training doesn't start when using Alex' repo. You run ./start.sh and either something happens or not. Let's try and work out what could be happening.

  • what is the output of the start script? Did you get any errors?
  • the script normally opens two new terminal windows, one with logs and one that runs the vncviewer with gazebo and rviz in it. Did you get them?
  • in a terminal window, execute docker ps - this shows running containers and might help narrow down the issue,
  • in the above you should have a number of containers listed; among them some are more interesting than others (I'm referring to them by the NAMES column from the command above):
    • minio - our S3 bucket lookalike - this one usually starts and causes no issues,
    • robomaker - starts up robomaker, gazebo, rviz,
    • rl_coach - this starts up the code to spin up a sagemaker container,
    • tmpsomething - this runs image crr0004/sagemaker-rl-tensorflow:tag which runs sagemaker. The tag may be nvidia for GPU, amd for another GPU or console for CPU,
  • check logs for errors. Logs can be checked by running docker logs containername. Use container names from the docker ps output. Browse the logs for errors,
  • I'm not sure how quickly the containers get cleaned up, but sometimes docker logs works right after the failure and not later. Try running it right after the issue occurs

This is by no means an exhaustive list, but it may help you narrow down your issue. For instance, we have had a number of problems where robomaker would come up, rl_coach would terminate and the training would not proceed; sometimes the vncviewer would also shut down after a couple of minutes. It was caused by nvidia-docker2 not being installed, which made the SageMaker container fail to start. It could be seen in the rl_coach logs as an error about an unrecognized runtime nvidia.

I want to ask others for help
Sure. Just try to provide the output of the above checks, which repo you are using and whether you're trying to run on CPU or GPU. I strongly advise that you paste text in backticks or triple backticks instead of posting screenshots of logs - this will make searching easier.

I found a bug
File it as an issue against the repository it's happening in. This is the only way to make sure it does not get swallowed by The Vast Void of Slack History.

This whole setup sucks!
I'm sure you'll be able to do better. Raise issues on GitHub, raise pull requests, fork repos and fix them, write your own solution. Share.

I cannot code, I cannot fix things
On the contrary. One thing you've noticed (and I know you have, because you're reading the troubleshooting section) is that we, the coders and other tech witches, suck at documentation. We try, we really try to do it well, but writing a line of code will always feel of more value to us than writing clear instructions on how to use it.
You've gone through a path of pain, sweat and tears to set up so much so far and I salute you. Now you can contribute so that others feel less pain. Write instructions, contribute to wikis, FAQs, READMEs etc. Submit issues to repos. You can make the DeepRacer world a better place. You can fix it.

Vncviewer/gazebo shuts down after a bit of training
If the reward function raises an exception (has an error in execution), the whole gazebo/robomaker/vncviewer shuts down. Check the logs for robomaker if that is the case. If not, do the full checking as described above.

There is an error in rviz in vnc viewer
This is caused by incomplete configuration of it and is not causing issues for the training. Sagar from the community Slack provided hints on how to get it configured:

  • In Global Options - Fixed Frame select chassis (it's a dropdown)
  • Press Add in lower left part of the RViz window and select Camera
  • Set Camera - Image Topic to something starting with /camera/zed/rgb (it's a dropdown)

You should see what your car sees when driving around the track. Well done on checking such detailed things :)

My issue is not listed here
Use search in Slack, then investigate, then ask for help.

Training

Which project should I use to set up the training?
Yes.
Each project has something useful. Chris set everything up and exposed most of the geeky stuff to us all. Alex hid some of it to make it easier to set up. ARCC guys are pushing it even further. Some people are trying to remove docker from it all.
All projects have some pains of an early stage and low maturity. I'd say Alex' repo or ARCC one are the best ones to use now.

Can I run the training in Emacs through Sendmail?
You can use whatever custom setup you want. Just remember: if you prepare it, you maintain it. Every update to the environment (bug fixes, tracks etc.) may break something and the more users are involved, the more maintained the project is. You may consider whether the benefits of having it your way are worth the effort needed to propagate the changes.
That said, if the benefits are there, ask in the community Slack, you might get others to help.

I ran training for a full hour, but I didn't get a full lap
How much training is enough to get a lap is a difficult question. For the Shanghai track, the minimum I heard worked was 90 minutes on the training track to get a lap on the evaluation one. Not the fastest lap, but it worked. I got my first lap with zero training on the Shanghai track - I used my AWS Summit model, trained only on the re:Invent track.
I generally suggest submitting for evaluation even if you're not getting full training laps - they are different tracks and things may work better on one than on the other.
There's also too much training. Some ideas to consider:

  • what if your reward function doesn't promote best behaviour?
  • what if your car settles on least effort mediocre result that yields a good reward?
  • what if you overfit the car to make decisions based on the background, not the track?

Is there a time/step limit for an episode? What if the car just stops?
There are three conditions I know of that stop the car:

  • not a single wheel is touching the tarmac (this is the definition of off-track)
  • the car makes no progress (stays in place) for a bit
  • the car completes the track
The car will not just stop if you don't have an action with a zero speed value. That said, it can get blocked. On the New York City track there was a bridge that could block the car - this would normally stop the training episode. But at a specific angle the car would not go off the track and would not stop completely either; it would make tiny progress that kept it stuck for a while. I don't know how to kill this, but I secured my reward function so that if I saw steps with a progress change under a certain value, I would mark the episode as broken and significantly reduce the reward (see the sketch below).
Every now and then I also get a case where gazebo doesn't refresh, but the training continues.
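Here is a minimal sketch of that guard: track progress between steps inside the reward function and slash the reward when the car barely moves. The 0.01 threshold is a made-up example, and it assumes the module stays loaded between calls (it did for me in local training):

```python
# Module-level state; assumes the reward module stays loaded between
# steps of an episode (it did for me in local training).
prev_progress = None

def reward_function(params):
    global prev_progress

    # Steps restart at the beginning of each episode, so reset there.
    if params['steps'] <= 2:
        prev_progress = None

    reward = 1.0  # your actual reward logic goes here

    # If progress barely changed since the last step, assume the car is
    # stuck (e.g. wedged against the bridge) and slash the reward.
    # 0.01 is an arbitrary example threshold.
    if prev_progress is not None and \
            params['progress'] - prev_progress < 0.01:
        reward = 1e-3

    prev_progress = params['progress']
    return float(reward)
```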

Is my usage of AWS DeepRacer Console covered by Free Usage Tier?
I don't know. Billing is a bit of black magic to me. Just remember that less than 10 hours of training could be covered. If you intend to use the console, I would recommend deciding on a budget that would not ruin you, monitoring the costs in the Billing Dashboard and Cost Explorer (every day!) and remembering: when the fun stops, stop. If you get refunds from the Free Usage Tier at the end of the month - cool, freebies. If you get charged and it turns out you had used up your Free Usage Tier allowance, I don't think there would be many negotiation options. Remember, AWS people keep saying that it's not about selling their services and they are fine with people using the local training setup.

My car is trying to deliver parcels in Shanghai! (drives off track and stays there)
Make sure you have a contract with Amazon for deliveries. Smells like a business opportunity.
After the car completes a set of episodes for the iteration, SageMaker starts learning and the car's behaviour is not too deterministic. Normally it slowly drives somewhere either along the track or off it. In London Loop I've seen it get past the hills and fall into the void. Pretty damn spectacular.
Once the learning phase is over, it gets placed on track again and returns to training.
Some people have reported the car driving off the track and never resetting; I'm not sure what can be done about that, as I have never managed to investigate it.

Do I need to train against the Shanghai training track?
No, you don't. Using many tracks may mean your car will not get overly fitted to a single one. Also, some tracks share certain characteristics with the race track, and you might want to train against them because of that.

How to manage history, cloning, changes to hyperparams, reward functions etc?
There are many naming schemes that people use. Some name models with details from the training, others just add symbols. I normally have two-level versioning: the first number describes a training stream and the second describes an increment within it.
Some people use git to manage it all, I take notes that include non-training specific info as well.
I'm sure you have a better way, please share.
Chris' repo has dr_util.py, which does some work in that area; Alex' repo is awaiting one, but lets you upload a model.

Is it a good idea to use the pre-trained models for training?
You can if you're not planning to perform well. It will give you some start, but bear in mind the models are slow and not the most stable ever. They are there so that the console is not empty at start, they are also used at the summits when someone wants to see a car on the track and talk to the pit crew while holding a tablet. Your first model from scratch is quite likely to outperform them, and it will be your own so it's even better.

Is there a systematic way of training or is it more like just try what seems to make sense, train and see the time?
Both can work. With local training it is much easier (cheaper) to just try things and reach interesting conclusions. Both can fail as well. I recommend getting familiar with logs analysis to be able to measure progress in training. While I have prepared tools for analysing AWS DeepRacer logs based on the tools provided by AWS, I know people further expand them or replace completely with their own solutions. Whichever way you go, just remember that your model is only as good as your data can prove and evaluation can confirm.
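As a taste of what log analysis can look like, here's a tiny sketch that sums the reward per episode from a RoboMaker log. The SIM_TRACE_LOG field positions (episode first, reward ninth) are an assumption from my own logs and may differ between versions, so verify against yours:

```python
from collections import defaultdict

# Sum rewards per episode from a RoboMaker log file.
# Field positions within SIM_TRACE_LOG lines (episode at index 0,
# reward at index 8) are an assumption - verify against your own logs.
episode_rewards = defaultdict(float)

with open("robomaker.log") as log:  # example path
    for line in log:
        if "SIM_TRACE_LOG:" not in line:
            continue
        fields = line.split("SIM_TRACE_LOG:")[1].split(",")
        episode_rewards[int(fields[0])] += float(fields[8])

for episode in sorted(episode_rewards):
    print(episode, round(episode_rewards[episode], 2))
```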

Why are all files owned by root? Why do I need sudo?
This is related to Docker, which runs its containers as root. I know it could be done better; I'm not sure if Docker delivers it. All files are created as root for Alex' repo and you will need sudo to work with them. Some files in Chris' repo are the same.

I'm starting the training and am getting "Found a lock file ..., waiting"
You were unlucky. Truly unlucky. You killed your training in the middle of writing files. Just remove the .lock file listed, make sure the checkpoint file in there points at the last complete set of Step files (there should be three of them) and you'll be good to go; no restart required. The sketch below can find and fix that for you.
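If you're unsure which set of Step files is complete, something along these lines can check and rewrite the checkpoint file (a sketch; point CHECKPOINT_DIR at the checkpoint folder of your own setup):

```python
import glob
import os
import re

# Assumption: point this at the checkpoint folder of your setup.
CHECKPOINT_DIR = "checkpoint"
EXTENSIONS = ["index", "meta", "data-00000-of-00001"]

# Collect checkpoint numbers that have all three Step files present.
complete = []
for index_file in glob.glob(os.path.join(CHECKPOINT_DIR, "*_Step-*.ckpt.index")):
    base = index_file[:-len(".index")]
    if all(os.path.exists(base + "." + ext) for ext in EXTENSIONS):
        match = re.match(r"(\d+)_Step", os.path.basename(base))
        if match:
            complete.append((int(match.group(1)), os.path.basename(base)))

if complete:
    _, name = max(complete)
    print("Last complete checkpoint:", name)
    # Rewrite the checkpoint file to point at it.
    with open(os.path.join(CHECKPOINT_DIR, "checkpoint"), "w") as f:
        f.write('model_checkpoint_path: "%s"\n' % name)
        f.write('all_model_checkpoint_paths: "%s"\n' % name)
```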

I'm starting the training and am getting Received termination signal from trainer. Goodbye.
This happens when your training NaNs. It writes a .finished file in the checkpoints folder (the location depends on your setup). If it exists on startup, the robomaker stops before it starts. Remove the file and restart.

I received NaN in training. What to do?
Well, your training reached a point from which it cannot optimize into a different point. Not in this config. There are ways to delay a NaN:

  • move a couple checkpoints back
  • decrease learning rate
  • change track
  • change reward function

Sometimes they work, sometimes they don't. Remember to remove the .finished file as described in the question above.

Can I stop and continue training?
Yes, the repo of your choice will have instructions on how to use a model as a base for new training. While you do so, you can change a couple of things:

  • reward function
  • track
  • hyperparameters
  • action space values (but it's a risky area, you may improve, you may break things big time) - but never change the amount of actions!

What is the Real Time Factor and how does it influence the training?
What it is: it informs you about the difference between simulation time and real time. If it's 0.9, then in 10 seconds of real time 9 seconds of simulation time elapse. It influences your training in that steps take longer in real time. That said, I have had values between 0.8 and 0.9 and it was fine. 0.9 feels fine in general. Your model may go quicker in evaluation in the race.

What tracks can I use for training?
There are many; some are working, some are not. They are baked into simulation/aws-robomaker-sample-application-deepracer/simulation_ws/src/deepracer_simulation/worlds in Chris' repo and we do our best to add them as they come. Jarrett from the community Slack has tried to load some, and not all worked. Alex thinks there's a bit of a mess with the data in there and proper copying could be used. Chris is working on a manual for getting the training data from AWS. RichardFan has had some success fetching the data from AWS as well. There is potential in exploring them more. Have I mentioned that AWS reward people with credits for good articles?

Action space

Can I submit models with custom action-space?
Yes you can.

In local training, the model_metadata.json in your bucket/custom_files folder is your buddy: set your values and make sure the indices are consecutive. The sketch below shows the format.
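For illustration, here's a sketch that writes such a file with a made-up three-action space; the steering and speed values are arbitrary examples:

```python
import json

# Example custom action space. The steering/speed values are arbitrary
# examples; what matters is that indices run 0..N-1 with no gaps.
model_metadata = {
    "action_space": [
        {"index": 0, "steering_angle": -20.0, "speed": 1.0},
        {"index": 1, "steering_angle": 0.0, "speed": 2.0},
        {"index": 2, "steering_angle": 20.0, "speed": 1.0},
    ]
}

with open("model_metadata.json", "w") as f:
    json.dump(model_metadata, f, indent=4)
```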

In the console it requires a bit more magic. Start a training for a minimum time (a minute?) with an action space that has as many actions as you want, then go to its folder in the bucket, take the model_metadata.json, alter the values, upload it and clone the model. Going forward it will use those values. The reason for a short training is to make sure the model doesn't get trained on the initial values, so that you don't have to train it away from them.

I've set my action spaces to a single speed 1 m/s. Will my car be only going 1 m/s?
The speed value in the action space is actually not speed but throttle. Also, we believe it was implemented using the wrong unit and doesn't really mean 1 m/s. We don't know what the correct interpretation of this value should be.

Does the action space need to be evenly spaced and all combinations of speed and turns?
If you want, you can have an action space of one action. If it's turning 0 and speed 100, it might perform well on a straight-line track.
You can have any number of actions with any combinations of turning and throttle; they just need to be numbered from 0 to N-1, where N is the count of actions.
They don't have to be aligned in any specific way. I'm not sure what would happen if two actions were identical.

How many actions can I have in the action space?
As many as you like, just number them properly. Many actions may lead to problems with converging.

Reward function

How do I reward/penalize terminal states (off-track/lap complete)?
There are parameters like all_wheels_on_track and distance from the centre. That said, the training code detects whether at least one wheel is touching the tarmac; if not, it will stop the training episode, which is a penalty in itself for not completing the lap.
A complete lap can be detected using progress: progress equal to 100.0 means a complete lap. There's a sketch of both below.
Read the docs to learn more about the parameters available in the reward function.
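A minimal sketch using those two parameters (the exact reward values are arbitrary examples):

```python
def reward_function(params):
    # Going off track ends the episode by itself, but you can still
    # make the final rewarded step cheap.
    if not params['all_wheels_on_track']:
        return 1e-3

    reward = 1.0  # your usual reward logic goes here

    # Completing the lap: progress reaches 100.0.
    # The bonus of 100 is an arbitrary example value.
    if params['progress'] == 100.0:
        reward += 100.0

    return float(reward)
```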

I set my reward to return 1 and I got around the track. What the actual flip?
This is a valid reward function. Your car will get the best total reward for completing a track in the most steps possible (lowest velocity, greatest distance). It's pretty much the same as rewarding higher speeds - the car will go fast, but will try to cover the greatest distance. If you go too fast, it might learn to spin and go back and forth to gather even more.

I've read that the training reward function is not run in the real world. What is? Is the generated policy just another term for an internal reward function?
Your reward function is used to train a neural network model which translates the 150x250 grayscale pixels into actions. It is this model that is used in the car, so there is no place for the reward function there.
During evaluation, the logs have some reward function applied. It's not used for anything but preserving the log structure; it reflects the distance from the centre of the track.

I want to include moon phases in my reward function (how complicated should the reward function be?)
If you put sophisticated things in your reward function, or too many factors, the car may get confused. If you put too few, the car may not progress. In many cases the training may succeed with regards to reward, but not the performance that you want. Think of it as a mean 13-year-old that always complies with your rules but only in a way that will annoy you the most.
We've had good players that would come and nail it using a very simple function. It's doable but may take longer.
Whichever you go for, verify progress and reevaluate.

Hyperparameters

How to adjust the hyperparameters for the initial training and how to progress them?
I understand hyperparameters as magic applied through a combination of gut feeling and experience. I'm not sure there is a formula. I usually start with defaults (maybe changing the discount rate a bit) and then I keep decreasing the learning rate when I start to see a wobble in my reward per iteration graph.

The official documentation says that a discount rate of 0.9 means the reward is analyzed 10 steps ahead, 0.99 - 100 steps, and 0.999 - 1000 steps. I may be talking absolute rubbish here, but it feels like we shouldn't be looking that far ahead when the car completes the lap in more or less 150-160 steps. I understand that the number of steps (s) for a discount rate (dr) is the result of the formula s = 1/(1-dr), so dr = 1-1/s. I usually set it to 0.9875, which corresponds to 80 steps. This should be enough to get past a turn, a straight or a sequence of turns close to each other - the way I see it.
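In code, that back-of-the-envelope formula looks like this:

```python
def discount_rate_for_steps(steps):
    """Lookahead horizon: s = 1/(1-dr), so dr = 1 - 1/s."""
    return 1 - 1 / steps

print(discount_rate_for_steps(80))    # 0.9875
print(discount_rate_for_steps(100))   # 0.99
print(discount_rate_for_steps(1000))  # 0.999
```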

Some people lower the learning rate to 0.0001 and then drop it even further. I usually start with the default 0.0003 and wait for the reward-per-iteration graph to start zigzagging, then lower it.

What will happen if my hyperparameters go wrong?
If you, for instance, set a low discount factor or a high entropy, your training may go wrong and the reward graph will drop and go flat. Try to avoid such cases; either start your training over from the previous pretrained state or fetch the last unbroken checkpoint from RoboMaker and use that as the starting point.

Evaluation

Unable to finish 1 lap - what does it mean?
It may be a good thing: you've successfully submitted a model, it got loaded and evaluated. Now make sure you get around the track. Never stop at one attempt - keep submitting every 30 minutes. Each submission is 5 lap attempts.

Why are the training and evaluation tracks so different?
This is to prevent you from overfitting the model. It makes it more challenging to get around the track with a good result.

Can I submit a locally trained model to virtual race?
Yes, you can. Check the repo that you're using to learn how to do it. Chris' repo comes with dr_utils.py (which has --help) and Alex' repo comes with scripts/training/upload-snapshot.sh. I personally prefer dr_utils.py as I wrote it, but Alex used the aws command line utility, which works better in difficult networking conditions like the ones I'm facing. A hybrid approach would be awesome; I'll work on that at some point.

In general the rules are as follows: submission requires a set of files in a folder associated with an existing training in your AWS DeepRacer Console. You can create a training named, let's say, "Local-Training-Submissions" that you can reuse for this. Such a training has a folder in your DeepRacer bucket, with the training date in the name, containing two folders: ip, with the entry files to your training (sadly no reward file, which would be useful for bookkeeping, but it doesn't matter for evaluation/submissions), and model, which is the one that matters. It has many files, but of those only five are important for submissions:

  • model_metadata.json - your action space
  • X_Step-Y.ckpt.Z - three files where X is the checkpoint number, Y is some number I don't care about (and which can be changed to anything as long as it's also in the checkpoint file) and Z is one of three extensions: index, meta and data-00000-of-00001
  • checkpoint - a text file with information about the most recent checkpoint and other, irrelevant ones. It needs at least two lines:
model_checkpoint_path: "X_Step-Y.ckpt"
all_model_checkpoint_paths: "X_Step-Y.ckpt"
  • The model_X.pb file doesn't matter. It is used when loading the model onto a car only.

Once your five files are in the folder, if you submit the model, those files will be used regardless of what originally went into the training. The utility scripts in the repos make this convenient; they only require a configured AWS CLI, the name of the bucket and the name of the folder in which the model is to land, plus permissions to add/delete files in there.
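If you'd rather do it by hand, here's a boto3 sketch of pushing the five files; the bucket name, prefix and checkpoint numbers are placeholders you'd take from your own console training:

```python
import boto3

s3 = boto3.client("s3")

# Placeholders - use your own DeepRacer bucket and the model folder of
# the console training you are reusing for submissions.
BUCKET = "aws-deepracer-example-bucket"
PREFIX = "Local-Training-Submissions-folder/model"

# The five files that matter, here for an example checkpoint 55.
FILES = [
    "model_metadata.json",
    "checkpoint",
    "55_Step-120000.ckpt.index",
    "55_Step-120000.ckpt.meta",
    "55_Step-120000.ckpt.data-00000-of-00001",
]

for name in FILES:
    s3.upload_file(name, BUCKET, PREFIX + "/" + name)
    print("Uploaded", name)
```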

Can I evaluate locally?
Somewhat, yes. Sagar reported that adding the environment variable NUMBER_OF_TRIALS in the docker/.env file in Alex' repo makes the evaluation script work. I haven't tried it myself. It may soon be added to Alex' repo; watch out for updates.

Hardware

What card to get to do training?
That's a tricky one. It seems that VRAM is more important than the amount of SuperExtraMegaEinsteInUltraCurieCores or whatever makes you sigh in awe when reading the specs. I personally got a GeForce 1660 Ti, which has 6GB of VRAM that is fully utilised when training. Some people got a GeForce 2070 Ti or something and it seemed similar - full VRAM usage; computing power - not so much.
It doesn't have to be the latest and greatest to make you happy in terms of training. You may just need to wait a couple of seconds longer for your training to complete. This article from April 2019 says anything above a 1060 should be fine. Also aim for 6GB of VRAM or more.

How much disk space do I need?
Remember, with speed comes data, lots of it. The training can generate 200GB a day, sometimes more. Make sure you secure space and clean up when possible. Jeff Klopfenstein said he set his Ubuntu partitions to 160 GB (both data and the system) and it allows him to train for about 6 hours uninterrupted.
I'm currently using 500 GB drive and I clean it up every two days or so.

I have this laptop, can I use it?
Yes, you can use it. Yes, it will sound like a hairdryer. No, mobile graphics cards aren't as good as full size ones.

Can I use this old GPU?
You can try. Many people have struggled: even after rebuilding the software to support older cards, the memory wasn't enough to support training, even with a tiny batch size.

Can I use external GPU?
Chris Thompson from the community Slack recently reported plans to try. If you want to as well, get in touch with him.

How to confirm I'm using GPU?
Run nvidia-smi while training, there should be a python3.6 process in there consuming a lot of memory.

Summary

I've seen some incredible effort from the new members of the community to get their training up and running. Many issues haven't been resolved yet; we're trying to work them out.

If you would like to suggest a change or an update to this post, get in touch with me in the community Slack.

The August race is going great with over 500 entries in the first ten days! I'm hoping to see you on the track.