I have spent a lot of time over the last 10 months thinking about log analysis solutions. I would like to present the new log analysis solution that has grown out of the notebooks I was promoting last year.
AWS DeepRacer
AWS DeepRacer is one of the Amazon Web Services machine learning devices aimed at sparking curiosity towards machine learning in a fun and engaging way.
Machine learning requires a lot of preparatory work before you can apply its concepts. In DeepRacer, AWS has done it all for you so that you can start training your car with minimal knowledge, then transfer the outcome onto a physical 1/18th scale car and have it race around the track. Then you can work your way back to understand what the hell just happened and what made it so awesome.
Let's top it up with competitions. How about challenging your friends? Or better, qualifying for the finals during an expenses-covered trip to the AWS re:Invent conference in Las Vegas? That is something to fight for.
You can learn more about AWS DeepRacer on the official Getting Started page.
Log Analysis
With time, what is good enough for a day of fun is no longer enough for competing. Training stops improving your times and your car keeps trying to flee the racing track. Log analysis is here to help you ask the right questions and find the answers to them.
Over the last year I spent long hours first using the AWS DeepRacer log analysis tool, then expanding and improving it within the AWS DeepRacer Community. We ended the season with a community challenge to encourage contributions.
I started last year with only a little knowledge of Python, learned how to use Jupyter Notebook and Pandas, and built enough knowledge and confidence to present this work at AWS re:Invent 2019:
Changes
As my knowledge grew, I felt more and more that the log analysis had to change.
Jupytext
It struck me during the log analysis challenge - we received ten great contributions that I only needed to merge into the git repo. Well, "only".
Jupyter Notebook is a great way to present work outcomes. Because it stores the outputs, one can simply view the document without having to evaluate the results. The emphasis on the visual side leads to problems in source control, though. It's not the first tool in the world with this problem - visual editors are just not great at generating content that's easy to handle in source control.
Jupyter Notebook stores its documents in a text format called JSON. All the visual content is in it: all the images, all the metadata of the document. A tiny visual change can turn the text file on its head. Rerunning the code, even on the same input data, leaves altered image outputs and metadata. I had to find a way to solve this.
I couldn't find a way to make the notebook format better, but I managed to find an alternative approach. I found Jupytext thanks to Florian Wetschoreck's posts on LinkedIn. It's a tool that integrates with Jupyter Notebook and enables storing the documents in parallel in an ipynb file and a py file. The py file is a regular Python file in a simplified format which can be turned back into a regular notebook, and it's much easier to work with in version control. Instead of trying to find a change in a completely restructured JSON, I have a nice diff from the version control system. As a result I don't really have to worry about the notebook - I can simply regenerate it and commit it to the repository after the merge.
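To make the difference concrete, here is a minimal sketch using only the Python standard library. It is not Jupytext itself: `to_script` is a hypothetical helper that imitates the idea of a paired text file by keeping the cell sources, marking cells with `# %%` and dropping outputs and execution counts. The notebook contents and the placeholder image strings are made up for the example.

```python
import json


def to_script(notebook):
    """Hypothetical helper: render a notebook dict as a plain Python script.

    Only the cell sources are kept; outputs, execution counts and other
    volatile fields are ignored, which is what keeps the text diff-friendly.
    """
    chunks = []
    for cell in notebook["cells"]:
        source = "".join(cell["source"])
        if cell["cell_type"] == "markdown":
            # Markdown cells become comment lines under a cell marker.
            body = "\n".join(("# " + line).rstrip() for line in source.splitlines())
            chunks.append("# %% [markdown]\n" + body)
        else:
            chunks.append("# %%\n" + source)
    return "\n\n".join(chunks) + "\n"


def make_notebook(execution_count, image_data):
    """Build a made-up two-cell notebook in the ipynb JSON layout."""
    return {
        "nbformat": 4,
        "nbformat_minor": 4,
        "metadata": {},
        "cells": [
            {"cell_type": "markdown", "metadata": {}, "source": ["# Training analysis"]},
            {
                "cell_type": "code",
                "metadata": {},
                "execution_count": execution_count,
                "source": ["df.plot()"],
                "outputs": [
                    {
                        "output_type": "display_data",
                        "metadata": {},
                        "data": {"image/png": image_data},
                    }
                ],
            },
        ],
    }


# The same notebook before and after a re-run: the code is identical,
# only the execution count and the embedded image data differ.
first_run = make_notebook(1, "PLACEHOLDER_IMAGE_A")
second_run = make_notebook(7, "PLACEHOLDER_IMAGE_B")

json_changed = json.dumps(first_run) != json.dumps(second_run)
script_changed = to_script(first_run) != to_script(second_run)

print("ipynb JSON changed:", json_changed)
print("script text changed:", script_changed)
```

The JSON differs between the two runs while the script text stays the same, which is why a text representation of the notebook is so much easier to merge.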
Deepracer-utils
My first batch of changes to the original log analysis tool was taking out as much source code as possible. Methods defined in the notebook made it swell with content that doesn't necessarily help you improve your racing. While that code does show you how to start working with the data, it can overwhelm those who want a more in-depth understanding of their racing.
The intuitive first step was to put all that code in separate files, just like you are tempted to clean up your room by stuffing the mess under the bed and pulling things out as needed. I realised it needed more structure and a way to let others use the methods without having to copy the files over.
I have moved the code to an external dependency: deepracer-utils. I have also reorganised it a bit into objects instead of just serving a big pile of methods. It was a great experience to prepare a Python project "the way it should be done".
This way we also gain a place for various utilities, such as model uploads to S3, which until now were scattered across various repositories. If at some point AWS introduce an API for DeepRacer, the ability to improve racers' experience will be enormous.
Deepracer-analysis
With the code moved into a separate project, all that's left to do is to clone the aws-deepracer-workshop repository. But not the original - the community fork. Then go to log-analysis. Oh, first check out the enhance-logs branch. Are you sure you're on the community repo, not breadcentric or ARCC?
I have decided to move the log analysis into a separate Community DeepRacer analysis repository: clone it, follow the instructions from the README, use it. I have ported the two notebooks that I've been maintaining to work with deepracer-utils - Training_analysis.ipynb and Evaluation_analysis.ipynb. I have decided to leave the original log analysis notebook behind to avoid confusion - I had kept it in there intact and it was becoming yet another thing to remember not to use when people were asking for help. It has certain functions that are not yet available in the two moved notebooks, but I think I can live with that. They can be introduced in more notebooks in the new repo.
I have introduced some minor improvements in the places that raised the most questions - more plots now infer their size and don't require manual steering. I have also modified the actions breakdown graph so that the action space is detected automatically. Only used actions are detected: if you have an action that doesn't get used at all, it won't be listed.
Finally, I have applied a few changes from the original repository that we had fallen behind on. This includes a nicer plot of track waypoints and changing the units of the coordinate system from centimetres to meters. I only reverted the change for a reward graph as it is broken in the original tool:
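The idea of listing only the actions that actually occur in the logs can be sketched as below. This is a standalone illustration, not the code from the notebooks: the `steps` records, the field names `steering_angle` and `speed`, and the helper `detect_used_actions` are all assumptions made up for the example.

```python
from collections import Counter


def detect_used_actions(steps):
    """Count how often each (steering_angle, speed) pair occurs in the steps.

    Actions that never appear in the logs simply have no entry, so they are
    left out of any breakdown built from the result.
    """
    counts = Counter((step["steering_angle"], step["speed"]) for step in steps)
    # Sort by steering angle, then speed, for a stable plotting order.
    return dict(sorted(counts.items()))


# Made-up log records; a real log has many more fields per step.
steps = [
    {"steering_angle": -30.0, "speed": 1.0},
    {"steering_angle": 0.0, "speed": 2.0},
    {"steering_angle": 0.0, "speed": 2.0},
    {"steering_angle": 30.0, "speed": 1.0},
    {"steering_angle": 0.0, "speed": 2.0},
]

used_actions = detect_used_actions(steps)
for (angle, speed), count in used_actions.items():
    print(f"steering {angle:>6.1f}, speed {speed:.1f}: {count} steps")
```

Deriving the action space from the data means there is nothing to configure by hand, at the cost of never showing an action that was defined but never chosen.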
This graph should show rewards granted depending on the place of the vehicle on the track. To do that in code you create something like an image - an array with all the coordinates on track where you store the rewards being granted. So why do you get some blobs of bright areas? Well, I told you the units have changed from centimetres to meters. Previously, for a track of size 10x8 meters, you would have 10*100*8*100 (8,000,000) places to store the reward values. Now you have 10*8 (80). You must admit that's a bit of a loss of precision. I have changed the units to meters and this is the only graph in which I go back to centimetres to avoid the precision loss. The graphs should look more like this one:
Next steps
There are a few things I want to get done:
- rethink log fetching and reading - AWS have introduced log storage on S3, and local training environments store their logs in various locations. I would like to do it in a way that will not be overly complicated
- apply changes from the log analysis challenge - I have not accepted a single merge request yet and it's time to fix that
- reorganise the notebooks so that they are easier to start working with and help ramp up the users' skills so that they can expand the log analysis on their own
In the upcoming days I will be publishing a blog post on https://blog.deepracing.io to present the new log analysis. It will link to this post for a description of the changes applied - I don't want to explain the changes over there, just focus on how to get going.
AWS Machine Learning Community
Being recognised by AWS was quite rewarding for the AWS DeepRacer Community. We started cooperating with AWS to make the product better, to improve the experience and to work around limitations that could get between the curious ones and the knowledge waiting to be learned.
We have joined forces with folks from other areas of interest and rebranded the Slack channel to AWS Machine Learning Community. Our main focus is still DeepRacer. If you would like to join and have some fun together, head over to http://join.deepracing.io (you will be redirected to Slack).