- There are two models:
  - A state constructor, which reconstructs the environment state from video
  - A control model, which generates actions from the environment state
- We train a custom CNN (convolutional neural network) + LSTM (long short-term memory) hybrid model to reconstruct the environment state from the states encountered during control-model training.
- The environment state can be read directly from a simulated environment, but it can't be collected from a physical one, which is what makes the state constructor important.
- The LSTM retains past information, letting the model remember where the cubes and the other robots are.
- The CNN extracts numeric features from image inputs, so each frame captured by the camera can be fed to the LSTM.
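The state-constructor pipeline above can be sketched end to end: a CNN turns each frame into a feature vector, and an LSTM accumulates those features over time into a state estimate. This is a minimal numpy sketch under assumed dimensions (8x8 grayscale frames, a single 3x3 kernel, a 16-dimensional state); the real model's architecture, sizes, and weights would differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def conv2d_relu(frame, kernel):
    """Valid 2D convolution followed by ReLU (a stand-in for the full CNN stack)."""
    kh, kw = kernel.shape
    h, w = frame.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(frame[i:i + kh, j:j + kw] * kernel)
    return np.maximum(out, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h, c, W, U, b):
    """One LSTM cell step: gates computed from input features x and hidden state h."""
    z = W @ x + U @ h + b               # pre-activations for all four gates
    i, f, g, o = np.split(z, 4)         # input, forget, candidate, output
    i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
    g = np.tanh(g)
    c = f * c + i * g                   # cell state: the LSTM's long-term memory
    h = o * np.tanh(c)                  # hidden state: the running state estimate
    return h, c

# Hypothetical sizes: 8x8 frames with a 3x3 kernel give 6x6 = 36 features.
FEAT, HID = 36, 16
kernel = rng.normal(size=(3, 3))
W = rng.normal(scale=0.1, size=(4 * HID, FEAT))
U = rng.normal(scale=0.1, size=(4 * HID, HID))
b = np.zeros(4 * HID)

h = np.zeros(HID)
c = np.zeros(HID)
for _ in range(5):                                # five camera frames in sequence
    frame = rng.normal(size=(8, 8))               # stand-in for a captured frame
    feats = conv2d_relu(frame, kernel).ravel()    # CNN: image -> feature vector
    h, c = lstm_step(feats, h, c, W, U, b)        # LSTM: accumulate over time

state_estimate = h   # the reconstructed environment state after the sequence
```

The cell state `c` is what lets the model carry information (e.g. last seen cube positions) across frames even when a later frame doesn't show them.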
- We train a control model based on SimBa1 using PPO (Proximal Policy Optimization)
- This uses a custom environment and reward function
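Setting the SimBa1 backbone and the custom environment aside, the core of the PPO update is the clipped surrogate objective, which caps how far the probability ratio between the new and old policies can move the update. A minimal sketch with made-up numbers (the function name and the batch values are illustrative, not from the actual training code):

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """PPO clipped surrogate: mean of min(r * A, clip(r, 1-eps, 1+eps) * A),
    where r = pi_new(a|s) / pi_old(a|s)."""
    ratio = np.exp(logp_new - logp_old)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.mean(np.minimum(unclipped, clipped))

# Made-up batch: log-probs of the taken actions under the new and old
# policies, plus advantage estimates from the custom reward function.
logp_old = np.log(np.array([0.2, 0.5, 0.1]))
logp_new = np.log(np.array([0.3, 0.4, 0.3]))   # policy moved a lot on sample 3
adv = np.array([1.0, -0.5, 2.0])

obj = ppo_clip_objective(logp_new, logp_old, adv)
```

Sample 3 has a ratio of 3.0, but the clip to [0.8, 1.2] caps its contribution at 1.2 * A, which is what keeps any single policy update from being too large.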
TODO:
- Make the environment (see https://github.com/Unity-Technologies/ml-agents/blob/develop/docs/Learning-Environment-Create-New.md)
- Verify that the code makes sense