This project trains a reinforcement learning agent with Proximal Policy Optimization (PPO) from the Stable-Baselines3 library. It automatically saves models during training, uploads them to a remote server, and can resume training from a previously saved model.
## Features

- Uses the PPO algorithm from Stable-Baselines3
- Supports TensorBoard logging
- Automatically saves models at regular intervals
- Uploads saved models to a remote server via pre-signed URLs
- Resumes training from the latest saved model
## Prerequisites

- Python 3.7 or later
- `pip` for package management
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/ailiveco/local-trainings.git
  cd local-trainings
  ```

- Create a virtual environment (recommended):

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install the required packages:

  ```bash
  pip install -r requirements.txt
  ```
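If you need to recreate `requirements.txt`, a plausible minimal dependency set for a project like this is shown below; the exact package list and pins are an assumption, so check the repository's own file:

```
stable-baselines3
gymnasium[mujoco]
tensorboard
requests
```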
## Configuration

Update the following constants in `main.py` as needed:

- **API key**: Replace the `AILIVE_SECRET_APIKEY` placeholder with your valid API key.

  ```python
  AILIVE_SECRET_APIKEY = "EXAMPLEKEY-zero-walking"
  ```

- **Save interval**: Adjust how often the model is saved.

  ```python
  SAVE_INTERVAL = 500_000  # Save every 500,000 steps
  ```

- **Total training timesteps**: Set the total number of timesteps for training.

  ```python
  TOTAL_TIMESTEPS = 10_000_000  # Train for 10 million steps
  ```
## Usage

- Run the main script to start training:

  ```bash
  python main.py
  ```

- TensorBoard logs are saved in the session directory. To visualize them:

  ```bash
  tensorboard --logdir=./sessions/<agent_name>/<skill_name>/tensorboard
  ```

- Models are saved in the `models/` directory inside the session folder.
## Project Structure

```
.
├── main.py              # Main script for training the agent
├── requirements.txt     # List of dependencies
├── README.md            # Project documentation
└── sessions/            # Directory for logs and saved models
    └── <agent_name>/            # Agent-specific folder
        └── <skill_name>/        # Skill-specific folder
            ├── tensorboard/     # TensorBoard logs
            └── models/          # Saved models
```
## Key Functions

`main.py` provides functions that:

- Fetch a pre-signed URL for uploading models to a remote server.
- Upload a model file to the server using the pre-signed URL.
- Save the model locally and upload it to the server.
- Load the latest saved model to resume training.
- Set up the environment, initialize the PPO model, and manage the training loop.
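The pre-signed-URL upload step can be sketched with the `requests` library; the function name and the exact request shape are assumptions, not this project's actual implementation:

```python
# Illustrative: upload a saved model file to a pre-signed URL via HTTP PUT.
import requests

def upload_model(presigned_url: str, model_path: str) -> bool:
    """PUT the model file to the pre-signed URL; return True on success."""
    with open(model_path, "rb") as f:
        resp = requests.put(presigned_url, data=f)
    return resp.ok
```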
## Notes

- Ensure your API key is valid and matches the expected format.
- TensorBoard must be installed to use logging features.
- The `Humanoid-v5` environment is used as an example; you can replace it with any supported environment.
## License

This project is licensed under the MIT License.