Add MaskablePPOPlayer #297

Open
wants to merge 1 commit into
base: master

Conversation

zarns (Contributor) commented Nov 16, 2024

Supersedes #287

I added SubprocVecEnv so that multiple games are played in parallel, which captures training data about 5x faster. I trained for 4.96 days straight (100,000,000 timesteps) with this configuration, and the resulting model.zip file is 1525 MB (too big to upload to git, unfortunately). After those 5 days of training, PPOPlayer has an 8% win rate against AB-pruning and an 11% win rate against ValueFunctionPlayer. Attached is the wandb graph output. You can see that episode_reward_mean has not plateaued, but training on my RTX 4070 is simply not fast enough to realistically surpass the AB-pruning player. Perhaps the model has too many layers, which slows training, but I've experimented quite a bit with different hyperparameters and model sizes, and this is the best I've come up with.
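For a rough sense of the throughput these numbers imply, the arithmetic can be sketched as below. The timestep count and duration come from the description above; the function name is only illustrative and is not part of this PR.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def steps_per_second(total_timesteps: int, days: float) -> float:
    """Average environment steps per wall-clock second over a training run."""
    return total_timesteps / (days * SECONDS_PER_DAY)


# Figures quoted in the PR description: 100,000,000 timesteps in 4.96 days.
rate = steps_per_second(100_000_000, 4.96)
print(f"{rate:.0f} steps/s")  # roughly 233 steps/s
```

That average covers both rollout collection and gradient updates, so it says nothing about which of the two is the bottleneck.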

The features_extractor CNN doesn't seem to help much on shorter training runs, even with much smaller model sizes. I'm starting to think Stable-Baselines3 isn't the best way to go. AlphaZero combines MCTS with this kind of actor/critic neural net, and maybe we need to pursue recreating that approach for Catan.
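As a rough illustration of how AlphaZero-style search uses the two network heads, the sketch below scores each child action by its mean backed-up value plus an exploration bonus weighted by the policy prior (the PUCT rule). This is a generic sketch, not code from this PR: the `ChildStats` layout, the function names, and the default `c_puct` value are all assumptions made for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class ChildStats:
    prior: float            # policy-head probability for this action
    visits: int = 0         # how many times search has taken this action
    value_sum: float = 0.0  # sum of value estimates backed up through it


def puct_score(child: ChildStats, parent_visits: int, c_puct: float = 1.5) -> float:
    """Mean value (exploitation) plus prior-weighted exploration bonus."""
    q = child.value_sum / child.visits if child.visits else 0.0
    u = c_puct * child.prior * math.sqrt(parent_visits) / (1 + child.visits)
    return q + u


def select_action(children: dict[int, ChildStats], c_puct: float = 1.5) -> int:
    """Pick the action whose child has the highest PUCT score."""
    parent_visits = sum(c.visits for c in children.values())
    return max(children, key=lambda a: puct_score(children[a], parent_visits, c_puct))


# Hypothetical node: action 0 is well explored, action 1 has a strong prior
# but has never been visited, so the exploration bonus favors it.
children = {
    0: ChildStats(prior=0.1, visits=10, value_sum=5.0),
    1: ChildStats(prior=0.9),
}
chosen = select_action(children)
```

In a full implementation the priors and leaf values would come from the policy and value heads of the network, and a Catan version would also have to handle dice rolls and hidden information, which this sketch ignores.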

Note that if you want to pull the branch and play around with it, you'll have to delete the model.zip before each run to reset the architecture.
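The manual reset step could be scripted along these lines. This is a minimal sketch that assumes the checkpoint is a file named model.zip in the working directory; the helper name `reset_checkpoint` is hypothetical and does not exist in the branch.

```python
from pathlib import Path


def reset_checkpoint(path: str = "model.zip") -> bool:
    """Delete a saved model file if present; return True if a file was removed."""
    checkpoint = Path(path)
    if checkpoint.is_file():
        checkpoint.unlink()
        return True
    return False
```

Calling `reset_checkpoint()` at the top of a training script would start every run from a fresh model; pass a different path if the checkpoint is stored elsewhere.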

[Image: wandb graph output]

netlify bot commented Nov 16, 2024

👷 Deploy request for catanatron-staging pending review.

Visit the deploys page to approve it

Latest commit: bc2ec10

zarns (Contributor, Author) commented Nov 16, 2024

Looks like the build fails anyway because the sb3_contrib requirements aren't met. We could just leave this as an open pull request too, I guess.
