Discussion: Config of RL Model #71

Open
Tracked by #31
fabioseel opened this issue Jan 13, 2025 · 0 comments
Labels
Feature (A new capability in the library), Major (A large issue that may require a significant commit)

Comments

Contributor

fabioseel commented Jan 13, 2025

Should the model be completely defined by the config, or should the RL model automatically add simple linear layers such as the action parameterization?

OK, so here are my thoughts on this after developing for a while:

Pros of complete definition by config

  • cleaner / easier to understand
  • simpler weights file for reuse (the state_dict will not have additional keys)
  • more consistent loss definition (target circuit)

Cons of complete definition by config

  • weights file not reusable in other frameworks out of the box
    • model parts will need to be ignored on load (how to define this nicely? in the config?)
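
One way the "ignore on load" idea could look is a list of key prefixes in the config that get dropped before the weights are loaded. This is only a minimal sketch: the `ignore_on_load` config key, the `split_state_dict` helper, and all circuit names are hypothetical and not part of the library. Plain lists stand in for tensors, since the filtering only depends on the key names.

```python
from __future__ import annotations

from typing import Any, Iterable, Mapping


def split_state_dict(
    state_dict: Mapping[str, Any], ignore_prefixes: Iterable[str]
) -> tuple[dict[str, Any], list[str]]:
    """Return (kept entries, ignored key names) for a name -> tensor mapping."""
    prefixes = tuple(ignore_prefixes)
    kept: dict[str, Any] = {}
    ignored: list[str] = []
    for key, value in state_dict.items():
        # str.startswith accepts a tuple of prefixes; an empty tuple matches nothing.
        if key.startswith(prefixes):
            ignored.append(key)
        else:
            kept[key] = value
    return kept, ignored


# Hypothetical checkpoint; the lists are placeholders for tensors.
checkpoint = {
    "retina.conv.weight": [0.1, 0.2],
    "retina.conv.bias": [0.0],
    "action_parameterization.linear.weight": [0.3],
    "critic.linear.weight": [0.4],
}

# Hypothetical config entry naming the RL-only parts to drop on load.
config = {"ignore_on_load": ["action_parameterization.", "critic."]}

kept, ignored = split_state_dict(checkpoint, config["ignore_on_load"])
print("kept:", list(kept))
print("ignored:", ignored)
```

With PyTorch, the filtered mapping could then be passed to `load_state_dict`, possibly with `strict=False` if the target model has parameters that the file does not contain. Listing the ignored keys explicitly keeps it visible which parts were dropped.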

How to deal with the head / core / tail structure enforced by samplefactory?

I see the following options:

  • implement autodiscovery of where to 'split' the circuit into these three parts.
    • would be the easiest to use, as long as it works. And that's a big IF.
  • somehow define (e.g. through the config) which part contains which circuits
    • quite a strict binding to the samplefactory model style, which we don't really want
  • modify the samplefactory code
    • with the introduction of a custom learner this could be possible. Perhaps one could even implement it in a way that is backwards compatible with samplefactory and integrate it into the library?
    • risks: we may lose some of the optimization performance, as the library does some 'magic' for the individual parts that I have not dived into yet. In particular, the core allows the use of an RNN, and the learner applies some optimizations for that. While I think this is probably the best solution considering what we get, it's also the most complicated, and we might deviate more from 'standard' samplefactory.
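
To make the first two options concrete, here is a rough sketch of what they might look like, assuming the circuits form a simple linear chain in forward order. Everything here is hypothetical: the function names, the `parts` config layout, and the circuit names are invented for illustration, and none of this is samplefactory API.

```python
from __future__ import annotations

PARTS = ("head", "core", "tail")


def assign_parts(
    circuit_order: list[str], parts_config: dict[str, list[str]]
) -> dict[str, list[str]]:
    """Option 2: validate an explicit part -> circuits assignment from the config."""
    unknown = set(parts_config) - set(PARTS)
    if unknown:
        raise ValueError(f"unknown parts in config: {sorted(unknown)}")
    # Concatenating head, core and tail must reproduce the forward order exactly,
    # so every circuit is assigned once and the parts stay contiguous.
    assigned = [name for part in PARTS for name in parts_config.get(part, [])]
    if assigned != circuit_order:
        raise ValueError(
            f"config assignment {assigned} does not match circuit order {circuit_order}"
        )
    return {part: list(parts_config.get(part, [])) for part in PARTS}


def autodiscover_parts(
    circuit_order: list[str], recurrent: set[str]
) -> dict[str, list[str]]:
    """Option 1 (naive): the single recurrent circuit is the core,
    everything before it is the head, everything after it is the tail."""
    candidates = [name for name in circuit_order if name in recurrent]
    if len(candidates) != 1:
        raise ValueError("naive autodiscovery needs exactly one recurrent circuit")
    idx = circuit_order.index(candidates[0])
    return {
        "head": circuit_order[:idx],
        "core": [circuit_order[idx]],
        "tail": circuit_order[idx + 1 :],
    }


# Hypothetical circuits in forward order.
circuits = ["retina", "thalamus", "rnn", "actor"]

explicit = assign_parts(
    circuits,
    {"head": ["retina", "thalamus"], "core": ["rnn"], "tail": ["actor"]},
)
discovered = autodiscover_parts(circuits, recurrent={"rnn"})
print(explicit)
print(discovered)
```

The naive autodiscovery fails as soon as there is no recurrent circuit, more than one, or a branching graph, which is where the "big IF" comes in. The explicit variant avoids that but ties the config to the three-part layout.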
@fabioseel fabioseel mentioned this issue Jan 13, 2025
10 tasks
@fabioseel fabioseel changed the title from "decide / discuss: should model be completely defined by config or should the RL Model automatically add simple linear layers such as the Action Parameterization?" to "Discussion: Config of RL Model" Jan 13, 2025
@fabioseel fabioseel added this to the Sample Factory + RL milestone Jan 13, 2025
@fabioseel fabioseel added the Feature (A new capability in the library) and Major (A large issue that may require a significant commit) labels Jan 13, 2025