General
Is it fair to have the grade depend on just a (small) part of the lecture?
Force the students to engage with the project sooner, not at the last minute
Have a preparatory homework on RL, schedule the RL lectures earlier in the semester, and perhaps add more of them.
Provide agents from past years as test competitors (shortly before submission?), or another form of benchmarking, so that students can decide more easily whether an approach is worth pursuing.
Allow teams to submit two agents (one safe bet using straightforward ML methods, one fancier agent using deep RL).
Environment
Automate acceptance testing via GitHub CI; a minimal smoke test is sketched below.
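Such a CI check could be as small as the following pytest sketch. It only assumes the usual agent layout (agent_code/&lt;name&gt;/callbacks.py exposing setup() and act()); the file name test_acceptance.py and the agent list are placeholders.

```python
# test_acceptance.py -- minimal CI smoke test (file name and agent list are placeholders)
import importlib

import pytest

AGENT_NAMES = ["my_agent"]  # adjust to the agents that should be checked


@pytest.mark.parametrize("name", AGENT_NAMES)
def test_agent_exposes_required_callbacks(name):
    # The framework imports agent_code/<name>/callbacks.py and calls
    # setup() once per game and act() once per step.
    callbacks = importlib.import_module(f"agent_code.{name}.callbacks")
    assert callable(getattr(callbacks, "setup", None))
    assert callable(getattr(callbacks, "act", None))
```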
Environment could provide:
a feature telling which agent won the round
generally more information about the opponents (e.g. their current score)
possibility to switch between training and evaluation more easily, e.g. using callable events without training
initialization at other than the standard starting state -> Add custom --scenario and modify build_arena
adjustable crate density, board size, and starting corner for training via command-line options (see the sketch after this list)
possibility to pause an episode for inspection and later resume (instead of restart) -> use step debugging
or: mention in the instructions that such things can be implemented during method development and should be undone for the final training/testing
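For the adjustable scenario, crate density, board size, and starting corner mentioned above, the training entry point could expose them on the command line roughly as below; the option names and the build_arena() call are illustrative suggestions, not existing flags.

```python
import argparse

# Sketch of training options the entry point could expose.
parser = argparse.ArgumentParser()
parser.add_argument("--scenario", default="classic",
                    help="named starting configuration handed to build_arena")
parser.add_argument("--crate-density", type=float, default=0.75,
                    help="fraction of free tiles filled with crates")
parser.add_argument("--board-size", type=int, default=17,
                    help="width/height of the (square) arena")
parser.add_argument("--start-corner", choices=["tl", "tr", "bl", "br"], default=None,
                    help="fix the training agent's starting corner")
args = parser.parse_args()

# build_arena would then read these values instead of hard-coded constants, e.g.
# arena = build_arena(args.scenario, args.board_size, args.crate_density, args.start_corner)
```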
Potential bugs:
The environment should call the function game_events_occured() also for the last step before the game ends.
"We observed, that the crate distribution is not completely random and that in general
fewer crates are placed in the bottom right corner. Thereby, more free tiles can be found
on the bottom right, giving the agent, that starts in this corner, an advantage because the
probability to kill itself is considerably smaller. Our agent, for example, can play the game
very well, when starting on the bottom right, but is rather bad when starting elsewhere."
Modify environment for training
shorter thinking intervals or other speed-ups
switch off multi-threading for easier debugging and profiling
provide a passive environment that the agent can call (instead of the other way around)
allows easy parallel execution of several environments for faster training
GUI and logging are not needed during training
makes the training procedure compatible with TFPyEnvironment in https://github.com/tensorflow/agents, or use gym for compatibility with keras_rl (see items_fast.py, environment_fast.py, agents_fast.py, bomberman_adapter.py in https://gitlab.com/koetherminator/fml-project; a minimal wrapper sketch follows below)
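A passive, GUI-free wrapper could look roughly like this sketch. GameWorld, its methods, and the reward mapping are hypothetical stand-ins for a stripped-down re-implementation of the game logic; only the agent-driven reset()/step() interface is the point.

```python
import gym
import numpy as np
from gym import spaces

ACTIONS = ['UP', 'RIGHT', 'DOWN', 'LEFT', 'WAIT', 'BOMB']


class BombermanEnv(gym.Env):
    """Passive environment: the agent drives the loop by calling step()."""

    def __init__(self, world, reward_map):
        self.world = world            # hypothetical GUI-free game logic
        self.reward_map = reward_map  # e.g. {'COIN_COLLECTED': 1, 'KILLED_SELF': -5}
        self.action_space = spaces.Discrete(len(ACTIONS))
        # Example observation: the 17x17 arena as one integer channel.
        self.observation_space = spaces.Box(low=-1, high=5, shape=(17, 17),
                                            dtype=np.int8)

    def reset(self):
        self.world.new_round()
        return self.world.observation()

    def step(self, action):
        events = self.world.perform_action(ACTIONS[action])   # hypothetical API
        reward = sum(self.reward_map.get(e, 0) for e in events)
        done = self.world.round_finished()
        return self.world.observation(), reward, done, {"events": events}
```

Both keras-rl and tf-agents provide wrappers that accept a gym.Env, so one such class would serve either suggestion, and several instances can run in parallel worker processes for faster training.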
Use IntEnums instead of strings to speed up comparisons (?); a minimal example follows below
Be closer to the original version of the game (e.g. allow dropping several bombs simultaneously)
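For the IntEnum suggestion, a minimal example (names are illustrative):

```python
from enum import IntEnum


class Action(IntEnum):
    UP = 0
    RIGHT = 1
    DOWN = 2
    LEFT = 3
    WAIT = 4
    BOMB = 5


# IntEnum members behave like plain ints, so comparisons and array indexing
# avoid repeated string comparisons in the inner game loop.
assert Action.BOMB == 5
q_values = [0.1, 0.0, 0.3, 0.2, 0.0, 0.4]
best = Action(max(range(len(Action)), key=q_values.__getitem__))
print(best.name)  # -> BOMB
```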
Project instructions
Remind students that the University logo must not be used in the report.
Specify in more detail the grading criteria and requirements.
Most articles in RL are about neural networks => point out the pre-NN literature (this would also put the project more in line with the rest of the lecture) and other recommended reading (e.g. about reward shaping).
Split assignment into more fine-grained subtasks, e.g. task 1a: free coins under fixed crates
Add more documentation about the game environment
Describe bomb behavior accurately (bombs are only dangerous for one step!).
Collect crucial information (e.g. adjustable parameters, required Python version, number of coins created) in a table
Explain that self-play is best realized by multiple copies of the same agent (possibly plus some randomization to make behavior more diverse); a sketch follows below.
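One way to diversify the copies used for self-play is a small amount of per-copy exploration noise in act(). The sketch below assumes the standard act(self, game_state) callback; self.model and its predict() are hypothetical.

```python
import numpy as np

ACTIONS = ['UP', 'RIGHT', 'DOWN', 'LEFT', 'WAIT', 'BOMB']
EPSILON = 0.1  # per-copy randomization; tune per agent copy


def act(self, game_state):
    # All copies share the same trained model (hypothetical self.model), but
    # occasional random moves keep their behaviour from being identical.
    if np.random.rand() < EPSILON:
        return np.random.choice(ACTIONS)
    return ACTIONS[int(np.argmax(self.model.predict(game_state)))]
```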
Clarify that the environment can (and should!) be changed for debugging, profiling, and training -- just don't forget to undo the changes later on
timeout may be set to "infinity" to avoid interference with the debugger
board size, crate density etc. can be changed to create additional intermediate tasks
implement stop-and-resume for inspection
Explain the symmetries of the game
reduce the search space by exploiting symmetries (see the sketch below)
implement reward asymmetries to avoid undecided agents in symmetric situations
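As an illustration of exploiting symmetry, observed transitions can be mirrored (and, analogously, rotated) to augment training data. The coordinate convention and action names below are assumptions that need to be checked against the environment.

```python
import numpy as np

MIRROR_LR = {'LEFT': 'RIGHT', 'RIGHT': 'LEFT'}  # UP/DOWN/WAIT/BOMB unchanged


def mirror_transition(field, action, reward):
    """Mirror a (field, action, reward) transition along the vertical axis.

    Assumes field[x, y] with x growing to the right, so mirroring reverses
    the first axis and swaps LEFT/RIGHT; verify against the environment.
    """
    return np.flip(field, axis=0), MIRROR_LR.get(action, action), reward


def augment(transitions):
    # Every observed transition also yields its mirror image, roughly
    # halving the state space the agent needs to see during training.
    return [t for original in transitions
            for t in (original, mirror_transition(*original))]
```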
Generally, give a few more tips on promising approaches.
Provide a LaTeX template for the report
Make it clearer that a mentoring tutor is available for questions
Hardware
Google Colab difficulties:
default Python version is only 3.7 => lots of extra work to install everything from scratch every time
only 2 hours of consecutive computing time
I updated the main routines of the program. Step debugging and error raising are now transparent (no more threading; errors are simply raised unless suppressed).