General
Is it fair to have the grade depend on just a (small) part of the lecture?
Force the students to engage with the project sooner, not at the last minute
Have a preparatory homework on RL, schedule the RL lectures earlier in the semester, and perhaps add more of them.
Provide agents from past years as test competitors (shortly before submission?), or another form of benchmarking, so that students can decide more easily whether an approach is worth pursuing.
Allow teams to submit two agents (one safe bet using straightforward ML methods, one fancier agent using deep RL).
Environment
Automate acceptance testing via GitHub CI; a minimal smoke test is sketched below.
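Such a CI check could be as small as the following pytest sketch. It only assumes the usual agent layout (agent_code/&lt;name&gt;/callbacks.py exposing setup() and act()); the file name test_acceptance.py and the agent list are placeholders.

```python
# test_acceptance.py -- minimal CI smoke test (file name and agent list are placeholders)
import importlib

import pytest

AGENT_NAMES = ["my_agent"]  # adjust to the agents that should be checked


@pytest.mark.parametrize("name", AGENT_NAMES)
def test_agent_exposes_required_callbacks(name):
    # The framework imports agent_code/<name>/callbacks.py and calls
    # setup() once per game and act() once per step.
    callbacks = importlib.import_module(f"agent_code.{name}.callbacks")
    assert callable(getattr(callbacks, "setup", None))
    assert callable(getattr(callbacks, "act", None))
```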
Environment could provide:
a feature telling which agent won the round
generally more information about the opponents (e.g. their current score)
possibility to switch between training and evaluation more easily, e.g. using callable events without training
initialization at other than the standard starting state -> Add custom --scenario and modify build_arena
adjustable crate density, board size, and starting corner for training via command-line options (see the sketch after this list)
possibility to pause an episode for inspection and later resume (instead of restart) -> use step debugging
or: mention in the instructions that such things can be implemented during method development and should be undone for the final training/testing
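For the adjustable scenario, crate density, board size, and starting corner mentioned above, the training entry point could expose them on the command line roughly as below; the option names and the build_arena() call are illustrative suggestions, not existing flags.

```python
import argparse

# Sketch of training options the entry point could expose.
parser = argparse.ArgumentParser()
parser.add_argument("--scenario", default="classic",
                    help="named starting configuration handed to build_arena")
parser.add_argument("--crate-density", type=float, default=0.75,
                    help="fraction of free tiles filled with crates")
parser.add_argument("--board-size", type=int, default=17,
                    help="width/height of the (square) arena")
parser.add_argument("--start-corner", choices=["tl", "tr", "bl", "br"], default=None,
                    help="fix the training agent's starting corner")
args = parser.parse_args()

# build_arena would then read these values instead of hard-coded constants, e.g.
# arena = build_arena(args.scenario, args.board_size, args.crate_density, args.start_corner)
```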
Potential bugs:
The environment should call the function game_events_occured() also for the last step before the game ends.
"We observed, that the crate distribution is not completely random and that in general
fewer crates are placed in the bottom right corner. Thereby, more free tiles can be found
on the bottom right, giving the agent, that starts in this corner, an advantage because the
probability to kill itself is considerably smaller. Our agent, for example, can play the game
very well, when starting on the bottom right, but is rather bad when starting elsewhere."
Modify environment for training
shorter thinking intervals or other speed-ups
switch off multi-threading for easier debugging and profiling
provide a passive environment that the agent can call (instead of the other way around)
allows easy parallel execution of several environments for faster training
GUI and logging are not needed during training
makes the training procedure compatible with TFPyEnvironment in https://github.com/tensorflow/agents, or use gym for compatibility with keras_rl (see items_fast.py, environment_fast.py, agents_fast.py, bomberman_adapter.py in https://gitlab.com/koetherminator/fml-project; a minimal wrapper sketch follows below)
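A passive, GUI-free wrapper could look roughly like this sketch. GameWorld, its methods, and the reward mapping are hypothetical stand-ins for a stripped-down re-implementation of the game logic; only the agent-driven reset()/step() interface is the point.

```python
import gym
import numpy as np
from gym import spaces

ACTIONS = ['UP', 'RIGHT', 'DOWN', 'LEFT', 'WAIT', 'BOMB']


class BombermanEnv(gym.Env):
    """Passive environment: the agent drives the loop by calling step()."""

    def __init__(self, world, reward_map):
        self.world = world            # hypothetical GUI-free game logic
        self.reward_map = reward_map  # e.g. {'COIN_COLLECTED': 1, 'KILLED_SELF': -5}
        self.action_space = spaces.Discrete(len(ACTIONS))
        # Example observation: the 17x17 arena as one integer channel.
        self.observation_space = spaces.Box(low=-1, high=5, shape=(17, 17),
                                            dtype=np.int8)

    def reset(self):
        self.world.new_round()
        return self.world.observation()

    def step(self, action):
        events = self.world.perform_action(ACTIONS[action])   # hypothetical API
        reward = sum(self.reward_map.get(e, 0) for e in events)
        done = self.world.round_finished()
        return self.world.observation(), reward, done, {"events": events}
```

Both keras-rl and tf-agents provide wrappers that accept a gym.Env, so one such class would serve either suggestion, and several instances can run in parallel worker processes for faster training.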
Use IntEnums instead of strings to speed up comparisons (?); a minimal example follows below
Be closer to the original version of the game (e.g. allow dropping several bombs simultaneously)
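For the IntEnum suggestion, a minimal example (names are illustrative):

```python
from enum import IntEnum


class Action(IntEnum):
    UP = 0
    RIGHT = 1
    DOWN = 2
    LEFT = 3
    WAIT = 4
    BOMB = 5


# IntEnum members behave like plain ints, so comparisons and array indexing
# avoid repeated string comparisons in the inner game loop.
assert Action.BOMB == 5
q_values = [0.1, 0.0, 0.3, 0.2, 0.0, 0.4]
best = Action(max(range(len(Action)), key=q_values.__getitem__))
print(best.name)  # -> BOMB
```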
Project instructions
Remind students that the University logo must not be used in the report.
Specify in more detail the grading criteria and requirements.
Most articles in RL are about neural networks => point out the pre-NN literature (this would also put the project more in line with the rest of the lecture) and other recommended reading (e.g. about reward shaping).
Split assignment into more fine-grained subtasks, e.g. task 1a: free coins under fixed crates
Add more documentation about the game environment
Describe bomb behavior accurately (bombs are only dangerous for one step!).
Collect crucial information (e.g. adjustable parameters, required Python version, number of coins created) in a table
Explain that self-play is best realized by multiple copies of the same agent (possibly plus some randomization to make behavior more diverse); a sketch follows below.
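One way to diversify the copies used for self-play is a small amount of per-copy exploration noise in act(). The sketch below assumes the standard act(self, game_state) callback; self.model and its predict() are hypothetical.

```python
import numpy as np

ACTIONS = ['UP', 'RIGHT', 'DOWN', 'LEFT', 'WAIT', 'BOMB']
EPSILON = 0.1  # per-copy randomization; tune per agent copy


def act(self, game_state):
    # All copies share the same trained model (hypothetical self.model), but
    # occasional random moves keep their behaviour from being identical.
    if np.random.rand() < EPSILON:
        return np.random.choice(ACTIONS)
    return ACTIONS[int(np.argmax(self.model.predict(game_state)))]
```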
Clarify that the environment can (and should!) be changed for debugging, profiling, and training -- just don't forget to undo the changes later on
timeout may be set to "infinity" to avoid interference with the debugger
board size, crate density etc. can be changed to create additional intermediate tasks
implement stop-and-resume for inspection
Explain the symmetries of the game
reduce the search space by exploiting symmetries (see the sketch below)
implement reward asymmetries to avoid undecided agents in symmetric situations
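As an illustration of exploiting symmetry, observed transitions can be mirrored (and, analogously, rotated) to augment training data. The coordinate convention and action names below are assumptions that need to be checked against the environment.

```python
import numpy as np

MIRROR_LR = {'LEFT': 'RIGHT', 'RIGHT': 'LEFT'}  # UP/DOWN/WAIT/BOMB unchanged


def mirror_transition(field, action, reward):
    """Mirror a (field, action, reward) transition along the vertical axis.

    Assumes field[x, y] with x growing to the right, so mirroring reverses
    the first axis and swaps LEFT/RIGHT; verify against the environment.
    """
    return np.flip(field, axis=0), MIRROR_LR.get(action, action), reward


def augment(transitions):
    # Every observed transition also yields its mirror image, roughly
    # halving the state space the agent needs to see during training.
    return [t for original in transitions
            for t in (original, mirror_transition(*original))]
```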
Generally, give a few more tips on promising approaches.
Provide a LaTeX template for the report
Make it clearer that a mentoring tutor is available for questions
Hardware
Google Colab difficulties:
default Python version is only 3.7 => lots of extra work to install everything from scratch every time
only 2 hours of consecutive computing time
I updated the main routines of the program. Step debugging and error raising are now transparent (no more threading; errors are simply raised unless suppressed).