Merge pull request #286 from danipeix13/master
Final changes
pilarbachiller authored Sep 11, 2022
2 parents a3f01c2 + 3985c6b commit ef958a1
Showing 6 changed files with 24 additions and 22 deletions.
3 changes: 2 additions & 1 deletion gsoc/2022/posts/daniel_peix/4-2d_DQN.md
@@ -27,6 +27,7 @@ The reward function is one of the most important aspects of a Reinforcement Lear
In order to create a decent reward function, the 2D distance is used. With two thresholds, the environment can check whether the gripper is above the cube, far away from it, or in any other possible situation. Depending on which situation is taking place, the reward will be different.
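
A minimal sketch of what such a threshold-based reward could look like (the function name, threshold values and coordinate arguments are assumptions for illustration, not the actual project code):

```python
import math

# Illustrative thresholds (assumed values, not the ones used in the project)
CLOSE_THRESHOLD = 0.05   # below this 2D distance, the gripper is "above the cube"
FAR_THRESHOLD = 0.30     # beyond this 2D distance, the gripper is "far away"

def reward_2d(gripper_xy, cube_xy):
    """Reward based only on the 2D (x, y) distance between gripper and cube."""
    dist = math.dist(gripper_xy, cube_xy)
    if dist < CLOSE_THRESHOLD:
        return 1.0     # gripper is above the cube
    if dist > FAR_THRESHOLD:
        return -1.0    # gripper is far away from the cube
    return -dist       # in between: the closer, the better
```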

## Results
TODO: YouTube links
Start of the training: https://youtu.be/fx7trTHZsjk
End of the training: https://youtu.be/bgIHhoa8vrQ

__Daniel Peix del Río__
3 changes: 2 additions & 1 deletion gsoc/2022/posts/daniel_peix/5-3d_DQN.md
@@ -22,6 +22,7 @@ In this case, the size of the observation needs to change. As we are using 3 dim
This reward function is not as trivial as the one designed for two dimensions. In this case, there is more information available from the environment, so the reward function can be a little more elaborate. The data used in this new reward function is: the 2D distance, the 3D distance and the gripper's 'fingers' data. Using two distance values (2D and 3D) allows us to give more weight to the 2D distance over the 3D one, because it is crucial to first get above the cube.
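
As a rough sketch of how the 2D distance could be weighted over the 3D one (the weights, names and finger-contact flag are assumptions, not the project's actual implementation):

```python
import math

# Assumed weights: the 2D distance dominates, since the gripper should
# first be above the cube before closing the remaining 3D distance.
W_2D, W_3D = 2.0, 1.0

def reward_3d(gripper_xyz, cube_xyz, fingers_touching):
    """Combine the 2D distance, the 3D distance and the 'fingers' contact data."""
    d2 = math.dist(gripper_xyz[:2], cube_xyz[:2])   # distance in the x-y plane
    d3 = math.dist(gripper_xyz, cube_xyz)           # full 3D distance
    reward = -(W_2D * d2 + W_3D * d3)
    if fingers_touching:                            # bonus when the fingers report contact
        reward += 1.0
    return reward
```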

## Results
TODO: YouTube links
Start of the training: https://youtu.be/zBbi9Xjelkg
End of the training: https://youtu.be/T5mk46UGFe8

__Daniel Peix del Río__
4 changes: 3 additions & 1 deletion gsoc/2022/posts/daniel_peix/6-4d_DQN.md
@@ -22,6 +22,8 @@ In this case, the size of the observation needs to change. As we are now using t
This reward function is quite similar to the 3-dimensional one, just adding the gripper's 'hand' info. The data used in this new reward function is: the 2D distance, the 3D distance, the gripper's 'fingers' data and the gripper's 'hand' data.
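
Following the same pattern, a hedged sketch of how the 'hand' data could be folded in as one more term (all names, weights and the meaning of `hand_closed` are assumptions):

```python
import math

def reward_4d(gripper_xyz, cube_xyz, fingers_touching, hand_closed):
    """Illustrative sketch: weighted 2D/3D distances plus 'fingers' and 'hand' terms."""
    d2 = math.dist(gripper_xyz[:2], cube_xyz[:2])   # distance in the x-y plane
    d3 = math.dist(gripper_xyz, cube_xyz)           # full 3D distance
    reward = -(2.0 * d2 + 1.0 * d3)                 # assumed weights
    if fingers_touching:
        reward += 1.0
    if hand_closed and fingers_touching:            # reward closing the hand only on contact
        reward += 1.0
    return reward
```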

## Results
TODO: YouTube links
Start of the training: https://youtu.be/TjRCTKmOpRg
Middle of the training: https://youtu.be/VOYvWodl6Ik
End of the training: the computer locked up because CoppeliaSim used all the RAM

__Daniel Peix del Río__
19 changes: 0 additions & 19 deletions gsoc/2022/posts/daniel_peix/7-Conclusions_and_ideas.md

This file was deleted.

15 changes: 15 additions & 0 deletions gsoc/2022/posts/daniel_peix/7-Conclusions_and_improvements.md
@@ -0,0 +1,15 @@
# Post 7: Conclusions and improvements

## Conclusions
Throughout this project, I have expanded my knowledge of reinforcement learning. It was a field of which I only knew some very basic concepts, so I have learned quite a few things that I did not know before.

I have also come to understand how the two algorithms used work: the Q-learning algorithm and DQN.

The most important conclusion I draw from this whole project is that the reward function is the most important part of the training process. With a reward function that properly describes the policy or behavior the robot has to learn, you will almost always get the desired results or, at least, results of fairly high quality. Conversely, a poorly designed reward function will prevent good results from being obtained during training.

## Improvements
During training, as detailed in the post "4D DQN", the computer suffers a lot because CoppeliaSim uses a large part of the RAM and can even lock up the machine.

In order to solve this problem, one idea would be to save the target neural network to a file every thousand episodes and load it at the start of the next episode. Even if training has to be paused whenever CoppeliaSim needs to be restarted, the neural network will not start from scratch; it will keep the weights it had before the simulator was restarted.
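
A minimal sketch of that checkpointing idea, assuming a PyTorch DQN (the file name, interval and function names are illustrative, not part of the project):

```python
import os
import torch

CHECKPOINT_PATH = "target_net.pt"   # assumed file name
SAVE_EVERY = 1000                   # episodes between checkpoints

def maybe_save(target_net, episode):
    """Every SAVE_EVERY episodes, persist the target network's weights."""
    if episode > 0 and episode % SAVE_EVERY == 0:
        torch.save(target_net.state_dict(), CHECKPOINT_PATH)

def maybe_load(target_net):
    """After restarting CoppeliaSim, resume from the last checkpoint if it exists."""
    if os.path.exists(CHECKPOINT_PATH):
        target_net.load_state_dict(torch.load(CHECKPOINT_PATH))
    return target_net
```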

__Daniel Peix del Río__
2 changes: 2 additions & 0 deletions gsoc/2022/posts/index.md
@@ -21,6 +21,8 @@ Mentors: Mario Haut, Pilar Bachiller
3. [DQN algorithm](/web/gsoc/2022/posts/daniel_peix/3-DQN)
4. [2D DQN: First approach](/web/gsoc/2022/posts/daniel_peix/4-2d_DQN)
5. [3D DQN: Collisions](/web/gsoc/2022/posts/daniel_peix/5-3d_DQN)
6. [4D DQN](/web/gsoc/2022/posts/daniel_peix/6-4d_DQN)
7. [Conclusions and improvements](/web/gsoc/2022/posts/daniel_peix/7-Conclusions_and_improvements)

## Sushant Sreeram Swamy

