Note: All code and demonstrations accompany the submitted paper:
Ruyu Luo, Wanli Ni, Hui Tian, Julian Cheng, and Kwang-Cheng Chen, "Joint Trajectory and Radio Resource Optimization by Multi-Agent Reinforcement Learning for Autonomous Mobile Robots in Industrial Internet of Things", submitted to IEEE TCom, Dec. 2022.
In this repository, we present the simulation code of multi-agent reinforcement learning (MARL) with upper-confidence bound (UCB) exploration used in the paper.
Here are the settings of our simulations.
| Notation | Simulation Value | Physical Meaning |
|---|---|---|
| | | the number of SNs in each group |
| | | the maximal x-axis size of the moving area |
| | | the maximal y-axis size of the moving area |
| | | the antenna height of robots |
| | | the antenna height of SNs |
| | | the power of the AWGN |
| | | the maximum transmit power |
| | | the large-scale channel power gain at the reference distance |
| | | the path loss exponent |
| | | the Rician factor |
| | | the grid size |
The visualization of the proposed MARL can be seen in Visual_MARL.
- Here are four demonstrations of different stages in the MARL training process.
Here is a brief introduction to the code used in our paper.
- Figure_1_reward_comparison (Reward comparison between different algorithms)
  - Centralized_QL
    - RL_brain.py: Centralized tabular Q-learning agent with ε-greedy exploration
    - main.py: Main code of two robots trained with global information; handles the connection between the environment and the learning agents, without experience exchange
  - UCB_MARL_environment1
    - RL_brain.py: One learning agent with upper-confidence bound (UCB) exploration (a minimal sketch of such an agent is given after this list)
    - main.py: Main code of four robots; handles the connection between the environment and the learning agents, where two robots without mutual interference in the exact same environment exchange experience
  - UCB_MARL_environment2
    - RL_brain.py: One learning agent with upper-confidence bound (UCB) exploration
    - main.py: Main code of four robots; handles the connection between the environment and the learning agents, where two nearby robots with mutual interference under the same SN deployment exchange experience
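The UCB-based agent referenced above can be pictured as a tabular Q-learning agent whose action choice adds a visit-count-dependent exploration bonus, in the spirit of the UCB-Hoeffding analysis in [3]. The sketch below is only illustrative: the class name `UCBQAgent` and the hyper-parameters `alpha`, `gamma`, and `c` are assumptions and do not necessarily match `RL_brain.py`.

```python
import numpy as np
from collections import defaultdict

class UCBQAgent:
    """Tabular Q-learning with an upper-confidence-bound exploration bonus.

    Illustrative sketch only: names and hyper-parameters are assumptions,
    not the exact implementation in RL_brain.py.
    """

    def __init__(self, n_actions, alpha=0.1, gamma=0.9, c=1.0):
        self.n_actions = n_actions
        self.alpha = alpha   # learning rate
        self.gamma = gamma   # discount factor
        self.c = c           # exploration strength
        self.Q = defaultdict(lambda: np.zeros(n_actions))  # action values
        self.N = defaultdict(lambda: np.zeros(n_actions))  # visit counts
        self.t = 0           # global time step

    def choose_action(self, state):
        """Pick the action maximizing Q(s, a) plus a UCB exploration bonus."""
        self.t += 1
        counts = self.N[state]
        bonus = self.c * np.sqrt(np.log(self.t + 1) / np.maximum(counts, 1))
        bonus[counts == 0] = np.inf  # try every action at least once
        return int(np.argmax(self.Q[state] + bonus))

    def learn(self, state, action, reward, next_state):
        """One-step Q-learning update, with the visit counter incremented."""
        self.N[state][action] += 1
        td_target = reward + self.gamma * np.max(self.Q[next_state])
        self.Q[state][action] += self.alpha * (td_target - self.Q[state][action])
```

In such a sketch, the robot's discretized grid position would serve as the hashable `state`, and the count-based bonus replaces the ε parameter that would otherwise have to be tuned by hand.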
- Figure_2_convergence_comparison (Convergence comparison between different $H$)
  - UCB_MARL
    - RL_brain.py: One learning agent with upper-confidence bound (UCB) exploration
    - main.py: Main code of six robots; handles the connection between the environment and the learning agents
  - e-greedy_MARL
    - RL_brain.py: Tabular Q-learning agent with ε-greedy exploration (a minimal sketch of ε-greedy action selection follows this list)
    - main.py: Main code of two robots that train locally without experience exchange; handles the connection between the environment and the learning agents
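For contrast with the UCB agent sketched above, ε-greedy exploration as used in `e-greedy_MARL` picks a uniformly random action with a small probability ε and the greedy action otherwise. The snippet below is a minimal, hypothetical illustration; the function name and the default ε are assumptions, not the values used in the paper.

```python
import numpy as np

def choose_action_eps_greedy(q_row, epsilon=0.1, rng=None):
    """ε-greedy selection over one row of a tabular Q-table.

    With probability `epsilon` a random action is explored; otherwise the
    greedy (highest-Q) action is exploited. Illustrative sketch only.
    """
    rng = rng or np.random.default_rng()
    if rng.random() < epsilon:
        return int(rng.integers(len(q_row)))  # explore
    return int(np.argmax(q_row))              # exploit
```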
- Figure_3_robot_trajectory (Robot trajectory under different $\kappa$)
  - RL_brain.py: One learning agent with upper-confidence bound (UCB) exploration
  - plot_figure.py: Plots the trajectories under different reward policies
  - [robot trajectory.py](https://github.com/lry-bupt/UCB_MARL/blob/main/Figure_3_robot_trajectory/robot trajectory.py): Main code of four robots with experience exchange; handles the connection between the environment and the learning agents
- Figure_4_relation_between_R_T (Relation between average sum rate and arrival time)
  - RL_brain.py: One learning agent with upper-confidence bound (UCB) exploration
  - main.py: Main code of two robots with experience exchange (a simplified sketch of such an exchange follows this list); handles the connection between the environment and the learning agents
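The experience exchange mentioned above can be thought of, in simplified form, as two robots periodically merging their Q-tables. The helper below is hypothetical (its name, weighting rule, and data layout are assumptions) and only illustrates the idea of sharing learned estimates between paired robots.

```python
import numpy as np

def exchange_experience(q_table_a, q_table_b, weight=0.5):
    """Merge two tabular Q-tables by weighted averaging of shared states.

    q_table_a, q_table_b: dicts mapping a (hashable) state to an array of
    action values. `weight` controls how much each robot trusts its peer.
    Hypothetical illustration only, not the paper's exact exchange rule.
    """
    for state in set(q_table_a) | set(q_table_b):
        qa, qb = q_table_a.get(state), q_table_b.get(state)
        if qa is None:            # only robot B has visited this state
            q_table_a[state] = qb.copy()
        elif qb is None:          # only robot A has visited this state
            q_table_b[state] = qa.copy()
        else:                     # both visited: average the estimates
            merged = (1 - weight) * qa + weight * qb
            q_table_a[state] = merged
            q_table_b[state] = merged.copy()
    return q_table_a, q_table_b
```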
- Figure_5_R_versus_P (Average sum rate versus maximum transmit power)
  - NOMA
    - RL_brain.py: One learning agent with upper-confidence bound (UCB) exploration
    - main.py: Main code of robots with experience exchange; handles the connection between the environment and the learning agents, which communicate via non-orthogonal multiple access (NOMA)
  - OMA
    - RL_brain.py: One learning agent with upper-confidence bound (UCB) exploration
    - main.py: Main code of robots with experience exchange; handles the connection between the environment and the learning agents, which communicate via orthogonal multiple access (OMA) (a simplified NOMA-versus-OMA rate sketch follows this list)
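To make the NOMA/OMA distinction concrete, the snippet below contrasts a simplified two-link sum-rate computation: under NOMA both links share the whole channel and the receiver applies successive interference cancellation (SIC), while under OMA each link gets an orthogonal half of the resource. The channel gains, transmit powers, and noise value are placeholders, not the simulation settings of the paper.

```python
import numpy as np

def sum_rate_noma(p, g, noise):
    """Two-link NOMA sum rate (bits/s/Hz) with SIC at the receiver.

    Simplified textbook model, assuming p[0]*g[0] >= p[1]*g[1]:
    link 1 is decoded first and sees link 2 as interference; link 2 is
    decoded after cancellation, interference-free.
    """
    r1 = np.log2(1 + p[0] * g[0] / (p[1] * g[1] + noise))
    r2 = np.log2(1 + p[1] * g[1] / noise)
    return r1 + r2

def sum_rate_oma(p, g, noise):
    """Two-link OMA sum rate with an equal orthogonal resource split."""
    r1 = 0.5 * np.log2(1 + p[0] * g[0] / noise)
    r2 = 0.5 * np.log2(1 + p[1] * g[1] / noise)
    return r1 + r2

# Placeholder numbers only (not the paper's parameters)
p, g, noise = [0.1, 0.1], [1e-6, 1e-7], 1e-10
print(sum_rate_noma(p, g, noise), sum_rate_oma(p, g, noise))
```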
[1] R. Luo, W. Ni and H. Tian, "Visualizing Multi-Agent Reinforcement Learning for Robotic Communication in Industrial IoT Networks," accepted by IEEE INFOCOM Demo, Mar. 2022.
[2] R. Luo, H. Tian and W. Ni, "Communication-Aware Path Design for Indoor Robots Exploiting Federated Deep Reinforcement Learning," in Proc. IEEE PIMRC, Helsinki, Finland, Sept. 2021, pp. 1197-1202.
[3] C. Jin et al., "Is Q-learning Provably Efficient?" in Proc. NeurIPS, Montréal, Canada, Dec. 2018, pp. 4868-4878.