🚁 Policy Adaptation Survey

🤖 Environment

• Quadrotor transportation
• Cartpole
• Hover
• Brax (HTML visualization)

🐣 Train

Torch-based environment

cd adaptive_control_gym/controller/rl
# Run the train function with the parsed arguments
python train.py \
    --use_wandb $USE_WANDB \
    --program $PROGRAM \
    --seed $SEED \
    --gpu_id $GPU_ID \
    --act_expert_mode $ACT_EXPERT_MODE \
    --cri_expert_mode $CRI_EXPERT_MODE \
    --exp_name $EXP_NAME \
    --compressor_dim $COMPRESSOR_DIM \
    --search_dim $SEARCH_DIM \
    --res_dyn_param_dim $RES_DYN_PARAM_DIM \
    --task $TASK \
    --resume_path $RESUME_PATH \
    --drone_num $DRONE_NUM \
    --env_num $ENV_NUM \
    --total_steps $TOTAL_STEPS \
    --adapt_steps $ADAPT_STEPS \
    --curri_thereshold $CURRI_THERESHOLD
  • use_wandb: A boolean flag indicating whether to use the Weights & Biases service for logging and visualization. Default is False.
  • program: A string specifying the name of the program. Default is 'tmp'.
  • seed: An integer specifying the random seed to use. Default is 1.
  • gpu_id: An integer specifying the ID of the GPU to use. Default is 0.
  • act_expert_mode: An integer specifying the expert mode for the actor network. Default is 0.
  • cri_expert_mode: An integer specifying the expert mode for the critic network. Default is 0.
  • exp_name: A string specifying the name of the experiment. Default is an empty string.
  • compressor_dim: An integer specifying the dimension of the compressor network. Default is 4.
  • search_dim: An integer specifying the dimension of the search network. Default is 0.
  • res_dyn_param_dim: An integer specifying the dimension of the residual dynamic parameter network. Default is 0.
  • task: A string specifying the task to perform. Can be 'track', 'hover', or 'avoid'. Default is 'track'.
  • resume_path: A string specifying the path to a saved checkpoint to resume training from. Default is None.
  • drone_num: An integer specifying the number of drones to use. Default is 1.
  • env_num: An integer specifying the number of environments to use. Default is 16384.
  • total_steps: An integer specifying the total number of training steps to perform. Default is 8e7.
  • adapt_steps: An integer specifying the number of adaptation steps to perform. Default is 5e6.
  • curri_thereshold: A float specifying the curriculum threshold. Default is 0.2.
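For reference, the flags above map onto a standard argparse setup. The sketch below is reconstructed only from the documented defaults and is not the repository's actual parser; note also that the Examples section spells the flags with dashes rather than underscores, so the real definitions may differ.

# argparse sketch reconstructed from the documented defaults above;
# illustrative only, not copied from the repository
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--use_wandb', action='store_true')  # default False
parser.add_argument('--program', type=str, default='tmp')
parser.add_argument('--seed', type=int, default=1)
parser.add_argument('--gpu_id', type=int, default=0)
parser.add_argument('--act_expert_mode', type=int, default=0)
parser.add_argument('--cri_expert_mode', type=int, default=0)
parser.add_argument('--exp_name', type=str, default='')
parser.add_argument('--compressor_dim', type=int, default=4)
parser.add_argument('--search_dim', type=int, default=0)
parser.add_argument('--res_dyn_param_dim', type=int, default=0)
parser.add_argument('--task', type=str, default='track', choices=['track', 'hover', 'avoid'])
parser.add_argument('--resume_path', type=str, default=None)
parser.add_argument('--drone_num', type=int, default=1)
parser.add_argument('--env_num', type=int, default=16384)
parser.add_argument('--total_steps', type=int, default=int(8e7))
parser.add_argument('--adapt_steps', type=int, default=int(5e6))
parser.add_argument('--curri_thereshold', type=float, default=0.2)
args = parser.parse_args()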

Brax-based environment

cd adaptive_control_gym/envs/brax
python train_brax.py
python play_brax.py --policy-type ppo --policy-path '../results/params' # visualize
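For a sense of what play_brax.py does under the hood, here is a rough sketch of loading saved parameters and rendering a rollout with Brax's io helpers. The environment name, the Brax v2 API details, and the zero-action placeholder policy are assumptions; wiring the loaded params into a PPO inference function is omitted.

# rough sketch: load saved params and render a rollout to HTML with Brax
# (env name and v2 API details are assumptions; policy wiring is omitted)
import jax
from jax import numpy as jnp
from brax import envs
from brax.io import model, html

env = envs.create('inverted_pendulum')            # stand-in env name
params = model.load_params('../results/params')   # checkpoint path from above

jit_reset = jax.jit(env.reset)
jit_step = jax.jit(env.step)
state = jit_reset(jax.random.PRNGKey(0))
states = [state.pipeline_state]
for _ in range(100):
    action = jnp.zeros(env.action_size)           # replace with policy(params, obs)
    state = jit_step(state, action)
    states.append(state.pipeline_state)

html.save('rollout.html', env.sys, states)        # writes an interactive viewer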

Examples

# train with RMA
python train.py --exp-name "TrackRMA" --task track --act-expert-mode 1 --cri-expert-mode 1 --use-wandb --gpu-id 0
# train a robust policy
python train.py --exp-name "TrackRobust" --task track --use-wandb --gpu-id 0

🕹 Play with the environment

# go to environment folder
cd adaptive_control_gym/envs

# run environment; flags shown with example values:
#   policy_type: 'random' or 'pid'
#   task: 'track', 'hover', or 'avoid'
#   policy_path: checkpoint to load, used for PPO only
#   gpu_id: -1 runs on the CPU
#   enable_log: log parameters to CSV and plot them
#   enable_vis: visualize with MeshCat
#   curri_param: 0.0 for the simple case
python quadtrans.py \
    --policy_type pid \
    --task "avoid" \
    --policy_path 'ppo.pt' \
    --seed 0 \
    --env_num 1 \
    --drone_num 1 \
    --gpu_id -1 \
    --enable_log true \
    --enable_vis true \
    --curri_param 1.0
(Renderings: task=avoid with curri_param=1.0, task=avoid with curri_param=0.0, task=track, task=hover.)
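The environments can also be driven programmatically with a gym-style loop. The sketch below is hypothetical: the QuadTransEnv class name, constructor arguments, and action_dim attribute are assumptions, so check adaptive_control_gym/envs/quadtrans.py for the real interface.

# hypothetical gym-style rollout; class name and signatures are assumed,
# see adaptive_control_gym/envs/quadtrans.py for the actual API
import torch
from adaptive_control_gym.envs.quadtrans import QuadTransEnv  # name assumed

env = QuadTransEnv(env_num=1, drone_num=1, gpu_id=-1)  # -1 => CPU, as above
obs = env.reset()
for _ in range(500):
    action = torch.zeros(env.action_dim)  # stand-in for a PID/PPO policy
    obs, reward, done, info = env.step(action)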

🐒 Policy

  • Classic
    • LQR
    • PID
    • MPC
  • RL
    • PPO
    • RMA
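The classic baselines are standard textbook controllers. As an illustration of the simplest one, here is a self-contained PID sketch; it is not the repository's implementation, and the gains are arbitrary.

# minimal textbook PID controller, shown for illustration only
class PID:
    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def __call__(self, target, measured):
        error = target - measured
        self.integral += error * self.dt                    # accumulate I term
        derivative = (error - self.prev_error) / self.dt    # finite-difference D term
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# e.g. regulating altitude toward a 1 m hover target at 50 Hz:
pid = PID(kp=2.0, ki=0.1, kd=0.5, dt=0.02)
thrust_correction = pid(target=1.0, measured=0.95)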

About

This repository compares the prevailing adaptive control methods from both the control and learning communities.
