Wenyu Han, Haoran Wu, Eisuke Hirota, Alexander Gao, Lerrel Pinto, Ludovic Righetti, Chen Feng
We propose to study a new learning task, mobile construction, to enable an agent to build designed structures in 1/2/3D grid worlds while navigating in the same evolving environments. Unlike existing robot learning tasks such as visual navigation and object manipulation, this task is challenging because of the interdependence between accurate localization and strategic construction planning. In pursuit of generic and adaptive solutions to this partially observable Markov decision process (POMDP) based on deep reinforcement learning (RL), we design a Deep Recurrent Q-Network (DRQN) with explicit recurrent position estimation in this dynamic grid world. Our extensive experiments show that pre-training this position estimation module before Q-learning can significantly improve the construction performance measured by the intersection-over-union score, achieving the best results in our benchmark of various baselines including model-free and model-based RL, a handcrafted SLAM-based policy, and human players.
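If you are curious how an explicit recurrent position-estimation module can sit alongside Q-learning, the snippet below is a minimal, purely illustrative PyTorch sketch; it is not the exact architecture used in the paper, and the layer sizes, names, and supervised targets are placeholders:

import torch
import torch.nn as nn

class RecurrentPositionDRQN(nn.Module):
    """Illustrative DRQN: an LSTM over observation embeddings feeds both a
    position-estimation head and a Q-value head."""
    def __init__(self, obs_dim, num_positions, num_actions, hidden_dim=128):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        self.lstm = nn.LSTM(hidden_dim, hidden_dim, batch_first=True)
        self.position_head = nn.Linear(hidden_dim, num_positions)  # logits over grid cells
        self.q_head = nn.Linear(hidden_dim, num_actions)           # Q-values per action

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, obs_dim)
        z = self.encoder(obs_seq)
        z, hidden = self.lstm(z, hidden)
        return self.position_head(z), self.q_head(z), hidden

# Pre-training idea: fit the position head with a supervised loss on collected
# rollouts first, then start Q-learning from the resulting weights.
model = RecurrentPositionDRQN(obs_dim=10, num_positions=50, num_actions=5)
pos_logits, q_values, _ = model(torch.randn(4, 20, 10))
loss = nn.CrossEntropyLoss()(pos_logits.reshape(-1, 50), torch.randint(0, 50, (80,)))
loss.backward()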
We recommend creating a virtual environment for running this project. The setup steps are as follows:
conda create -n my-conda-env python=3.7
conda activate my-conda-env
Note: PyTorch is required; install the build that matches your system. Below we use Linux with CUDA 11.7 as an example.
pip3 install torch torchvision torchaudio
pip install -r requirements.txt
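After installation, you can optionally run a quick sanity check to confirm that PyTorch and its CUDA build are visible (purely optional; not part of the project scripts):

import torch
print(torch.__version__)          # installed PyTorch version
print(torch.cuda.is_available())  # True if a CUDA-capable GPU is visible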
Our environments are built on OpenAI Gym and follow the same interface. Below is an example using the 1D static task environment.
import numpy as np
import matplotlib.pyplot as plt
from DMP_Env_1D_static import deep_mobile_printing_1d1r  ### located in the Env/ folder; make sure it is on your Python path
env = deep_mobile_printing_1d1r(plan_choose=2)  ### plan_choose can be 0: sin, 1: Gaussian, or 2: step curve
observation = env.reset()
fig = plt.figure(figsize=(5, 5))
ax = fig.add_subplot(1, 1, 1)
ax.clear()
for _ in range(1000):
    action = np.random.randint(env.action_dim)  # your agent here (this takes random actions)
    observation, reward, done = env.step(action)
    env.render(ax)
    plt.pause(0.1)
    if done:
        break
plt.show()
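Construction performance in the paper is reported as an intersection-over-union (IoU) score between the built structure and the target plan. As a rough illustration only, here is a generic binary-grid IoU in NumPy (the environments' own evaluation code and attribute names may differ):

import numpy as np

def grid_iou(plan, built):
    # IoU between two binary occupancy grids of the same shape
    plan, built = plan.astype(bool), built.astype(bool)
    union = np.logical_or(plan, built).sum()
    if union == 0:
        return 1.0  # both grids empty
    return np.logical_and(plan, built).sum() / union

print(grid_iou(np.array([1, 1, 0, 0]), np.array([1, 0, 0, 1])))  # 1 overlapping cell / 3 in union ≈ 0.33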
All scripts for each method are in the script/ folder, with subfolders containing the policies for the 1D, 2D, and 3D tasks. The hyperparameters used for each case are in the config/ folder, which mirrors the structure of script/. The simulation environments are in the Env/ folder. You can reproduce the experiments by running an algorithm script with its corresponding YML hyperparameter file. For example, to train the DQN policy on the 2D dynamic (variable-plan) dense task:
cd script/DQN/2d/
python DQN_2d_dynamic.py ../../../config/DQN/2D/dynamic_dense.yml
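Each training script takes the path of a YML file as its command-line argument and reads its hyperparameters from it. A minimal sketch of inspecting such a file (assuming PyYAML is available; the actual scripts may parse the file differently):

import sys
import yaml

with open(sys.argv[1]) as f:   # e.g. ../../../config/DQN/2D/dynamic_dense.yml
    cfg = yaml.safe_load(f)    # hyperparameters as a nested dict
print(cfg)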
We also provide a multiprocess script for batch simulation.
python multiprocess.py --env 1DStatic --plan_type 0 --num_envs 5
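The provided multiprocess.py handles the batch runs for the listed tasks; the snippet below only sketches the general idea of rolling out several independent environments in parallel worker processes (random actions, illustrative episode length):

import multiprocessing as mp
import numpy as np
from DMP_Env_1D_static import deep_mobile_printing_1d1r

def rollout(plan_choose, max_steps=1000):
    env = deep_mobile_printing_1d1r(plan_choose=plan_choose)
    env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        _, reward, done = env.step(np.random.randint(env.action_dim))
        total_reward += reward
        if done:
            break
    return total_reward

if __name__ == "__main__":
    with mp.Pool(processes=5) as pool:
        # five independent episodes of the 1D static task, plan 0 (sin)
        print(pool.map(rollout, [0] * 5))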
To cite our paper:
@inproceedings{
  anonymous2023learning,
  title={Learning Simultaneous Navigation and Construction in Grid Worlds},
  author={Anonymous},
  booktitle={Submitted to The Eleventh International Conference on Learning Representations},
  year={2023},
  url={https://openreview.net/forum?id=NEtep2C7yD},
  note={under review}
}
This research is supported by the NSF CPS program under CMMI-1932187. The authors gratefully thank our human test participants, as well as Bolei Zhou, Zhen Liu, and the anonymous reviewers for their helpful comments, and Congcong Wen for help with the paper revision.