Deep Reinforcement Learning and Imitation Learning with sim2sim exploration and behavior cloning

The PDF containing the assignment, along with detailed result explanations, will be published soon.

This repository contains the code and models for a "Deep Reinforcement Learning Assignment (University MSc)". The assignment explores deep reinforcement learning (DRL) techniques for sim2sim transfer and imitation learning.

Simulation of the Ant-v4 model, which was trained using PPO with the following hyperparameters: a batch size of 256, a gamma value of 0.99, and 900,000 timesteps.
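To make the gamma value concrete, the following minimal sketch computes a discounted return and the rough "effective horizon" 1 / (1 - gamma) that a discount factor implies. The function names are hypothetical and are not taken from the repository's scripts; only the value 0.99 comes from the text above.

```python
def discounted_return(rewards, gamma=0.99):
    """Sum of rewards weighted by gamma**t, accumulated from the last step backwards."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def effective_horizon(gamma=0.99):
    """Rule-of-thumb number of future steps that meaningfully affect the return."""
    return 1.0 / (1.0 - gamma)


# A reward of 1.0 at each of three steps, with a strong discount for illustration.
short_return = discounted_return([1.0, 1.0, 1.0], gamma=0.5)

# With gamma = 0.99, rewards roughly 100 steps ahead still carry noticeable weight.
horizon = effective_horizon(0.99)
```

With gamma = 0.99 the agent therefore weighs rewards over roughly a hundred simulation steps, which suits a locomotion task where forward progress accumulates over many steps.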

Part 1 - SIM2SIM - GENERALIZATION OF TRAINED POLICIES TO DIFFERENT ENVIRONMENT DYNAMICS

Part 1 focuses on generalizing trained policies to different environment dynamics, in this case a different torso mass, using the Proximal Policy Optimization (PPO) algorithm. Hyperparameter tuning with Optuna is used for effective exploration, and the state-of-the-art baseline hyperparameters from HuggingFace (2023) are also tested.
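The evaluation idea in Part 1 can be sketched as a sweep: keep the policy fixed, change the torso mass, and compare the average return. This is a minimal sketch, not the repository's code. ToyMassEnv is a hypothetical stand-in that only mimics the Gymnasium reset/step signatures so the example runs without MuJoCo, and the mass scales are made-up values. With the real Ant-v4, the mass would be changed on the simulator model instead (MuJoCo-based Gymnasium environments typically expose it as env.unwrapped.model.body_mass; treat that as an assumption about your setup).

```python
from statistics import mean


class ToyMassEnv:
    """Hypothetical stand-in for Ant-v4 with Gymnasium-style reset/step.

    The per-step reward shrinks as `mass` moves away from `nominal_mass`,
    imitating a policy that was trained at the nominal mass.
    """

    def __init__(self, mass=1.0, nominal_mass=1.0, horizon=50):
        self.mass = mass
        self.nominal_mass = nominal_mass
        self.horizon = horizon
        self.t = 0

    def reset(self, seed=None):
        self.t = 0
        return 0.0, {}  # (observation, info)

    def step(self, action):
        self.t += 1
        reward = action / (1.0 + abs(self.mass - self.nominal_mass))
        truncated = self.t >= self.horizon
        return float(self.t), reward, False, truncated, {}


def evaluate(policy, env, n_episodes=5):
    """Average undiscounted return of `policy` over `n_episodes` episodes."""
    returns = []
    for _ in range(n_episodes):
        obs, _ = env.reset()
        done, total = False, 0.0
        while not done:
            obs, reward, terminated, truncated, _ = env.step(policy(obs))
            total += reward
            done = terminated or truncated
        returns.append(total)
    return mean(returns)


def sweep_torso_mass(policy, scales, nominal_mass=1.0):
    """Evaluate one fixed policy on environments with scaled torso mass."""
    results = {}
    for scale in scales:
        env = ToyMassEnv(mass=nominal_mass * scale, nominal_mass=nominal_mass)
        results[scale] = evaluate(policy, env)
    return results


# A constant-action policy stands in for the trained PPO agent.
results = sweep_torso_mass(lambda obs: 1.0, scales=[0.5, 1.0, 2.0])
```

Plotting such results against the mass scale gives a quick view of how far a policy generalizes beyond the dynamics it was trained on.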

Part 2 - SIM2SIM - IMITATION LEARNING - LEARNING FROM EXPERT DEMONSTRATIONS / BEHAVIOR CLONING

Part 2 investigates behavior cloning (BC) for imitation learning, where a DRL agent is trained on the "Ant-v4" Gym environment using PPO. It analyzes how the expert data and the size of the policy network affect the performance of the BC agent. The results highlight the importance of sim2sim and imitation learning for training robust policies that generalize well. This work contributes to the understanding of DRL and offers insights into the optimization and generalization of sim2sim and imitation learning.
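Behavior cloning treats imitation as supervised learning: record (state, action) pairs from an expert and fit a policy that reproduces the expert's actions. The sketch below illustrates this under simplifying assumptions. It swaps the PPO expert and the policy network for a hypothetical one-dimensional linear expert and a linear policy fitted by gradient descent on the mean squared error, so it is not the method used in the repository's scripts.

```python
def expert_policy(state):
    """Hypothetical expert; in the assignment this role is played by a PPO agent."""
    return 2.0 * state - 1.0


def collect_demonstrations(n):
    """Record (state, expert action) pairs on n evenly spaced states in [-1, 1]."""
    states = [i / (n - 1) * 2.0 - 1.0 for i in range(n)]
    return [(s, expert_policy(s)) for s in states]


def behavior_cloning(demos, lr=0.1, epochs=500):
    """Fit action = w * state + b to the demonstrations by minimizing the MSE."""
    w, b = 0.0, 0.0
    n = len(demos)
    for _ in range(epochs):
        # Gradients of the mean squared error between cloned and expert actions.
        grad_w = sum(2.0 * (w * s + b - a) * s for s, a in demos) / n
        grad_b = sum(2.0 * (w * s + b - a) for s, a in demos) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


demos = collect_demonstrations(21)
w, b = behavior_cloning(demos)


def cloned_policy(state):
    """Policy recovered purely from demonstrations, without any reward signal."""
    return w * state + b
```

In this toy setting the cloned policy recovers the expert almost exactly. With a real expert, the number of demonstrations and the capacity of the policy are the two factors that Part 2 varies.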

Folders and files:
env
figures
imitation
models
mujoco
requirements
LICENSE
README.md
logo.jpeg
part1_task1.py
part1_task1_render.py
part1_task2.py
part1_task3.py
part2_task1.py
part2_task1_render.py
part2_task2.py
part2_task3.py

basverkennis/DRL-PPO-sim2sim-imitationlearning
