HRL-BC

Learning long-horizon robotic manipulations

Learning manipulation tasks


Manipulation tasks such as preparing a meal or assembling furniture remain highly challenging for robotics and vision. Imitation learning, a supervised approach, can handle short tasks, but it suffers from compounding errors and requires many demonstrations for longer and more complex tasks. Reinforcement learning (RL) can find solutions beyond the demonstrations, but it requires tedious, task-specific reward engineering for long-horizon, multi-step problems. In this work we address the difficulties of both methods and explore their hierarchical combination. To this end, we propose HRL-BC, a hierarchy of RL policies operating on pre-trained skills, which can learn long-horizon manipulations using no intermediate rewards and no demonstrations of full tasks. Our method allows efficient training of basic skills from a few visual demonstrations by taking advantage of recent CNN architectures and data augmentation. By combining our hierarchical approach with the learned basic skills, we learn long-horizon policies for manipulation tasks such as making a simple breakfast.
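The hierarchical idea described above can be illustrated with a small, self-contained sketch. This is not the HRL-BC implementation: the toy 1-D environment, the two hand-written "skills", the fixed skill duration and all function names are assumptions made purely for illustration. It only shows the control flow in which a high-level policy selects among fixed low-level skills and the task gives a reward only on final success.

```python
from typing import Callable, List, Sequence, Tuple

Observation = Sequence[float]
Skill = Callable[[Observation], int]  # a skill maps an observation to a low-level action


class ToyLineEnv:
    """Hypothetical 1-D stand-in for a task with a sparse, final-only reward."""

    def __init__(self, goal: int = 4) -> None:
        self.goal = goal
        self.x = 0

    def reset(self) -> List[float]:
        self.x = 0
        return [float(self.x)]

    def step(self, action: int) -> Tuple[List[float], float, bool]:
        self.x += action
        done = self.x == self.goal
        reward = 1.0 if done else 0.0  # no intermediate reward
        return [float(self.x)], reward, done


def run_hierarchical_episode(
    env_step: Callable[[int], Tuple[Observation, float, bool]],
    first_obs: Observation,
    master_policy: Callable[[Observation], int],
    skills: List[Skill],
    skill_duration: int,
    max_steps: int,
) -> Tuple[float, List[int]]:
    """Roll out one episode; the master picks a skill every `skill_duration` steps."""
    obs = first_obs
    total_reward = 0.0
    chosen: List[int] = []
    skill_idx = 0
    for t in range(max_steps):
        if t % skill_duration == 0:
            skill_idx = master_policy(obs)  # high-level decision
            chosen.append(skill_idx)
        action = skills[skill_idx](obs)  # low-level action from the selected skill
        obs, reward, done = env_step(action)
        total_reward += reward
        if done:
            break
    return total_reward, chosen


# Two hand-written placeholder skills; in the setting described above they
# would instead be policies pre-trained from demonstrations.
move_right: Skill = lambda obs: 1
move_left: Skill = lambda obs: -1

env = ToyLineEnv(goal=4)
total, chosen = run_hierarchical_episode(
    env.step,
    env.reset(),
    master_policy=lambda obs: 0 if obs[0] < 4 else 1,
    skills=[move_right, move_left],
    skill_duration=2,
    max_steps=20,
)
print(total, chosen)  # prints: 1.0 [0, 0]
```

In this sketch the master policy makes only two decisions over four low-level steps, which shows how acting over skills shortens the effective horizon seen by the high-level learner.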

The policies were trained using MImE, a simple interface built on the PyBullet simulator that provides tools for creating complex manipulation tasks with a UR5 robotic arm. It comprises four environments, ranging from simple to more complex tasks: UR5-Pick, UR5-Pour, UR5-Bowl and UR5-Breakfast.

MImE Environments


UR5-Pick
UR5-Pour
UR5-Bowl
UR5-Breakfast

MImE is composed of four robotic manipulation environments. In every environment, the agent controls the robot end-effector and observes the scene through a camera placed in front of the robot. The goal of the agent is to output the correct sequence of actions to perform the task at hand. In UR5-Pick, a cube is on the table and the goal is to grasp the cube and lift it into the air. In UR5-Pour, a bottle is chosen from a set of ShapeNet bottles and placed on the table; the agent has to pour the contents of the bottle into a bowl without spilling any drops. In UR5-Bowl, a cube and a bowl are on the table and the agent has to put the cube into the bowl. In UR5-Breakfast, the goal is to prepare a simple breakfast: a bottle and a cup are on the table, and the agent has to pour the contents of both containers into a bowl without spilling any drops.
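As an illustration of what learning with no intermediate rewards can look like for the two cube tasks above, here is a minimal sketch of a success-only reward. It is not the MImE API: the `SceneState` fields, the `LIFT_THRESHOLD` value and the `sparse_reward` function are hypothetical names and numbers chosen for this example.

```python
from dataclasses import dataclass


@dataclass
class SceneState:
    """Hypothetical simulator readout; the field names are illustrative only."""

    cube_height: float  # height of the cube above the table, in metres
    cube_in_bowl: bool  # whether the cube rests inside the bowl


LIFT_THRESHOLD = 0.10  # assumed lift height in metres, not taken from the source


def sparse_reward(task: str, state: SceneState) -> float:
    """Return 1.0 only when the task is completed, and 0.0 otherwise."""
    if task == "UR5-Pick":
        success = state.cube_height > LIFT_THRESHOLD
    elif task == "UR5-Bowl":
        success = state.cube_in_bowl
    else:
        raise ValueError(f"no success test sketched for task {task!r}")
    return 1.0 if success else 0.0
```

A reward of this kind signals only task completion, so the agent receives no feedback for partial progress such as approaching or grasping the cube.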

Download


Data

Coming soon ...

Paper

Coming soon ...

Code

Coming soon ...