Real robot experiments
Seen task variations
We train our 3D-LOTUS model on a set of tasks on a real-world robot and evaluate on the same tasks but with different object placements (a sketch of this evaluation protocol follows the list):
"stack the yellow cup on top of the pink cup"
"put the yellow cup on top of the pink one"
"place the navy cup onto the yellow cup"
"pick up and set the navy cup down into the yellow cup"
"put the frog toy in the top part of the drawer"
"take the frog toy and put it in the top compartiment of the drawer"
"take the pink mug and put it on the middle part of the hanger"
"put the pink mug on the middle part of the hanger"
"put the strawberry in the box"
"take the strawberry and put it inside the box"
"put the peach in the box"
"take the peach and put it inside the box"
Unseen task variations
We then train our improved 3D-LOTUS++ model on the previous set of tasks and leverage an LLM and a VLM to generalize to unseen task variations involving new objects and instructions (a sketch of this pipeline follows the list):
"put the banana in the box"
"take the banana and put it inside the box"
"put the lemon in the box"
"take the lemon and put it inside the box"
"put the tuna can in the box, then put the corn in the box"
"pick the tuna can and place it on the box, then place the corn in the box"
"put the grape on the yellow plate, then put the banana on the pink plate"
"put the grape on the yellow plate, then put the banana on the pink plate"
"stack the black cup into the orange one"
"pick the black cup and put it in the orange cup"
"keeping the yellow cup on the table, stack the red one onto it"
"pick the red cup and put it in the yellow cup"
"place the yellow cup inside the red cup, then the cyan cup on top"
"place the yellow cup inside the red cup, then the cyan cup on top"