Grasp, See and Place: Efficient Unknown Object Rearrangement with Policy Structure Prior

Kechun Xu¹, Zhongxiang Zhou¹, Jun Wu¹, Haojian Lu¹, Rong Xiong¹, Yue Wang
¹Zhejiang University
TRO 2024

Grasp, See, and Place. The robot is given the initial and goal scenes for an object rearrangement task. To improve task-level performance under perception noise, we first derive the decoupled structure analytically. Guided by this decoupled prior, we incorporate human behavior and task-level rewards into the general GSP framework. GSP contains two loops: the inner loop actively sees the grasped object to obtain a highly self-confident match, and the outer loop conducts grasp and place planning.


Abstract

We focus on the task of unknown object rearrangement, where a robot must rearrange objects into a desired goal configuration specified by an RGB-D image. Recent works explore unknown object rearrangement systems by incorporating learning-based perception modules. However, these systems are sensitive to perception error and pay little attention to task-level performance. In this paper, we aim to develop an effective system for unknown object rearrangement under perception noise. We show theoretically that noisy perception affects grasp and place in a decoupled way, and that this decoupled structure is valuable for improving task optimality. We propose GSP, a dual-loop system that uses the decoupled structure as a prior. For the inner loop, we learn a see policy for self-confident in-hand object matching. For the outer loop, we learn a grasp policy that is aware of object matching and grasp capability and is guided by task-level rewards. We leverage the foundation model CLIP for object matching, policy learning and self-termination. A series of experiments indicates that GSP can conduct unknown object rearrangement with higher completion rates and fewer steps.



Overview

System Overview. Given the RGB-D images of the current and goal scenes, the grasp policy jointly considers object matching and candidate grasps to select a grasp pose. After an object is picked up, it is matched against the goal objects. If the matching is self-confident, the object is moved to the planned place pose, which is determined by occupancy checking. Otherwise, active perception is triggered to predict the delta orientation of the end effector, and the robot rotates the in-hand object to a new view. This repeats until a confident match is achieved. Overall, our method decomposes the object rearrangement process into two loops: an inner loop for see and an outer loop for grasp and place planning.
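The control flow described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function `dual_loop`, the four callbacks (`grasp`, `match`, `rotate`, `place`), and the `max_views` and `max_steps` budgets are all hypothetical names introduced here for illustration. In particular, the view budget and the fallback of placing with the best available match once it is exhausted are assumptions, as is signalling termination by having `grasp` return `None`.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical callback types (assumptions, not the authors' API).
GraspFn = Callable[[], Optional[int]]              # grasped object id, or None when done
MatchFn = Callable[[int, int], Tuple[int, bool]]   # (object id, view) -> (goal index, confident?)
RotateFn = Callable[[int, int], None]              # rotate the in-hand object to a new view
PlaceFn = Callable[[int, int], None]               # place object at the matched goal slot


def dual_loop(
    grasp: GraspFn,
    match: MatchFn,
    rotate: RotateFn,
    place: PlaceFn,
    max_views: int = 4,    # assumed cap on in-hand views per object
    max_steps: int = 20,   # assumed cap on grasp-place steps
) -> List[Tuple[int, int, int]]:
    """Return a log of (object id, goal index, number of views used)."""
    log: List[Tuple[int, int, int]] = []
    for _ in range(max_steps):                 # outer loop: grasp and place
        obj = grasp()
        if obj is None:                        # nothing left to rearrange
            break
        view = 0
        goal, confident = match(obj, view)
        while not confident and view + 1 < max_views:   # inner loop: see
            rotate(obj, view)
            view += 1
            goal, confident = match(obj, view)
        place(obj, goal)                       # assumed fallback: use last match
        log.append((obj, goal, view + 1))
    return log


def demo() -> Tuple[List[Tuple[int, int, int]], dict]:
    """Toy run: object 0 matches at once, object 1 needs two rotations."""
    queue = [0, 1]
    confident_at = {0: 0, 1: 2}   # first view index at which matching is confident
    placed = {}

    def grasp() -> Optional[int]:
        return queue.pop(0) if queue else None

    def match(obj: int, view: int) -> Tuple[int, bool]:
        return obj, view >= confident_at[obj]

    def rotate(obj: int, view: int) -> None:
        pass  # a real system would command the end effector here

    def place(obj: int, goal: int) -> None:
        placed[obj] = goal

    return dual_loop(grasp, match, rotate, place), placed
```

Calling `demo()` runs the loop on two toy objects, one of which needs extra views before its match is accepted. In a real system the callbacks would wrap the learned grasp policy, the matching module, and the see policy.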

BibTeX

@article{xu2024grasp,
  title={Grasp, See and Place: Efficient Unknown Object Rearrangement with Policy Structure Prior},
  author={Xu, Kechun and Zhou, Zhongxiang and Wu, Jun and Lu, Haojian and Wang, Yue and Xiong, Rong},
  journal={arXiv preprint arXiv:2402.15402},
  year={2024}
}

Acknowledgements

We are very grateful to Zizhang Li for insightful discussions and proofreading. This work was supported by the National Key R&D Program of China (2023YFB4705001) and the National Natural Science Foundation of China under Grant 62173293.