Abstract. We focus on the task of unknown object rearrangement, where a robot must reconfigure objects into a desired goal configuration specified by an RGB-D image. Recent works approach unknown object rearrangement by incorporating learning-based perception modules, but they are sensitive to perception errors and pay little attention to task-level performance. In this paper, we aim to develop an effective system for unknown object rearrangement amidst perception noise. We theoretically show that perception noise affects grasping and placement in a decoupled way, and that this decoupled structure can be exploited to improve task optimality. We propose GSP, a dual-loop system that uses the decoupled structure as a prior. In the inner loop, we learn a see policy for self-confident in-hand object matching. In the outer loop, we learn a grasp policy that is aware of both object matching and grasp capability, guided by task-level rewards. We leverage the foundation model CLIP for object matching, policy learning, and self-termination. A series of experiments shows that GSP performs unknown object rearrangement with higher completion rates and fewer steps.
System Overview. Given RGB-D images of the current and goal scenes, the grasp policy jointly reasons over object matching and candidate grasps to select a grasp pose. After picking up an object, object matching is performed between the grasped object and the goal objects. If the matching is self-confident, the object is placed at the planned place pose after occupancy checking. Otherwise, active perception is triggered to predict a delta orientation for the end effector, and the robot rotates the in-hand object to a new view; this repeats until a confident match is achieved. Overall, our method decomposes the rearrangement process into two loops: an inner see loop for active in-hand perception, and an outer loop for grasp and place planning.
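To make the control flow concrete, below is a minimal Python sketch of the dual-loop structure described above. It is a sketch under stated assumptions, not our released implementation: all module names (grasp_policy, see_policy, match_in_hand, plan_place, and the robot interface methods) are hypothetical placeholders for the corresponding learned components and hardware drivers, the confidence threshold and view budget are illustrative values, and reading "occupancy checking" as a fallback to a free buffer location is one plausible interpretation.

def rearrange(robot, grasp_policy, see_policy, match_in_hand, plan_place,
              goal_rgbd, conf_threshold=0.9, max_views=8):
    """Run the dual-loop rearrangement until the scene matches the goal.

    All callables and robot methods are hypothetical stand-ins for the
    learned modules and hardware interface; threshold and view budget
    are illustrative, not values from the paper.
    """
    current_rgbd = robot.observe()
    # Outer loop: grasp and place planning, with self-termination.
    while not robot.scene_matches(current_rgbd, goal_rgbd):
        # The grasp policy jointly scores object matching and candidate
        # grasps to select a grasp pose.
        grasp_pose = grasp_policy(current_rgbd, goal_rgbd)
        robot.pick(grasp_pose)

        # Inner loop: actively "see" the in-hand object until its match
        # against the goal objects is self-confident.
        goal_id, confidence = match_in_hand(robot.wrist_rgbd(), goal_rgbd)
        views = 0
        while confidence < conf_threshold and views < max_views:
            delta_rot = see_policy(robot.wrist_rgbd(), goal_rgbd)
            robot.rotate_end_effector(delta_rot)  # expose a new view
            goal_id, confidence = match_in_hand(robot.wrist_rgbd(), goal_rgbd)
            views += 1

        # Place at the matched goal pose; if the target region is still
        # occupied, fall back to a free buffer location (one reading of
        # the occupancy check described above).
        place_pose = plan_place(goal_id, goal_rgbd)
        if robot.occupied(place_pose, current_rgbd):
            place_pose = robot.free_buffer_pose(current_rgbd)
        robot.place(place_pose)

        current_rgbd = robot.observe()  # refresh the scene observation

Passing the perception and planning modules in as callables keeps the outer and inner loops independent of any particular backbone, mirroring the decoupled structure the method exploits.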
Citation.
@article{xu2024grasp,
  title={Grasp, See and Place: Efficient Unknown Object Rearrangement with Policy Structure Prior},
  author={Xu, Kechun and Zhou, Zhongxiang and Wu, Jun and Lu, Haojian and Wang, Yue and Xiong, Rong},
  journal={arXiv preprint arXiv:2402.15402},
  year={2024}
}
Acknowledgements. We are very grateful to Zizhang Li for insightful discussions and proofreading. This work was supported by the National Key R&D Program of China (2023YFB4705001) and the National Natural Science Foundation of China under Grant 62173293.