The Re-entanglement Dual of Disentanglement for Scalability

We have published two works on 6D pose estimation of rigid objects from RGB images (EEGP-AAE, DISP6D). While 6D pose estimation is very useful in fields like robotics and AR/VR, we have also made some technical observations along the way that should generalize to other domains and tasks.

In particular, the scalability problem demands processing many objects with one model, and processing novel objects not seen during model training. A natural route to scalability is disentanglement, where the model learns to uncover the true dimensions of the data and can therefore scale/extrapolate to new samples within the parametric domain spanned by those dimensions. In the case of 6D pose estimation, disentanglement means that given an image depicting an object, the dimensions of object shape/appearance and 6D pose should be uncovered, so that multiple objects with similar poses can be told apart, and novel objects can be processed by referring to similar objects seen during training.

In most practical situations, however, disentanglement into canonical and orthogonal dimensions is impossible, because the dimensions are entangled in complex ways such that one distorts the other. For example, in the case of 6D pose, a bottle without a handle and a cup with a handle have different symmetries, so each makes a different set of poses look identical; similarly, in the case of space-time, heavy objects distort the four dimensions differently than light objects.

While the process of disentanglement seems too complex, the key idea we propose is to learn the (re-)entanglement process instead, and to search along the dimensions for the parameters whose entangled result best matches the observation. In the case of 6D pose, we search over the pose space and entangle each candidate pose with the object shape to produce the object-distorted pose closest to the observation. Furthermore, entanglement is a process well studied in physics and ML. We have taken a simple tensor-product model to implement entanglement.
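
To make the search-by-entanglement idea concrete, here is a minimal NumPy sketch. It is not the DISP6D implementation: the function names (entangle, search_pose), the code dimensions, and the random weight tensor standing in for learned parameters are all assumptions made for illustration. It only shows the general pattern of a tensor-product (bilinear) entanglement of a shape code with candidate pose codes, followed by a nearest-match search against an observed code.

```python
import numpy as np


def entangle(shape_code, pose_codes, weights):
    """Bilinear (tensor-product) entanglement of one shape code with n pose codes.

    shape_code: (ds,)   hypothetical object shape embedding
    pose_codes: (n, dp) hypothetical embeddings of n candidate poses
    weights:    (dz, ds, dp) tensor mixing the two factors (learned in practice)
    returns:    (n, dz) entangled, object-distorted pose codes
    """
    return np.einsum("zsp,s,np->nz", weights, shape_code, pose_codes)


def search_pose(observed, shape_code, pose_codes, weights):
    """Return the index of the candidate pose whose entangled code is closest
    (by cosine similarity) to the observed code, plus all similarity scores."""
    candidates = entangle(shape_code, pose_codes, weights)
    candidates = candidates / np.linalg.norm(candidates, axis=1, keepdims=True)
    observed = observed / np.linalg.norm(observed)
    scores = candidates @ observed
    return int(np.argmax(scores)), scores


# Toy setup: random values stand in for a trained model (assumption).
rng = np.random.default_rng(0)
ds, dp, dz, n = 8, 16, 32, 100
weights = rng.normal(size=(dz, ds, dp))
shape_code = rng.normal(size=ds)
pose_codes = rng.normal(size=(n, dp))

# Pretend the observation was produced by candidate pose 42.
true_index = 42
observed = entangle(shape_code, pose_codes[true_index:true_index + 1], weights)[0]

best, scores = search_pose(observed, shape_code, pose_codes, weights)
```

In this toy setup the search recovers the pose that generated the observation, because its entangled code matches the observed code exactly. With a different shape_code the same pose candidates map to different entangled codes, which is the sense in which the object distorts the pose space.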

This dual approach to disentanglement by re-entanglement has enabled scalable processing of novel objects across categories in DISP6D. It is likely that the idea can be helpful in other tasks involving disentanglement learning as well.