ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

1 University of Science and Technology of China, 2 Shanghai Artificial Intelligence Laboratory,
3 The Chinese University of Hong Kong, 4 The University of Hong Kong

TL;DR: We propose ObjectGS, an object-aware Gaussian Splatting framework that unifies 3D scene reconstruction with semantic understanding in open-world scenes.



ObjectGS models each object as its own set of Gaussian primitives, so it seamlessly supports independent per-object rendering, scene editing, and mesh extraction, as sketched below.
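Because every Gaussian belongs to exactly one object, object-level operations reduce to masking by ID. Below is a minimal PyTorch sketch of this idea; the function name `select_object` and the flat tensor layout are illustrative assumptions, not the actual ObjectGS API.

```python
import torch

def select_object(means, opacities, object_ids, target_id, keep=True):
    """Filter a scene's Gaussians by object ID.

    Since each Gaussian carries exactly one object ID, rendering a single
    object (keep=True) or removing it from the scene (keep=False) reduces
    to a boolean mask -- no retraining or post-hoc clustering is needed.
    """
    mask = (object_ids == target_id) if keep else (object_ids != target_id)
    return means[mask], opacities[mask], object_ids[mask]

# Toy example: 5 Gaussians, two objects (IDs 1 and 2)
means = torch.randn(5, 3)
opacities = torch.rand(5, 1)
object_ids = torch.tensor([1, 1, 2, 2, 1])

obj_means, obj_opac, _ = select_object(means, opacities, object_ids, target_id=2)
print(obj_means.shape)  # torch.Size([2, 3]) -- only object 2's Gaussians remain
```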

Abstract

3D Gaussian Splatting is renowned for its high-fidelity reconstructions and real-time novel view synthesis, yet its lack of semantic understanding limits object-level perception. In this work, we propose ObjectGS, an object-aware framework that unifies 3D scene reconstruction with semantic understanding. Instead of treating the scene as a unified whole, ObjectGS models individual objects as local anchors that generate neural Gaussians and share object IDs, enabling precise object-level reconstruction. During training, we dynamically grow or prune these anchors and optimize their features, while a one-hot ID encoding with a classification loss enforces clear semantic constraints. We show through extensive experiments that ObjectGS not only outperforms state-of-the-art methods on open-vocabulary and panoptic segmentation tasks, but also integrates seamlessly with applications like mesh extraction and scene editing.

Target

In the open-world setting, ObjectGS enables 3D object awareness during reconstruction, allowing it to achieve high-quality scene reconstruction and understanding simultaneously.

Motivation

(a) Incorporating semantic information during reconstruction helps model objects more accurately. (b) Existing semantic modeling methods often suffer from semantic ambiguity during alpha blending, whereas our classification-based semantic modeling eliminates this problem by accumulating the semantics of each object independently through ID encoding; see the sketch after this paragraph.
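To make the classification-based formulation concrete, the sketch below alpha-composites one-hot ID encodings along a ray and supervises the result with a cross-entropy loss against the pixel's 2D segmentation label. This is a toy illustration under our reading of the method; `composite_ids`, the tensor shapes, and the use of accumulated per-object opacities directly as logits are assumptions.

```python
import torch
import torch.nn.functional as F

def composite_ids(one_hot_ids, alphas):
    """Alpha-composite one-hot ID encodings along a ray (front to back).

    one_hot_ids: (N, C) one-hot vector per Gaussian hit, C = number of objects
    alphas:      (N,)   per-Gaussian opacity after projection

    Because each encoding is one-hot, each channel accumulates opacity from
    a single object only, so objects never mix the way blended continuous
    semantic feature vectors do.
    """
    transmittance = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alphas[:-1]]), dim=0)
    weights = alphas * transmittance                     # (N,)
    return (weights[:, None] * one_hot_ids).sum(dim=0)  # (C,) per-object opacity

# Toy ray: three Gaussian hits belonging to objects 0, 2, 0 out of C=3 objects
one_hot_ids = F.one_hot(torch.tensor([0, 2, 0]), num_classes=3).float()
alphas = torch.tensor([0.4, 0.3, 0.5])

logits = composite_ids(one_hot_ids, alphas)
label = torch.tensor(0)  # the 2D segmentation says this pixel is object 0
loss = F.cross_entropy(logits[None], label[None])  # classification constraint
```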

Method Overview

The overall architecture of ObjectGS. We first use a 2D segmentation pipeline to assign object IDs and lift them to 3D. Then we initialize anchors and use them to generate object-aware neural Gaussians, as sketched below. To provide semantic guidance, we model the Gaussian semantics and impose classification-based constraints. As a result, our method enables both object-level and scene-level reconstruction.
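As an illustration of the anchor-based generation step, the sketch below shows an anchor decoding k neural Gaussians that all inherit the anchor's object ID, in the spirit of Scaffold-GS; the decoder heads, their dimensions, and the class name `ObjectAnchor` are hypothetical.

```python
import torch
import torch.nn as nn

class ObjectAnchor(nn.Module):
    """Sketch of an anchor that spawns k neural Gaussians sharing its object ID.

    Offsets and opacities are decoded from a learned anchor feature
    (Scaffold-GS style); the exact decoder heads are assumptions.
    """
    def __init__(self, k=10, feat_dim=32):
        super().__init__()
        self.k = k
        self.offset_head = nn.Linear(feat_dim, 3 * k)  # k positional offsets
        self.opacity_head = nn.Linear(feat_dim, k)     # k opacities

    def forward(self, anchor_xyz, anchor_feat, object_id):
        offsets = self.offset_head(anchor_feat).view(self.k, 3)
        means = anchor_xyz[None] + offsets                          # (k, 3) centers
        opacities = torch.sigmoid(self.opacity_head(anchor_feat))  # (k,)
        ids = object_id.expand(self.k)  # every child Gaussian inherits the ID
        return means, opacities, ids

anchor = ObjectAnchor()
means, opac, ids = anchor(torch.zeros(3), torch.randn(32), torch.tensor(7))
print(means.shape, ids.tolist())  # torch.Size([10, 3]) [7, 7, ..., 7]
```

During training, anchors like this would be grown or pruned and their features optimized, while the shared ID keeps every spawned Gaussian tied to its object.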

Results

Rendered RGB images and 2D semantics



Comparisons of open-vocabulary segmentation results



Comparisons of 3D semantics

BibTeX

BibTeX code here