ObjectGS: Object-aware Scene Reconstruction and Scene Understanding via Gaussian Splatting

1 University of Science and Technology of China, 2 Shanghai Artificial Intelligence Laboratory,
3 The Chinese University of Hong Kong, 4 The University of Hong Kong

TL;DR: We propose ObjectGS, an object-aware Gaussian Splatting framework that unifies 3D scene reconstruction with semantic understanding in open-world scenes.



ObjectGS models each object as its own set of Gaussian primitives, so it seamlessly supports independent per-object rendering, scene editing, and mesh extraction, as sketched below.
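Because every Gaussian belongs to exactly one object, object-level operations reduce to masking by ID. Below is a minimal PyTorch sketch of this idea; the function name `select_object` and the flat tensor layout are illustrative assumptions, not the actual ObjectGS API.

```python
import torch

def select_object(means, opacities, object_ids, target_id, keep=True):
    """Filter a scene's Gaussians by object ID.

    Since each Gaussian carries exactly one object ID, rendering a single
    object (keep=True) or removing it from the scene (keep=False) reduces
    to a boolean mask -- no retraining or post-hoc clustering is needed.
    """
    mask = (object_ids == target_id) if keep else (object_ids != target_id)
    return means[mask], opacities[mask], object_ids[mask]

# Toy example: 5 Gaussians, two objects (IDs 1 and 2)
means = torch.randn(5, 3)
opacities = torch.rand(5, 1)
object_ids = torch.tensor([1, 1, 2, 2, 1])

obj_means, obj_opac, _ = select_object(means, opacities, object_ids, target_id=2)
print(obj_means.shape)  # torch.Size([2, 3]) -- only object 2's Gaussians remain
```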

Abstract

3D Gaussian Splatting is renowned for its high-fidelity reconstructions and real-time novel view synthesis, yet its lack of semantic understanding limits object-level perception. In this work, we propose ObjectGS, an object-aware framework that unifies 3D scene reconstruction with semantic understanding. Instead of treating the scene as a unified whole, ObjectGS models individual objects as local anchors that generate neural Gaussians and share object IDs, enabling precise object-level reconstruction. During training, we dynamically grow or prune these anchors and optimize their features, while a one-hot ID encoding with a classification loss enforces clear semantic constraints. We show through extensive experiments that ObjectGS not only outperforms state-of-the-art methods on open-vocabulary and panoptic segmentation tasks, but also integrates seamlessly with applications like mesh extraction and scene editing.

Target

In the open-world setting, ObjectGS enables 3D object awareness during reconstruction, allowing it to achieve high-quality scene reconstruction and understanding simultaneously.

Motivation

(a) Incorporating semantic information during reconstruction helps model objects more accurately. (b) Existing semantic modeling methods often suffer from semantic ambiguity during alpha blending, whereas our classification-based semantic modeling eliminates this problem by accumulating the semantics of each object independently through ID encoding; see the sketch after this paragraph.
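To make the classification-based formulation concrete, the sketch below alpha-composites one-hot ID encodings along a ray and supervises the result with a cross-entropy loss against the pixel's 2D segmentation label. This is a toy illustration under our reading of the method; `composite_ids`, the tensor shapes, and the use of accumulated per-object opacities directly as logits are assumptions.

```python
import torch
import torch.nn.functional as F

def composite_ids(one_hot_ids, alphas):
    """Alpha-composite one-hot ID encodings along a ray (front to back).

    one_hot_ids: (N, C) one-hot vector per Gaussian hit, C = number of objects
    alphas:      (N,)   per-Gaussian opacity after projection

    Because each encoding is one-hot, each channel accumulates opacity from
    a single object only, so objects never mix the way blended continuous
    semantic feature vectors do.
    """
    transmittance = torch.cumprod(
        torch.cat([torch.ones(1), 1.0 - alphas[:-1]]), dim=0)
    weights = alphas * transmittance                     # (N,)
    return (weights[:, None] * one_hot_ids).sum(dim=0)  # (C,) per-object opacity

# Toy ray: three Gaussian hits belonging to objects 0, 2, 0 out of C=3 objects
one_hot_ids = F.one_hot(torch.tensor([0, 2, 0]), num_classes=3).float()
alphas = torch.tensor([0.4, 0.3, 0.5])

logits = composite_ids(one_hot_ids, alphas)
label = torch.tensor(0)  # the 2D segmentation says this pixel is object 0
loss = F.cross_entropy(logits[None], label[None])  # classification constraint
```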

Method Overview

The overall architecture of ObjectGS. We first use a 2D segmentation pipeline to assign object IDs and lift them to 3D. Then we initialize anchors and use them to generate object-aware neural Gaussians, as sketched below. To provide semantic guidance, we model the Gaussian semantics and impose classification-based constraints. As a result, our method enables both object-level and scene-level reconstruction.
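As an illustration of the anchor-based generation step, the sketch below shows an anchor decoding k neural Gaussians that all inherit the anchor's object ID, in the spirit of Scaffold-GS; the decoder heads, their dimensions, and the class name `ObjectAnchor` are hypothetical.

```python
import torch
import torch.nn as nn

class ObjectAnchor(nn.Module):
    """Sketch of an anchor that spawns k neural Gaussians sharing its object ID.

    Offsets and opacities are decoded from a learned anchor feature
    (Scaffold-GS style); the exact decoder heads are assumptions.
    """
    def __init__(self, k=10, feat_dim=32):
        super().__init__()
        self.k = k
        self.offset_head = nn.Linear(feat_dim, 3 * k)  # k positional offsets
        self.opacity_head = nn.Linear(feat_dim, k)     # k opacities

    def forward(self, anchor_xyz, anchor_feat, object_id):
        offsets = self.offset_head(anchor_feat).view(self.k, 3)
        means = anchor_xyz[None] + offsets                          # (k, 3) centers
        opacities = torch.sigmoid(self.opacity_head(anchor_feat))  # (k,)
        ids = object_id.expand(self.k)  # every child Gaussian inherits the ID
        return means, opacities, ids

anchor = ObjectAnchor()
means, opac, ids = anchor(torch.zeros(3), torch.randn(32), torch.tensor(7))
print(means.shape, ids.tolist())  # torch.Size([10, 3]) [7, 7, ..., 7]
```

During training, anchors like this would be grown or pruned and their features optimized, while the shared ID keeps every spawned Gaussian tied to its object.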

Results

Rendered RGB images and 2D semantics



Comparisons of open-vocabulary segmentation results



Comparisons of 3D semantics

BibTeX

BibTeX code here