Fueling the AI Revolution: Scalable 3DGS Data Collection for Machine Learning

By Yaroslav Parkhisenko

The landscape of digital world generation is shifting rapidly. Since 2024, TVIS has been at the forefront of this evolution, pioneering mass data collection using the 3D Gaussian Splatting (3DGS) format.

This technology has quickly become a standard for video content and world generation, as well as for training robots and self-driving car systems. From tech giants using it to map the real world to innovators generating synthetic environments, the demand for high-quality, real-world data to feed these models is skyrocketing.

At TVIS, we have developed a scalable system to meet this demand, with throughput measured in thousands of scenes a month. Here is a look at how we are capturing the world for the next generation of Machine Learning (ML).

High-Fidelity Data at Speed

To train robust world-generation models, you need a massive volume of scenes. Traditional methods often struggle to keep up with the pace required for mass collection.

Our approach enables the rendering of significant surface areas in 3D with six degrees of freedom (6DoF) navigation. By combining XGRIDS laser scanners with semi-automatically piloted drones carrying 8K cameras, we can generate scenes significantly faster than purely photogrammetric, DSLR-based methods. This efficiency is critical for gathering the quantity of representative scenes required for effective machine learning.

Bridging the Gap: Real World vs. User Demand

A major challenge in data collection for ML is that what users want to generate doesn’t always exist in abundance in the real world.

For example, popular prompts for video generation often include “Cyberpunk City Street at Night,” “Abandoned Industrial Factory,” or “Post-Apocalyptic Street.” While we cannot scan a fictional cyberpunk city, we explicitly factor in this gap between real-world availability and user demand.

Our data collection strategy focuses on capturing nineteen categories of scenes in countries across multiple continents. These categories supply the real-world material behind such concepts, covering a wide range of environments from industrial ruins and historical architecture to vast natural landscapes.

Global Scale, Local Precision

To date, TVIS has achieved a collection velocity of thousands of scenes per month, driven by our proprietary automation software. This system leverages LLMs and computer vision to streamline operations, while a real-time online dashboard allows us to monitor collection progress instantly.

We mobilize geographically distributed teams across Europe, Asia, and the Americas to execute this vision. To manage such a massive scale effectively, we use a rigorous scouting and site reservation process, maintaining the discipline required to meet and exceed customer expectations.

Our collection process is governed by strict quality controls. We also ensure equal representation of real-world objects to create balanced training datasets.
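
As a rough illustration of what a balance check could look like, the sketch below counts scenes per category and reports how many more each category would need to match the best-represented one. It assumes that "equal representation" means equal scene counts per category; the function name and the sample category labels are hypothetical and are not taken from TVIS tooling.

```python
from collections import Counter
from typing import Dict, Iterable, List


def category_shortfall(scene_categories: Iterable[str],
                       all_categories: List[str]) -> Dict[str, int]:
    """Return how many scenes each category lacks relative to the largest one.

    Assumption: balance is measured by scene count per category.
    """
    counts = Counter(scene_categories)
    # The target is the count of the best-represented category.
    target = max((counts[c] for c in all_categories), default=0)
    return {c: target - counts[c] for c in all_categories}


# Hypothetical sample: one label per collected scene.
collected = [
    "industrial ruins",
    "industrial ruins",
    "industrial ruins",
    "historical architecture",
    "natural landscape",
]
categories = ["industrial ruins", "historical architecture", "natural landscape"]

shortfall = category_shortfall(collected, categories)
```

A report like this could help decide which categories to prioritize in the next scouting round; a category with no scenes at all still appears in the result because the full category list is passed in explicitly.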

Rich Metadata for Better Models

Data without context is limited in its utility for machine learning. We provide a standardized method for describing every scene we collect to facilitate model fine-tuning.

Each scene is delivered with rich attributes, including:

Categorization: Detailed breakdowns of object types, from modern streets to rural farmhouses.
Lighting Conditions: Critical descriptors such as “overcast,” “hard light,” or “at dusk” to aid in rendering accuracy.
Automated & Verified Tagging: We leverage advanced tools to generate detailed descriptions and tags, which are then manually verified for accuracy.
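
To make the attribute list above concrete, here is a minimal sketch of how one scene record might be represented and serialized. The field names, the `SceneMetadata` class, and the `is_deliverable` rule are illustrative assumptions, not the actual TVIS delivery schema.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SceneMetadata:
    """Hypothetical per-scene record mirroring the attributes listed above."""
    scene_id: str
    category: str                  # e.g. "modern street", "rural farmhouse"
    lighting: str                  # e.g. "overcast", "hard light", "at dusk"
    tags: List[str] = field(default_factory=list)  # automatically generated tags
    description: str = ""          # automatically generated description
    manually_verified: bool = False  # set once a person has checked the tags


def is_deliverable(record: SceneMetadata) -> bool:
    """Assumed rule: a record ships only after manual verification
    and with both category and lighting filled in."""
    return bool(record.manually_verified and record.category and record.lighting)


def to_json(record: SceneMetadata) -> str:
    """Serialize the record to JSON for delivery alongside the scene."""
    return json.dumps(asdict(record), sort_keys=True)


example = SceneMetadata(
    scene_id="scene-0001",
    category="rural farmhouse",
    lighting="overcast",
    tags=["barn", "gravel road"],
)
```

Keeping the verification flag in the record itself means a downstream consumer can filter out scenes whose automatically generated tags have not yet been checked.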

Partnering for the Future

Whether for virtual production in LED studios or training the next large world model, TVIS provides the raw material necessary to build digital realities. We also generate 3DGS scenes from our source data. We work closely with top-level clients to develop product requirements, perform test collections, and ensure that our data represents the diversity of cultures and environments found in the real world.

Interested in learning more about our infrastructure? Watch our overview here.

TVIS: Large 3DGS data vendor