This Home Interior Dataset is a small example of the synthetic data provided by our Unity Computer Vision Datasets offering. Our team of experts works with customers worldwide to generate custom datasets at any scale, tailored to their specific requirements.
This dataset is an example of what a consumer electronics or smart home technology company could capture using a camera or robotic system.
How do I get the sample dataset?
Click the button below to download this sample dataset.
It features two types of homes (a multi-level townhome and a traditional two-story home) with major living areas, kitchens, and bedrooms. The dataset includes:
- 1,000 synthetic RGB images
- 1,000 synthetic images with instance segmentation
- 1,000 synthetic images with semantic segmentation
- JSON metadata for each RGB image to locate 2D and 3D bounding boxes
- Unity Computer Vision Dataset Visualizer, a Python-based tool that allows you to visualize datasets created using Unity Computer Vision tools
What can be done with a full home synthetic dataset?
This type of dataset may be used for a range of computer vision applications for smart homes, such as:
- Detecting animals, people and other items on smart cameras
- Home security
- Interior navigation for robots
- Smart appliances (e.g., refrigerators) and smart home hubs
- Interior design recommendations
How are the images labeled?
All objects in the Unity scene have known 2D coordinates in the image. The Unity Perception Package takes advantage of the predetermined environment layout to label objects in several ways at the same time the images are generated. This example includes images that show 2D and 3D bounding boxes as well as instance and semantic segmentation.
This RGB image shows the visual quality and accuracy possible with Unity Computer Vision Datasets.
2D bounding boxes
2D bounding boxes precisely locate and label objects in screen space for recognition tasks.
3D bounding boxes
3D bounding boxes provide precise world-space coordinates for object locations.
This image shows the instance segmentation of the dataset, where every labeled object is uniquely identified.
Semantic segmentation provides a clear, precise mask that identifies every pixel belonging to a class of objects, such as chairs or other furniture.
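Because instance segmentation renders each labeled object in a unique color, per-instance masks can be recovered simply by grouping pixels by color. The sketch below is a hypothetical illustration with pixel data inlined as nested tuples; a real pipeline would first decode the segmentation PNG with an image library such as Pillow.

```python
# Hypothetical sketch: recover per-instance pixel masks from an
# instance-segmentation image, where each object has a unique color.

def instance_masks(pixels):
    """Group pixel coordinates by their unique instance color."""
    masks = {}
    for y, row in enumerate(pixels):
        for x, color in enumerate(row):
            masks.setdefault(color, []).append((x, y))
    return masks

# Tiny 2x3 example: background (0,0,0), one red object, one green object.
image = [
    [(0, 0, 0), (255, 0, 0), (255, 0, 0)],
    [(0, 0, 0), (0, 255, 0), (0, 255, 0)],
]
masks = instance_masks(image)  # three distinct colors -> three masks
```

The same grouping applied to a semantic-segmentation image would yield one mask per class rather than per object, since all instances of a class share a color.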
What parameters were used to help diversify the scene?
To generate a diverse dataset, multiple parameters are adjusted automatically to introduce variety into the scene. The house plans were selected to provide a wide range of spatial configurations, windows and doors, ceiling heights, stairs, kitchen layouts, and lighting.
Afternoon sun angle
Sunset sun angle
Camera position variations
Centered and straight
Low and angled
What is included in the JSON file?
The dataset includes labeling based on a standard COCO format. The Perception Package used to generate this dataset also outputs JSON files with metadata describing camera intrinsics, formatting, and labeling, such as the 2D and 3D bounding box overlays, plus object counts and other reference metrics. Details of the format can be found on our Synthetic Dataset Schema page.
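Consuming the metadata amounts to walking the JSON and pulling out the annotation records you need. The snippet below is illustrative only: the field names (captures, annotations, values, and so on) follow the general shape of Perception-style output, but the authoritative layout is documented on the Synthetic Dataset Schema page.

```python
import json

# Illustrative record only; consult the Synthetic Dataset Schema page
# for the exact field names and structure.
sample = json.loads("""
{
  "captures": [{
    "filename": "rgb_2.png",
    "annotations": [{
      "id": "bounding box",
      "values": [
        {"label_name": "chair", "x": 10, "y": 20, "width": 64, "height": 48}
      ]
    }]
  }]
}
""")

def boxes_for(capture, annotation_id="bounding box"):
    """Collect (label, x, y, w, h) tuples from one capture record."""
    out = []
    for ann in capture.get("annotations", []):
        if ann.get("id") != annotation_id:
            continue
        for v in ann.get("values", []):
            out.append((v["label_name"], v["x"], v["y"], v["width"], v["height"]))
    return out

boxes = boxes_for(sample["captures"][0])
```

Extracted tuples in this form can be drawn over the matching RGB image or converted to whatever box format a training framework expects.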
What can I do with the dataset visualizer?
Unity Computer Vision Dataset Visualizer is a Python-based tool that allows you to visualize and explore datasets created using Unity Computer Vision tools.
The main features include:
- Ability to easily switch datasets by selecting a dataset folder
- Grid view of all frames in the dataset with the ability to change zoom level
- Individual frame view along with the JSON data associated with each frame
- Labeler (ground truth) overlay on frames, both in the grid view and individual frame view. Supported types of ground truth include 2D and 3D bounding boxes, semantic and instance segmentation, and keypoints.
- Ability to turn each type of ground truth overlay on or off
Requirements for Dataset Visualizer
- Windows 10 or macOS
- Chrome, Firefox or Safari 14 and newer (older versions of Safari are not supported)
- Python 3.7 or 3.8 (Note: This application is not compatible with Python 3.9)
How long did it take to generate this dataset?
Generating the 1,000-frame dataset itself took less than 8 minutes.
What can I do with this dataset?
This is a sample of a full dataset and does not have the quantity or diversity of images required to train a production machine learning model. It will give you the following:
- Confidence in the visual quality possible with Unity-generated datasets
- An example of the labels we can generate from the data
- Ability to experiment with ingesting synthetic data into a machine learning pipeline
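Ingesting the sample into a pipeline usually starts with pairing each RGB frame with its segmentation mask. The sketch below is a minimal, hedged example: the filename patterns (a shared numeric frame suffix) are assumptions for illustration and should be adapted to the file names in the downloaded dataset.

```python
# Minimal sketch of pairing RGB frames with segmentation masks before
# handing them to a training loop. Filename patterns are assumed.

def pair_frames(rgb_files, mask_files):
    """Match RGB and mask filenames that share a trailing frame number."""
    def frame_id(name):
        stem = name.rsplit(".", 1)[0]
        return stem.rsplit("_", 1)[-1]
    masks = {frame_id(m): m for m in mask_files}
    return [(r, masks[frame_id(r)])
            for r in sorted(rgb_files) if frame_id(r) in masks]

pairs = pair_frames(["rgb_2.png", "rgb_3.png"],
                    ["segmentation_3.png", "segmentation_2.png"])
```

From here, each (image, mask) pair can be loaded and fed into a framework-specific dataset class for training or evaluation.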
What types of assets were used to create this dataset?
The content team modeled the houses from scratch with all components set up for domain randomization, including interior/exterior doors, kitchen cabinetry and appliances, windows, and even the wall paint.
We licensed the furniture from content partners and prepared it for labeling and domain randomization.
How did these assets get into Unity?
These are purely virtual models. Most of the modeling was done in Autodesk Maya with materials created in Adobe Substance Designer.
How were these assets placed in the scene?
The house interior was assembled like a typical Unity scene as a collection of imported meshes and Prefabs. Furniture is placed using a structured grammar-based placement system, developed by Unity’s computer vision engineers. The project uses the High-Definition Render Pipeline (HDRP) with a combination of real-time and baked lighting.
Everything in the house was set up with labels and randomizers from the Unity Computer Vision Perception Package.
Door Prefab, showing components, materials, and randomization