Unity Computer Vision Datasets (Retail sample)

May 12, 2021 Natalia De Los Rellez

This Sample Retail Dataset is a small example of the synthetic data available through Unity Computer Vision Datasets. Our team of experts works with customers worldwide to generate custom datasets at any scale, tailored to their specific requirements.

This dataset is an example of what a retailer could capture using a camera or robotic system for shelf assessment.

How do I get the sample dataset?

Fill out this form to download this sample dataset. It includes:

  • 1,000 synthetic RGB images
  • 1,000 synthetic images with instance segmentation
  • 1,000 synthetic images with semantic segmentation
  • JSON metadata for each RGB image to locate 2D and 3D bounding boxes

Read on for more information about this sample dataset.

What can be done with a full retail synthetic dataset?

This type of dataset may be used for a range of computer vision applications in retail, such as:

  • Verifying planogram-based product placement
  • Assessing stock depletion
  • Checking shelf tag positioning
  • Assessing product organization
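To make the first two use cases concrete, a planogram check can be reduced to comparing the product expected in each shelf slot with the product a detector reports there. The sketch below is illustrative only. It assumes a hypothetical layout keyed by (shelf, slot) and products identified by UPC, and it leaves out the detection step itself. An empty slot is reported as "EMPTY", which is how stock depletion would surface.

```python
from typing import Dict, List, Tuple

Slot = Tuple[int, int]          # hypothetical (shelf index, slot index)
Planogram = Dict[Slot, str]     # slot -> UPC

def planogram_issues(expected: Planogram, detected: Planogram) -> List[Tuple[Slot, str, str]]:
    """Return (slot, expected_upc, found) for every slot that does not match the plan.

    Slots with no detection are reported as "EMPTY" (possible stock depletion).
    """
    issues = []
    for slot in sorted(expected):
        found = detected.get(slot, "EMPTY")
        if found != expected[slot]:
            issues.append((slot, expected[slot], found))
    return issues

expected = {(0, 0): "036000291452", (0, 1): "012345678905", (1, 0): "042100005264"}
detected = {(0, 0): "036000291452", (1, 0): "012345678905"}
print(planogram_issues(expected, detected))
```

Here slot (0, 1) is flagged as empty and slot (1, 0) as holding the wrong product.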

How are the images labeled?

Because every object in the Unity scene has a known position and identity, its location in each rendered image is known exactly. The Unity Perception Package takes advantage of this predetermined environment layout to label objects in several ways at the same time as the images are generated. This example includes images that show 2D and 3D bounding boxes as well as instance and semantic segmentation.
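Labels come for free because of the camera model: once an object's position relative to the camera is known, its pixel location follows directly. Below is a minimal sketch of the standard pinhole projection. It makes three assumptions: camera-space points have +z pointing away from the camera, image y grows downward, and K is a hypothetical 3×3 intrinsic matrix. Unity's own coordinate conventions may differ.

```python
from typing import Sequence, Tuple

Matrix3 = Sequence[Sequence[float]]

def project_point(K: Matrix3, point: Tuple[float, float, float]) -> Tuple[float, float]:
    """Map a camera-space point (x, y, z) to pixel coordinates (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    fx, cx = K[0][0], K[0][2]  # horizontal focal length and principal point, in pixels
    fy, cy = K[1][1], K[1][2]  # vertical focal length and principal point, in pixels
    return fx * x / z + cx, fy * y / z + cy

# Hypothetical intrinsics for a 640x480 image
K = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]
u, v = project_point(K, (0.1, -0.05, 2.0))
```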

RGB imagery

This RGB image shows the visual quality and accuracy possible with Unity Computer Vision Datasets.

RGB image

2D bounding boxes

2D bounding boxes precisely locate and label objects in screen space for recognition.
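Screen-space boxes can be derived from a per-pixel instance ID image. This approach naturally accounts for occlusion, because only visible pixels count. The sketch shows that idea only; it does not describe how the Perception Package computes its boxes. The input is a hypothetical nested list of instance IDs, with 0 as background.

```python
from typing import Dict, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), top-left origin, in pixels

def boxes_from_instance_ids(ids: List[List[int]], background: int = 0) -> Dict[int, Box]:
    """Tight 2D box around the visible pixels of every instance in an ID image."""
    extents: Dict[int, List[int]] = {}  # instance id -> [x_min, y_min, x_max, y_max]
    for y, row in enumerate(ids):
        for x, inst in enumerate(row):
            if inst == background:
                continue
            e = extents.setdefault(inst, [x, y, x, y])
            e[0] = min(e[0], x)
            e[1] = min(e[1], y)
            e[2] = max(e[2], x)
            e[3] = max(e[3], y)
    # +1 because a box covering columns x_min..x_max inclusive is (x_max - x_min + 1) pixels wide
    return {i: (e[0], e[1], e[2] - e[0] + 1, e[3] - e[1] + 1) for i, e in extents.items()}

ids = [
    [0, 7, 7, 0],
    [0, 7, 7, 3],
    [0, 0, 3, 3],
]
boxes = boxes_from_instance_ids(ids)
```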

2D bounding boxes

3D bounding boxes

3D bounding boxes provide the precise world-space coordinates of each object's location.
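A 3D box is usually stored compactly as a center, a size, and a rotation. Its eight corners are recovered only when needed, for example to draw the overlay. The sketch below assumes a unit quaternion in (x, y, z, w) order, as Unity uses. Check the exact field names and coordinate frame in the dataset's JSON against the Synthetic Dataset Schema page.

```python
from itertools import product
from typing import List, Tuple

Vec3 = Tuple[float, float, float]
Quat = Tuple[float, float, float, float]  # (x, y, z, w)

def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def rotate(q: Quat, v: Vec3) -> Vec3:
    """Rotate v by unit quaternion q using v' = v + w*t + u x t, where t = 2 (u x v)."""
    u, w = (q[0], q[1], q[2]), q[3]
    c = cross(u, v)
    t = (2 * c[0], 2 * c[1], 2 * c[2])
    ut = cross(u, t)
    return (v[0] + w * t[0] + ut[0],
            v[1] + w * t[1] + ut[1],
            v[2] + w * t[2] + ut[2])

def box_corners(center: Vec3, size: Vec3, q: Quat) -> List[Vec3]:
    """The eight corners of an oriented box, in the same frame as `center`."""
    half = [s / 2 for s in size]
    corners = []
    for signs in product((-1, 1), repeat=3):
        local = (signs[0] * half[0], signs[1] * half[1], signs[2] * half[2])
        r = rotate(q, local)
        corners.append((center[0] + r[0], center[1] + r[1], center[2] + r[2]))
    return corners

corners = box_corners((1.0, 2.0, 3.0), (2.0, 4.0, 6.0), (0.0, 0.0, 0.0, 1.0))
```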

3D bounding boxes

Instance segmentation

This image shows the instance segmentation of the dataset, where every labeled object is uniquely identified.
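In an instance segmentation image, each labeled object is typically painted in its own color, and the accompanying metadata maps colors back to instance IDs. Assuming that layout (the color mapping below is hypothetical), you can count each instance's visible pixels in a single pass over the image. A small pixel count is a quick signal that an object is heavily occluded.

```python
from collections import Counter
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

def pixels_per_instance(image: List[List[RGB]], color_to_instance: Dict[RGB, int]) -> Counter:
    """Count visible pixels per instance; colors without a mapping are treated as background."""
    counts: Counter = Counter()
    for row in image:
        for color in row:
            inst = color_to_instance.get(color)
            if inst is not None:
                counts[inst] += 1
    return counts

image = [[(0, 0, 0), (255, 0, 0)],
         [(255, 0, 0), (0, 255, 0)]]
counts = pixels_per_instance(image, {(255, 0, 0): 1, (0, 255, 0): 2})
```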

Instance segmentation

Semantic segmentation

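Semantic segmentation provides a clear and precise mask that identifies every pixel belonging to a given class of objects, such as all bags of a certain brand of chips or all boxes of a certain type of cereal. Unlike instance segmentation, it does not distinguish individual items within a class.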
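To train a segmentation model, the color mask is usually converted to a map of class indices. This sketch assumes a hypothetical color palette. Any color not in the palette falls back to background class 0.

```python
from typing import Dict, List, Tuple

RGB = Tuple[int, int, int]

# Hypothetical palette: mask color -> class index
PALETTE: Dict[RGB, int] = {
    (255, 200, 0): 1,  # e.g. one brand of chips
    (0, 120, 255): 2,  # e.g. one type of cereal box
}

def to_class_map(mask: List[List[RGB]], palette: Dict[RGB, int], background: int = 0) -> List[List[int]]:
    """Replace every pixel color with its class index."""
    return [[palette.get(color, background) for color in row] for row in mask]

mask = [[(255, 200, 0), (255, 200, 0)],
        [(0, 0, 0), (0, 120, 255)]]
class_map = to_class_map(mask, PALETTE)
```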

Semantic segmentation

What parameters are used to help diversify the scene?

To generate a diverse dataset, several scene parameters are adjusted automatically to add variety to the scene.

Variations in shelving

Tan shelving

Dark shelving

Variations in lighting

Bright lighting

Dim lighting

Variations in camera position

Centered and straight

Low and angled

Variations in product facing

Forward facing

Backwards facing
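The variations above amount to drawing a fresh set of parameters for each frame. In Unity this is done with Perception randomizers, not Python. The sketch below only illustrates the sampling logic. Every parameter name and range in it is hypothetical, and product facing is simplified to a single per-frame fraction of products turned backward. A fixed seed makes the same dataset reproducible.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class FrameParams:
    shelf_color: str                 # "tan" or "dark"
    light_intensity: float           # relative multiplier (hypothetical range)
    camera_height_m: float
    camera_yaw_deg: float
    backward_facing_fraction: float  # share of products turned away from the front

def sample_frame(rng: random.Random) -> FrameParams:
    """Draw one independent set of scene parameters (all ranges are hypothetical)."""
    return FrameParams(
        shelf_color=rng.choice(["tan", "dark"]),
        light_intensity=rng.uniform(0.3, 1.5),
        camera_height_m=rng.uniform(0.4, 1.8),
        camera_yaw_deg=rng.uniform(-30.0, 30.0),
        backward_facing_fraction=rng.uniform(0.0, 0.3),
    )

rng = random.Random(42)  # fixed seed so the same set of frames can be regenerated
frames = [sample_frame(rng) for _ in range(1000)]
```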

What is included in the JSON file?

The dataset's labels are based on the standard COCO format. In addition, the Perception Package used to generate this dataset outputs JSON files with metadata. These describe camera intrinsics, file formats, and labels such as the 2D and 3D bounding box overlays, plus object counts and other reference metrics. Details of the format can be found on our Synthetic Dataset Schema page.
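To show how the two formats relate, the sketch below converts 2D box entries from a Perception-style capture record into COCO-style annotation dicts. The input field names (captures, annotations, values, label_id, x, y, width, height) are our assumption about the schema and should be checked against the Synthetic Dataset Schema page. The output uses COCO's bbox convention of [x, y, width, height] from the top-left corner.

```python
import json
from typing import Any, Dict, List

def to_coco_annotations(captures: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn 2D box entries from a Perception-style capture file into COCO-style annotations."""
    annotations: List[Dict[str, Any]] = []
    for image_id, capture in enumerate(captures["captures"], start=1):
        for annotation in capture.get("annotations", []):
            for value in annotation.get("values", []):
                if not all(k in value for k in ("x", "y", "width", "height")):
                    continue  # not a 2D box entry (e.g. a 3D box or a segmentation reference)
                annotations.append({
                    "id": len(annotations) + 1,
                    "image_id": image_id,
                    "category_id": value["label_id"],
                    "bbox": [value["x"], value["y"], value["width"], value["height"]],
                    "area": value["width"] * value["height"],
                    "iscrowd": 0,
                })
    return annotations

# Hypothetical capture record, for illustration only
sample = json.loads("""{"captures": [{"filename": "RGB/rgb_2.png", "annotations": [
  {"values": [{"label_id": 12, "label_name": "cereal_box", "instance_id": 4,
               "x": 100.0, "y": 50.0, "width": 40.0, "height": 80.0}]}]}]}""")
coco = to_coco_annotations(sample)
```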

How long did it take to generate this dataset?

Including the time needed to create all the assets, this project took a few weeks from start to finish. Generating the 1,000-frame dataset itself took less than 5 minutes.
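As a rough calculation from those two figures, generation averages under a third of a second per frame over the run. That figure covers rendering and labeling together, since labels are produced as the images are generated:

```latex
t_{\text{frame}} < \frac{5\ \text{min} \times 60\ \text{s/min}}{1000\ \text{frames}} = 0.3\ \text{s/frame},
\qquad
\text{throughput} > \frac{1000\ \text{frames}}{300\ \text{s}} \approx 3.3\ \text{frames/s}
```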

What can I do with this dataset?

This is only a sample of a full dataset and is likely too small to train a functioning machine learning model on its own. It does give you the following:

  • Confidence in the visual quality possible with Unity-generated datasets
  • Insight into the included metadata and segmentation imagery
  • Ability to experiment with ingesting synthetic data into a machine learning pipeline
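A typical first step when ingesting the sample into a pipeline is a reproducible train/validation split. The sketch below assumes hypothetical file names and an 80/20 split. Sorting before shuffling with a fixed seed makes the split independent of the order in which files are listed.

```python
import random
from typing import List, Tuple

def split_frames(filenames: List[str], val_fraction: float = 0.2, seed: int = 0) -> Tuple[List[str], List[str]]:
    """Deterministically split frame file names into (train, validation) lists."""
    shuffled = sorted(filenames)  # sort first so the result doesn't depend on input order
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]

frames = [f"rgb_{i}.png" for i in range(1000)]  # hypothetical file names
train, val = split_frames(frames)
print(len(train), len(val))  # 800 200
```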


The following sections go into the details of asset creation in Unity.

What types of assets were used to create this dataset?

We used grocery assets, most of which were collected earlier for our SynthDet project. In total, 63 diverse 3D models are placed on the shelves. These assets are based on real-world items found in a retail store.

Examples of 3D assets used in this dataset

We also created a custom price tag for every product that accurately reflects the item it is paired with, and each tag has a unique UPC barcode. This matters because many retail computer vision applications assess planogram placement accuracy and correlate visual data with scanned barcodes.
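A barcode is only accurate if its check digit is right. The article does not say which barcode generator was used. The sketch below shows the standard UPC-A check-digit rule that any such generator applies to the first 11 digits: triple the sum of the digits in odd positions, add the digits in even positions, and take the amount needed to reach the next multiple of 10.

```python
def upc_a_check_digit(first11: str) -> int:
    """Standard UPC-A check digit for an 11-digit payload."""
    if len(first11) != 11 or not first11.isdecimal():
        raise ValueError("expected exactly 11 digits")
    digits = [int(c) for c in first11]
    odd = sum(digits[0::2])   # positions 1, 3, ..., 11
    even = sum(digits[1::2])  # positions 2, 4, ..., 10
    return (10 - (3 * odd + even) % 10) % 10

def is_valid_upc_a(code: str) -> bool:
    """True if `code` is 12 digits and its last digit matches the computed check digit."""
    return len(code) == 12 and code.isdecimal() and upc_a_check_digit(code[:11]) == int(code[11])

print(upc_a_check_digit("03600029145"))  # 2
```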

Example of a price tag

Finally, the scene includes custom shelving, created in Autodesk Maya and imported into Unity, for a realistic-looking store section.

Scene setup showing lighting and camera placement

How did these assets get into Unity?

The assets in this dataset were brought into Unity in a few different ways. First, there are purely virtual models, designed on the computer with no real-world counterpart. The price tags were created from scratch in Adobe Illustrator, and a barcode generator was used to make accurate barcodes. As previously mentioned, the shelving was made in Autodesk Maya and imported into Unity.

Objects with a real-world counterpart, such as all of the retail items, were brought into Unity using a couple of different methods depending on the object's complexity. For simple rectangular objects with flat surfaces, each side of the object was scanned on a flatbed scanner. For more complex objects, our internal solutions team used photogrammetry to create accurate 3D models.
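The article does not say how the flat scans were turned into correctly sized models. One plausible step is to convert each scan's pixel dimensions to real-world size using the scan resolution: length in mm = pixels / dpi × 25.4. The sketch assumes a hypothetical convention in which the front scan gives width by height and a side scan gives depth by height.

```python
from typing import Tuple

MM_PER_INCH = 25.4

def px_to_mm(pixels: int, dpi: float) -> float:
    """Physical length of a scanned span: pixels / (pixels per inch) * mm per inch."""
    return pixels / dpi * MM_PER_INCH

def box_dimensions_mm(front_px: Tuple[int, int], side_px: Tuple[int, int], dpi: float) -> Tuple[float, float, float]:
    """(width, height, depth) of a box from two scans (hypothetical front/side convention)."""
    width = px_to_mm(front_px[0], dpi)
    height = px_to_mm(front_px[1], dpi)
    depth = px_to_mm(side_px[0], dpi)
    return width, height, depth

w, h, d = box_dimensions_mm((1800, 2700), (600, 2700), dpi=300)
```

At 300 dpi, 1,800 pixels correspond to 6 inches, or 152.4 mm.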

Asset production

How were these assets placed in the scene?

Products are placed on the shelves by a custom placement randomizer that looks for upward-facing shelf surfaces. Because each shelf is a separate object, the randomizer determines product placement from the shelf's position and its usable surface. This controls product fronting and straightening, misplaced products, spacing, and the placement of the associated shelf tags.
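The custom randomizer itself runs in Unity and works on 3D shelf surfaces. The Python sketch below only illustrates the core idea along one shelf: products are laid out left to right with random spacing and an occasional product turned away from the front, until the surface is full. Each product then gets one tag under its first facing. The widths, gap range, turn probability, and tag rule are all hypothetical, and misplaced products are not modeled.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class Placement:
    upc: str
    x_m: float      # left edge along the shelf, in meters
    width_m: float
    turned: bool    # True if the product faces away from the front

def place_on_shelf(shelf_width_m: float, products: List[Tuple[str, float]], rng: random.Random,
                   max_gap_m: float = 0.02, p_turned: float = 0.1) -> List[Placement]:
    """Place (upc, width_m) products left to right on one shelf surface until it is full."""
    placements: List[Placement] = []
    x = 0.0
    for upc, width in products:
        if x + width > shelf_width_m:
            break  # no room left on this surface
        placements.append(Placement(upc, x, width, rng.random() < p_turned))
        x += width + rng.uniform(0.0, max_gap_m)  # random spacing between facings
    return placements

def tag_positions(placements: List[Placement]) -> Dict[str, float]:
    """One shelf tag per product, centered under its first facing (hypothetical rule)."""
    tags: Dict[str, float] = {}
    for p in placements:
        tags.setdefault(p.upc, p.x_m + p.width_m / 2)
    return tags

rng = random.Random(7)
row = place_on_shelf(1.0, [("012345678905", 0.3)] * 5, rng)
tags = tag_positions(row)
```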

Once the products are placed on the shelves, the camera captures the frame. The camera's placement is randomized within a set volume, shown below in transparent yellow. This varies the position, rotation, angle, height, and focal distance of the camera while keeping it focused on the shelving.
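One simple way to vary the camera while keeping it aimed at the shelving is to sample a position uniformly inside an axis-aligned volume and then compute the yaw and pitch that point it at a fixed target. This is a sketch under two assumptions: a Unity-like frame with +y up and +z forward, where positive yaw turns toward +x and positive pitch tilts down, and a "focal distance" that we treat as a lens focal length drawn from a hypothetical range.

```python
import math
import random
from typing import Tuple

Vec3 = Tuple[float, float, float]

def sample_camera(rng: random.Random, vol_min: Vec3, vol_max: Vec3, target: Vec3,
                  focal_mm: Tuple[float, float] = (24.0, 50.0)):
    """Return (position, yaw_deg, pitch_deg, focal) for a camera inside the volume looking at target."""
    position = tuple(rng.uniform(lo, hi) for lo, hi in zip(vol_min, vol_max))
    dx, dy, dz = (t - p for t, p in zip(target, position))
    yaw = math.degrees(math.atan2(dx, dz))                     # rotation about the vertical axis
    pitch = math.degrees(math.atan2(-dy, math.hypot(dx, dz)))  # positive = looking down
    focal = rng.uniform(*focal_mm)                             # hypothetical lens range, in mm
    return position, yaw, pitch, focal

pos, yaw, pitch, focal = sample_camera(random.Random(0), (-1.0, 0.5, -3.0), (1.0, 1.8, -2.0), (0.0, 1.0, 0.0))
```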

Camera randomization area

Get started with synthetic data today! Learn more about Unity Computer Vision or contact us to talk to our computer vision experts about purchasing a bespoke dataset of your own.
