This Sample Retail Dataset is a small example of the synthetic data offered with our Unity Computer Vision Datasets offering. Our team of experts works with customers worldwide to generate custom datasets at any scale, tailored to their specific requirements.
This dataset is an example of what a retailer could capture using a camera or robotic system for shelf assessment.
How do I get the sample dataset?
Fill out this form to download this sample dataset. It includes:
- 1,000 synthetic RGB images
- 1,000 synthetic images with instance segmentation
- 1,000 synthetic images with semantic segmentation
- JSON metadata for each RGB image to locate 2D and 3D bounding boxes
Read on for more information about this sample dataset.
What can be done with a full retail synthetic dataset?
This type of dataset may be used for a range of computer vision applications in retail, such as:
- Verifying planogram-based product placement
- Assessing stock depletion
- Checking shelf tag positioning
- Assessing product organization
How are the images labeled?
All objects in the Unity scene have known positions, so their 2D coordinates in each rendered image are known as well. The Unity Perception Package takes advantage of this predetermined environment layout to label objects in several ways at the same time the images are generated. This example includes images that show 2D and 3D bounding boxes as well as instance and semantic segmentation.
This RGB image shows the visual quality and accuracy possible with Unity Computer Vision Datasets.
2D bounding boxes
The 2D bounding boxes precisely locate and label objects in screen space for recognition tasks.
2D bounding boxes
3D bounding boxes
The 3D bounding boxes provide precise world-space coordinates for each object's location.
3D bounding boxes
This image shows the instance segmentation of the dataset, where every labeled object is uniquely identified.
Semantic segmentation provides a clear, precise mask covering every object of a given class, such as all bags of a certain brand of chips or all boxes of a certain type of cereal.
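To make the segmentation imagery concrete, here is a minimal sketch of how per-object masks could be recovered from an instance segmentation image, assuming each labeled instance is rendered in a unique RGB color on a black background (the tiny 4×4 image below is synthetic, not taken from the dataset):

```python
import numpy as np

def masks_from_instance_image(rgb):
    """Split an instance-segmentation image into per-instance binary masks.

    Assumes each labeled instance is rendered in a unique RGB color and
    the background is black.
    """
    flat = rgb.reshape(-1, 3)
    colors = np.unique(flat, axis=0)
    masks = {}
    for color in colors:
        if not color.any():  # skip the black background
            continue
        mask = np.all(rgb == color, axis=-1)
        masks[tuple(int(c) for c in color)] = mask
    return masks

# Tiny synthetic example: two "instances" on a black background.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[0:2, 0:2] = (255, 0, 0)   # instance 1
img[2:4, 2:4] = (0, 255, 0)   # instance 2
masks = masks_from_instance_image(img)
print(len(masks))  # 2 instance masks, keyed by color
```

The same decoding idea applies to semantic segmentation, except that one color then corresponds to a whole class of objects rather than a single instance.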
What parameters are used to help diversify the scene?
To generate a diverse dataset, multiple parameters are adjusted automatically to vary the scene.
Variations in shelving
Variations in lighting
Camera position variations
Centered and straight
Low and angled
Variations in product facing
What is included in the JSON file?
The dataset includes labeling based on the standard COCO format. The Perception Package used to generate this dataset also outputs JSON files with metadata describing camera intrinsics and formatting, plus labeling data such as the 2D and 3D bounding box overlays, object counts, and other reference metrics. Details of the format can be found on our Synthetic Dataset Schema page.
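As an illustration, a COCO-style annotation file can be read with nothing but the standard library. The record below is a made-up single-image example following the generic COCO layout (`images`, `annotations`, `categories`, with boxes as `[x, y, width, height]`); the exact keys and extra metadata in the Perception output are documented on the Synthetic Dataset Schema page.

```python
import json

# Minimal COCO-style record with one image and one annotation.
# All values here are invented for illustration.
coco = json.loads("""
{
  "images": [{"id": 1, "file_name": "rgb_1.png", "width": 1280, "height": 720}],
  "annotations": [
    {"image_id": 1, "category_id": 7, "bbox": [412.0, 220.0, 96.0, 180.0]}
  ],
  "categories": [{"id": 7, "name": "cereal_box"}]
}
""")

names = {c["id"]: c["name"] for c in coco["categories"]}

# COCO boxes are [x, y, width, height]; convert to corner form for drawing.
boxes = []
for ann in coco["annotations"]:
    x, y, w, h = ann["bbox"]
    boxes.append((names[ann["category_id"]], (x, y, x + w, y + h)))

print(boxes)  # [('cereal_box', (412.0, 220.0, 508.0, 400.0))]
```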
How long did it take to generate this dataset?
Start to finish, this project took a few weeks to complete, including the time needed to create all the assets. Generating the actual 1,000-frame dataset took less than 5 minutes.
What can I do with this dataset?
This is a sample of what a full dataset would be, and it is likely too small to train a functioning machine learning model on its own. It will give you the following:
- Confidence in the visual quality possible with Unity-generated datasets
- Insight into the included metadata and segmentation imagery
- Ability to experiment with ingesting synthetic data into a machine learning pipeline
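For the last point, a first ingestion step usually just pairs each RGB image with its labels. The sketch below assumes a hypothetical flat layout in which each image `rgb_N.png` sits next to an `rgb_N.json` with its labels; adapt the globbing and file names to the actual folder structure of the downloaded dataset.

```python
import json
import tempfile
from pathlib import Path

def pair_frames(root):
    """Yield (rgb_path, labels) pairs for each frame in a dataset folder.

    Assumes a hypothetical layout where each 'rgb_N.png' has a sibling
    'rgb_N.json' containing its labels.
    """
    root = Path(root)
    for rgb in sorted(root.glob("rgb_*.png")):
        ann = rgb.with_suffix(".json")
        if ann.exists():
            yield rgb, json.loads(ann.read_text())

# Demo with a throwaway directory standing in for the downloaded sample.
demo = Path(tempfile.mkdtemp())
(demo / "rgb_1.png").write_bytes(b"")  # placeholder for a real image
(demo / "rgb_1.json").write_text('{"bounding_boxes": []}')
pairs = list(pair_frames(demo))
print(len(pairs))  # 1 frame paired with its labels
```

From here, the pairs can be fed into whatever training or evaluation loop your pipeline uses.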
The following sections dive into the details of asset creation in Unity.
What types of assets were used to create this dataset?
We used grocery assets, most of which were previously collected for our SynthDet project. In total, 63 diverse 3D models are placed on the shelves. These assets are based on real-world items that would be found in a retail store.
Examples of 3D assets used in this dataset
We also created custom price tags for every product that accurately reflect the item they are paired with, and each has a unique UPC barcode. This is important as many retail applications of computer vision assess planogram placement accuracy, and correlate between visual data and scanned barcodes.
Example of a price tag
Finally, the scene includes custom shelving that was created in Autodesk Maya and brought into Unity for a realistic-looking store section.
Scene setup showing lighting and camera placement
How did these assets get into Unity?
The assets used in this dataset were brought into Unity in a few ways. First, there are purely virtual models, which were designed on a computer and have no real-world counterpart. The price tags were created from scratch in Adobe Illustrator, with a barcode generator used to make accurate barcodes, and, as previously mentioned, the shelving was made in Autodesk Maya and imported into Unity.
Objects that do have a real-world counterpart, such as all the retail items, were brought into Unity using a couple of different methods depending on the complexity of the object. For simple rectangular objects with flat surfaces, a flatbed scanner was used to scan each side of the object. For more complex items, our internal solutions team used a photogrammetry solution to create accurate 3D models.
How were these assets placed in the scene?
Products are placed on the shelves using a custom placement randomizer that checks for upward-facing shelf surfaces. Because each shelf is a separate object, the randomizer determines product placement from the shelf's position and its acceptable surface area. This controls product fronting and straightening, misplaced products, spacing, and the placement of the associated shelf tags.
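The placement idea can be sketched in a few lines: step along one upward-facing shelf surface, placing products left to right with a randomized gap between neighbors. This is only an illustration of the spacing logic; the actual dataset uses a custom C# randomizer built on the Perception Package, and all dimensions below are made up.

```python
import random

def place_products(surface_width, product_width, min_gap, max_gap, rng):
    """Fill one shelf surface left to right with randomized spacing.

    Illustrative sketch only; the real placement randomizer runs inside
    Unity and also handles fronting, straightening, and misplacement.
    """
    positions = []
    x = rng.uniform(0.0, max_gap)  # randomize where the run of products starts
    while x + product_width <= surface_width:
        positions.append(x)
        x += product_width + rng.uniform(min_gap, max_gap)
    return positions

# Hypothetical dimensions in centimeters: a 100 cm shelf, 8 cm products.
rng = random.Random(0)
positions = place_products(100.0, 8.0, 0.5, 3.0, rng)
print(positions)
```

Each run of the randomizer produces a different shelf arrangement, which is exactly what drives variety across the 1,000 frames.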
Once the products are placed on the shelves, each frame is captured by the camera. The camera position is randomized within a set volume, shown below in transparent yellow. This varies the position, rotation, angle, height, and focal distance of the camera while keeping the shelving in view.
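A minimal sketch of that camera randomization: sample a position uniformly inside an axis-aligned volume, then aim the camera at the shelving. The volume and target coordinates below are invented for illustration, and the real dataset uses the Perception Package's camera randomizer in Unity, which also varies focal distance.

```python
import random

def sample_camera(volume_min, volume_max, target, rng):
    """Sample a camera pose: uniform position in a box, aimed at a target.

    Illustrative only; in Unity the look direction would be converted
    into a rotation for the camera transform.
    """
    pos = tuple(rng.uniform(lo, hi) for lo, hi in zip(volume_min, volume_max))
    look_dir = tuple(t - p for t, p in zip(target, pos))  # aim at the shelf
    return pos, look_dir

# Hypothetical volume in front of the shelf (meters), shelf centered at origin.
rng = random.Random(1)
pos, look = sample_camera((-1.0, 1.0, -3.0), (1.0, 2.0, -2.0),
                          (0.0, 1.5, 0.0), rng)
```

Because the camera always points back at a fixed target, every sampled pose keeps the shelving in frame while still varying the viewpoint.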
Camera randomization area