Nilvana Vision Studio - Dataset Management

Follow the instructions below to manage datasets with Nilvana Vision Studio.

Dataset Management

Machine learning has a famous saying: garbage in, garbage out. The quality of your data is the most critical success factor, so you have to sift through large volumes of data to find the samples that are correct and valid. Managing, sorting, and sharing that data with your partners is not always easy, however. Vision Studio helps reduce the difficulty of managing and sharing data.

Dataset Adding

In addition to unannotated image data, Vision Studio supports a variety of annotation formats for creating a dataset. Upload the corresponding compressed data file according to our instructions to load the annotated data you already have, and start using the functions provided by Vision Studio.

Dataset Merging

In many cases, you may want to split your data cleanly into several datasets so that it can be used flexibly later. The Merge Dataset tool lets you merge the datasets required for a given model training task. During merging, the system removes duplicate data and preserves the annotations. Beyond merging, you can also manage various experimental combinations of datasets. If you are not sure how to use this tool, please refer to "Nilvana Vision Studio - Dataset Versioning".
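The merge behavior described above (drop duplicate data, keep the annotations) can be illustrated with a minimal Python sketch. This is not Vision Studio's implementation: the `merge_datasets` function, the `(image_bytes, annotations)` data layout, and the choice to detect duplicates by SHA-256 content hash are all assumptions made for illustration.

```python
import hashlib


def merge_datasets(*datasets):
    """Merge datasets, dropping duplicate images but keeping their annotations.

    Each dataset is a list of (image_bytes, annotations) pairs. This layout is
    hypothetical and is not Vision Studio's storage format.
    """
    merged = {}  # content hash -> (image_bytes, combined annotations)
    for dataset in datasets:
        for image_bytes, annotations in dataset:
            # Assumption: two images are duplicates when their bytes are identical.
            key = hashlib.sha256(image_bytes).hexdigest()
            kept = merged.setdefault(key, (image_bytes, []))[1]
            for annotation in annotations:
                if annotation not in kept:  # skip annotations already recorded
                    kept.append(annotation)
    return list(merged.values())


# Two small example datasets that share one image.
dataset_a = [(b"image-1", [{"label": "cat"}]), (b"image-2", [])]
dataset_b = [(b"image-1", [{"label": "cat"}, {"label": "dog"}])]
merged = merge_datasets(dataset_a, dataset_b)
```

Here `merged` contains two entries: the shared image appears once and carries the union of its annotations. Hashing the content catches only byte-identical files, so a resized or re-encoded copy would be treated as a different image in this sketch.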

Dataset Sharing

Annotation is a time-consuming part of machine learning, and you will often need partners to help with the annotation work and to review the results. To enable real-time collaboration, you can add internal system members when creating the dataset, or invite external members who have not yet registered. Once members join the dataset, they can work together to complete the annotation.

Dataset Statistics

As stated at the beginning, the quality of the data is the most critical success factor. We provide dataset statistics to help you identify the more influential data types. A histogram shows how many annotations each image contains, so you can see whether the objects you are annotating are spread across the dataset or concentrated in a small number of images.
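To make the idea of an annotations-per-image histogram concrete, here is a small Python sketch. It assumes you already have a list with the number of annotations in each image; the function names and the sample counts are hypothetical and unrelated to how Vision Studio computes or draws its charts.

```python
from collections import Counter


def annotations_per_image(counts):
    """Return {annotations_in_image: number_of_images}, sorted by key."""
    return dict(sorted(Counter(counts).items()))


def render_histogram(hist):
    """Render the histogram as text, one '#' per image."""
    return "\n".join(f"{k:>3} | {'#' * v}" for k, v in hist.items())


# Hypothetical per-image annotation counts for a small dataset.
counts = [0, 1, 1, 2, 2, 2, 7]
hist = annotations_per_image(counts)
print(render_histogram(hist))
```

A long tail (such as the single image with 7 annotations here) or a large bar at 0 is the kind of pattern worth checking before training.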

Many model training pipelines expect square input images. The size distribution and the width-height distribution let you examine the median image size and the distribution of width-to-height ratios, which helps you decide how to resize the images appropriately. For example, if most images are close to a median size of 500x375 and the width-to-height ratio is high, the height should probably be set lower than 375 when resizing. This keeps images from being overstretched or shrunk to the point where objects are excessively distorted and data quality suffers. We provide the Preprocessing tool to help you quickly resize your images. If you are not sure how to use this tool, please refer to "Nilvana Vision Studio - Preprocessing".
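The same reasoning can be reproduced offline with a few lines of Python. The sketch below computes the median width, median height, and median width-to-height ratio from a list of image sizes, then derives a resize target that fits inside a square without changing the aspect ratio. The function names, the sample sizes, and the 416-pixel target are assumptions for illustration; the Preprocessing tool may resize differently.

```python
from statistics import median


def size_summary(sizes):
    """Summarize a list of (width, height) pairs."""
    return {
        "median_width": median(w for w, _ in sizes),
        "median_height": median(h for _, h in sizes),
        "median_aspect_ratio": median(w / h for w, h in sizes),
    }


def fit_within(width, height, target):
    """Scale (width, height) so the longer side equals `target`,
    preserving the aspect ratio to avoid distorting objects."""
    scale = target / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical image sizes; the median matches the 500x375 example above.
sizes = [(400, 300), (500, 375), (640, 480)]
summary = size_summary(sizes)
new_size = fit_within(summary["median_width"], summary["median_height"], 416)
```

For a landscape median of 500x375, fitting into a 416-pixel square gives a height well below 375, in line with the example above. The remaining area would then be padded rather than stretched, which is one common way to keep objects undistorted.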

Dataset statistics are an auxiliary tool: how useful they are as a basis for decisions depends on how well you use them. Vision Studio is designed to speed up your experiments and help you find the data and model that fit together best.

Still need help? Contact Us