As with any other Figure Eight job, it is recommended to do frequent, systematic, and thorough audits of the data. Auditing is essential for estimating the accuracy of the data being fed into your algorithm, and it will highlight areas for improvement in the job.
Follow the step-by-step guide below to audit annotated images.
Understand your data accuracy needs
For semantic segmentation jobs with an extensive ontology list, it may not be possible to audit every single class -- imagine checking 35 elements per image for 50 images: each audit would take days.
- Instead, it is recommended to identify the top three to five classes in the ontology. If training an autonomous vehicle, think pedestrians and road surface. If training an object classifier to identify kittens in scenes, audit the kittens and not the puppies.
- This allows the core classes to be audited for accuracy.
- Note: Define accuracy goals and metrics for those fields. For example, you may want to have 95% accuracy in annotating pedestrians, and 80% on annotating the sky.
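One way to keep those goals actionable is to write them down next to the audit and check measured accuracy against them. The following is a minimal sketch, not part of the Figure Eight platform: the class names, the target values (taken from the example above), and the function name are all illustrative assumptions.

```python
# Hypothetical accuracy targets per audited class (values from the example above).
ACCURACY_TARGETS = {"pedestrian": 0.95, "sky": 0.80}


def classes_below_target(measured, targets=ACCURACY_TARGETS):
    """Return {class: (measured, target)} for every class that misses its target.

    A class with no measured value is treated as 0.0, so it is always flagged.
    """
    return {
        name: (measured.get(name, 0.0), target)
        for name, target in targets.items()
        if measured.get(name, 0.0) < target
    }


# Made-up audit results, for illustration only.
shortfalls = classes_below_target({"pedestrian": 0.91, "sky": 0.88})
print(shortfalls)
```

Here only the pedestrian class is reported, because its measured accuracy falls below its target.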
Know where to go to see your finalized images
In a semantic segmentation job, the output data is a raster mask of the image that is not easily human-readable. The mask is encoded as a black-to-red image, with each pixel's value representing a class in the ontology.
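To illustrate why the raw mask is hard to read by eye, here is a minimal sketch that counts pixels per class in a tiny hand-made mask. It assumes that the red channel of each pixel holds the class index; that encoding detail is an assumption made for illustration, so check it against your own job's output before relying on it. The function name is hypothetical.

```python
from collections import Counter


def class_pixel_counts(mask_rows):
    """Count pixels per class in a mask given as rows of (r, g, b) tuples.

    Assumption: the red channel value is the class index.
    """
    counts = Counter()
    for row in mask_rows:
        for red, _green, _blue in row:
            counts[red] += 1
    return dict(counts)


# A 2x3 toy mask: class 0 is black, classes 1 and 2 are very dark reds.
tiny_mask = [
    [(0, 0, 0), (1, 0, 0), (1, 0, 0)],
    [(0, 0, 0), (2, 0, 0), (1, 0, 0)],
]
counts = class_pixel_counts(tiny_mask)
print(counts)
```

Under this assumption, neighboring class indices differ by a single shade of red, which is why the Data page view described below is the practical way to inspect results.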
Visualizing the completed images in a finished or running job can be done via the Data page. This can only be done with units that have collected at least one judgment.
- The unit IDs are links that lead to a view state of the image with an annotation.
- In this view, the following settings are important when conducting the audit:
- The image and the annotation
- The ability to interact with the image by changing the opacity slider, erasing or modifying annotations, etc.
- The ability to pop out the tool using the green button in the top right to see the image and its annotation in full screen
Take special note of the URL on this page; it should look something like this:
Randomly sample 30-100 rows of recently finished data
First, determine the right sample size for the audit. At a minimum, 30 images are recommended for each run of data. More images may be needed in each audit if the audited classes occur infrequently.
A fast and easy way to conduct random sampling is:
- Download the full report from the job
- Create a new column next to the unit IDs and use the RAND function of Excel (or any other spreadsheet software) to generate a random number
- Order the sheet based on the random values in the new column
- Select the first 30-100 rows of unit IDs for the audit
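The same sampling can be done without a spreadsheet. The sketch below assumes the downloaded report is a CSV file whose unit ID column is named `_unit_id`; adjust `id_column` if your report uses a different header. The function name and the inline sample data are illustrative. Unit IDs are de-duplicated first, in case the report contains several rows for the same unit.

```python
import csv
import io
import random


def sample_unit_ids(report_file, sample_size=30, id_column="_unit_id", seed=None):
    """Return a random sample of distinct unit IDs from an open CSV report."""
    reader = csv.DictReader(report_file)
    # De-duplicate and sort so that a given seed always yields the same sample.
    unit_ids = sorted({row[id_column] for row in reader})
    rng = random.Random(seed)
    return rng.sample(unit_ids, min(sample_size, len(unit_ids)))


# Stand-in for a downloaded report; in practice use open("report.csv", newline="").
fake_report = io.StringIO(
    "_unit_id,annotation\n"
    "1001,a\n1001,b\n1002,c\n1003,d\n1004,e\n1005,f\n"
)
sample = sample_unit_ids(fake_report, sample_size=3, seed=42)
print(sample)
```

Passing a fixed `seed` makes the sample reproducible, which is useful if two auditors need to review the same units.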
To set up the audit worksheet, follow these steps:
- Paste the randomly sampled unit IDs to audit in Column A.
- Create a URL to the Data Page Unit Preview mentioned above for each unit to audit using the CONCAT function in Google Sheets (or Excel). Join the first part of the URL, which is always the same, with each unit ID.
- Once there’s a link to each unit that is being audited, create a column for every field in the ontology that is being audited.
- It is recommended to create a “comments” column to jot down any notes on that unit.
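The worksheet layout described in these steps can also be generated with a short script. This is a sketch under stated assumptions: `BASE_URL` is a placeholder, not a real Figure Eight address, and must be replaced with the constant part of the unit preview URL from your own job; the column names and function name are likewise illustrative.

```python
import csv
import io

# Placeholder only: replace with the part of the unit preview URL that stays
# the same for every unit in your job.
BASE_URL = "https://example.invalid/unit-preview/"


def build_audit_rows(unit_ids, audited_classes, base_url=BASE_URL):
    """Build worksheet rows: unit ID, link, one blank score cell per class, comments."""
    header = ["unit_id", "url", *audited_classes, "comments"]
    blank_scores = [""] * len(audited_classes)
    rows = [[uid, f"{base_url}{uid}", *blank_scores, ""] for uid in unit_ids]
    return [header, *rows]


worksheet = build_audit_rows(["1001", "1002"], ["pedestrian", "road surface"])

# Write the worksheet as CSV; use open("audit.csv", "w", newline="") for a real file.
buffer = io.StringIO()
csv.writer(buffer).writerows(worksheet)
print(buffer.getvalue())
```

The resulting file opens in Excel or Google Sheets with one score column per audited class, ready to be filled in during the audit.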
Conduct the audit
Now score every image on each class you want to audit, using the following scores:
- “0” - Giving a zero is appropriate when
- The class was not annotated or was misclassified (e.g., the kitten was marked as a dog)
- There are many instances of the class (many leaves on the tree) and fewer than half were correct
- “1” - Giving a perfect score of 1 is appropriate when nothing was missed, and the annotation was done cleanly and precisely. In other words, you wouldn’t have done anything differently if you had been annotating the image yourself.
- “0.1” to “0.9” - Feel free to assign partial credit when you’re not sure. For example, if there are two trees, and one was done perfectly and the other not as well, you may give a 0.5, indicating 50% accuracy on that class.
You can calculate accuracy for each field by averaging all the values in each column you are scoring. You can also average all the scores for each image, so you can see the count of perfect or imperfect images.
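As a worked example of both averages, the sketch below uses made-up scores for three units and two classes. The data structure and function names are assumptions for illustration, and every cell is assumed to have a score.

```python
from statistics import mean

# Made-up audit scores: {unit_id: {class_name: score}}.
audit_scores = {
    "1001": {"pedestrian": 1.0, "road surface": 0.5},
    "1002": {"pedestrian": 0.0, "road surface": 1.0},
    "1003": {"pedestrian": 1.0, "road surface": 1.0},
}


def accuracy_by_class(scores):
    """Average each class column across all audited units."""
    classes = next(iter(scores.values())).keys()
    return {name: mean(row[name] for row in scores.values()) for name in classes}


def accuracy_by_image(scores):
    """Average all class scores within each audited unit."""
    return {unit_id: mean(row.values()) for unit_id, row in scores.items()}


per_class = accuracy_by_class(audit_scores)
per_image = accuracy_by_image(audit_scores)
# An image counts as perfect only if every audited class scored 1.
perfect_images = sum(1 for score in per_image.values() if score == 1.0)
print(per_class, per_image, perfect_images)
```

In this toy data the pedestrian class averages about 0.67, road surface about 0.83, and one of the three images is perfect.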
There are many other metrics you can derive by building off this framework of auditing. For example:
- Recall - Percentage of entities that should have been labeled that were labeled correctly.
- Precision - Percentage of labeled entities that were labeled correctly.
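Both metrics can be computed from three counts gathered during the audit. The sketch below uses the standard definitions; the function name and the counts are illustrative, and how you tally the counts (per class, per image, or per job) is up to you.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) from audit counts.

    true_positives:  entities labeled, and labeled correctly
    false_positives: entities labeled that should not have been (or with the wrong class)
    false_negatives: entities that should have been labeled but were missed
    """
    labeled = true_positives + false_positives
    expected = true_positives + false_negatives
    precision = true_positives / labeled if labeled else 0.0
    recall = true_positives / expected if expected else 0.0
    return precision, recall


# Made-up counts: 8 correct labels, 2 wrong labels, 8 missed entities.
precision, recall = precision_recall(8, 2, 8)
print(precision, recall)
```

With these counts, precision is 0.8 (8 of 10 labels were right) and recall is 0.5 (8 of 16 expected entities were found).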