Test Questions are rows with known answers that are randomly inserted throughout your job. The Text Annotation tool lets you set the level of leniency and other customizations for each test question individually. Things to note:
- Annotate the text according to the rules specified in the job's instructions.
- If the data is pre-annotated and the annotations are wrong, correct them the same way contributors should in the task.
- Multiple acceptable classes can be assigned to each token, although each contributor can only provide one class per token.
- Each token can be part of multiple spans.
- The token threshold can be adjusted on the left-hand side.
- The default setting will require 100% accuracy if there are between 1 and 4 tokens annotated. With 5+ tokens annotated, the default introduces leniency.
- If no annotations are needed, include an option to hide the annotation tool.
*Note: The 'Convert finalized rows to Test Questions' feature is not currently available for the text annotation tool.
Glossary
- Span - a set of tokens (1 or more) with an assigned class label - the output of a model or a contributor judgment
- Merge Span - The act of adding one or more tokens to a span
Figure 1. Merging the span together
- Split Span - The act of taking a span made of 2 or more tokens and dividing it back into individual tokens
Figure 2. Splitting the span
Understanding How Contributors Are Evaluated Against Test Questions
Figure 3. How Contributors are Evaluated
Contributors are evaluated against two aspects of each test question:
- Spans (all or nothing)
  - To pass the test question, contributors must correctly merge all tokens that are merged in the spans of the test question answer.
  - If the contributor fails to merge two tokens that are meant to be merged, or merges two tokens that should not be merged, the test question will be missed regardless of the classes assigned to the tokens.
- Classes (token threshold)
  - The contributor must correctly annotate at least the number of tokens specified by the test question’s token threshold.
  - The default setting requires 100% accuracy if there are between 1 and 4 tokens annotated. With 5 or more tokens annotated, the default introduces leniency by requiring 75% of the tokens (rounded down) to be correct.
  - Note: You may always customize how many tokens a contributor needs to pass.
Figure 4. Token Threshold
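The default threshold rule described above can be worked through with a few numbers. The following is a minimal sketch, not the tool's actual implementation: the function name `default_token_threshold` is hypothetical, and rejecting a count below 1 is an assumption, since the rule is only stated for 1 or more annotated tokens.

```python
def default_token_threshold(annotated_tokens: int) -> int:
    """Number of tokens a contributor must get right under the default rule.

    Assumes the rule as stated in this article: 100% for 1 to 4 annotated
    tokens, 75% rounded down for 5 or more.
    """
    if annotated_tokens < 1:
        # Assumption: the article only defines the rule for 1+ tokens.
        raise ValueError("expected at least one annotated token")
    if annotated_tokens <= 4:
        return annotated_tokens
    # Integer arithmetic for floor(0.75 * n), avoiding floating point.
    return (3 * annotated_tokens) // 4


default_token_threshold(4)   # 4 (all four tokens must be correct)
default_token_threshold(5)   # 3 (75% of 5 is 3.75, rounded down)
default_token_threshold(8)   # 6
default_token_threshold(10)  # 7 (75% of 10 is 7.5, rounded down)
```

Note that under this reading a 5-token test question requires fewer correct tokens (3) than a 4-token one (4), which is one reason to consider customizing the threshold on short test questions.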
| Test Question Answers | Spans | Tokens |
| --- | --- | --- |
| Each token is part of a single span | ✅ To pass, spans provided by contributors must match 100% | ✅ To pass, contributor must correctly assign classes to >= X tokens |
| Tokens are part of >1 span | ❌ Spans will not be evaluated | ✅ To pass, contributor must correctly assign classes to >= X tokens |
*X = token threshold
*Note: Only tokens with assigned classes in the test question answer will be compared against the contributor’s answer. If the “none” class is used when creating the test question, then false positives (any class assignment other than “none”) will be considered incorrect.
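The evaluation rules above can be summarized in a short sketch. This is an illustration under stated assumptions, not the tool's real code: the data model (spans as tuples of token indices, answer classes as a set of acceptable labels per token) and all function names are hypothetical. It also assumes the span check compares only merged groups of 2 or more tokens, and that a token the contributor left unlabeled counts as "none".

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Span = Tuple[int, ...]  # token indices that make up one span (hypothetical model)


def tokens_in_multiple_spans(spans: List[Span]) -> bool:
    """Return True if any token index belongs to more than one span."""
    seen: Set[int] = set()
    for span in spans:
        for token in span:
            if token in seen:
                return True
            seen.add(token)
    return False


def merged_groups(spans: List[Span]) -> Set[FrozenSet[int]]:
    """Keep only spans of 2+ tokens, i.e. the merges to be reproduced."""
    return {frozenset(span) for span in spans if len(span) > 1}


def passes_test_question(
    answer_spans: List[Span],
    answer_classes: Dict[int, Set[str]],  # acceptable classes per token
    contributor_spans: List[Span],
    contributor_classes: Dict[int, str],  # one class per token
    token_threshold: int,
) -> bool:
    # Spans are all or nothing, but are skipped entirely when the
    # test question answer places a token in more than one span.
    if not tokens_in_multiple_spans(answer_spans):
        if merged_groups(answer_spans) != merged_groups(contributor_spans):
            return False

    # Only tokens with classes assigned in the answer are compared.
    # Assumption: an unlabeled contributor token is treated as "none".
    correct = sum(
        1
        for token, acceptable in answer_classes.items()
        if contributor_classes.get(token, "none") in acceptable
    )
    return correct >= token_threshold
```

With this model, a token whose answer is `{"none"}` is counted as correct only if the contributor left it unlabeled, which matches the false-positive behavior described in the note above.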