Figure Eight makes it easy to leverage the power of our contributors to carry out data transformations, cleansing, and analysis. Once you’ve loaded data into a job, either by uploading a file (.CSV, .TSV, .XLSX, .ODS) or pulling from a data feed (RSS, Atom, XML, JSON), you can manage it from the Data tab.
Fig. 1: Data page
Data Page Options
The Data Page options are as follows:
Fig. 2: Data Page options
Convert Uploaded Test Questions
The Convert Uploaded Test Questions option allows you to create test questions from a data file. This button automatically converts the units of your source data that have test question values into test question units. This saves time because you can upload your source data and your test question data together in one file, instead of separately.
Split Column
In cases where multiple values are stored in the cells of the same column, you can use the Split Column function to parse the data into two or more columns by specifying a delimiter (most typically a newline character).
Fig. 3: Split Column modal in Data Page
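To make the effect of splitting on a delimiter concrete, here is a minimal Python sketch of the same idea applied to rows held in memory. It is not the platform's implementation: the function name `split_column`, the `<column>_1`, `<column>_2` naming of the resulting columns, and the padding of short rows with empty strings are all assumptions made for illustration.

```python
def split_column(rows, column, delimiter="\n"):
    """Split each value in `column` on `delimiter` into numbered columns.

    `rows` is a list of dicts (one dict per data row). The output column
    names (<column>_1, <column>_2, ...) are a hypothetical convention.
    """
    # Split every cell first so we know how many output columns are needed.
    parts_per_row = [str(row.get(column, "")).split(delimiter) for row in rows]
    width = max((len(parts) for parts in parts_per_row), default=0)
    names = [f"{column}_{i + 1}" for i in range(width)]

    result = []
    for row, parts in zip(rows, parts_per_row):
        # Copy every other column unchanged and drop the original column.
        new_row = {key: value for key, value in row.items() if key != column}
        # Pad rows with fewer values so every row has the same columns.
        padded = parts + [""] * (width - len(parts))
        for name, value in zip(names, padded):
            new_row[name] = value
        result.append(new_row)
    return result


rows = [
    {"id": "1", "tags": "red\nblue"},
    {"id": "2", "tags": "green"},
]
split_rows = split_column(rows, "tags")
```

In this sketch the first row ends up with `tags_1` set to `red` and `tags_2` set to `blue`, while the second row gets `green` and an empty `tags_2`.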
Add More Data
Whether you’re uploading data to a job for the first time or supplementing the data you’ve already loaded, you can select Add More Data in the Data menu to add new data to your job.
Note: .CSV files are the preferred file type for data imports, though .TSV, .XLSX, and .ODS file formats are also supported. All data must be UTF-8 encoded.
Tip: Each column header serves as a unique identifier for the data it contains. Avoid duplicating column headers anywhere within the data of a given job unless you intend to supplement a given column with new data. In this case, it’s important that the column headers of your new data source are identical to the existing column headers.
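Before adding more data, it can help to check a file locally against the two requirements above: UTF-8 encoding and column headers that exactly match the existing ones. The following Python sketch is one way to do that for a CSV file. The function name `check_new_data` and the shape of its report are hypothetical, and the "near miss" check (headers that differ only in case or surrounding whitespace) is an added convenience, not a platform feature.

```python
import csv
import io


def check_new_data(existing_headers, raw_bytes):
    """Classify the headers of a new CSV file against a job's existing headers.

    Raises UnicodeDecodeError if the file is not valid UTF-8.
    """
    text = raw_bytes.decode("utf-8")
    reader = csv.reader(io.StringIO(text, newline=""))
    new_headers = next(reader, [])

    existing = set(existing_headers)
    normalized = {h.strip().lower() for h in existing_headers}

    report = {"matching": [], "near_misses": [], "new": [], "duplicates": []}
    for header in new_headers:
        if header in existing:
            # Identical header: this data supplements the existing column.
            report["matching"].append(header)
        elif header.strip().lower() in normalized:
            # Probably meant to match, but is not identical.
            report["near_misses"].append(header)
        else:
            report["new"].append(header)
    # Headers repeated within the new file itself.
    report["duplicates"] = sorted(
        {h for h in new_headers if new_headers.count(h) > 1}
    )
    return report


raw = "url,Title ,notes\nhttp://example.com,Example,first row\n".encode("utf-8")
report = check_new_data(["url", "title"], raw)
```

Here `url` is reported as matching, `Title ` as a near miss that should be corrected to `title` before upload, and `notes` as a new column.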
You’ll see a few additional columns prepended to the data that you’ve uploaded to your job. Each new column is explained below.
The unit ID column contains a unique ID for each row of data. Each ID also links to that row’s unit page, which will contain contributor response statistics once your job has begun collecting judgments.
The state column displays the status of each row. Here are the most common states that you’ll see:
New - this is the initial state that a unit is in, upon upload to the platform. Units in the New state are not yet eligible to receive judgments because they have not yet been launched.
Judgable - once a unit has been launched, it is eligible to receive judgments and is placed into the Judgable state, where it will remain until it has collected enough trusted judgments to satisfy the job’s requirements.
Golden - a Golden unit is a Test Question, which means that its judgments are used for quality control.
Finalized - when a unit has collected enough trusted judgments to satisfy the job’s requirements, it will move into the Finalized state. Once finalized, a unit will not collect any more judgments unless a contributor’s judgments become untrusted, necessitating the collection of additional judgments for all units to which that contributor has submitted responses.
The judgments column contains a tally of the total judgments collected for each unit. The tally includes both trusted and untrusted judgments.
The agreement column displays a score between 0 and 1, inclusive, indicating contributor consensus across all questions that aggregate responses for that unit. The calculation is as follows, with an example scenario below.
For each question with aggregation, the number of instances of the most common answer is divided by the number of total responses to that question, producing a score between 0 and 1 for that question. That score is added together with the scores for the other aggregated questions. The total score is then divided by the number of aggregated questions, producing the overall agreement score displayed.
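The calculation described above can be written compactly as follows. The symbols are notation introduced here for illustration and do not appear in the product: Q is the number of aggregated questions for the unit, m_q is the number of instances of the most common answer to question q, and n_q is the total number of responses to question q.

```latex
\[
  \text{agreement} \;=\; \frac{1}{Q} \sum_{q=1}^{Q} \frac{m_q}{n_q},
  \qquad 0 \le \text{agreement} \le 1
\]
```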
Example: Let’s say a unit has 2 questions, both with aggregated responses, and 4 contributors submit judgments for that unit. Here are their responses:
Question 1:
Contributor 1: B
Contributor 2: B
Contributor 3: B
Contributor 4: D
Question 2:
Contributor 1: A
Contributor 2: C
Contributor 3: C
Contributor 4: B
Question 1: the most common response was B, which 3 of the 4 contributors submitted. Number of most common responses (3) divided by number of total responses (4) = .75
Question 2: the most common response was C, which 2 of the 4 contributors submitted. Number of most common responses (2) divided by number of total responses (4) = .5
(.75 + .5) divided by number of aggregated questions (2) = .625, which rounds to .63. This unit would have an agreement score of .63
Note: Agreement factors in all questions for each unit that have aggregated responses, including logic-dependent questions. If a contributor doesn’t submit a response to a specific question because of the logic dictated by their other responses, their missing response counts as a 0 in that question’s agreement calculation.
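The agreement calculation and the example above can be reproduced with a short Python sketch. This is an illustration under stated assumptions, not the platform's code: the function names are hypothetical, and a missing response is represented as `None` and interpreted as counting toward the total number of responses without ever matching the most common answer.

```python
from collections import Counter


def question_agreement(responses):
    """Share of responses that match the most common answer.

    Assumption: a missing response is passed as None. It stays in the
    denominator but never counts as a match.
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        return 0.0
    most_common_count = Counter(answered).most_common(1)[0][1]
    return most_common_count / len(responses)


def unit_agreement(questions):
    """Average the per-question scores across all aggregated questions."""
    scores = [question_agreement(r) for r in questions.values()]
    return sum(scores) / len(scores) if scores else 0.0


# Responses from the example: 2 aggregated questions, 4 contributors.
example = {
    "question_1": ["B", "B", "B", "D"],
    "question_2": ["A", "C", "C", "B"],
}
score = unit_agreement(example)  # 0.625, displayed as .63 in the example
```

With a logic-dependent question, a contributor who never saw the question would appear as `None` in its list. For example, `["C", "C", None, None]` scores .5 under this interpretation, because only 2 of the 4 entries match the most common answer.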
Working With Your Data
When you’ve finished adding your data, Figure Eight displays 25 units on each page. Switch between pages using the controls at the bottom of the data file.
Fig. 4: Data Page Navigation
Sort units according to the values in a given column by clicking the column’s header to toggle between ascending and descending order.
To delete rows from your uploaded dataset, select the checkbox next to each row and then select Delete from the Data Page bar. You can only delete units that are in the New state.