Guide to Workflows

Workflows automate the transformation of unlabeled data across multiple jobs on Figure Eight. This platform feature connects jobs in either linear or branched configurations and routes rows to specific destinations based on answer-based routing rules.

Note: Workflows currently support use cases that handle aggregated responses. Jobs containing cml:text, cml:textarea, or cml:checkboxes are not officially supported.

 

Glossary 

  • Workflow
    • The configuration of a multi-step data annotation process consisting of operators.
  • Workflow Graph 
    • The visualization of a workflow in the canvas UI. 
  • Operator 
    • A step in the Workflow that produces an output that can be routed to another step in the workflow. 
  • Routing Rule
    • A label, annotation, filter, confidence threshold, or any other attribute of the report that can be used to route data to another operator.
  • Branch
    • The split of data caused by applying routing rules to multiple operators. 
  • Workflow report
    • The combined results of all operators.

Best Practices Before You Start

  • Mock up and conduct a test run of the jobs you want to include in your workflow before automating them. 
  • Make copies of the jobs that will be included in the workflow. 
    • Make sure the job you are copying has "agg" set in the report options for all fields. This ensures proper routing.
  • Add Test Questions 
    • Jobs should always contain test questions to ensure quality results. It is critical to do this before launching your workflow. Add test questions to the jobs in your workflow once you have finalized your workflow design using any of the following techniques:
      1. Create Test Question from High Confidence Rows. See this guide for more information. 
      2. Copy an existing job that contains test questions.

 

Creating a Workflow 

In this guide, we'll set up and launch a multi-step image categorization workflow using existing jobs.

  • 1. Go to the Workflows page. From here you can view existing workflows, or create new workflows. 
    create.gif
    Figure 1. Creation of New workflow

  • 2. Design a workflow on the connect tab
    • Add your first operator to the workflow by clicking the empty tile 
      • Note: In workflows, Jobs are synonymous with operators 
    • You'll see the operator panel open which contains your jobs.
      • There are some restrictions on which jobs are eligible for use in a workflow: 
        • You will not be able to use any jobs you do not own in your workflow
        • You may only add unlaunched jobs. Jobs that are running or already in use by other workflows will not be available. 
        • You may only use a job once in a single workflow. 
    • Configuring Routing Rules 
      • After adding two operators, you will be asked to configure routing rules. There are two options available when routing rows: Route All Rows or Route by Column Headers.
        • Route All Rows - routes all rows directly to the next operator. Branching is not available with this option. 
        • Route by Column Headers - routes rows based on a filter.
          • First, select the column header you want to filter on
          • Second, select a filter type:
            • Equals
            • Does not equal
            • Contains
            • Is greater than
            • Is greater than or equal to 
            • Is less than
            • Is less than or equal to 
          • Third, enter the answer value of the question set in the column header
          • Note: This should exactly match the 'value' specified in your CML
        • If desired, add multiple conditions to a rule to fine-tune which rows it routes.
          • Note: you cannot add both 'AND' and 'OR' conditions to the same rule at the same time



add_operators.gif
Figure 2. Configuring Routing Rules Between Jobs


Screenshot_2019-06-01_09.00.21.png
Figure 3. Example CML With 'values' To Be Used In The Routing Rules
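
The routing options above can be read as a small rule evaluator. The following is a minimal Python sketch of that idea, not Figure Eight's implementation: the names Condition, Rule, matches, and route, the filter keys, and the example column headers and job names are all hypothetical. It assumes that values arrive as strings, that the ordering filters compare values as numbers, and that a row is sent to every destination whose rule it matches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical keys for the seven filter types offered in the UI.
# Assumption: values are strings, and ordering filters compare them as numbers.
FILTERS: Dict[str, Callable[[str, str], bool]] = {
    "equals": lambda actual, expected: actual == expected,
    "does_not_equal": lambda actual, expected: actual != expected,
    "contains": lambda actual, expected: expected in actual,
    "is_greater_than": lambda actual, expected: float(actual) > float(expected),
    "is_greater_than_or_equal_to": lambda actual, expected: float(actual) >= float(expected),
    "is_less_than": lambda actual, expected: float(actual) < float(expected),
    "is_less_than_or_equal_to": lambda actual, expected: float(actual) <= float(expected),
}


@dataclass
class Condition:
    column: str       # column header to filter on
    filter_type: str  # one of the FILTERS keys
    value: str        # answer value, matching the 'value' in the CML


@dataclass
class Rule:
    destination: str          # operator that receives matching rows
    conditions: List[Condition]
    combinator: str = "AND"   # "AND" or "OR" for the whole rule, never mixed


def matches(rule: Rule, row: Dict[str, str]) -> bool:
    """Return True if the row satisfies the rule's conditions."""
    if rule.combinator not in ("AND", "OR"):
        raise ValueError("a rule combines its conditions with either AND or OR")
    results = (
        FILTERS[c.filter_type](row[c.column], c.value) for c in rule.conditions
    )
    return all(results) if rule.combinator == "AND" else any(results)


def route(row: Dict[str, str], rules: List[Rule]) -> List[str]:
    """Return the destinations of every rule the row matches."""
    return [rule.destination for rule in rules if matches(rule, row)]


# Made-up example: a first job asks whether an image contains an animal.
EXAMPLE_RULES = [
    Rule(
        destination="animal_type_job",
        conditions=[
            Condition("contains_animal", "equals", "yes"),
            Condition("contains_animal:confidence", "is_greater_than_or_equal_to", "0.8"),
        ],
    ),
    Rule(
        destination="scene_type_job",
        conditions=[Condition("contains_animal", "equals", "no")],
    ),
]
EXAMPLE_ROW = {"contains_animal": "yes", "contains_animal:confidence": "0.92"}

print(route(EXAMPLE_ROW, EXAMPLE_RULES))
```

In this sketch the example row matches only the first rule, so it is routed to the hypothetical animal_type_job. Because "equals" is an exact string comparison, a rule written with "Yes" would not match an answer value of "yes", which is why the value entered in the rule should match the CML exactly.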

 

    • Setting up branching operators 
      • Create workflow branches to route the output of one operator to several destination operators
      • Branching becomes available once you add a second-level operator to your workflow. 
        • Note: Currently, we do not support merging the output of branches back together into one operator. 

Screenshot_2019-06-01_09.08.35.png

Figure 4. Two Level Workflow Structure With Branching

  • 3. Source data
    • Uploading data to a workflow is similar to uploading data to a job. All data in a workflow must pass through the first operator, so you can think of uploading data to a workflow as uploading data to that first operator.
    • Dataset Requirement
      • The dataset should contain a column for each liquid tag referenced in your operators. In the upload modal, we'll display the tags detected in the first operator as a reminder of the data being labeled. 
      • As with jobs, there is a 250k row limit for workflow uploads.

data.gif

Figure 5. Adding Data To A Workflow
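
The dataset requirements for step 3 can be checked locally before uploading. The following Python sketch is only an illustration under stated assumptions: the helper names liquid_tags and check_dataset are hypothetical, the example CML is made up, the regular expression handles only simple tags of the form {{column_name}}, and the 250k limit is taken to mean 250,000 data rows excluding the header.

```python
import csv
import io
import re
from typing import List, Set

MAX_ROWS = 250_000  # assumption: "250k" means 250,000 data rows

# Matches the column name at the start of a simple liquid tag, e.g. {{image_url}}.
LIQUID_TAG = re.compile(r"\{\{\s*([A-Za-z_][A-Za-z0-9_]*)")


def liquid_tags(cml: str) -> Set[str]:
    """Return the column names referenced by liquid tags in a CML string."""
    return set(LIQUID_TAG.findall(cml))


def check_dataset(cml: str, csv_text: str) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    headers = set(reader.fieldnames or [])
    row_count = sum(1 for _ in reader)

    problems = []
    missing = sorted(liquid_tags(cml) - headers)
    if missing:
        problems.append("missing columns for liquid tags: " + ", ".join(missing))
    if row_count > MAX_ROWS:
        problems.append(f"{row_count} rows exceeds the {MAX_ROWS} row limit")
    return problems


# Made-up CML for the first operator in an image categorization workflow.
EXAMPLE_CML = """
<img src="{{image_url}}" />
<cml:radios label="Does the image contain an animal?" name="contains_animal" validates="required">
  <cml:radio label="Yes" value="yes" />
  <cml:radio label="No" value="no" />
</cml:radios>
"""

print(check_dataset(EXAMPLE_CML, "image_url\nhttps://example.com/1.jpg\n"))
print(check_dataset(EXAMPLE_CML, "url\nhttps://example.com/1.jpg\n"))
```

The first call reports no problems because the CSV has an image_url column. The second reports that the column for the image_url tag is missing.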

 

  • 4. Review the workflow before launch
    • On the launch page, we will display a summary of the operators in your workflow and highlight a few important items:
      • Price per Judgment and Judgments per Row for each operator
      • Estimated Maximum Cost
        • This is intended to provide a max contributor cost estimate if all uploaded rows run through all operators 
      • Available Funds
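
The Estimated Maximum Cost can be approximated by hand from the other items on this page. The following Python sketch assumes the estimate is simply rows multiplied by Judgments per Row and Price per Judgment, summed over every operator; the platform's exact formula is not documented here and may include items this sketch ignores, such as fees. The names Operator and estimated_max_cost_cents and all figures are hypothetical. Amounts are kept in whole cents to avoid floating-point rounding.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Operator:
    name: str
    price_per_judgment_cents: int
    judgments_per_row: int


def estimated_max_cost_cents(operators: List[Operator], rows: int) -> int:
    """Worst case: every uploaded row runs through every operator."""
    return sum(
        rows * op.judgments_per_row * op.price_per_judgment_cents
        for op in operators
    )


# Made-up figures for a two-operator workflow and a 100-row test run.
EXAMPLE_OPERATORS = [
    Operator("contains_animal_job", price_per_judgment_cents=2, judgments_per_row=3),
    Operator("animal_type_job", price_per_judgment_cents=5, judgments_per_row=3),
]
AVAILABLE_FUNDS_CENTS = 5_000

max_cost = estimated_max_cost_cents(EXAMPLE_OPERATORS, rows=100)
print(f"Estimated maximum cost: ${max_cost / 100:.2f}")
print("Sufficient funds" if max_cost <= AVAILABLE_FUNDS_CENTS else "Insufficient funds")
```

With these made-up numbers the estimate is 100 x 3 x 2 + 100 x 3 x 5 = 2,100 cents, or $21.00. In a branched workflow the actual cost is usually lower, since each row follows only the branches its answers route it to.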
  • 5. Launch the workflow 
    • As with jobs, we recommend testing your workflow with 100 rows before ordering a large number of rows. To do this, select "Order rows and Launch". 
      • Note: Data operators and routing rules cannot be edited after a workflow is launched
  • 6. Workflow Reports 
    • After the initial test run, review your workflow report, which contains data from all operators based on your filtering rules. Confirm that all rows in the test run were routed correctly before launching the remaining rows in the workflow. 
    • The workflow report will contain some new columns not found in the job report: 
      • data_line_id
        • This identifier follows the row from the first operator to the finalized workflow report. It is similar to unit_id and can be used to track results and see where a row was routed. 
      • row_ingested_at
        • This is the timestamp at which the row was uploaded to the workflow. 
      • j{job_id}:{column_header}:agg
        • For every operator in your workflow, you will see a column containing the Job ID and the question being answered. This column contains the value of the aggregated answer chosen in the operator.
      • j{job_id}:{column_header}:confidence
        • This will be the confidence of the aggregated answer.
    • For image annotation jobs, you will also see the following columns:
      • j{job_id}:{annotation}
        • This will contain the aggregated annotation for the row.

Screen_Shot_2019-06-24_at_10.05.23_AM.png

Figure 6. Example Workflow Report
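
When reviewing a test run, the j{job_id}:{column_header}:agg and j{job_id}:{column_header}:confidence columns can be grouped by operator to see where each row was routed. The following Python sketch shows one way to do that. The function name operator_results, the job IDs 1001 and 1002, and the sample rows are made up, and the sketch assumes that an empty cell means the row never reached that operator. Annotation columns of the form j{job_id}:{annotation} are not handled.

```python
import csv
import io
import re
from typing import Dict

# Matches report columns such as "j1001:contains_animal:agg".
COLUMN = re.compile(r"^j(\d+):(.+):(agg|confidence)$")


def operator_results(row: Dict[str, str]) -> Dict[int, Dict[str, Dict[str, str]]]:
    """Group a report row's answers by job ID, then by column header."""
    results: Dict[int, Dict[str, Dict[str, str]]] = {}
    for name, value in row.items():
        match = COLUMN.match(name)
        # Assumption: an empty cell means the row was not routed to that job.
        if match is None or value == "":
            continue
        job_id, header, kind = match.groups()
        results.setdefault(int(job_id), {}).setdefault(header, {})[kind] = value
    return results


# Made-up report excerpt with two operators (jobs 1001 and 1002).
SAMPLE_REPORT = (
    "data_line_id,j1001:contains_animal:agg,j1001:contains_animal:confidence,"
    "j1002:animal_type:agg,j1002:animal_type:confidence\n"
    "1,yes,0.92,dog,1.0\n"
    "2,no,0.88,,\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_REPORT)))
for row in rows:
    reached = sorted(operator_results(row))
    print(row["data_line_id"], "reached jobs", reached)
```

In this sample, row 1 has answers from both jobs, while row 2 has answers only from job 1001, which suggests it was not routed to the second operator.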

