Case Study: Artimys - Data Categorization

Train Machine Learning Models

Result: 4-5 fold increase in several KPIs


“The results are fantastic.”

— Bob Dillon, CEO, Artimys


The Company

Protecting Children from Online Bullying

Today’s kids are more plugged into the Internet than ever before and are exposed to real dangers online. Artimys Language Technologies helps parents protect their children from bullies and sexual predators, and warns parents if it detects signs of suicidal behavior in their children. With Artimys’s services, parents have the peace of mind that their children can be safe online.


The Challenge

Machine Learning is Difficult with Highly Subjective and Evolving Language

Artimys monitors the Internet in real time, using algorithms to detect language that poses a threat to a child online. The language patterns associated with suicide risk and sexual predation are comparatively easy to detect in online messages. Bullying language, however, is highly subjective and continuously evolving. The company was having difficulty building algorithms that could accurately and systematically identify true online bullying. For example, profanity can trigger false positives: messages that the algorithm flags as bullying but that, upon review, are not genuine indicators of bullying.

Artimys needed pristine ground-truth data around bullying language to train its machine learning models. In other words, it needed data labeled through human analysis and interpretation, at a massive scale, so that the machine could learn a model that detects patterns within ambiguous bullying phrases.


The Solution

A Platform and On-demand Workforce that Evaluates Massive Data Sets to Train Algorithms

Artimys’s training-data creation process starts with 2 million messages drawn from Twitter and other online message boards. Next, Artimys feeds 40,000 high-likelihood conversation snippets directly into CrowdFlower’s platform, where thousands of online workers rapidly create accurate labels. In a matter of hours, CrowdFlower workers provide more than 150,000 responses to label the 40,000-message dataset.
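With more than 150,000 responses covering 40,000 messages, each message receives several independent judgments that must be combined into one training label. The case study does not describe how that aggregation is done, so the following Python sketch is only an illustration of one common approach, a simple majority vote. The function name, the label strings and the tie-handling rule are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def aggregate_labels(judgments):
    """Combine several worker judgments per message by majority vote.

    judgments: iterable of (message_id, label) pairs (hypothetical format).
    Returns {message_id: (label, agreement)}, where agreement is the share
    of judgments that chose the most common label. A tie for first place
    yields a label of None so the message can be sent for further review.
    """
    votes = defaultdict(Counter)
    for message_id, label in judgments:
        votes[message_id][label] += 1

    aggregated = {}
    for message_id, counter in votes.items():
        ranked = counter.most_common(2)
        top_label, top_count = ranked[0]
        total = sum(counter.values())
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        aggregated[message_id] = (None if tied else top_label, top_count / total)
    return aggregated


# Illustrative judgments only; these are not taken from the Artimys dataset.
example = [
    ("m1", "bullying"),
    ("m1", "bullying"),
    ("m1", "not_bullying"),
    ("m2", "bullying"),
    ("m2", "not_bullying"),
]
labels = aggregate_labels(example)
```

In this sketch the agreement score could also be used to drop low-consensus messages before training, which matters for language as subjective as bullying.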

Artimys uses the resulting dataset to train its algorithmic bully-detection model. The company looked at other solutions, but chose CrowdFlower because it could produce results on a larger scale, faster and with greater ease than the alternatives.


The Results

Accurate Real-time Detection of Online Bullying to Keep Children Safe

“The results are fantastic,” says Bob Dillon, CEO, Artimys. “It’s intoxicating to watch 150,000 judgments finish in a matter of hours. CrowdFlower is to labeling data, as Microsoft Word is to document processing and Excel is to financial analysis. The complexity of loading, hosting and serving up the data, and gathering and aggregating responses is simple using CrowdFlower.” 

Using CrowdFlower, Artimys achieved 4-5 fold increases in several key performance measures for bullying-language detection. Specifically, its model’s precision increased 5.2X, recall improved by 25 percent, and the F1 score increased 4.2X. As a result, the company can confidently stand behind its claim that the Artimys service is adaptive and constantly learning in order to provide the best protection from online bullying for its customers.
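For readers unfamiliar with these measures: precision is the share of messages flagged as bullying that really are bullying, recall is the share of genuine bullying messages that get flagged, and F1 is the harmonic mean of the two. The short Python sketch below shows the standard definitions. The function name, label strings and sample data are invented for illustration and do not reproduce Artimys’s evaluation.

```python
def precision_recall_f1(y_true, y_pred, positive="bullying"):
    """Compute precision, recall and F1 for one positive class.

    y_true: ground-truth labels (for example, from human annotators).
    y_pred: labels predicted by the model, in the same order.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative labels only; "n" stands for a non-bullying message.
truth = ["bullying", "bullying", "n", "n", "bullying"]
preds = ["bullying", "n", "bullying", "n", "bullying"]
precision, recall, f1 = precision_recall_f1(truth, preds)
```

A false positive caused by profanity, as described in The Challenge, lowers precision without affecting recall, which is consistent with precision showing the largest gain once cleaner training labels were available.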


