Supplementary Material for
Early Detection of Sexual Predators in Chats

An important risk that children face today is online grooming, where a so-called sexual predator builds an emotional connection with a minor online with the aim of sexual abuse. Prior academic research has sought to automatically identify grooming in chats, but only after an incident has already happened, in the context of legal prosecution.

In our work, we instead investigate this problem from the point of view of prevention. We define and study the task of early sexual predator detection (eSPD) in chats, where the goal is to analyze a running chat from its beginning and predict grooming attempts as early and as accurately as possible.
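The eSPD setting can be illustrated with a minimal sketch. It assumes a hypothetical `score_prefix` function that stands in for a trained classifier (it is not one of the models from the paper) and a fixed decision threshold. After each new message, the chat seen so far is re-scored, and the sketch reports how many messages had been read when the first warning was raised.

```python
from typing import Callable, Optional, Sequence


def detect_early(
    messages: Sequence[str],
    score_prefix: Callable[[Sequence[str]], float],
    threshold: float = 0.5,
) -> Optional[int]:
    """Return the number of messages read when a warning is first raised,
    or None if the score never reaches the threshold."""
    for i in range(1, len(messages) + 1):
        # Score the running chat as it stands after message i.
        if score_prefix(messages[:i]) >= threshold:
            return i
    return None


def toy_score(prefix: Sequence[str]) -> float:
    # Hypothetical stand-in for a trained model: the fraction of
    # messages so far that contain the word "secret".
    flagged = sum("secret" in m.lower() for m in prefix)
    return flagged / len(prefix)


chat = [
    "hi",
    "how old are you?",
    "keep this our secret",
    "don't tell your parents, it's our secret",
]
print(detect_early(chat, toy_score, threshold=0.5))
```

The returned index reflects how early the warning came. A lower threshold tends to warn sooner but raises more false alarms, which is the trade-off between earliness and accuracy that eSPD has to balance.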

We survey existing datasets and their limitations regarding eSPD, and create a new dataset called PANC for more realistic evaluations. We present strong baselines built on BERT that also reach state-of-the-art results for conventional SPD. Finally, we consider coping with limited computational resources, as real-life applications require eSPD on mobile devices.


Resources

Laboratory

Our experimental setup, which you can use to train and evaluate language models for research on early sexual predator detection.

Datasets

This is the preprocessing code we used to create the PANC dataset from the PAN12 and ChatCoder2 datasets. We also provide a script to obtain the VTPAN dataset from PAN12.

Chat Visualizer

This is a simple program that hosts a local website for visualizing chat logs. It displays the chat messages as a heatmap that shows the model's prediction after each message.
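The heatmap idea can be sketched as follows. The sketch assumes each message already has a model score in [0, 1], and how those scores are produced is out of scope here. The function names, the white-to-red color scale, and the HTML layout are hypothetical and are not taken from the Chat Visualizer code.

```python
import html
from typing import Sequence


def score_to_rgb(score: float) -> str:
    # Clamp to [0, 1], then fade from white (0.0) to red (1.0).
    s = min(max(score, 0.0), 1.0)
    other = round(255 * (1.0 - s))
    return f"rgb(255, {other}, {other})"


def render_heatmap(messages: Sequence[str], scores: Sequence[float]) -> str:
    """Render one colored <div> per message; color encodes the model score."""
    if len(messages) != len(scores):
        raise ValueError("need exactly one score per message")
    rows = []
    for msg, score in zip(messages, scores):
        rows.append(
            f'<div style="background-color: {score_to_rgb(score)}">'
            f"{html.escape(msg)} ({score:.2f})</div>"
        )
    return "\n".join(rows)


print(render_heatmap(["hi", "keep this our secret"], [0.05, 0.8]))
```

Writing the returned string into an `.html` file and opening it in a browser shows the messages shaded by score. Escaping the message text keeps chat content from being interpreted as markup.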

Android Demo

A demo app showing, in general terms, how BERTBase and MobileBERT models can be run on mobile devices.

Trained Models

You can download our trained models to verify our results. Note that, as discussed in the paper, these models are not ready for real-world use.