2021 Call for Code Awards: Live from New York, with SNL’s Colin Jost! Learn more

Toxic Comment Classifier


This model is able to detect 6 types of toxicity in a text fragment. The six detectable types are toxic, severe toxic, obscene, threat, insult, and identity hate. The underlying neural network is based on the pre-trained BERT-Base, English Uncased model and was finetuned on the Toxic Comment Classification Dataset using the Huggingface BERT Pytorch repository.

A brief definition of the six different toxicity types can be found below.

Toxic: very bad, unpleasant, or harmful

Severe toxic: extremely bad and offensive

Obscene: (of the portrayal or description of sexual matters) offensive or disgusting by accepted standards of morality and decency

Threat: a statement of an intention to inflict pain, injury, damage, or other hostile action on someone in retribution for something done or not done

Insult: speak to or treat with disrespect or scornful abuse

Identity hate: hatred, hostility, or violence towards members of a race, ethnicity, nation, religion, gender, gender identity, sexual orientation or any other designated sector of society

Model Metadata

Domain Application Industry Framework Training Data Input Data
Natural Language Processing (NLP) Text Classification General PyTorch Toxic Comment Classification Dataset Text


This model achieves a column-wise ROC AUC score of 0.98355 (private score) in the Kaggle Toxic Comment Classification Competition. This implementation is trained with a maximum sequence length of 256 instead of 512 to have higher inference speed. For most applications outside of this Kaggle competition, a sequence length of 256 is more than sufficient.



Component License Link
Model GitHub repository Apache 2.0 LICENSE
Finetuned Model Weights Apache 2.0 LICENSE
Pre-trained Model Weights Apache 2.0 LICENSE
TensorFlow Model Code (3rd party) Apache 2.0 LICENSE
PyTorch Model Code (3rd party) Apache 2.0 LICENSE
Toxic Comment Classification Dataset CC0 LICENSE

Options available for deploying this model

This model can be deployed using the following mechanisms:

  • Deploy from Dockerhub:

    docker run -it -p 5000:5000 codait/max-toxic-comment-classifier
  • Deploy on Red Hat OpenShift:

    Follow the instructions for the OpenShift web console or the OpenShift Container Platform CLI in this tutorial and specify codait/max-toxic-comment-classifier as the image name.

  • Deploy on Kubernetes:

    kubectl apply -f https://raw.githubusercontent.com/IBM/MAX-Toxic-Comment-Classifier/master/max-toxic-comment-classifier.yaml

    A more elaborate tutorial on how to deploy this MAX model to production on IBM Cloud can be found here.

  • Locally: follow the instructions in the model README on GitHub

Example Usage

You can test or use this model

Test the model using cURL

Once deployed, you can test the model from the command line. For example:

curl -d "{ \"text\": [ \"I would like to punch you.\", \"In hindsight, I do apologize for my previous statement.\" ]}" -X POST "http://localhost:5000/model/predict" -H "Content-Type: application/json"

You should see a JSON response like that below:

  "status": "ok",
  "predictions": [
      "toxic": 0.9796434044837952,
      "severe_toxic": 0.07256636023521423,
      "obscene": 0.058431386947631836,
      "threat": 0.8635178804397583,
      "insult": 0.11121545732021332,
      "identity_hate": 0.013826466165482998
      "toxic": 0.00029103411361575127,
      "severe_toxic": 0.00012417171092238277,
      "obscene": 0.0001522742968518287,
      "threat": 0.00008440738747594878,
      "insult": 0.00016013195272535086,
      "identity_hate": 0.00012860879360232502

Test the model in a serverless app

You can utilize this model in a serverless application by following the instructions in the Leverage deep learning in IBM Cloud Functions tutorial.

Resources and Contributions

If you are interested in contributing to the Model Asset Exchange project or have any queries, please follow the instructions here.