IBM Debater® Wikipedia Category Stance


The dataset contains:

  1. 132 concepts
  2. 4603 Wikipedia categories and lists annotated for stance (Pro/Con) towards the concepts

The released data file has 4 columns:

  • Column A: the label
  • Column B: the concept
  • Column C: the page title of the category or list in Wikipedia
  • Column D: the URL of the category/list page
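
The four-column layout above can be read with Python's standard `csv` module. The sketch below is a minimal illustration, not the dataset's official loader: the `StanceRecord` type, the `read_records` function, and the `has_header` flag are hypothetical names, and whether the released file starts with a header row is an assumption to check against the file itself. The sample rows are invented placeholders, not actual dataset records.

```python
import csv
from typing import Iterable, List, NamedTuple


class StanceRecord(NamedTuple):
    """One row of the data file (hypothetical helper type)."""
    label: str    # Column A: the label
    concept: str  # Column B: the concept
    title: str    # Column C: Wikipedia page title of the category or list
    url: str      # Column D: URL of the category/list page


def read_records(lines: Iterable[str], has_header: bool = False) -> List[StanceRecord]:
    """Parse CSV lines into records, skipping rows with fewer than 4 columns."""
    reader = csv.reader(lines)
    if has_header:  # assumption: set this if the file starts with a header row
        next(reader, None)
    return [StanceRecord(*row[:4]) for row in reader if len(row) >= 4]


# Invented placeholder rows, used only to show the column order.
sample = [
    "P,Example concept,Category:Example supporters,https://example.org/supporters",
    'C,Example concept,"List of critics, by country",https://example.org/critics',
]
records = read_records(sample)
print(records[1].title)
```

To read the released file, an open file handle can be passed in place of `sample`, for example one opened on WikipediaCategoriesResults.csv with `newline=""`; the file's text encoding is not stated here, so treat any choice such as UTF-8 as an assumption.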

For each category, the label is one of the following:

  1. “-” – The category is not a person group category
  2. “P” – Pro stance (supporting the concept)
  3. “C” – Con stance (opposing the concept)
  4. “?” – The stance cannot be determined based on the category name, or the category is not relevant.
  5. “X” – Unresolved case: each of the 3 annotators gave a different label
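
Since only the “P” and “C” labels carry a stance, a common first step is to keep those rows and group them by concept. The following is a rough sketch under that assumption: `LABEL_MEANINGS` and `group_by_stance` are hypothetical names, and the rows are invented placeholders rather than dataset records.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Label codes as described in the list above.
LABEL_MEANINGS = {
    "-": "not a person group category",
    "P": "pro (supporting the concept)",
    "C": "con (opposing the concept)",
    "?": "stance undetermined or category not relevant",
    "X": "unresolved (each annotator gave a different label)",
}


def group_by_stance(
    rows: Iterable[Tuple[str, str, str]]
) -> Dict[str, Dict[str, List[str]]]:
    """Map each concept to its Pro and Con titles, dropping all other labels."""
    grouped: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"P": [], "C": []}
    )
    for label, concept, title in rows:
        if label in ("P", "C"):
            grouped[concept][label].append(title)
    return dict(grouped)


# Invented placeholder rows: (label, concept, page title).
rows = [
    ("P", "Example concept", "Category:Example supporters"),
    ("C", "Example concept", "Category:Example critics"),
    ("?", "Example concept", "Category:Example scholars"),
    ("X", "Other concept", "Category:Other people"),
]
stances = group_by_stance(rows)
print(stances["Example concept"]["P"])
```

Concepts whose rows are all labeled “-”, “?”, or “X” do not appear in the result at all, which keeps the output limited to concepts with at least one resolved stance.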

Dataset Metadata

Field               Value
Format              CSV
License             CC BY 3.0
Domain              Natural Language Processing
Number of Records   4,603 records
Data Split          NA
Size                525 KB
Authors             Orith Toledo-Ronen, Roy Bar-Haim
Dataset Origin      IBM Research
Dataset Version     Version 2 – August 1, 2019
                    Version 1 – August 30, 2016
Data Coverage       132 concepts, 4603 Wikipedia categories and lists annotated for stance (Pro/Con) towards the concepts
Business Use Case   Government – Analyze sentiment of political topics and conversations.

Dataset Archive Contents

File or Folder                     Description
WikipediaCategoriesResults.csv     The dataset
WikipediaCategoriesLabeling.docx   The guidelines used for labeling the data
LICENSE.txt                        Terms of Use
ReleaseNotes.txt                   Release notes file describing the data


Use the Dataset

This dataset is complemented by starter notebooks that will help you get started:

Quick access in Python (requires the pardata package from PyPI):

$ pip install pardata

import pardata
data = pardata.load_dataset('wikipedia_category_stance')

  • Project Debater: Project Debater is the first AI system that can debate humans on complex topics. The goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.


title = "Expert Stance Graphs for Computational Argumentation",
author = "Toledo-Ronen, Orith  and
Bar-Haim, Roy and
Slonim, Noam",
booktitle = "Proceedings of the Third Workshop on Argument Mining ({A}rg{M}ining2016)",
month = aug,
year = "2016",
address = "Berlin, Germany",
publisher = "Association for Computational Linguistics",
url = "",
doi = "10.18653/v1/W16-2814",
pages = "119--123",