IBM Debater® Recorded Debating #1


Engaging in a competitive debate requires Project Debater to effectively rebut arguments raised by the human opponent. The system must listen to an argumentative speech in real time, understand its main arguments, and produce persuasive counter-arguments.

The nature of the argumentation domain and the characteristics of competitive debates make such spoken content hard to understand. Expressed ideas often span multiple non-consecutive sentences, and many arguments are alluded to rather than explicitly stated. A further difficulty stems from the need to identify and rebut the most important parts of a speech that lasts several minutes. This contrasts with today’s conversational agents, which aim to understand a single functional command from a short input. The goal of this dataset is to form a basis for developing listening comprehension algorithms in this challenging setting.

Release #1 of the dataset contains 60 recorded speeches on 16 controversial topics and documents the recording process.

The recorded speeches are provided in three formats:

  • The recorded audio (wav files)
  • Text produced from the audio using an automatic speech recognition (ASR) system (text files)
  • A manually corrected transcript of the ASR text, created by expert annotators (text files)
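Since each speech runs for several minutes, a quick way to check the length of a recording is to divide its frame count by its sample rate. The following is a minimal Python sketch using the standard-library `wave` module; it assumes the WAV files are uncompressed PCM (the only kind `wave` can read), and the function name `wav_duration_seconds` is illustrative, not part of the dataset.

```python
import wave


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as wav:
        # getnframes() counts audio frames; getframerate() is frames per second.
        return wav.getnframes() / wav.getframerate()
```

For example, `wav_duration_seconds("speech_01.wav")` would return the length of that recording, where `speech_01.wav` is a hypothetical file name; substitute the actual file names from the release.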

Both the ASR output and the manual transcripts are provided in two versions: a raw form, which also records the time within the audio at which each utterance was spoken, and a clean, “NLP-friendly” version containing only the spoken words.
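The relation between the two versions can be illustrated with a short Python sketch. The actual layout of the raw files is not described here, so this sketch assumes a HYPOTHETICAL raw format with one utterance per line, `<start> <end> <text>`, with times in seconds. The names `Utterance`, `parse_raw`, `to_clean_text` and `words_between` are likewise illustrative; check the real files before adapting any of it.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Utterance:
    start: float  # seconds from the beginning of the audio
    end: float    # seconds from the beginning of the audio
    text: str     # the words spoken in this utterance


def parse_raw(lines: List[str]) -> List[Utterance]:
    """Parse lines of the ASSUMED form '<start> <end> <text>'."""
    utterances = []
    for line in lines:
        parts = line.strip().split(maxsplit=2)
        if len(parts) < 3:
            continue  # skip blank lines and lines with no text
        start, end, text = parts
        utterances.append(Utterance(float(start), float(end), text))
    return utterances


def to_clean_text(utterances: List[Utterance]) -> str:
    """Drop the timing information and keep only the spoken words."""
    return " ".join(u.text for u in utterances)


def words_between(utterances: List[Utterance], t0: float, t1: float) -> str:
    """Return the text of utterances that overlap the time interval [t0, t1)."""
    return " ".join(u.text for u in utterances if u.start < t1 and u.end > t0)
```

Keeping the timestamps in the raw version is what makes a lookup like `words_between` possible, for instance to align a passage of text with the corresponding stretch of audio. The clean version suits NLP tools that expect plain running text.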

Dataset Metadata

  • Format: WAV audio and text files
  • License: CC-BY-SA 3.0
  • Domain: Natural Language Processing
  • Number of Records: 60 speeches on 16 topics


@inproceedings{mirkin2018recorded,
  author = {Shachar Mirkin and Michal Jacovi and Tamar Lavee and Hong-Kwang Kuo and Samuel Thomas and Leslie Sager and Lili Kotlerman and Elad Venezian and Noam Slonim},
  title = {{Recorded Debating Speeches}},
  booktitle = {Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)},
  year = {2018}
}

  • Project Debater: the first AI system that can debate humans on complex topics. Its goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.