Overview

Engaging in a competitive debate requires Project Debater to effectively rebut arguments raised by the human opponent. The system must listen to an argumentative speech in real time, understand its main arguments, and produce persuasive counter-arguments.

The nature of the argumentation domain and the characteristics of competitive debates make the understanding of such spoken content challenging. Expressed ideas often span multiple, non-consecutive sentences and many arguments are alluded to rather than explicitly stated. Further difficulty stems from the requirement to identify and rebut the most important parts of a speech that is several minutes long. This contrasts with today’s conversational agents, which aim at understanding a single functional command from short inputs. The goal of this dataset is to form a basis for the development of listening comprehension algorithms in this challenging setting.

Release #2 includes 200 speeches on 50 controversial topics, each provided in the following formats:
– The recorded audio (wav files)
– Text produced from the audio using an automatic speech recognition (ASR) system (text files)
– A manually corrected transcript of the ASR text, created by expert annotators (text files)

Both the ASR and transcript texts are provided in two versions: a raw form, which also marks the time within the audio at which each utterance was spoken, and an “NLP-friendly” clean version containing only the spoken words.
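Because the audio is distributed as wav files, the length of a speech can be checked with Python's standard `wave` module. The following is a minimal sketch, assuming the files are uncompressed PCM WAV; the function name `wav_duration_seconds` and the file name in the usage comment are hypothetical and not part of the dataset.

```python
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frame_count = wav_file.getnframes()  # total audio frames in the file
        frame_rate = wav_file.getframerate()  # frames (samples) per second
    return frame_count / frame_rate


# Usage (hypothetical file name, shown for illustration only):
# print(wav_duration_seconds("speech.wav") / 60, "minutes")
```

A check like this can confirm that a given recording is indeed several minutes long before it is passed to a listening comprehension pipeline.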

An additional annotation layer accompanies the recorded speeches: for each speech, arguments that could potentially be mentioned were extracted from an online resource (iDebate: www.idebate.com), and each argument is annotated as either mentioned or not mentioned in the speech.
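The mentioned / not mentioned labels lend themselves to a simple per-speech tally. The sketch below is illustrative only: the column names (`speech_id`, `argument`, `mentioned`), the 0/1 label encoding, and the sample rows are all assumptions made up for this example, so check the actual CSV layout in the release before adapting it.

```python
import csv
import io
from collections import defaultdict

# Made-up sample in an ASSUMED layout: one row per (speech, argument) pair
# with a 0/1 label. The real files in the release may be organized differently.
SAMPLE_CSV = """speech_id,argument,mentioned
speech_a,Example argument 1,1
speech_a,Example argument 2,0
speech_b,Example argument 1,0
speech_b,Example argument 3,1
speech_b,Example argument 4,1
"""


def mentioned_counts(csv_file):
    """Count, per speech, how many candidate arguments are labeled as mentioned."""
    counts = defaultdict(lambda: {"mentioned": 0, "total": 0})
    for row in csv.DictReader(csv_file):
        entry = counts[row["speech_id"]]
        entry["total"] += 1
        if row["mentioned"].strip() == "1":
            entry["mentioned"] += 1
    return dict(counts)


counts = mentioned_counts(io.StringIO(SAMPLE_CSV))
for speech_id, entry in counts.items():
    print(speech_id, entry["mentioned"], "of", entry["total"], "arguments mentioned")
```

The same function accepts an open file handle in place of the in-memory sample, provided the column names are adjusted to match the real annotation files.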

Dataset Metadata

Format: WAV, CSV, TXT
License: CC-BY-SA 3.0
Domain: Natural Language Processing
Number of Records: 200 speeches, 50 controversial topics, 756 annotated arguments
Size: 3.1GB

Citation

@InProceedings{mirkin-etal-2018-listening,
author = {Shachar Mirkin and Guy Moshkowich and Matan Orbach and Lili Kotlerman and Yoav Kantor and Tamar Lavee and Michal Jacovi and Yonatan Bilu and Ranit Aharonov and Noam Slonim},
title = {Listening Comprehension over Argumentative Content},
booktitle = {Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
year = {2018}
}
  • Project Debater: Project Debater is the first AI system that can debate humans on complex topics. The goal is to help people build persuasive arguments and make well-informed decisions. This dataset contributed to training the models in Project Debater.