Forum Classify

Overview

The dataset consists of 100 discussion threads crawled from Ubuntu Forums. Each message in a thread is assigned one of the following eight dialog labels: question, repeat question, clarification, further details, solution, positive feedback, negative feedback, junk.
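As a rough illustration of working with the labels above, the following Python sketch tallies the dialog labels in a single thread. The XML layout it reads (a `thread` element containing `message` elements with a `label` attribute) is an assumption made for this example, not the dataset's documented schema, and the sample thread is made up. The element and attribute names would need adjusting to match the actual files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The eight dialog labels listed in the overview.
DIALOG_LABELS = (
    "question",
    "repeat question",
    "clarification",
    "further details",
    "solution",
    "positive feedback",
    "negative feedback",
    "junk",
)


def count_labels(xml_text):
    """Count dialog labels across the messages of one thread.

    ASSUMPTION: <thread>/<message label="..."> is a hypothetical layout
    chosen for illustration; the real files may be structured differently.
    """
    root = ET.fromstring(xml_text.strip())
    counts = Counter()
    for message in root.iter("message"):
        label = (message.get("label") or "").strip().lower()
        if label not in DIALOG_LABELS:
            raise ValueError(f"unexpected label: {label!r}")
        counts[label] += 1
    return counts


# Made-up sample in the assumed layout, for illustration only.
SAMPLE_THREAD = """
<thread id="example">
  <message label="question">Wifi stopped working after the upgrade.</message>
  <message label="clarification">Which wireless card do you have?</message>
  <message label="further details">It is a Broadcom BCM4312.</message>
  <message label="solution">Install the proprietary driver.</message>
  <message label="positive feedback">That fixed it, thanks.</message>
</thread>
"""

if __name__ == "__main__":
    for label, n in count_labels(SAMPLE_THREAD).items():
        print(label, n)
```

Rejecting labels outside the eight known classes is a deliberate choice here: it surfaces a schema mismatch early instead of silently producing a skewed distribution.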

Dataset Metadata

Format: XML
License: CC BY-SA 4.0
Domain: Natural Language Processing
Number of Records: 529 messages
Size: 104 MB (compressed)

Citation

@article{ahu61This,
author="Sumit Bhatia
and Prakhar Biyani
and Prasenjit Mitra",
title="Identifying the Role of Individual User Messages in an Online Discussion and its Applications in Thread Retrieval",
journal="Journal of the Association for Information Science and Technology",
volume="67",
year="2015",
pages="276-288",
}