SimpleQuestions Relation Detection


The SimpleQuestions Relation Detection dataset is a set of relation extraction annotations derived from the SimpleQuestions dataset. Each entry in this dataset follows the order of questions listed in the SimpleQuestions dataset and corresponds to the following format: gold_relations \t negative_relation_pool \t question. The relation ids are mapped in a separate file titled relation.2M.list where the index of the ids starts at 1. The dataset is split into train, validation, and test sets to match the split used by the SimpleQuestions data.
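
As a rough illustration of the record format described above, the following Python sketch parses one line into its three fields and resolves a 1-based relation id against a list of relation names. It rests on several assumptions: the `Record` type and both function names are hypothetical, the id fields are assumed to hold space-separated integers, and `relation.2M.list` is assumed to contain one relation per line, so that a list read from it can be passed as `relations`.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One parsed entry (hypothetical container, not part of the dataset)."""
    gold_relations: list[int]  # 1-based relation ids
    relation_pool: list[int]   # 1-based relation ids
    question: str


def parse_record(line: str) -> Record:
    """Split one tab-separated line into gold ids, pool ids, and question."""
    gold, pool, question = line.rstrip("\n").split("\t", 2)
    return Record(
        gold_relations=[int(tok) for tok in gold.split()],
        relation_pool=[int(tok) for tok in pool.split()],
        question=question.strip(),
    )


def relation_name(relations: list[str], relation_id: int) -> str:
    """Look up a relation by its 1-based id in a 0-based Python list."""
    if not 1 <= relation_id <= len(relations):
        raise IndexError(f"relation id {relation_id} is out of range")
    return relations[relation_id - 1]


if __name__ == "__main__":
    record = parse_record("40\t61 40 117\twhich genre of album is #head_entity# ?")
    print(record.gold_relations, record.relation_pool, record.question)
```

The explicit range check matters because ids start at 1: without it, an id of 0 would silently index the last element of the Python list instead of raising an error.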

The relationship extraction task identifies semantic relationships between entities in a text. A relationship generally connects two entities through some affiliation. Entities can be, for example, people, organizations, or locations, while the relationships among them can be spatial, social, or hierarchical. For instance, the entities “Steve Jobs” and “Apple” may be linked by the relation “Founder”. Relation extraction is important in machine reading and provides a necessary input to more complex tasks such as answering questions, acting as a conversational agent, or summarizing text.

The original SimpleQuestions dataset was developed by Facebook and consists of 108,442 simple questions written by human English-speaking annotators. Each question is matched with an answer in the form of a fact consisting of a subject, relationship, and object. For more information or for access to the original SimpleQuestions dataset you can visit the dataset’s repository linked below in the Related Links section.

Dataset Metadata

Format:
License: CDLA-Permissive
Domain: Natural Language Processing
Number of Records: 108,442 questions
Size: 7.7 MB
Originally Published: 2017-05-26

Example Records

40        61 40 117        which genre of album is #head_entity# ?
61        56 702 132 61 117 40 11        what format is #head_entity#
272        7 1 18 272 308        what film is by the writer #head_entity# ?
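
In the three records shown above, the gold relation id also appears inside the second field, so one plausible preprocessing step is to remove the gold ids from the pool to obtain the purely negative candidates. The sketch below does this for the example records. It is an assumption that this holds for the whole dataset, the function names are hypothetical, and the splitter accepts runs of two or more spaces in addition to tabs only because the records are rendered that way on this page.

```python
import re

# The three example records, copied as they are displayed above.
EXAMPLE_LINES = [
    "40        61 40 117        which genre of album is #head_entity# ?",
    "61        56 702 132 61 117 40 11        what format is #head_entity#",
    "272        7 1 18 272 308        what film is by the writer #head_entity# ?",
]


def split_fields(line: str) -> list[str]:
    """Split a record into three fields on a tab or a run of 2+ spaces."""
    return re.split(r"\t| {2,}", line.strip(), maxsplit=2)


def negatives_for(line: str) -> tuple[list[int], list[int], str]:
    """Return (gold ids, pool ids with gold ids removed, question)."""
    gold_field, pool_field, question = split_fields(line)
    gold = [int(tok) for tok in gold_field.split()]
    pool = [int(tok) for tok in pool_field.split()]
    negatives = [rel for rel in pool if rel not in gold]
    return gold, negatives, question


if __name__ == "__main__":
    for example in EXAMPLE_LINES:
        gold, negatives, question = negatives_for(example)
        print(gold, negatives, question)
```

Keeping the pool order intact, rather than converting to a set, preserves whatever ordering the original file uses for its candidates.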



 title={Improved Neural Relation Detection for Knowledge Base Question Answering},
 author={Yu, Mo and Yin, Wenpeng and Hasan, Kazi Saidul and dos Santos, Cicero and Xiang, Bing and Zhou, Bowen},
 booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},