Overview
The SimpleQuestions Relation Detection dataset is a set of relation extraction annotations derived from the SimpleQuestions dataset. Each entry in this dataset follows the order of questions listed in the SimpleQuestions dataset and corresponds to the following format: `gold_relations \t negative_relation_pool \t question`. The relation ids are mapped in a separate file titled `relation.2M.list`, where the index of the ids starts at 1. The dataset is split into train, validation, and test sets to match the split used by the SimpleQuestions data.
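As an illustration of the entry format described above, here is a minimal Python sketch that splits one line into its three tab-separated fields. It assumes that multiple relation ids within a field are separated by whitespace and that any non-numeric token in a field can be skipped; neither detail is stated above, so check them against the actual files. The names `Example` and `parse_line` and the sample line are hypothetical.

```python
from typing import List, NamedTuple


class Example(NamedTuple):
    gold_relations: List[int]   # 1-based ids into relation.2M.list
    negative_pool: List[int]    # 1-based ids of candidate negative relations
    question: str


def _parse_ids(field: str) -> List[int]:
    # Assumption: ids inside a field are whitespace-separated integers.
    # Non-numeric tokens are skipped defensively rather than raising.
    return [int(tok) for tok in field.split() if tok.isdigit()]


def parse_line(line: str) -> Example:
    # Format: gold_relations \t negative_relation_pool \t question
    gold, pool, question = line.rstrip("\n").split("\t", 2)
    return Example(_parse_ids(gold), _parse_ids(pool), question)


# Made-up sample line, for illustration only.
sample = "40\t61 117 9\twhat is the book e about\n"
example = parse_line(sample)
```

Splitting with `split("\t", 2)` keeps the question text intact even if it happens to contain a tab character.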
Dataset Metadata
Field | Value |
---|---|
Format | TSV |
License | CDLA-Permissive |
Domain | Natural Language Processing |
Number of Records | 108,442 questions |
Data Split | 77,524 training questions, 10,309 validation questions, 20,609 test questions |
Size | 7.7 MB |
Dataset Origin | Original SimpleQuestions dataset from Facebook Research, derived annotations by IBM Research |
Dataset Version Update | Version 1 – May 07, 2020 |
Data Coverage | Randomized facts from Knowledge Base Freebase |
Business Use Case | Linguistics: Train a relationship extraction model that can be used to build a family tree graph from autobiographical text. |
Dataset Archive Content
File or Folder | Description |
---|---|
train.replace_ne.withpool | Questions in the training subset |
valid.replace_ne.withpool | Questions in the validation subset |
test.replace_ne.withpool | Questions in the testing subset |
relation.2M.list | Relation id mappings |
LICENSE.txt | Plaintext version of the CDLA-Permissive license |
README.txt | Text file with the file names and description |
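The files listed above could be read along the following lines. This is a rough sketch, not an official loader: it assumes the archive has been extracted into a single directory, that `relation.2M.list` holds one relation name per line, and that the files are UTF-8 encoded. The function names and the `data_dir` argument are hypothetical.

```python
from pathlib import Path
from typing import Dict

# File names as listed in the archive content table.
SPLIT_FILES = {
    "train": "train.replace_ne.withpool",
    "valid": "valid.replace_ne.withpool",
    "test": "test.replace_ne.withpool",
}


def load_relation_names(data_dir: str) -> Dict[int, str]:
    # Assumption: one relation name per line; ids start at 1, not 0.
    path = Path(data_dir) / "relation.2M.list"
    with open(path, encoding="utf-8") as f:
        return {i: line.strip() for i, line in enumerate(f, start=1)}


def count_questions(data_dir: str) -> Dict[str, int]:
    # Counts non-empty lines per split; each line is one question entry.
    counts = {}
    for split, name in SPLIT_FILES.items():
        with open(Path(data_dir) / name, encoding="utf-8") as f:
            counts[split] = sum(1 for line in f if line.strip())
    return counts
```

Comparing the result of `count_questions` against the record counts in the metadata table is a quick way to confirm that the archive was extracted completely.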
Data Glossary and Preview
Click here to explore the data glossary, sample records, and additional dataset metadata.
Use the Dataset
This dataset is complemented by a data exploration notebook to help you get started: try the completed notebook.
Citation
@inproceedings{yu2017improved,
title={Improved Neural Relation Detection for Knowledge Base Question Answering},
author={Yu, Mo and Yin, Wenpeng and Hasan, Kazi Saidul and dos Santos, Cicero and Xiang, Bing and Zhou, Bowen},
booktitle={Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={571--581},
year={2017}
}
Related Links
- SimpleQuestions from Facebook Research: the underlying dataset used to generate this set of annotations