Overview
ConProp version 1.0 was developed by researchers at IBM Almaden Research Center, San Jose, CA, USA. ConProp consists of proposition bank-style annotations for approximately 1,000 English compliance sentences drawn from IBM’s publicly available contracts. The sentences were extracted from contract sections such as Business Partner descriptions, Agreement Terms / Structure, Intellectual Property Protection, Limitation of Liability, and Warranty Terms, among others. Each sentence is annotated with a layer of “universal” semantic role labels covering parts of speech, argument labeling, and predicate labeling. The dataset is intended as training data for a deep neural network that performs Semantic Role Labeling (SRL) on unlabeled legal-domain text. SRL is a natural language processing task that represents the meaning of a sentence structurally, by identifying each predicate and the arguments that fill its roles.
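To make the idea of a predicate with labeled arguments concrete, the following minimal sketch shows one way such a structure might be held in Python. The `Frame` class, the example sentence, and the role labels (`A0`, `A1`, `A2`) are illustrative assumptions in the PropBank style; they are not taken from the dataset.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Frame:
    """One predicate and its labeled arguments (hypothetical helper, not part of ConProp)."""
    predicate: str
    arguments: Dict[str, str] = field(default_factory=dict)

    def describe(self) -> str:
        # Render the frame as predicate(ROLE='text', ...)
        roles = ", ".join(f"{label}={text!r}" for label, text in self.arguments.items())
        return f"{self.predicate}({roles})"


# Illustrative sentence (made up): "The Supplier shall deliver the Products to IBM."
frame = Frame(
    predicate="deliver",
    arguments={"A0": "The Supplier", "A1": "the Products", "A2": "to IBM"},
)
summary = frame.describe()
print(summary)
```

An SRL model trained on data like this would aim to produce one such frame per predicate in a sentence.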
Dataset Metadata
Field | Value |
---|---|
Format | CoNLL-U |
License | CDLA-Sharing |
Domain | Natural Language Processing |
Number of Records | ~1,000 annotated sentences corresponding to ~50,000 words |
Size | 2.3 MB |
Author | Sanjana Sahayaraj, Yunyao Li, Huaiyu Zhu, Marina Danilevsky, Poornima Chozhiyath Raman, Ramiya Venkatachalam |
Dataset Origin | IBM Research |
Dataset Version Update | Version 1 – September 12, 2019 |
Data Coverage | This dataset contains labeled sentences from IBM’s publicly available contracts. |
Business Use Case | Linguistics: Train a semantic role labeler to provide input for a chatbot model. |
Dataset Archive Contents
File or Folder | Description |
---|---|
contracts_proposition_bank.conllx | A full version of the raw dataset. |
LICENSE.txt | Terms of Use |
Data Glossary and Preview
Click here to explore the data glossary, sample records, and additional dataset metadata.
Use the Dataset
This dataset is complemented by a data exploration Python notebook to help you get started.
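As a starting point independent of the notebook, here is a minimal sketch of reading a CoNLL-style file sentence by sentence. It assumes the usual CoNLL conventions (one token per line, tab-separated columns, a blank line between sentences, `#` for comment lines). The exact column layout of `contracts_proposition_bank.conllx` is not described here, so the sample rows and the position of the word form in the second column are assumptions; check the data glossary before relying on them.

```python
from typing import Iterable, Iterator, List


def read_conll_sentences(lines: Iterable[str]) -> Iterator[List[List[str]]]:
    """Yield each sentence as a list of token rows, each row a list of column strings."""
    sentence: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Skip comment/metadata lines.
            continue
        sentence.append(line.split("\t"))
    if sentence:
        # Emit the final sentence if the input does not end with a blank line.
        yield sentence


# Hypothetical sample rows (made up for illustration, not copied from the dataset).
SAMPLE = (
    "# sent_id = 1\n"
    "1\tIBM\tIBM\tPROPN\n"
    "2\tprovides\tprovide\tVERB\n"
    "3\tsupport\tsupport\tNOUN\n"
    "\n"
    "1\tYou\tyou\tPRON\n"
    "2\tagree\tagree\tVERB\n"
)

sentences = list(read_conll_sentences(SAMPLE.splitlines()))
# Assumes the word form is in the second column, as in CoNLL-U.
words = [[row[1] for row in sent] for sent in sentences]
print(words)
```

To read the real file, pass an open file handle instead of the sample, for example `open("contracts_proposition_bank.conllx", encoding="utf-8")`, since a file object iterates over its lines.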
Citation
[1] Wen-Chi Chou, Richard Tzong-Han Tsai, Ying-Shan Su, Wei Ku, Ting-Yi Sung, and Wen-Lian Hsu. (2006). A semi-automatic method for annotating a biomedical proposition bank. In Proceedings of the Workshop on Frontiers in Linguistically Annotated Corpora 2006. Association for Computational Linguistics, pages 5–12.
[2] Alan Akbik and Yunyao Li. (2016). K-SRL: Instance-based learning for semantic role labeling. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 599–608.
[3] Yuta Tsuboi, Hiroshi Kanayama, Katsumasa Yoshikawa, Tetsuya Nasukawa, Akihiro Nakayama, Kei Sugano, and John Richardson. (2014). Transfer of dependency parser from rule-based system to learning-based system. In Proceedings of the 20th Annual Meeting of the Association for Natural Language Processing (in Japanese).