
Contracts Proposition Bank


ConProp (Contracts Proposition Bank) version 1.0 was developed by researchers at IBM Almaden Research Center, San Jose, CA, USA. ConProp consists of proposition bank-style annotations for approximately 1,000 English compliance sentences drawn from IBM’s publicly available contracts. The sentences were extracted from contract sections such as Business Partner descriptions, Agreement Terms / Structure, Intellectual Property Protection, Limitation of Liability, and Warranty Terms, among others. Each sentence is annotated with a layer of “universal” semantic role labels covering parts of speech, argument labels, and predicate labels. The dataset is suitable as training data for a deep neural network that performs Semantic Role Labeling (SRL) on unlabeled legal-domain text. SRL is a natural language processing task that represents the meaning of a sentence structurally, by identifying each predicate and the arguments that fill its roles.
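To make the idea of a predicate with labeled arguments concrete, here is a minimal Python sketch of how one SRL proposition might be represented in memory. The sentence, the class names (`Argument`, `Proposition`), and the PropBank-style labels (`A0`, `A1`, `AM-MOD`) are illustrative assumptions; they are not taken from ConProp, and the exact label inventory used in the dataset is not described here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Argument:
    role: str  # PropBank-style role label (assumed), e.g. "A0" or "A1"
    text: str  # the span of the sentence that fills the role


@dataclass
class Proposition:
    predicate: str  # surface form of the predicate
    arguments: List[Argument] = field(default_factory=list)

    def describe(self) -> str:
        """Render the proposition as predicate(role='span', ...)."""
        args = ", ".join(f"{a.role}={a.text!r}" for a in self.arguments)
        return f"{self.predicate}({args})"


# Hypothetical sentence, not from the dataset:
# "Either party may terminate this Agreement."
example = Proposition(
    predicate="terminate",
    arguments=[
        Argument("A0", "Either party"),    # who terminates
        Argument("AM-MOD", "may"),         # modal modifier
        Argument("A1", "this Agreement"),  # what is terminated
    ],
)

print(example.describe())
```

A semantic role labeler trained on annotated sentences would aim to produce structures of this kind for sentences it has not seen before.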

Dataset Metadata

| Field | Value |
| --- | --- |
| Format | CoNLL-U |
| License | CDLA-Sharing |
| Domain | Natural Language Processing |
| Number of Records | ~1,000 annotated sentences corresponding to ~50,000 words |
| Size | 2.3 MB |
| Author | Sanjana Sahayaraj, Yunyao Li, Huaiyu Zhu, Marina Danilevsky, Poornima Chozhiyath Raman, Ramiya Venkatachalam |
| Dataset Origin | IBM Research |
| Dataset Version Update | Version 1 – September 12, 2019 |
| Data Coverage | This dataset contains labeled sentences from IBM’s publicly available contracts. |
| Business Use Case | Linguistics: Train a semantic role labeler to provide input for a chatbot model. |

Dataset Archive Contents

| File or Folder | Description |
| --- | --- |
| contracts_proposition_bank.conllx | A full version of the raw dataset. |
| LICENSE.txt | Terms of Use |
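As a starting point for working with the data file, the following Python sketch groups CoNLL-style lines into sentences. It assumes the usual conventions of that file family: one token per line, tab-separated columns, a blank line between sentences, and optional comment lines starting with `#`. The column layout of `contracts_proposition_bank.conllx` is not documented here, so each token is kept as a raw list of column strings; the function names `read_conll_sentences` and `load_file` are hypothetical helpers, and the sample text is made up.

```python
from typing import Iterable, Iterator, List

Sentence = List[List[str]]  # one list of column values per token


def read_conll_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Yield sentences from CoNLL-style lines (assumed layout:
    tab-separated columns, blank line between sentences)."""
    tokens: Sentence = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # A blank line closes the current sentence, if there is one.
            if tokens:
                yield tokens
                tokens = []
        elif line.startswith("#"):
            # Skip comment or metadata lines.
            continue
        else:
            tokens.append(line.split("\t"))
    if tokens:  # the input may not end with a blank line
        yield tokens


def load_file(path: str) -> List[Sentence]:
    """Read every sentence from a file on disk."""
    with open(path, encoding="utf-8") as handle:
        return list(read_conll_sentences(handle))


# Made-up two-sentence sample (not real ConProp data).
sample = "1\tEither\n2\tparty\n\n# a comment\n1\tIBM\n2\tagrees\n"
sentences = list(read_conll_sentences(sample.splitlines(keepends=True)))
print(len(sentences), "sentences,", sum(len(s) for s in sentences), "tokens")
```

Once the archive is unpacked, calling `load_file("contracts_proposition_bank.conllx")` should return one entry per annotated sentence, provided the file follows the layout assumed above.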

Data Glossary and Preview

Click here to explore the data glossary, sample records, and additional dataset metadata.

Use the Dataset

This dataset is complemented by a data exploration Python notebook to help you get started.

