Note: this dataset is hosted on a third-party site and not on the Data Asset Exchange. Clicking on the “Get this dataset” link above will direct you to physionet.org.
This dataset contains medical information and requires the user to complete a training course before accessing the dataset.
Natural Language Inference (NLI) is one of the critical tasks for understanding natural language. The objective of NLI is to determine if a given hypothesis can be inferred from a given premise. NLI systems have made significant progress over the years and have gained popularity since the recent release of datasets such as the Stanford Natural Language Inference (SNLI) (Bowman et al. 2015) and Multi-NLI (Nangia et al. 2017).
We introduce MedNLI, a dataset annotated by doctors performing a natural language inference (NLI) task grounded in the medical history of patients. We present strategies to: 1) leverage transfer learning using datasets from the open domain (e.g. SNLI) and 2) incorporate domain knowledge from external data and lexical sources (e.g. medical terminologies). Our results demonstrate performance gains using both strategies.
|Format||License||Domain||Number of Records||Size|
Training (11,232 pairs)
Development (1,395 pairs)
Test (1,422 pairs)
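As a rough illustration of working with premise/hypothesis pairs like those in the splits above, the sketch below parses records into triples and counts the labels. It rests on assumptions not stated on this page: that each split is stored as JSON Lines, that the fields are named `sentence1`, `sentence2`, and `gold_label` (the SNLI convention), and that labels are one of entailment, contradiction, or neutral. The sample records are made up for illustration and are not taken from MedNLI.

```python
import json
from collections import Counter
from typing import Iterable, List, NamedTuple


class NLIPair(NamedTuple):
    premise: str
    hypothesis: str
    label: str  # assumed to be "entailment", "contradiction", or "neutral"


def parse_pairs(lines: Iterable[str]) -> List[NLIPair]:
    """Parse JSON Lines records into premise/hypothesis/label triples."""
    pairs = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Field names are an assumption borrowed from the SNLI convention.
        pairs.append(
            NLIPair(record["sentence1"], record["sentence2"], record["gold_label"])
        )
    return pairs


# Made-up records for illustration only; NOT taken from MedNLI.
sample_lines = [
    '{"sentence1": "The patient reports chest pain.", '
    '"sentence2": "The patient has a symptom.", "gold_label": "entailment"}',
    '{"sentence1": "The patient reports chest pain.", '
    '"sentence2": "The patient has no complaints.", "gold_label": "contradiction"}',
    '{"sentence1": "The patient reports chest pain.", '
    '"sentence2": "The patient has a history of smoking.", "gold_label": "neutral"}',
]

pairs = parse_pairs(sample_lines)
label_counts = Counter(pair.label for pair in pairs)
print(len(pairs), dict(label_counts))
```

With a real split, an open file handle can be passed in place of `sample_lines`, since `parse_pairs` accepts any iterable of lines. If the assumptions hold, the training split should then yield 11,232 pairs.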
- MedNLI Website: A website providing more information about MedNLI.
- Data Asset eXchange (DAX): Explore useful and relevant data sets for enterprise data science.
- Model Asset eXchange (MAX): A place for developers to find and use free and open source deep learning models.
- Center for Open-Source Data & AI Technologies (CODAIT): Improving the Enterprise AI Lifecycle in Open Source.