Overview
The dataset contains categorised questions that were frequently asked by the public during the COVID-19 pandemic. It was created to ramp up a dialogue system that answers these frequently asked questions. The dataset is made publicly available here in the hope of further promoting research on semantic utterance classification for goal-oriented dialogue systems.
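To make the semantic utterance classification task concrete, here is a minimal sketch of a generic baseline: a multinomial Naive Bayes classifier over bag-of-words tokens with add-one smoothing, written in plain Python. It is an illustration only and is not the method of the dataset authors or of the cited paper. The names `tokenize` and `NaiveBayesClassifier` are hypothetical, and the sketch assumes the questions have already been read into `(question, category)` pairs.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class NaiveBayesClassifier:
    """Hypothetical baseline: multinomial Naive Bayes with add-one smoothing."""

    def fit(self, pairs):
        # pairs: iterable of (question, category) tuples (assumed input shape)
        self.class_counts = Counter()
        self.token_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in pairs:
            self.class_counts[label] += 1
            for tok in tokenize(text):
                self.token_counts[label][tok] += 1
                self.vocab.add(tok)
        self.total_docs = sum(self.class_counts.values())
        return self

    def predict(self, text):
        """Return the category with the highest log-posterior score."""
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen tokens
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior: share of training questions carrying this label.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.token_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                # Add-one smoothing keeps a zero count from vetoing a label.
                score += math.log((self.token_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

Usage: call `NaiveBayesClassifier().fit(pairs)` on a list of `(question, category)` tuples, then `predict(question)` returns the highest-scoring category. Naive Bayes is chosen here only because it needs no third-party libraries; stronger classifiers would normally be used in practice.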
Dataset Metadata
Field | Value |
---|---|
Format | TSV |
License | CDLA-Sharing |
Domain | Natural Language Processing |
Number of Records | 844 |
Size | 49KB |
Authors | Naama Tepper, Esther Goldbraich |
Dataset Origin | IBM |
Dataset Version | Version 1 – Oct 1, 2020 |
Data Coverage | COVID-19 related enquiries |
Business Use Case | COVID-19 chatbot |
Dataset Archive Content
File or Folder | Description |
---|---|
LICENSE.txt | Terms of Use |
covid_19_questions.tsv | Full version of the raw dataset. |
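Since covid_19_questions.tsv is a tab-separated file, it can be read with Python's standard `csv` module. The sketch below assumes, without confirmation from this card, that each row holds the question in column 0 and its category in column 1, and that there may or may not be a header row; the helper names `read_pairs`, `load_questions` and `category_counts` are hypothetical. Check the actual column layout before relying on these defaults.

```python
import csv
from collections import Counter


def read_pairs(lines, question_col=0, label_col=1, skip_header=False):
    """Parse (question, category) pairs from an iterable of TSV lines.

    ASSUMPTION: the column positions are hypothetical defaults.
    """
    # QUOTE_NONE treats quote characters in questions as literal text.
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    if skip_header:
        next(reader, None)
    pairs = []
    for row in reader:
        if len(row) <= max(question_col, label_col):
            continue  # skip blank or malformed rows
        pairs.append((row[question_col].strip(), row[label_col].strip()))
    return pairs


def load_questions(path, question_col=0, label_col=1, skip_header=False):
    """Read pairs from a TSV file on disk, e.g. covid_19_questions.tsv."""
    with open(path, newline="", encoding="utf-8") as f:
        return read_pairs(f, question_col, label_col, skip_header)


def category_counts(pairs):
    """Count how many questions fall into each category."""
    return Counter(label for _, label in pairs)
```

A quick sanity check after loading is to compare the number of pairs with the Number of Records in the metadata table; a mismatch usually means the header or column assumptions are wrong.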
Data Glossary and Preview
For a full view of this dataset's metadata, data glossary, and a set of sample records, click the "Preview the dataset" button on the dataset's web page.
Use the Dataset
This dataset is accompanied by starter notebooks to help you get started.
Citation
@article{Tepper2020balancing,
title={Balancing via Generation for Multi-Class Text Classification Improvement},
author={Tepper, Naama and Goldbraich, Esther and Zwerdling, Naama and Kour, George and Anaby-Tavor, Ateret and Carmeli, Boaz},
journal={Findings of EMNLP 2020},
year={2020}
}