The Reporting Carrier On-Time Performance Dataset contains information on approximately 200 million domestic US flights reported to the United States Bureau of Transportation Statistics. The dataset contains basic information about each flight (such as date, time, departure airport, arrival airport) and, if applicable, the amount of time the flight was delayed and information about the reason for the delay. This dataset can be used to predict the likelihood of a flight arriving on time.
Get this Dataset
|Data Description||Zipped File Name|
|Full (Original) Dataset, 7.2 GB||airline.tar.gz|
|2 Million Row Sample Dataset, 152 MB||airline_2m.tar.gz|
|LAX to JFK Sample Dataset, 58 KB||lax_to_jfk.tar.gz|
|Number of Records||194,385,636 flights|
|Dataset Origin||Bureau of Transportation Statistics|
|Dataset Version Update||Version 1 – June 25, 2020|
|Data Coverage||Location: United States; Dates: 1987 through 2020|
|Business Use Case||Aviation: Predict which flights are likely to arrive on time|
Dataset Archive Contents
|File or Folder||Description|
||Random 2 million record sample (approximately 1%) of the full dataset|
||Approximately 2 thousand record sample of flights from LAX to JFK airport|
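The downloads are gzip-compressed tar archives, so they can be inspected with Python's standard library alone. The following is a minimal sketch, assuming the archive contains at least one UTF-8 encoded CSV file with a header row. The member names inside the archives are not listed here, so the hypothetical helper `read_first_csv` simply picks the first `.csv` member it finds.

```python
import csv
import io
import itertools
import tarfile


def read_first_csv(archive_path, max_rows=5):
    """Return up to max_rows records from the first .csv member of a .tar.gz archive.

    Assumptions (not confirmed by the dataset page): the archive holds a CSV
    file with a header row, and the file is UTF-8 encoded.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        for member in archive:
            if member.isfile() and member.name.lower().endswith(".csv"):
                raw = archive.extractfile(member)
                text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
                reader = csv.DictReader(text)
                # Read only the first few records so large files stay cheap to preview.
                return list(itertools.islice(reader, max_rows))
    raise FileNotFoundError(f"no .csv member found in {archive_path}")
```

For example, `read_first_csv("lax_to_jfk.tar.gz")` would preview the smallest sample once it has been downloaded. Streaming from the archive avoids unpacking the 7.2 GB full dataset just to look at its columns.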
Data Glossary and Preview
Click here to explore the data glossary, sample records, and additional dataset metadata.
Use the Dataset
This dataset is complemented by data exploration, data analysis, and modeling Python notebooks to help you get started:
- Run the notebooks as a pipeline using the Elyra extension for JupyterLab
- Run the data exploration notebook in Watson Studio
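Independently of the notebooks, a simple baseline for the on-time prediction use case is the historical on-time rate per carrier. The sketch below is illustrative only. It assumes each record is a dict with a carrier code under `Reporting_Airline` and an arrival delay in minutes under `ArrDelay`; these column names are assumptions and should be checked against the data glossary. It also assumes a flight counts as on time when it arrives less than 15 minutes late, and it skips records with a missing delay (for example, cancelled flights).

```python
import math
from collections import defaultdict

# Assumption: "on time" means arriving less than 15 minutes after schedule.
ON_TIME_THRESHOLD_MINUTES = 15


def on_time_rate_by_carrier(records, carrier_key="Reporting_Airline", delay_key="ArrDelay"):
    """Return {carrier: fraction of flights that arrived on time}.

    carrier_key and delay_key are assumed column names, not confirmed by the
    dataset page. Records with a blank, missing, or non-numeric delay are skipped.
    """
    totals = defaultdict(int)
    on_time = defaultdict(int)
    for record in records:
        try:
            delay = float(record[delay_key])
        except (KeyError, TypeError, ValueError):
            continue
        if math.isnan(delay):
            continue
        carrier = record.get(carrier_key, "unknown")
        totals[carrier] += 1
        if delay < ON_TIME_THRESHOLD_MINUTES:
            on_time[carrier] += 1
    return {carrier: on_time[carrier] / totals[carrier] for carrier in totals}
```

The function accepts any iterable of dicts, such as the rows produced by `csv.DictReader`, so it can be tried on the small LAX to JFK sample before moving to the larger files.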
This dataset was compiled from data available on the Bureau of Transportation Statistics website and is US Government work not subject to copyright.