Overview
The Reporting Carrier On-Time Performance Dataset covers approximately 200 million domestic US flights reported to the United States Bureau of Transportation Statistics. Each record holds basic information about a flight (such as date, time, departure airport, and arrival airport) and, if applicable, how long the flight was delayed and the reported reason for the delay. The dataset can be used to predict the likelihood of a flight arriving on time.
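To make the prediction target concrete, the sketch below turns a raw arrival-delay value into a binary on-time label. It is an illustration only: the function name `on_time_label` is hypothetical, the 15-minute cutoff is an assumption based on the commonly used "less than 15 minutes late" convention, and the input is assumed to be the arrival delay in minutes as read from a CSV cell. Check the data glossary for the actual column names and units before relying on it.

```python
from typing import Optional

# Assumption: a flight counts as "on time" when it arrives less than
# 15 minutes after its scheduled arrival. Adjust to suit your use case.
ON_TIME_THRESHOLD_MINUTES = 15.0


def on_time_label(arr_delay: str,
                  threshold: float = ON_TIME_THRESHOLD_MINUTES) -> Optional[int]:
    """Map a raw arrival-delay cell (minutes, as text) to a label.

    Returns 1 for on time, 0 for late, and None when the value is
    missing or not numeric (for example, a flight with no recorded arrival).
    """
    text = arr_delay.strip()
    if not text:
        return None
    try:
        delay = float(text)
    except ValueError:
        return None
    return 1 if delay < threshold else 0
```

Rows that yield `None` would typically be dropped or handled separately before training a model; which choice is appropriate depends on how missing arrivals should be treated.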
Dataset Metadata
Field | Value |
---|---|
Format | CSV |
License | CDLA-Sharing |
Domain | Time Series |
Number of Records | 194,385,636 flights |
Data Split | NA |
Size | 81 GB |
Dataset Origin | Bureau of Transportation Statistics |
Dataset Version Update | Version 1 – June 25, 2020 |
Data Coverage | Location: United States; Dates: 1987 through 2020 |
Business Use Case | Aviation: Predict which flights are likely to arrive on time |
Dataset Archive Contents
File or Folder | Description |
---|---|
airline.csv | All records |
airline_2m.csv | Random 2 million record sample (approximately 1%) of the full dataset |
lax_to_jfk.csv | Approximately 2 thousand record sample of flights from LAX to JFK airport |
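Because the full `airline.csv` file is far larger than typical memory, it can help to stream records rather than load them all at once. The following is a minimal sketch using only the Python standard library. The helper name `count_routes` is hypothetical, and the column names `Origin` and `Dest` are assumptions that should be checked against the data glossary.

```python
import csv
from collections import Counter
from typing import Iterable


def count_routes(lines: Iterable[str],
                 origin_col: str = "Origin",
                 dest_col: str = "Dest") -> Counter:
    """Count flights per (origin, destination) pair, one row at a time.

    `lines` can be an open text file or any iterable of CSV lines whose
    first line is the header. The column names are assumptions.
    """
    reader = csv.DictReader(lines)
    counts: Counter = Counter()
    for row in reader:
        counts[(row[origin_col], row[dest_col])] += 1
    return counts


# Example usage against the small sample file (path is an assumption):
# with open("lax_to_jfk.csv", newline="") as f:
#     print(count_routes(f).most_common(5))
```

Starting with `lax_to_jfk.csv` or `airline_2m.csv` keeps iteration fast while exploring; the same function works unchanged on the full file because only one row is held in memory at a time.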
Data Glossary and Preview
Click here to explore the data glossary, sample records, and additional dataset metadata.
Use the Dataset
This dataset is complemented by data exploration, data analysis, and modeling Python notebooks to help you get started:
- Run the notebooks as a pipeline using the Elyra extension for JupyterLab
- Run the data exploration notebook in Watson Studio
Citation
This dataset was compiled from data available on the Bureau of Transportation Statistics website and is US Government work not subject to copyright.