Airline Reporting Carrier On-Time Performance Dataset


The Reporting Carrier On-Time Performance Dataset contains information on approximately 200 million domestic US flights reported to the United States Bureau of Transportation Statistics. The dataset contains basic information about each flight (such as date, time, departure airport, arrival airport) and, if applicable, the amount of time the flight was delayed and information about the reason for the delay. This dataset can be used to predict the likelihood of a flight arriving on time.

Get this Dataset

Data Description                        Zipped File Name
Full (Original) Dataset, 7.2 GB         airline.tar.gz
2 Million Row Sample Dataset, 152 MB    airline_2m.tar.gz
LAX to JFK Sample Dataset, 58 KB        lax_to_jfk.tar.gz

Dataset Metadata

Field                     Value
Format                    CSV
License                   CDLA-Sharing
Domain                    Time Series
Number of Records         194,385,636 flights
Data Split                NA
Size                      81 GB
Dataset Origin            Bureau of Transportation Statistics
Dataset Version Update    Version 1 – June 25, 2020
Data Coverage             Location: United States; Dates: 1987 through 2020
Business Use Case         Aviation: Predict which flights are likely to arrive on time

Dataset Archive Contents

File or Folder    Description
airline.csv       All records
airline_2m.csv    Random sample of 2 million records (approximately 1% of the full dataset)
lax_to_jfk.csv    Sample of approximately 2 thousand flights from LAX to JFK
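
As a minimal sketch of how one of these archives might be read without unpacking it to disk first, the snippet below streams CSV records straight out of a .tar.gz file using only the Python standard library. The helper name iter_rows is hypothetical, and two details are assumptions rather than documented facts: that the CSV sits inside the archive under the file name listed above (it may be nested in a folder), and that the file is UTF-8 encoded.

```python
import csv
import io
import tarfile
from typing import Dict, Iterator


def iter_rows(archive_path: str, member_name: str) -> Iterator[Dict[str, str]]:
    """Yield each CSV record in a .tar.gz member as a dict keyed by column name.

    Hypothetical helper: member_name must match the path stored inside the
    archive, which is assumed (not confirmed) to equal the listed file name.
    """
    with tarfile.open(archive_path, mode="r:gz") as archive:
        # extractfile raises KeyError for an unknown name and returns None
        # for members that are not regular files (for example, directories).
        member = archive.extractfile(member_name)
        if member is None:
            raise FileNotFoundError(
                f"{member_name} is not a regular file in {archive_path}"
            )
        # Decode the binary stream as text (UTF-8 is an assumption);
        # newline="" is the setting the csv module expects.
        text = io.TextIOWrapper(member, encoding="utf-8", newline="")
        # The first row is treated as the header row.
        yield from csv.DictReader(text)
```

Under those assumptions, a call such as iter_rows("lax_to_jfk.tar.gz", "lax_to_jfk.csv") would yield one dict per flight. Because rows are produced lazily, the same approach should also work for the full archive without loading all records into memory.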

Data Glossary and Preview

Click here to explore the data glossary, sample records, and additional dataset metadata.

Use the Dataset

Python notebooks for data exploration, data analysis, and modeling accompany this dataset to help you get started.
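
Before any modeling, it can help to know what share of flights arrive on time, since that is the base rate a predictive model would be compared against. The sketch below computes that share from an iterable of row dicts, such as those produced by csv.DictReader. It is an illustration, not the method used in the notebooks: the function name on_time_share is hypothetical, the column name ArrDelay (arrival delay in minutes) is an assumption about the CSV header, and the 15-minute cutoff follows the usual Bureau of Transportation Statistics convention that a flight arriving less than 15 minutes late counts as on time.

```python
import math
from typing import Dict, Iterable, Optional


def on_time_share(
    rows: Iterable[Dict[str, str]],
    delay_field: str = "ArrDelay",   # assumed column name, not confirmed here
    threshold_minutes: float = 15.0,  # assumed on-time cutoff in minutes
) -> Optional[float]:
    """Return the fraction of flights with a known arrival delay below the cutoff.

    Rows with a missing or non-numeric delay (for example, cancelled flights)
    are skipped. Returns None when no row has a usable delay value.
    """
    on_time = 0
    counted = 0
    for row in rows:
        raw = (row.get(delay_field) or "").strip()
        if not raw:
            continue  # no arrival delay recorded for this flight
        try:
            delay = float(raw)
        except ValueError:
            continue  # not a number, such as "NA"
        if math.isnan(delay):
            continue  # "nan" parses as a float but carries no information
        counted += 1
        if delay < threshold_minutes:
            on_time += 1
    return on_time / counted if counted else None
```

Early arrivals appear as negative delays and are counted as on time. Skipping rows without a delay value means cancelled or diverted flights do not affect the result, which may or may not suit a given analysis.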


This dataset was compiled from data available on the Bureau of Transportation Statistics website and is a US Government work not subject to copyright.