
This function generates a data frame similar to the flights dataset from nycflights13 for any US airport and time frame. Please note that, even with a fast internet connection, this function may take several minutes to download the relevant data.

Usage

get_flights(station, year, month = 1:12, dir = NULL, ...)

Source

RITA, Bureau of Transportation Statistics, https://www.bts.gov

Arguments

station

A character vector giving the origin US airports of interest (as the FAA LID airport code).

year

A numeric giving the year of interest. This argument is currently not vectorized, because the dataset for even a single year is very large. Information for the most recent year is usually available by February or March of the following year.

month

A numeric giving the month(s) of interest.

dir

An optional character string giving the directory to save datasets in. By default, datasets will not be saved to file.

...

Currently only used internally.

Value

A data frame with ~1k-500k rows and 19 variables:

year, month, day

Date of departure.

dep_time, arr_time

Actual departure and arrival times, UTC.

sched_dep_time, sched_arr_time

Scheduled departure and arrival times, UTC.

dep_delay, arr_delay

Departure and arrival delays, in minutes. Negative times represent early departures/arrivals.

hour, minute

Time of scheduled departure broken into hour and minutes.

carrier

Two-letter carrier abbreviation. See get_airlines for the full name.

tailnum

Plane tail number.

flight

Flight number.

origin, dest

Origin and destination. See get_airports for additional metadata.

air_time

Amount of time spent in the air, in minutes.

distance

Distance between airports, in miles.

time_hour

Scheduled date and hour of the flight as a POSIXct date. Along with origin, can be used to join flights data to weather data.
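The join mentioned for time_hour can be sketched with base R as follows. This is only an illustration: the two small data frames are invented stand-ins for the output of get_flights() and get_weather(), and the temp column is an assumed weather variable, not something this page documents.

```r
# Toy stand-ins for get_flights() and get_weather() output (all values invented).
flights <- data.frame(
  origin = c("PDX", "PDX", "SEA"),
  flight = c(101L, 202L, 303L),
  time_hour = as.POSIXct(
    c("2018-06-01 06:00:00", "2018-06-01 07:00:00", "2018-06-01 06:00:00"),
    tz = "UTC"
  )
)
weather <- data.frame(
  origin = c("PDX", "SEA"),
  time_hour = as.POSIXct(
    c("2018-06-01 06:00:00", "2018-06-01 06:00:00"),
    tz = "UTC"
  ),
  temp = c(55.4, 57.2)  # assumed column name, for illustration only
)

# Left join: keep every flight and attach the weather observation that
# shares its origin and scheduled hour; unmatched flights get NA.
flights_weather <- merge(
  flights, weather,
  by = c("origin", "time_hour"),
  all.x = TRUE
)
```

Flights with no weather observation for their origin and hour keep their row, with missing values in the weather columns.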

Details

This function currently downloads data for all stations for each month supplied, and then keeps only the rows for the requested stations. Thus, the recommended approach to download data for many airports is to supply a vector of airport codes to the station argument rather than iterating over many calls to get_flights().
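A minimal sketch of that recommendation, assuming the package providing get_flights() is attached. The airport codes are arbitrary examples, and the calls are wrapped in if (FALSE), as in the Examples section, so nothing is downloaded when the snippet is run.

```r
# Example airport codes; any FAA LID codes could be used here.
stations <- c("PDX", "SEA")

# Preferred: a single call, so each month's nationwide data is
# downloaded once and then filtered to both airports.
if (FALSE) flights <- get_flights(stations, 2018, 6)

# Slower: one call per airport downloads the same nationwide data
# again for every station.
if (FALSE) flights_list <- lapply(stations, get_flights, year = 2018, month = 6)
```

If per-airport data frames are needed afterwards, the combined result can be split on the origin column, for example with split(flights, flights$origin).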

Note

If you repeatedly get a timeout error when downloading flights, your download may be taking longer than R's default timeout option allows. You can change the timeout value for your R session by running options(timeout = timeout_value_in_seconds) in your console.
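As a rough illustration, the timeout could be raised for the duration of a download and restored afterwards. The value of 600 seconds is an arbitrary choice for this sketch, not a recommendation from the package.

```r
# options() returns the previous values, so they can be restored later.
old_opts <- options(timeout = 600)  # 600 seconds is an arbitrary example value

# The longer timeout is now in effect for this session.
current_timeout <- getOption("timeout")

# ... call get_flights() here ...

# Restore the previous timeout once the download has finished.
options(old_opts)
```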

See also

get_weather for weather data, get_airlines for airlines data, get_airports for airports data, get_planes for planes data, or anyflights for a wrapper function.

Use the as_flights_package function to convert this dataset to a data-only package.

Examples


# flights out of Portland International in June 2018
if (FALSE) get_flights("PDX", 2018, 6)

# ...or the original nycflights13 flights dataset
if (FALSE) get_flights(c("JFK", "LGA", "EWR"), 2013)

# use the dir argument to indicate the folder in which
# to save the data as "flights.rda"
if (FALSE) get_flights("PDX", 2018, 6, dir = tempdir())