Ethical approval

The institutional review board of Ethikkommission Kanton St. Gallen, Switzerland, approved this study (EKSG 01/06/2010). Since the study involved the analysis of publicly available data, the requirement for informed consent was waived.

Strengthening the reporting of observational studies in epidemiology

The authors followed the STROBE checklist.

The race

The ‘New York City Marathon’ is the largest city marathon in the world. The race takes place annually on the first Sunday of November in New York City. The course is point-to-point: it starts at Fort Wadsworth on Staten Island, passes through Brooklyn, Queens and the Bronx, and finishes in Central Park in Manhattan. Because of the large number of participants, the race starts in waves. The professional wheelchair division starts first, at 08:00 a.m., followed at 08:22 a.m. by the handcycle category and selected athletes with disabilities. The professional women start at 08:40 a.m. and the professional men at 09:05 a.m. Five waves of recreational athletes start between 09:10 a.m. and 12:00. The race was not held in 2012 due to Hurricane Sandy or in 2020 due to the COVID-19 pandemic.

Weather data

Historical weather data in hourly intervals were obtained from Weather History Download New York [23]. The available values are hourly readings (between 09:00 a.m. and 04:00 p.m.) of the following variables: temperature (°C; measured at 2 m above ground), barometric pressure (hPa), humidity (%; measured at 2 m above ground) and sunshine duration (min). Between 1999 and 2019, no rain was recorded during the ‘New York City Marathon’.

Subjects

Data were obtained from the race website [24] and included name, sex, age, calendar year, and split times at 5 km, 10 km, 15 km, 20 km, 25 km, 30 km, 35 km, and 40 km for all finishers of both sexes. Non-finishers and wheelchair and handcycle athletes were excluded from the analysis. A total of 560,731 marathon runners’ records were available for analysis (342,799 men (61.1%) and 217,932 women (38.9%)). Race times are recorded by a computer chip attached to the back of the runner’s bib number, which calculates the time difference between the race start and the point of reference [24].

Data processing

The processing of the data files involved several steps. First, the data were cleaned and their integrity was verified by removing formatting errors and ensuring the alignment of the processed values. Next, time-adjusted average values of the weather factors were imputed to each qualifying record. Because race duration differs between runners, from just over 2 h for elite runners to 6–8 h or more for the slowest participants, the average temperature, pressure and other weather factors experienced during the race also differ slightly between runners. Accounting for these differences when calculating and imputing the weather values allowed us to better represent the actual environmental conditions for each record. Finally, the runners’ records were classified into four performance groups for comparison: all runners, top 100, top 10 and top 3. The top 3 group comprises the three fastest athletes of each sex from each year’s race. Similarly, the top 10 and top 100 groups comprise the fastest ten and the fastest 100 male and female finishers, respectively, from each year’s race. These groups are nested rather than mutually exclusive: a record in the top 3 is also included in the top 10, the top 100 and all runners. After processing was complete, descriptive statistical methods were used to compare the groups and draw conclusions.
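The two processing steps described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the record fields (`year`, `sex`, `finish_s`) and the weighting scheme are assumptions made for illustration. In particular, each hourly reading is assumed to represent the hour that begins at its timestamp, and the reading is weighted by the time a runner spent inside that hour.

```python
def time_weighted_mean(readings, start_h, finish_h):
    """Average a weather variable over one runner's race window.

    readings: dict mapping hour of day (int) to the reading for that hour.
    start_h, finish_h: race start and finish as decimal hours of the day.
    Assumption: the reading at hour h stands for the interval [h, h + 1).
    """
    total = 0.0
    weight = 0.0
    for hour, value in readings.items():
        # Time (in hours) the runner spent inside this reading's interval.
        overlap = min(finish_h, hour + 1) - max(start_h, hour)
        if overlap > 0:
            total += value * overlap
            weight += overlap
    if weight == 0:
        raise ValueError("race window does not overlap any reading")
    return total / weight


def top_n(records, n):
    """Return the n fastest finishers for each (year, sex) combination.

    records: list of dicts with the hypothetical keys 'year', 'sex'
    and 'finish_s' (finish time in seconds).
    """
    groups = {}
    for rec in records:
        groups.setdefault((rec["year"], rec["sex"]), []).append(rec)
    selected = []
    for key in sorted(groups):
        fastest_first = sorted(groups[key], key=lambda r: r["finish_s"])
        selected.extend(fastest_first[:n])
    return selected
```

Because `top_n` always takes the fastest records first within each year and sex, `top_n(records, 3)` is a subset of `top_n(records, 10)`, which in turn is a subset of `top_n(records, 100)`, reproducing the nesting of the performance groups.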

Statistical analysis

Descriptive statistics are presented as mean (standard deviation), minimum, maximum and percentiles (25th, 50th and 75th). Pearson and Spearman correlations were computed between the average running speed and the weather variables (i.e., temperature, pressure, humidity and sunshine). The effect sizes of the correlations were interpreted as 0–0.1 = no effect, 0.1–0.3 = small effect, 0.3–0.5 = medium effect and 0.5–1 = large effect, following Cohen [25]. The distributions of running speeds for each performance group were calculated and displayed with boxplots for easy comparison, as were the ranges of weather conditions for each group. The Kolmogorov–Smirnov two-sample test was used to assess the statistical significance of the differences observed, given the unbalanced nature of the nested performance groups. Ordinary least squares (OLS) regression was performed to build a predictive model of the average race speed as a function of the weather factors. All analyses were done using the Python programming language (Python Software Foundation, https://www.python.org/) in a Google Colab notebook (https://colab.research.google.com/).
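For readers who want to see how these quantities are defined, the following standard-library Python sketch computes the Pearson and Spearman coefficients, maps a coefficient to the effect-size labels listed above, and computes the two-sample Kolmogorov–Smirnov statistic. It is an illustration under stated assumptions, not the authors' notebook: the function names are hypothetical, the thresholds are applied to the absolute value of the coefficient with the lower bound of each band treated as inclusive, and only the KS statistic (not its p-value) is computed. In practice, library routines such as `scipy.stats.pearsonr`, `scipy.stats.spearmanr` and `scipy.stats.ks_2samp` would typically be used instead.

```python
import math
from bisect import bisect_right


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def effect_label(r):
    """Label a correlation using the thresholds given in the text."""
    size = abs(r)
    if size < 0.1:
        return "no effect"
    if size < 0.3:
        return "small effect"
    if size < 0.5:
        return "medium effect"
    return "large effect"


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for v in a + b:  # the largest gap always occurs at an observed value
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d
```

The KS statistic depends only on the empirical distribution functions, so it can be computed for samples of very different sizes, such as the top 3 group compared with all runners.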