Predicting Marathon Finishing Times Using Ensemble Learning: An Empirical Study on Boston Marathon Data
VERSION OF RECORD ONLINE: 11/09/2025
Corresponding author's email: phuongttn@hcmute.edu.vn
DOI: https://doi.org/10.54644/jte.2025.1924
Keywords: Ensemble Learning, Marathon Prediction, Boston Marathon, Machine Learning, Performance Forecasting
Abstract
This study proposes an ensemble machine learning model for predicting marathon finishing times, using empirical data from the Boston Marathon for 2015–2017. After preprocessing and feature engineering (intermediate checkpoint times at 5K, 10K, and the half marathon, plus age, gender, nationality, and year of participation), six models were implemented and evaluated: K-Nearest Neighbors (KNN), Artificial Neural Network (ANN), Case-Based Reasoning (CBR), a prior benchmark model (FA-PP-R-ML), Long Short-Term Memory (LSTM), and a novel ensemble model that combines Linear Regression, Random Forest, and MLPRegressor through a meta-learning (stacking) approach. On the test set, the proposed ensemble achieved the best predictive performance, with a Mean Absolute Error (MAE) of 7.32 minutes, a Root Mean Squared Error (RMSE) of 11.06 minutes, and an R² score of 0.928, outperforming all baseline models in both accuracy and robustness. Scatter plots and boxplots further confirmed close agreement between predicted and actual finishing times. The study nevertheless has several limitations: the dataset covers only three years of a single event, the set of compared models is narrow, some algorithmic assumptions are simplified, and hyperparameter tuning was limited. Future work should explore more diverse datasets, incorporate exogenous factors (e.g., weather, elevation), adopt advanced modeling techniques such as attention mechanisms, graph-based learning, or AutoML, and improve model interpretability to support real-world applications in athlete coaching and performance forecasting.
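As an illustration of the stacking idea described in the abstract, the following is a minimal sketch using scikit-learn's `StackingRegressor`. It is not the authors' implementation. The data are synthetic stand-ins rather than the Boston Marathon dataset, and the feature set, the linear meta-learner, the scaling step, and all hyperparameters are assumptions chosen only to make the example run.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, StackingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (NOT the Boston Marathon dataset): cumulative
# split times in minutes at 5K, 10K and the half marathon, plus age.
rng = np.random.default_rng(0)
n = 600
pace = rng.normal(5.5, 0.8, n).clip(3.5, 9.0)  # minutes per km
split_5k = 5 * pace + rng.normal(0, 0.5, n)
split_10k = 10 * pace + rng.normal(0, 0.8, n)
split_half = 21.0975 * pace * 1.01 + rng.normal(0, 1.5, n)
age = rng.integers(18, 75, n)
X = np.column_stack([split_5k, split_10k, split_half, age])

# Hypothetical target: finishing time with a second-half slowdown and noise.
y = 42.195 * pace * 1.05 + 0.05 * (age - 18) + rng.normal(0, 5, n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Base learners named in the abstract; hyperparameters are assumptions.
base_learners = [
    ("lr", LinearRegression()),
    ("rf", RandomForestRegressor(n_estimators=100, random_state=0)),
    (
        "mlp",
        make_pipeline(
            StandardScaler(),
            MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000, random_state=0),
        ),
    ),
]

# The meta-learner is fitted on out-of-fold predictions of the base learners.
# A linear meta-learner is an assumption; the abstract does not specify one.
ensemble = StackingRegressor(
    estimators=base_learners,
    final_estimator=LinearRegression(),
    cv=5,
)
ensemble.fit(X_train, y_train)
y_pred = ensemble.predict(X_test)

# The three metrics reported in the abstract, computed on the held-out split.
mae = mean_absolute_error(y_test, y_pred)
rmse = np.sqrt(mean_squared_error(y_test, y_pred))
r2 = r2_score(y_test, y_pred)
print(f"MAE={mae:.2f} min  RMSE={rmse:.2f} min  R2={r2:.3f}")
```

The numbers printed by this sketch describe the synthetic data only and are not comparable to the results reported in the study. The `cv=5` setting means the meta-learner never sees base-learner predictions made on their own training rows, which is the usual safeguard against leakage in stacked generalization.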
Copyright (c) 2025 Journal of Technical Education Science

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.


