Estimated Glyoxal Concentrations in Foods with Machine Learning-Based Transformation Strategies
Keywords:
Glyoxal, Machine learning, CatBoost, Yeo-Johnson, Advanced glycation end products
Abstract
Background and aim: Glyoxal (GO), which occurs naturally in biological systems and forms during food processing, is a highly toxic reactive carbonyl compound and a precursor of advanced glycation end products (AGEs). Estimating GO concentrations in food products plays a pivotal role in improving food safety. Methods: This study used machine learning (ML) regression models to estimate the GO content (µg/100 g) of foods from nutrient information (carbohydrates, protein, fat, and sugars) obtained from studies in the literature. Fourteen algorithms, including tree-based, ensemble, and regularized linear methods, were tested under different target transformation strategies: Yeo–Johnson, quantile, standard, square-root, and logarithmic shift. The coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) were used to compare the models. Results: LightGBM had the lowest MAE (13.63) under the Yeo–Johnson transformation, whereas the square-root-transformed CatBoost model had the highest prediction accuracy (R² = 0.53, RMSE = 22.24) among all configurations. Preprocessing significantly improved prediction robustness, and model performance was highly sensitive to the chosen transformation type. On this basis, the study introduces the concept of estimated GO (eGO) into the literature. Fat content was the most influential variable in the CatBoost models, whereas LightGBM showed a more balanced feature contribution, with sugars and carbohydrates prominent under certain transformations. Conclusion: These findings provide useful methodological guidance for data science and food safety professionals when selecting appropriate modeling techniques for chemical prediction in food science.
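The comparison of target transformation strategies described in the Methods can be illustrated with a minimal sketch. It is not the study's pipeline: the data are synthetic stand-ins generated in the script, a scikit-learn RandomForestRegressor is swapped in for CatBoost and LightGBM to keep the example dependency-light, and the names `target_transforms` and `evaluate` are hypothetical. The sketch fits the same regressor under several target transformations and reports R², RMSE, and MAE on the original scale after inverse transformation.

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import FunctionTransformer, PowerTransformer

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the study's dataset): four nutrient features
# in g/100 g and a right-skewed, strictly positive target standing in for GO.
n = 300
X = rng.uniform(0.0, 60.0, size=(n, 4))  # carbohydrates, protein, fat, sugars
y = 5.0 * np.exp(0.03 * X[:, 2] + 0.02 * X[:, 3] + rng.normal(0.0, 0.4, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Candidate target transformations; None means the raw target is used.
target_transforms = {
    "none": None,
    "yeo-johnson": PowerTransformer(method="yeo-johnson"),
    "square-root": FunctionTransformer(np.sqrt, inverse_func=np.square),
    "log-shift": FunctionTransformer(np.log1p, inverse_func=np.expm1),
}


def evaluate(transformer):
    """Fit on the transformed target and score on the original scale."""
    model = TransformedTargetRegressor(
        regressor=RandomForestRegressor(n_estimators=200, random_state=0),
        transformer=transformer,
    )
    model.fit(X_train, y_train)
    pred = model.predict(X_test)  # already inverse-transformed
    return {
        "R2": float(r2_score(y_test, pred)),
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "MAE": float(mean_absolute_error(y_test, pred)),
    }


results = {name: evaluate(t) for name, t in target_transforms.items()}
for name, m in results.items():
    print(f"{name:12s} R2={m['R2']:.2f} RMSE={m['RMSE']:.2f} MAE={m['MAE']:.2f}")
```

Scoring on the original scale keeps the metrics comparable across transformations, which is what allows one configuration to rank best on MAE and another on R² and RMSE. The numbers printed by this sketch reflect only the synthetic data and say nothing about the reported results.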
License
Copyright (c) 2026 Busra Yusufoglu

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by-nc/4.0) which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Transfer of Copyright and Permission to Reproduce Parts of Published Papers.
Authors retain the copyright for their published work. No formal permission will be required to reproduce parts (tables or illustrations) of published papers, provided the source is quoted appropriately and reproduction has no commercial intent. Reproductions with commercial intent will require written permission and payment of royalties.
