nature.com

Big data analytics in food industry: a state-of-the-art literature review

Abstract

The food industry has experienced rapid growth over the past two decades, driven by technological advancements that have generated vast quantities of complex data. However, the industry’s ability to effectively analyze and leverage this data remains limited due to the lack of control over diverse variables. This review addresses a critical gap by exploring how AI-ML-based approaches can be applied to solve key challenges in the food sector.

Introduction

Access to safe, healthy, and adequate food is fundamental for the expanding world population. The food sector faces increasing challenges in maintaining food safety, quality, and supply due to escalating population demands and evolving consumer tastes. Confronting these difficulties necessitates innovative, cost-effective, and sustainable solutions to enhance the efficiency of food production, distribution, and safety monitoring systems1. Integrating Big Data Analytics (BDA) has become an essential instrument, providing transformative capabilities throughout the food sector, including supply chain optimization and improving food quality and safety.

Big Data Analytics denotes the application of sophisticated data processing and analytical tools to extract significant insights from extensive and intricate datasets. Within the food industry, BDA can furnish actionable insights to enhance food processing techniques, minimize waste, optimize inventory control, and ensure adherence to safety regulations2,3. For instance, by employing real-time surveillance of production processes and supply chain logistics, BDA may mitigate inefficiencies and augment the traceability of food items, thereby elevating the overall quality and safety of the food supply chain.

The implementation of BDA techniques has been progressively rising in the food industry. Investment in BDA and Artificial Intelligence (AI) rose from 27% in 2018 to 33.9% in 2019, with the sector valued at $169 billion in 2018 and anticipated to attain $274 billion by 2022 (refs. 4,5). Despite increasing interest, there is an absence of comprehensive synthesis in the literature concerning the specific uses, efficacy, and challenges of BDA and Machine Learning (ML) in the food industry.
This systematic review seeks to bridge this gap by consolidating the literature on big data analytics and machine learning applications in food safety, quality, and processing. The review aims to address the following research questions (RQs):

RQ1: Which machine learning algorithms are most helpful in enhancing food safety and quality within the food industry?

RQ2: What are the most frequently used BDA processes in food processing, and what are their effects?

RQ3: What are the principal problems and constraints in implementing Big Data Analytics and Machine Learning within the food industry?

This comprehensive study is essential due to the fragmentation of previous studies, which frequently concentrate on specific technologies or isolated case analyses. This paper intends to thoroughly examine the uses, advantages, and problems of Big Data Analytics (BDA) and Machine Learning (ML) in the food industry through a systematic and replicable methodology. The results will have substantial theoretical and practical implications for food technologists, data scientists, industry professionals, and legislators.

The document is organized as outlined below: The “Methodology” section delineates the methodology employed for the systematic review, encompassing the search strategy, inclusion criteria, and data extraction procedure. The “Definition and characteristics of big data” section provides a thematic overview of the current literature, emphasizing the predominant machine-learning approaches and their applications in food safety, quality, and processing. The “Sources of big data in the food sector” section examines the advantages and drawbacks of various ML models, identifies research deficiencies, and underscores the practical obstacles of executing BDA in the food industry. The “Applications of big data analytics in food safety” section presents conclusions and recommendations for further study and practical application techniques within the food industry.
This paper seeks to enhance the existing knowledge of Big Data Analytics and Machine Learning within the food sector by comprehensively examining their uses, advantages, and problems. The results will help researchers, industry stakeholders, and regulators formulate data-driven solutions to enhance food safety, quality, and sustainability.

Methodology

This study follows a systematic review approach based on the Kitchenham guidelines for conducting evidence-based reviews in software engineering, adapted for the food industry context6. The methodology was structured to ensure a comprehensive, unbiased, and reproducible synthesis of the current state of Big Data Analytics (BDA) applications in the food sector. The primary goal of this review was to identify the key applications, benefits, and challenges of implementing BDA in food safety, quality, and processing.

Search and selection process

A comprehensive search was conducted across multiple databases to ensure the collection of relevant literature. The databases used include Scopus, Web of Science, IEEE Xplore, ScienceDirect, and Google Scholar. The search terms employed were combinations of keywords such as “Big Data Analytics,” “Machine Learning in Food Industry,” “Food Safety,” “Food Quality,” “Food Processing,” “Supervised Learning Models in Food quality,” “Multivariate analysis in Food safety,” “IoT and Blockchain technologies in food industries” and “Artificial Intelligence in Food Sector.” The search was limited to peer-reviewed articles, conference papers, and industry reports published between 2010 and 2024.

Inclusion and exclusion criteria

To ensure the relevance and quality of the selected studies, inclusion and exclusion criteria were applied.
The inclusion criteria covered studies that focused on the application of BDA and ML techniques in food safety, quality, or processing; articles published in peer-reviewed journals and conferences; studies that provided empirical evidence, case studies, or experimental results; and publications in English. The exclusion criteria covered studies that did not focus on the food industry, articles without experimental or empirical data, non-English publications, and duplicate studies.

The data extraction process involved identifying and cataloging key information from each selected article. The extracted data included the study title and authors, year of publication, type of study (empirical, case study, review, etc.), BDA techniques used (e.g., neural networks, support vector machines, decision trees, etc.), food industry applications (e.g., safety, quality, processing), key findings and conclusions, and identified challenges and limitations. The extracted data were synthesized qualitatively to provide a comprehensive understanding of the current landscape of BDA applications in the food industry. The synthesis involved categorizing studies based on their application area (safety, quality, processing), comparing the effectiveness of different ML models used in the food sector, identifying recurring challenges and limitations faced by researchers and industry practitioners, and highlighting trends and gaps in the existing literature.

A quality assessment was conducted to ensure the reliability and validity of the selected studies.
The criteria for quality assessment included the relevance of the study to the research questions, clarity and transparency of the methodology used in the study, robustness of the experimental design and data analysis, and generalizability of the study findings to broader food industry contexts.

Finally, the findings of this review were reported according to the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines (Fig. 1). Additionally, the key findings were organized into thematic sections covering food safety, quality, and processing to provide a structured narrative. This systematic review methodology ensured a rigorous and comprehensive analysis of the use of Big Data Analytics in the food industry, contributing valuable insights for researchers, industry practitioners, and policymakers.

Fig. 1: PRISMA diagram for the systematic review, modified from Page et al.233.

Definition and characteristics of big data

BDA was previously defined in the 1990s by the computer industry and is now used as a catch-all term for anything negative or positive about the twenty-first-century technological society7,8. Interestingly, before the 2000s, big data was considered a problem9. Since the advent of computers, a tremendous quantity of data has been produced at an increasingly rapid rate. This condition serves as the primary impetus for both the ongoing and upcoming horizons of research10. Mobile devices, digital sensors, communications, computation, and storage have all advanced in recent years, which has made it possible to collect data9. The International Data Corporation (IDC) has reported that the total amount of world data expanded ninefold in the past five years11, and Chen and Liu12 have noted that the amount of data generated will double in the next two years.
The necessity for massive corporations like Yahoo, Google, and Facebook to examine large volumes of data gave rise to the relatively new concept known as “big data”13.

The term “big data” has been defined in a number of different ways, ranging from the 3V model of Volume, Variety, and Velocity to the 4V model of Volume, Velocity, Variety, and Veracity12,13,14. The amount of information is referred to as its volume, the pace at which it is received and transmitted is referred to as its velocity, and the sources and types of data are referred to as its variety15. The definition of “big data” was expanded by IBM and Microsoft to include “veracity” or “variability” as the fourth “V.” The unpredictability and dependability of data are what is meant by the term “veracity”16. In order to characterize big data, McKinsey & Company included value as the fourth V. The term “value” alludes to the significance of the insights that are buried inside large amounts of data17. A detailed description of the 4 V’s is given below for a better understanding of data analytics dependencies used in current scenarios18.

i. Volume: The data currently being stored is measured in petabytes, which is challenging in itself; over the next few years it is expected to expand to zettabytes (ZB). This is mostly attributable to the increased use of smartphones and social media networking platforms.

ii. Velocity: The term “velocity” can refer to either the rate at which data is collected or the rate at which it is transferred. The increased reliance on live data presents difficulties for the more conventional methods of data analysis because the data is both too extensive and constantly shifting.

iii. Variety: Because the collected data does not come from a single fixed category or primary source, it arrives in many different raw forms. These formats can be obtained through the internet, texts, sensors, emails, and other sources, and they can be either structured or unstructured. The sheer magnitude of the problem renders conventional analytical approaches inadequate for handling big data.

iv. Veracity: Veracity concerns the accuracy and dependability of the data; the primary source of difficulty is often noise or anomalies within the dataset.
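
The veracity concern above (noise and anomalies within a dataset) can be illustrated with a small screening step. The following is a minimal sketch, assuming a hypothetical log of cold-storage temperature readings; the function name `flag_anomalies`, the sample values, and the 3.5 cutoff on the modified z-score are illustrative assumptions rather than anything prescribed by the reviewed studies.

```python
from statistics import median

def flag_anomalies(readings, cutoff=3.5):
    """Return indices of readings whose modified z-score exceeds cutoff.

    Uses the median and the median absolute deviation (MAD), which are
    less distorted by the outliers themselves than the mean and standard
    deviation would be.
    """
    med = median(readings)
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        # At least half of the readings equal the median, so there is no
        # spread to scale by; flag anything that differs from the median.
        return [i for i, x in enumerate(readings) if x != med]
    # 0.6745 rescales MAD so the score is comparable to a standard z-score.
    return [i for i, x in enumerate(readings)
            if abs(0.6745 * (x - med) / mad) > cutoff]

# Hypothetical temperature log (degrees Celsius) with one faulty spike.
temps = [4.1, 4.0, 3.9, 4.2, 4.0, 19.5, 4.1, 3.8]
suspect = flag_anomalies(temps)  # indices of readings to review
```

A median-based score is used here because a single extreme reading can inflate the mean and standard deviation enough to hide itself; whether a flagged reading is a sensor fault or a real excursion still has to be decided by downstream checks.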

Big data analytics comprises many different approaches, such as supervised learning, unsupervised learning, artificial intelligence-based approaches, decision-based approaches, and machine learning (ML), along with dimensionality reduction techniques (DRT), including linear discriminant analysis (LDA) (Fig. 2) and principal component analysis19.

Fig. 2: Linear discriminant analysis, taken from Siddique et al.225. Two-class data in dimensional space for LDA analysis to maximize the classifiable data on the hyperplane.

Machine learning (ML) can be defined as a sub-branch of artificial intelligence that is based on computer algorithms and is widely used in predictive analysis models, which can handle large amounts of data and identify specific trends and patterns19,20. Popular ML methods such as artificial neural networks, support vector machines (Fig. 3), backpropagation neural networks (Fig. 4), decision trees, random forests, k-means clustering (Fig. 5), and k-nearest neighbors are used across many engineering areas for data categorization, data clustering, regression-based predictive modeling, ensemble methods, transfer learning, image processing, feature extraction, reinforcement learning, natural language processing, and deep learning. Based on learning approaches, there are three types of machine learning methods: supervised, semi-supervised, and unsupervised learning19,20,21.

Fig. 3: Support vector machines, adopted from Siddique et al.225. A represents the dataset split in half, B represents the two close values, C represents the outlier value as the solution for (B), D represents a nonlinear dataset, and E represents the use of a kernel function and the change in dimensionality of the data.

Fig. 4: Backpropagation neural network, from Siddique et al.225, representing the input layer, hidden layer, and output layer.

Fig. 5: Diagrammatic representation of k-means clustering for cluster formation.

Sources of big data in the food sector

The most common sources of big food data are the food industries (from harvesting to restaurants), government sectors, health care22, and media posts, including news, video, pictures, and audio. An analysis of high-quality big data can contribute to the growth of the food industry20,21.

Food regulatory agencies’ data

Big data related to food safety is vital in ensuring public health and food security. While big food data can originate from various sources such as media posts, consumer behavior data, and research studies, regulatory agencies play a pivotal role in collecting, managing, and disseminating structured datasets. These agencies maintain numerous databases that support food safety efforts by tracking foodborne pathogens, monitoring antimicrobial resistance, and ensuring compliance with food safety standards. The following sections highlight key contributions from U.S. and international regulatory agencies and challenges in data sharing and integration.

Key U.S. food safety regulatory agencies and data systems

In the United States, several federal agencies are responsible for ensuring food safety through data collection and monitoring systems. The four major agencies include the Food and Drug Administration (FDA) under the Department of Health and Human Services (DHHS), the Food Safety and Inspection Service (FSIS) under the U.S. Department of Agriculture (USDA), the Environmental Protection Agency (EPA), and the National Marine Fisheries Service (NMFS) under the Department of Commerce.

Food and Drug Administration (FDA)

The FDA plays a crucial role in monitoring food safety through the Inspection Classification Database (ICD) and Import Refusal Reports (IRR) databases. The ICD provides real-time inspection data, including violations, corrective actions, and classifications for both domestic and international facilities22,23. The IRR lists products refused entry into the United States, detailing the product type, country of origin, reason for refusal, and date of refusal. These datasets are typically updated weekly, with a time lag of one to two weeks from the date of the event to the public record23,24,25.

The Center for Food Safety and Applied Nutrition (CFSAN), a part of the FDA, works to ensure that food is safe for human consumption. Food safety activities consumed approximately $1.6 billion of the FDA budget in 2021 (refs. 23,26,27,28).

Food Safety and Inspection Service (FSIS)

The U.S. government uses the FSIS system to share food sample analysis reports. The agency provides publicly available reports through its open dataset portal29. These reports include data on foodborne pathogen testing, residue testing, and product recalls, updated monthly and quarterly. The typical time lapse between data collection and public release ranges from 30 to 90 days30,31.

Centers for Disease Control and Prevention (CDC)

The CDC operates several key systems for foodborne illness surveillance, including PulseNet, FoodNet, and the National Antimicrobial Resistance Monitoring System (NARMS).

PulseNet is a molecular subtyping network that detects foodborne outbreaks in real time by comparing DNA fingerprints of pathogens. It collects and stores genomic data from bacterial isolates, with daily or weekly updates. The time-lapse from data collection to availability is typically less than 7 days, making PulseNet a critical tool for real-time outbreak detection32,33.
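
Comparing pathogen fingerprints to detect a possible outbreak can be pictured as grouping isolates whose profiles are nearly identical. The sketch below is a toy illustration only and is not PulseNet's actual method: it assumes each isolate has been reduced to a hypothetical allele profile (a tuple of integers), and it links isolates that differ at no more than `max_diff` positions. All isolate names, profiles, and the threshold are invented for the example.

```python
from itertools import combinations

def allele_distance(a, b):
    """Number of positions at which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_isolates(profiles, max_diff=1):
    """Single-linkage grouping: isolates within max_diff of each other
    (directly or through a chain of neighbours) share a cluster."""
    parent = {name: name for name in profiles}

    def find(name):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(profiles, 2):
        if allele_distance(profiles[a], profiles[b]) <= max_diff:
            parent[find(a)] = find(b)

    clusters = {}
    for name in profiles:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in clusters.values())

# Hypothetical isolates; each tuple is an invented allele profile.
isolates = {
    "clinical_01": (3, 7, 1, 4, 9),
    "clinical_02": (3, 7, 1, 4, 8),
    "retail_meat_01": (3, 7, 2, 4, 8),
    "environment_01": (5, 2, 6, 1, 1),
}
groups = cluster_isolates(isolates)
```

In this invented data, the two clinical isolates and the retail meat isolate end up in one group because each is within one allele of a neighbour, which is the kind of signal that would prompt an epidemiological follow-up. Real surveillance uses far longer profiles and validated thresholds.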

NARMS tracks antimicrobial resistance patterns in bacteria isolated from humans, animals, and retail meat products34,35,36. NARMS provides annual reports summarizing resistance trends over time. However, due to the complexity of data analysis, there is often a 6- to 12-month time-lapse from data collection to publication36.

FoodNet tracks foodborne illness trends and helps estimate the burden of foodborne diseases in the U.S.37,38.

International food safety data systems

International regulatory agencies also play a significant role in collecting and managing big data related to food safety. These agencies maintain databases that provide insights into food consumption patterns, foodborne outbreaks, and regulatory actions across different countries.

European Food Safety Authority (EFSA)

The EFSA provides a comprehensive food consumption database covering dietary patterns across 34 European countries39,40. The database is updated every few years, with the most recent major update in 2018. Depending on the scope of the survey, the time lapse between data collection and release can range from 6 months to 2 years40.

Rapid Alert System for Food and Feed (RASFF)

The RASFF, operated by the European Union, is a popular online health and safety repository for industrial and research purposes41,42. It provides real-time alerts on food safety issues such as contamination, labeling, and fraud. The system is updated daily, and alerts are often issued within 24 h of a reported issue, ensuring minimal time lapse43.

Global Environmental Monitoring System for Food (GEMS/Food)

The World Health Organization (WHO) established the Global Environmental Monitoring System for Food (GEMS/Food) in 1976. The system collects data on food pollutants and chemical contaminants worldwide44. GEMS/Food is updated annually and provides historical data over four decades. However, there is often a time lag of several months to a year before data is fully analyzed and published45.

Genomic databases and whole-genome sequencing

Including genetic data in food safety activities has significantly increased the data acquired by regulatory networks in recent years. Whole-genome sequencing (WGS) has largely driven the creation of new systems such as GenomeTrakr, EnteroBase, and the National Center for Biotechnology Information’s Pathogen Detection database.

GenomeTrakr is one of the most advanced systems for WGS of foodborne pathogens46,47. It stores genomic data from more than 350,000 isolates from food, environmental, and clinical samples. The system is updated in real time, and data is made available on platforms such as NCBI’s Pathogen Detection database. GenomeTrakr has drastically reduced the time required for pathogen identification, providing near real-time genomic epidemiology capabilities48.

EnteroBase and NCBI Pathogen Detection provide publicly accessible genetic data for tracking foodborne outbreaks and pathogen evolution49,50.

Although collaboration and data exchange among the authorities and organizations responsible for food regulation help build an intelligent supervisory system for the food supply chain51,52,53, several obstacles remain. Among the challenges are limited data sharing and a lack of standards54. Repeated analysis of the same product by different departments and agencies wastes resources and increases operating costs. Global standards and proper real-time data sharing within departments, between departments, and between nations for import and export purposes are possible areas to explore and might reduce some of this waste of resources and time. Data mining and classification models should also be developed that can readily recognize the same product under different names, which is another obstacle in developing network inspection models55.

Food industries data

The food sector is closely linked to agriculture, fishing, poultry, dairy, processing, and restaurants. All these sectors utilize modern equipment to improve organizational management and profitability56,57,58. Integrating advanced technologies such as cloud computing, portable wireless sensors, blockchain, and the Internet of Things (IoT) is crucial for developing cohesive food supply chains59. These technologies support stakeholders and are essential to the real-time management and oversight of supply chains. The integration of smart technologies within these interrelated sectors enhances the food system’s resilience by minimizing delays, increasing accuracy, and supporting adherence to safety norms60. IoT devices can oversee production conditions, while blockchain provides product traceability from origin to consumption.
This secure and transparent system builds confidence among stakeholders and consumers in the quality of the products they handle and consume61,62. Moreover, cloud computing enables the transfer of substantial datasets, allowing stakeholders to make data-informed decisions that improve supply chain efficiency. Portable sensors provide real-time data acquisition on soil health, crop conditions, and environmental variables, enabling the optimization of agricultural techniques and the minimization of waste63. By using these solutions, the food sector attains enhanced traceability and increases profitability through optimized resource management and prompt decision-making procedures64.

Applications of IoT and smart technologies in agriculture and food production

Implementing integrated smart technologies enhances the efficiency of agricultural output and business administration65. IoT devices, sensors, and drone technologies have revolutionized conventional agricultural methods by facilitating accurate data acquisition on essential factors, including precipitation, topography, animal health, nutrition, crop sowing, and improved growth cycles66,67. This constant data stream enables farmers to make informed decisions, maximize resource utilization, and enhance productivity68,69,70.

Sensors are essential in precision agriculture, where they monitor soil moisture, temperature, and humidity levels71,72,73. Smart irrigation systems (SIS) utilize real-time data from sensors to automatically regulate water usage, thereby reducing waste and ensuring crops receive the necessary amount of water for optimal growth. The SIS conserves water and improves crop yields by sustaining optimal growing conditions74.
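
The smart irrigation idea described above, using sensor readings to regulate water automatically, can be sketched as a simple rule with hysteresis. This is a minimal illustration under stated assumptions: the moisture thresholds, the `IrrigationController` class, and the sample readings are hypothetical and are not taken from any cited system.

```python
from dataclasses import dataclass

@dataclass
class IrrigationController:
    """Open a valve below `low` and close it above `high` (percent
    volumetric soil moisture). The gap between the two thresholds
    prevents rapid on/off cycling around a single set point."""
    low: float = 25.0
    high: float = 35.0
    valve_open: bool = False

    def update(self, moisture: float) -> bool:
        if moisture < self.low:
            self.valve_open = True
        elif moisture > self.high:
            self.valve_open = False
        # Between the thresholds the previous state is kept.
        return self.valve_open

# Hypothetical sequence of soil-moisture readings from one field sensor.
controller = IrrigationController()
readings = [30.0, 24.0, 28.0, 33.0, 36.0, 31.0]
states = [controller.update(m) for m in readings]
```

With these invented readings the valve opens once moisture drops below 25, stays open while the soil recovers, and closes only after moisture passes 35. A deployed system would typically add weather forecasts, crop-specific targets, and sensor fault handling.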
Sensors can also monitor animal health by recording vital signs, detecting illnesses early, and helping prevent disease outbreaks75,76.

Drones outfitted with multispectral cameras represent an essential element of precision agriculture. These drones can sweep extensive agricultural regions, producing comprehensive imagery that aids in identifying crop health concerns such as nutritional deficits77, pest infestations, and water stress78. By identifying these issues promptly, farmers can implement preventive strategies to reduce possible yield losses. Drones can identify regions necessitating precise pesticide administration, thus decreasing chemical usage and mitigating environmental impact79,80.

Smart technologies boost crop management and improve equipment maintenance via predictive maintenance models80. Intelligent sensors integrated into agricultural machinery gather data on operational efficiency, detecting potential problems prior to failures. This predictive maintenance strategy diminishes unanticipated downtime, enhances operational efficiency, and significantly decreases repair expenses59,81,82. For example, a sensor may identify abnormal vibrations in a tractor’s engine, prompting fast maintenance to avert total failure during critical farming periods.

IoT devices also play an active role in supply chain management83. Real-time monitoring of agricultural products from farm to retail supports food quality and safety by preserving ideal storage conditions during transit. Smart sensors or biosensors track temperature, humidity, and additional environmental variables to avert spoilage and support adherence to food safety standards82.
Moreover, IoT systems allow farmers and distributors to furnish precise and transparent information to consumers regarding the provenance and quality of their food items, enhancing consumer trust.

The integration of machine learning algorithms with IoT devices significantly improves agricultural efficiency83. These algorithms, when applied to the extensive data gathered from sensors and drones, can forecast weather trends, enhance planting plans, and automate harvesting procedures80. For instance, machine learning models can forecast the optimal planting time for specific crops by analyzing past weather data and present soil conditions, thereby enhancing yield potential79,80.

The integration of smart technologies in agriculture encompasses not only crop and livestock management but also environmental sustainability. Smart agriculture enhances sustainable farming methods by decreasing water consumption, limiting chemical inputs, and optimizing resource allocation84. Moreover, incorporating renewable energy sources, including solar-powered IoT devices, diminishes the environmental impact of contemporary farming methods. This not only reduces the carbon footprint of farming but also contributes to the overall health of the ecosystem.

Implementing IoT and smart technology in agricultural and food production has transformed the industry83. Technologies such as precision farming, predictive maintenance, supply chain transparency, and sustainability equip farmers to tackle contemporary agricultural concerns. The food sector can use real-time data and sophisticated analytics to enhance efficiency, decrease expenses, and support long-term sustainability in food production systems.

IoT in food manufacturing and supply chains

Recent studies on IoT in food manufacturing have driven the enhancement of the IoT platform to meet market and consumer demands for high-quality goods85,86.
These technologies enable real-time oversight of production operations, supporting food safety and quality. IoT devices, including temperature and humidity sensors, maintain optimal conditions throughout transportation and storage, thereby minimizing spoilage and supporting adherence to safety regulations87. Temperature sensors in refrigerated vehicles can monitor and regulate cooling systems to sustain the optimal temperature, thereby preventing spoilage during extended transport. Furthermore, humidity sensors in storage facilities can notify management if conditions stray from optimal parameters, enabling prompt remedial measures to prevent product deterioration88.

IoT applications play a crucial role in predictive maintenance in manufacturing facilities. Sensors affixed to food processing machinery constantly assess performance and identify potential problems before they escalate into malfunctions89. Vibration sensors on conveyor belts or mixers act as early warning systems, detecting irregular patterns that signify mechanical issues, facilitating rapid repairs, and minimizing production downtime. This predictive methodology enhances operational efficiency, improves reliability, and helps keep manufacturing lines running90.

Blockchain technology is a promising instrument in food supply chains, especially for improving traceability91. Blockchain can establish immutable records of each transaction in the supply chain, from production to sale, so that food goods can be traced to their source92,93. This technique is particularly advantageous for food safety and quality assurance. For instance, if a batch of tainted produce is identified, blockchain data may promptly trace its source farm, the distribution centers it traversed, and the shops that acquired it.
This facilitates a focused recall, mitigating public health hazards and diminishing financial damages for enterprises.

An illustration of a blockchain application is the IBM Food Trust platform, which partners with prominent retailers such as Walmart to enhance food traceability. The software lets buyers scan a product’s QR code and examine its route from farm to retail. This transparency fosters consumer trust and demonstrates a commitment to food safety and ethical sourcing94.

Automated inventory management is another application of IoT in food supply chains. IoT devices, such as RFID tags and smart shelves, can monitor inventory levels in real time. These systems alert managers when inventory levels are low or when products approach expiration, supporting timely replenishment and the prioritization of older items95. This minimizes food waste and enhances supply chain efficiency by averting overstocking and spoilage.

IoT-enabled fleet management systems enable logistics companies to monitor their trucks in real time. GPS trackers and IoT sensors assess routes, fuel efficiency, and vehicle performance96. A logistics company shipping perishable goods can receive notifications if a vehicle strays from its designated path or the refrigeration system fails. This real-time surveillance supports the safe and timely delivery of food goods, preserving quality across the supply chain97.

Notwithstanding its promise, blockchain technology encounters obstacles in the food sector, including elevated implementation expenses and technical intricacies. Small and medium-sized enterprises (SMEs) may encounter difficulties adopting these technologies due to limited resources. Moreover, interoperability challenges among various blockchain platforms can impede wider adoption.
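
The immutable record keeping attributed to blockchain in this section rests on a simple idea: each record stores a hash of the previous one, so altering any earlier entry invalidates the chain from that point on. The sketch below illustrates only that hash-chaining idea using Python's standard `hashlib`; it is not a distributed ledger and does not represent the IBM Food Trust implementation. The record fields, lot identifier, locations, and function names are hypothetical.

```python
import hashlib
import json

def record_hash(body):
    """SHA-256 over a canonical JSON encoding of the record body."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def append_event(chain, event):
    """Append an event that points at the hash of the previous record."""
    prev = chain[-1]["hash"] if chain else "0" * 64
    body = {"event": event, "prev_hash": prev}
    chain.append({**body, "hash": record_hash(body)})

def verify(chain):
    """Recompute every hash and check each link to its predecessor."""
    prev = "0" * 64
    for rec in chain:
        body = {"event": rec["event"], "prev_hash": rec["prev_hash"]}
        if rec["prev_hash"] != prev or rec["hash"] != record_hash(body):
            return False
        prev = rec["hash"]
    return True

def trace(chain, lot):
    """List the locations a lot passed through, in recorded order."""
    return [r["event"]["location"] for r in chain if r["event"]["lot"] == lot]

# Hypothetical journey of one produce lot from farm to shop.
ledger = []
for place in ["farm_A", "distribution_center_3", "shop_17"]:
    append_event(ledger, {"lot": "LOT-001", "location": place})

intact = verify(ledger)                    # chain as originally written
path = trace(ledger, "LOT-001")            # route used for a focused recall

ledger[0]["event"]["location"] = "farm_B"  # simulate tampering
tampered_ok = verify(ledger)               # the edit is detected
```

In a real deployment the chain would be replicated across independent parties, which is what makes silent rewriting impractical; a single local list such as this one only demonstrates how tampering becomes detectable.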
To mitigate these problems, industry players must establish standardized protocols and collaborative platforms that enhance the accessibility and scalability of blockchain solutions.

In recent years, studies on the Internet of Things (IoT) in food manufacturing have encouraged the expansion of the IoT platform to address market needs80,85, diverse monitoring models, and unbalanced energy usage59,86. Applications that integrate the Internet of Things will assist food industries in the creation of new data sources87. Industry 4.0 not only encourages the rapid evolution of Agriculture 4.0 but also makes it possible for businesses to send real-time data to recognize and fulfill shifting stakeholder requirements80,81,86,87,98.

According to a Eurostat report, the use of smart agriculture will help reduce agricultural costs by 4–6% and will increase profitability by 3% by 2026 (ref. 99). Implementation of these approaches will help industries tackle food production problems and reduce raw material use. It encourages smart agriculture, which saves resources like water, maintains soil, limits carbon pollution, and improves productivity73,99. Smart agriculture enables producers, network operators, the administration, and other stakeholders to exchange their insights, enhancing the agro value chain for sustainable development80,81,86. These big data analytics approaches face several challenges, including data fairness, process traceability, reusability of shared data, and lack of standard information. The lack of well-defined protocols has generated inconsistencies among data management platforms59,80,81,85,86,87. Insecure IoT nodes inside the global food supply also pose a threat and might weaken the system. Many firms employ cloud computing, but its application to massive data on food safety is relatively new81. Durability, data equality, information security, and legal difficulties remain unresolved85.
Blockchain technology could make the food production process safer and much more accessible, but it remains underdeveloped and complicated to use92. Currently, the use of blockchain for product safety is confined to traceability93. Data validation and information management still need exploration80,81,83,85,91,97,99.

Interactive media data

Social media platforms contribute significantly to the accumulation of big data and the generation of valuable insights. In 2022, social media use reached 4.65 billion people, representing 57.8% of the global population100. Consumers engage with food at multiple stages of the distribution chain, encompassing transactions, consumption, assessment, and experience sharing, and in doing so produce substantial amounts of data101. These datasets are increasingly available via digital media platforms, including social networking sites, search histories, user ratings, comments, and archives of sales revenue and usage records102. Advanced data mining techniques are crucial for collecting and analyzing these data streams to produce meaningful insights102,103.

Social media platforms carry a continuous flow of videos, articles, and other content101. Food-related information can also be obtained from platforms like Facebook, where users share food-related content104. Fried et al.105 examined three million Twitter postings to forecast demographic traits. They also created a real-time online query engine for visualizing the gathered datasets, and their model surpassed existing baseline models, illustrating the potential of social media data for real-time analytics. 
Singh et al.106 conducted a comparable study, analyzing Twitter data to identify supply chain management challenges within the food industry; their results show the promise of social media data for enhancing food safety and operational efficiency within the food supply chain.

Data collected from media sources and social networking platforms raises the question of data veracity, including accuracy, dependability, and trustworthiness. Some social media platforms may contain inaccurate information, leading to serious issues and wrong decisions. For example, rumors and misinformation regarding food safety can easily influence consumer behavior107. Resolving these difficulties requires algorithms that consolidate data from various sources and eliminate erroneous or irrelevant information108. Ensuring data authenticity requires data validation procedures that identify and remove erroneous or fraudulent information. Machine learning algorithms play a crucial role in this process, as they can identify anomalies and contradictions in social media data so that only genuine information is used for decision-making. Moreover, integrating fact-checking methods into social media monitoring can significantly improve the dependability of the gathered data.

As big data analytics (BDA) methodologies remain in the developmental phase, most current technologies depend predominantly on simulation models. To counteract the spread of misinformation in real time, administrative authorities must recognize the urgency of consistently monitoring social networking platforms and websites in order to develop an efficient and authoritative integrated system. 
These initiatives can also examine and shape public opinion and behavior about food safety and associated matters109,110.

The extensive food-related data produced both internally and externally can help firms increase their market share111. Financial transaction data provide comprehensive records of consumer food consumption, enabling the analysis of patterns and trends. This information can improve conventional investigative techniques, such as tracking the causative food vehicle during a foodborne outbreak or pinpointing the source of contamination in retail establishments and restaurants.

Machine learning algorithms are essential for assessing aggregated sales data. The use of such data for epidemic surveillance and outbreak investigations has been demonstrated in multiple instances107,112. In outbreak investigations, food products whose sales patterns correspond with the spread of the outbreak are likely sources of contamination113. Probabilistic models employing purchase histories and maximum-likelihood estimation can discern clusters of possibly contaminated products113,114. This theory-driven model can be augmented by classification-based learning techniques to identify patterns in the data and enhance accuracy115.

Unsupervised classification algorithms have been employed to examine the spatial distribution patterns of similar food products and to group those that cannot be distinguished from one another116. Kaufman et al.117 examined a comparable methodology using weekly sales data from 580 supermarket products in Germany, applied to simulated foodborne outbreak scenarios. The model was enhanced to incorporate time, customer mobility, and noise and was effectively evaluated during a real-world outbreak in Norway117,118.

To tackle data veracity issues, enterprises and researchers must establish comprehensive data verification frameworks capable of eliminating misinformation and guaranteeing dependable analytics. 
These frameworks can combine natural language processing (NLP) approaches that discover textual inconsistencies with image recognition algorithms that detect misleading graphics. Integrating these verification techniques into big data platforms enables stakeholders in the food business to enhance the accuracy and reliability of their insights, thereby mitigating the danger of false alarms and unwarranted recalls.

In conclusion, although social media platforms and digital data sources offer extensive prospects for big data analytics in the food industry, maintaining data validity is a significant concern. Creating sophisticated algorithms for data validation and verification is crucial for maximizing the potential of these data streams and, in turn, for maintaining public safety and consumer confidence in the advancing digital food ecosystem.

Food-related text data

In machine learning, text data can be likened to the “oil” of algorithmic systems, since it is crucial for operating ML-based models. In contrast to structured commercial and sales data, text data typically consists of unrefined natural-language text that can deliver real-time insights into food safety contamination events or hazards119. Textual data sources encompass customer status updates, review platforms, online content from media organizations and professional bodies, and proprietary corporate web-based information55,119. These sources offer insights into emerging food safety trends, consumer concerns, and possible risks.

Text data has been used in diverse online data mining, text mining, pattern recognition, and natural language processing (NLP) systems to improve conventional food safety monitoring by delivering notifications and alerts regarding foodborne illnesses or safety concerns55. For instance, customer posts on sites including Twitter55,120,121,122, Facebook, Yelp123, and Amazon124 have been extensively used as textual data sources. 
Posts on these platforms can offer preliminary indications of potential food safety concerns that conventional surveillance techniques may not yet detect. The textual content may encompass natural language writing, titles, hashtags, and embedded images, rendering it a substantial source of unstructured data for analysis.Google Trends data and Twitter tweets have been used to forecast consumer behavior around food safety. Fried et al.105 utilized three million Twitter postings to forecast demographic attributes and created a real-time query engine for viewing the aggregated statistics. Likewise, Singh et al.106 employed Twitter data to discern supply chain management challenges, underscoring the significance of social media data in enhancing food safety protocols.Private content platforms, including corporate comment boards, public forums, and Google search queries, offer substantial food-related textual data120. These data points are crucial for food safety monitoring systems as they assist in identifying possible risks, consumer complaints, and outbreak indicators before they escalate into public health catastrophes. Furthermore, online data from academic and professional organizations’ websites can help to create food safety-related hazard monitoring decisions and support systems that assess outbreaks, recalls, and regulatory measures17.Using textual data for monitoring foodborne diseases presents numerous benefits compared to conventional data sources. According to Oldroyd et al.125 and Tao et al.55, the principal advantages are real-time accessibility and extensive coverage. In contrast to government-issued outbreak reports, which may need weeks for publication, social media posts and internet reviews offer immediate feedback. 
These prompt reports are especially beneficial for identifying outbreaks among internet demographics frequently underrepresented in conventional foodborne disease surveys37,125.Over the past decade, researchers have increasingly used text data to oversee and identify food safety concerns. Harris et al.120 and Effland et al.123 examined tweets from cities including Chicago, St. Louis, Las Vegas, and New York City to discern foodborne disease trends. In a pioneering study, Sadilek et al.126 used Google search keywords with smartphone location data to pinpoint eateries with inadequate health code compliance. This method proved threefold more effective than conventional Twitter-based algorithms in detecting high-risk establishments. The integration of text and location data has significantly enhanced the precision of identifying potential food safety issues.A significant application of textual data in food safety monitoring is using Amazon reviews for product safety assessment. Maharana et al.124 created a text classification algorithm that examined consumer input to detect food safety concerns in products. The methodology successfully identified unfavorable evaluations of food safety concerns, reassuring consumers and assisting manufacturers and regulators in implementing proactive efforts to mitigate these issues. This approach illustrated the capability of text data mining to augment consumer safety and elevate product quality, instilling confidence in the effectiveness of food safety measures.Textual data is crucial to epidemic detection systems. Kaufman et al.117 and Norström et al.118 utilized weekly sales data and consumer feedback to create predictive models for detecting foodborne disease outbreaks. These models were further enhanced to incorporate temporal variables, including client mobility and data noise. 
By integrating text-based data sources, researchers successfully identified outbreaks that might otherwise have remained undetected.

Despite its potential, the use of text data in food safety encounters numerous hurdles, especially regarding data reliability. Social media platforms frequently harbor unverified material, complicating the differentiation between credible claims and rumors. Researchers have created machine learning algorithms to evaluate text trends, identify abnormalities, and eliminate erroneous information. This underscores the need for robust validation frameworks.

Moreover, natural language processing (NLP) algorithms can classify textual data into pertinent categories, including product complaints, health code violations, and outbreak reports. These solutions facilitate the automation of data validation, so that only credible information is used for decision-making. The combination of textual data from diverse online resources, such as social media, online reviews, and professional websites, has transformed the monitoring and detection of food safety hazards. Using machine learning algorithms and NLP approaches, researchers and businesses can examine extensive unstructured text data to detect potential risks and enhance food safety protocols. Nevertheless, guaranteeing data integrity remains a significant barrier, requiring rigorous validation frameworks that provide decision-makers with correct and dependable insights. Further progress in text mining technology will be crucial for maximizing the potential of text data in the food sector, enhancing public health outcomes and customer confidence.

Applications of big data analytics in food safety

Infectious diseases that are spread through food continue to be a significant and persistent threat to public health. 
Foodborne illnesses are responsible for 128,000 hospitalizations and 3000 fatalities per year in the United States. According to the World Health Organization (WHO), foodborne pathogens cause illness in 600 million individuals and 420,000 deaths every year127. According to the European Food Safety Authority (EFSA) and the European Centre for Disease Prevention and Control (ECDC), campylobacteriosis has been the most common foodborne illness in Europe, followed by salmonellosis, yersiniosis, Shiga toxin-producing Escherichia coli (STEC) infections, and listeriosis. In 2020, listeriosis had the highest proportion of hospitalized cases among foodborne illnesses40.

Early detection of harmful organisms and microbial load could improve food safety and prevent foodborne outbreaks. The current “gold-standard” methodology for characterizing foodborne pathogens in food products is reliable but time-consuming and labor-intensive, so food companies often have to release a product to customers before complete microbiological information on a specific lot or batch is available (especially for fresh products). There are several approaches for detecting foodborne pathogens in food products, each with its own advantages and disadvantages in terms of ease of use, consistency of results, turnaround time, cost-effectiveness, and so on. Rapid alternatives such as spectroscopic technology have been available for more than a decade; however, barriers remain to replacing conventional pathogen identification methodologies40,128. 
The major reasons for these difficulties are the need for qualified staff, relatively expensive equipment, and pre-analysis procedures for some approaches, such as DNA-based procedures128,129.

The difficulty in interpreting these big datasets stems from the fact that measurements obtained from different sources contain multiple sources of variability, necessitating varied statistical methods to analyze the data. Multivariate analytics is based on the combined analysis of numerous response variables against many explanatory variables, allowing a more comprehensive view of the gathered data and of the causes of variability in a single run130. When the observations involve large amounts of data, a typical multivariate analysis consists of two steps: (1) data pretreatment and (2) modeling131. Principal components analysis (PCA), cluster analysis (CA), linear discriminant analysis (LDA), and partial least squares (PLS) are the most widely used multivariate approaches for evaluating these large, many-variable datasets132. Multivariate statistics (MS) is a subfield of statistics. Although ML and MS are distinct domains, they now often overlap because both can analyze high-dimensional datasets; multivariate statistics focuses more on the fundamental relationships between variables, whereas machine learning focuses more on the algorithms and their probabilistic implications132.

To determine meat spoilage, Pu et al.133 applied fluorescence spectroscopy to meat samples stored at 4 and 15 °C, using an excitation wavelength of 340 nm. 
The data were then collected and processed using Multivariate Curve Resolution with Alternating Least-Squares (MCR-ALS) to obtain statistical information that was connected to fluorescence fluctuations and primarily ascribed to NADH content133.

Based on Laser-Induced Breakdown Spectroscopy (LIBS) and backpropagation neural networks, Marcos-Martinez et al.134 developed a technique for identifying Pseudomonas aeruginosa, E. coli, and S. Typhimurium isolates. The authors first cultivated these cultures on three distinct agar plates and generated a base spectrum database. The spectra were taken in the 200–1000 nm region. A three-layer Multilayer Perceptron model was then trained using a backpropagation (BP) technique, tested with basic measures such as sensitivity and specificity, and then externally evaluated. The approach correctly identified both known and unknown samples with an accuracy of 100% regardless of the culture medium; nevertheless, the authors emphasized the limited sample size used to create the technique134.

Liao et al.135 developed a method for the qualitative and quantitative detection of four bacterial species/strains: one Staphylococcus aureus strain, one S. Typhimurium strain, and two E. coli strains. The method combines Three-Dimensional Surface-Enhanced Raman Scattering (3D SERS) with Laser-Induced Breakdown Spectroscopy (LIBS). The 3D SERS approach was used for qualitative identification, while LIBS was used for quantitation. Principal Components Analysis (PCA) and Hierarchical Cluster Analysis (HCA) were used to evaluate the spectral data, which resulted in correct classification in all cases. The LIBS method was then applied to the spectral band 200–800 nm, and the most prominent emission band, at 279.5 nm and related to intracellular magnesium ions, was chosen for detection. 
The spectral data for this technique were examined by fitting the emission spectrum to a Voigt profile (a probability distribution) to decrease noise and then using log-log linear regression to relate the fitted peak area to bacterial concentration. The quantification limit was estimated to be around 5 × 10³ CFU/ml, and the coefficient of determination R² was found to be >0.97135.

Argyri et al.136 showed for the first time that Raman spectroscopy can be a rapid approach for the detection of meat spoilage. The method involved analyzing meat samples directly with Raman spectroscopy; conventional microbiological counts were then used, after data processing, to construct a quantitative model relating the two. During this study, pH variations and organoleptic quality were also evaluated, and a qualitative framework with three classes (fresh, semi-fresh, spoiled) was constructed. Half-out cross-validation models were used in association with multivariate analysis techniques and ML algorithms. The statistical results for the Raman spectroscopy analysis were promising and showed that the radial basis function of support vector machines (SVM), support vector regression (SVR), and support vector machines regression sigmoid function (SVRP) achieved a 70% accuracy rate for all counts. However, genetic algorithm artificial neural network (GA-ANN) analysis showed improved classification results, with no misclassification of fresh, semi-fresh, and spoiled meat samples (Table 1)136.

Table 1 Summary of authors, techniques used, and related work

Lu et al.137 developed a method for identifying 14 microbial species at different growth stages using a convolutional neural network (CNN) and Laser Tweezers Raman Spectroscopy (LTRS). They showed that their model could classify all 14 microbial strains with an overall average classification accuracy of 95.64%. 
Although their method was effective for classification-based studies, the relatively high cost of the equipment limits its commercial use in food microbiology and related areas137.

Several studies have employed Fourier-Transform Infrared Spectroscopy (FTIR) to identify microbial deterioration in various food products. Fengou et al.138 and Spyrelli et al.139 used the same concept as Argyri et al.136 and evaluated FTIR for its ability to estimate surface spoilage in fish and chicken breast fillets, using a combination of conventional and advanced analytics approaches to develop a model for spoilage prediction. Both studies used partial least squares regression (PLS-R) models to evaluate quantitative predictions. Fengou et al.138 showed that FTIR might be an effective technique for predicting the total count in fish samples (both whole and fillets), with the root mean square error (RMSE) of the constructed model estimated to be 0.717 log CFU/g138. The results of the Spyrelli et al.139 study indicated that, using the PLS-R model and SoftML online platform algorithms, reliable quantitative predictions for the total count and Pseudomonas spp. in chicken breast fillets could be made using FTIR and multispectral image analysis (MSI).

Hyperspectral imaging (HSI) is similar to MSI in principle, with the distinction being the number of bands used. Because HSI records many contiguous bands, it has better spectral resolution but poorer spatial resolution140. The higher the number of bands, the more detailed and precise the fingerprints of samples that can be obtained. Michael et al.141 used HSI to establish a method for rapidly distinguishing isolated Cronobacter sakazakii, Salmonella spp., E. coli, L. monocytogenes, and S. aureus. 
The procedure entailed isolating various strains of these bacteria and immobilizing them on a microscope slide, which was then studied with HSI to build a database. After selecting a wavelength range of 425.57 to 753.84 nm, multivariate and data analytics approaches, including principal component analysis (PCA) and k-nearest neighbor (k-NN), were used to develop classification models. The authors showed that strains within genera such as C. sakazakii, Salmonella spp., and E. coli were classified with 100% accuracy, with the exception of strain BAA-894 in C. sakazakii and strains O26, O45, and O121 in E. coli, which had 66.67% classification accuracy. When all strains were evaluated together, only C. sakazakii P1, E. coli O104, O111, and O145, S. Montevideo, and L. monocytogenes showed 100% classification accuracy, whereas E. coli O45 and S. Tennessee showed 0.00% classification accuracy in the developed model.

Bonah et al.142 examined variable selection techniques for detecting E. coli O157:H7 and S. aureus in pork samples using Vis-NIR hyperspectral imaging, applying Variable Combination Population Analysis (VCPA), Iteratively Retaining Informative Variables (IRIV), and a Genetic Algorithm (GA). Before the Vis-NIR HSI spectra were collected, pork samples were inoculated with pathogen culture. Spectral data were processed and cleaned with noise-reducing methods, including Savitzky–Golay filtering, second derivatives, and Standard Normal Variate (SNV). The authors also investigated six wavelength selection methods and their combinations to determine representative variables. Root mean square errors of measurement, cross-validation, and prediction on the prediction dataset were used to evaluate the algorithms’ prediction accuracy. Based on the results, the authors emphasized that Vis-NIR HSI, combined with BDA approaches, may be a good instrument for detecting foodborne pathogens142,143. 
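Two of the building blocks named in the Bonah et al. description, SNV pretreatment and root mean square error, are simple enough to sketch. The following is a minimal illustration in plain Python, not the authors' pipeline: the spectra are made-up values, and the function names `snv` and `rmse` are chosen here for illustration. SNV centers and scales each spectrum by its own mean and standard deviation, so additive offsets and multiplicative scaling between otherwise identical spectra are removed.

```python
import math
from statistics import mean, stdev


def snv(spectrum):
    """Standard Normal Variate: center and scale ONE spectrum by its own
    mean and sample standard deviation (n - 1 denominator)."""
    m = mean(spectrum)
    s = stdev(spectrum)
    if s == 0:
        raise ValueError("a flat spectrum cannot be SNV-scaled")
    return [(x - m) / s for x in spectrum]


def rmse(observed, predicted):
    """Root mean square error between reference and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical reflectance values (illustrative only, not from any study).
spectrum_a = [0.10, 0.20, 0.40, 0.30]
# Same shape, but with a gain of 2 and an offset of 0.05 (e.g. scatter effects).
spectrum_b = [2 * x + 0.05 for x in spectrum_a]

snv_a = snv(spectrum_a)
snv_b = snv(spectrum_b)
error = rmse([3.0, 5.0], [2.0, 7.0])  # hypothetical log CFU/g values
```

After SNV, `snv_a` and `snv_b` coincide up to floating-point error, which is why the transform is commonly applied before calibration models are fitted. In practice an RMSE would be reported separately for the calibration, cross-validation, and prediction sets.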
The same authors also developed detection methods for S. Typhimurium in minced pork using an electronic nose at different inoculation levels (10², 10⁴, and 10⁷ CFU/g). For qualitative classification of infected samples, principal components analysis (PCA) was performed, while SVM techniques were used to build the model for quantitative predictions. SVM regression models with and without optimized hyperparameters were also created142. In machine learning, hyperparameters are settings that configure the training procedure144. The results showed that SVM with optimal parameters performed well and could be used to estimate S. Typhimurium in pork samples quantitatively, whereas PCA can be employed for qualitative discrimination analysis143. A recent work by Siddique et al.145 employed NIR hyperspectral spectroscopy to detect spoilage during storage via predictive classification algorithms, isolating precise wavelengths for industrial and retail applications. The dataset was used to train a Back Propagation Neural Network (BPNN) with 250,000 iterations, a learning rate of 0.02, and five hidden layers, in conjunction with a Linear Support Vector Machine (SVM-Linear) employing ten-fold cross-validation, to categorize spoilage into three stages: Initiation (up to 3 log CFU/ml), Propagation (3 to 6.9 log CFU/ml), and Spoiled (>7 log CFU/ml). The BPNN model demonstrated an accuracy of 93.7% for baseline counts, 95.2% for propagation, and 98% for spoiled samples, underscoring its efficacy. These precise wavelengths present opportunities for economical spoilage detection systems and hold potential for real-time detection, thereby enhancing food safety and minimizing waste in the supply chain.

Al Ramahi et al.146 applied an electronic tongue to differentiate between E. coli, S. aureus, and P. aeruginosa suspended in nutrient broth. 
They used principal components analysis (PCA) to analyze the outcomes of their investigation, and they placed a great emphasis on the fact that the created approach was able to effectively differentiate the three isolates after 15 h of incubation. Ghrissi et al.147 followed the same methodology as the previous investigation146. In aqueous dilutions, the authors employed e-tongue to distinguish and measure Enterococcus faecalis, S. aureus, E. coli, and P. aeruginosa. Sensors were expected to interact chemically with bacterial cell membrane in this investigation. The authors of this paper developed a model for microbe discrimination using linear discriminant analysis and a simulated annealing technique for variable selection (LDA-SA). They also employed multiple linear regression combined with a simulated annealing technique (MLRSA) to create the quantifying model by selecting the most appropriate sensory data. Leave-one-out cross-validation was used to verify both designs (LOOCV)147.Despite these advancements, many of these techniques require complex preprocessing and post-processing steps and expensive equipment, limiting their commercial scalability in the food industry. For example, Lu et al.137 achieved a 95.64% classification accuracy for identifying microbial species using CNNs and Laser Tweezers Raman Spectroscopy, but the high equipment cost remains a barrier to widespread adoption. Similarly, while Fourier-Transform Infrared Spectroscopy (FTIR) has been successfully applied to predict spoilage in fish and chicken breast fillets138,139, its resource-intensive nature can deter food companies from integrating these technologies into their supply chains.Dimensionality reduction techniques such as Principal Component Analysis (PCA)148, Partial Least Squares Regression (PLS-R), and other multivariate statistical approaches have been widely used to analyze big datasets in food safety. 
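As a rough illustration of what PCA does with such data, the sketch below mean-centers a small table, builds its covariance matrix, and extracts the first principal component by power iteration. It is a teaching sketch in plain Python under stated assumptions, not the procedure used in any cited study: the function name `first_principal_component` and the sample values are hypothetical, and production work would normally rely on an established numerical library.

```python
import math


def first_principal_component(rows, iterations=200):
    """Return (direction, scores, explained_ratio) for the first principal
    component of `rows` (a list of equal-length numeric samples)."""
    n, d = len(rows), len(rows[0])
    if n < 2:
        raise ValueError("need at least two samples")
    # Pretreatment: mean-center every variable.
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    total_var = sum(cov[i][i] for i in range(d))
    # Power iteration: repeatedly multiply by the covariance matrix and
    # renormalize. The uniform start vector fails only if it happens to be
    # orthogonal to the leading eigenvector.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    cv = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * cv[i] for i in range(d))
    # Scores: projection of each centered sample onto the component.
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    ratio = eigenvalue / total_var if total_var > 0 else 0.0
    return v, scores, ratio


# Hypothetical two-variable measurements lying on the line y = 2x.
samples = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores, ratio = first_principal_component(samples)
```

Because the illustrative samples lie exactly on one line, a single component captures all of the variance; real spectral data would need several components, and the scores would then feed a classifier such as k-NN or LDA.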
These methods can identify patterns and relationships within high-dimensional data, aiding in classifying spoilage stages and pathogen types. For instance, Michael et al.141 utilized HSI and multivariate data analysis to achieve 100% classification accuracy for most bacterial strains, although some were more difficult to classify. This underscores the crucial need for continuous refinement of big data models to address data variability and improve model robustness, highlighting the ongoing research and development in this field130,131.Electronic nose (e-nose) and tongue (e-tongue) systems have also been explored for rapid pathogen detection. These systems use sensor arrays to detect volatile compounds and chemical interactions associated with spoilage and contamination. Studies by Al Ramahi et al.146 and Ghrissi et al.147 demonstrated that e-tongue systems, combined with machine learning algorithms like Linear Discriminant Analysis (LDA) and Multiple Linear Regression (MLR), can effectively differentiate between bacterial species and estimate microbial concentrations. However, these systems require further development to improve sensitivity and reduce cross-contamination risks.For instance, hyperspectral imaging (HSI) has gained attention as a nondestructive technique for real-time spoilage detection. By analyzing continuous high-bandwidth data, HSI can provide detailed spectral fingerprints of food samples, enabling the identification of contaminants at various stages of spoilage. Studies by Bonah et al.142 and Siddique et al.145 have shown that Vis-NIR HSI combined with ML models can achieve high prediction accuracies for detecting spoilage of poultry breast fillets during storage. 
These successful implementations demonstrate the potential of big data applications in food safety.Best practices for implementing big data in food safetySeveral best practices should be considered to maximize the potential of big data applications in food spoilage and pathogen detection. One crucial step is the integration of Big Data Analytics (BDA) with IoT devices. Incorporating Internet of Things (IoT) devices into the supply chain can enable real-time data collection from various points in the food production process. This integration improves traceability and contaminant detection, allowing producers to respond quickly to potential issues139. Additionally, the development of cost-effective detection systems is essential. Reducing the cost of equipment and simplifying preprocessing steps can make big data-driven detection systems more accessible to smaller food producers and retailers, ensuring broader adoption across the industry129.Another important aspect is continuous model refinement. Machine learning models should continuously update with new data to improve accuracy and robustness. This practice addresses data variability and environmental factors issues, ensuring the models remain effective over time130. Moreover, the standardization of data collection protocols is critical for improving data quality and ensuring comparability across studies. By establishing standardized data collection, processing, and analysis protocols, food safety monitoring systems can become more reliable and consistent131.Integrating big data applications and machine learning models in food spoilage and pathogen detection offers significant potential to improve food safety, reduce waste, and ensure consumer protection. While several rapid detection methods have shown promise in research settings, their commercial scalability remains challenging. 
Addressing cost, data variability, and model refinement issues will be critical to fully harnessing the benefits of big data analytics in the food industry. By adopting best practices and leveraging advanced technologies, the food industry can proactively detect spoilage and contamination, improving public health outcomes and ensuring safe, high-quality food products.

Big data analytics in food processing

The term “food processing” refers to a variety of processes, including evaporation, boiling, toasting, freezing, bottling, extruding, encapsulating, fermenting, and modified atmosphere packaging. These techniques are used to increase the shelf life and quality of food149. Because of its enormous economic potential, the food processing industry has grown rapidly149,150. It is the largest segment of the world’s food sector and is expected to grow even more in the near future. According to published reports, the food processing industry was valued at $11.7 trillion in 2019 and, from 2020 to 2027, is predicted to grow at a compound annual growth rate (CAGR) of 5%151. This growth is connected to an increase in human population, lifestyle changes, the pandemic situation, and improved food quality152. Although it is a growing field, the processing industry is not problem-free: there are many issues related to time, rising raw material costs, rising energy costs, and declines in product quality due to unexpected hurdles in processing plants. Inadequately optimized parameters, erroneous sensors, insufficiently trained workers, and unidentified patterns cause these challenges in food processing plants153. Many researchers have also developed modeling techniques that can be implemented in food manufacturing, but their dependence on the raw ingredients, the final product, and the processes involved limits their practical applicability154,155. 
To solve these applicability issues, researchers have developed semi-physical and entirely theoretical models such as multi-phase models and single-phase diffusion models153,154,155,156,157. Although these fundamental models have benefits, they face several key challenges because food systems are heterogeneous, porous, and perishable. Conventional modeling is computationally demanding compared to advanced analytical models. Several investigators have used conventional statistical models, such as the Page model158, the Henderson and Pabis model159, the Lewis model160 and the Newton model, to predict moisture transport during evaporation, roasting161, and frying, and the response-surface technique to optimize canning (Table 1)162. There is no doubt that these models are simple, easy to fit, and relatively cost-effective, but using them to analyze and maintain large and complicated datasets is challenging.

Historically, the food processing industry has relied on traditional models to optimize operations such as evaporation, baking, bottling, and packing163. However, these models come with several limitations that hinder their practical application in modern food manufacturing163,164. One of the key drawbacks is the oversimplified assumptions embedded in traditional models, which assume that food systems are uniform and isotropic, implying that the characteristics of food remain constant throughout the substance165. In reality, food systems are inherently diverse, porous, and perishable, with varying textures, densities, and moisture levels166. The Lewis model, commonly used in drying operations, assumes a single first-order rate constant, so that the logarithm of the moisture ratio falls linearly with drying time167. This assumption often proves inaccurate for porous foods like grains or fruits, where internal diffusion barriers lead to more complex moisture loss patterns168. 
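The contrast between the thin-layer drying models named above can be illustrated with a minimal sketch. It uses the commonly cited forms of the Lewis (Newton), Page, and Henderson and Pabis equations for the moisture ratio MR; the drying times, the rate constant, and the exponent are hypothetical values chosen for illustration, not data from any cited study. The sketch fits the one-parameter Lewis model to a curve generated from a Page model and reports how far the fit deviates.

```python
import math

def lewis(t, k):
    """Lewis (Newton) model: MR = exp(-k * t)."""
    return math.exp(-k * t)

def page(t, k, n):
    """Page model: MR = exp(-k * t**n); n = 1 recovers the Lewis model."""
    return math.exp(-k * t ** n)

def henderson_pabis(t, a, k):
    """Henderson and Pabis model: MR = a * exp(-k * t)."""
    return a * math.exp(-k * t)

def fit_lewis_k(times, ratios):
    """Least-squares slope through the origin of -ln(MR) versus t."""
    num = sum(t * -math.log(mr) for t, mr in zip(times, ratios))
    den = sum(t * t for t in times)
    return num / den

# Hypothetical drying curve generated from a Page model with n != 1,
# standing in for a porous product with diffusion-limited drying.
times = [0.5, 1.0, 2.0, 3.0, 4.0]  # hours (illustrative)
observed = [page(t, k=0.3, n=1.5) for t in times]

k_hat = fit_lewis_k(times, observed)
residuals = [lewis(t, k_hat) - mr for t, mr in zip(times, observed)]
max_abs_error = max(abs(r) for r in residuals)
print(f"fitted Lewis k = {k_hat:.3f}, worst-case MR error = {max_abs_error:.3f}")
```

Because the Lewis model has only one rate constant, it cannot bend to follow the Page-type curve, which is one simple way to see why such models need recalibration or replacement when the product changes.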
The discrepancy between theoretical assumptions and actual food properties can lead to errors when traditional models are applied in industrial food systems.Adaptation is a significant weakness of traditional models, as their static nature necessitates recalibration for every new product, ingredient, or processing scenario. This lack of adaptability makes them unsuitable for dynamic food manufacturing settings where recipes, procedures, and conditions regularly fluctuate169. For example, based on ‘The Newton model,’ the ability to forecast moisture loss during frying may require recalibration for various oil types, batter formulations, or frying temperatures170. This inefficacy is particularly pronounced in sectors emphasizing innovation and rapid product development. Furthermore, traditional models struggle to manage food systems’ nonlinear and intricate relationships. The interplay of diverse components, including ingredient characteristics, processing conditions, and environmental variables, is often nonlinear and challenging to represent using conventional models169. The Henderson and Pabis model may perform well in controlled environments; however, it overlooks the impact of varying humidity levels and inconsistent airflow patterns, reducing predictive accuracy159.Due to their reliance on finite variables, traditional models struggle to manage the increasingly prevalent extensive datasets in conventional food processing171. These models, being contingent on parameters, are not equipped to handle the intricate datasets produced by food processing facilities’ sensors, IoT devices, and quality control systems172. In contrast, the complexity of modern food processing demands advanced modeling techniques that can effectively manage diverse data inputs and adapt to constantly changing conditions. 
Machine learning (ML) models offer a significant advantage over traditional methods with their ability to handle large and varied datasets171,172.Unlike static models, machine learning models are data-driven and adept at analyzing extensive, multidimensional information. One of their key strengths is their ability to uncover latent patterns within food systems, a task that traditional models struggle with173. Another significant advantage of machine learning models is their capacity to manage diverse and nonlinear systems171. These models can understand complex interactions among variables without relying on strict assumptions about uniformity or constant conditions. For instance, machine learning algorithms can predict the shelf life of food products by analyzing factors like temperature, humidity, and packaging conditions174. Importantly, machine learning-based methodologies are flexible to new factors and evolving conditions, enabling them to provide more accurate and dynamic forecasts. Corporations like Nestlé and Kellogg’s have effectively utilized this predictive power and have improved their products’ shelf life and quality using machine learning models for sensor data analysis175,176.Machine learning models exhibit significant adaptability and may be trained on novel datasets to manage diverse products and processes without necessitating extensive recalibration177. The adaptability of these ML-based data-driven models makes them optimal for food producers who regularly innovate and alter their production techniques178. A practical application of machine learning in food production is ingredient optimization179. By examining interactions among diverse components, machine learning models can recommend optimal pairings to enhance texture, flavor, and nutritional value in food products180. 
IBM Watson has created new food products by evaluating millions of recipes and identifying innovative ingredient combinations based on flavor profiles and dietary objectives181. Nonetheless, despite the myriad advantages of machine learning models, their use in the food processing sector presents specific ethical problems and concerns (Table 1).Moreover, it is crucial not to overlook the ecological implications of machine learning models. While machine learning can enhance food processing by reducing waste and energy consumption, it is important to recognize that training extensive machine learning models can lead to a significant carbon footprint181. Organizations that adopt machine learning in food processing must carefully weigh the benefits of improved efficiency against the need to minimize their environmental impact23. Despite these challenges, machine learning offers a promising future for food processing. Machine learning models are better equipped to handle complex, diverse systems and adapt to changing conditions in real time than traditional models. Traditional models are suitable for simple systems and processes, but their limitations in managing large amounts of data, nonlinear interactions, and dynamic environments make them unsuitable for modern food processing180,181,182.Traditional models in food processing are constrained by their basic assumptions, rigidity, and incapacity to manage complex relationships. These constraints impede its relevance in dynamic, real-world food systems. Conversely, ML models present considerable benefits by delivering data-driven insights, facilitating real-time monitoring, and adjusting to new situations without lengthy recalibration. However, implementing machine learning presents issues about data privacy, bias, employment displacement, and environmental consequences. 
To optimize the advantages of machine learning in food processing, interdisciplinary collaboration among food scientists, data scientists, and engineers is crucial; it is the key to creating robust, ethical models that tackle the industry’s distinct difficulties. The transition from traditional to machine learning-based models is essential to satisfy the requirements of contemporary food manufacturing, elevate product quality, and optimize operational efficiency in a swiftly evolving worldwide market.

Big data analytics in different processing steps

Hernandez-Perez et al.183 employed ML-based algorithms to estimate the evaporation rate and moisture distribution in mango and cassava samples during drying. The authors reported that an artificial neural network (ANN) gave a good-quality simulation of the drying process and emphasized that ANNs can be used for online estimation of product drying.

Baking is generally a simple method, yet it involves a complex interrelationship of physical factors (heat, time, oven size) and chemical properties (water content, protein content, fat content, and others) to produce a good-quality baked product. The main challenge in this part of the processing industry is to increase production while improving the quality of baked food. Several studies have addressed these issues using mathematical models184,185,186,187. However, because of the complexity of these models and their intensive computational requirements, implementing them in the baking industry is not practical. In contrast, because ML predictive approaches are simpler to apply, researchers have used them for the baking of soft cake188, milk cake189, and bread190. Broyart and Trystram191 developed two neural network models to forecast the changes in biscuit texture and color during the baking process. 
The outputs considered were the overall average moisture content and temperature, thickness, and surface color along the length of the oven. Inductive modeling techniques based on ANN models can accurately forecast product thickness and color changes.

Sablani et al.192 used an ANN-based prediction model to assess the thermal conductivity of several bread commodities. The ANN design with two hidden layers of six neurons each yielded better results, with a 10% mean relative error (MRE). Based on these results, the authors suggested that their model might be useful for forecasting thermal conductivity values in the bread baking industry.

Extrusion is an effective approach to transform raw food resources into finished food products with a specific cross-sectional shape and design. Because this technology is cost-effective, simple, energy-efficient, and environmentally friendly, it has attracted a lot of attention and has grown in popularity over the last two decades. Extrusion is used to make various products such as cereals, pasta, noodles, and nuggets193. Controlling feed material, raw component quality, water content, total proteins, pH value, and characteristics like feed rate, extruder length, and screw geometry are the industry’s key issues. These parameters affect extruded product quality171. No mathematical model yet exists that allows predictive regulation of these parameters to improve product quality through process optimization. The equipment is expensive and changing settings for specific product types is difficult; therefore, optimizing parameters within the same settings can improve product quality. Optimal process parameters are crucial for the development of these products. Optimizing feed rate, temperature, and speed can enhance color, appearance, and textural quality194,195. 
Several studies have used ANNs to predict product quality. Shihani et al.196 used ANN and response surface methodology (RSM) models with several inputs (temperature, moisture content, and screw speed) and outputs (water solubility, water absorption, specific mechanical energy, sensory scores, and expansion ratio) to characterize extruded goods. The ANN models predicted the properties of extruded products with less error than the RSM models. Fan et al.197 used a feedforward ANN model to predict hardness and gumminess in rice flour-based products. Unknown relationships between input and output factors complicate extrusion operations. The authors employed a multilayer feedforward ANN model to solve complicated food processing prediction problems. The network was trained by backpropagation (BPNN) with input and output vectors. The resulting BPNN model showed high prediction accuracy and promising outcomes, but the results may be erroneous when internal variables such as moisture content, ripeness, and flavor are considered in model development and predictive analytics197.

A low-cost, color-based ANN model was developed by León-Roque et al.198 to estimate the ratio of total free amino acids between fermented and non-fermented products in 120 cocoa beans. The authors collected the Red Green Blue (RGB) color of the fermented cocoa beans from the surface and central region, in the absorption spectrum range from 400 to 450 nm. The model showed excellent results for the classification of fermented beans.

Zhu et al.199 developed a rapid method for detecting fermentation in black tea using electrical properties. They built several ML-based models, namely multilayer perceptron (MLP), random forest (RF), and support vector machines (SVM), along with PCA and hierarchical clustering analysis, to predict the quality attributes of the fermentation process of black tea. 
The prediction accuracies of the multilayer perceptron, random forest, and SVM models were 88.90%, 100%, and 76.92%, respectively, indicating that the random forest was the most appropriate algorithm for predicting the degree of fermentation of black tea.

Canning is a distinctive process that is important not only for food processing but also for food safety. In this process, food is sealed in a container and subjected to a heating procedure to increase its shelf life (ranging from 1 to 5 years). The quality of canned food is directly and indirectly affected by a variety of elements, including the kind of solution, concentrations, soaking duration of food items, and processing parameters (temperature and time; container material; and the characteristics of food material, such as moisture content, pH, and thermal diffusivity)200. Increasing the shelf life of canned foods while improving their quality and safety can be accomplished by optimizing these various manufacturing parameters. Various statistical models for forecasting the canning process and optimizing the canning process parameters have been proposed and applied. However, because the mathematical expressions are complicated and difficult to apply directly in canning operations, no theoretical or purely mechanistic model for predicting canning operations has been produced to date. ML-based technology, such as an ANN prediction-based approach, might be a good way to keep track of the process and improve the many quality parameters. Kseibat et al.201 developed an ANN model to predict the operating temperature, duration, and basic minimum deterioration during canning. The authors used initial temperature, can size, microorganism sensitivity indicator, and sensitive indicator of quality as input factors, and temperature, duration, and basic minimum deterioration as output components of the model. 
The results demonstrated that the ANN model can accurately forecast the temperature, time, and quality degradation of the canning process with MREs of 0.2%, 3.9%, and 1.5%, respectively. Based on the model’s findings, the authors concluded that the initial food temperature had little impact on the output parameters201.

Zhang et al.202, for the first time, showed the implementation of a generic hybrid mechanistic modeling and machine learning approach to design new food products. In this study, the mechanistic models estimate food characteristics while the machine learning model predicts the sensory characteristics of the developed product. Mittal and Zhang203,204,205 have intensively investigated neural network-based predictive modeling of the deep-frying process, the prediction of moisture content and temperature, and the prediction of freezing points for different food products. In these studies, frying duration, moisture of the product and surface, product thickness, oil temperature, the food’s initial temperature, and other product parameters were used as inputs to the ANN. A large dataset and four-level ANN networks were also used to predict the frying process more accurately, and the predictions were validated against experimental data for meatball deep-frying. A backpropagation neural network was developed with initial moisture, relative humidity, and average smokehouse temperature as input variables, and dry-basis moisture content, product center temperature, and average product temperature as outputs. The authors noticed that including shrinkage rate as an input variable improved the prediction accuracy. 
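The mean relative error (MRE) values quoted for the bread and canning ANN models can be made concrete with a short sketch. The definition used here, the average of |predicted − actual| / |actual| expressed as a percentage, is the common one and is assumed rather than taken from the cited studies; the sterilization times below are hypothetical.

```python
def mean_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual|, returned as a percentage."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for a, p in zip(actual, predicted):
        if a == 0:
            raise ValueError("relative error is undefined for a zero reference value")
        total += abs(p - a) / abs(a)
    return 100.0 * total / len(actual)

# Hypothetical process times in minutes: measured values vs. model output
measured = [50.0, 60.0, 80.0]
predicted = [52.0, 57.0, 80.0]
mre_percent = mean_relative_error(measured, predicted)
print(f"MRE = {mre_percent:.2f}%")
```

Because the error is scaled by each reference value, MRE allows outputs with different units, such as temperature and time, to be compared on the same percentage scale.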
To predict how several modified pretreatment processes before frying affect the moisture and oil content of fried mushrooms, Mohebbi et al.206 developed a coupled algorithm combining ANN and genetic algorithms (GA), using frying temperature, time, osmotic condition, and gum-coating parameters as inputs to the model, with moisture and oil content as outputs. A goodness of fit (R2) of 0.93 and a prediction accuracy of 96% were obtained. In another study, to predict the textural features of potato chips during deep-fat frying, Gouyo et al.207 used an ensemble learner to develop a decision tree (DT)-based algorithm. In this study, the deep-fat-fried potato chips were found to be crispier than the air-fried chips, most likely owing to the differences in water transport pathways between deep-fat and air frying.

Although these models contributed to a better understanding of the frying, baking, canning, extrusion, and freezing of different food products, they were unable to determine the underlying cause of the contraction of food during deep-fat frying. Although crust generation is a common occurrence in the frying process, this key factor was overlooked when developing the models. Nearly all of the aforementioned studies used ANN-based algorithms; there is a vast gap in studies on other predictive modeling approaches such as support vector machines, k-nearest neighbors, random forests, and fuzzy set classification and decision-making. 
There is also a strong need for studies that can shed some light on the combination of multivariate analysis techniques along with big data analytics, specifically in image processing techniques for supervised and unsupervised approaches to solve issues related to different food processing steps.Using machine learning (ML) models in food processing has demonstrated significant potential in enhancing process efficiency, product quality, and predictive accuracy across multiple areas, including drying, baking, extrusion, canning, and frying208. Despite the progress made with artificial neural networks (ANNs) and associated algorithms, numerous limitations and research gaps still necessitate more investigation. A primary difficulty is that the majority of research primarily utilizes artificial neural networks (ANNs) in the food industry, frequently neglecting the advantages of alternative predictive modeling techniques such as support vector machines (SVM), k-nearest neighbors (KNN), random forests (RF), and fuzzy logic systems. These alternate models may benefit in particular situations, such as managing nonlinear interactions or enhancing classification precision in intricate datasets. A further barrier is the scarcity of research that combines multivariate analysis approaches with big data analytics, especially in image-based processing, which could transform quality control and monitoring in food manufacturing. Moreover, although machine learning models can precisely forecast specific parameters such as moisture content, thermal conductivity, and textural alterations, they frequently neglect to consider fundamental physical phenomena, including food contraction during frying or crust formation, essential for creating more accurate and dependable predictive systems. 
This omission suggests that hybrid methodologies integrating mechanistic models with data-driven machine-learning techniques may reconcile the disparity between theoretical frameworks and practical implementation in food processing.

Furthermore, several machine learning models require extensive datasets and significant computational resources, which could pose challenges for smaller food processing enterprises. The use of big data analytics, especially in image processing for defect identification and quality evaluation, remains largely unexplored. Combining supervised and unsupervised learning methodologies with multivariate analysis could help bridge this gap and improve the accuracy of predictions across various food processing stages. An important aspect that requires more attention is the interpretability of machine learning models. While ANN models often deliver high accuracy, they are frequently seen as ‘black boxes,’ making it difficult to understand how input variables influence the final prediction. This lack of transparency can hinder their adoption in sectors that require clear decision-making processes. Future research should focus on using explainable AI (XAI) methodologies to improve the interpretability of prediction models in food processing. Lastly, it is crucial to investigate the environmental and ethical implications of using machine learning in food processing. The carbon footprint associated with training extensive neural networks and the potential for algorithmic bias in quality control are significant issues that need to be addressed.

In conclusion, whereas machine learning models like artificial neural networks exhibit considerable promise in enhancing diverse food processing activities, they possess inherent limitations. The excessive dependence on a singular model type and the insufficient investigation of alternative predictive methodologies underscore a significant research deficiency. 
Subsequent research should concentrate on the amalgamation of multivariate analysis, big data analytics, and explainable AI to create more resilient and comprehensible models capable of addressing the intricacies of contemporary food processing. A hybrid method that integrates mechanistic and data-driven models may yield more precise and dependable predictions while accounting for the fundamental physical processes sometimes neglected by solely data-driven models. By rectifying these deficiencies and constraints, the food processing sector can advance toward more sustainable, efficient, and precise predictive solutions that align with the changing requirements of contemporary industrial processes.

Big data analytics in food quality and authenticity

Big data analytics approaches are now being used in every aspect of the food industry. As seen in the sections above, different ML techniques and big data analytics approaches have been used in different areas of food safety and in different food processing steps. ML-based models have been used in the inspection of various foods, ranging from fresh produce to stored food products. Food quality is also an important factor that influences the cost of the final product, from the initial raw ingredients to the consumer at the retail store209. Before these technological advancements, the assessment of food quality in any field, including fresh produce, dairy, fisheries, and poultry, was labor-intensive, prone to false-positive results, and dependent on experienced employees; even so, depending on employee experience, many quality defects went unnoticed.

Quality evaluation of food basically consists of grading the food product based on external features, morphological characteristics, and visual sensory attributes such as color, texture, and appearance210. 
Considering the current demands of the food industry, there is a great need to explore non-invasive sensors for inline and online food quality detection systems211. To address this need, researchers have explored rapid quality detection techniques such as cameras212, sensors213, near-infrared214, hyperspectral imaging, radio-frequency waves, and Fourier transform infrared techniques98 for the quality evaluation of food products in different food matrices.

Although these detection systems have proved helpful, their practical application is limited because they generate large and complex datasets. Using BDA to unravel these complex datasets for pattern identification and analysis will provide great insight for the food industry, which will help maintain and improve quality. A number of studies have focused on these quality parameters for grading food products based on physical and chemical attributes, sometimes combined with ML-based models98,210,212,214. Rapid detection methods for these quality attributes generate huge datasets, but the data often contain redundant, noisy, and irrelevant information. This large volume of noisy, uncleaned data has been a major concern for extracting features that relate directly to food quality problems.

Big data analytics in food quality

Computer-based image analysis systems have been used for the classification of various fruits and vegetables, such as the grading of apples using a multilayer perceptron model and the grading of strawberries using image processing with k-means clustering. Ireri et al.215 reported a low-cost tomato grading system coupled with machine learning techniques, in which tomatoes were graded on the basis of color, size and weight. 
In this study, SVM, ANN, and random forest algorithms were developed for the grading of tomatoes based on RGB image analysis. Based on the analysis of the collected image data, SVM showed 91.26 to 94.67% classification accuracy, ANN showed 92.99 to 95.83%, and the decision tree analysis showed 91.08 to 94.12%215.

Kanade and Shaligram216 reported the use of a k-nearest neighbor (k-NN) model to classify guava fruits into four classes: green, ripe, overripe and spoiled. The authors reported about 90% classification accuracy.

Before the development of analytic approaches, classification of corn seed was a challenging task; the process was labor-intensive and needed experts to perform the quality evaluations. Prakasa et al.217 developed an automated image classification system based on region of interest (ROI) and k-means clustering for the classification of corn seed. Their results showed 90% classification accuracy. Septiarini et al.218 used SVM along with image processing techniques to classify oil palm fruit based on ripeness level, using color processing for red, green, blue and gray. The developed model was tested on 160 images with an accuracy of 92.5%. The authors also reported that the error percentage was less than 2.4% and that color features were the dominant factor in the analysis. Aiadi et al.219 used a Gaussian Mixture Model (GMM) to classify 11 different types of dates; the Expectation-Maximization (EM) algorithm was used for parameter estimation, and the Davies-Bouldin index was used to automatically and precisely estimate the number of components (i.e., appearances). 
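The role of the Davies-Bouldin index in choosing the number of groups can be sketched as follows. This is a minimal illustration on hard cluster assignments with hypothetical two-dimensional feature points; it is not the GMM/EM pipeline of the cited study, and the function names are assumptions introduced here. For each cluster, the index takes the worst ratio of combined within-cluster scatter to between-centroid distance, then averages over clusters, so lower values indicate more compact, better-separated groupings.

```python
import math

def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]

def davies_bouldin(clusters):
    """Davies-Bouldin index for a list of clusters (each a non-empty list of points).

    Assumes no two clusters share the same centroid (otherwise the
    centroid distance in the denominator would be zero).
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    centers = [centroid(c) for c in clusters]
    # Scatter: mean distance of a cluster's points to its own centroid
    scatter = [sum(math.dist(p, ctr) for p in c) / len(c)
               for c, ctr in zip(clusters, centers)]
    worst = []
    for i in range(len(clusters)):
        ratios = [(scatter[i] + scatter[j]) / math.dist(centers[i], centers[j])
                  for j in range(len(clusters)) if j != i]
        worst.append(max(ratios))
    return sum(worst) / len(worst)

# Hypothetical feature points forming two well-separated groups
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 10), (10, 11), (11, 10), (11, 11)]

db_two = davies_bouldin([group_a, group_b])                     # natural split
db_three = davies_bouldin([group_a[:2], group_a[2:], group_b])  # over-split
print(f"2 clusters: {db_two:.3f}, 3 clusters: {db_three:.3f}")
```

Comparing the index across candidate numbers of clusters and keeping the one with the lowest value is the selection rule this sketch illustrates.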
Results obtained from the experimental data showed that the developed model had a high identification rate of 98.65%.

Yu et al.220 demonstrated that a stacked auto-encoder applied to visible and near-infrared hyperspectral imaging (HSI) data was able to classify shrimp by assigned freshness label, determined from their total volatile basic nitrogen (TVB-N) contents. In addition to the calibration set, 116 samples were used in the experiments. The results showed 93.97% classification accuracy for the desired freshness grade of shrimp. Yu et al.220 also used the successive projections algorithm (SPA) and deep-learning-based stacked auto-encoders to predict TVB-N content in Pacific white shrimp. The authors combined multivariate analysis and BDA approaches for the prediction and obtained a model prediction coefficient (R2) of 0.92.

Big data analytics approaches have also been used in the meat and poultry processing industry to maintain the quality of fresh raw poultry, beef, lamb and goat. To classify poultry meat by quality, such as normal fillets versus myopathic fillets, Barbon et al.221 used an SVM model to distinguish normal from pale meat. They showed that SVM can be used to classify breast fillets with muscle myopathies. The classification accuracy for normal breast fillets was 53.4%, while it was 72% for pale breast fillets. Geronimo et al.222 used a machine vision system and SVM to categorize fillet images. They observed that the SVM model classified 91.83% of WB fillets correctly. These researchers also used a multilayer perceptron (MLP) to classify the dataset. For WB fillets, the classification performance of the model was 90.67%. Yang et al.223 analyzed images derived from the expressible fluid to classify WB using SVM and deep learning (DL) algorithms. 
These researchers found high classification accuracy for both the training (100%) and testing (93.3%) sets. Morey et al.224 implemented linear discriminant analysis (LDA) to separate normal fillets from myopathic fillets using an electrical sensor. They observed that these sensors, when coupled with LDA, classified the fillets with accuracies of up to 68.69% for normal fillets and 57.75% for WB fillets. In another study, Siddique et al.225 showed that SVM and backpropagation neural network algorithms performed well in classifying normal and myopathic fillets. The authors found that the SVM model correctly classified 73.28% of normal fillets and 81.48% of WB fillets. Siddique et al.226 also demonstrated the use of singular value decomposition (SVD) to determine fillet quality from collected amplitude and phase data. The authors observed that SVD classified 100% of normal fillets and 78% of WB fillets based on radio-frequency wave analysis data.

Penning et al.227 used eight different ML algorithms to determine beef quality attributes using image analysis and mass-spectroscopy data. They observed that PCA-FS with LDA correctly classified 82% of beef by quality grade; FS with linear SVM, 99% by production background; PCA-FS with radial SVM, 85% by breed type; and FS with XGBoost, 91% by muscle tenderness. Alaiz-Rodríguez and Parnell228 used ML algorithms to assess lamb meat quality; the authors applied decision tree and SVM models and compared them with Partial Least Squares (PLS) and Principal Component Analysis (PCA) regression methods. 
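Several of the poultry studies above report a separate percentage for each fillet class rather than a single overall accuracy. One common way to obtain such figures, assumed here because the cited studies' exact definitions are not given in this section, is the fraction of each true class that the model labels correctly (per-class recall). The labels below are hypothetical.

```python
from collections import Counter

def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of each true class that was predicted correctly (per-class recall)."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have equal length")
    totals = Counter(true_labels)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)
    return {label: correct[label] / totals[label] for label in totals}

# Hypothetical ground truth and predictions for ten fillets
truth = ["normal"] * 6 + ["WB"] * 4
pred = ["normal", "normal", "normal", "normal", "WB", "WB",
        "WB", "WB", "WB", "normal"]

rates = per_class_accuracy(truth, pred)
overall = sum(t == p for t, p in zip(truth, pred)) / len(truth)
print(rates, overall)
```

Reporting both views matters when classes are imbalanced, since a model can reach a high overall accuracy while performing poorly on the less frequent class.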
The results showed that SVM was able to classify 91.80% of the collected fat data, compared with the PCA analysis.

Big data analytics in food authenticity

Tampering with or adulteration of food is also one of the big problems in the area of food quality and costs the food industry $15–40 billion every year, through practices such as tampering with food quality, mislabeling of food products, and the use of cheap, low-quality ingredients in food processing. We provide only brief information here about these issues and recent developments to tackle this evolving situation. For example, Al-Sarayreh et al.229 used support vector machine (SVM) and deep convolutional neural network (CNN) algorithms to evaluate the level of adulteration in red meat using hyperspectral imaging (HSI). Based on the analysis and results of their study, the authors confirmed that the CNN model had the best prediction power, with a classification accuracy of 94.4%.

Farah et al.96 used differential scanning calorimetry with random forest (RF), gradient boosting machine (GBM) and multilayer perceptron (MLP) models to identify and detect adulterants added to raw milk for quality evaluation. The authors found that all the developed models (MLP, RF and GBM) classified 100% of adulterated samples, with 100% prediction capability for GBM and MLP and 88.5% for RF.

Fabris et al.230 used RF and SVM-based detection models for quality evaluation of cheese during processing. In this study, the authors tried to establish a relationship between the storage condition of milk and the final quality of cheese. Their results showed that PCA coupled with SVM gave better quality identification in cheese samples made in different seasons (summer vs winter). 
Dankowska and Kowalewski231 studied the classification of olive oil by shelf life and type (refined/extra virgin) using SVM and k-NN coupled with a multivariate analysis approach (PCA), applied to fluorescence data collected from synchronous fluorescence spectroscopy measurements. The authors found that k-NN and SVM were the best-optimized models for classifying the labeled data: the k-NN and SVM models assigned 94.60% and 94.4% of the labeled data, respectively, to their correct groups.

Machine learning (ML) methods, including support vector machines (SVM), artificial neural networks (ANN), random forests (RF), k-nearest neighbors (k-NN), and decision trees (DT), have shown considerable promise in enhancing food quality evaluation through the analysis of complex datasets. Among these, SVM has proven to be an exceptionally successful model for classification tasks, especially where the relationship between input features and output predictions is nonlinear. Research indicates that SVM excels in tasks such as recognizing muscle myopathies in poultry fillets222 and detecting food adulteration229 owing to its capacity to establish optimal decision boundaries in high-dimensional spaces. Nonetheless, the performance of SVM depends on the choice of hyperparameters, such as the kernel type, the regularization parameter (C), and gamma. Although SVM provides strong classification accuracy, it can be computationally intensive for large datasets and requires careful parameter tuning to prevent overfitting.

Conversely, ANN models have demonstrated significant efficacy in tasks characterized by complex, nonlinear interactions and large data volumes. Their architecture, comprising multiple hidden layers and neurons, enables the identification of complex patterns in food quality characteristics such as texture, moisture content, and color232.
The principal factors affecting ANN predictions are the number of hidden layers, the number of neurons per layer, the activation functions used, and the learning rate. ANN models have been employed effectively in applications such as shrimp freshness classification232, cheese quality prediction230, and thermal conductivity assessment in bread192. Nonetheless, a limitation of artificial neural networks is their “black box” nature, which makes it difficult to interpret how input variables influence predictions. This lack of transparency can restrict their practical use in sectors where explainability is essential.

Random forest (RF) models have become popular because of their adaptability, straightforward implementation, and capacity to handle large datasets with high-dimensional input variables. RF models operate by constructing many decision trees and aggregating their predictions to improve accuracy and mitigate overfitting. In food quality applications, RF has been employed to identify adulterants in raw milk96 and to assess cheese quality in relation to seasonal fluctuations230. A primary advantage of RF is its capacity to handle missing data and assess feature importance, yielding insight into the most influential variables in the prediction process. Nonetheless, RF models can be computationally demanding and may struggle when many features are irrelevant, necessitating preprocessing to improve performance.

The k-nearest neighbors (k-NN) algorithm, a simple machine learning model, has been employed effectively in classification tasks, including the grading of guava fruits according to maturity level216 and the classification of olive oil by shelf life and type231. The primary factors affecting k-NN predictions are the number of neighbors (k) and the distance measure used.
Although k-NN is simple to develop and understand, it is sensitive to noisy data and can be computationally intensive for large datasets.

The optimal model architecture depends on the particular food quality evaluation task and the nature of the available data. SVM is well suited to binary classification tasks, including those with nonlinear interactions, whereas ANN is better at handling complex, high-dimensional data with nonlinear dependencies. RF delivers strong performance in multi-class classification tasks and feature importance ranking, while k-NN is a simple, interpretable model suited to smaller datasets. The most influential parameters in these models are the kernel type and regularization parameters for SVM, the number of layers and neurons for ANN, tree depth and the number of estimators for RF, and the choice of distance metric and number of neighbors for k-NN. By choosing a suitable model and tuning these parameters, the food sector can obtain accurate predictions and improve quality control procedures.

Results and discussion

This work implements a systematic review methodology based on the Kitchenham criteria for evidence-based reviews, tailored to the food industry setting6. The Kitchenham criteria, a set of guidelines for conducting systematic reviews, were chosen for their applicability to the unique challenges and opportunities in the food industry. The methodology was designed to provide a thorough, impartial, and replicable synthesis of the existing uses of Big Data Analytics (BDA) in the food industry. The study identified significant trends and patterns across different topics and techniques. The principal topics were Artificial Intelligence (AI), Big Data, Food Safety, Internet of Things (IoT), and Machine Learning.
The research approaches used in these studies were Surveys, Experiments, Modeling, Simulations, and Framework Development.

AI has emerged as the leading topic, with over 50 research studies highlighting its potential applications in food safety, smart agriculture, and digital transformation. The close relationship between Big Data and Food Safety underscores the growing reliance on extensive datasets and real-time monitoring systems to meet food quality and safety standards. The frequent use of Survey and Modeling approaches indicates that researchers are primarily focused on exploring the viability and theoretical frameworks of these technologies. However, it is crucial to shift toward more empirical research through experiments and simulations to validate these frameworks in practical settings, and researchers should prioritize such work.

Research questions addressed

RQ1: What are the main applications of BDA in the food industry?

The analysis identified that the primary applications of Big Data Analytics in the food industry include food safety, quality assurance, and processing efficiency. Studies showed that AI and Machine Learning models are being used for predictive analytics, automation of processes, and real-time monitoring. For example, some studies highlighted the use of IoT devices to monitor food storage conditions, while others focused on blockchain technology to improve supply chain traceability. Machine Learning models such as Support Vector Machines (SVMs), Neural Networks, and Decision Trees were frequently used to predict food quality parameters and detect contaminants. However, most studies focused on developing conceptual frameworks without validating these applications in real-world scenarios.

RQ2: How effective are machine learning (ML) models in improving food quality, safety, and processing?

The effectiveness of ML models varied depending on the application.
Neural Networks and SVMs were found to be effective in predicting spoilage, detecting pathogens, and classifying food products based on quality. However, studies noted several limitations, including:

Lack of real-time validation: Many ML models were tested in controlled environments but lacked real-world testing.

Algorithmic complexity: Some studies reported that complex algorithms were difficult to implement in traditional food processing facilities.

Data limitations: The accuracy of ML models depended heavily on the quality and quantity of data available.

Overall, while ML models showed potential for improving food quality and safety, there is a need for more empirical studies to validate these findings in practical settings.

RQ3: What challenges are associated with the practical implementation of BDA techniques?

Several challenges were identified in the practical implementation of Big Data Analytics techniques:

Data Quality and Availability: Many studies highlighted the issue of limited access to high-quality data, which affects the accuracy of predictive models.

Technological Infrastructure: The lack of advanced technological infrastructure in food processing facilities was a common barrier to implementing BDA solutions.

Cost and Complexity: The high cost of implementing IoT devices and maintaining blockchain systems was noted as a significant challenge.

Skill Gaps: The need for specialized knowledge to manage and analyze big data was another recurring issue.

To address these challenges, future research should focus on developing cost-effective solutions and training programs to bridge the skill gaps.

Patterns and trends in themes and methodologies

The cross-referenced analysis of topics and approaches revealed distinct patterns in how they interconnect. Artificial intelligence was mostly examined in relation to predictive analytics, automation, and classification models. This topic was frequently paired with Survey and Modeling techniques, indicating a growing interest in exploring the possible applications of AI in food safety and agriculture. However, despite the recurrent references to AI, there was a conspicuous lack of research on developing novel machine learning algorithms designed for these areas, highlighting the need for further work in this direction.

Likewise, big data emerged as a significant topic, frequently addressed together with IoT and blockchain technology. The emphasis was on analyzing and processing large volumes of data to improve decision-making in food safety and supply chain management. The favored approaches for these studies were Modeling and Simulation, emphasizing theoretical frameworks and predictive models.

In Food Safety, researchers highlighted the significance of real-time monitoring systems, pathogen detection, and risk assessment frameworks. These studies often used survey and experimental approaches to collect empirical data and assess the efficacy of recommended solutions, reflecting a substantial focus on the practical implementation and verification of food safety protocols.

Network analysis

The detailed network analysis graph (Fig. 6) shows the relationships between the themes and approaches examined in the assessed articles. The graph yields several insights:

Central theme: The graph illustrates that AI, Big Data, and IoT are pivotal nodes, signifying their substantial impact across diverse study domains. These topics are linked to several other elements, including Predictive Analytics, Real-time Monitoring, and Pathogen Detection, illustrating their extensive applicability in the food industry.

Clusters of related concepts: The network graph illustrates distinct clusters of interconnected concepts. Blockchain is intrinsically associated with Supply Chain and Traceability, demonstrating its application in enhancing transparency throughout the food supply chain. Smart Agriculture is intrinsically linked to Sustainability, underscoring the increasing interest in employing digital technologies, such as IoT and AI, to advance sustainable farming techniques, including precision agriculture and resource optimization.

Influence of methodologies: The Survey and Modeling techniques emerge as significant influences inside the network, acting as essential connectors across diverse themes. This signifies that these approaches are frequently employed to investigate the viability and conceptual frameworks of BDA applications within the food industry. However, the restricted links to Experiment and Simulation nodes indicate a deficiency in empirical validation and practical testing, which could potentially limit the real-world applicability of the research findings.

Emerging themes: The graph underscores several emerging themes, including Algorithm Development, Data Quality, and Technological Infrastructure. These nodes signify domains necessitating increased focus from researchers to enhance the practical application of BDA approaches. Algorithm development, in particular, is vital for improving the precision and efficacy of predictive models, highlighting the significant role of researchers in shaping the future of the food industry.

Gap in research: The network analysis identifies multiple deficiencies in the existing research environment. Sustainability emerges as a relatively solitary node with constrained links to other themes, underscoring the urgent need for further research to investigate the utilization of digital technologies in advancing sustainable practices within the food industry.
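
The node roles described in the list above (pivotal nodes, clusters, weakly linked nodes) correspond to simple graph measures. The following is a minimal sketch of how degree centrality could be computed for such a theme network. The edge list is hypothetical and only loosely follows the relationships named in the text; it is not the data behind Fig. 6, and the helper names (`build_graph`, `degree_centrality`, `weakly_connected`) are illustrative.

```python
from collections import defaultdict

# Hypothetical theme/methodology links, loosely following the clusters
# described in the text. They are NOT the data behind Fig. 6.
EDGES = [
    ("AI", "Predictive Analytics"),
    ("AI", "Survey"),
    ("AI", "Modeling"),
    ("AI", "Smart Agriculture"),
    ("Big Data", "IoT"),
    ("Big Data", "Modeling"),
    ("Big Data", "Real-time Monitoring"),
    ("IoT", "Real-time Monitoring"),
    ("IoT", "Smart Agriculture"),
    ("Blockchain", "Supply Chain"),
    ("Blockchain", "Traceability"),
    ("Food Safety", "Pathogen Detection"),
    ("Food Safety", "Survey"),
    ("Smart Agriculture", "Sustainability"),
]


def build_graph(edges):
    """Return an undirected adjacency map {node: set of neighbours}."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Degree of each node divided by the maximum possible degree (n - 1)."""
    n = len(graph)
    if n < 2:
        return {node: 0.0 for node in graph}
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


def weakly_connected(graph, max_degree=1):
    """Nodes with at most `max_degree` links, i.e. candidate research gaps."""
    return sorted(node for node, nbrs in graph.items() if len(nbrs) <= max_degree)


graph = build_graph(EDGES)
centrality = degree_centrality(graph)

# Print nodes from most to least connected, then the weakly linked ones.
for node, score in sorted(centrality.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"{node:22s} {score:.2f}")
print("Weakly connected:", weakly_connected(graph))
```

With this toy edge list, AI has the highest degree centrality and Sustainability appears among the weakly connected nodes, which mirrors the qualitative reading of the graph given above. Degree centrality is only one possible measure; betweenness or eigenvector centrality may be more appropriate for identifying "connector" methodologies.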

Fig. 6 Comprehensive network analysis graph representing the detailed relationship between various themes and methodologies.

The network graph highlights the interrelatedness of themes and approaches in the analyzed papers. It underscores the significance of data-driven breakthroughs, including AI, Big Data, and IoT, in the food industry while identifying domains requiring additional investigation and empirical validation (Fig. 6).

Word cloud analysis

The word cloud produced by the systematic review reflects the important terms and recurrent themes in the examined works on machine learning (ML) and big data analytics (BDA) in the food industry. The visualization highlights concepts such as “Data,” “Processing,” “Food,” “Detection,” and “Analysis,” which are central to current research in food safety, quality, and supply chain management. The prominence of the terms “Data” and “Processing” underscores the critical role of large-scale data processing in ensuring food safety and quality. The word “Food” underlines that developments in BDA and ML are intended to solve food production, sustainability, and safety issues (Fig. 7).

Fig. 7 Comprehensive visual summary of the ongoing advancements and challenges in integrating BDA and ML technologies in the food sector.

The use of sophisticated machine learning models in the food industry is a growing trend, as shown by the frequent mention of methods such as “Neural,” “Support Vector,” “Random Forest,” and “Deep Learning.” These models are widely employed in a variety of food industry applications, particularly for quality control, classification tasks, and predictive analytics.
The ability of neural networks (NN) to model complex nonlinear interactions in food datasets is especially highlighted. These models have been used to evaluate meat and dairy product quality, predict moisture content, and detect spoilage. Random Forest (RF) and Support Vector Machines (SVM) are widely employed for classification tasks, particularly in detecting food adulteration and identifying foodborne illnesses. These models are known for their capacity to produce feature importance rankings and for their robustness when working with high-dimensional datasets.

The importance of BDA and ML in determining and upholding food quality standards is a key focus of the research. Words like “Detection,” “Pathogen,” “Foodborne,” and “Safety” emphasize how crucial machine learning models are for spotting threats to food safety, including spoilage and microbial contamination. Much research has concentrated on employing AI-based techniques to identify pathogens in real time to keep food products safe for consumption. The word “Quality” further highlights the role of BDA and ML in this area. Methods like spectroscopy, image analysis, and sensor-based monitoring have been used to assess the quality of fruits, vegetables, meat, and dairy products.

Words like “IoT,” “Blockchain,” and “Smart Agriculture” describe how cutting-edge technology is being incorporated into supply chain management and food production. The Internet of Things (IoT) enables real-time tracking of food products from farm to fork, which also improves traceability and lowers waste. Blockchain technology is increasingly used to guarantee accountability and transparency in the food supply chain, thwarting fraud and ensuring that goods meet legal requirements.
“Smart agriculture” describes the use of BDA and ML to maximize crop yields, minimize resource waste, and optimize agricultural operations, including precision farming methods that use drones, sensors, and artificial intelligence (AI) algorithms to increase output. Words like “Algorithm,” “Modeling,” and “Optimization” suggest that a large portion of the research is concerned with creating and improving algorithms to manage complex datasets from the food sector. However, problems such as data quality, computing demands, and the need for qualified staff make it difficult to apply these algorithms in real-world situations. The k-nearest neighbor (k-NN) technique appears in the word cloud as “Neighbor” and is praised for its ease of use and efficiency on smaller datasets.

The frequent use of words like “predictive,” “automation,” and “monitoring” suggests that future studies will likely focus on creating predictive models that can automate various food processing operations and provide real-time food safety and quality monitoring. This emphasis reflects optimism about the potential of technology to transform the food industry. Terms like “Frameworks” and “Integration” indicate a growing interest in developing unified systems that combine several data sources and machine learning models to enhance operations throughout the food industry. The word cloud offers insight into the present research landscape concerning BDA and ML applications within the food industry. It strongly emphasizes data-driven solutions for food safety, quality, and processing. Advanced big data analytics approaches such as neural networks, IoT, blockchain, and deep learning are the central theme of the reviewed research, with an increasing focus on real-time monitoring, automation, and predictive analytics.

The review underscores the possibilities and challenges linked to implementing these technologies in the food sector.
Future research must overcome practical challenges, including data quality concerns and computing demands, while investigating novel applications of emerging technology to guarantee food security and sustainability.

Emerging trends and gaps

Several emerging trends have been identified through the network and word cloud analyses conducted as part of this review. The interconnectedness of themes such as AI, Big Data, IoT, and Food Safety suggests that these are central focus areas in current research. However, there are also notable gaps that require further exploration.

One significant gap is the underutilization of experimental methodologies to validate proposed solutions in real-world scenarios. While many studies employ theoretical and modeling approaches, there is a need for more empirical research to ensure the practical applicability of these methods. Additionally, there is a limited focus on developing new algorithms tailored to specific applications in the food industry, indicating an opportunity for innovation in this area.

Another gap identified is the lack of research on sustainability and smart agriculture themes. Given the growing importance of sustainability in food production, future research should explore how digital technologies can promote sustainable practices and improve resource efficiency in agriculture.

Methodology gaps and recommendations

This systematic review has identified several key gaps in the methodologies used across the reviewed studies:

Underutilization of experimental and simulation methodologies: Many studies rely heavily on surveys and modeling techniques without conducting real-world experiments to validate their findings. To address this gap, future research should incorporate more experimental methodologies to test proposed solutions in practical settings.

Need for algorithm development: Despite the frequent mention of AI and Machine Learning, few studies focus on developing new algorithms specifically tailored to food industry applications. Future research should aim to create more specialized algorithms that improve the accuracy and efficiency of predictive models.

Limited focus on sustainability and smart agriculture: Themes related to sustainability and smart agriculture appear less frequently in the reviewed studies. Given the increasing importance of these areas, future research should explore how digital technologies can support sustainable food production and smart farming practices.

Conclusion

The present research underscores the transformative potential of Big Data Analytics (BDA) and Machine Learning (ML) in the food sector. These technologies hold the promise of significantly improving food safety, processing efficiency, and product quality. Integrating BDA and ML to address complex challenges, such as pathogen identification, food spoilage prediction, and quality assessment, is vital for protecting public health and enhancing sustainability.

To fully realize this potential, several areas require further research and development. A primary challenge in using BDA and ML in food operations is ensuring the accuracy and reliability of the data fed into predictive models. High-quality data is a cornerstone for creating robust algorithms capable of producing accurate predictions. However, food-related datasets often include noise, inaccuracies, and missing values, underscoring the importance of improving data preprocessing and integration methods. The study also underscores the need for robust AI-ML systems able to handle growing data volumes. This aligns with the growing demand for scalable solutions that can handle real-time data and adapt to diverse food processing environments. Enhancing the scalability and efficiency of these models is crucial for their practical use in real-world scenarios.

A further crucial element is the need for substantial computational resources and skilled personnel to manage and analyze large datasets. Despite advances in technology that facilitate the processing of vast datasets, there is still a need to design more efficient algorithms with lower computational requirements.
Reducing computational requirements will make these technologies more accessible to small- and medium-sized enterprises (SMEs) in the food sector, which often lack the resources to invest in sophisticated computing equipment. Moreover, data literacy and skilled personnel are essential. Integrating data analytics and machine learning courses into food science curricula could mitigate the current skills gap.

The paper also underscores the importance of interdisciplinary collaboration among data scientists, food technologists, and industry specialists. This collaboration is a shared responsibility for turning theoretical advances into actual implementations. Researchers and industry stakeholders must work together to identify specific challenges within the food sector that can be addressed through BDA and ML and jointly develop efficient, user-friendly solutions. Furthermore, governments play a vital role in promoting this progress through legislative initiatives that support research and development, facilitate data sharing, and encourage the adoption of big data analytics technology. Policies that promote the standardization of data formats and create incentives for inter-organizational data exchange would enhance the effectiveness of BDA and ML applications.

An essential consideration is selecting the most suitable machine learning approach for a particular application in the food sector, as different models have distinct advantages and disadvantages. Support Vector Machines (SVM) are among the most effective algorithms for classification tasks, especially in the presence of nonlinear relationships between input features and output predictions. SVMs have effectively identified food adulteration and classified muscle myopathies in poultry meat.
The primary advantage of SVM is its capacity to handle high-dimensional datasets and nonlinear relationships, making it suitable for applications requiring accurate classification. Nonetheless, SVM models require careful parameter optimization (including kernel type, regularization, and gamma values) and may incur significant processing costs for large datasets.

Artificial Neural Networks (ANN) are valued for their capacity to model complex, nonlinear relationships within large datasets. ANNs excel in moisture content prediction, spoilage detection, and the evaluation of product texture and color. The advantage of ANNs lies in their capacity to learn complex patterns from data automatically, eliminating the need for manual feature engineering. Nonetheless, ANN models require considerable computational resources and large labeled datasets, which may be a challenge for SMEs. Furthermore, ANNs are frequently regarded as “black boxes,” meaning that their predictions are difficult to interpret, which can be a barrier in sectors where explainability is crucial.

Random Forest (RF) models are preferred for their robustness and adaptability in handling noisy and high-dimensional data. RF models are frequently employed in food adulteration detection, cheese quality evaluation, and meat tenderness prediction. A notable advantage of RF is its capacity to deliver feature importance rankings, which measure how much each feature contributes to the model’s predictions and so help identify the features that most affect them. Furthermore, RF models are less susceptible to overfitting than alternative models.
Nevertheless, RF models may become computationally demanding when processing large datasets and require tuning to improve performance.

The k-Nearest Neighbors (k-NN) technique is a simple and interpretable model that performs well on small datasets and uncomplicated classification tasks. It has been used to assess fruit maturity and evaluate olive oil quality. The primary advantage of k-NN is its straightforward implementation and low computational cost on small datasets. However, k-NN is sensitive to noisy data, that is, data containing errors or outliers, and incurs significant computing costs as the dataset grows, particularly in high-dimensional settings.

Deep Learning (DL) methods, especially Convolutional Neural Networks (CNNs), are widely employed in evaluating food quality through image analysis. CNN models are proficient at automatically extracting features from visual data, making them appropriate for identifying adulteration in meat through hyperspectral imaging. The advantage of deep learning lies in its capacity to analyze raw image data without manual feature extraction, making it especially useful for visual quality evaluation. Nonetheless, deep learning models require substantial computing resources and large labeled datasets. They also suffer from limited interpretability, which poses a problem in practical implementations.

In conclusion, the study analyzes how big data analytics and machine learning could transform the food industry while acknowledging the challenges that must be addressed for these technologies to realize their full potential. The discussion of ML models highlights the advantages and drawbacks of each method and the need to choose the appropriate model for the application at hand. Although SVM provides high accuracy for classification applications, it may be computationally demanding.
Artificial Neural Networks have strong predictive capabilities for complex datasets, although their black-box nature hinders interpretability. Random Forest models are robust and adaptable; nonetheless, they require tuning for large datasets. k-NN is simple and cheap for small datasets but becomes impractical with larger ones, while deep learning approaches are well suited to image-based applications but demand substantial resources. Future research should concentrate on creating scalable, efficient, and interpretable machine learning algorithms, alongside improving data preprocessing methods to ensure that models are trained on high-quality datasets. Moreover, multidisciplinary cooperation among data scientists, food technologists, and industry stakeholders will be essential for converting theoretical progress into practical applications. By overcoming the constraints of existing models and emphasizing explainability, scalability, and data integration, the food sector can provide safer, higher-quality goods, thereby improving public health outcomes and sustainability.
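
As an illustration of the k-NN trade-offs summarized in this section (two tuning choices, sensitivity to noisy samples, and a prediction cost that grows with the training set), the following is a minimal sketch written from scratch. The two features (a firmness reading and a hue value), the sample values, and the labels are invented for illustration and do not come from any of the reviewed studies; the function names are likewise hypothetical.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(train_X, train_y, query, k=3, distance=euclidean):
    """Majority vote among the k training samples closest to `query`.

    Every prediction scans the whole training set, which is why the
    cost of k-NN grows with dataset size.
    """
    if not 1 <= k <= len(train_X):
        raise ValueError("k must be between 1 and the number of training samples")
    ranked = sorted(zip(train_X, train_y), key=lambda pair: distance(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented (firmness, hue) readings; the last sample is deliberately
# mislabeled to mimic a noisy measurement.
train_X = [
    (8.1, 0.20), (7.6, 0.25), (7.9, 0.22),   # unripe fruit
    (3.2, 0.70), (2.8, 0.75), (3.5, 0.68),   # ripe fruit
    (3.0, 0.72),                             # ripe-looking but labeled "unripe"
]
train_y = ["unripe", "unripe", "unripe", "ripe", "ripe", "ripe", "unripe"]

query = (3.05, 0.72)
print("k=1:", knn_predict(train_X, train_y, query, k=1))
print("k=3:", knn_predict(train_X, train_y, query, k=3))
print("k=3, Manhattan:", knn_predict(train_X, train_y, query, k=3, distance=manhattan))
```

In this toy example, a single mislabeled neighbor decides the outcome when k = 1, while k = 3 outvotes it. In practice the features would normally be scaled before computing distances, since otherwise the feature with the largest numeric range dominates the result.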

Data availability

No datasets were generated or analyzed during the current study.

Code availability

The underlying code for this study is not publicly available but may be made available to qualified researchers on reasonable request from the corresponding author.

References

Manufuture. Vision 2020 and Strategic Research Agenda of the European Agricultural Machinery Industry and Research Community for the 7th Framework Programme for Research of the European Community, Brussels, Belgium. https://www.manufuture.org/wp-content/uploads/AET-Vision-and-SRA1.pdf (2006).

Gupta, S., Chen, H., Hazen, B. T., Kaur, S. & Gonzalez, E. D. S. Circular economy and big data analytics: a stakeholder perspective. Technol. Forecast. Soc. Change 144, 466–474 (2019).

Dubey, R. et al. Big data analytics and artificial intelligence pathway to operational performance under the effects of entrepreneurial orientation and environmental dynamism: a study of manufacturing organizations. Int. J. Prod. Econ. 226, 107599 (2020).

PR Newswire. The Global Big Data Analytics Market, 2027: A $105+ Billion Opportunity Assessment. https://www.prnewswire.com/news-releases/the-global-big-data-analytics-market-2027-a-105-billion-opportunity-assessment-301014418.html (2020).

Statista Research Department (SRD). Big Data Market Size Revenue Forecast Worldwide from 2011 to 2027. https://www.statista.com/statistics/254266/global-big-data-market.forecast/ (2022).

Kitchenham, B. & Brereton, P. A systematic review of systematic review process research in software engineering. Inf. Softw. Technol. 55, 2075 (2013).

Weiss, S. M. & Indurkhya, N. Predictive Data Mining: A Practical Guide (Morgan Kaufmann, 1998).

Diebold, F. X. On the origin(s) and development of the term ‘Big Data’. http://ssrn.com/abstract=2152421 (2012).

Russom, P. Big data analytics. TDWI Best. Pract. Rep. Fourth Q. 19, 1–34 (2011).

Google Scholar

Bryant, R., Katz, R. H. & Lazowska, E. D. Big-data computing: creating revolutionary breakthroughs in commerce, science and society. http://www.basexml.com/xdrp/demo/xconvertsample/pdf/big_data.pdf (2008).Gantz, J. & Reinsel, D. Extracting value from chaos. IDC iView 1142, 1–12 (2011).

Google Scholar

Chen, M., Mao, S. & Liu, Y. Big Data: A Survey. Mobile Netw Appl. 19, 171–209 (2014).

Google Scholar

Garlasu, D. et al. A big data implementation based on Grid computing. In Proc. 2013 11th RoEduNet International Conference, 1–4 (IEEE, 2013).Rodríguez-Mazahua, L. et al. A general perspective of Big Data: applications, tools, challenges and trends. J. Supercomput. 72, 3073–3113 (2016).

Google Scholar

De Mauro, A., Greco, M. & Grimaldi, M. What is big data? A consensual definition and a review of key research topics. AIP Conf. Proc. 1644, 97–104 (2015).Al-Sai, Z. A. & Abdullah, R. Big data impacts and challenges: a review. In Proc. 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT), 150–155 (IEEE, 2019).Chen, C. P. & Zhang, C. Y. Data-intensive applications, challenges, techniques and technologies: a survey on Big Data. Inf. Sci. 275, 314–347 (2014).

Google Scholar

Katal, A., Wazid, M. & Goudar, R. H. Big data: issues, challenges, tools and good practices. In Proc. 2013 Sixth International Conference on Contemporary Computing (IC3), 404–409 (IEEE, 2013).Rashidi, H. H., Tran, N. K., Betts, E. V., Howell, L. P. & Green, R. Artificial intelligence and machine learning in pathology: the present landscape of supervised methods. Acad. Pathol. 6, 2374289519873088 (2019).PubMed

PubMed Central

Google Scholar

Khan, M. I. H., Sablani, S. S., Joardder, M. U. H. & Karim, M. A. Application of machine learning-based approach in food drying: opportunities and challenges. Dry. Technol. 40, 1051–1067 (2022).

Google Scholar

Zhang, X., Zhou, T., Zhang, L., Fung, K. Y. & Ng, K. M. Food product design: a hybrid machine learning and mechanistic modeling approach. Ind. Eng. Chem. Res. 58, 16743–16752 (2019).CAS

Google Scholar

FDA. Inspection Classification Database (ICD). https://www.fda.gov/inspections.compliance enforcement-and-criminal-investigations/inspection-classification.database (2024).Misra, N. N. et al. IoT, big data, and artificial intelligence in agriculture and food industry. IEEE Internet Things J. 9, 6305–6324 (2020).

Google Scholar

Kapur, S. Food import safety and international trade: comparing regulatory regimes in India and the USA. Lex. Portus 10, 7–29 (2024).

Google Scholar

Peng, L., Li, X., Li, J., Liu, S. & Liang, G. The drug risks of cilostazol: a pharmacovigilance study of FDA Adverse Event Reporting System database. PLoS ONE 19, e0314957 (2024).CAS

PubMed

PubMed Central

Google Scholar

FDA. President’s FY 2022 Budget Request: Key Investments for Food Safety. https://www.fda.gov/media/149883/download (2022).Johnson, R. The Federal Food Safety System: A Primer (Congressional Research Service, 2012).Metcalf, J. & Crawford, K. Where are human subjects in big data research? The emerging ethics divide. Big Data Soc. 3, 2053951716650211 (2016).

Google Scholar

Ahuja, J. K., Moshfegh, A. J., Holden, J. M. & Harris, E. USDA food and nutrient databases provide the infrastructure for food and nutrition research, policy, and practice. J. Nutr. 143, 241S–249S (2013).CAS

PubMed

Google Scholar

FSIS Open Government Dataset Portal. FSIS Food Safety Data. https://www.fsis.usda.gov/wps/portal/fsis/topics/data-collection-and-reports/data (2023).National Research Council (NRC), Division on Earth, Life Studies & Committee on an Evaluation of the Food Safety Requirements of the Federal Purchase Ground Beef Program. An Evaluation of the Food Safety Requirements of the Federal Purchase Ground Beef Program. The National Academies Press Washington, DC (2011).Kemmett, K. The Characterisation and Epidemiology of Avian Pathogenic Escherchia coli in UK Broiler Chickens. Doctoral dissertation, University of Liverpool (2013).Swaminathan, B., Barrett, T. J., Hunter, S. B., Tauxe, R. V. & Force, C. P. T. PulseNet: the molecular subtyping network for foodborne bacterial disease surveillance, United States. Emerg. Infect. Dis. 7, 382 (2001).Crarey, E., Kabera, C. & Tate, H. Monitoring and surveillance: the National Antimicrobial Resistance Monitoring System. Antimicrobial Resistance and Food Safety, 259–282 (Elsevier, 2015).Gupta, A. et al. Antimicrobial resistance among Campylobacter strains, United States, 1997–2001. Emerg. Infect. Dis. 10, 1102 (2004).Karp, B. E. et al. National antimicrobial resistance monitoring system: two decades of advancing public health through integrated surveillance of antimicrobial resistance. Foodborne Pathog. Dis. 14, 545–557 (2017).PubMed

PubMed Central

Google Scholar

Scallan, E. & Mahon, B. E. Foodborne Diseases Active Surveillance Network (FoodNet) in 2012: a foundation for food safety in the United States. Clin. Infect. Dis. 54, S381–S384 (2012).PubMed

PubMed Central

Google Scholar

Hall, A. J. et al. Acute gastroenteritis surveillance through the national outbreak reporting system, United States. Emerg. Infect. Dis. 19, 1305 (2013).PubMed

PubMed Central

Google Scholar

Merten, C. et al. Methodological characteristics of the national dietary surveys carried out in the European Union as included in the European Food Safety Authority (EFSA) Comprehensive European Food Consumption Database. Food Addit. Contam. Part A 28, 975–995 (2011).CAS

Google Scholar

European Food Safety Authority. The European Union One Health 2019 Zoonoses report. EFSA J. 19, e06406 (2021).Postolache, A. N. et al. Analysis of RASFF notifications on contaminated dairy products from the last two decades: 2000–2020. Rom. Biotechnol. Lett. 25, 1396–1406 (2020).

Google Scholar

Kowalska, A. & Manning, L. Using the rapid alert system for food and feed: potential benefits and problems on data interpretation. Crit. Rev. Food Sci. Nutr. 61, 906–919 (2021).PubMed

Google Scholar

Mu, W. et al. Making food systems more resilient to food safety risks by including artificial intelligence, big data, and Internet of Things into food safety early warning and emerging risk identification tools. Compr. Rev. Food Sci. Food Saf. 23, e13296 (2024).PubMed

Google Scholar

Joint UNEP/FAO/WHO Food Contamination Monitoring and Assessment Programme & World Health Organization. Report of the Third Meeting of the GEMS (No. WHO/HPP/FOS/92.5. Unpublished) (World Health Organization, 1992).Abid, H. M. R. et al. Quantitative and qualitative approach for accessing and predicting food safety using various web-based tools. Food Control 162, 110471 (2024).Brown, E. W., Gonzalez-Escalona, N., Stones, R., Timme, R. & Allard, M. W. The rise of genomics and the promise of whole genome sequencing for understanding microbial foodborne pathogens. Foodborne Pathogens: Virulence Factors and Host Susceptibility, 333–351 (Springer, 2017).Timme, R. E. et al. GenomeTrakr proficiency testing for foodborne pathogen surveillance: an exercise from 2015. Microb. Genomics 4, e000185 (2018).Timme, R. E. et al. Phylogenomic pipeline validation for foodborne pathogen disease surveillance. J. Clin. Microbiol. 57, 10–1128 (2019).

Google Scholar

Zhou, Z. et al. The EnteroBase user’s guide, with case studies on Salmonella transmissions, Yersinia pestis phylogeny, and Escherichia core genomic diversity. Genome Res. 30, 138–152 (2020).CAS

PubMed

PubMed Central

Google Scholar

Sayers, E. W. et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 49, D10 (2021).CAS

PubMed

Google Scholar

Moy, G. G. & Vannoort, R. W. (eds) Total Diet Studies, 169–177 (Springer, 2013).Martinez, M. G., Fearne, A., Caswell, J. A. & Henson, S. Co-regulation as a possible model for food safety governance: opportunities for public–private partnerships. Food Policy 32, 299–314 (2007).

Google Scholar

Marvin, H. J., Janssen, E. M., Bouzembrak, Y., Hendriksen, P. J. & Staats, M. Big data in food safety: an overview. Crit. Rev. Food Sci. Nutr. 57, 2286–2295 (2017).PubMed

Google Scholar

Wieczorek, J. et al. Darwin Core: an evolving community-developed biodiversity data standard. PLoS ONE 7, e29715 (2012).CAS

PubMed

PubMed Central

Google Scholar

Tao, Q., Ding, H., Wang, H. & Cui, X. Application research: big data in food industry. Foods 10, 2203 (2021).PubMed

PubMed Central

Google Scholar

Freudenthal, M. & Willemson, J. Challenges of federating national data access infrastructures. In Proc. International Conference for Information Technology and Communications, 104–114 (Springer, 2017).Snow, V. et al. Resilience achieved via multiple compensating subsystems: the immediate impacts of COVID-19 control measures on the agri-food systems of Australia and New Zealand. Agric. Syst. 187, 103025 (2021).

Google Scholar

Willemson, J. Pseudonymization service for X-road eGovernment data exchange layer. In Proc. International Conference on Electronic Government and the Information Systems Perspective, 135–145 (Springer, 2011).Lezoche, M., Hernandez, J. E., Díaz, M. D. M. E. A., Panetto, H. & Kacprzyk, J. Agri food 4.0: a survey of the supply chains and technologies for the future agriculture. Comput. Ind. 117, 103187 (2020).

Google Scholar

Abass, T., Eruaga, M. A., Itua, E. O. & Bature, J. T. Advancing food safety through IoT: real-time monitoring and control systems. Int. Med. Sci. Res. J. 4, 276–283 (2024).

Google Scholar

Khan, P. W., Byun, Y. C. & Park, N. IoT-blockchain enabled optimized provenance system for food industry 4.0 using advanced deep learning. Sensors 20, 2990 (2020).PubMed

PubMed Central

Google Scholar

Leng, J. et al. Blockchain empowered sustainable manufacturing and product lifecycle management in industry 4.0: a survey. Renew. Sustain. Energy Rev. 132, 110112 (2020).

Google Scholar

Salah, K., Nizamuddin, N., Jayaraman, R. & Omar, M. Blockchain-based soybean traceability in agricultural supply chain. IEEE Access 7, 73295–73305 (2019).

Google Scholar

Mazumdar, S., Jensen, T., Mukkamala, R. R., Kauffman, R. J. & Damsgaard, J. D blockchain and IoT architecture create informedness to support provenance tracking in the product lifecycle? In Proc. 54th Annual Hawaii International Conference on System Sciences (HICSS), 1497–1506 (Hawaii International Conference on System Sciences, 2021).Raj, M. et al. A survey on the role of Internet of Things for adopting and promoting Agriculture 4.0. J. Netw. Comput. Appl. 187, 103107 (2021).

Google Scholar

Boursianis, A. D. et al. Internet of Things (IoT) and agricultural unmanned aerial vehicles (UAVs) in smart farming: a comprehensive review. Internet Things 18, 100187 (2022).

Google Scholar

Singh, R. K., Berkvens, R. & Weyn, M. AgriFusion: an architecture for IoT and emerging technologies based on a precision agriculture survey. IEEE Access 9, 136253–136283 (2021).

Google Scholar

Fuentes-Peñailillo, F., Gutter, K., Vega, R. & Silva, G. C. Transformative technologies in digital agriculture: leveraging Internet of Things, remote sensing, and artificial intelligence for smart crop management. J. Sens. Actuator Netw. 13, 39 (2024).

Google Scholar

Sharma, K. & Shivandu, S. K. Integrating artificial intelligence and Internet of Things (IoT) for enhanced crop monitoring and management in precision agriculture. Sens. Int. 5, 100292 (2024).Lloret, J., Sendra, S., Garcia, L. & Jimenez, J. M. A wireless sensor network deployment for soil moisture monitoring in precision agriculture. Sensors 21, 7243 (2021).CAS

PubMed

PubMed Central

Google Scholar

Thakur, D., Kumar, Y., Kumar, A. & Singh, P. K. Applicability of wireless sensor networks in precision agriculture: a review. Wirel. Pers. Commun. 107, 471–512 (2019).

Google Scholar

Yin, H. et al. Soil sensors and plant wearables for smart and precision agriculture. Adv. Mater. 33, 2007764 (2021).CAS

Google Scholar

Ayaz, M., Ammad-Uddin, M., Sharif, Z., Mansour, A. & Aggoune, E. H. M. Internet-of Things (IoT)-based smart agriculture: toward making the fields talk. IEEE Access 7, 129551–129583 (2019).

Google Scholar

Sharma, B. & Koundal, D. Cattle health monitoring system using wireless sensor network: a survey from innovation perspective. IET Wirel. Sens. Syst. 8, 143–151 (2018).

Google Scholar

Hassan, M., Park, J. H. & Han, M. H. Enhancing livestock management with IoT-base wireless sensor networks: a comprehensive approach for health monitoring, location tracking, behavior analysis, and environmental optimization. J. Sustain. Urban Futures 13, 34–46 (2023).

Google Scholar

Rao, P. S., Anantha Raman, G. R., Rao, M. S. S., Radha, K. & Ahmed, R. Enhancing orchard cultivation through drone technology and deep stream algorithms in precision agriculture. Int. J. Adv. Comput. Sci. Appl. 15, 781–795 (2024).Mustafa, S. et al. Precision agriculture and unmanned aerial vehicles (UAVs) in Agriculture and Aquaculture Applications of Biosensors and Bioelectronics, 83–108 (IGI Global, 2024).Panda, S. S. et al. Optimizing Sericea Lespedeza fodder production in the southeastern US: a climate-informed geospatial engineering approach. Agriculture 13, 1661 (2023).

Google Scholar

Panda, S. S. et al. Development of a decision support system for animal health management using geo-information technology: a novel approach to precision livestock management. Agriculture 14, 696 (2024).

Google Scholar

Lasi, H., Fettke, P., Kemper, H. G., Feld, T. & Hoffmann, M. Industry 4.0. Bus. Inf. Syst. Eng. 6, 239–242 (2014).

Google Scholar

Mondino, P. & González-Andújar, J. L. Evaluation of a decision support system for crop protection in apple orchards. Comput. Ind. 107, 99–103 (2019).

Google Scholar

Rejeb, A., Keogh, J. G. & Treiblmaier, H. Leveraging the Internet of Things and blockchain technology in supply chain management. Future Internet 11, 161 (2019).

Google Scholar

Benos, L. et al. Machine learning in agriculture: a comprehensive updated review. Sensors 21, 3758 (2021).PubMed

PubMed Central

Google Scholar

Saiz-Rubio, V. & Rovira-Más, F. From smart farming towards agriculture 5.0: a review on crop data management. Agronomy 10, 207 (2020).

Google Scholar

Choi, J. et al. Secure IoT framework and 2D architecture for end-to-end security. J. Supercomput. 74, 3521–3535 (2018).

Google Scholar

Lee, J. et al. SoEasy: a software framework for easy hardware control programming for diverse IoT platforms. Sensors 18, 2162 (2018).PubMed

PubMed Central

Google Scholar

Da Xu, L., He, W. & Li, S. Internet of things in industries: a survey. IEEE Trans. Ind. Inform. 10, 2233–2243 (2014).

Google Scholar

Maiorino, A., Petruzziello, F. & Aprea, C. Refrigerated transport: state of the art, technical issues, innovations and challenges for sustainability. Energies 14, 7237 (2021).CAS

Google Scholar

El Himer, M. Innovation in Condition Monitoring and Predictive Maintenance Solutions in Industrial Contexts. Master’s thesis, University of Stavanger (2019).McFarlane, I. Automatic Control of Food Manufacturing Processes (Springer, 2012).Feng, H., Wang, X., Duan, Y., Zhang, J. & Zhang, X. Applying blockchain technology to improve agri-food traceability: a review of development methods, benefits and challenges. J. Clean. Prod. 260, 121031 (2020).

Google Scholar

Rana, R. L., Tricase, C. & De Cesare, L. Blockchain technology for a sustainable agri food supply chain. Br. Food J. 123, 3471–3485 (2021).Tan, A., Gligor, D. & Ngah, A. Applying blockchain for halal food traceability. Int. J. Logist. Res. Appl. 25, 947–964 (2022).

Google Scholar

Nguyen, H. & Do, L. The adoption of blockchain in food retail supply chain: case: IBM Food Trust blockchain and the food retail supply chain in Malta. https://www.theseus.fi/handle/10024/158615 (2018).Tan, W. C. & Sidhu, M. S. Review of RFID and IoT integration in supply chain management. Oper. Res. Perspect. 9, 100229 (2022).

Google Scholar

Farah, J. S. et al. Differential scanning calorimetry coupled with machine learning technique: an effective approach to determine the milk authenticity. Food Control 121, 107585 (2021).CAS

Google Scholar

Charismadiptya, G. C. The Design and Implementation of Situation Aware Smart Logistics in Perishable Food Transportation. Master’s thesis, University of Twente (2018).Xu, J. L., Riccioli, C. & Sun, D. W. An overview on nondestructive spectroscopic techniques for lipid and lipid oxidation analysis in fish and fish products. Compr. Rev. Food Sci. Food Saf. 14, 466–477 (2015).CAS

Google Scholar

Brookings. Eurostats. https://www.brookings.edu/research/chinas-influence-on-the.global-middle-class/ (2019).Data Reportal. Kepios. https://datareportal.com/social-media-users (2022).Blackwell, R. D. & Blackwell, K. S. Creating consumer-driven demand chains in food service. Quick Service Restaurants, Franchising, and Multi-Unit Chain Management, 137 (Routledge, 2014).Blazquez, D. & Domenech, J. Big Data sources and methods for social and economic analyses. Technol. Forecast. Soc. Change 130, 99–113 (2018).

Google Scholar

Cios, K. J., Pedrycz, W. & Swiniarski, R. W. Data Mining Methods for Knowledge Discovery, Vol. 458 (Springer, 2012).Klassen, K. M. et al. What people “like”: analysis of social media strategies used by food industry brands, lifestyle brands, and health promotion organizations on Facebook and Instagram. J. Med. Internet Res. 20, e10227 (2018).PubMed

PubMed Central

Google Scholar

Fried, D., Surdeanu, M., Kobourov, S., Hingle, M. & Bell, D. Analyzing the language of food on social media. In Proc. 2014 IEEE International Conference on Big Data (Big Data), 778–783 (IEEE, 2014).Singh, A., Shukla, N. & Mishra, N. Social media data analytics to improve supply chain management in food industries. Transp. Res. Part E Logist. Transp. Rev. 114, 398–415 (2018).

Google Scholar

Singh, R. & Singh, R. Applications of sentiment analysis and machine learning techniques in disease outbreak prediction—a review. Mater. Today Proc. 81, 1006–1011 (2021).Soon, J. M. Consumers’ awareness and trust toward food safety news on social media in Malaysia. J. Food Prot. 83, 452–459 (2020).PubMed

Google Scholar

Widom, J. Research problems in data warehousing. In Proc. Fourth International Conference on Information and Knowledge Management, 25–30 (1995).Young, W., Russell, S. V., Robinson, C. A. & Barkemeyer, R. Can social media be a tool for reducing consumers’ food waste? A behaviour change experiment by a UK retailer. Resour. Conserv. Recycl. 117, 195–203 (2017).

Google Scholar

Wang, G., Gunasekaran, A., Ngai, E. W. & Papadopoulos, T. Big data analytics in logistics and supply chain management: certain investigations for research and applications. Int. J. Prod. Econ. 176, 98–110 (2016).

Google Scholar

Sarker, I. H. Machine learning: algorithms, real-world applications and research directions. SN Comput. Sci. 2, 1–21 (2021).

Google Scholar

Todd, E. C., Greig, J. D., Bartleson, C. A. & Michaels, B. S. Outbreaks where food workers have been implicated in the spread of foodborne disease. Part 3. Factors contributing to outbreaks and description of outbreak categories. J. Food Prot. 70, 2199–2217 (2007).PubMed

Google Scholar

Cameron, T. A. A new paradigm for valuing non-market goods using referendum data: maximum likelihood estimation by censored logistic regression. J. Environ. Econ. Manag. 15, 355–379 (1988).

Google Scholar

Liu, X., Wang, G. A., Fan, W. & Zhang, Z. Finding useful solutions in online knowledge communities: a theory-driven design and multilevel analysis. Inf. Syst. Res. 31, 731–752 (2020).

Google Scholar

ElMasry, G. M. & Nakauchi, S. Image analysis operations applied to hyperspectral images for non-invasive sensing of food quality—a comprehensive review. Biosyst. Eng. 142, 53–82 (2016).

Google Scholar

Kaufman, J. et al. A likelihood-based approach to identifying contaminated food products using sales data: performance and challenges. PLoS Comput. Biol. 10, e1003692 (2014).PubMed

PubMed Central

Google Scholar

Norström, M., Kristoffersen, A. B., Görlach, F. S., Nygård, K. & Hopp, P. An adjusted likelihood ratio approach analysing distribution of food products to assist the investigation of foodborne outbreaks. PLoS ONE 10, e0134344 (2015).PubMed

PubMed Central

Google Scholar

Greis, N. P. & Nogueira, M. L. A data-driven approach to food safety surveillance and response in Food Protection and Security, 75–99 (Woodhead, 2017).Harris, J. K. et al. Using Twitter to identify and respond to food poisoning. The Food Safety STL Project. J. Public Health Manag. Pract. 23, 577 (2017).PubMed

PubMed Central

Google Scholar

Devinney, K. et al. Evaluating Twitter for foodborne illness outbreak detection in New York City. Online J. Public Health Inform. 10, e120 (2018).Harrison, S. & Johnson, P. Challenges in the adoption of crisis crowdsourcing and social media in Canadian emergency management. Gov. Inf. Q. 36, 501–509 (2019).

Google Scholar

Effland, T. et al. Discovering foodborne illness in online restaurant reviews. J. Am. Med. Inform. Assoc. 25, 1586–1592 (2018).PubMed

PubMed Central

Google Scholar

Maharana, A. et al. Detecting reports of unsafe foods in consumer product reviews. JAMIA Open 2, 330–338 (2019).PubMed

PubMed Central

Google Scholar

Oldroyd, R. A., Morris, M. A. & Birkin, M. Identifying methods for monitoring foodborne illness: review of existing public health surveillance techniques. JMIR Public Health Surveill. 4, e8218 (2018).

Google Scholar

Sadilek, A. et al. Machine-learned epidemiology: real-time detection of foodborne illness at scale. NPJ Digit. Med. 1, 1–7 (2018).

Google Scholar

WHO. WHO Estimates of the Global Burden of Foodborne Diseases (WHO, 2015).Gracias, K. S. & McKillip, J. L. A review of conventional detection and enumeration methods for pathogenic bacteria in food. Can. J. Microbiol. 50, 883–890 (2004).CAS

PubMed

Google Scholar

Vanegas, D. C., Gomes, C. L., Cavallaro, N. D., Giraldo‐Escobar, D. & McLamore, E. S. Emerging biorecognition and transduction schemes for rapid detection of pathogenic bacteria in food. Compr. Rev. Food Sci. Food Saf. 16, 1188–1205 (2017).CAS

PubMed

Google Scholar

Granato, D. et al. Trends in chemometrics: food authentication, microbiology, and effects of processing. Compr. Rev. Food Sci. Food Saf. 17, 663–677 (2018).PubMed

Google Scholar

Kemsley, E. K., Defernez, M. & Marini, F. Multivariate statistics: considerations and confidences in food authenticity problems. Food Control 105, 102–112 (2019).

Google Scholar

Sârbu, C. et al. Classification and fingerprinting of kiwi and pomelo fruits by multivariate analysis of chromatographic and spectroscopic data. Food Chem. 130, 994–1002 (2012).

Google Scholar

Pu, Y., Wang, W. & Alfano, R. R. Optical detection of meat spoilage using fluorescence spectroscopy with selective excitation wavelength. Appl. Spectrosc. 67, 210–213 (2013).CAS

PubMed

Google Scholar

Marcos-Martinez, D., Ayala, J. A., Izquierdo-Hornillos, R. C., de Villena, F. M. & Caceres, J. O. Identification and discrimination of bacterial strains by laser induced breakdown spectroscopy and neural networks. Talanta 84, 730–737 (2011).CAS

PubMed

Google Scholar

Liao, W. et al. A novel strategy for rapid detection of bacteria in water by the combination of three-dimensional surface-enhanced Raman scattering (3D SERS) and laser induced breakdown spectroscopy (LIBS). Anal. Chim. Acta 1043, 64–71 (2018).CAS

PubMed

Google Scholar

Argyri, A. A., Panagou, E. Z., Tarantilis, P. A., Polysiou, M. & Nychas, G. J. Rapid qualitative and quantitative detection of beef fillets spoilage based on Fourier transform infrared spectroscopy data and artificial neural networks. Sens. Actuators B Chem. 145, 146–154 (2010).CAS

Google Scholar

Lu, W., Chen, X., Wang, L., Li, H. & Fu, Y. V. Combination of an artificial intelligence approach and laser tweezers Raman spectroscopy for microbial identification. Anal. Chem. 92, 6288–6296 (2020).CAS

PubMed

Google Scholar

Fengou, L. C. et al. Evaluation of Fourier transform infrared spectroscopy and multispectral imaging as means of estimating the microbiological spoilage of farmed sea bream. Food Microbiol. 79, 27–34 (2019).CAS

PubMed

Google Scholar

Spyrelli, E. D., Ozcan, O., Mohareb, F., Panagou, E. Z. & Nychas, G. J. E. Spoilage assessment of chicken breast fillets by means of Fourier transform infrared spectroscopy and multispectral image analysis. Curr. Res. Food Sci. 4, 121–131 (2021).CAS

PubMed

PubMed Central

Google Scholar

Feng, X., He, L., Cheng, Q., Long, X. & Yuan, Y. Hyperspectral and multispectral remote sensing image fusion based on endmember spatial information. Remote Sens. 12, 1009 (2020).

Google Scholar

Michael, M., Phebus, R. K. & Amamcharla, J. Hyperspectral imaging of common foodborne pathogens for rapid identification and differentiation. Food Sci. Nutr. 7, 2716–2725 (2019).CAS

PubMed

PubMed Central

Google Scholar

Bonah, E. et al. Detection of Salmonella Typhimurium contamination levels in fresh pork samples using electronic nose smellprints in tandem with support vector machine regression and metaheuristic optimization algorithms. J. Food Sci. Technol. 58, 3861–3870 (2021).CAS

PubMed

Google Scholar

Bonah, E. et al. Comparison of variable selection algorithms on vis-NIR hyperspectral imaging spectra for quantitative monitoring and visualization of bacterial foodborne pathogens in fresh pork muscles. Infrared Phys. Technol. 107, 103327 (2020).CAS

Google Scholar

Wu, J., Chen, S. & Liu, X. Efficient hyperparameter optimization through model-based reinforcement learning. Neurocomputing 409, 381–393 (2020).

Google Scholar

Siddique, A. et al. Development of predictive classification models and extraction of signature wavelengths for the identification of spoilage in chicken breast fillets during storage using near infrared spectroscopy. Food Bioproc. Technol. 18, 933–941 (2024).Rowa’Al Ramahi, A. N. Z. & Abu-Khalaf, N. Evaluating the potential use of electronic tongue in early identification and diagnosis of bacterial infections. Infect. Drug Resist. 12, 2445 (2019).PubMed

Google Scholar

Ghrissi, H., Veloso, A. C., Marx, Í. M., Dias, T. & Peres, A. M. A potentiometric electronic tongue as a discrimination tool of water-food indicator/contamination bacteria. Chemosensors 9, 143 (2021).CAS

Google Scholar

Mannan, F. Classification of Face Images Based on Gender using Dimensionality Reduction Techniques and SVM. http://www.cim.mcgill.ca/%7Efmannan/comp652/comp652_report_FM.pdf (2016).Ghoshal, G. Emerging food processing technologies in Food Processing for Increased Quality and Consumption, 29–65 (Academic, 2018).Rabbinge, R. The ecological background of food production. Ciba Found. Symp. 177, 2–29 (1993).Size, G. M. (2020). Share & Trends Analysis Report By Source (Biodiesel, Fatty Acids, Fatty Alcohols, Soap), by Type (Crude, Refined) By End Use (Food & Beverage, Pharmaceutical). By Region, And Segment Forecasts, 2027.Pelto, G. H. & Pelto, P. J. Diet and delocalization: dietary changes since 1750. J. Interdiscip. Hist. 14, 507–528 (1983).CAS

PubMed

Google Scholar

Leistner, L. & Gould, G. W. Hurdle Technologies: Combination Treatments for Food Stability, Safety and Quality: Combination Treatments for Food Stability, Safety, and Quality (Springer, 2002).Jomaa, W. & Puiggali, J. R. Drying of shrinking materials: modellings with shrinkage velocity. Dry. Technol. 9, 1271–1293 (1991).

Google Scholar

Kiranoudis, C. T., Tsami, E. & Maroulis, Z. B. Microwave vacuum drying kinetics of some fruits. Dry. Technol. 15, 2421–2440 (1997).

Google Scholar

Putranto, A., Chen, X. D., Devahastin, S., Xiao, Z. & Webley, P. A. Application of the reaction engineering approach (REA) for modeling intermittent drying under time varying humidity and temperature. Chem. Eng. Sci. 66, 2149–2156 (2011).CAS

Google Scholar

Mabrouk, S. B., Benali, E. & Oueslati, H. Experimental study and numerical modelling of drying characteristics of apple slices. Food Bioprod. Process. 90, 719–728 (2012).

Google Scholar

Page, G. E. Factors Influencing the Maximum Rates of Air Drying Shelled Corn in Thin Layers (Purdue University, 1949).Hendorson, S. M. Grain drying theory (I) temperature effect on drying coefficient. J. Agric. Eng. Res. 6, 169–174 (1961).

Google Scholar

Lewis, W. K. The rate of drying of solid materials. Ind. Eng. Chem. 13, 427–432 (1921).CAS

Google Scholar

de Oliveira Campos, B. L., da Costa, A. O. S., de Souza Figueiredo, K. C. & da Costa Junior, E. F. Performance comparison of different mathematical models in the simulation of a solar desalination by humidification-dehumidification. Desalination 437, 184–194 (2018).

Google Scholar

Afoakwa, E. O., Yenyi, S. E. & Sakyi-Dawson, E. Response surface methodology for optimizing the pre-processing conditions during canning of a newly developed and promising cowpea (Vigna unguiculata) variety. J. Food Eng. 73, 346–357 (2006).

Google Scholar

Rao, D. G. Fundamentals of Food Engineering (PHI Learning, 2023).Gao, W. et al. The status, challenges, and future of additive manufacturing in engineering. Comput. Aided Des. 69, 65–89 (2015).

Google Scholar

Feyissa, A. H. Robust Modelling of Heat and Mass Transfer in Processing of Solid Foods (DTU Food, 2011).Kumar, M., Madhumita, M., Prabhakar, P. K. & Basu, S. Refractance window drying of food and biological materials: status on mechanisms, diffusion modelling and hybrid drying approach. Crit. Rev. Food Sci. Nutr. 64, 3458–3481 (2024).PubMed

Google Scholar

Kim, H. S., Kim, O. W., Kim, H., Lee, H. J. & Han, J. W. Thin layer drying model of sorghum. J. Biosyst. Eng. 41, 357–364 (2016).

Google Scholar

Adnouni, M. et al. Computational modelling for decarbonised drying of agricultural products: sustainable processes, energy efficiency, and quality improvement. J. Food Eng. 338, 111247 (2023).CAS

Google Scholar

Lagnevik, M. (ed.) The Dynamics of Innovation Clusters: A Study of the Food Industry (Edward Elgar, 2003).Van Koerten, K. N. Deep Frying: From Mechanisms to Product Quality. Doctoral dissertation, Wageningen University and Research (2016).Bhagya Raj, G. V. S. & Dash, K. K. Comprehensive study on applications of artificial neural network in food process modeling. Crit. Rev. Food Sci. Nutr. 62, 2756–2783 (2022).CAS

PubMed

Google Scholar

Gbashi, S. & Njobeh, P. B. Enhancing food integrity through artificial intelligence and machine learning: a comprehensive review. Appl. Sci. 14, 3421 (2024).CAS

Google Scholar

Miraei Ashtiani, S. H. & Martynenko, A. Toward intelligent food drying: integrating artificial intelligence into drying systems. Dry. Technol. 42, 1240–1269 (2024).

Google Scholar

Tarlak, F. The use of predictive microbiology for the prediction of the shelf life of food products. Foods 12, 4461 (2023).CAS

PubMed

PubMed Central

Google Scholar

Yoon, I., Oh, S. H. & Kim, S. W. Sustainable animal agriculture in the United States and the implication in Republic of Korea. J. Anim. Sci. Technol. 66, 279 (2024).PubMed

PubMed Central

Google Scholar

Di Vaio, A., Palladino, R., Hassan, R. & Escobar, O. Artificial intelligence and business models in the sustainable development goals perspective: a systematic literature review. J. Bus. Res. 121, 283–314 (2020).

Google Scholar

Pal, S. Revolutionizing warehousing: unleashing the power of machine learning in multi-product demand forecasting. Int. J. Res. Appl. Sci. Eng. Technol. 11, 615–619 (2023).

Google Scholar

Zatsu, V. et al. Revolutionizing the food industry: the transformative power of artificial intelligence—a review. Food Chem. X 24, 101867 (2024).Kumar, I., Rawat, J., Mohd, N. & Husain, S. Opportunities of artificial intelligence and machine learning in the food industry. J. Food Qual. 2021, 4535567 (2021).

Google Scholar

Cui, Z. et al. Artificial intelligence and food flavor: how AI models are shaping the future and revolutionary technologies for flavor food development. Compr. Rev. Food Sci. Food Saf. 24, e70068 (2025).

Shirai, S. S., Seneviratne, O., Gordon, M. E., Chen, C. H. & McGuinness, D. L. Identifying ingredient substitutions using a knowledge graph of food. Front. Artif. Intell. 3, 621766 (2021).

Onyeaka, H. et al. Using artificial intelligence to tackle food waste and enhance the circular economy: maximising resource efficiency and minimising environmental impact: a review. Sustainability 15, 10482 (2023).

Hernandez-Perez, J. A., García-Alvarado, M. A., Trystram, G. & Heyd, B. Neural networks for the heat and mass transfer prediction during drying of cassava and mango. Innov. Food Sci. Emerg. Technol. 5, 57–64 (2004).

Standing, C. N. Individual heat transfer modes in band oven biscuit baking. J. Food Sci. 39, 267–271 (1974).

Zanoni, B., Peri, C. & Pierucci, S. A study of the bread-baking process. I: A phenomenological model. J. Food Eng. 19, 389–398 (1993).

Özilgen, M. & Heil, J. R. Mathematical modeling of transient heat and mass transport in a baking biscuit. J. Food Process. Preserv. 18, 133–148 (1994).

Sablani, S. S., Marcotte, M., Baik, O. D. & Castaigne, F. Modeling of simultaneous heat and water transport in the baking process. LWT-Food Sci. Technol. 31, 201–209 (1998).

Goyal, S. & Goyal, G. K. Simulated neural network intelligent computing models for predicting shelf life of soft cakes. Glob. J. Comput. Sci. Technol. 11, 29–33 (2011).

Emerald, F. et al. Modelling approaches for predicting moisture transfer during baking of chhana podo (milk cake) incorporated with tikhur (Curcuma angustifolia) starch. J. Food Meas. Charact. 14, 2981–2997 (2020).

Banooni, S., Hosseinalipour, S. M., Mujumdar, A. S., Taherkhani, P. & Bahiraei, M. Baking of flat bread in an impingement oven: modeling and optimization. Dry. Technol. 27, 103–112 (2009).

Broyart, B. & Trystram, G. Modelling of heat and mass transfer phenomena and quality changes during continuous biscuit baking using both deductive and inductive (neural network) modelling principles. Food Bioprod. Process. 81, 316–326 (2003).

Sablani, S. S., Baik, O. D. & Marcotte, M. Neural networks for predicting thermal conductivity of bakery products. J. Food Eng. 52, 299–304 (2002).

Aksenova, O. I. & Alexeev, G. V. The effect of the concentration of the fish processing offal powder, of the humidity level and of the cross-sectional area of the die molding channel on the technological parameters of the extrusion process and on the quality characteristics of potato snacks. IOP Conf. Ser. Mater. Sci. Eng. 753, 082006 (2020).

Alam, M. S., Pathania, S. & Sharma, A. Optimization of the extrusion process for development of high fibre soybean-rice ready-to-eat snacks using carrot pomace and cauliflower trimmings. LWT 74, 135–144 (2016).

Alemayehu, H., Emire, S. A. & Henry, C. Effects of extrusion process parameters on the quality properties of ready-to-eat pulse-based snacks. Cogent Food Agric. 5, 1641903 (2019).

Shihani, N., Kumbhar, B. K. & Kulshreshtha, M. Modeling of extrusion process using response surface methodology and artificial neural networks. J. Eng. Sci. Technol. 1, 31–40 (2006).

Fan, F. H. et al. Prediction of texture characteristics from extrusion food surface images using a computer vision system and artificial neural networks. J. Food Eng. 118, 426–433 (2013).

León-Roque, N., Abderrahim, M., Nuñez-Alejos, L., Arribas, S. M. & Condezo-Hoyos, L. Prediction of fermentation index of cocoa beans (Theobroma cacao L.) based on color measurement and artificial neural networks. Talanta 161, 31–39 (2016).

Zhu, H. et al. Application of machine learning algorithms in quality assurance of fermentation process of black tea-based on electrical properties. J. Food Eng. 263, 165–172 (2019).

Yildiz, F. Initial preparation, handling, and distribution of minimally processed refrigerated fruits and vegetables. In Minimally Processed Refrigerated Fruits & Vegetables, 15–65 (Springer, 1994).

Kseibat, D. S., Mittal, G. S. & Basir, O. A. Predicting safety and quality of thermally processed canned foods using a neural network. Trans. Inst. Meas. Control 26, 55–68 (2004).

Zhang, Z. et al. Deep learning in omics: a survey and guideline. Brief. Funct. Genomics 18, 41–57 (2019).

Mittal, G. S. & Zhang, J. Prediction of temperature and moisture content of frankfurters during thermal processing using neural network. Meat Sci. 55, 13–24 (2000).

Mittal, G. S. & Zhang, J. Prediction of freezing time for food products using a neural network. Food Res. Int. 33, 557–562 (2000).

Mittal, G. S. & Zhang, J. Use of artificial neural network to predict temperature, moisture, and fat in slab‐shaped foods with edible coatings during deep‐fat frying. J. Food Sci. 65, 978–983 (2000).

Mohebbi, M., Shahidi, F., Fathi, M., Ehtiati, A. & Noshad, M. Prediction of moisture content in pre-osmosed and ultrasounded dried banana using genetic algorithm and neural network. Food Bioprod. Process. 89, 362–366 (2011).

Gouyo, T. et al. Assessment of acoustic-mechanical measurements for texture of French fries: comparison of deep-fat frying and air frying. Food Res. Int. 131, 108947 (2020).

Khan, M. I. H., Sablani, S. S., Nayak, R. & Gu, Y. Machine learning‐based modeling in food processing applications: state of the art. Compr. Rev. Food Sci. Food Saf. 21, 1409–1438 (2022).

Trienekens, J. & Zuurbier, P. Quality and safety standards in the food industry, developments and challenges. Int. J. Prod. Econ. 113, 107–122 (2008).

Patel, K. K., Kar, A., Jha, S. N. & Khan, M. A. Machine vision system: a tool for quality inspection of food and agricultural products. J. Food Sci. Technol. 49, 123–141 (2012).

Dixit, Y. et al. Non-invasive spectroscopic and imaging systems for prediction of beef quality in a meat processing pilot plant. Meat Sci. 181, 108410 (2021).

Sun, D. W. (ed.) Computer Vision Technology for Food Quality Evaluation (Academic, 2016).

Ruiz-Altisent, M. et al. Sensors for product characterization and quality of specialty crops—a review. Comput. Electron. Agric. 74, 176–194 (2010).

Pasquini, C. Near infrared spectroscopy: a mature analytical technique with new perspectives—a review. Anal. Chim. Acta 1026, 8–36 (2018).

Ireri, D., Belal, E., Okinda, C., Makange, N. & Ji, C. A computer vision system for defect discrimination and grading in tomatoes using machine learning and image processing. Artif. Intell. Agric. 2, 28–37 (2019).

Kanade, A. & Shaligram, A. Prepackaging sorting of guava fruits using machine vision based fruit sorter system based on k-nearest neighbor algorithm. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 3, 2456–3307 (2018).

Prakasa, E. et al. Automatic region-of-interest selection for corn seed grading. In Proc. 2017 International Conference on Computer, Control, Informatics and its Applications (IC3INA), 23–28 (IEEE, 2017).

Septiarini, A., Hamdani, H., Hatta, H. R. & Kasim, A. A. Image-based processing for ripeness classification of oil palm fruit. In Proc. 2019 5th International Conference on Science in Information Technology (ICSITech), 23–26 (IEEE, 2019).

Aiadi, O., Kherfi, M. L. & Khaldi, B. Automatic date fruit recognition using outlier detection techniques and Gaussian mixture models. Electron. Lett. Comput. Vis. Image Anal. 18, 52–75 (2019).

Yu, X., Tang, L., Wu, X. & Lu, H. Nondestructive freshness discriminating of shrimp using visible/near-infrared hyperspectral imaging technique and deep learning algorithm. Food Anal. Methods 11, 768–780 (2018).

Barbon, S., Costa Barbon, A. P. A. D., Mantovani, R. G. & Barbin, D. F. Machine learning applied to near-infrared spectra for chicken meat classification. J. Spectrosc. https://doi.org/10.1155/2018/8949741 (2018).

Geronimo, B. C. et al. Computer vision system and near-infrared spectroscopy for identification and classification of chicken with wooden breast, and physicochemical and technological characterization. Infrared Phys. Technol. 96, 303–310 (2019).

Yang, Y. et al. Evaluation of broiler breast fillets with the woody breast condition using expressible fluid measurement combined with deep learning algorithm. J. Food Eng. 288, 110133 (2021).

Morey, A., Smith, A. E., Garner, L. J. & Cox, M. K. Application of bioelectrical impedance analysis to detect broiler breast filets affected with woody breast myopathy. Front. Physiol. 11, 808 (2020).

Siddique, A. et al. Acceptability of artificial intelligence in poultry processing and classification efficiencies of different classification models in the categorisation of breast fillet myopathies. Front. Physiol. 12, 712649 (2021).

Siddique, A., Freeman, R. & Morey, A. Microwave Analysis of Broiler Breast Meat in Conjunction with Singularity Value Decomposition to Categorize Myopathic Fillets (Poultry Sciences Association, 2021).

Penning, B. W., Snelling, W. M. & Woodward-Greene, M. J. Machine learning in the assessment of meat quality. IT Prof. 22, 39–41 (2020).

Alaiz-Rodríguez, R. & Parnell, A. C. A machine learning approach for lamb meat quality assessment using FTIR spectra. IEEE Access 8, 52385–52394 (2020).

Al-Sarayreh, M., Reis, M. M., Yan, W. Q. & Klette, R. Potential of deep learning and snapshot hyperspectral imaging for classification of species in meat. Food Control 117, 107332 (2020).

Fabris, A. et al. PTR‐TOF‐MS and data‐mining methods for rapid characterisation of agro industrial samples: influence of milk storage conditions on the volatile compounds profile of Trentingrana cheese. J. Mass Spectrom. 45, 1065–1074 (2010).

Dankowska, A. & Kowalewski, W. Comparison of different classification methods for analyzing fluorescence spectra to characterize type and freshness of olive oils. Eur. Food Res. Technol. 245, 745–752 (2019).

Yu, X., Wang, J., Wen, S., Yang, J. & Zhang, F. A deep learning based feature extraction method on hyperspectral images for nondestructive prediction of TVB-N content in Pacific white shrimp (Litopenaeus vannamei). Biosyst. Eng. 178, 244–255 (2019).

Page, M. J. et al. The PRISMA 2020 statement: an updated guideline for reporting systematic reviews. BMJ 372, n71 (2021).

Acknowledgements

I would like to acknowledge all the contributors of this manuscript for their support in the preparation of the manuscript. This research received no external funding. This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Author information

Authors and Affiliations

Department of Poultry Science, Auburn University, Auburn, AL, USA
Aftab Siddique, Tung-Shi Huang & Amit Morey

Department of Business Analytics and Information, Auburn University, Auburn, AL, USA
Ashish Gupta

Department of Animal Sciences, Auburn University, Auburn, AL, USA
Jason T. Sawyer

Contributions

Conceptualization: A.S.; methodology: A.S., A.M., J.T.S., A.G., T.S.H.; investigation: A.S.; writing—original draft preparation: A.S.; writing—review and editing: A.M., A.G., T.S.H. and J.T.S.; supervision: A.M. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Amit Morey.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Siddique, A., Gupta, A., Sawyer, J.T. et al. Big data analytics in food industry: a state-of-the-art literature review. npj Sci Food 9, 36 (2025). https://doi.org/10.1038/s41538-025-00394-y

Received: 20 August 2024

Accepted: 18 February 2025

Published: 21 March 2025

DOI: https://doi.org/10.1038/s41538-025-00394-y
