nature.com

UrbanEV: An Open Benchmark Dataset for Urban Electric Vehicle Charging Demand Prediction

Abstract

The recent surge in electric vehicles (EVs), driven by a collective push to enhance global environmental sustainability, has underscored the significance of exploring EV charging prediction. To catalyze further research in this domain, we introduce UrbanEV, an open dataset showcasing EV charging space availability and electricity consumption in a pioneering city for vehicle electrification, namely Shenzhen, China. UrbanEV offers a rich repository of charging data (i.e., charging occupancy, duration, volume, and price) captured at hourly intervals across an extensive six-month span for over 20,000 individual charging piles. Beyond these core attributes, the dataset also encompasses diverse influencing factors like weather conditions and spatial proximity. Comprehensive experiments have been conducted to showcase the predictive capabilities of various models, including statistical, deep learning, and transformer-based approaches, using the UrbanEV dataset. This dataset is poised to propel advancements in EV charging prediction and management, positioning itself as a benchmark resource within this burgeoning field.

Background & Summary

In recent times, there has been a notable surge in the popularity of electric vehicles (EVs), driven by the goals of diminishing reliance on fossil fuels, ameliorating air quality, addressing global warming concerns, and advancing the United Nations’ Sustainable Development Goals pertaining to carbon neutrality1,2. The Global EV Outlook for 2024, as documented by the International Energy Agency (IEA)3, highlights a persistent increase in electric car sales, projecting a potential volume of approximately 17 million units in 2024. This figure would constitute over one-fifth of the total global car sales. Despite the manifold advantages associated with the electrification of vehicles, this widespread adoption poses formidable challenges to the dependability and robustness of urban power grids and transportation infrastructures. Among these challenges, limited charging space, particularly in dense cities, creates the most anxiety for EV users, resulting in critical problems such as long cruising times for parking and additional traffic congestion4. Another challenge lies in the unbalanced electricity consumption caused by regular EV charging behaviors5,6. This fluctuation in demand during rush hour can impact the stability of the grid, potentially causing voltage fluctuations, frequency variations, and other issues that may affect the reliability of the power supply7,8. Given the ongoing expansion of EV charging infrastructure and the limited sustainability of relying solely on electricity to address associated challenges, many intelligent services have surfaced to manage EV charging demand9,10,11,12. These services include dynamic pricing mechanisms13, collaborative charging resource allocation strategies14,15, and parking guidance systems16.

The prediction of electric vehicle charging demand, which serves as a cornerstone for enabling intelligent EV services, has garnered growing attention from academic and industrial communities17,18,19.
Accurate predictions of regional EV charging demand can enable the provision of tailored parking suggestions to drivers in transit, alleviating range anxiety concerns, while regulatory bodies can implement dynamic pricing strategies to enhance energy efficiency. To achieve accurate prediction, related studies have attempted to leverage advanced data analysis techniques and predictive modeling to forecast the future charging demand for electric vehicles18,20. This typically involves analyzing historical charging data, considering factors such as the number of electric vehicles, their charging patterns, infrastructure availability, and external variables like weather conditions and time of day21. For example, a recent study investigated the performance of three conventional machine learning models, namely Long Short-Term Memory, Auto-Regressive Moving Average, and Multi-Layer Perceptron, in short-term EV charging forecasting22. Another in-depth study explored the effect of electricity price on electric vehicle charging demand by conducting correlation tests and estimating its price elasticity23. More recently, inspired by the success of incorporating spatial information with temporal patterns in traffic prediction, spatial-temporal EV charging demand prediction has emerged as an attractive research topic in the literature. Representative examples include HSTGCN-EV24 and ChatEV25: the former incorporated two heterogeneous graphs (i.e., a demand-based graph and a geographic graph) to improve predictive precision, while the latter unified spatial and temporal factors within natural language and harnessed Large Language Models (LLMs) for regional EV charging prediction.

Although relevant research continues to expand, a well-structured open-source benchmark dataset that includes a wide array of features and establishes standardized comparison settings for predicting EV charging demand is still absent.
Existing studies face several critical limitations with their data. Table 1 compares representative publicly available datasets from various aspects. Firstly, most of them rely solely on charging data or consider only a limited number of factors, neglecting a comprehensive assessment of other potential influences26,27,28,29. Secondly, although numerous temporal patterns crucial for EV charging demand prediction have been identified, the current datasets are insufficient for delving into spatial analysis of EV charging behaviors30,31,32,33. Lastly, the diverse settings observed across studies introduce substantial variations, hindering a fair comparison of new techniques, frameworks, and models within relevant research19. These limitations hinder the advancement of EV charging prediction and related intelligent services in the era of big data.

Table 1 Comparison of representative open datasets associated with electric vehicle charging, where “#” denotes the number of certain headers, “/” denotes that the item is not specified, and EVSE refers to Electric Vehicle Supply Equipment.

To fill the gap, we present UrbanEV, an open dataset of EV charging in Shenzhen, China. The dataset compiles comprehensive information for a total of 1,682 public charging stations with 24,798 charging piles, shown in Fig. 1. After applying various data processing techniques, we refine the dataset to encompass 1,362 charging stations with 17,532 public charging piles, making them well-suited for charging demand prediction. Specifically, it provides three charging variables (i.e., occupancy, duration, and volume), four dynamic factors (i.e., electricity price, service price, weather conditions, and time of day), three spatial attributes (i.e., adjacency, distance, and coordinates), and four static coefficients (i.e., point of interest, area, pile number, and station number).
Moreover, our dataset covers the period from 1 September 2022 to 28 February 2023, encompassing six months with hourly granularity. This level of detail enables the exploration of short-, mid-, and long-term forecasting scenarios. Lastly, based on the station-level information, we further group the data into traffic zones, offering a new perspective on exploring regional EV charging patterns. As shown in Fig. 2, stations located in a specific traffic zone are integrated, and an adjacency matrix among neighboring zones can be built correspondingly to represent the spatial relationship. Making this dataset publicly accessible is intended to equip researchers, policymakers, and industry practitioners with the essential information needed for the effective and sustainable management of EV charging. This initiative aligns with national priorities and contributes to the overarching global sustainability objectives.

Fig. 1 Spatial distribution of 1,682 public charging stations and 24,798 charging piles in the UrbanEV dataset.

Fig. 2 Data visualization of the filter areas. (a) illustrates the distribution of charging piles at the regional level. (b) provides an enlarged view of the CBD dynamic pricing areas. (c) depicts the node graph derived from the enlarged view, showcasing the center node of the enlarged region, its 1-hop and 2-hop neighbors, as well as their adjacency relationships. (d) presents the distance matrix (in meters) of the 1-hop neighbors of the center node in the enlarged region.

Methods

To build a comprehensive and reliable benchmark dataset, we conduct a series of rigorous processes from data collection to dataset evaluation. The overall workflow sequentially includes data acquisition, data processing, statistical analysis, and prediction assessment.
Detailed descriptions follow.

Study area and data acquisition

Shenzhen, a pioneering city in global vehicle electrification, has been selected for this study with the objective of offering valuable insights into electric vehicle (EV) development that can serve as a reference for other urban centers. This study encompasses the entire expanse of Shenzhen, where data on public EV charging stations distributed around the city have been meticulously gathered. Specifically, EV charging data was automatically collected from a mobile platform used by EV drivers to locate public charging stations. Through this platform, users could access real-time information on each charging pile, including its availability (e.g., busy or idle), charging price, and geographic coordinates. Accordingly, we recorded the charging-related data at five-minute intervals from September 1, 2022, to February 28, 2023. This data collection process was fully digital and did not require manual readings. Furthermore, to delve into the correlation between EV charging patterns and environmental elements, weather data (i.e., air temperature Ta, atmospheric pressure P, and relative humidity h) for Shenzhen city were acquired from two meteorological observatories situated in the airport and central regions, respectively. These meteorological data are publicly available on the Shenzhen Government Data Open Platform. In addition, point of interest (POI) data was extracted through the Application Programming Interface Platform of AMap.com, covering three primary types: food and beverage services, business and residential, and lifestyle services. Lastly, the spatial and static data were organized based on the traffic zones delineated by the sixth Residential Travel Survey of Shenzhen34.
The collected data contains detailed spatiotemporal information that can be analyzed to provide valuable insights about urban EV charging patterns and their correlations with meteorological conditions.

Processing raw information into well-structured data

To streamline the utilization of the UrbanEV dataset, we harmonize heterogeneous data from various sources into well-structured data with aligned temporal and spatial resolutions. An overview of the descriptive statistics of the processed data is presented in Table 2. This process can be segmented into two parts: the reorganization of EV charging data and the preparation of other influential factors.

Table 2 Data statistics of the UrbanEV Dataset. This table presents data across three dimensions: EV, Weather, and Others.

EV charging data

The raw charging data, obtained from publicly available EV charging services, pertains to charging stations and predominantly comprises string-type records at a 5-minute interval. To transform this raw data into a structured time series tailored for prediction tasks, we implement the following three key measures:

Initial Extraction. From the string-type records, we extract vital information for each charging pile, such as availability (designated as “busy” or “idle”), rated power, and the corresponding charging and service fees applicable during the observed time periods. First, a charging pile is categorized as “active charging” if its states at two consecutive timestamps are both “busy”. Consequently, the occupancy within a charging station can be defined as the count of in-use charging piles, while the charging duration is calculated as the product of the count of in-use piles and the time between the two timestamps (in our case, 5 minutes). Moreover, the charging volume in a station can correspondingly be estimated by multiplying the duration by the piles’ rated power. Finally, the average electricity price and service price are calculated for each station in alignment with the same temporal resolution as the three charging variables.
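The extraction rule above can be illustrated with a minimal sketch. It assumes the raw records have already been parsed into a boolean matrix of pile states for one station; the function name `station_metrics` and the array layout are illustrative assumptions, not part of the released code.

```python
import numpy as np

def station_metrics(busy, rated_power_kw, step_minutes=5.0):
    """Derive occupancy, duration, and volume for one station.

    busy: array of shape (T, P), True where pile p is "busy" at timestamp t
          (hypothetical layout, assumed for this sketch).
    rated_power_kw: array of shape (P,), rated power of each pile in kW.
    """
    busy = np.asarray(busy, dtype=bool)
    power = np.asarray(rated_power_kw, dtype=float)
    # A pile counts as actively charging when it is busy at two consecutive timestamps.
    active = busy[:-1] & busy[1:]                      # shape (T-1, P)
    occupancy = active.sum(axis=1)                     # number of in-use piles per interval
    step_hours = step_minutes / 60.0
    duration_h = occupancy * step_hours                # pile-hours per interval
    # Volume estimate: charging time multiplied by each active pile's rated power.
    volume_kwh = (active * power).sum(axis=1) * step_hours
    return occupancy, duration_h, volume_kwh
```

Because the volume is based on rated power, it is an upper-bound style estimate of the delivered energy rather than a metered value.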

Error Detection and Imputation. Data quality is crucial for decision-making, advanced analytics, and machine learning. Inaccuracies, often referred to as dirty data, can significantly undermine the reliability of analysis or modeling efforts35. To improve the quality of our charging data, we identified several errors, notably negative values for charging fees and inconsistencies between counts of occupied, idle, and total charging piles. Records containing these anomalies were removed and treated as missing data. A two-step imputation process was employed for missing values: forward filling replaced missing values using preceding timestamps, followed by backward filling to fill gaps at the beginning of each time series. Additionally, outliers, which could significantly impact prediction performance, were detected using the interquartile range (IQR) method36 for metrics such as charging volume (v), charging duration (d), and the rate of active charging piles (o). To retain more original data and minimize the impact of outlier correction, we set the coefficient to 4, instead of the default 1.5. Each outlier was then replaced with the mean of its adjacent valid values. This preprocessing pipeline transformed the raw data into a structured and analyzable dataset.
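A minimal sketch of the two cleaning steps, written with pandas, may help make the procedure concrete. The function names are hypothetical, and "adjacent valid values" is interpreted here as the nearest non-outlier value on each side, which is an assumption about the original pipeline.

```python
import numpy as np
import pandas as pd

def impute_missing(series):
    """Forward fill, then backward fill any gap left at the start of the series."""
    return series.ffill().bfill()

def replace_outliers_iqr(series, k=4.0):
    """Replace IQR outliers with the mean of the nearest valid neighbours.

    k is the IQR coefficient; the text uses 4 instead of the common default 1.5.
    """
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    is_outlier = (series < q1 - k * iqr) | (series > q3 + k * iqr)
    valid = series.mask(is_outlier)            # outliers become NaN
    prev_valid = valid.ffill()                 # nearest valid value before each point
    next_valid = valid.bfill()                 # nearest valid value after each point
    # Row-wise mean skips NaN, so an outlier at either end uses its single neighbour.
    fill = pd.concat([prev_valid, next_valid], axis=1).mean(axis=1)
    return series.where(~is_outlier, fill)
```

A larger coefficient widens the acceptance band, so fewer points are flagged and more of the original signal is preserved.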

Aggregation and Filtration. Building upon the station-level charging data that has been extracted and cleansed, we further organize the data into a region-level dataset with an hourly interval, providing a new perspective for EV charging behavior analysis. This is achieved by two major processes: aggregation and filtration. First, we aggregate all the charging data from both temporal and spatial views: a. Temporally, we standardize all time-series data to a common time resolution of one hour, as it is a resolution common to all the data sources. This aims to establish a unified temporal resolution for all time-series data, including pricing schemes, weather records, and charging data, thereby creating a well-structured dataset. Aggregation rules specify that the five-minute charging volume (v) and duration (d) are summed within each interval (i.e., one hour), whereas the occupancy (o), electricity price (pe), and service price (ps) take the instantaneous value observed at each hour for each charging pile. This distinction arises from the inherent nature of these data types: volume (v) and duration (d) are cumulative, while (o), (pe), and (ps) are instantaneous variables. Compared to using the mean or median values within each interval, selecting the instantaneous values of (o), (pe), and (ps) as representatives preserves the original data patterns more effectively and minimizes the influence of human interpretation. b. Spatially, stations and piles are aggregated based on the traffic zones delineated by the sixth Residential Travel Survey of Shenzhen34. After aggregation, the resulting dataset includes 331 regions (also referred to as traffic zones) and 4,344 timestamps. Variance tests and zero-value filtering were then applied to exclude regions with negligible or no variation in charging data. Specifically, regions with an occupancy variance below 0.001 or with more than 30% zero values were removed.
As a result, 275 traffic zones, comprising 1,362 charging stations and 17,532 charging piles, were retained for further analysis, as depicted in Fig. 2.
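The aggregation and filtering rules can be sketched as follows. This is an illustration under stated assumptions rather than the released preprocessing code: the data are assumed to be pandas DataFrames with a DatetimeIndex and one column per traffic zone, the instantaneous value is taken as the first observation of each hour, and the variance is the sample variance.

```python
import numpy as np
import pandas as pd

def aggregate_hourly(cumulative, instantaneous):
    """Aggregate 5-minute data to one-hour resolution.

    cumulative: volume or duration, summed within each hour.
    instantaneous: occupancy or price, represented by one observed value per hour
                   (here the first observation of the hour, an assumption).
    """
    hourly_sum = cumulative.resample("60min").sum()
    hourly_snapshot = instantaneous.resample("60min").first()
    return hourly_sum, hourly_snapshot

def filter_regions(occupancy, min_var=0.001, max_zero_frac=0.30):
    """Drop zones whose occupancy barely varies or is zero too often."""
    variance = occupancy.var(axis=0)               # sample variance per zone
    zero_frac = (occupancy == 0).mean(axis=0)      # share of zero readings per zone
    keep = (variance >= min_var) & (zero_frac <= max_zero_frac)
    return occupancy.loc[:, keep]
```

Applied to the 331 aggregated zones, a filter of this kind is what reduces the set to the 275 zones retained in the dataset.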

Other influential factors

Apart from the EV charging data, we also constructed a set of variables that might influence charging behaviors37,38. These variables can be categorized into three classes, namely temporal factors, spatial attributes, and static features. First and foremost, the temporal factors include three weather conditions: air temperature (Ta), relative humidity (h), and atmospheric pressure (P). The raw weather data is collected from two meteorological observatories located in the airport and central regions of Shenzhen, and it was further organized into numeric data with the same hourly interval as the structured charging data. Notably, weather data is shared across all charging stations and traffic zones. Spatial information, including the adjacency matrix and distances, is computed using ArcGIS tools. Specifically, adjacency is determined by evaluating whether two traffic zones share a boundary, and distance is based on the distance between their geometric centers. Additionally, UrbanEV provides static features, such as Points of Interest (POI), area, and road length for each traffic zone. Only those relevant to charging activities within the 275 selected zones, aligned with the structured charging data, are retained.

Data Records

To enable further in-depth predictive analyses by researchers, the 1-hour resolution region-level dataset is provided as the primary dataset, with the 5-minute resolution version also made available in the Dryad repository39, offering comprehensive access to time-series data at varying granularities. For consistency, all data are stored as comma-separated value (.csv) files along with their corresponding header descriptions stored in .txt files. Moreover, this dataset provides the geometry information of the studied areas in ArcGIS format (e.g., .shp, .shx, and .dbf files). Lastly, we have developed a benchmarking code framework for EV charging forecasting, comprising program files or scripts written in Python (.py).
Here is a detailed overview of these files:

(occupancy.csv, duration.csv, and volume.csv) provide the EV charging occupancy ratio, duration, and volume, in the studied areas, measured in %, hours, and kWh, respectively. The volume in volume.csv is derived from the rated power of charging piles and may deviate from actual charging volumes. Nevertheless, it serves as the foundation for validation in subsequent analyses, with volume-11kW.csv providing a vehicle-side estimation as an alternative.

(e_price.csv, s_price.csv) describe the electricity price and service fee, respectively, at hourly granularity. Both are expressed in Yuan/hour.

(weather_central.csv and weather_airport.csv) store the weather data obtained from two different meteorological stations located in the central area and the airport of Shenzhen city, respectively. Their header information is presented in the file titled weather_header.txt.

(Shenzhen.shp, Shenzhen.shx, Shenzhen.dbf) store the geographic information of Shenzhen city in ArcGIS format, using the WGS 1984 Albers projected coordinate system.

(adj.csv, distance.csv) depict the adjacency relationships between traffic zones, along with their respective distances. The distances are computed as the Euclidean distance between the centroids of the zones, measured in meters. In the adjacency file, a value of 1 indicates that two traffic zones are adjacent, otherwise 0.

(inf.csv) contains basic information for each zone, including pile capacity, longitude, latitude, and the area and perimeter of the zone (in meters).

(poi.csv) contains information about Points of Interest throughout the studied city, collected in December 2022.

(volume-11kW.csv) provides an alternative vehicle-side estimation of charging volume to mitigate potential overestimation in volume.csv. Specifically, for direct current charging stations, the volume is calculated using the standard power of the most commonly used electric vehicle, Tesla Model Y (11kW), instead of the rated power of the charging pile.

(main.py, models.py, utils.py, preprocessing.py) are the code files used in this work.
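To illustrate how the values in distance.csv are defined, the sketch below computes pairwise Euclidean distances between zone centroids with NumPy. It assumes the centroids are given as projected x/y coordinates in meters; the function name and input layout are hypothetical, and the released matrix was produced with ArcGIS rather than with this code.

```python
import numpy as np

def centroid_distance_matrix(centroids_xy):
    """Pairwise Euclidean distances between zone centroids.

    centroids_xy: array of shape (N, 2) with projected coordinates in meters
                  (assumed input layout for this sketch).
    Returns an (N, N) symmetric matrix with zeros on the diagonal.
    """
    xy = np.asarray(centroids_xy, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]       # broadcast to shape (N, N, 2)
    return np.sqrt((diff ** 2).sum(axis=-1))
```

Longitude and latitude in degrees would need to be projected first, since Euclidean distance on raw degrees is not a distance in meters.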

Technical Validation

To validate UrbanEV’s efficacy in EV charging demand prediction, we conduct a comprehensive benchmarking test covering forecasting methods specifically designed for EV charging demand as well as methods supporting general time-series forecasting tasks. Note that the validation relies on the one-hour resolution dataset. Through a thorough comparison and analysis of these baselines, we seek to address three crucial questions. Q1: Does the dataset effectively capture the temporal patterns in EV charging behaviors? Q2: Can UrbanEV accurately depict the spatial interplay among different areas? Q3: Are the identified correlated factors instrumental in enhancing prediction accuracy?

In this validation, we compare three traditional forecasting methods, five deep learning models, and two state-of-the-art Transformer-based predictors. The three conventional models are the last observation (LO), Auto-Regressive (AR), and Auto-Regressive Integrated Moving Average (ARIMA) models. The five deep learning models are as follows. A fully connected neural network (FCNN) is a classical network that has been used to capture the non-linearity in time series. Long Short-Term Memory (LSTM), a representative recurrent neural network, has recently been utilized for predicting electric vehicle charging demand22,40. Graph Convolutional Network (GCN), a typical graph learning model, has also been employed for electric vehicle (EV) forecasting tasks24. Building on these achievements, graph and recurrent models have recently been integrated to enhance predictive performance for EV charging demand. Accordingly, GCN-LSTM41, a hybrid model, is included in the evaluation. Moreover, one advanced time-series forecasting method, namely the Attention-Based Spatial-Temporal Graph Convolutional Network (ASTGCN)42, is utilized as well in our study.
Lastly, we evaluate two Transformer-based forecasting models in our investigation, i.e., TimeNet43 and TimeXer44. Comparing and analyzing the performance of these baselines can assist us in evaluating whether UrbanEV can act as a competitive benchmark dataset for EV charging prediction tasks. This validation employed time-series cross-validation to address the temporal characteristics of a six-month dataset, which, accordingly, supports a six-fold approach. Specifically, each fold incrementally included one additional month of data, with 80% allocated to training and the remaining 20% equally divided between validation and testing sets. Model performance was assessed using four complementary metrics45,46: 1) Root Mean Squared Error (RMSE), 2) Mean Absolute Percentage Error (MAPE), 3) Relative Absolute Error (RAE), and 4) Mean Absolute Error (MAE). Finally, the evaluation objective is established as the distribution prediction (spatial-temporal prediction) for EV charging data, supplemented by experiments on node prediction and insights gained from the factorial experiment.

Distribution Prediction

The results presented in Table 3 reveal several key observations regarding the predictive performance across different models. The performance can be categorized into three distinct categories, consistent with the baseline model classification.

Table 3 Performance comparison of representative forecasting methods on the spatial-temporal occupancy prediction task using the UrbanEV dataset. The result showcases that models incorporating both spatial and temporal patterns can achieve superior predictive accuracy. This observation suggests that the UrbanEV dataset exhibits pronounced spatiotemporal dependencies within EV charging data.

1) Statistical Models: This category includes models such as LO, AR, and ARIMA, which rely on simple linear transformations to capture temporal dynamics.
These models are computationally efficient due to their simplicity but exhibit limited predictive accuracy compared to more advanced approaches. Their inability to model complex nonlinear patterns in the data constrains their utility in scenarios requiring high precision.

2) Conventional Deep Learning Models: The second category encompasses models like Fully Connected Neural Networks (FCNN), Long Short-Term Memory (LSTM) networks, and Graph Convolutional Networks (GCN). These models incorporate nonlinear temporal modeling capabilities, leading to significant improvements in predictive accuracy over statistical methods. Additionally, leveraging spatial information, such as the charging demand from surrounding regions, further enhances prediction performance. Models like GCN-LSTM and ASTGCN demonstrate the benefits of joint spatiotemporal feature modeling, achieving superior results by capturing complex dependencies across both dimensions.

3) Transformer-Based Models: Representing the highest-performing category, Transformer-based architectures dynamically capture intricate spatiotemporal interactions, effectively addressing limitations observed in other methods. By leveraging attention mechanisms, these models exhibit a transformative potential for spatiotemporal prediction tasks, delivering state-of-the-art performance. Their ability to adaptively focus on relevant temporal and spatial features provides a robust framework for capturing the nuanced dynamics of EV charging demand.

These findings underscore the pivotal role of accurately extracting temporal and spatial features to enhance forecasting accuracy. More importantly, the results demonstrate that the dataset exhibits pronounced spatiotemporal characteristics, and the application of nonlinear modeling techniques proves effective in predicting EV charging-related metrics.
Compared to independently modeling temporal or spatial features, the joint modeling of spatiotemporal features significantly improves predictive performance. However, the performance differences among various spatiotemporal prediction models highlight the inherent complexity of the dataset’s spatiotemporal characteristics. This complexity necessitates the design of specialized models tailored to capture these intricate patterns effectively. Hence, we advocate for a deeper investigation into spatiotemporal forecasting models to unveil the underlying patterns in EV charging behaviors within the UrbanEV dataset.

Node Prediction

The results presented in Table 4 highlight the performance of various models across three key metrics: charging occupancy (o), duration (d), and volume (v). First, the Transformer-based model, TimeXer, demonstrates superior performance among the models evaluated, achieving the lowest RMSE, MAPE, RAE, and MAE values across all metrics. Specifically, it achieves an RMSE of 0.07 for o, 2.73 for d, and 43.66 for v, significantly outperforming traditional statistical models (e.g., LO, AR, ARIMA) and deep learning methods (e.g., FCNN, LSTM). Second, the recurrent model, LSTM, shows competitive results to TimeXer for charging duration d and volume v and outperforms other compared models in most cases. These two observations indicate that the UrbanEV dataset can serve as a suitable and trustworthy benchmark dataset for EV charging prediction, as the compared models are appropriately ranked: namely, TimeXer > LSTM > others.

Table 4 Performance comparison of six representative forecasting methods in node prediction. It is evident that the Transformer-based model, TimeXer, and the RNN-based model, LSTM, stand out with superior performance.
This observation indicates that the charging data offered by UrbanEV encompasses ample temporal features.

Factorial experiment

To investigate the influence of the five features mentioned above (Ta, h, P, pe, and ps), both individually and in combination, we conducted factorial experiments on various feature groups, including pairwise combinations (i.e., P + ps, Ta + pe, h + Ta, and ps + pe) to assess whether joint factors affect charging occupancy. Additionally, we integrated all five features to explore whether the collective effect of external factors exhibits consistent influences on o. First, Table 5 indicates that the inclusion of individual features yields minimal improvement in predicting EV charging demand and, in some cases, even deteriorates the prediction accuracy. However, combinations of features prove to be significantly more effective in enhancing demand forecasting. Notably, pairings that include pe and ps, such as Ta + pe and ps + pe, demonstrate the strongest auxiliary effects on prediction accuracy. This suggests that external factors like temperature and current charging costs influence users’ charging decisions. For example, extreme temperatures, whether hot or cold, reduce the likelihood of travel, subsequently lowering the demand for charging. Similarly, elevated electricity prices or service fees may prompt users to either seek alternative charging stations or forgo charging altogether.

Table 5 Results of factorial experiments with ten factorial groups on EV charging demand. Utilizing the occupancy ratio in each traffic zone as the target data, our analysis indicates that individual features make a minimal contribution to prediction accuracy, and may even hinder performance in certain scenarios.
In contrast, the combination of features, notably air temperature (Ta) and electricity price (pe), can lead to significant enhancements.

Second, it can be observed that the combination of pe and ps is particularly impactful, as these factors collectively represent the total cost incurred during charging. The interplay between these two features effectively captures the influence of charging costs on user behavior, which cannot be fully captured by either feature alone. Consequently, the joint variation of pe and ps reflects users’ sensitivity to charging costs, making it a superior predictor compared to single-factor variations.

Finally, integrating all five features to assist in predicting o does not necessarily improve the prediction accuracy. Although these features encompass various dimensions, such as weather conditions and price fluctuations, providing diverse information, the prediction performance can deteriorate if the model fails to effectively process these inputs. This highlights the importance and potential of developing advanced prediction models capable of handling multidimensional auxiliary information to further enhance forecasting accuracy.
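For readers reproducing the comparisons in this section, the four evaluation metrics (RMSE, MAPE, RAE, and MAE) can be computed as in the sketch below. It uses the standard textbook definitions, with MAPE reported as a fraction and evaluated only where the target is nonzero; that handling of zero targets, and the function name, are assumptions and may differ from the benchmark code.

```python
import numpy as np

def forecast_metrics(y_true, y_pred):
    """Return RMSE, MAPE, RAE, and MAE for a pair of arrays of equal shape."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    err = y_pred - y_true
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # MAPE is undefined for zero targets, so those points are skipped (assumption).
    nonzero = y_true != 0
    mape = np.mean(np.abs(err[nonzero] / y_true[nonzero]))
    # RAE: total absolute error relative to a predictor that always outputs the mean.
    rae = np.sum(np.abs(err)) / np.sum(np.abs(y_true - y_true.mean()))
    return {"RMSE": rmse, "MAPE": mape, "RAE": rae, "MAE": mae}
```

An RAE below 1 means the model beats the constant mean predictor, which makes it a useful complement to the scale-dependent RMSE and MAE.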

Code availability

Our code used in this paper for the dataset setup, data analysis, and experiments can be found in a GitHub repository at (https://github.com/IntelligentSystemsLab/UrbanEV).

Acknowledgements

This work was supported in part by the National Key Research and Development Program of China (2023YFB4301900), Research Funds from the Department of Science and Technology of Guangdong Province (2021QN02S161), and the GuangDong Basic and Applied Basic Research Foundation (2023A1515012895). Thank you for your time and support.

Author information

Author notes

These authors contributed equally: Han Li, Haohao Qu.

Authors and Affiliations

Sun Yat-sen University, School of Intelligent Systems Engineering, Shen Zhen, 518107, China
Han Li, Xiaojun Tan & Linlin You

The Hong Kong Polytechnic University, Department of Computing, Hong Kong, Hong Kong
Haohao Qu & Wenqi Fan

Institute of High Performance Computing (IHPC), Agency for Science, Technology and Research (A*STAR), 138632, Queenstown, Republic of Singapore
Rui Zhu

Contributions

L.Y. and H.Q. conceived the experiment(s), H.L. conducted the experiment(s), and H.Q. and H.L. analyzed the results. X.T. and R.Z. reviewed the manuscript and participated actively in its editing. All authors contributed to the development of the manuscript and provided their approval for the final version.

Corresponding author

Correspondence to Linlin You.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, H., Qu, H., Tan, X. et al. UrbanEV: An Open Benchmark Dataset for Urban Electric Vehicle Charging Demand Prediction. Sci Data 12, 523 (2025). https://doi.org/10.1038/s41597-025-04874-4

Received: 29 January 2025

Accepted: 20 March 2025

Published: 28 March 2025

DOI: https://doi.org/10.1038/s41597-025-04874-4