1. Introduction
Many governments are actively promoting the use of public transportation to further sustainable development and reduce greenhouse gas emissions. Bicycle-sharing systems play an essential role in covering the “last mile” of one’s daily commute. Unfortunately, the growing popularity of these systems often results in situations where some stations have no available bikes, while other stations have too many bikes for the available space. Considering the vast number of bicycles and stations in major urban centers, efficiently allocating resources poses a significant logistical challenge.
Researchers have identified a number of factors that hinder the efficiency of the YouBike system in Taipei. The use of fixed routes and real-time bike availability ratios to guide the process of bike redistribution [1,2] is insufficient to overcome a heavy reliance on experienced dispatchers to deal with contingencies, thereby rendering the system susceptible to time lags and decision bias. Efforts to balance the distribution of bikes among multiple stations are hindered by the fact that all decision-making is based entirely on current data without the ability to model behavioral patterns (e.g., weekly commuting), respond to sudden events (e.g., rainstorms), or anticipate future demand, all of which can have a profound impact on bike usage patterns.
Figure 1 and Figure 2 illustrate the dramatic increase in crowd flow at the Huashan Cultural and Creative Park during special events and the corresponding YouBike usage at nearby stations.
In the current study, we applied LSTM models [3,4,5,6] to the analysis of crowd-related time series data in order to capture the dynamic features required to make accurate predictions of demand patterns associated with special events or sudden changes in the weather. We also sought to gain the cooperation of users in enhancing system efficiency. One solution involves having users upload their destination to a server tasked with optimizing the allocation of bicycles among stations. Another solution involves the provision of incentives to return bikes to under-supplied stations in order to alleviate the need for manual bike redistribution.
Nonetheless, implementing these solutions presented a number of challenges. Developing LSTM models for every YouBike station would be time-consuming and computationally intractable. Thus, we reduced the number of models by grouping stations according to similarities in usage patterns via K-means clustering [7,8,9]. The inclusion of crowd-related data in a predictive model greatly increases complexity and variability, both of which can have a detrimental effect on accuracy. Thus, we employed a random forest algorithm to simplify data inputs by identifying the crowd grids with the most pronounced influence on prediction outputs.
The efficacy of the proposed system was assessed in simulations of the public transportation system in Taipei, based on user and station data provided by YouBike. To avoid skewing the results due to the COVID-19 pandemic, this analysis was performed using data exclusively from 2019.
2. Datasets
The four datasets used in this study included the following: (A) YouBike user rental data for behavior prediction via the LSTM model, (B) YouBike station data to calculate deficiencies in the number of bike stations around destinations, (C) crowd data to enhance LSTM predictions, and (D) weather data to observe the influence of rainfall on YouBike usage behavior.
2.1. YouBike User Database
As shown in Table 1, this dataset comprises 27,956,451 records of Taipei YouBike monthly rentals and user card usage (1 January–31 December 2019), including user card number, rental time, rental station ID, rental station, return time, return station ID, and return station.
2.2. YouBike Station Database
As shown in Table 2, this dataset contains data related to 401 Taipei YouBike stations for the same period, including station ID, station name, total parking spaces, available bikes, station area, latitude, longitude, address, available return spaces, and operational status.
2.3. Crowd Dataset
This dataset includes crowd flow data for the Taipei area for the same period (2923 data points).
2.4. Weather Dataset
This dataset includes weather data for the Taipei area for the same period.
3. Methods and Results
This study sought to optimize the YouBike resource allocation system by implementing a two-phase process comprising offline training and online operations. Figure 3 presents a flow chart of the research process.
Offline processing involved the cleaning of rental and crowd data through the removal of outliers and the filling in of missing values to ensure the quality and reliability of the data. K-means clustering was then used to group the stations according to similarities in bike rentals with the aim of reducing the number of LSTM models and the time required for model training. After Z-score normalization and dimensionality reduction, the silhouette coefficients were used to determine the optimal number of clusters. As shown in Figure 4, any given station could belong to different clusters at different times.
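The clustering step can be illustrated with a minimal sketch, which should not be read as the implementation used in this study. It standardizes each feature with a Z-score, runs a plain K-means with a deterministic farthest-first initialization (an assumption made here for reproducibility), and selects the number of clusters with the highest mean silhouette coefficient. The dimensionality reduction step is omitted, and the station profiles, function names, and candidate cluster counts are hypothetical.

```python
import math
from statistics import mean, pstdev


def zscore_columns(rows):
    """Standardize every column to zero mean and unit standard deviation."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*rows)]
    return [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in rows]


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with farthest-first initialization (an assumption)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous center if a cluster becomes empty
                centers[c] = [mean(dim) for dim in zip(*members)]
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient over all points (higher is better)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) < 2:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(mean(math.dist(p, q) for q in other)
                for key, other in clusters.items() if key != lab)
        scores.append((b - a) / (max(a, b) or 1.0))
    return mean(scores)


def best_k(points, candidates):
    """Return the candidate k with the highest mean silhouette coefficient."""
    scored = {k: silhouette(points, kmeans(points, k)) for k in candidates}
    return max(scored, key=scored.get), scored


# Hypothetical rental profiles (morning, midday, evening rentals) for six stations.
profiles = [
    [30, 5, 8], [28, 6, 9], [32, 4, 7],   # morning-peak stations
    [6, 7, 29], [5, 9, 31], [7, 8, 27],   # evening-peak stations
]
normalized = zscore_columns(profiles)
k, scores = best_k(normalized, [2, 3])
labels = kmeans(normalized, k)
print(k, labels)
```

In this toy example the two usage patterns are well separated, so the silhouette criterion favors two clusters. Because a station may belong to different clusters at different times, a sketch like this would be run separately for each time window.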
We then used the random forest algorithm to obtain importance scores for the surrounding crowd grid. High-scoring crowd grids were selected as input features for the prediction of bike rental numbers using the LSTM model. For example, for Station 37 (MRT Dongmen Stn, Exit 4), crowd grids E, F, and I generated the highest importance scores (0.166, 0.127, and 0.138) and were therefore employed for predictions (see Figure 5).
Finally, LSTM and Bi-LSTM models were used to predict YouBike rental volumes. The performance of these models in generating predictions for three stations located near a school, an MRT station, and a hospital (Stations 248, 344, and 63) was evaluated using mean squared error (MSE) and root mean squared error (RMSE) values. The Bi-LSTM model with four hidden layers performed the best, as evidenced by RMSE values ranging from 1 to 3.4 (see Table 3).
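The two error metrics follow their standard definitions and can be computed as in the short sketch below. The hourly rental counts are hypothetical values chosen for illustration and are not results from this study.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equally long sequences."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, expressed in the same unit as the data."""
    return math.sqrt(mse(actual, predicted))


# Hypothetical hourly rental counts and model predictions.
actual = [12, 15, 9, 20]
predicted = [10, 15, 12, 18]
print(mse(actual, predicted), round(rmse(actual, predicted), 4))
```

Because RMSE is in the same unit as the target variable, an RMSE of 2 here would correspond to a typical error of about two rentals per hour.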
Online processing used a genetic algorithm to generate bike drop-off recommendations based on the destinations input by users. In addition to the destination data, the algorithm took as input the continuously generated LSTM predictions of YouBike rental volumes for the next hour. Generating recommendations involved multiple steps: chromosome encoding, initial chromosome generation, fitness score calculation, chromosome selection, crossover, and mutation. Because the algorithm cannot work without knowing the user’s destination, the proposed system also includes incentives that encourage users to submit their destinations and comply with the recommendations that they receive. The incentive scheme is intended to reduce the costs of manual redistribution.
The genetic algorithm was optimized by running multiple iterations of a simulation to assess the effects of algorithm parameters, including the number of generations, mutation rate, and crossover rate. Stable results were achieved using 100 generations, and the best learning results were achieved using a crossover rate of 0.7 and a mutation rate of 0.07 (see Figure 6, Figure 7 and Figure 8).
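The genetic algorithm steps listed above (encoding, initial population, fitness, selection, crossover, and mutation) can be sketched as follows, using the reported settings of 100 generations, a crossover rate of 0.7, and a mutation rate of 0.07. Everything else is an assumption made for illustration: the toy problem data, the cost function (walking distance plus a penalty for returns that exceed the predicted free docks), tournament selection, single-point crossover, elitism, and the population size. The fitness function used in the study is not specified in this text.

```python
import random

# Hypothetical toy problem (3 stations, 6 users); none of these values come from the study.
FREE_DOCKS = [2, 2, 3]        # assumed predicted free docks per station
WALK = [                      # WALK[u][s]: metres from user u's destination to station s
    [50, 300, 500],
    [60, 280, 450],
    [40, 350, 520],
    [400, 70, 300],
    [420, 90, 260],
    [500, 310, 80],
]
OVERFLOW_PENALTY = 1000       # assumed cost per bike returned beyond the free docks


def cost(chrom):
    """Lower is better: total walking distance plus overflow penalties."""
    walk = sum(WALK[u][s] for u, s in enumerate(chrom))
    overflow = sum(max(0, chrom.count(s) - FREE_DOCKS[s]) for s in range(len(FREE_DOCKS)))
    return walk + OVERFLOW_PENALTY * overflow


def evolve(generations=100, pop_size=30, crossover_rate=0.7, mutation_rate=0.07, seed=0):
    rng = random.Random(seed)
    n_users, n_stations = len(WALK), len(FREE_DOCKS)
    # Encoding: gene u holds the station index recommended to user u.
    pop = [[rng.randrange(n_stations) for _ in range(n_users)] for _ in range(pop_size)]
    for _ in range(generations):
        nxt = [min(pop, key=cost)[:]]                    # elitism: keep the best chromosome
        while len(nxt) < pop_size:
            p1 = min(rng.sample(pop, 3), key=cost)       # tournament selection
            p2 = min(rng.sample(pop, 3), key=cost)
            child = p1[:]
            if rng.random() < crossover_rate:            # single-point crossover
                cut = rng.randrange(1, n_users)
                child = p1[:cut] + p2[cut:]
            child = [rng.randrange(n_stations) if rng.random() < mutation_rate else g
                     for g in child]                     # per-gene mutation
            nxt.append(child)
        pop = nxt
    return min(pop, key=cost)


best = evolve()
print(best, cost(best))
```

In this toy instance, three users prefer station 0 but only two docks are free there, so a low-cost chromosome has to redirect one user to a nearby station. The best achievable cost is 780, which can be checked by hand.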
The performance of the proposed system was assessed quantitatively using simulations based on historical YouBike data in Da’an District, Taipei. As shown in Figure 9, this evaluation involved calculating the number of stations requiring manual adjustments as a function of the user compliance rate (ω%).
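The idea behind this evaluation can be sketched as a small simulation, under assumptions that are not taken from the study: each user follows the recommendation with probability ω and otherwise returns the bike at the original destination, and a station is counted as requiring manual adjustment when returns exceed its free docks. The station capacities, user list, and function name are hypothetical.

```python
import random


def stations_needing_adjustment(free_docks, users, compliance, seed=0):
    """Count stations whose returns exceed their free docks.

    users: list of (destination_station, recommended_station) index pairs.
    compliance: fraction in [0, 1] of users who follow the recommendation.
    """
    rng = random.Random(seed)
    returns = [0] * len(free_docks)
    for destination, recommended in users:
        station = recommended if rng.random() < compliance else destination
        returns[station] += 1
    return sum(1 for r, f in zip(returns, free_docks) if r > f)


# Hypothetical scenario: three users head for station 0, which has two free docks.
free_docks = [2, 2, 3]
users = [(0, 0), (0, 2), (0, 0), (1, 1), (1, 1), (2, 2)]
for omega in (0.0, 0.5, 1.0):
    print(omega, stations_needing_adjustment(free_docks, users, omega))
```

With no compliance, station 0 overflows and needs one manual adjustment; with full compliance, no station does. Intermediate rates depend on the random draws, so a real evaluation would average over many runs.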
The feasibility of the proposed system was also assessed qualitatively using real-world data from 10 users inputting their destinations at noon, with the distance between the recommended return station and the original destination used as a metric. In this hypothetical scenario, the system yielded an adaptation value of 23.1698 when recommending stations within a 600 m radius to ensure user convenience and system efficiency (see Figure 10).
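The 600 m radius constraint can be illustrated by filtering candidate stations by their distance to the destination. The sketch below assumes great-circle (haversine) distance computed from the latitude and longitude fields of the station dataset; the study may have used a different distance measure, such as walking distance. The coordinates, station names, and function names are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def stations_within(destination, stations, radius_m=600):
    """Names of the stations no farther than radius_m from the destination."""
    lat, lon = destination
    return [name for name, (s_lat, s_lon) in stations.items()
            if haversine_m(lat, lon, s_lat, s_lon) <= radius_m]


# Hypothetical coordinates in the Taipei area (not actual station locations).
stations = {
    "A": (25.0330, 121.5300),
    "B": (25.0360, 121.5300),   # roughly 334 m north of the destination
    "C": (25.0400, 121.5300),   # roughly 778 m north of the destination
}
destination = (25.0330, 121.5300)
print(stations_within(destination, stations))
```

Only the stations returned by such a filter would then be considered as candidate return stations for the corresponding user.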
4. Discussion
This paper proposes a novel approach to the allocation of YouBikes under resource constraints. The K-means algorithm was first used to cluster YouBike stations with similar rental profiles in order to reduce the number of models required. A random forest model then selected the three crowd grid factors with the most pronounced influence on YouBike rental volumes for use as input variables for the LSTM prediction model. A genetic algorithm then determined the optimal YouBike station configuration and return station recommendations, while taking into account the destinations of users, station bike dock ratios, and the need to minimize the manual redistribution of bikes.
In simulations, the proposed vehicle allocation system was shown to meet user needs and enhance the operational efficiency of the YouBike system, while significantly reducing manual bike redistribution costs. The methods developed in this study have practical applicability in guiding decisions by YouBike managers and staff members.
Future research directions include the assessment of similar methods in other cities in order to verify the generalizability of this scheme. The inclusion of more historical data could help to refine the models even further and in so doing enhance the accuracy of the recommendations. Researchers should also explore other more complex heuristic algorithms and compare their performance with that of the current method.
Author Contributions
Conceptualization, Y.-C.C. and W.-T.C.; methodology, C.Y., C.-T.L., Y.-C.H. and Y.-H.T.; writing—original draft preparation, Y.-C.J.; data curation, Y.-C.J. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data that support the findings of this study were obtained from YouBike Company and are not publicly available. Access to these data requires a formal request and approval from YouBike Company. Interested readers should contact the corresponding author to initiate the application process.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Chen, S.-L. Personal Interview—The Unsung Heroes Behind Convenient YouBike Rentals: 24-Hour Shift Dispatchers. 2020.
- Yang, X.-H. 2022. Available online: https://news.ltn.com.tw/news/life/paper/1532787 (accessed on 25 November 2024).
- Zhang, Z.; Chen, P.; Wang, Y.; Yu, G. A hybrid deep learning approach for urban expressway travel time prediction considering spatial-temporal features. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 795–800.
- He, R.; Liu, Y.; Xiao, Y.; Lu, X.; Zhang, S. Deep spatio-temporal 3D densenet with multiscale ConvLSTM-Resnet network for citywide traffic flow forecasting. Knowl.-Based Syst. 2022, 235, 109054.
- Lv, Y.; Duan, Y.; Kang, W.; Li, Z.; Wang, F. Traffic Flow Prediction with Big Data: A Deep Learning Approach. IEEE Trans. Intell. Transp. Syst. 2015, 16, 865–873.
- Ma, X.; Tao, Z.; Wang, Y.; Yu, H.; Wang, Y. Long short-term memory neural network for traffic speed prediction using remote microwave sensor data. Transp. Res. Part C Emerg. Technol. 2015, 54, 187–197.
- Neethu, B.N.; Jayanthy, S. Greenhouse Monitoring and Controlling using Modified K Means Clustering Algorithm. In Proceedings of the 2019 Third International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 12–14 December 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 456–462.
- Ho, C.C.; Ting, C.Y. Time series analysis and forecasting of dengue using open data. In Proceedings of the International Visual Informatics Conference, Bangi, Malaysia, 17–19 November 2015; Springer: Berlin/Heidelberg, Germany, 2015; pp. 51–63.
- Wang, Z.; Zhou, Y.; Li, G. Anomaly Detection by Using Streaming K-means and Batch K-means. In Proceedings of the 2020 5th IEEE International Conference on Big Data Analytics (ICBDA), Xiamen, China, 8–11 May 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 11–17.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).