A SURVEY OF MACHINE LEARNING MODELS FOR TRAFFIC MOVEMENT PREDICTION

In most developed cities globally, traffic congestion has become a major challenge to commuters and road users. In most urbanized nations, there is traffic gridlock at certain periods of the day (peak periods). Road users spend a lot of time in these gridlocks, wasting many working hours. This gridlock has also resulted in air pollution and accidents. Many researchers have developed different vehicular movement prediction models for better traffic prediction. In this paper, we survey different traffic prediction models for congestion management.


INTRODUCTION
Today's public transit relies more and more on precise journey time prediction technologies. In an emergency, these systems are essential for getting people to the closest places, such as hospitals, schools, and airports, as soon as feasible. Additionally, by being aware of the anticipated trip duration, people may make an informed choice of transport mode. There are three types of travel time prediction models: model-based, data-driven, and historical. One technique used vehicle-to-vehicle communication to gather data from every car and analysed it with fuzzy logic. Once the likelihood of congestion was calculated, congestion was minimised by modifying vehicle routes. A significant flaw in the strategy, however, was that it overlooked accidents, which are a significant contributor to traffic jams. To anticipate journey times, a variety of techniques are employed, such as parametric and non-parametric hybrid solutions. Synchronising traffic signals with past data enables the prediction of future traffic flow, ensuring smooth traffic movement across the city. One such approach is the real-time router travel time prediction technique, which uses historical travel times and current travel times (Liu and Wang, 2010).
In order to prevent congestion, many traffic avoidance techniques rely on machine learning and computational intelligence. This article presents many systems that use machine learning to manage vehicular networks. When transport networks are crowded, these schemes frequently use efficient rerouting techniques. A cooperative approach to identify and reduce traffic congestion was proposed by Araujo et al. (2014). Meneguette et al. (2015) presented an urban congestion detection method that combined artificial neural networks (ANN) with vehicle-to-vehicle communication. The system analysed speed and traffic density as inputs, identified and categorised the level of congestion on the roadways, and informed all cars of the congestion. In dense traffic, however, the system struggled with accuracy and was unable to identify incidents. Through vehicle-to-infrastructure communication, De Souza et al. (2015) suggested a cooperative rerouting method based on K-Nearest Neighbour to reduce traffic congestion and enhance traffic flow. Even though it could identify congestion, categorise traffic, and recommend other routes, the method's efficacy was limited by inadequate traffic distribution.
In order to prevent congestion, De Souza et al. (2016) created the CHIMAERA algorithm, which combines rerouting and traffic categorization. By utilising a probabilistic K-shortest path algorithm and vehicle-to-infrastructure communication, it offered users an alternate route that decreased average travel time, stopping time, and distance. However, its reach was limited by inadequate traffic balance. A convolutional neural network (CNN) technique for traffic management and congestion monitoring was created by Ma et al. (2017). Even though it operated efficiently and accurately on big networks, errors in the training process caused delays in the identification of congestion. A deep CNN model that employs supervised learning to determine the number of automobiles on a road segment from video images was presented by Chung and Sohn (2018). Meneguette et al. (2014) designed a system using Artificial Neural Networks (ANN) for traffic congestion detection. The system combines the INCIDENT protocol and ANN to identify congestion levels, categorize them, and suggest alternate paths. Notably, the system does not account for accident detection and may not be highly effective in dense traffic scenarios.
A methodology based on the k-Nearest Neighbour (KNN) algorithm was created by De Souza et al. (2016). For congestion avoidance and detection, the system uses a rerouting algorithm (CHIMAERA) and a traffic categorization mechanism. Congested regions are identified using the k-NN method, which is integrated with CHIMAERA and employs average road speed and vehicle density. This method lowers average travel and halt times but struggles to maintain a balanced traffic distribution. De Souza et al. (2015) also proposed alternate paths for reducing congestion using the KNN approach. The average journey duration and traffic balance are enhanced by this cooperative method. Nevertheless, it has problems with scaling and is susceptible to noisy data. The CNN approach converts traffic flow into images using 2-D time-space matrices. While effective in large networks with high accuracy, this approach demands longer training times. Hong (2011) devised a traffic flow prediction system using a Support Vector Model that integrates regression and chaotic simulation algorithms. While accurate, this model suffers from long response times and significant memory consumption during training. Ke et al. (2017) used information from UAV footage to create a k-means clustering-based system for traffic flow analysis. This system evaluates stream characteristics and efficiently determines the directions of traffic flow. Its usefulness is, however, restricted to situations involving curving roads and dense traffic. Rajesh and Vishu (2017) estimated traffic density using location sensors together with KNN and ANN. The mean absolute percentage errors (MAPEs) of the model's estimates and predictions are low. There are difficulties, however, such as scaling problems and susceptibility to noisy data. A deep CNN model for car counting based on video images was created by Chung and Sohn (2018). Although this system counts automobiles properly, it is unable to distinguish between stopped and moving vehicles. Zhang et al.
(2017) considered the dynamics and abrupt swings of road traffic by employing a segmentation technique and short-term forecasting. It operates effectively in terms of speed and average journey time even when used heavily (Wang et al., 2015). In complicated urban traffic conditions, real-time route planning is made possible by the suggested multimetric ant colony optimisation (ACO) algorithm. Real-world traffic scenario attributes allow for the generation of real-time route plans by acting as pheromone values for the ACO algorithm. In addition, on branched roads, the Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) is used to select the optimal route.
In the non-parametric approach, the RFNN (Random Forest based on Near Neighbor) method was fine-tuned and validated using actual data extracted from GPS, encompassing factors like location and time. The RFNN model was trained with real-world data to calculate bus travel times between adjacent stops, as proposed by Yu et al. (2018). In order to anticipate cab trip times, Gupta et al. (2018) presented an ensemble learning approach. A thorough examination of several variables was conducted, along with a thorough feature extraction process from the database. For training and performance evaluation, random forest and gradient boosting techniques were both used. Taxi demand, duration, trip length distribution, and frequently travelled places were among the factors taken into account. Statistical analysis showed that the gradient boosting strategy outperformed random forest in terms of prediction efficiency. For larger training data sets, random forest responded rapidly and showed promise for improving accuracy, but the lack of real-time data limited its effectiveness.
The development of the Bayesian algorithm approach broadens the selection of prediction models based on different algorithms, and it is also utilised in the prediction of journey times (Hamner, 2010). Hamner (2010) created a model for predicting future travel times on certain road segments using a context-dependent random forest technique. To train the model, features corresponding to both aggregate and local traffic flow were collected. The number of moving and stopped automobiles on various routes was one of the car status details included in the dataset. Even though the algorithm performed well, it was suggested that using a more sophisticated model of local traffic flow might lead to further improvement. Methods such as Support Vector Algorithms, PCA, and KNN were also investigated. Another strategy uses vehicle-infrastructure communication to forecast journey time, which helps with traffic volume and density estimation. Jenelius and Koutsopoulos (2017) presented a probe data-based approach for predicting trip times in metropolitan networks. Based on the current trip time and network parameters, the future journey time was calculated. By splitting the network up into smaller networks, the method examined each one using Probabilistic Principal Component Analysis (PPCA). The average journey time
within each network was modelled as a function of link, time of day, and day of the week. The technique was characterised by efficiency, flexibility when faced with missing data, and resilience. The study made use of data gathered over a four-month period from the Shenzhen Urban Transport Planning Centre in China. The outcomes demonstrated that PPCA prediction beat KNN prediction with a high degree of accuracy. In the context of the Internet of Vehicles (IoV), Tian (2018) presented a model for journey time prediction based on support vector machines (SVM) and artificial neural networks (ANN). The findings showed that the least squares SVM trip time predictor performed satisfactorily. Philip et al. (2018) devised a data-driven methodology that demonstrated improved precision in estimating trip time in situations where traffic data was scarce and highly variable. Using information gathered over an 11-week period from Bluetooth sensors positioned along an urban arterial corridor in Chennai, India, urban commute times were estimated. This data-driven method used the Support Vector Regression (SVR) model. In terms of performance, the SVR model beat the ANN. An intelligent congestion prediction architecture combining data fusion and ANN was proposed by Elleuch et al. (2016). This method took into account both past GPS data and current unanticipated occurrences, such as accidents, that affect traffic jams. An ANN known as a Jordan sequential network was created by More et al. (2016) to anticipate traffic on roads. In order to predict future traffic flow, this ANN model took into account both aggregated historical values and real-time traffic data. The performance of the network was evaluated based on accuracy and speed, indicating that prediction relying on both real-time traffic and historical data was more accurate than using historical data alone.
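As an illustration of the k-NN congestion classification discussed in this section, which uses average road speed and vehicle density as inputs, the following minimal Python sketch labels a road segment by majority vote among its nearest labelled neighbours. The sample values, the labels "free" and "congested", and the function name knn_classify are hypothetical and are not taken from any of the surveyed systems.

```python
import math
from collections import Counter


def knn_classify(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points.

    train: list of ((mean_speed_kmh, density_veh_per_km), label) pairs.
    query: (mean_speed_kmh, density_veh_per_km) tuple for the segment to label.
    """
    # Rank labelled samples by Euclidean distance to the query point.
    ranked = sorted(train, key=lambda item: math.dist(item[0], query))
    # Count the labels of the k closest samples and return the most common one.
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical, hand-made samples (not measured data).
samples = [
    ((60.0, 10.0), "free"),
    ((55.0, 15.0), "free"),
    ((50.0, 20.0), "free"),
    ((20.0, 60.0), "congested"),
    ((15.0, 70.0), "congested"),
    ((10.0, 80.0), "congested"),
]

label_fast = knn_classify(samples, (52.0, 18.0))
label_slow = knn_classify(samples, (18.0, 65.0))
print(label_fast, label_slow)
```

The features are left unscaled here for brevity. Because k-NN is distance-based, a practical system would normally normalise speed and density first, which is also one way to reduce the sensitivity to noisy data noted above.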

Multilayer Perceptron (MLP)
The multilayer perceptron (MLP) is made up of connected neurons arranged in layers, each with an activation function. It performs two essential calculations: the forward and backward passes. In the forward pass, the network processes data from a training set and computes the output error; in the backward pass, it adjusts the weights based on that error. Xu et al. (2017) predicted traffic flow on a highway using toll data by employing MLP and random forest algorithms. Pamula (2018) trained an MLP/autoencoder combination using the Levenberg-Marquardt method to forecast traffic flow in Gliwice, Poland. A further version of MLP is the Back Propagation Neural Network, or BPNN. Xu et al. (2018) applied BPNN to historical vehicle trajectories to forecast trip-oriented travel times in metropolitan networks, noting daily and weekly changes in travel times caused by weather and other variables. However, temporal and spatial correlations, weather changes, and unidentified random variables generating abrupt shifts made it difficult to anticipate journey times. Convolutional neural networks use convolution to extract image features and have fewer inputs connected to each neuron, therefore lowering the number of network parameters.
Artificial neural network model training is the process of iteratively reducing the network error as far as feasible. An optimisation procedure is used in ANN model training to lower the error. In the ANN model, a linear equation combines the independent variables of the dataset with their corresponding weights to calculate a weighted sum. The hidden layer accepts the weighted sum and applies an activation (transfer) function, and this is repeated at each layer. Several kinds of activation function are available in an ANN, including log-sigmoid and tan-sigmoid; these transfer functions are used for training and testing on the datasets. The output layer of the ANN model then receives a new weighted sum from the hidden neurons and applies a further activation function to compute the network output. The input layer of the ANN model is connected to the hidden layers, which are connected to the output layer. Each layer in the neural network also accepts a constant input from the bias. The net input to a unit j is mathematically expressed as:

$net_j = \sum_{i} w_{ij}\, y_i + b_j, \quad i = 1, \dots, n, \; j = 1, \dots, m$ (1)

where $w_{ij}$ is the connection weight between layers $i$ and $j$, $b_j$ is the bias, and $y_i$ is the output from the layer before it. Before the ANN training process begins, the weights of the network are chosen at random within a given range and initialised throughout the model. The selected input vector (i.e., the target values) $X_{ij}$ and the output vector $Y_{ij}$ are compared. The difference between $X_{ij}$ and $Y_{ij}$ is used to determine the mean squared error (MSE), which serves as the loss function. Thus,

$MSE = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} (X_{ij} - Y_{ij})^2, \quad i = 1, \dots, n, \; j = 1, \dots, m$ (2)
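The weighted sum, the log-sigmoid and tan-sigmoid transfer functions, and the MSE loss described in this section can be sketched as follows. This is a minimal NumPy illustration under assumed layer sizes (three inputs, four hidden units, one output) and an assumed initialisation range of [-0.5, 0.5]; it covers only the forward pass and the loss, and it is not the training procedure of any surveyed model.

```python
import numpy as np


def logsig(z):
    """Log-sigmoid transfer function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, W1, b1, W2, b2):
    """Forward pass of a one-hidden-layer MLP.

    Each layer computes a weighted sum plus bias, then applies a transfer function.
    """
    hidden = np.tanh(W1 @ x + b1)     # tan-sigmoid on the hidden layer
    return logsig(W2 @ hidden + b2)   # log-sigmoid on the output layer


def mse(target, output):
    """Mean squared error between target and network output."""
    return float(np.mean((target - output) ** 2))


# Weights are drawn at random within an assumed range before training starts.
rng = np.random.default_rng(0)
W1 = rng.uniform(-0.5, 0.5, size=(4, 3))
b1 = np.zeros(4)
W2 = rng.uniform(-0.5, 0.5, size=(1, 4))
b2 = np.zeros(1)

x = np.array([0.2, 0.5, 0.1])   # hypothetical normalised traffic features
target = np.array([0.4])        # hypothetical normalised target flow
output = forward(x, W1, b1, W2, b2)
loss = mse(target, output)
print(output, loss)
```

In the backward pass, the gradient of this loss with respect to each weight would be used to update the weights; that step is omitted here to keep the sketch short.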

Machine Learning Systems Approach in Traffic Movement Prediction
The majority of studies used a variety of methodologies and strategies to integrate or apply various machine learning algorithms, with a focus on the literature review. In one 2018 study, traffic flow was predicted by employing k-NN and C4.5 models and network positions that resemble videos. Kong et al. (2018) and Tian et al. (2018) proposed models that predict traffic flow using the LSTM approach. Based on our observations of the examined literature, the majority of researchers employ CNN and LSTM models, indicating the models' capacity to effectively anticipate traffic congestion.

Hybridized Machine Learning Methods for Traffic Movement Prediction
It has been demonstrated by several researchers that combining ensemble and hybrid approaches with a variety of ML techniques may yield useful results. In 2020, Ranjan and colleagues created a hybrid neural network including CNN, Reverse CNN, and LSTM, and employed a systematic approach for gathering data. Because they lack a memory feature, traditional ANNs cannot handle sequential and time series data. However, RNNs are capable of handling time series data and sequences, as well as temporal-spatial problems. Recurrent neural networks are dynamic systems that maintain an internal state at each time step. Circular connections between higher- and lower-layer neurons, as well as optional self-feedback connections, contribute to this behaviour. These feedback links allow RNNs to carry information from previous events into the present processing step. Thus, RNNs create a memory for time series events.
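The way feedback connections give an RNN a memory of earlier events can be illustrated with a short sketch. The code below is a minimal NumPy example of a plain (Elman-style) recurrence, which is an assumption about the RNN variant; the weight values and the two toy input sequences are hypothetical. The two sequences differ only in their first value, yet the final hidden states differ, because the internal state carries that early difference forward.

```python
import numpy as np


def rnn_forward(inputs, W_x, W_h, b):
    """Run a plain recurrent layer over a sequence and return all hidden states."""
    h = np.zeros(W_h.shape[0])   # internal state, initially empty
    states = []
    for x_t in inputs:
        # The new state depends on the current input AND the previous state.
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.stack(states)


# Hypothetical fixed weights: 1 input feature, 3 hidden units.
W_x = np.array([[0.5], [-0.3], [0.8]])
W_h = 0.5 * np.eye(3)            # self-feedback connections
b = np.zeros(3)

seq_a = np.array([[0.1], [0.9], [0.3]])
seq_b = np.array([[0.8], [0.9], [0.3]])   # differs only at the first step

states_a = rnn_forward(seq_a, W_x, W_h, b)
states_b = rnn_forward(seq_b, W_x, W_h, b)
print(states_a[-1], states_b[-1])
```

Setting W_h to zero in this sketch removes the feedback, and the final states of the two sequences then coincide, which corresponds to the memoryless behaviour of a traditional feed-forward ANN.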

Architecture of RNN
Figure 1: The Architecture of RNN (Zhang et al., 2019)
To retain patterns over extended periods of time and handle both short- and long-term time series data, Long Short Term Memory Networks (LSTM) evolved as an improved form of RNN. LSTM was utilised by Zhao et al. (2017) and Shao and Soong (2016) to anticipate several traffic metrics on Beijing motorways utilising a variety of detectors and cameras as data sources. Kang et al. (2017) used LSTM for traffic flow prediction with the Adam optimization algorithm for training. Liu et al. (2017) pointed out the unpredictable nature of traffic flow predictions due to external factors like accidents, road work, demand changes, and weather. They suggested a deep learning method with time series analysis to improve prediction accuracy, using the LSTM-DNN model. Their results showed better accuracy in short-term predictions, especially in single-step forecasts. Stacked LSTM involves layering multiple LSTM layers to create a deep neural network, allowing for learning features from raw time data at each step and spreading model variables throughout the model space without increasing memory, which speeds up convergence. RNN has been critiqued for its inefficiency in leveraging long historical sequences (Lin et al., 2020 and Tian et al., 2018).
LSTM Structure and Formula (Lin, et al., 2020) (Tian, et al., 2018)
Figure 2: LSTM structure and formula
Here σ is the sigmoid function, bj is the bias, and W is the weight matrix. The key element of LSTM is the cell state: Ct-1 is the memory from the previous block, which flows into Ct, the memory of the current block. xt is the present input, while ht-1 is the previous output.
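Since the formulas of Figure 2 are not reproduced in the text, the following sketch assumes the standard LSTM formulation, using the symbols named above (σ, W, b, Ct-1, ht-1, xt). It is a minimal NumPy illustration with hypothetical weight values, in which one stacked matrix W holds the weights of the forget, input, candidate, and output computations.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid function (σ): squashes values into (0, 1) so they can act as gates."""
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM time step under the standard formulation (assumed here).

    W has shape (4 * n, n + d) and b has shape (4 * n,), where n is the
    hidden size and d is the number of input features.
    """
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0:n])              # forget gate: how much of C(t-1) to keep
    i = sigmoid(z[n:2 * n])          # input gate: how much new content to write
    g = np.tanh(z[2 * n:3 * n])      # candidate cell content
    o = sigmoid(z[3 * n:4 * n])      # output gate: how much of the cell to expose
    c_t = f * c_prev + i * g         # new cell state C(t)
    h_t = o * np.tanh(c_t)           # new output h(t)
    return h_t, c_t


hidden, features = 2, 1
W = np.full((4 * hidden, hidden + features), 0.1)   # hypothetical weights
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for value in (0.2, 0.7, 0.4):                        # toy normalised flow values
    h, c = lstm_step(np.array([value]), h, c, W, b)
print(h, c)
```

The cell state c is updated only by element-wise scaling and addition, which is what lets information from earlier steps persist across many time steps.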

Gated Recurrent Unit (GRU)
Although the Long Short Term Memory (LSTM) network was designed to ease the vanishing gradient problem of plain RNNs, its gating structure is comparatively complex, which prompted the development of the simpler Gated Recurrent Unit (GRU). Unlike LSTM, GRU lacks a dedicated memory block, rendering it more efficient and easier to train. In 2018, Zhang and Kabuka employed GRU for predicting traffic flow using sensor data from California roads.
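To make the comparison with LSTM concrete, the sketch below implements one GRU time step in NumPy. It assumes the common update-gate and reset-gate formulation with biases omitted and with the convention that the update gate weights the candidate state; other references swap the roles of z and 1 - z. All weight values are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Sigmoid function used for the gates."""
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU time step (biases omitted for brevity).

    There is no separate cell state: the hidden state h is the only memory.
    """
    hx = np.concatenate([h_prev, x_t])
    z = sigmoid(W_z @ hx)                                       # update gate
    r = sigmoid(W_r @ hx)                                       # reset gate
    h_cand = np.tanh(W_h @ np.concatenate([r * h_prev, x_t]))   # candidate state
    return (1.0 - z) * h_prev + z * h_cand                      # blend old and new


hidden, features = 2, 1
shape = (hidden, hidden + features)
W_z, W_r, W_h = (np.full(shape, 0.1) for _ in range(3))        # hypothetical weights

h = np.zeros(hidden)
for value in (0.2, 0.7, 0.4):                                   # toy normalised flow values
    h = gru_step(np.array([value]), h, W_z, W_r, W_h)

# Weight counts for the same sizes: three weight blocks (GRU) versus four (LSTM).
gru_weights = 3 * hidden * (hidden + features)
lstm_weights = 4 * hidden * (hidden + features)
print(h, gru_weights, lstm_weights)
```

The smaller number of weight blocks is one reason a GRU is usually cheaper to train than an LSTM of the same hidden size.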
CNN was employed by Yu et al. (2017) as an algorithm to forecast traffic flow on a Beijing roadway. The training algorithm was the Adam optimization method. CNN was utilized by Liao et al. (2018) to forecast California's traffic patterns. The supervised stochastic gradient descent approach was used for training. Chung et al. (2018) employed a deep CNN, trained with gradient backpropagation, to forecast traffic density. The data was recorded using video detectors.
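The idea of treating traffic data as an image, mentioned earlier as a 2-D time-space matrix, can be sketched as follows. The example is a minimal NumPy illustration in which rows are time intervals and columns are consecutive road segments; the speed values, the averaging kernel, and the function name conv2d_valid are hypothetical. As in CNN layers, the operation is implemented as a sliding dot product without flipping the kernel.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (no padding) and return the feature map."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for r in range(out_h):
        for c in range(out_w):
            # Each output cell summarises a small time-space neighbourhood.
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out


# Hypothetical time-space matrix of mean speeds (km/h):
# rows = 4 consecutive time intervals, columns = 4 consecutive road segments.
speeds = np.array([
    [60, 58, 55, 57],
    [59, 40, 38, 56],
    [58, 22, 20, 50],
    [57, 35, 30, 54],
], dtype=float)

kernel = np.full((2, 2), 0.25)   # 2x2 averaging filter (a learned filter in a real CNN)
features = conv2d_valid(speeds, kernel)

# The lowest response marks the slowest time-space neighbourhood.
slowest = np.unravel_index(np.argmin(features), features.shape)
print(features)
print(slowest)
```

In a trained CNN the kernel values are learned rather than fixed, and many kernels are applied in parallel, but the sliding-window computation is the same.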
Tu et al. (2021) devised a technique called SG-CNN. To improve the training dataset, they grouped the road sections together and used the CNN approach (Zhang et al., 2021). Xia, D. (2020) created a multi-task learning approach for mining spatial and temporal data in various cities. A distributed long short term memory (LSTM) model was created by Romo et al. (2020) using a time window and normal distribution, built on the MapReduce architecture. They investigated the relative merits of three procedures, XGBoost, LSTM-NN, and CNN, in order to create a framework based on machine learning techniques. In 2020, Abdelwahab et al. created a computerised system that is effective for classifying congestion. CNN and succinct visual aids were used in its design. An LSTM model was presented by Abdel-Wahab et al. (2020) to estimate IoT traffic using time data. To anticipate traffic flow, Lin et al. (2020) created a novel LSTM model based on LSTM_SPLSTM. An LSTM model for traffic prediction based on historical and spatially inattentive data was created by Shin et al. in 2022. Elleuch et al. (2020) presented the Intelligent Traffic Congestion Forecast System, a neural network technique that utilises floating car data (FCD). A different technique known as SSGRU was developed by Sun et al. (2020) using the pivotal area on several road sections. Zafar and Haq (2020) carried out a study comparing several ML models for traffic congestion forecasting centred on the ETA jamming index. Yi and Bui (2019) used data analysis from a vehicle detection system to create a deep neural network model based on LSTM. Yang et al. (2019) used an end-to-end neural network to create the C-LSTM model. A CNN-based technique called MF-CNN was created by Gao et al. (2019). Based on spatiotemporal structures, it was able to predict traffic movement at large network scale. A model known as MRes-RGNN was created by Chen et al. (2019); it was composed of several residual recurrent graph neural networks. Bartlett et al. (2019) carried out a comparison analysis to determine the effectiveness of three machine learning models, k-NN, ANN, and SVR, in predicting traffic flow. Liu et al. (2020) employed a combination of LSTM and GCN techniques to develop a traffic flow forecasting model. A deep-stacked LSTM model called DE-LSTM was created by Chou et al. (2019). A further LSTM model anticipated traffic flow during peak and non-peak periods. Wang et al. (2019) combined LSTM and CNN to produce a traffic flow forecast for metropolitan areas that is based on spatiotemporal parameters. Jin et al. (2018) developed a STRNC system by combining LSTM and CNN processes. The model was able to estimate traffic flow while simultaneously capturing temporal dependency. Using a mix of LSTM and CNN, Duan et al. (2018) constructed a deep neural network technique together with a greedy model for spatiotemporal data mining.

A SURVEY OF MACHINE LEARNING… Ihama and Amenaghawon, FUDMA Journal of Sciences (FJS) Vol. 8 No. 4, August, 2024, pp 172-178

Recurrent Neural Network Approach
Zhang et al. (2019) employed CNN to estimate traffic flow on a New York City route. Recurrent Neural Network (RNN): In order to overcome convergence challenges in standard neural networks, Wavelet Neural Network (WNN) approaches seek to discover clusters of wavelets in feature space to reflect complicated connections in the original signal. Using a multi-layer learning technique, the Stacked AutoEncoder (SAE) integrates many autoencoders in hidden layers. The SAE method was used by Yang et al. (2017) to forecast the volume of traffic on roads. An extension of the RNN, the LSTM can handle historical data thanks to the addition of a memory cell. The memory component uses "gates" to add data to or remove data from the cell state. As seen in Fig. 2 and its control formulas, the three primary gates that make up the LSTM structure are the input, output, and forget gates. Abbas et al. (2018) used a stacked Long Short-Term Memory algorithm to forecast traffic flow. A stacked LSTM algorithm was employed by Zou et al. (2018) to anticipate traffic flow on a New York roadway. They made use of the Bike NYC GPS data.

Figure 3: GRU module design (Wen et al., 2023)
Similarly, Fu et al. (2016) employed LSTM and a gated recurrent unit approach for traffic flow prediction, using backpropagation through time for training.

Table 1: Summary of literature review
(2017). They used webcam footage to estimate vehicle count and density using fully convolutional networks (FCN). Although the MAPEs are much reduced, the model is not robust in anticipating traffic density.
8. Zonoozi et al. (2018). They created a Periodic Convolutional Recurrent Network (PCRN).

Table 1 above shows the literature on different approaches to traffic prediction using machine learning models, together with each model's strengths and weaknesses.

CONCLUSION
In this paper, several works on traffic movement prediction models were surveyed. Long Short Term Memory Networks (LSTM) evolved to improve on the RNN, retaining patterns over extended periods of time and handling both short- and long-term time series data. Although LSTM eases the vanishing gradient problem of plain RNNs, its gating structure is comparatively complex, which prompted the development of the simpler Gated Recurrent Unit (GRU). Unlike LSTM, GRU lacks a dedicated memory block, making it more efficient and easier to train. Stacked LSTM, however, layers multiple LSTM layers to create a deep neural network, which allows feature learning from raw time data at each step and spreads model variables throughout the model space without increasing memory, which speeds up convergence. Vehicular movement can be improved by applying prediction models, as these models optimize traffic congestion prediction at peak periods. LSTM is more efficient with large datasets, while GRU is less efficient with large datasets but better suited to small datasets.