Travel time prediction in transport and logistics. Towards more efficient vehicle GPS data management using tree ensemble methods
Pages | 277-306 |
DOI | https://doi.org/10.1108/VJIKMS-11-2018-0102 |
Date | 12 August 2019 |
Published date | 12 August 2019 |
Author | Xia Li,Ruibin Bai,Peer-Olaf Siebers,Christian Wagner |
Subject Matter | Information & knowledge management |
Xia Li and Ruibin Bai
School of Computer Science, University of Nottingham, Ningbo, China
Peer-Olaf Siebers
School of Computer Science, University of Nottingham, Nottingham, UK, and
Christian Wagner
University of Nottingham, Nottingham, UK
Abstract
Purpose – Many transport and logistics companies nowadays use raw vehicle GPS data for travel time
prediction. However, they face difficult challenges in terms of the costs of information storage, as well as the
quality of the prediction. This paper aims to systematically investigate various meta-data (features) that require
significantly less storage space but provide sufficient information for high-quality travel time predictions.
Design/methodology/approach – The paper systematically studied the combinatorial effects of
features and different model fitting strategies with two popular decision tree ensemble methods for travel
time prediction, namely, random forests and gradient boosting regression trees. First, the investigation was
conducted using pseudo travel time data that were generated using a pseudo travel time sampling algorithm,
which allows generating travel time data using different noise processes so that the prediction performance
under different travel conditions and noise characteristics can be studied systematically. The results and
findings were then further compared and evaluated through a real-life case.
Findings – The paper provides empirical insights and guidelines about how raw GPS data can be reduced
into a small-sized feature vector for the purposes of vehicle travel time prediction. It suggests that adding travel
time observations from the previous departure time intervals is beneficial to the prediction, particularly
when no other types of real-time information (e.g. traffic flow, speed) are available. It was also found
that modular model fitting does not improve the quality of the prediction in all experimental settings used in
this paper.
Research limitations/implications – The findings are primarily based on empirical studies on limited
real-life data instances, and the results may lack generalisability. Therefore, researchers are encouraged
to test them further on more real-life data instances.
Practical implications – The paper includes implications and guidelines for the development of efficient
GPS data storage and high-quality travel time prediction under different types of travel conditions.
Originality/value – This paper systematically studies the combinatorial feature effects for
tree-ensemble-based travel time prediction approaches.
Keywords Machine learning, Random forests, GPS data management, Gradient boosting,
Travel time prediction
Paper type Research paper
The authors acknowledge the financial support from the National Natural Science Foundation of
China (71471092), Zhejiang Natural Science Foundation (LR17G010001), the International Doctoral
Innovation Centre, Ningbo Science and Technology Bureau (2014A35006, 2017D10034), China’s
MoST and The University of Nottingham.
Received 19 November 2018
Revised 24 February 2019
Accepted 17 April 2019
VINE Journal of Information and
Knowledge Management Systems
Vol. 49 No. 3, 2019
pp. 277-306
© Emerald Publishing Limited
2059-5891
DOI 10.1108/VJIKMS-11-2018-0102
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/2059-5891.htm
1. Introduction
Travel time is an essential metric for transportation management and a key index for
road network performance evaluation. For network authorities, accurate travel time
information can help make better traffic management strategies and tools. For individual
travellers, accurate travel time information can help make better travel plans (e.g.
transportation modes, departure time and route selection). It can also help public
transportation (e.g. bus, tram) and freight transportation service providers perform better
service planning and scheduling. We recognise that in some scenarios, travel time may be
closely correlated to other variables (e.g. transportation costs), in which cases, these
correlated variables should be analysed together. However, in this study, our focus is
primarily on the GPS data handling techniques for predicting fixed-route vehicle travel
times, such as buses and trams, in which case the transportation costs bear no significant
influence on travel time. Analysing travel time independently can avoid adding further
complexity to an already challenging problem. Vehicle GPS data has been used as one of the
primary sources of information for travel time prediction. However, the sheer size of data
being produced daily is now posing a real challenge. For example, a city with about 5,000
taxis would produce over 30 million GPS log entries in the database on a daily basis.
Terabytes of data are generated each month. That is coupled with terabytes of video data
being added into the data storage on a regular basis. One way to address this problem is
extracting more representative features which can be later used for various traffic
management applications including travel time prediction. This feature extraction
procedure is also essential for knowledge management, as it reduces not only the storage
size of the knowledge database but also the time cost of the data analysis tasks involved.
Such extraction is no easy task, as the correlation among features has to be carefully
examined.
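To make the storage argument concrete, the following is a minimal sketch of how the raw GPS fixes of one trip might be collapsed into a small feature record. It is an illustration only, not the extraction procedure used in this paper: the `GpsFix` fields, the `trip_features` function, the 15-minute departure interval and the chosen features are all assumptions. The arithmetic at the end uses only the figures quoted above (about 5,000 taxis, over 30 million log entries per day).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GpsFix:
    """One raw GPS log entry (hypothetical schema, for illustration)."""
    vehicle_id: str
    timestamp: float   # seconds since epoch (time of day below is therefore UTC)
    lat: float
    lon: float
    speed_kmh: float


def trip_features(fixes, interval_s=900):
    """Collapse all fixes of one trip into a single small feature record.

    interval_s is an assumed departure-interval width (900 s = 15 minutes).
    """
    ordered = sorted(fixes, key=lambda f: f.timestamp)
    start, end = ordered[0].timestamp, ordered[-1].timestamp
    return {
        "vehicle_id": ordered[0].vehicle_id,
        # Index of the departure interval within the day.
        "departure_interval": int(start % 86400) // interval_s,
        "travel_time_s": end - start,
        "mean_speed_kmh": mean(f.speed_kmh for f in ordered),
        "n_fixes": len(ordered),
    }


# Back-of-envelope check of the figures quoted in the text.
entries_per_taxi_per_day = 30_000_000 / 5_000               # 6000.0
seconds_between_fixes = 86_400 / entries_per_taxi_per_day   # 14.4
```

Under these assumptions, each taxi logs roughly one entry every 14 seconds, whereas a trip is summarised by a single record of a few fields, which is where the storage saving would come from.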
Decision tree ensemble methods are becoming increasingly popular owing to their
efficiency and effectiveness in solving practical prediction and classification problems.
Random forests (RF) and gradient boosting regression trees (GBRT) are the two most
popular methods among them. Both RF and GBRT have proven capable of solving
prediction and classification problems in many other fields, such as econometrics
(Varian, 2014; Guelman, 2012), ecology (Cutler et al., 2007) and bioinformatics (Díaz-Uriarte and
De, 2006; Ogutu et al., 2011). They have been recognised as among the most successful
general-purpose algorithms in modern times (Howard and Bowles, 2012). Several studies
can be found using RF and GBRT for traffic and travel time predictions. However, the
discussion on the effect of different features on travel time prediction performance is very
limited. Although both RF and GBRT provide feature importance analysis, which helps to
identify important features to a certain degree, improper feature selection can have a
negative impact on performance owing to the combinatorial effects of features. Therefore, this
paper investigates the travel time prediction performance of these two popular decision tree
ensemble methods, RF and GBRT, using different feature combinations. We first conduct
the investigation using pseudo travel time data sets generated using a pseudo travel time
sampling algorithm (PTTS). PTTS allows generating travel time data using different noise
processes so that we can study the prediction performance under the effect of different
noises. Then, a case study is performed to verify the findings based on the artificially
generated data. In addition, modular-based model fitting strategies are examined in both cases
to see whether they are beneficial to the prediction.
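The experimental idea outlined above (synthetic travel times under different noise processes, evaluated over different feature combinations) can be sketched as follows. This is not the paper's PTTS algorithm, which is defined in a later section; the base profile, the two noise processes, their parameters and the two candidate features are all hypothetical choices made only to show the shape of such an experiment.

```python
import itertools
import math
import random

INTERVALS_PER_DAY = 96  # assumed 15-minute departure intervals
CANDIDATES = ("interval_of_day", "prev_travel_time")  # hypothetical features


def pseudo_travel_times(n_intervals, noise, seed=0):
    """Illustrative generator: a daily base profile plus a chosen noise process."""
    rng = random.Random(seed)
    times, ar = [], 0.0
    for t in range(n_intervals):
        phase = 2 * math.pi * (t % INTERVALS_PER_DAY) / INTERVALS_PER_DAY
        base = 600.0 + 120.0 * math.sin(phase)      # assumed profile, in seconds
        if noise == "gaussian":
            eps = rng.gauss(0.0, 30.0)              # independent noise
        elif noise == "ar1":
            ar = 0.8 * ar + rng.gauss(0.0, 20.0)    # autocorrelated noise
            eps = ar
        else:
            raise ValueError(f"unknown noise process: {noise}")
        times.append(max(base + eps, 1.0))          # keep travel times positive
    return times


def build_rows(times, feature_names):
    """Return (feature_vector, target) pairs for the selected features."""
    rows = []
    for t in range(1, len(times)):
        full = {
            "interval_of_day": t % INTERVALS_PER_DAY,
            "prev_travel_time": times[t - 1],       # previous-interval observation
        }
        rows.append(([full[name] for name in feature_names], times[t]))
    return rows


def feature_combinations(candidates=CANDIDATES):
    """Yield every non-empty combination of the candidate features."""
    for k in range(1, len(candidates) + 1):
        yield from itertools.combinations(candidates, k)
```

Each combination yielded by `feature_combinations` would give one training set via `build_rows`, which could then be passed to an RF or GBRT regressor (for example scikit-learn's `RandomForestRegressor` or `GradientBoostingRegressor`) and scored on held-out intervals. The model fitting itself is left out so that the sketch needs only the standard library.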
The rest of this paper is organised as follows: Section 2 offers a review of existing travel
time prediction methods; Section 3 provides a brief introduction of RF and GBRT, along
with the model fitting approach for them; Section 4 introduces an algorithm for generating