Travel time prediction in transport and logistics. Towards more efficient vehicle GPS data management using tree ensemble methods

Pages: 277-306
DOI: https://doi.org/10.1108/VJIKMS-11-2018-0102
Date: 12 August 2019
Published date: 12 August 2019
Author: Xia Li, Ruibin Bai, Peer-Olaf Siebers, Christian Wagner
Subject Matter: Information & knowledge management
Xia Li and Ruibin Bai
School of Computer Science, University of Nottingham, Ningbo, China
Peer-Olaf Siebers
School of Computer Science, University of Nottingham, Nottingham, UK, and
Christian Wagner
University of Nottingham, Nottingham, UK
Abstract
Purpose: Many transport and logistics companies nowadays use raw vehicle GPS data for travel time
prediction. However, they face difficult challenges in terms of the cost of information storage, as well as the
quality of the prediction. This paper aims to systematically investigate various meta-data (features) that require
significantly less storage space but provide sufficient information for high-quality travel time predictions.
Design/methodology/approach: The paper systematically studied the combinatorial effects of
features and different model fitting strategies with two popular decision tree ensemble methods for travel
time prediction, namely, random forests and gradient boosting regression trees. First, the investigation was
conducted using pseudo travel time data generated by a pseudo travel time sampling algorithm,
which allows travel time data to be generated using different noise processes so that the prediction performance
under different travel conditions and noise characteristics can be studied systematically. The results and
findings were then further compared and evaluated through a real-life case.
Findings: The paper provides empirical insights and guidelines about how raw GPS data can be reduced
into a small-sized feature vector for the purposes of vehicle travel time prediction. It suggests that adding travel
time observations from the previous departure time intervals is beneficial to the prediction, particularly
when no other types of real-time information (e.g. traffic flow, speed) are available. It was also found
that modular model fitting does not improve the quality of the prediction in all experimental settings used in
this paper.
Research limitations/implications: The findings are primarily based on empirical studies on limited
real-life data instances, and the results may lack generalisability. Therefore, researchers are encouraged
to test them further on more real-life data instances.
Practical implications: The paper includes implications and guidelines for the development of efficient
GPS data storage and high-quality travel time prediction under different types of travel conditions.
Originality/value: This paper systematically studies the combinatorial feature effects for
tree-ensemble-based travel time prediction approaches.
Keywords: Machine learning, Random forests, GPS data management, Gradient boosting,
Travel time prediction
Paper type: Research paper
The authors acknowledge the financial support from the National Natural Science Foundation of
China (71471092), Zhejiang Natural Science Foundation (LR17G010001), the International Doctoral
Innovation Centre, Ningbo Science and Technology Bureau (2014A35006, 2017D10034), China's
MoST and The University of Nottingham.
Received 19 November 2018
Revised 24 February 2019
Accepted 17 April 2019
VINE Journal of Information and
Knowledge Management Systems
Vol. 49 No. 3, 2019
pp. 277-306
© Emerald Publishing Limited
2059-5891
DOI 10.1108/VJIKMS-11-2018-0102
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/2059-5891.htm
1. Introduction
Travel time is an essential metric for transportation management and an essential index for
road network performance evaluation. For network authorities, accurate travel time
information can help develop better traffic management strategies and tools. For individual
travellers, accurate travel time information can help make better travel plans (e.g.
transportation modes, departure time and route selection). It can also help public
transportation (e.g. bus, tram) and freight transportation service providers perform better
service planning and scheduling. We recognise that in some scenarios, travel time may be
closely correlated with other variables (e.g. transportation costs), in which case these
correlated variables should be analysed together. However, in this study, our focus is
primarily on the GPS data handling techniques for predicting the travel times of fixed-route
vehicles, such as buses and trams, in which case the transportation costs bear no significant
influence on travel time. Analysing travel time independently avoids adding further
complexity to an already challenging problem. Vehicle GPS data has been used as one of the
primary sources of information for travel time prediction. However, the sheer size of the data
being produced daily is now posing a real challenge. For example, a city with about 5,000
taxis would produce over 30 million GPS log entries in the database on a daily basis.
Terabytes of data are generated each month. That is coupled with terabytes of video data
being added into the data storage on a regular basis. One way to address this problem is
to extract more representative features, which can later be used for various traffic
management applications, including travel time prediction. This feature extraction
procedure is also essential for knowledge management in that it can reduce not only the
storage size of the knowledge database but also the time cost of the particular data analysis
tasks involved. Such extraction is no easy task, as the correlation among features has to be
carefully examined.
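To make the storage argument concrete, the following is a minimal sketch of how the raw GPS pings of one fixed-route trip might be collapsed into a single small feature record. It is an illustration under stated assumptions, not the authors' procedure: the `GpsPing` and `TripFeatures` layouts, the 15-minute interval width and the helper names are all hypothetical. The last lines work through the arithmetic of the taxi example quoted above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class GpsPing:
    """Hypothetical raw log entry; the paper does not specify its GPS schema."""
    vehicle_id: str
    timestamp: datetime
    lat: float
    lon: float


@dataclass
class TripFeatures:
    """Hypothetical compact record kept in place of the raw pings."""
    weekday: int                     # 0 = Monday
    departure_interval: int          # index of the departure time interval in the day
    travel_time_s: float             # observed end-to-end travel time, in seconds
    prev_interval_travel_time_s: Optional[float]  # observation from the previous interval


INTERVAL_MINUTES = 15  # assumed interval width, chosen only for illustration


def departure_interval(ts: datetime, width_min: int = INTERVAL_MINUTES) -> int:
    """Map a departure timestamp to the index of its time-of-day interval."""
    return (ts.hour * 60 + ts.minute) // width_min


def summarise_trip(pings: List[GpsPing],
                   prev_travel_time_s: Optional[float] = None) -> TripFeatures:
    """Reduce all pings of one trip to a single feature record."""
    if len(pings) < 2:
        raise ValueError("need at least two pings to measure a travel time")
    ordered = sorted(pings, key=lambda p: p.timestamp)
    start, end = ordered[0].timestamp, ordered[-1].timestamp
    return TripFeatures(
        weekday=start.weekday(),
        departure_interval=departure_interval(start),
        travel_time_s=(end - start).total_seconds(),
        prev_interval_travel_time_s=prev_travel_time_s,
    )


# Arithmetic behind the taxi example: 30 million entries a day from 5,000 taxis.
TAXIS = 5_000
LOGS_PER_DAY = 30_000_000
logs_per_taxi = LOGS_PER_DAY / TAXIS           # 6000.0 entries per taxi per day
seconds_between_logs = 86_400 / logs_per_taxi  # about one entry every 14.4 seconds
```

Under these assumptions, a trip logged as dozens of pings is stored as one four-field record, which is the kind of reduction the paragraph above motivates.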
Decision tree ensemble methods are becoming more and more popular owing to their
efficiency and effectiveness in solving practical prediction and classification problems.
Random forests (RF) and gradient boosting regression trees (GBRT) are the two most
popular methods among them. Both RF and GBRT have proven capable of
solving prediction and classification problems in many other fields, such as econometrics
(Varian, 2014; Guelman, 2012), ecology (Cutler et al., 2007) and bioinformatics (Díaz-Uriarte and
De, 2006; Ogutu et al., 2011). They have been recognised as among the most successful
general-purpose algorithms in modern times (Howard and Bowles, 2012). Several studies
can be found using RF and GBRT for traffic and travel time predictions. However, the
discussion on the effect of different features on travel time prediction performance is very
limited. Although both RF and GBRT offer feature importance analysis, which helps to
identify important features to a certain degree, improper feature selection can have a
negative impact on the performance due to the combinatorial effects of features. Therefore, this
paper investigates the travel time prediction performance of those two popular decision tree
ensemble methods, RF and GBRT, using different feature combinations. We first conduct
the investigation using pseudo travel time data sets generated by a pseudo travel time
sampling algorithm (PTTS). PTTS allows travel time data to be generated using different noise
processes so that we can study the prediction performance under the effect of different
noises. Then, a case study is performed to verify the findings based on the artificially
generated data. In addition, modular-based model fitting strategies are examined in both cases
to see if they are beneficial to the prediction.
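The experimental idea in this paragraph, synthetic travel times under different noise processes evaluated over different feature combinations, can be sketched as follows. This is not the authors' PTTS algorithm, which is not described in this excerpt: the daily profile shape, the two noise processes, the feature names and all function names are assumptions made purely for illustration.

```python
import itertools
import math
import random
from typing import Callable, Iterator, List, Sequence, Tuple

# Illustrative candidate features; not the paper's actual feature list.
CANDIDATE_FEATURES = ("weekday", "departure_interval", "prev_interval_travel_time")


def base_travel_time(interval: int, intervals_per_day: int = 96) -> float:
    """Noise-free travel time in seconds with two daily peaks (an assumed shape)."""
    phase = 2 * math.pi * interval / intervals_per_day
    return 1200.0 + 300.0 * math.sin(phase) ** 2


def gaussian_noise(rng: random.Random, sigma: float = 60.0) -> Callable[[float], float]:
    """Additive noise process: shifts the travel time by a Gaussian draw."""
    return lambda t: t + rng.gauss(0.0, sigma)


def multiplicative_noise(rng: random.Random, spread: float = 0.1) -> Callable[[float], float]:
    """Multiplicative noise process: scales the travel time by a uniform factor."""
    return lambda t: t * (1.0 + rng.uniform(-spread, spread))


def sample_day(noise: Callable[[float], float], intervals_per_day: int = 96) -> List[float]:
    """Draw one noisy travel time per departure interval, clipped at zero."""
    return [max(0.0, noise(base_travel_time(i, intervals_per_day)))
            for i in range(intervals_per_day)]


def feature_combinations(features: Sequence[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty subset of the candidate features."""
    for k in range(1, len(features) + 1):
        yield from itertools.combinations(features, k)


rng = random.Random(42)  # fixed seed so the sketch is reproducible
additive_day = sample_day(gaussian_noise(rng))
scaled_day = sample_day(multiplicative_noise(rng))
subsets = list(feature_combinations(CANDIDATE_FEATURES))  # 7 subsets for 3 features
```

In a fuller experiment, each subset would select the columns passed to an RF or GBRT regressor, and the prediction error would be compared across subsets and noise processes. The choice of regressor implementation is left open here, since the excerpt does not name the software used.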
The rest of this paper is organised as follows: Section 2 offers a review of existing travel
time prediction methods; Section 3 provides a brief introduction to RF and GBRT, along
with the model fitting approach for them; Section 4 introduces an algorithm for generating