Accelerate proposal generation in R-CNN methods for fast pedestrian extraction

Pages 435-453
Date 03 June 2019
Published date 03 June 2019
DOI https://doi.org/10.1108/EL-09-2018-0191
Author Juncheng Wang, Guiying Li
Subject Matter Information & knowledge management, Information & communications technology, Internet
Juncheng Wang
Institute of Scientific and Technical Information of China, Haidian-qu, China, and
Guiying Li
Department of Computer Science and Engineering,
Southern University of Science and Technology, Shenzhen, China
Abstract
Purpose: The purpose of this study is to develop a novel region-based convolutional neural network (R-CNN) approach that is more efficient than, and at least as accurate as, existing R-CNN methods. The proposed method, namely R²-CNN, thus provides a more powerful tool for pedestrian extraction in person re-identification, which involves a huge number of images from which pedestrians need to be extracted efficiently to meet the real-time requirement.
Design/methodology/approach: The proposed R²-CNN is tested on two types of data sets. The first one is the USC Pedestrian Detection data set, which consists of three sub-sets, USC-A, USC-B and USC-C, defined with respect to their characteristics. This data set is used to test the performance of R²-CNN in the pedestrian extraction task. The speed and performance of the investigated algorithms were collected. The second data set is the PASCAL VOC 2007 data set, which is a common benchmark data set for object detection. This data set was used to analyze the characteristics of R²-CNN in the general object detection task.
Findings: This study proposes a novel R-CNN method that is both more efficient and more accurate than existing methods. The method, when used as an object detector, would facilitate the data preprocessing stage of person re-identification.
Originality/value: The study proposes a novel approach for object detection, which shows advantages in both efficiency and accuracy for the pedestrian detection task. It contributes to both data preprocessing for person re-identification and the research on deep learning.
Keywords Object proposal, Object detection, Convolutional neural network, R-CNN, Computational efficiency, Deep learning
Paper type Research paper
Received 29 September 2018
Revised 27 November 2018, 23 December 2018, 30 April 2019
Accepted 16 May 2019
The Electronic Library, Vol. 37 No. 3, 2019, pp. 435-453
© Emerald Publishing Limited, 0264-0473
The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/0264-0473.htm
Introduction
The recent revival of neural networks with a deep structure has marked new advances in pattern recognition and computer vision (Gatys et al., 2015; He et al., 2015; Krizhevsky et al., 2012; Szegedy et al., 2015). Among the deep learning techniques to have emerged in these domains, region-based convolutional neural networks (R-CNN) (Girshick, 2015; Girshick et al., 2014; Ren et al., 2015) have demonstrated encouraging performance, particularly in terms of accuracy in object detection tasks; as a consequence, R-CNN has attracted a considerable amount of attention. In general, once an R-CNN is trained, its testing phase can be viewed as containing two major procedures: region proposal generation and classification. Given an input image (from which one or more objects of interest are to be detected), the region proposal generation procedure produces a number of regions of interest (RoIs) (Girshick, 2015; Ren et al., 2015), which may contain objects. Then, the classification procedure classifies each of the RoIs as either a specific object or background.
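The two-procedure view of the testing phase described above can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `sliding_window_proposals` is a naive grid-based placeholder for a real proposal generator, and `toy_classifier` is a hypothetical stand-in for a trained CNN. Neither is taken from the paper.

```python
from typing import Callable, List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def sliding_window_proposals(img_w: int, img_h: int, win: int, stride: int) -> List[Box]:
    """Stage 1 (placeholder): enumerate square windows on a regular grid."""
    return [
        (x, y, win, win)
        for y in range(0, img_h - win + 1, stride)
        for x in range(0, img_w - win + 1, stride)
    ]


def detect(proposals: List[Box], classify: Callable[[Box], str]) -> List[Tuple[Box, str]]:
    """Stage 2: label every RoI and keep only those that are not background."""
    labelled = [(box, classify(box)) for box in proposals]
    return [(box, label) for box, label in labelled if label != "background"]


def toy_classifier(box: Box) -> str:
    """Hypothetical stand-in for a trained CNN: flags one fixed window."""
    return "pedestrian" if box == (8, 0, 8, 8) else "background"


rois = sliding_window_proposals(img_w=16, img_h=16, win=8, stride=8)
detections = detect(rois, toy_classifier)
```

In this sketch the classifier is called once per RoI, so the total cost grows with the number of proposals produced by the first stage.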
Despite its outstanding performance in terms of detection accuracy, R-CNN may be computationally demanding when processing a new image. Specifically, this potential drawback is largely due to the region proposal generation procedure, which typically involves iteratively generating (and later classifying) a large number of RoIs to increase the possibility of identifying all the objects of interest. The high computational cost hinders the application of R-CNN methods to more significant real-world applications that involve object detection as a basic task. For example, person re-identification (Farenzena et al., 2010), which is one of the most important problems in video surveillance, requires an object detector to first extract a pedestrian from raw images collected by a camera. The RoIs (or sub-images) that cover the pedestrian are then fed into a classifier to tell whether the person has been observed at another place (or time) by another camera. In this case, object detection functions as a data acquisition/pre-processing step. Considering the huge number of images that need to be handled by a video surveillance system, the efficiency of the object detector (e.g. an R-CNN) could easily become the bottleneck of the whole system.
A few works in the literature (Girshick, 2015; Ren et al., 2015) have looked at enhancing the efficiency of R-CNN. For example, Fast R-CNN (Girshick, 2015) coupled the region proposal generation and classification procedures to avoid explicitly generating cropped patches on the original images. Faster R-CNN (Ren et al., 2015), on the other hand, explicitly trains a region proposal network (RPN), which is used in the testing phase to generate all region proposals as a whole. Alternatively, some other traditional (but less costly) methods (Hosang et al., 2016), such as edge boxes (Dollár and Zitnick, 2015; Zitnick and Dollár, 2014), could also be used to generate region proposals.
The aim of this paper is similar to that of the works mentioned above; that is, to reduce the computational cost of a trained R-CNN in the testing phase. The key idea behind the proposed method, namely Relief R-CNN (R²-CNN), is that once an R-CNN is well trained, the network for classification contains useful information for generating region proposals. In particular, the feature maps of the network can provide hints as to where an object might be located in an image. Hence, region proposals for a new image could be efficiently generated using such information contained in the trained network, while no iterative generation process is needed.
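To make the intuition concrete, the sketch below shows one generic way a feature map can hint at an object's location. It is not the procedure of the proposed method, whose steps are given in a later section. The thresholding rule, the `stride` that maps feature-map cells back to pixels, and the function name `box_from_feature_map` are all assumptions made for this illustration.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def box_from_feature_map(
    fmap: List[List[float]], threshold: float, stride: int
) -> Optional[Box]:
    """Return the image-space box enclosing all cells whose activation exceeds threshold."""
    hits = [
        (r, c)
        for r, row in enumerate(fmap)
        for c, value in enumerate(row)
        if value > threshold
    ]
    if not hits:
        return None  # no strongly activated cell, so no proposal
    rows = [r for r, _ in hits]
    cols = [c for _, c in hits]
    top, left = min(rows), min(cols)
    bottom, right = max(rows), max(cols)
    # Scale cell indices back to pixel coordinates (assumed uniform stride).
    return (left * stride, top * stride,
            (right - left + 1) * stride, (bottom - top + 1) * stride)


# Toy 4 x 4 activation map with a bright 2 x 2 blob in the centre.
feature_map = [
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.9, 0.8, 0.0],
    [0.0, 0.7, 0.9, 0.0],
    [0.0, 0.0, 0.1, 0.0],
]
proposal = box_from_feature_map(feature_map, threshold=0.5, stride=16)
```

The point of the illustration is that a single pass over an already computed feature map yields a candidate region, with no iterative search over the image.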
The rest of this paper is organized as follows. The next section introduces the framework of R-CNN and reviews the relevant works on accelerating R-CNN. The section named "RoI generation of R²-CNN" presents the main idea, the detailed steps, and the time complexity analysis of R²-CNN. The third section contains an overview of the whole pipeline, with an introduction to a refinement technique called recursive fine-tuning. The fourth section details the experimental results, comparing R²-CNN with relevant methods. The final section concludes the paper with discussions.
Related work
Given a trained R-CNN, when applying it to an input image for object detection, the R-CNN first extracts a set of category-independent RoIs, and then adopts a trained CNN to classify each RoI into different objects or the background. R-CNN thus transforms the task of object detection in an image into two subtasks: RoI generation and RoI classification. In the context of R-CNN, a RoI is basically a rectangular sub-image of the original input image, from which objects are to be detected. To generate a RoI, R-CNN needs to determine the location, width, and height of the rectangular sub-image. Suppose the input image is of size $w \times h$, there are in total $\sum_{w_1=1}^{w} \sum_{h_1=1}^{h} (w - w_1 + 1)(h - h_1 + 1) = O(w^2 h^2)$ distinct RoIs
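
The count of distinct RoIs can be checked numerically. The sketch below assumes integer pixel coordinates and axis-aligned rectangles: a rectangle of width w1 and height h1 fits at (w - w1 + 1)(h - h1 + 1) positions, and summing over all widths and heights gives [w(w+1)/2][h(h+1)/2], which is O(w²h²). The function names are illustrative only.

```python
def count_rois_brute_force(w: int, h: int) -> int:
    """Enumerate every (x, y, w1, h1) rectangle that fits inside a w x h image."""
    count = 0
    for w1 in range(1, w + 1):
        for h1 in range(1, h + 1):
            for x in range(0, w - w1 + 1):
                for y in range(0, h - h1 + 1):
                    count += 1
    return count


def count_rois_closed_form(w: int, h: int) -> int:
    """Closed form of the double sum: [w(w+1)/2] * [h(h+1)/2]."""
    return (w * (w + 1) // 2) * (h * (h + 1) // 2)


small = count_rois_brute_force(4, 3)       # tiny image, cheap to enumerate
large = count_rois_closed_form(640, 480)   # a common image size, closed form only
```

For a 640 x 480 image the closed form gives more than 10^10 candidate rectangles, which is why exhaustive enumeration is impractical and a proposal generator is needed.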