Publication date: 1 July 2019
Identification of operational demand in law enforcement agencies: an application based on a probabilistic model of topics

Marcio Pereira Basilio, Valdecy Pereira and Gabrielle Brum
Federal Fluminense University, Rio de Janeiro, Brazil
Purpose: The purpose of this paper is to develop a methodology for knowledge discovery in emergency
response service databases based on police occurrence reports, generating information to help law
enforcement agencies plan actions to investigate and combat criminal activities.
Design/methodology/approach: The developed model employs a methodology for knowledge discovery
involving text mining techniques and uses latent Dirichlet allocation (LDA) with collapsed Gibbs sampling to
obtain topics related to crime.
Findings: The method used in this study enabled identification of the most common crimes that occurred in
the period from 1 January to 31 December 2016. An analysis of the identified topics reaffirmed that crimes
do not occur in a linear manner in a given locality. In this study, 40 per cent of the crimes identified in
integrated public safety area 5, or AISP 5 (the historic centre of the city of RJ), had no correlation with AISP 19
(Copacabana, RJ), and 33 per cent of the crimes in AISP 19 were not identified in AISP 5.
Research limitations/implications: The collected data represent the social dynamics of neighbourhoods
in the central and southern zones of the city of Rio de Janeiro during the specific period from January
2013 to December 2016. This limitation implies that the results cannot be generalised to areas with
different characteristics.
Practical implications: The developed methodology contributes in a complementary manner to the
identification of criminal practices and their characteristics based on police occurrence reports stored in
emergency response databases. The generated knowledge enables law enforcement experts to assess,
reformulate and construct differentiated strategies for combating crimes in a given locality.
Social implications: The production of knowledge from the emergency service database contributes to the
government integrating information with other databases, thus enabling the improvement of strategies to
combat local crime. The proposed model contributes to research on big data, innovation and decision
support, as it breaks with an established paradigm of criminal information analysis.
Originality/value: The originality of the study lies in the integration of text mining techniques and LDA to
detect crimes in a given locality on the basis of the criminal occurrence reports stored in emergency response
service databases.
Keywords Big data, Crime, Police, Text mining, Topic model, Latent Dirichlet allocation
Paper type Research paper
1. Introduction
Over the past few years, numerous studies on crime identification have been conducted,
seeking to understand the causes of crime and its variations (Agnew, 2016;
Sherman et al., 1989; Weisburd and Eck, 2004; Haberman, 2017) and to identify practices and
strategies to combat crime. There has been discussion about the effectiveness of the preventive
and repressive strategies adopted by different countries to control crime (Sherman et al., 1998;
Braga, 2001). Sherman et al. (1998) evaluated strategies used in the USA, quantifying their
Data Technologies and Applications, Vol. 53 No. 3, 2019, pp. 333-372. © Emerald Publishing Limited. DOI 10.1108/DTA-12-2018-0109. Received 29 December 2018; revised 3 March 2019; accepted 19 June 2019.

The authors thank the Federal Fluminense University and the Military Police of the State of Rio de
Janeiro for the unrestricted support received for conducting the research, as well as the Coordenação de
Aperfeiçoamento de Pessoal de Nível Superior, Brasil (CAPES), for the partial financing of the
research (Finance Code 001).
effectiveness in terms of the results obtained. Other studies have addressed criminal analysis,
arguing that crime does not occur uniformly in cities and that there are significant clusters of
crimes in places called hot spots. Several researchers have argued that crime can be reduced
efficiently if strategies are directed toward places with higher crime concentrations (Braga,
2005; Sherman and Weisburd, 1995).
In general, studies have identified four strategies used by law enforcement agencies in
different contexts:
(1) Standard model of policing: in this strategy, police activities are directed toward the
effects of crime, believing that a reduction in the crime rate is directly proportional to
the number of arrests, regardless of the type of crime committed (Bayley, 1994;
Weisburd and Eck, 2004).
(2) Community policing: the fundamental idea behind community policing is that
effective collaborative work between the police and the community can play an
important role in reducing crime and promoting safety (Skolnick and Bayley, 1986;
Weisburd et al., 1988). Community policing emphasises that the citizens themselves
are the first line of defence in the fight against crime.
(3) Problem-oriented policing: the foundation of this strategy is the idea of policing to
solve problems, that is, the thinking and analysis needed to understand the problem that
lies behind the occurrences to which police are called. This approach takes seriously
the idea that situations can give rise to crime and that this can be avoided
by changing the situations that seem to be causing the calls (Goldstein, 1990;
Braga et al., 1999).
(4) Hot spots policing: this strategy assumes that sending patrols to areas with higher
criminal activity will reduce crime rates (Braga, 2001; Sherman and Weisburd, 1995).
The categorisation of criminal occurrences is essential for constructing maps that indicate
areas with higher criminal activity in a given locality (Calvo et al., 2017). This information is
often used by law enforcement agencies in operational planning to combat crime. However,
the enormous amount of data from reports about a crime's circumstances, location,
physical characteristics and dynamics stored every day by emergency response services
around the world is an unstructured data source that can provide information to support the
planning of police activities, helping to determine the appropriate strategy for a
given locality and supporting criminal investigations. This study, thus, sought to answer the
following question: how can the police occurrence reports made by emergency response
services contribute to selecting a strategy to combat crime in a given locality?
The aim of this study was to develop a methodology for knowledge discovery in
emergency response service databases based on police occurrence reports, generating
information to help law enforcement agencies plan actions to investigate and combat criminal
activities. The developed methodology integrates text mining techniques with latent
Dirichlet allocation (LDA) (Blei et al., 2003), which was used to
obtain topics related to crime. The methodology was applied to the metropolitan
region of the state capital of Rio de Janeiro, Brazil, in collaboration with the local law
enforcement agency. Using the developed model, ten topics were identified; after being
validated by experts, they were labelled as the most prevalent crimes in the areas studied.
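As noted in the abstract, the topics are obtained with LDA under collapsed Gibbs sampling. The following is a purely illustrative sketch of that inference scheme, not the authors' implementation: the corpus shape, the symmetric hyperparameters alpha and beta, and all function names are assumptions made for the example.

```python
import numpy as np

def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iter=100, seed=0):
    """Minimal collapsed Gibbs sampler for LDA (Blei et al., 2003).

    docs: list of documents, each a list of integer word ids.
    Returns per-token topic assignments and the doc-topic / topic-word counts.
    """
    rng = np.random.default_rng(seed)
    n_dk = np.zeros((len(docs), n_topics))   # document-topic counts
    n_kw = np.zeros((n_topics, vocab_size))  # topic-word counts
    n_k = np.zeros(n_topics)                 # total tokens assigned to each topic
    # Random initialisation of topic assignments.
    z = [list(rng.integers(n_topics, size=len(doc))) for doc in docs]
    for d, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[d][i]
            n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d, k] -= 1; n_kw[k, w] -= 1; n_k[k] -= 1
                # Full conditional p(z_i = k | z_-i, w), up to a constant.
                p = (n_dk[d] + alpha) * (n_kw[:, w] + beta) / (n_k + vocab_size * beta)
                k = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = k
                n_dk[d, k] += 1; n_kw[k, w] += 1; n_k[k] += 1
    return z, n_dk, n_kw
```

With real occurrence reports, preprocessed tokens would be mapped to word ids, the sampler run over the corpus, and each topic then labelled from its highest-probability words, in the spirit of the expert validation described above.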
2. Background
This section presents the fundamental concepts that guided this study. The intention is not
to cover all the subjects, but rather to provide essential supporting information for
understanding the research, the context and the results.
2.1 An overview of text mining
Traditional databases store large sets of information in the form of structured records
(Feldman et al., 1998). Grounded in this finding of Feldman et al. (1998), one can begin to
address the concept of knowledge discovery, which, according to experts (Gurusamy et al.,
2002; Morais and Ambrósio, 2007), is defined as the non-trivial extraction of implicit, previously
unknown and potentially useful information from data. Most previous studies regarding
knowledge discovery concerned structured databases. A large part of the available information
does not appear in structured databases, but rather in a collection of text articles extracted from
different sources (Gurusamy et al., 2002; Chen et al., 2013). Based on this statement, it is possible
to observe that the knowledge discovery process is strongly related to the way information is
processed. The volume of available information is very large, and automatic processing
mechanisms tend to make the knowledge discovery process more efficient. It has, thus, become
necessary to automate this process, primarily by using software and computers. This need has
led to the rise of computer-supported knowledge discovery, which is a data or information
analysis process with the primary objective of enabling people to acquire new knowledge by
manipulating large amounts of data. Basically, there are two approaches used in this area:
knowledge discovery from structured data and knowledge discovery from unstructured data.
Knowledge discovery from structured data is applied in corporate databases, involving
methods and tools that were developed based on statistical methods, artificial intelligence
methods and information retrieval methods. Knowledge discovery from unstructured data,
by contrast, is more complex because the data lack a predefined schema.
For this reason, specific techniques and tools are needed to treat this type of
data. These techniques and tools are also part of information retrieval, more specifically,
knowledge discovery from text (KDT). In practice, KDT is focussed on text mining
(Feldman and Dagan, 1995; Banu and Chitra, 2015).
Text mining is the process of discovering important information in textual data resources
(Chen et al., 2013; Tseng et al., 2007). Another definition found in the literature conceptualises
text mining as the application of computational methods and techniques to textual data to find
intrinsically relevant information and knowledge (Capuano, 2009; Nishanth et al., 2012; Araújo
Júnior and Tarapanoff, 2006). The origin of text mining is related to KDT. Currently, text mining
can be considered synonymous with KDT. Nomenclatures such as data mining in text or
knowledge discovery in textual databases (Correia and Gonçalves, 2017) can also be found in the
literature. Other terms that have been used as synonyms for text mining can be found in the
literature: information search (Mayr et al., 2017), undiscovered public knowledge (Kostoff et al.,
2008) and knowledge retrieval (Renu and Mocko, 2016). This fields main contributions are
related to searching for specific information in documents, analysing large volumes of texts
qualitatively and quantitatively, and obtaining a better understanding of texts available in
documents. These texts can be represented in very diverse forms, including e-mails, different file
formats (e.g. pdf, doc and txt), webpages, text fields in databases and electronic texts scanned
from paper. Structured text mining is found in fields of knowledge such as bibliometrics,
scientometrics, informetrics, mediametrics, museometrics and webometrics (Capuano, 2009).
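Before any of the analyses above can run on free-text sources such as occurrence reports, the raw text must be reduced to usable terms. The sketch below illustrates a typical first preprocessing step (lower-casing, tokenisation and stop-word removal); the stop-word fragment and the sample reports are invented for illustration and are not the authors' data.

```python
import re
from collections import Counter

# Hypothetical fragment of a Portuguese stop-word list; a real application
# would use a full list tuned to the occurrence reports.
STOPWORDS = {"de", "da", "do", "a", "o", "e", "em", "no", "na", "com", "que", "um", "uma"}

def preprocess(report):
    """Lower-case, keep alphabetic tokens (including accented letters), drop stop words."""
    tokens = re.findall(r"[a-záàâãéêíóôõúç]+", report.lower())
    return [t for t in tokens if t not in STOPWORDS]

# Invented sample reports, for illustration only.
reports = [
    "Roubo de veículo na Avenida Atlântica com arma de fogo",
    "Furto de celular no centro da cidade",
]
term_counts = Counter(t for r in reports for t in preprocess(r))
```

The resulting term counts (or the token lists themselves) are what downstream techniques such as topic modelling consume.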
Recently, text mining has become an important field of research. Since much of human
knowledge and history is stored in documents that contain text, texts are a rich deposit of
precious information. Depending on the type of document, different pieces of valuable
information are hidden (Chen et al., 2013). The importance of using the text mining
technique can be verified through the different applications and methods that have been
developed, according to Alwidian et al. (2015). Examples include news categorisation (Day
and Lee, 2016), patent retrieval (Liu et al., 2011), e-mail security (Dey et al., 2013), scientific
document retrieval (Kaur et al., 2010), theme detection (Ibekwe-SanJuan, 2006), document
sentiment analysis (Medhat et al., 2014; Xianghua et al., 2013), authorship identification