Managing mining project documentation using human language technology

DOI: https://doi.org/10.1108/EL-11-2017-0239
Date: 10 December 2018
Published date: 10 December 2018
Pages: 993-1009
Authors: Aleksandra Tomašević, Ranka Stanković, Miloš Utvić, Ivan Obradović, Božo Kolonja
Subject Matter: Information & knowledge management, Information & communications technology, Internet
Aleksandra Tomašević and Ranka Stanković
Faculty of Mining and Geology, University of Belgrade, Belgrade, Serbia
Miloš Utvić
Faculty of Philology, University of Belgrade, Belgrade, Serbia, and
Ivan Obradović and Božo Kolonja
Faculty of Mining and Geology, University of Belgrade, Belgrade, Serbia
Abstract
Purpose: This paper aims to develop a system that enables efficient management and exploitation
of documentation in electronic form, related to mining projects, with information retrieval and information
extraction (IE) features, using various language resources and natural language processing.
Design/methodology/approach: The system is designed to integrate textual, lexical, semantic and
terminological resources, enabling advanced document search and extraction of information. These resources
are integrated with a set of Web services and applications, for different user profiles and use-cases.
Findings: The use of the system is illustrated by examples demonstrating keyword search supported by
Web query expansion services, search based on regular expressions, corpus search based on local grammars,
followed by extraction of information based on this search and, finally, search with lexical masks using
domain and semantic markers.
Originality/value: The presented system is the first software solution for implementation of human
language technology in management of documentation from the mining engineering domain, but it is also
applicable to other engineering and non-engineering domains. The system is independent of the type of
alphabet (Cyrillic and Latin), which makes it applicable to other languages of the Balkan region related to
Serbian, and its support for morphological dictionaries can be applied to most morphologically complex
languages, such as Slavic languages. Significant search improvements and the efficiency of IE are based on
semantic networks and terminology dictionaries, with the support of local grammars.
Keywords: Digital libraries, Information retrieval, Data mining, Human language technologies,
Project documentation
Paper type: Research paper
Introduction
Mining, as a multidisciplinary branch of industry, uses and generates documentation in
various forms: textual, numerical, graphic and cartographic, multimedia and sometimes
even for near real-time operation. Mining engineers are already overburdened with extensive
documentation and data, so an efficient system for managing mining documentation
becomes an important issue with impact on productivity and quality of work and the speed
and timeliness of making good business decisions in the mining industry.
Received 14 November 2017
Revised 7 February 2018
Accepted 15 March 2018
The Electronic Library
Vol. 36 No. 6, 2018
pp. 993-1009
© Emerald Publishing Limited
0264-0473
DOI 10.1108/EL-11-2017-0239
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0264-0473.htm
When it comes to documentation in Serbian, a specific feature of mining documentation,
apart from multimedia content, is that textual documentation is stored in two equally
used alphabets: Cyrillic and Latin. Standard computer systems for handling
text documents generally do not support this. The complex grammar of Serbian, with its many different word
forms, represents another challenge encountered whenever documents in Serbian are
searched. Other Slavic languages share similar issues.
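The two obstacles named above, mixed alphabets and multiple word forms, can be illustrated with a minimal sketch. It is not the system described in this paper: the function names `to_latin` and `find_forms` are hypothetical, and the inflected forms of "rudnik" (mine) are listed by hand for illustration, whereas a real system would presumably obtain them from a morphological dictionary. The sketch normalizes text to the Latin alphabet, since each Serbian Cyrillic letter maps to exactly one Latin letter or digraph, while the reverse direction is ambiguous for digraphs such as "nj".

```python
import re

# Serbian Cyrillic to Latin mapping (30 letters, lower case only).
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Lower-case the text and transliterate Cyrillic letters to Latin."""
    return "".join(CYR_TO_LAT.get(ch, ch) for ch in text.lower())


def find_forms(text: str, forms: list[str]) -> list[str]:
    """Return the tokens of `text` that equal any of the given word forms,
    comparing everything in normalized (lower-case Latin) form."""
    wanted = {to_latin(form) for form in forms}
    tokens = re.findall(r"\w+", to_latin(text))
    return [token for token in tokens if token in wanted]


# Hand-listed forms of "rudnik" (assumption: illustrative, not exhaustive).
RUDNIK_FORMS = ["rudnik", "rudnika", "rudniku", "rudnikom",
                "rudnici", "rudnike", "rudnicima"]

if __name__ == "__main__":
    print(find_forms("Пројекат рудника", RUDNIK_FORMS))  # Cyrillic input
    print(find_forms("Radovi u rudniku", RUDNIK_FORMS))  # Latin input
```

With this normalization, a query typed in either alphabet matches documents written in either alphabet, and listing the inflected forms stands in for the dictionary-based expansion that a full solution would need.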
Many of these problems can be resolved with the help of human language technology
(HLT). The goal of HLT is to master all forms of written and spoken language using
information technology (IT), thus helping people to cooperate, do business, exchange
knowledge and participate in political and social debates, regardless of language barriers or
their IT skills. HLT consists of a number of basic applications that allow the processing of
languages within the framework of wider programming systems (Vitas et al., 2012).
Within the University of Belgrade HLT Group, a number of HLT tools and resources for
Serbian have been developed, among them comprehensive electronic morphological
vocabularies and corpora covering Serbian general lexica, while the development of
resources for specific domains is an ongoing task.
In the past two decades, there has been considerable research related to managing
documentation using HLT, but the results in the field of mining have been scarce. It has
been argued that document management in complex systems, such as e-government, may
be improved with the support of semantic technologies (Stojanovic et al., 2005). The authors
propose new methods for semantic service annotation and discovery to improve the
usability of e-government services, focussing on novel functionalities, such as verification of
service annotation and refinement of search results. According to Zantout and Marir (1999),
the most important functions of current document management systems enable users to
directly manipulate documents, index and store documents for retrieval, communicate
through the exchange of documents, collaborate around documents and model and
automate the flow of documents. The process for document management and
computer-assisted translation of documents described in Shreve (2002) used document corpora
constructed by intelligent agents, supported by a metalanguage to electronically tag the
source corpus. The corpus enhancement parameters are implemented as an intelligent agent
that searches external repositories to find similar terms and structures and returns them
to the source corpora. Development of a controlled vocabulary for semantic interoperability
of mineral exploration geodata for mining projects (Ma et al., 2010) shows that a properly
organized controlled vocabulary allows for an efficient reconciliation of heterogeneous and
multi-source geodata in similar or related projects.
This paper outlines the basic elements of a system that enables efficient document
management in electronic form using HLT (Figure 1), illustrated by the example of mining
project documentation management.
System overview
In view of the problems outlined and the possibilities offered by HLT, the research described in
this paper is aimed at developing a system which would enable efficient management of
documentation in electronic form, related to mining projects, with information retrieval (IR) and
information extraction (IE) features. In brief, IR implies retrieval of texts that correspond to
an information request (user query) from a set of given texts (Manning et al., 2008), whereas the
task of IE involves analysing the information contained in a text, then selecting, marking
and organizing it into structured data sets, such as ontologies and databases, for further
processing (Jurafsky and Martin, 2014). These two areas, IR and IE, although different, often
use the same resources, as well as tools.
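The difference between the two tasks can be made concrete with a toy sketch. Everything in it is an assumption made for illustration: the three sample sentences, the function names `retrieve` and `extract`, and the single regular expression are hypothetical and are not taken from the system described here. IR returns whole documents that match a query, whereas IE turns matching text fragments into structured records.

```python
import re

# Hypothetical sample documents (invented for illustration only).
DOCS = {
    "d1": "The pit produced 1200 t of coal in 2015.",
    "d2": "Ventilation design for the underground mine.",
    "d3": "Overburden removal reached 350 t of waste per shift.",
}


def retrieve(docs: dict[str, str], query: str) -> list[str]:
    """IR: return ids of documents that contain every query term."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in docs.items():
        tokens = set(re.findall(r"\w+", text.lower()))
        if all(term in tokens for term in terms):
            hits.append(doc_id)
    return hits


# A single hand-written pattern: "<amount> <unit> of <material>".
QUANTITY = re.compile(
    r"(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>t|kg)\s+of\s+(?P<material>\w+)"
)


def extract(docs: dict[str, str]) -> list[dict]:
    """IE: turn quantity mentions into structured records."""
    records = []
    for doc_id, text in docs.items():
        for match in QUANTITY.finditer(text):
            records.append({
                "doc": doc_id,
                "amount": float(match["amount"]),
                "unit": match["unit"],
                "material": match["material"],
            })
    return records


if __name__ == "__main__":
    print(retrieve(DOCS, "coal"))  # ids of matching documents
    print(extract(DOCS))           # structured records
```

Both functions share the same tokenization and pattern-matching machinery, which reflects the observation that IR and IE often rely on the same resources and tools.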
The envisaged system should be independent of the alphabet (Cyrillic, Latin) and
support search in all grammatical forms of the words given in a query. In addition, it should