Dynamic cataloguing of the old Arabic manuscripts by automatic extraction of metadata

Published date: 19 June 2017
DOI: https://doi.org/10.1108/LHT-07-2016-0076
Pages: 251-270
Authors: Mohammed Ourabah Soualah, Yassine Ait Ali Yahia, Abdelkader Keita, Abderrezak Guessoum
Subject Matter: Library & information science, Librarianship/library management, Library technology, Information behaviour & retrieval, Information user studies, Metadata, Information & knowledge management, Information & communications technology, Internet
Mohammed Ourabah Soualah and Yassine Ait Ali Yahia
Ecole Supérieure en Informatique,
Algiers, Algeria
Abdelkader Keita
University of Mali, Bamako, Mali, and
Abderrezak Guessoum
Université Saad Dahlab de Blida, Blida, Algeria
Abstract
Purpose: The purpose of this paper is to provide online access to digitised images of Arabic manuscripts, which requires a catalogue. Bibliographic cataloguing is unsuitable for old Arabic manuscripts, so a new cataloguing model is needed. The authors propose a new cataloguing model based on manuscript annotations and transcriptions. This model can be an effective way to catalogue old Arabic manuscripts dynamically. To this end, the authors use automatic extraction of metadata based on the structural similarity of documents.
Design/methodology/approach: This work follows an experimental methodology. All the proposed concepts and formulas were tested for validation, which allows the authors to draw concise conclusions.
Findings: Cataloguing old Arabic manuscripts faces the problem of unavailable information. However, this information may be found elsewhere, in a copy of the original manuscript. Thus, an Arabic manuscript cannot be catalogued in a single pass; cataloguing is a continual process that requires information to be updated. The idea is to pre-catalogue a manuscript, then complete and improve the record through a dedicated platform. Consequently, the authors propose a new cataloguing model, which they call dynamic cataloguing.
Research limitations/implications: The success of the proposed model depends on the involvement of all its actors. It relies on the conviction and motivation of the actors of the collaborative platform.
Practical implications: The model can be used in several cataloguing fields where the encoding model is based on XML. The model is innovative and implements a smart cataloguing model. It is used through a web platform and allows a catalogue to be updated automatically.
Social implications: The model prompts the user to participate in and enrich the catalogue. The user could improve his or her social status, moving from a passive to an active role.
Originality/value: The dynamic cataloguing model is a new concept that has not been proposed in the literature until now. The proposed cataloguing model is based on automatic extraction of metadata from user annotations and transcriptions. It is a smart system that automatically updates or fills the catalogue with the extracted metadata.
Keywords: Transcription, Digital library, Annotations, Automatic extraction of metadata, Dynamic cataloguing, Structural similarity
Paper type: Research paper
Library Hi Tech, Vol. 35 No. 2, 2017, pp. 251-270
© Emerald Publishing Limited, 0737-8831
DOI 10.1108/LHT-07-2016-0076
Received 16 April 2016
Revised 8 February 2017
Accepted 16 February 2017
1. Introduction
Arabic manuscript cataloguing has become an exciting field (Feodorov, 2006), and there are several projects on the digitisation of Arabic manuscripts in which the images are accessed through the catalogue. These projects aim at providing efficient online access to the digitised resources. For example, the excellent works produced by the Bibliothèque Nationale de France and by the Bodleian Library (Oxford University) in collaboration with the Bibliotheca Alexandrina (Egypt) are well known. These solutions offer comfortable and intuitive interfaces, but all of them have weaknesses related to the access mode. For instance, their notices fail to describe the manuscript content.
These institutions catalogue the digitised manuscripts using two access modes: free access based on the catalogue content, and authority-list access.
The second access mode poses several problems caused by the lack of either the authors' information or a specification of the manuscripts. Sometimes the same manuscript covers several subjects, which makes the subject-heading access mode difficult to use. These aspects make it difficult to catalogue the manuscript using the authority list (Soualah et al., 2012). This access mode is inspired by a classical cataloguing model called bibliographic cataloguing:
The bibliographic cataloguing of the old manuscripts consists of describing the manuscript in three
aspects: codicological aspect (material), palaeographic aspect (content) and historical aspect
(manuscript possessions).
Bibliographic cataloguing is based on the principle that the described document is stable: the perception of the document stays relatively the same for various users over time. Thus, no new information is added to the bibliographic record of a catalogued document. Consequently, the cataloguing model is tied to the nature and stability of the document.
The manuscript is unique, and its cataloguing process is based on well-defined metadata (codicological, palaeographic and historical). The philologist is interested in the manuscript content and its authenticity (Auerbach, 1961), and his or her investigation may take a long time. Moreover, the cataloguing of a manuscript may be incomplete; for example, some folios, the manuscript title or its author may be missing. However, this information can be present in other copies of the manuscript preserved elsewhere.
Therefore, the problem is determining how this information can be integrated into the catalogue. To solve it, we propose an annotation and transcription method that uses manuscripts digitised and published online. An interface lets the user add comments, complete the metadata or transcribe the manuscript. Once the mediator validates the submitted data, the system extracts information from the annotation documents or the transcription documents and automatically updates the catalogue. We call this procedure dynamic cataloguing. This procedure shows that the cataloguing of old Arabic manuscripts is scalable and gives the catalogue a dynamic aspect.
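The procedure described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the authors' implementation: the `Annotation` and `CatalogueRecord` classes, their attribute names and the `validate` stand-in for the mediator are all hypothetical, and the sketch only fills fields that are still empty.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """A user contribution about one metadata field (hypothetical structure)."""
    manuscript_id: str
    field_name: str          # e.g. "title" or "author"
    value: str
    validated: bool = False  # set to True once the mediator approves it


@dataclass
class CatalogueRecord:
    """A catalogue entry that may start out incomplete (pre-cataloguing)."""
    manuscript_id: str
    metadata: dict = field(default_factory=dict)


def validate(annotation):
    """Stand-in for the mediator's decision; here it simply approves."""
    annotation.validated = True
    return annotation


def update_catalogue(record, annotations):
    """Fill empty fields from validated annotations; return the changed fields."""
    changed = []
    for ann in annotations:
        if not ann.validated or ann.manuscript_id != record.manuscript_id:
            continue  # unvalidated or unrelated contributions are ignored
        if not record.metadata.get(ann.field_name):
            record.metadata[ann.field_name] = ann.value
            changed.append(ann.field_name)
    return changed


# A pre-catalogued record with two unknown fields.
record = CatalogueRecord("ms-001", {"title": "", "author": None})
pending = Annotation("ms-001", "author", "Example Author")          # not validated
approved = validate(Annotation("ms-001", "title", "Example Title"))
changed = update_catalogue(record, [pending, approved])
print(changed)
```

Only the approved annotation reaches the record, which mirrors the role of the mediator as a gate between user contributions and the catalogue.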
1.1 The use of annotations and transcriptions for cataloguing manuscripts
Technology has advanced considerably, and information is now available anywhere at any time. However, Arabic manuscript cataloguing still faces many problems of information unavailability, because the manuscript description is based on dynamic metadata.
In this paper, we propose a new cataloguing model for old Arabic manuscripts. This model requires the catalogue to be updated continually, and implementing it requires the following:
• publishing online the images from a database of digitised manuscripts;
• implementing access and navigation tools for the manuscript image database;
• categorising annotators and transcribers (such as experts, researchers and scholars);
• validating the annotation strategy by the mediator; and
• implementing the metadata extraction model used to improve the catalogue.
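As an illustration of the last requirement, the following Python sketch pulls catalogue fields out of an XML annotation document. It rests on several assumptions: the element and attribute names (`annotation`, `annotator`, `validated`, `title`, `author`, `comment`) and the `CATALOGUE_FIELDS` set are hypothetical, and a simple tag-name match is used in place of the structural-similarity extraction proposed in the paper, which is not detailed in this section.

```python
import xml.etree.ElementTree as ET

# Hypothetical annotation document; the schema is an assumption for illustration.
ANNOTATION_XML = """
<annotation manuscript="ms-001" annotator="expert" validated="true">
  <title>Example Title</title>
  <author>Example Author</author>
  <comment>Folio 3 is damaged.</comment>
</annotation>
"""

# Hypothetical set of fields that the catalogue accepts.
CATALOGUE_FIELDS = {"title", "author", "copyist", "date"}


def extract_metadata(xml_text):
    """Return catalogue fields found in a validated annotation document."""
    root = ET.fromstring(xml_text.strip())
    if root.get("validated") != "true":
        return {}  # nothing is extracted before the mediator has validated
    found = {}
    for child in root:
        text = (child.text or "").strip()
        if child.tag in CATALOGUE_FIELDS and text:
            found[child.tag] = text
    return found


metadata = extract_metadata(ANNOTATION_XML)
print(metadata)
```

Free-text elements such as `comment` are left out of the result because they do not correspond to a catalogue field; the `annotator` attribute is where a category such as expert, researcher or scholar could be recorded.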