Assisting the appraisal of e-mail records with automatic classification

Pages: 293-313
Publication Date: 21 November 2016
DOI: https://doi.org/10.1108/RMJ-02-2016-0006
Authors: André Vellino, Inge Alberts
Subject: Information & knowledge management, Information management & governance
André Vellino and Inge Alberts
School of Information Studies, University of Ottawa, Ontario, Canada
Abstract
Purpose – This paper aims to investigate how automatic classification can assist employees and
records managers with the appraisal of e-mails as records of value for the organization.
Design/methodology/approach – The study performed a qualitative analysis of the appraisal
behaviours of eight records management experts to train a series of support vector machine classifiers
to replicate the decision process for identifying e-mails of business value. Automatic classification
experiments were performed on a corpus of 846 e-mails from two of these experts’ mailboxes.
Findings – Despite the highly contextual nature of record value, these experiments show that
classifiers have a high degree of accuracy. Unlike existing manual practices in corporate e-mail
archiving, machine classification models are not highly dependent on features such as the identity of the
sender and receiver or on threading, forwarding or importance flags. Rather, the dominant
discriminating features are textual features from the e-mail body and subject field.
Research limitations/implications – The need to automatically classify corporate e-mails is
growing in importance, as e-mail remains one of the prevalent recordkeeping challenges.
Practical implications – Automated methods for identifying e-mail records promise to be of significant
benefit to organizations that need to appraise e-mail for long-term preservation and access on demand.
Social implications – The research adopts an innovative approach to assist employees and records
managers with the appraisal of digital records. By doing so, the research fosters new insights on the
adoption of technological strategies to automate recordkeeping tasks, an important research gap.
Originality/value – Our experiments show that an SVM classifier can be trained to replicate an expert’s
decision process for identifying e-mails of business value with a reasonably high degree of accuracy. In
principle, such a classifier could be integrated into a corporate Electronic Document and Records
Management System (EDRMS) to improve the quality of e-mail records appraisal.
Keywords E-mail, Machine learning, Record, Business value, Automatic classification
Paper type Research paper
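As an illustration of the kind of classifier the abstract describes, the following minimal sketch trains a linear support vector machine on word features taken only from the subject field and body of a message. It is not the authors' implementation: the six toy e-mails, their labels, the function names and the parameter values are all invented for illustration, and the training routine is a simple Pegasos-style subgradient update on the hinge loss that cycles through the examples in a fixed order, chosen here only because it needs no external library.

```python
import re


def tokenize(text):
    """Lowercase alphabetic tokens; a simple stand-in for real preprocessing."""
    return re.findall(r"[a-z]+", text.lower())


def featurize(subject, body):
    """Bag-of-words counts, keeping subject and body tokens as separate features."""
    feats = {}
    for prefix, text in (("subj:", subject), ("body:", body)):
        for tok in tokenize(text):
            key = prefix + tok
            feats[key] = feats.get(key, 0) + 1
    return feats


def score(weights, feats):
    """Linear decision value w . x for a sparse feature dictionary."""
    return sum(weights.get(f, 0.0) * v for f, v in feats.items())


def train_linear_svm(examples, epochs=50, lam=0.01):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    examples: list of (feature dict, label) pairs, label +1 (record) or -1.
    Assumption: examples are visited in a fixed cyclic order, not sampled.
    """
    weights = {}
    t = 0
    for _ in range(epochs):
        for feats, label in examples:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            violated = label * score(weights, feats) < 1.0
            shrink = 1.0 - 1.0 / t  # equals 1 - eta * lam
            for f in weights:
                weights[f] *= shrink
            if violated:  # hinge loss is active: move toward the example
                for f, v in feats.items():
                    weights[f] = weights.get(f, 0.0) + eta * label * v
    return weights


def predict(weights, subject, body):
    """Return +1 for 'record of business value', -1 otherwise."""
    return 1 if score(weights, featurize(subject, body)) > 0 else -1


# Hypothetical toy corpus (subject, body, label); not data from the study.
TOY_EMAILS = [
    ("Contract renewal", "Please review signed contract before Friday deadline", 1),
    ("Lunch today", "Anyone want pizza at noon", -1),
    ("Budget approval", "Attached budget needs director approval", 1),
    ("Hockey tickets", "Selling two tickets tonight", -1),
    ("Project report", "Final project report ready to file", 1),
    ("Birthday cake", "Cake in kitchen, help yourselves", -1),
]

TRAINING = [(featurize(s, b), y) for s, b, y in TOY_EMAILS]
WEIGHTS = train_linear_svm(TRAINING)

if __name__ == "__main__":
    print(predict(WEIGHTS, "Contract deadline", "Please review the signed contract"))
    print(predict(WEIGHTS, "Lunch", "Pizza at noon today"))
```

Sender, recipient, threading and flag metadata are deliberately left out of `featurize`, mirroring the finding that textual features from the body and subject field dominate. A realistic experiment would need a labelled corpus, held-out evaluation and a tuned regularization parameter.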
Received 24 February 2016
Revised 24 May 2016
Accepted 25 May 2016
Records Management Journal, Vol. 26 No. 3, 2016, pp. 293-313
© Emerald Group Publishing Limited, 0956-5698
The current issue and full text archive of this journal is available on Emerald Insight at: www.emeraldinsight.com/0956-5698.htm
The authors would like to thank the participants for their generous contribution to this research and the anonymous reviewers for their helpful suggestions.
Funding: This research was supported by a research grant from the Faculty of Arts, University of Ottawa.
1. Introduction
E-mail is one of the main sources of recordkeeping challenges in organizations (Winget
et al., 2006; Bailey, 2008; Lips et al., 2008; Alberts, 2013; Zwarich, 2014; Zhang, 2015).
According to a recent market survey, global business e-mail traffic will continue
growing by about 7 per cent per year to over 139.4 billion e-mails sent per day by 2018
(The Radicati Group, 2014). This increasing volume of e-mail exacerbates the many
difficulties that organizations face in its appraisal, management, maintenance and
effective retrieval. Some organizations believe that maintaining all e-mail records
indenitely is a viable solution, but doing so leads to a host of information discovery
issues, increased cost, litigation and employee dissatisfaction (Brogan and
Vreugdenburg, 2008, p. 65). These e-mail-related risks underscore the need for better
appraisal practices, as well as for scalable approaches for dealing with an increasing
volume of digital information.
In the past, archivists and records managers were primarily tasked with the
appraisal of the “worthiness” of paper-based records to preserve business-critical
information, satisfy compliance regulations and promote information retrieval in a
timely, effective manner (Eastwood, 2004, p. 202). In the digital environment, employees
are becoming increasingly involved in the appraisal process. For instance, the
Government of Canada guidelines on the management of e-mail (Library and Archives
Canada, 2014) state that “all staff are responsible for distinguishing between electronic
messages relating to the ofcial business of the Government of Canada and those
relating to activities of a personal nature” and emphasize “the responsibility of the
originators to classify their e-mail documents at creation into the departmental
classication system if one exists”. While such a policy clearly states that the onus of
corporate responsibility for managing e-mails lies with all employees, staff still lack
both the practical tools and the comprehensive strategies with which to identify which
records they must retain.
Existing tools for corporate e-mail records management rely primarily on the manual
tagging of e-mails by senders and recipients in accordance with their subjective
assessment of the relevance of the e-mail’s contents. These manual tagging procedures
thus force employees to shoulder a large share of the burden for ascertaining which
information is worth keeping. For example, solutions such as HP’s WorkSite e-mail
Management system (Hewlet Packard ECM, 2015) or Open Text Email Management
products (Open Text Email Solutions, 2015) are designed primarily for the end-user to
control the archiving and retention of e-mails. However, repeated research has shown
that employees lack motivation with respect to tasks related to the appraisal and
classication of records (Bailey, 2008;Mäkinen and Henttonen, 2011,Goldschmidt et al.,
2012;Jordan and deStricker, 2013;McKemmish and Piggott, 2013).
One of the reasons for this lack of motivation lies in current e-mail management
systems, which suffer from several deficiencies. First, these systems are designed
according to traditional paper-based practices, where information is manually classified
into a hierarchical folder structure. This imposes a considerable cognitive load on the
end-user’s labelling decision for each and every e-mail that needs to be treated. A recent
report from the US National Archives and Records Administration concludes that
depending on busy employees who are focused on fulfilling the mission of their
organization leads to inconsistent recordkeeping across the government (NARA, 2015,
p. 5). Second, e-mail management systems are not designed to take full advantage of
existing machine-classification methods to treat large volumes of e-mail in a consistent
and scalable manner. As suggested by Bailey (2009), future records management
practices need to put a greater emphasis on automation to address the growing volume
of digital information.
The objective of our research is to show how automatic classification can assist
organizations in the appraisal of e-mails for the purpose of records management. We