Assisting the appraisal of e-mail records with automatic classification
Pages | 293-313 |
Date | 21 November 2016 |
DOI | https://doi.org/10.1108/RMJ-02-2016-0006 |
Published date | 21 November 2016 |
Author | André Vellino, Inge Alberts |
Subject Matter | Information & knowledge management, Information management & governance |
André Vellino and Inge Alberts
School of Information Studies, University of Ottawa, Ontario, Canada
Abstract
Purpose – This paper aims to investigate how automatic classification can assist employees and
records managers with the appraisal of e-mails as records of value for the organization.
Design/methodology/approach – The study performed a qualitative analysis of the appraisal
behaviours of eight records management experts to train a series of support vector machine classifiers
to replicate the decision process for identifying e-mails of business value. Automatic classification
experiments were performed on a corpus of 846 e-mails from two of these experts’ mailboxes.
Findings – Despite the highly contextual nature of record value, these experiments show that
classifiers have a high degree of accuracy. Unlike existing manual practices in corporate e-mail
archiving, machine classification models are not highly dependent on features such as the identity of the
sender and receiver or on threading, forwarding or importance flags. Rather, the dominant
discriminating features are textual features from the e-mail body and subject field.
Research limitations/implications – The need to automatically classify corporate e-mails is
growing in importance, as e-mail remains one of the prevalent recordkeeping challenges.
Practical implications – Automated methods for identifying e-mail records promise to be of significant
benefit to organizations that need to appraise e-mail for long-term preservation and access on demand.
Social implications – The research adopts an innovative approach to assist employees and records
managers with the appraisal of digital records. By doing so, the research fosters new insights on the
adoption of technological strategies to automate recordkeeping tasks, an important research gap.
Originality/value – Our experiments show that an SVM classifier can be trained to replicate an expert’s
decision process for identifying e-mails of business value with a reasonably high degree of accuracy. In
principle, such a classifier could be integrated into a corporate Electronic Document and Records
Management System (EDRMS) to improve the quality of e-mail records appraisal.
Keywords E-mail, Machine learning, Record, Business value, Automatic classification
Paper type Research paper
1. Introduction
E-mail is one of the main sources of recordkeeping challenges in organizations (Winget
et al., 2006; Bailey, 2008; Lips et al., 2008; Alberts, 2013; Zwarich, 2014; Zhang, 2015).
According to a recent market survey, global business e-mail traffic will continue
growing by about 7 per cent per year to over 139.4 billion e-mails sent per day by 2018
(The Radicati Group, 2014). This increasing volume of e-mail exacerbates the many
difficulties that organizations face in its appraisal, management, maintenance and
effective retrieval. Some organizations believe that maintaining all e-mail records
indefinitely is a viable solution, but doing so leads to a host of information discovery
issues, increased cost, litigation and employee dissatisfaction (Brogan and
Vreugdenburg, 2008, p. 65). These e-mail-related risks underscore the need for better
appraisal practices, as well as for scalable approaches for dealing with an increasing
volume of digital information.
The authors would like to thank the participants for their generous contribution to this research
and the anonymous reviewers for their helpful suggestions.
Funding: This research was supported by a research grant from the Faculty of Arts, University
of Ottawa.
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0956-5698.htm
Received 24 February 2016
Revised 24 May 2016
Accepted 25 May 2016
Records Management Journal
Vol. 26 No. 3, 2016
pp. 293-313
© Emerald Group Publishing Limited
0956-5698
DOI 10.1108/RMJ-02-2016-0006
In the past, archivists and records managers were primarily tasked with the
appraisal of the “worthiness” of paper-based records to preserve business-critical
information, satisfy compliance regulations and promote information retrieval in a
timely, effective manner (Eastwood, 2004, p. 202). In the digital environment, employees
are becoming increasingly involved in the appraisal process. For instance, the
Government of Canada guidelines on the management of e-mail (Library and Archives
Canada, 2014) state that “all staff are responsible for distinguishing between electronic
messages relating to the official business of the Government of Canada and those
relating to activities of a personal nature” and emphasize “the responsibility of the
originators to classify their e-mail documents at creation into the departmental
classification system if one exists”. While such a policy clearly states that the onus of
corporate responsibility for managing e-mails lies with all employees, staff still lack
both the practical tools and the comprehensive strategies with which to identify which
records they must retain.
Existing tools for corporate e-mail records management rely primarily on the manual
tagging of e-mails by senders and recipients in accordance with their subjective
assessment of the relevance of the e-mail’s contents. These manual tagging procedures
thus force employees to shoulder a large share of the burden for ascertaining which
information is worth keeping. For example, solutions such as HP’s WorkSite e-mail
Management system (Hewlet Packard ECM, 2015) or Open Text Email Management
products (Open Text Email Solutions, 2015) are designed primarily for the end-user to
control the archiving and retention of e-mails. However, repeated research has shown
that employees lack motivation with respect to tasks related to the appraisal and
classification of records (Bailey, 2008; Mäkinen and Henttonen, 2011; Goldschmidt et al.,
2012; Jordan and deStricker, 2013; McKemmish and Piggott, 2013).
One of the reasons for this lack of motivation lies in current e-mail management
systems, which suffer from several deficiencies. First, these systems are designed
according to traditional paper-based practices, where information is manually classified
into a hierarchical folder structure. This imposes a considerable cognitive load on the
end-user’s labelling decision for each and every e-mail that needs to be treated. A recent
report from the US National Archives and Records Administration concludes that to
depend on busy employees who are focused on fulfilling the mission of their
organization leads to inconsistent recordkeeping across the government (NARA, 2015,
p. 5). Second, e-mail management systems are not designed to take full advantage of
existing machine-classification methods to treat large volumes of e-mail in a consistent
and scalable manner. As suggested by Bailey (2009), future records management
practices need to put a greater emphasis on automation to address the growing volume
of digital information.
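As a rough illustration of the kind of machine classification referred to here, the following is a minimal sketch of a linear support vector machine trained on textual features from the subject field and body of e-mails. It is not the authors' experimental setup: the four training messages, their labels, the regularisation parameter `lam`, the number of epochs and all function names are hypothetical, and a Pegasos-style subgradient method written in plain Python stands in for whatever SVM implementation a real system would use.

```python
import re
from collections import Counter

def featurize(subject, body):
    """Bag-of-words counts, with each token tagged by the field it came from."""
    feats = Counter()
    for field, text in (("subj", subject), ("body", body)):
        for tok in re.findall(r"[a-z]+", text.lower()):
            feats[f"{field}:{tok}"] += 1
    return feats

def dot(w, x):
    """Sparse dot product between a weight dict and a feature Counter."""
    return sum(w.get(f, 0.0) * v for f, v in x.items())

def train_linear_svm(examples, lam=0.1, epochs=20):
    """Pegasos-style subgradient descent on the L2-regularised hinge loss.

    examples: list of (features, label), label +1 (record) or -1 (non-record).
    The data are visited in a fixed order so the run is deterministic.
    """
    w = {}
    t = 0
    for _ in range(epochs):
        for x, y in examples:
            t += 1
            eta = 1.0 / (lam * t)              # decaying step size
            violated = y * dot(w, x) < 1.0     # margin check on current weights
            shrink = 1.0 - eta * lam           # effect of the L2 penalty
            for f in w:
                w[f] *= shrink
            if violated:                       # hinge-loss subgradient step
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

def predict(w, subject, body):
    """Return +1 (record of business value) or -1 (non-record)."""
    return 1 if dot(w, featurize(subject, body)) > 0 else -1

# Hypothetical toy corpus: (subject, body, label). Invented for illustration.
TRAIN = [
    ("Signed contract renewal",
     "Attached is the signed contract with the approved budget", 1),
    ("Budget approval decision",
     "The director approved the project budget and schedule", 1),
    ("Lunch on Friday", "Anyone want pizza for lunch", -1),
    ("Parking reminder", "Please move your car from lot", -1),
]

examples = [(featurize(s, b), y) for s, b, y in TRAIN]
weights = train_linear_svm(examples)

# Inspect the features with the largest absolute weight.
top = sorted(weights.items(), key=lambda kv: abs(kv[1]), reverse=True)[:5]
for feat, weight in top:
    print(f"{feat:>16s} {weight:+.3f}")
```

Calling `predict(weights, subject, body)` on a new message returns +1 or -1. Keeping subject and body tokens as separate features makes it possible to inspect, through the learned weights, which field contributes most to a decision.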
The objective of our research is to show how automatic classification can assist
organizations in the appraisal of e-mails for the purpose of records management. We