Social science data repositories in data deluge. A case study of ICPSR’s workflow and practices

DOI: https://doi.org/10.1108/EL-11-2016-0243
Date: 07 August 2017
Published date: 07 August 2017
Pages: 626-649
Author: Wei Jeng, Daqing He, Yu Chi
Subject Matter: Information & knowledge management, Information & communications technology, Internet
Wei Jeng
School of Information Sciences, University of Pittsburgh, Pittsburgh,
Pennsylvania, USA, and
Daqing He and Yu Chi
School of Computing and Information, University of Pittsburgh, Pittsburgh,
Pennsylvania, USA
Abstract
Purpose: Owing to the recent surge of interest in the age of the data deluge, the importance of researching
data infrastructures is increasing. The open archival information system (OAIS) model has been widely
adopted as a framework for creating and maintaining digital repositories. Considering that OAIS is a reference
model that requires customization for actual practice, this paper aims to examine how the current practices in
a data repository map to the OAIS environment and functional components.
Design/methodology/approach: The authors conducted two focus-group sessions and one individual
interview with eight employees at the world’s largest social science data repository, the Interuniversity
Consortium for Political and Social Research (ICPSR). By examining their current actions (activities regarding
their work responsibilities) and IT practices, they studied the barriers and challenges of archiving and
curating qualitative data at ICPSR.
Findings: The authors observed that the OAIS model is robust and reliable in actual service processes for data
curation and data archives. In addition, a data repository’s workflow resembles digital archives or even digital
libraries. On the other hand, they find that the cost of preventing disclosure risk and a lack of agreement on the
standards of text data files are the most apparent obstacles for data curation professionals to handle qualitative
data; the maturation of data metrics seems to be a promising solution to several challenges in social science data
sharing.
Originality/value: The authors evaluated the gap between a research data repository’s current practices
and the adoption of the OAIS model. They also identified answers to questions such as how current
technological infrastructure in a leading data repository such as ICPSR supports their daily operations, what
the ideal technologies in those data repositories would be and the associated challenges that accompany these
ideal technologies. Most importantly, they helped to prioritize challenges and barriers from the data curator’s
perspective and to contribute implications of data sharing and reuse in social sciences.
Keywords: Data sharing, Digital repositories, Open archival information system (OAIS),
Research data curation
Paper type: Research paper
The authors thank the iFellowship, guided by the Committee on Coherence at Scale (CoC) for Higher
Education, sponsored by the Council on Library and Information Resources (CLIR) and Andrew W.
Mellon Foundations, as well as Beta-Phi-Mu Honor Society, which provided research funding for this
project. This study is also partially supported by the project titled Research on Knowledge Organization
and Service Innovation in the Big Data Environments funded by the National Natural Science
Foundation of China (No. 71420107026). The authors also thank Drs Nora Mattern, Liz Lyon, Sheila
Corrall, Jian Qin, Jung Sun Oh and Stephen Griffin for their invaluable comments and suggestions on
this research project. Last but not least, the authors thank all participants and people who helped
facilitate the field study at ICPSR for their valuable input and assistance.
The current issue and full text archive of this journal is available on Emerald Insight at:
www.emeraldinsight.com/0264-0473.htm
Received 1 November 2016
Revised 16 March 2017
Accepted 9 April 2017
The Electronic Library
Vol. 35 No. 4, 2017
pp. 626-649
© Emerald Publishing Limited
0264-0473
DOI 10.1108/EL-11-2016-0243
1. Introduction
As the research paradigms in science disciplines become data-intensive and collaborative
(Hey et al., 2009), researchers are promoting data as the “infrastructure of science”, critical in
forming “the basis for good scientific decisions, wise management and use of resources and
informed decision-making” (Tenopir et al., 2011). Although disciplinary cultural differences
exist between social sciences and natural sciences, the former discipline is changing to
require greater access to data and more transparency (Elman and Kapiszewski, 2013; Guest
et al., 2012). All of this calls for a strong emphasis on data depositing and sharing.
Despite the recent surge of interest in the age of the data deluge, managing digital
resources inside a repository for the purposes of preservation and access is neither novel nor
unique. Since the late 1990s, digital library communities have been designing and improving
the concept of a trusted digital repository, which, by its definition, should possess key
attributes, such as “reliability”, “long-term accessibility”, “resource manageability” and
“availability (for the designated community)”, all of which are recognized as critical requirements for
data management and curation services (Borgman et al., 2007).
The open archival information system (OAIS) is a well-known and widely adopted
conceptual model for creating and maintaining a digital repository. OAIS was proposed two
decades ago and has since become a standard for “maintaining digital information
over the long-term” (Lavoie, 2004, p. 2). The OAIS model can be viewed at three different
levels of granularity. The first level describes the external world with which OAIS interacts.
The second level defines the internal workflow of OAIS, including six functional entities:
ingest, archival storage, data management, preservation planning, access and
administration (i.e. day-to-day operation). The third level defines the format of possible
inputs to the OAIS services.
Considering the important status of OAIS in digital repositories, and the fact that OAIS is
a conceptual reference model that requires customization or “translation” into actual practice
or a service (Vardigan and Whiteman, 2007), it is important to closely examine the practices
in data repositories and review how they adopt the OAIS model. So far, there have been
reports from data repository management teams documenting their adoption of OAIS for
data curation services, but there are few third-party studies examining how data curation
practices map to the OAIS model. Therefore, the first research question in this preliminary
study is:
RQ1. What are the current practices in a data repository?
To closely examine how data repository services support social science data sharing, it is
necessary to gather information about how data professionals carry out current practices at
a research data repository. We conducted a case study on the Interuniversity Consortium for
Political and Social Research (hereafter, ICPSR), the world’s largest social science data
repository. Further, the gathered information was mapped to the OAIS environment and
OAIS’s suggested functional entities to examine the current practices with a scaffolding
reference. The case study with ICPSR is an opportunity to examine the support technologies
in data repository services. Therefore, the second two-part research question is:
RQ2. What are the current challenges of the underlying technologies at a data
repository? What are the desired information technologies (ITs) perceived by
employees to support their data repository services?
In addition to RQ1 and RQ2, several interesting findings on the challenges and opportunities
in social science data sharing–reuse cycles are also reported. The study attempts to address
the following critical inquiries: What are the challenges or barriers encountered by data