Factors hindering shared files retrieval

Date: 16 December 2019
Author: Ofer Bergman, Tamar Israeli, Steve Whittaker
Subject: Library & information science, Information behaviour & retrieval, Information & knowledge management, Information management & governance, Information management
Ofer Bergman and Tamar Israeli
Department of Information Science, Bar-Ilan University, Ramat Gan, Israel, and
Steve Whittaker
Department of Computational Media, UCSC, Santa Cruz, California, USA
Purpose: Personal information management (PIM) is an activity in which people store information items in
order to retrieve them later. The purpose of this paper is to test and quantify the effect of factors related to
collection size, file properties and workload on file retrieval success and efficiency.
Design/methodology/approach: In the study, 289 participants retrieved 1,557 of their shared files in a
naturalistic setting. The study used specially developed software designed to collect shared file names and
present them as targets for the retrieval task. The dependent variables were retrieval success, retrieval time
and misstep/s.
Findings: Various factors compromise shared files retrieval including: collection size (large number of files),
file properties (multiple versions, size of team sharing the file, time since most recent retrieval and folder
depth) and workload (daily e-mails sent and received). The authors discuss theoretical reasons for these
negative effects and suggest possible ways to overcome them.
Originality/value: Retrieval is the main reason people manage personal information. It is essential for retrieval
to be successful and efficient, as information cannot be used unless it can be re-accessed. Prior PIM research has
assumed that factors related to collection size, file properties and workload affect file retrieval. However, this is the
first study to systematically quantify the negative effects of these factors. As each of these factors is expected to
be exacerbated in the future, this study is a necessary first step toward addressing these problems.
Keywords: Workload, Personal information management, Collection size, File retrieval, Shared files, Versions
Paper type: Research paper
Personal information management (PIM) is an activity in which people store information
items (e.g. files and e-mails) in order to retrieve them later. Early PIM research was often
qualitative and exploratory, relying on interviews and observational studies. Such methods
were important for identifying basic PIM phenomena, but as the research field matures, it
needs to be supported by more rigorous quantitative approaches (Kelly and Teevan, 2007).
Retrieval is the main reason people manage their personal information. It is essential for
retrieval to be both successful and efficient, as information cannot be used unless it can be
re-accessed (Bergman and Whittaker, 2016). Prior research suggests several factors that could
hinder retrieval. These factors relate to: collection size (Jensen et al., 2010; Bergman et al., 2009;
Berlin et al., 1993; Bao et al., 2006; Crowder et al., 2015; Jones et al., 2008), file-related attributes
such as versioning (Bergman et al., 2014, 2015; Karlson et al., 2011; Jensen et al., 2010; Jones,
2007; Kearns et al., 2014; Rasel and Ali, 2016; Bergman and Whittaker, 2016) and user
workload (Hauck et al., 2008; Hockey and Earle, 2006). However, that prior work has not
systematically examined and measured how these factors affect file retrieval. As part of a
larger research project, this study aims to test the effect of these three factors on shared files
retrieval success and efficiency. We chose to sample shared files because they are an
increasingly important aspect of the modern work process due to the growing use of
cloud-based storage (Anderson and Rainie, 2012; Matthews et al., 2013), and not because we
thought that these factors would have different effects on them than on personal files.
Aslib Journal of Information
Vol. 72 No. 1, 2020
pp. 130-147
© Emerald Publishing Limited
DOI 10.1108/AJIM-05-2019-0120
Received 17 May 2019
Revised 16 August 2019
7 November 2019
Accepted 20 November 2019
The authors thank the participants and research assistants for their time and efforts. This research
was funded by the Israeli Science Foundation Grant No. 1074/16.
To motivate our study, we now review prior research into these three factors: collection size,
properties of the files to be retrieved and user workload, suggesting why these factors
affect retrieval.
Collection size
Collection size has frequently been suggested as a factor affecting the success of retrieval
(Jensen et al., 2010; Bergman et al., 2009; Berlin et al., 1993; Bao et al., 2006; Crowder et al., 2015;
Jones et al., 2008). For example, Jensen et al. (2010) describe our personal information
collections as "black holes" which grow in size due to the availability of cheap storage which
makes finding specific information increasingly difficult. But why is retrieval harder when
done from larger collections? Despite advances in desktop search technology, users prefer
retrieving files using folder navigation rather than query-based search (Bergman et al., 2008;
Fitchett and Cockburn, 2015). Larger collections mean that people have to both organize and
navigate through more files. As a result, we expect navigational retrievals to be less successful
when done from larger collections; it is well known in the field of cognitive psychology that in
visual search, the number of irrelevant distracters (in this case, files and folders that are
unrelated to the current retrieval) increases the time taken for people to identify a target object
(Neisser, 1964; Treisman and Gelade, 1980). Prior work also indicates that people tend to retain
relatively small numbers of files in each of their folders to optimize retrieval from that folder
(Bergman et al., 2010). One consequence of creating relatively compact folders, however, is
that larger overall collections lead to more complex folder hierarchies that are harder to
maintain, remember and hence access. Furthermore, these size effects may not be restricted to
navigation; Jensen et al. also suggest that collection growth could compromise search-based
retrieval; larger collections decrease the chance of uniquely identifying the desired file through
a small set of keywords (Jensen et al., 2010). Regardless of strategy, retrieval is therefore
expected to suffer as personal collections grow in size (Bergman et al., 2009).
Numerous early descriptive PIM studies also argue for effects of collection size. Explorations
of how people manage paper documents suggest that organizational "piling" strategies that
work well for small collections do not scale up as collections expand (Malone, 1983; Whittaker
and Hirschberg, 2001). Likewise, studies of e-mail organization argue that strategies that are
effective for low volumes of messages are less successful at higher message volumes. For
example, many people leave e-mails unfiled in their inbox as visible reminders of "to-do" items
they need to execute. However, large e-mail volumes bury these messages "out of sight and out
of mind", compromising their reminding function (Whittaker and Sidner, 1996; Whittaker et al.,
2007; Ducheneaut and Bellotti, 2001; Bellotti et al., 2003). These arguments suggest that an inbox
"piling" organization method does not scale well, although foldering does not yield better
retrieval results (Whittaker et al., 2011). Finally, Jones et al. (2008) found scalability problems to
be among the Top 5 factors most frequently contributing to the abandonment of information
management strategies. In total, 8 of their 22 participants reported abandoning an information
management strategy (such as using a notebook to-do list, a shared file naming system and
Microsoft Projects software) for reasons such as project growth and complexity, as well as
increased collaboration. However, to the best of our knowledge, the effect of collection size on
retrieval success and efficiency has never been systematically evaluated.
File properties
Another underexplored PIM variable concerns file properties. Although some PIM research
has examined the organization and retrieval of specific types of information items such as
e-mail (Whittaker et al., 2007, 2011; Whittaker and Sidner, 1996; Dabbish and Kraut, 2006;
Bellotti et al., 2003) or photographs (Kirk et al., 2006; Whittaker et al., 2010), most prior PIM
work has not systematically investigated file properties. In particular, we will review the
shared files

