Online Information Review, Vol. 34 No. 3, 2010, pp. 395-419
DOI: 10.1108/14684521011054053
Refereed article received 4 June 2009; approved for publication 8 January 2010; published 22 June 2010
Subject: Information & knowledge management, Library & information science
© Emerald Group Publishing Limited, ISSN 1468-4527
Podcast search: user goals and
retrieval technologies
Jana Besser
VU University Amsterdam, Amsterdam, The Netherlands
Martha Larson
Delft University of Technology, Delft, The Netherlands, and
Katja Hofmann
University of Amsterdam, Amsterdam, The Netherlands
Abstract
Purpose – This research aims to identify users’ goals and strategies when searching for podcasts
and their impact on the design of podcast retrieval technology. In particular, the paper seeks to explore
the potential to address user goals with indexing based on podcast metadata and automatic speech
recognition (ASR) transcripts.
Design/methodology/approach – The paper conducted a user study to obtain an overview of
podcast search behaviour and goals, using a multi-method approach of an online survey, a diary
study, and contextual interviews. In a subsequent podcast retrieval experiment, the paper investigated
the retrieval performance of the two choices of indexing features for search goals identified during the study.
Findings – The paper found that study participants used a variety of search strategies, partially
influenced by available tools and their perceptions of these tools. Furthermore, the experimental results
revealed that retrieval using ASR transcripts performed significantly better than metadata-based
searching. However, a detailed result analysis suggested that the efficacy of the indexing methods was
search-goal dependent.
Research limitations/implications – The research constitutes a step towards a future framework
for investigating user needs and addressing them in an experimental set-up. It was primarily
qualitative and exploratory in nature.
Practical implications – Podcast search engines require evidence about suitable indexing methods
in order to make an informed decision concerning whether it is worth the resources to generate speech
recognition transcripts.
Originality/value – Systematic studies of podcast searching have not previously been reported.
Investigations of this kind hold the potential to optimise podcast retrieval in the long term.
Keywords Information retrieval, User studies, Speech recognition, Search engines, Audio media
Paper type Research paper
Acknowledgements
The authors would like to thank the participants of the user study for taking part in their research. They are also grateful to Frank van Gils for giving them permission to use his podcast retrieval system during the user interviews. Furthermore, they would like to thank the anonymous reviewers for their critical and helpful comments.

Introduction
Technologies for access and retrieval have been identified as the bottleneck for
exploitation of spoken audio content, which is currently being created and stored at an
unprecedented rate (Koumpis and Renals, 2005; Chelba et al., 2008). On the internet,
spoken audio often takes the form of a series of audio episodes delivered via a feed,
called a podcast (van Gils, 2008; Mizuno et al., 2008; Tsagkias et al., 2010). The number
of podcasts available online, which are collectively called the podosphere, has grown
steadily and users are downloading increasing numbers of podcasts (Madden, 2008).
As the podosphere expands, it becomes even more difficult for listeners to find
podcasts that interest them and the need for effective retrieval technologies becomes
increasingly urgent.
The challenge of podcast search is a distinctive one, due to the structure and nature of
the content of the podosphere. The process of podcasting involves a podcaster who
creates podcast episodes, ideally at regular intervals, and makes them available online by
adding them to a podcast feed. This process provides two sources of information that can
be exploited for finding podcast feeds and podcast episodes. First, podcast metadata can
be indexed for use in retrieval. Metadata is created by the podcaster at both the feed level
(i.e. describing the full podcast) and the episode level (i.e. describing each new episode).
Examples of typical metadata fields are “title” and “description”. We refer to features (i.e.
words and phrases) derived from metadata as metadata-based indexing features. Second,
podcast content can be indexed for use in retrieval. The audio content, i.e. the file that the
user downloads so they can listen to it, constitutes the core of a podcast episode. The
audio file can be indexed using audio processing technology. In our case we focus on
podcasts that contain spoken content. Automatic speech recognition (ASR) technology is
used to create a text transcript of the spoken content of each podcast episode. These
transcripts are then used as the source of indexing features for podcast retrieval. We refer
to these features as ASR indexing features or content-based indexing features.
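To make the two sources of indexing features concrete, the sketch below shows one way they could be extracted from a podcast feed. It assumes a standard RSS 2.0 layout in which feed-level and episode-level "title" and "description" elements carry the metadata and an enclosure element points at the audio file that ASR would transcribe; the example feed, function name and field choices are illustrative and not taken from the study.

```python
# Minimal sketch: pulling out the two sources of indexing features from a
# podcast RSS feed. Feed-level and episode-level metadata text becomes the
# metadata-based features; the enclosure URL identifies the audio that an ASR
# system would transcribe to obtain content-based features.
import xml.etree.ElementTree as ET

def extract_indexing_sources(rss_xml: str):
    """Return, per episode, the metadata text and the audio URL for ASR."""
    channel = ET.fromstring(rss_xml).find("channel")

    # Feed-level metadata describes the podcast as a whole.
    feed_metadata = " ".join(
        channel.findtext(field, default="") for field in ("title", "description")
    )

    episodes = []
    for item in channel.findall("item"):
        # Episode-level metadata describes one new episode.
        episode_metadata = " ".join(
            item.findtext(field, default="") for field in ("title", "description")
        )
        enclosure = item.find("enclosure")
        audio_url = enclosure.get("url") if enclosure is not None else None
        episodes.append({
            "metadata_text": f"{feed_metadata} {episode_metadata}".strip(),
            "audio_url": audio_url,  # input to ASR -> content-based features
        })
    return episodes

# Tiny illustrative feed (hypothetical titles and URL).
feed = """<rss version="2.0"><channel>
  <title>Example Podcast</title>
  <description>A weekly show about science.</description>
  <item>
    <title>Episode 1: Sleep</title>
    <description>Why we dream.</description>
    <enclosure url="http://example.com/ep1.mp3" type="audio/mpeg" length="0"/>
  </item>
</channel></rss>"""
print(extract_indexing_sources(feed))
```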
The process of creating podcasts is straightforward and no specialised knowledge
or equipment is necessary to become a podcaster. As a result, although many podcasts
are distributed by professional broadcasters, a sizable proportion of podcasts contain
user-generated content. The quality and substance of content in the podosphere vary
because of the diversity in podcaster styles, production standards and domains of
interest. Although metadata forms an integral part of a podcast, it is often woefully
incomplete, both at feed and at episode level. Spoken audio is characterised by low
channel quality and a range of effects not present in professional or scripted speech,
but arising from contexts in which speech is informal, spontaneous and conversational.
Such effects include disfluencies, sentence fragments and emotion-based variability
(see Shriberg, 2005; Besser and Alexandersson, 2007) – all posing challenges for ASR
technology. In sum, both metadata and speech recognition transcripts are sources of
indexing features that can be used for podcast retrieval, but both suffer from noise in
the form of omissions and errors.
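As an illustration of how retrieval over the two kinds of indexing features can be compared, the sketch below indexes the same episodes twice, once on metadata text and once on ASR transcripts, and ranks both indexes for a query. It is a minimal TF-IDF example that assumes scikit-learn is available and uses invented placeholder episodes; it is not the experimental set-up used in the study.

```python
# Minimal sketch: comparing metadata-based and ASR-based indexing features by
# building one TF-IDF index per feature source and ranking both for a query.
# Assumes scikit-learn; episodes and their texts are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

episodes = [
    {"id": "ep1",
     "metadata": "Weekly science show: episode on sleep research",
     "asr_transcript": "today we talk about uh sleep and and why we dream"},
    {"id": "ep2",
     "metadata": "Tech news roundup",
     "asr_transcript": "the new phone launch was delayed again this week"},
]

def rank(query: str, field: str):
    """Rank episodes for a query using a TF-IDF index built on one field."""
    texts = [ep[field] for ep in episodes]
    vectorizer = TfidfVectorizer()
    doc_vectors = vectorizer.fit_transform(texts)   # index the chosen field
    query_vector = vectorizer.transform([query])    # project the query
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    return sorted(zip((ep["id"] for ep in episodes), scores),
                  key=lambda pair: pair[1], reverse=True)

print(rank("why do we dream", "metadata"))        # metadata-based features
print(rank("why do we dream", "asr_transcript"))  # content-based (ASR) features
```

In this toy example the query terms appear only in the spoken content, so the metadata index returns no matching evidence while the transcript index does; the reverse can equally occur, which is why the relative merit of the two feature sources is an empirical question.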
Retrieval of podcasts containing spoken content is a relatively new field, but an
interesting bifurcation can already be observed. The research that has appeared to date
related to podcast retrieval has yet to grow beyond the first few significant contributions
(i.e. Goto et al., 2007; Mizuno et al., 2008; Ogata and Goto, 2009) but so far shows a clear
trend of concentrating exclusively on the use of speech recognition transcripts. In
practice, however, metadata appears to be the method of choice for finding podcasts.
Although the exact algorithms used by web sites such as www.pluggd.tv and http://
suche.podcast.de are not public knowledge, interaction with such search engines gives
the impression that metadata and not speech transcripts are exploited for search
(however www.pluggd.tv does offer search within single episodes that is based on ASR