Does speaker’s voice enthusiasm affect social cue, cognitive load and transfer in multimedia learning?

Publication Date: 14 March 2020
Authors: Tze Wei Liew, Su-Mae Tan, Teck Ming Tan, Si Na Kew
Subject: Library & information science, Librarianship/library management, Library & information services
Tze Wei Liew
Human-Centric Technology Interaction SIG, Multimedia University,
Melaka, Malaysia
Su-Mae Tan
Department of Information Science and Technology, Multimedia University,
Melaka, Malaysia
Teck Ming Tan
Oulu Business School, University of Oulu, Oulu, Finland, and
Si Na Kew
Language Academy, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
Purpose: This study aims to examine the effects of voice enthusiasm (enthusiastic voice vs calm
voice) on social ratings of the speaker, cognitive load and transfer performance in multimedia learning.
Design/methodology/approach: Two laboratory experiments were conducted in which learners
learned from a multimedia presentation about a computer algorithm that was narrated by either an
enthusiastic human voice or a calm human voice.
Findings: Results from Experiment 1 revealed that the enthusiastic voice narration led to higher social
ratings of the speaker and higher transfer performance than the calm voice narration. Experiment 2
demonstrated that the enthusiastic voice led to higher affective social ratings (human-like and engaging) and
higher transfer performance than the calm voice. Moreover, the calm voice prompted a higher germane load
than the enthusiastic voice, which is consistent with the argument that prosodic cues in voice can influence
processing in multimedia learning among non-native speakers.
Originality/value: This study extends prior research on voice effects related to mechanization, accent,
dialect and slang in multimedia learning by examining the effects of voice enthusiasm in multimedia
learning.
Keywords: Voice, Cognitive load, Enthusiasm, Immediacy, Multimedia learning, Social agency theory
Paper type: Research paper
This research was supported by the Malaysian Ministry of Higher Education under the Fundamental
Research Grant Scheme (ID: FRGS/1/2019/SSI09/MMU/03/5). The authors would like to thank
Jessica Tee, Wei Ming Pang and Muhammad Mukhtar Bin Miz for their assistance in setting up the
computer laboratories. The authors thank the editor and anonymous reviewers for their valuable
comments. Last but not least, the authors thank the participants for their willingness to take part in
the experiments.
Received 26 November 2019
Revised 21 January 2020
Accepted 5 February 2020
Information and Learning Sciences
Vol. 121 No. 3/4, 2020
pp. 117-135
© Emerald Publishing Limited
DOI 10.1108/ILS-11-2019-0124
1. Introduction
E-learning is a form of multimedia instruction in which information is represented by both
visual elements (e.g. diagrams, maps, animations and illustrations) and verbal elements (e.g.
spoken narrations and on-screen text) (Mayer, 2017). Mayer and Moreno (1998) proposed the
cognitive theory of multimedia learning to describe how the visual and verbal information in
multimedia presentations is processed cognitively. According to this framework, multimedia
learning involves three crucial cognitive processes: selecting, organizing and integrating.
Selecting refers to learners focusing their attention on the relevant visual and verbal
information. After selection, learners engage in organizing, in which the selected visual and
verbal information is structured into a meaningful and coherent representation. Finally,
integrating occurs when prior knowledge is activated and used to connect the newly presented
information with existing knowledge schemas.
In accordance with the cognitive theory of multimedia learning, Mayer and his colleagues have
established a set of multimedia principles, which are evidence-based recommendations for
instructional design aimed at producing deep, meaningful learning (Clark and Mayer, 2016;
Mayer, 2017). One of these is the voice principle, which posits that people learn more deeply
when the words in a multimedia message are spoken in a human voice with a standard accent
rather than in a machine-synthesized voice or a human voice with a foreign accent (Mayer, 2005;
Mayer et al., 2003; Atkinson et al., 2005; Mayer and DaPra, 2012). This effect is attributed to
social agency theory, which states that multimedia instruction should be designed to trigger a
social interaction schema in the learner's mind, leading learners to treat the computer as a
social partner (Mayer et al., 2003; Mayer, 2005). Adopting this social interaction stance
encourages learners to engage deeply in selecting, organizing and integrating the instructional
message. Therefore, according to social agency theory, a human voice with a standard accent
fosters a stronger sense of social agency than a human voice with a foreign accent or a
computer-synthesized voice, and thus leads to superior learning outcomes.
However, beyond the voice characteristics of accent and mechanization (i.e. human versus
machine-synthesized voice), few studies have investigated voice enthusiasm in multimedia
learning environments. Mayer et al. (2003, p. 424) noted that additional work is needed to
pinpoint which aspects of voice are most important in promoting deep learning. Furthermore,
many multimedia presentations feature invisible narrators (disembodied speakers with no visual
features such as a face or body), whose voices are the only source of social cues (e.g. Khan
Academy videos), and the narration is often delivered in a pleasant but calm voice that does
not convey high enthusiasm. The current literature has shown that instructors' high-enthusiasm
behaviors, which involve visual nonverbal cues such as body gestures and facial expressions,
can benefit engagement and learning (Wang et al., 2019; Guo et al., 2014; Liew et al., 2017).
However, it is not known whether the positive effect of high enthusiasm also manifests in a
multimedia environment presented by a voice-only virtual speaker. Thus, grounded in social
agency theory, this paper examines whether an enthusiastic voice, as compared to a calm
(low-enthusiasm) voice, differently affects learners' perceived social cues, cognitive load and
learning outcomes.
2. Literature review
2.1 Social agency theory
According to social agency theory, imbuing a multimedia presentation with verbal and
non-verbal stimuli that convey social cues can lead learners to interpret the multimedia message
as a social communication process, which, in turn, encourages learners to put more effort in

