2006 - Multimedia learning - Working memory and the learning of word and picture diagrams

UFF

Glaucio Aranha

21/06/2021


Learning and Instruction 16 (2006) 526–537
www.elsevier.com/locate/learninstruc
Multimedia learning: Working memory and the learning of word and picture diagrams
Stephan Dutke (a, *), Mike Rinck (b)
(a) University of Kaiserslautern, Department of Psychology, Pfaffenbergstr. 95, D-67663 Kaiserslautern, Germany
(b) Radboud University Nijmegen, Clinical Psychology and Behavioural Science Institute, PO Box 9104, 6500 HE Nijmegen, The Netherlands
Abstract
From the cognitive model of multimedia learning proposed by [Schnotz, W., & Bannert, M. (2003). Construction and interference in learning from multiple representation. Learning and Instruction, 13, 141–156], two hypotheses regarding the learning of spatial arrangements of objects were derived: the integration hypothesis and the multiple source hypothesis. In the experiment, ninety-six participants first studied spatial arrangements of five objects each. Because participants were shown only word pairs or picture pairs depicting adjacent objects, the complete arrangements had to be inferred from these pairs. Afterwards, participants were tested using either object pairs or complete arrangements, and the test items consisted either of words or of pictures. In addition, the participants were divided into four groups according to their verbal and visuospatial working memory capacity. The results showed (a) that integrating pairs of objects into complete spatial arrangements required more working memory resources than evaluating the pairs, irrespective of whether the objects were represented by words or pictures, and (b) that integrating elements from different sources (verbal descriptions and pictorial depictions) required more working memory resources than integrating only depictive elements. The results provide evidence for the proposed internal structure of Schnotz and Bannert’s model. They are discussed with regard to individual differences in working memory capacity, cognitive load, and the design of multimedia-supported learning tasks.
© 2006 Elsevier Ltd. All rights reserved.
Keywords: Multimedia learning; Working memory; Comprehension of text and graphics
Recently, Schnotz and Bannert (2003) proposed a cognitive model of multimedia learning that integrates a considerable body of empirical findings from the text and picture comprehension literature. So far, the model has been evaluated primarily with learning performance data (Schnotz & Bannert, 1999, 2003). Designing multimedia learning environments, however, aims not only at enhancing learning outcomes but also at optimizing learning efficiency (e.g., Mayer & Moreno, 2003; Paas, Renkl, & Sweller, 2003). Optimizing efficiency requires data on both the learning outcomes and the cognitive resources that must be invested to achieve them. The amount of working memory resources required by a learning task is especially critical, because working memory resources are assumed to be strictly limited (e.g., Baddeley, 1986; Engle, Cantor, & Carullo, 1992; Mayer, 2003). For reasons explained below, Schnotz and Bannert’s model is especially suitable for deriving hypotheses about the working memory demands of learning from verbal and pictorial materials.
* Corresponding author. Tel.: +49 631 205 2721; fax: +49 631 205 3910.
E-mail address: dutke@rhrk.uni-kl.de (S. Dutke).
0959-4752/$ - see front matter © 2006 Elsevier Ltd. All rights reserved.
doi:10.1016/j.learninstruc.2006.10.002
Schnotz and Bannert’s (2003) model of multimedia learning aims to explain how information from different external representations is integrated. It consists of a descriptive and a depictive branch of processes (see Fig. 1). The descriptive branch comprises processes of symbol analysis that first construct a surface representation and then a propositional representation of the externally represented text. The depictive branch comprises analog structure mapping processes that first construct a visual image and then a mental model of the externally represented picture or diagram. As in pure (multi-level) text comprehension theories (e.g., van Dijk & Kintsch, 1983; Gernsbacher, 1990; Graesser, Millis, & Zwaan, 1997; Graesser, Singer, & Trabasso, 1994; Johnson-Laird, 1983; Kintsch, 1998; Zwaan, Langston, & Graesser, 1995), the mental model is assumed to be constructed on the basis of the surface representation and the propositional representation of the text. In contrast to these representations, the mental model does not represent features of the text itself but features of the entities the text refers to (Glenberg, Meyer, & Lindem, 1987). Going beyond the scope of pure text comprehension models, Schnotz and Bannert assume a third source of information for the construction of the mental model: the visual image generated from the external picture or diagram. In line with Johnson-Laird’s theory (1983, 1996), the mental model differs from the visual image in that (a) the mental model is not bound to specific sensory modalities, (b) only the task-relevant graphical or pictorial elements of the visual image, not all of them, are mapped onto the mental model, and (c) the mental model is enriched by general knowledge. To summarize, the mental model is the representation that integrates propositions from the text base, pictorial elements from the visual image, and general world knowledge into a new, coherent structure representing the entities to which text and picture jointly refer.
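The flow of representations described above can be pictured as a small directed graph. The sketch below is an illustrative encoding of our own, not part of Schnotz and Bannert’s (2003) presentation; the node labels and the names `FEEDS` and `sources_of` are hypothetical, and the graph is simplified (it omits the interactions between branches at intermediate levels). Walking the edges backwards lists every representation that feeds a given one.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical, simplified encoding of the model: each key lists the
# representations it contributes to.
FEEDS: Dict[str, List[str]] = {
    # descriptive branch
    "external text": ["surface representation"],
    "surface representation": ["propositional representation", "mental model"],
    "propositional representation": ["mental model"],
    # depictive branch
    "external picture": ["visual image"],
    "visual image": ["mental model"],
    # general knowledge enriches the mental model
    "world knowledge": ["mental model"],
}


def sources_of(target: str) -> Set[str]:
    """Return every representation from which `target` can be reached."""
    # Invert the edges so we can walk backwards from the target.
    parents: Dict[str, List[str]] = {}
    for src, dsts in FEEDS.items():
        for dst in dsts:
            parents.setdefault(dst, []).append(src)
    seen: Set[str] = set()
    queue = deque([target])
    while queue:  # breadth-first search over the inverted edges
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


print(sorted(sources_of("mental model")))
```

In this simplified graph, the propositional representation draws only on text-side nodes, whereas the single mental model collects input from both branches and from world knowledge.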
This integration process is assumed to require working memory resources. Correlations between reading comprehension and various working memory span measures have been reported for children (De Jonge & De Jonge, 1996) as well as for adults (Hacker & Osterland, 1995). Individuals with higher reading ability were shown to perform working memory updating processes more reliably than low-ability readers (Palladino, Cornoldi, De Beni, & Pazzaglia, 2001). Moreover, with regard to mental model construction, Friedman and Miyake (2000) demonstrated that the efficiency of evaluating text probes requiring spatial inferences correlated more highly with spatial working memory span than with verbal working memory span, whereas the opposite was true for text probes requiring nonspatial inferences. To summarize, the construction, updating, and use of mental models seem to require quite different types of working memory resources.
Fig. 1. A cognitive model of multimedia learning (Schnotz & Bannert, 2003, p. 145). Copyright Pergamon Press, reprinted with permission.
From Schnotz and Bannert’s model, at least two hypotheses about the amount of working memory resources required in multimedia comprehension can be derived. We refer to the first one as the integration hypothesis. It states that a learning task involving the integration of several propositions into a coherent mental model requires more verbal working memory resources than a task involving recognition of single propositions. A corresponding prediction applies to pictorial information: a task involving the integration of several pictorial elements into a coherent mental model requires more visuospatial working memory resources than a task involving recognition of single pictorial elements. The integration hypothesis follows from Schnotz and Bannert’s model for two reasons. First, Schnotz and Bannert (2003) assume that comprehension is a continuous process in which mental structures are constructed step by step during learning and are updated by currently processed information (verbal or pictorial). This integration process requires old and new information to be available simultaneously, which taxes working memory resources. Accessing the integrated mental model requires not only retrieving a particular proposition or pictorial element but also retrieving its (inferred) relation(s) to other elements. Although this argument clearly corresponds to Schnotz and Bannert’s model, it also fits several other mental model theories of comprehension (e.g., van Dijk & Kintsch, 1983; Gernsbacher, 1990; Graesser et al., 1994, 1997; Johnson-Laird, 1983; Kintsch, 1998; Zwaan et al., 1995). The second argument, however, is unique to Schnotz and Bannert’s model. Their model consists of two processing branches, each specialized for a different type of representation. Based on a concept of working memory that differentiates subsystems suited to manipulating verbal (symbolic) and visuospatial (analog) representations (e.g., Baddeley, 1986, 1992), it follows that integration processes in the descriptive branch primarily tax verbal working memory resources, whereas integration processes in the depictive branch primarily tax visuospatial working memory resources. This argument is unique to Schnotz and Bannert’s model because pure text comprehension theories, as well as multimedia-oriented theories such as Sweller’s (1994) cognitive load theory, do not differentiate between representational formats in processing verbal and pictorial materials. Mayer’s cognitive theory of multimedia learning (Mayer & Moreno, 2002, 2003) has an architecture similar to Schnotz and Bannert’s model in that it distinguishes two processing “channels”, one for words or text and one for pictures (Fig. 2). However, the distinction between the “verbal-auditory channel” and the “visual-pictorial channel” is based on a combination of modality (how the information is perceived) and representational format (how the perceived information is externally represented). In contrast, Schnotz and Bannert (2003) explicitly stated that the descriptive and depictive branches of their model are specialized for processing information represented in specific formats, irrespective of the modality in which this information is perceived. This conception corresponds more closely to the distinction between verbal and visuospatial working memory resources than Mayer’s theory does.
The second hypothesis (the multiple source hypothesis) refers to the interaction of the descriptive and the depictive branch in the Schnotz and Bannert model. This feature is best explained by comparing Schnotz and Bannert’s model to Mayer and Moreno’s (2002) cognitive theory of multimedia learning (cf. Fig. 2). Mayer and Moreno assume that processing in the two channels results in two mental models, a “verbal mental model” and a “visual mental model”. In both channels, information is processed independently until the two mental models are established; referential connections between the models are constructed only at this level of processing. In contrast, Schnotz and Bannert (2003) assume that comprehension results in only one mental model, constructed from elements of the visual image and the propositional representation. Even the text surface representation may contribute to the construction of a referential mental model (see Fig. 1). Thus, although the Schnotz and Bannert model assumes two processing branches, the branches work independently only at their most basic level, that is, in subsemantic processing of verbal materials and perceptual processing of pictorial materials. At higher levels, Schnotz and Bannert expect the depictive and descriptive branches to interact, most intensively in creating a common, modality-unspecific mental model. Interrelating the two processing branches, however, produces coordination costs that absorb working memory resources (Baddeley, 1986; Hagendorf & Sá, 1996; Oberauer, 1993). This assumption leads to the hypothesis that integrating elements from different sources (verbal descriptions and pictorial depictions) requires more working memory resources than integrating elements from one source.
Fig. 2. A cognitive theory of multimedia learning (Mayer & Moreno, 2002, p. 111). Copyright Pergamon Press, reprinted with permission.
The multiple source hypothesis can be studied with diagrams in which either words or other textual elements are depicted in a specific spatial relation to each other (“word diagrams”, see Fig. 3) or pictures are depicted in a specific spatial relation to each other (“picture diagrams”, see Fig. 4). Learning word diagrams requires (a) the analysis of symbol structures (descriptive branch) to construct a surface representation and (b) structure mapping processes (depictive branch) to map the spatial relation between the objects onto the mental model. Processing in both branches needs to be highly interrelated, because mapping the spatial relations depicted in the diagram onto the mental model requires a representation of the elements (denoted by words) that constitute the spatial relation. Because of this interrelatedness, working memory demands are assumed to be high. In learning picture diagrams, processing is restricted to the depictive branch. Because no costs of interrelating descriptive and depictive processing emerge, working memory demands are predicted to be lower for picture diagrams than for word diagrams. Mayer and Moreno’s (2002) theory would not predict this difference. In learning word diagrams, processing would stop at the level of the word base. According to their theory, a verbal mental model cannot be constructed, because the words alone (in a word diagram) provide no information about the spatial relation between the denoted objects. The spatial relation is processed in the visual-pictorial channel and is finally (and solely) represented in the visual mental model. Thus, no coordination costs can emerge between the processing channels, with either word diagrams or picture diagrams. Note that corresponding word and picture diagrams have the same informational content and involve the same modality. They differ only with regard to the type of internal representations required to build a meaningful mental model.
The integration hypothesis can also be tested with these diagrams. In this study, participants learned (a) elements of word diagrams (two words denoting objects in a specific spatial relation) and (b) elements of picture diagrams (a color drawing depicting two objects in a specific spatial relation), whereas the complete object arrangement consisted of five words or pictures. During the learning phase, however, the complete arrangement was never shown to the participants. After learning all elements of an object arrangement separately, participants were tested for recognition of (a) the presented elements and (b) the complete, integrated object arrangement. Both elements and integrated arrangements were presented as word diagrams or as picture diagrams. According to the integration hypothesis, testing integrated object arrangements will yield longer recognition times than testing object pairs, irrespective of whether they are tested with word or picture diagrams. According to the multiple source hypothesis, testing word diagrams will yield longer recognition times than testing picture diagrams, irrespective of whether they are tested with elements or with complete object arrangements. The assumption that increased recognition times are due to increased demands on working memory was tested by distinguishing between individuals with higher and lower verbal or visuospatial working memory capacity. The hypothesized differences between recognizing (a) elements vs. integrated arrangements and (b) word diagrams vs. picture diagrams were expected to be greater for participants with lower working memory capacity. Thus, performance differences between individuals who differ in working memory capacity will be interpreted as indicating different working memory demands.
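The two predictions, and the expectation that both effects grow as working memory capacity shrinks, can be made concrete with a toy cost model. This is a minimal sketch, not the authors’ model: the cost values, the doubling factor for low-capacity learners, and the name `predicted_load` are hypothetical and serve only to encode the predicted ordering of conditions.

```python
# Toy cost model for the two hypotheses. All numbers are hypothetical and only
# encode the predicted ordering of conditions; none were estimated from data.
BASE = 1.0
INTEGRATION_COST = 1.0     # complete arrangements vs. pairs (integration hypothesis)
COORDINATION_COST = 0.5    # word vs. picture diagrams (multiple source hypothesis)
LOW_CAPACITY_FACTOR = 2.0  # extra demands assumed to weigh more with low capacity


def predicted_load(complexity: str, material: str, capacity: str) -> float:
    """Relative working memory load of one test condition, in arbitrary units.

    complexity: "pair" or "arrangement"; material: "picture" or "word";
    capacity: "low" or "high".
    """
    extra = 0.0
    if complexity == "arrangement":
        extra += INTEGRATION_COST
    if material == "word":
        extra += COORDINATION_COST
    if capacity == "low":
        extra *= LOW_CAPACITY_FACTOR
    return BASE + extra


for capacity in ("low", "high"):
    integration = (predicted_load("arrangement", "picture", capacity)
                   - predicted_load("pair", "picture", capacity))
    word_effect = (predicted_load("pair", "word", capacity)
                   - predicted_load("pair", "picture", capacity))
    print(f"{capacity} capacity: integration effect {integration}, word effect {word_effect}")
```

With these parameters, both effects come out twice as large for low-capacity learners. Mayer and Moreno’s (2002) account, which predicts no coordination costs for either diagram type, could be represented in this sketch by setting `COORDINATION_COST` to 0.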
Strawberries Apple Banana
Pear Pineapple
Fig. 3. A sample arrangement of objects denoted by words (word diagram). Note: Complete arrangements were used for testing only; they were not studied by the participants. Original words were in German.
1. Method
1.1. Participants
Ninety-six students (37 men, 59 women) from various departments of Dresden University of Technology participated in the experiment, either for course credit or for a small monetary compensation. Depending on their verbal working memory capacity and their visuospatial working memory capacity, the participants were divided into four groups of comparable size (between n = 23 and n = 26; low vs. high verbal capacity combined with low vs. high visuospatial capacity).
1.2. Study materials
A total of seven spatial arrangements were created for the experiment (a practice arrangement and six experimental ones). Each spatial arrangement consisted of five objects located at five of the six possible positions of a matrix with two rows and three columns. Fig. 3 shows a sample experimental arrangement of fruit items denoted by words. The other experimental arrangements contained desk items, toys, animals, musical instruments, or tools. Across the six experimental arrangements, each of the six possible positions remained unfilled exactly once. There were two versions of each arrangement: word diagrams such as the one shown in Fig. 3 and structurally equivalent picture diagrams, in which the words were replaced by simple color drawings (see Fig. 4). Importantly, however, participants never saw the complete arrangements during the study phase of the experiment. Instead, for each arrangement, they were shown four pairs of adjacent objects and had to infer the complete spatial arrangement from these four pairs. For half of the arrangements, all pairs contained words; for the other half, all pairs showed colored pictures.
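To make the inference task concrete, the sketch below chains four studied pairs into positions in the 2 × 3 matrix. It is an illustration under stated assumptions, not the authors’ materials: “adjacent” is taken to mean horizontal or vertical neighbors, the `left`/`above` relations for the fruit example are our reading of the flattened Fig. 3 (the text does not state each fruit’s cell), and the names `integrate` and `render` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of one studied pair: (first, relation, second).
#   "left":  first is directly left of second (same row)
#   "above": first is directly above second (same column)
Pair = Tuple[str, str, str]
OFFSETS = {"left": (0, 1), "above": (1, 0)}  # (row, column) step from first to second
ROWS, COLS = 2, 3


def integrate(pairs: List[Pair]) -> Dict[str, Tuple[int, int]]:
    """Chain adjacent pairs into (row, column) positions in a 2 x 3 matrix.

    Objects not connected to the chain of pairs are left unplaced.
    """
    pos: Dict[str, Tuple[int, int]] = {pairs[0][0]: (0, 0)}  # provisional anchor
    changed = True
    while changed:  # repeat until no further object can be placed
        changed = False
        for a, rel, b in pairs:
            dr, dc = OFFSETS[rel]
            if a in pos and b not in pos:
                pos[b] = (pos[a][0] + dr, pos[a][1] + dc)
                changed = True
            elif b in pos and a not in pos:
                pos[a] = (pos[b][0] - dr, pos[b][1] - dc)
                changed = True
    # Shift so that the topmost row and leftmost column become index 0.
    r0 = min(r for r, _ in pos.values())
    c0 = min(c for _, c in pos.values())
    placed = {obj: (r - r0, c - c0) for obj, (r, c) in pos.items()}
    if any(r >= ROWS or c >= COLS for r, c in placed.values()):
        raise ValueError("pairs do not fit into a 2 x 3 matrix")
    return placed


def render(placed: Dict[str, Tuple[int, int]]) -> List[List[str]]:
    """Return the matrix row by row, with '-' marking the unfilled position."""
    grid = [["-"] * COLS for _ in range(ROWS)]
    for obj, (r, c) in placed.items():
        grid[r][c] = obj
    return grid


# Study order with maximal argument overlap, as in the procedure section.
study_pairs = [
    ("Strawberries", "left", "Apple"),
    ("Apple", "left", "Banana"),
    ("Banana", "above", "Pineapple"),
    ("Pear", "left", "Pineapple"),
]
for row in render(integrate(study_pairs)):
    print(" ".join(f"{cell:<13}" for cell in row))
```

Because consecutive pairs share an object, a single pass over the list already places every object; the loop simply repeats until nothing changes, so other pair orders would also work.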
Fig. 4. A sample arrangement of objects denoted by pictures (picture diagram), equivalent to the word diagram shown in Fig. 3. Note: Complete arrangements were used for testing only; they were not studied by the participants. Original pictures were in color.
1.3. Test materials
Directly after studying each arrangement, participants received 24 test items assessing memory for the arrangement just studied: six word pairs, six picture pairs, six complete word arrangements (Fig. 3), and six complete picture arrangements (Fig. 4), irrespective of whether the participant had just studied pictures or words. Of the six picture pairs shown, three were correct and three were incorrect. Of the six complete picture arrangements, one was correct and the other five were incorrect. The same was true for word pairs and complete word arrangements, respectively.
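The composition of the 24-item test list can be written down as a short generator. This is a hypothetical sketch: the paper does not say how the incorrect pairs or arrangements were constructed, so each item carries only its material, complexity, and correctness; the names `TestItem` and `build_test_items` are ours, and the shuffle reflects the random presentation order described in the procedure.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TestItem:
    material: str    # "word" or "picture"
    complexity: str  # "pair" or "arrangement"
    correct: bool    # whether the depicted spatial relations are correct


def build_test_items(rng: Optional[random.Random] = None) -> List[TestItem]:
    """Build the 24 test items for one studied arrangement (6 per material x complexity)."""
    rng = rng or random.Random()
    items: List[TestItem] = []
    for material in ("word", "picture"):
        # Pairs: three correct, three incorrect.
        items += [TestItem(material, "pair", i < 3) for i in range(6)]
        # Complete arrangements: one correct, five incorrect.
        items += [TestItem(material, "arrangement", i == 0) for i in range(6)]
    rng.shuffle(items)  # items are presented in random order
    return items


items = build_test_items(random.Random(0))
print(len(items), sum(item.correct for item in items))
```

Only 8 of the 24 items are correct, so a participant who always answered “correct” would be right on a third of the trials.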
1.4. Working memory tests
The reading span test (RST) employed in this study is similar to the sentence span test introduced by Daneman and Carpenter (1980). Unlike the sentence span test, however, the RST ensures that both the processing function and the memory function of verbal working memory are tapped by the task. In the RST, single sentences are presented to the participants. Each sentence is shown for 5 s, and the participant has to understand the sentence as well as memorize its final word. The RST starts at what is called level 2 of difficulty: two sentences are presented successively, and the participant first reads both sentences. Then he or she writes down their meanings and final words on a sheet of paper. For the meaning, a few keywords are sufficient, whereas the final word has to be recalled verbatim. The RST starts with five trials at level 2 and becomes progressively more difficult, until five trials each at levels 2, 3, 4, 5, and 6 have been completed (if performance breaks down earlier, the test may be aborted). Each sentence for which the participant gets both the meaning and the final word correct earns one point, yielding a maximum of 100 points in the RST.
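The scoring rule fixes the maximum: five trials at each of levels 2 to 6 present 5 × (2 + 3 + 4 + 5 + 6) = 100 sentences, one point each. A minimal sketch of that arithmetic and of the per-sentence rule follows; the function names and the (meaning, final word) tuple encoding are our own assumptions.

```python
from typing import List, Tuple

RST_LEVELS = [2, 3, 4, 5, 6]  # sentences per trial at each difficulty level
RST_TRIALS_PER_LEVEL = 5


def max_rst_score() -> int:
    """One point per sentence: 5 trials x (2 + 3 + 4 + 5 + 6) sentences."""
    return RST_TRIALS_PER_LEVEL * sum(RST_LEVELS)


def score_rst(responses: List[Tuple[bool, bool]]) -> int:
    """Each tuple is (meaning_correct, final_word_correct) for one sentence.

    A sentence earns a point only if both parts are correct; sentences never
    reached because the test was aborted simply contribute nothing.
    """
    return sum(1 for meaning_ok, word_ok in responses if meaning_ok and word_ok)


print(max_rst_score())  # 100
```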
The spatial span test (SST) followed the procedure introduced by Shah and Miyake (1996). It is formally equivalent to the RST in that it assesses both processing and retention of information, in this case visuospatial information. In the SST, participants receive sequences of mental rotation tasks. In its easiest version, a single letter is presented for 3 s. The letter is either correct or mirror-imaged, and it is rotated around its vertical axis by 0, 45, 90, 135, 180, 225, 270, or 315°. Within the 3-s limit, the participant has to indicate by pressing a key whether the letter is correctly printed and, simultaneously, memorize the rotation angle of the letter. For instance, if a correctly printed letter F were presented rotated by 180°, the participant would have to press the “correct” key on the computer keyboard and memorize that the “head” of the letter (the upper horizontal line of the F) points straight down. This direction is then written down on an answer sheet. At level 2 of the SST, two letters are presented successively, and the participant first judges whether each is correctly printed and then writes down the directions of the letters’ heads. At levels 3, 4, and 5, the number of letters presented on each trial increases correspondingly. The test starts at level 2 and becomes progressively more difficult. In the SST used here, each participant received a maximum of 20 trials (levels 2, 3, 4, and 5, five times each). Each time the participant responds correctly to the mirror question and correctly indicates the direction of the letter, he or she receives a point, yielding a maximum of 70 points in the SST.
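The SST arithmetic works the same way: five trials at each of levels 2 to 5 present 5 × (2 + 3 + 4 + 5) = 70 letters. The sketch below also maps rotation angles to head directions. It treats the rotation as a clockwise turn in the picture plane, which is consistent with the 180° example, but the direction of rotation is our assumption; all names are hypothetical.

```python
from typing import List, Tuple

SST_LEVELS = [2, 3, 4, 5]  # letters per trial at each level
SST_TRIALS_PER_LEVEL = 5
ANGLES = [0, 45, 90, 135, 180, 225, 270, 315]
# Assumption: clockwise rotation, so the head of an upright letter
# (pointing up at 0 degrees) points right at 90 degrees.
DIRECTIONS = ["up", "up-right", "right", "down-right",
              "down", "down-left", "left", "up-left"]


def head_direction(angle: int) -> str:
    """Direction of the letter's head after rotating it by `angle` degrees."""
    return DIRECTIONS[ANGLES.index(angle)]


def max_sst_score() -> int:
    """One point per letter: 5 trials x (2 + 3 + 4 + 5) letters."""
    return SST_TRIALS_PER_LEVEL * sum(SST_LEVELS)


def score_sst(responses: List[Tuple[bool, bool]]) -> int:
    """Each tuple is (mirror_judgment_correct, direction_correct) for one letter."""
    return sum(1 for mirror_ok, direction_ok in responses if mirror_ok and direction_ok)


print(max_sst_score(), head_direction(180))
```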
1.5. Procedure
The study consisted of two separate sessions on consecutive days. The experiment was conducted during the first session. The participants were informed that they would learn five spatial arrangements of objects (the practice arrangement followed by four experimental ones selected randomly from the six existing ones). The practice arrangement was identical for all participants, whereas the experimental arrangements differed according to a rotation scheme that ensured that each arrangement was presented equally often across participants. As mentioned above, participants saw only four adjacent pairs of objects for each arrangement, and they had to infer the complete spatial arrangement from these four pairs. The pairs contained either two words or two colored pictures, and each pair was presented in the center of the computer screen for 3 s. Thus, the complete arrangement could not be inferred from the absolute positions of objects on the screen. The order of the four pairs varied between arrangements, but for each arrangement, the order was identical for all participants. The orders were chosen such that argument overlap between consecutive pairs was maximized (e.g., for the sample arrangement shown in Figs. 3 and 4, the order was strawberries–apple, apple–banana, banana–pineapple, pineapple–pear). Pilot tests had shown that other orders were too difficult for many potential participants. For each of the five arrangements shown to each participant, the study phase was followed directly by the test phase. In this phase, the 24 test items were presented in random order. For each test item, participants had to indicate as accurately and as quickly as possible whether the depicted spatial relations were correct. No feedback was given. After completing the test phase, participants were free either to take a short break or to continue with the next arrangement. It took participants about 60 min to complete the experiment.
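One way to implement such a rotation scheme is sketched below. The paper does not describe its scheme; the cyclic assignment, the fixed split into two picture and two word arrangements, and the name `assign` are hypothetical choices that happen to balance both the arrangements and the study materials across 96 participants.

```python
from collections import Counter
from typing import List, Tuple

N_ARRANGEMENTS = 6    # experimental arrangements available
PER_PARTICIPANT = 4   # experimental arrangements each participant studies
MATERIALS = ["pictures", "pictures", "words", "words"]  # two of each per participant


def assign(participant: int) -> List[Tuple[int, str]]:
    """Hypothetical cyclic rotation: participant p studies arrangements
    p, p+1, p+2, p+3 (mod 6), the first two as pictures and the last two as words.
    The practice arrangement (identical for everyone) would precede these."""
    start = participant % N_ARRANGEMENTS
    arrangements = [(start + k) % N_ARRANGEMENTS for k in range(PER_PARTICIPANT)]
    return list(zip(arrangements, MATERIALS))


# Tally how often each (arrangement, material) combination occurs across 96 participants.
counts = Counter(cell for p in range(96) for cell in assign(p))
for arrangement in range(N_ARRANGEMENTS):
    print(arrangement, counts[(arrangement, "pictures")], counts[(arrangement, "words")])
```

Every arrangement is studied by 64 participants, 32 times as pictures and 32 times as words, because each arrangement occupies each of the four slots once per cycle of six participants.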
During the second session on the next day, participants took the reading span test and the spatial span test, designed to measure verbal working memory capacity and visuospatial working memory capacity, respectively. The order of the two tests was counterbalanced across participants. It took participants about 45 min to complete both working memory tests. Afterwards, they were debriefed, thanked, and compensated for their participation.
1.6. Design
The experiment followed a mixed design with the within-subjects factors ‘study materials’ (pictures vs. words),
‘test materials’ (pictures vs. words), and ‘test complexity’ (object pairs vs. complete arrangements). ‘Verbal working
memory capacity’ and ‘visuospatial working memory capacity’ were used as between-subjects factors by creating
four groups (verbal capacity low or high and visuospatial capacity low or high, defined by median splits according
to the participants’ scores in the RST and the SST). The study materials were varied across arrangements, such
that each participant studied two arrangements of pictures and two arrangements of words. Test materials and test
complexity were varied within arrangements, such that each arrangement was tested with picture pairs, word pairs,
complete picture arrangements, and complete word arrangements. The dependent variables were the reaction times to the test probes in the test phase and the error rates of these responses.
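Crossing the factors yields the cells of the analysis: 2 × 2 × 2 = 8 within-subjects conditions for every participant and 2 × 2 = 4 between-subjects groups, 32 cells in all, which matches the 2 × 2 × 2 × 2 × 2 ANOVA reported in the Results. The enumeration below is a sketch; the dictionary and variable names are ours.

```python
from itertools import product

WITHIN = {
    "study materials": ["pictures", "words"],
    "test materials": ["pictures", "words"],
    "test complexity": ["object pairs", "complete arrangements"],
}
BETWEEN = {
    "verbal WM capacity": ["low", "high"],
    "visuospatial WM capacity": ["low", "high"],
}

within_cells = list(product(*WITHIN.values()))    # conditions every participant receives
between_groups = list(product(*BETWEEN.values())) # median-split groups
all_cells = list(product(between_groups, within_cells))

print(len(within_cells), len(between_groups), len(all_cells))  # 8 4 32
```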
2. Results
2.1. Preliminary analyses
Both the reading span test (RST) and the spatial span test (SST) yielded considerable variance in the participants’ scores, allowing low- and high-capacity participants to be separated (ranges 4–67 on the RST and 0–53 on the SST). For the median splits, the cut-offs were set between 22 and 23 for the RST and between 9 and 10 for the SST, in order to yield four groups of comparable sample size. The mean scores and sample sizes for each of the four groups are shown in Table 1. RST scores and SST scores were only marginally correlated (r = .17, p = .09).
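The cut-offs amount to a simple assignment rule. A minimal sketch, assuming integer test scores; the function name `wm_group` is hypothetical.

```python
RST_CUTOFF = 23  # RST split between 22 and 23: 23 or more counts as high verbal capacity
SST_CUTOFF = 10  # SST split between 9 and 10: 10 or more counts as high visuospatial capacity


def wm_group(rst: int, sst: int) -> str:
    """Assign a participant to one of the four median-split groups."""
    verbal = "high" if rst >= RST_CUTOFF else "low"
    spatial = "high" if sst >= SST_CUTOFF else "low"
    return f"verbal {verbal}, spatial {spatial}"


print(wm_group(22, 9), "|", wm_group(23, 10))
```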
Reaction times (RTs) to test probes were subjected to analyses of variance (ANOVA) after outlier RTs and RTs of incorrect responses had been excluded. Outliers were defined as the lowest and highest 2% of the correct RTs in each group. The RTs were analyzed in a mixed-factors 2 × 2 × 2 × 2 × 2 ANOVA with the factors reading span group, spatial span group, study materials, test materials, and test complexity. Mean RTs and standard deviations for this analysis are shown in Table 2. Corresponding analyses were computed for the error rates shown in Table 3. These analyses yielded very similar results, although, as expected, the observed effects were generally smaller for error rates than for RTs. Moreover, there was no indication of any speed–accuracy trade-off in the data. Therefore, we report only the results for RTs.
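The exclusion rule can be sketched as follows. This is an illustration under assumptions: `trim_rts` is a hypothetical name, it operates on the trials of one group at a time, and because the paper does not say how fractional counts were rounded, the number of trials cut from each tail is taken as floor(0.02 × n).

```python
import math
from typing import List, Tuple


def trim_rts(trials: List[Tuple[float, bool]], proportion: float = 0.02) -> List[float]:
    """Drop incorrect trials, then the fastest and slowest `proportion` of correct RTs.

    `trials` holds (rt_ms, correct) pairs for one group. The number cut from
    each tail is floor(proportion * n); this rounding is an assumption.
    """
    correct = sorted(rt for rt, ok in trials if ok)
    k = math.floor(proportion * len(correct))
    return correct[k:len(correct) - k]


# 100 correct trials (1..100 ms) plus one incorrect trial that is excluded first.
example = [(float(i), True) for i in range(1, 101)] + [(5000.0, False)]
kept = trim_rts(example)
print(len(kept), kept[0], kept[-1])
```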
Verbal and visuospatial capacities yielded the expected main effects on RTs: participants with higher verbal capacity responded more quickly than those with lower capacity [3182 vs. 3826 ms; F(1, 92) = 13.99, p < .001], and the same was true for participants with higher visuospatial capacity compared with those with lower capacity [3158 vs. 3913 ms; F(1, 92) = 16.59, p < .001]. These beneficial effects were additive, as there was no interaction of verbal and visuospatial capacity, F(1, 92) < 1.
A marginally significant main effect of study materials on RTs in the test phase [F(1, 92) = 3.72, p = .057] suggested that studying pictures was easier than studying words. This was true for all participant groups, as there was no interaction of study materials with either reading span [F(1, 92) < 1] or spatial span [F(1, 92) = 2.67, p = .106].
Table 1
Mean scores in the reading span test and the spatial span test for the four participant groups defined by median splits
Group                                Reading span test    Spatial span test
Verbal low, spatial low (n = 26)     13.5 (6.0)           3.7 (3.0)
Verbal low, spatial high (n = 24)    17.4 (4.7)           20.6 (9.9)
Verbal high, spatial low (n = 23)    33.6 (8.9)           3.4 (2.9)
Verbal high, spatial high (n = 23)   35.9 (11.0)          23.0 (10.4)
2.2. Hypothesis testing
The integration hypothesis predicts that complete arrangements should be more demanding to evaluate than pairs. Indeed, the main effect of test complexity was highly significant [F(1,92) = 94.20, p < .001]: participants were faster to judge object pairs than complete arrangements. As expected, this effect interacted with both verbal span [F(1,92) = 11.80, p < .01] and spatial span [F(1,92) = 5.25, p < .05]: the disadvantage for complete arrangements decreased with increasing verbal and visuospatial resources, and it almost vanished for participants who scored high in both working memory tests (see Fig. 5). Consequently, these latter participants showed only a marginally significant disadvantage for tests of complete arrangements [F(1,22) = 4.24, p = .051], whereas the other three groups showed highly significant disadvantages [all F > 22; all p < .001].
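The shrinking disadvantage for complete arrangements can be traced directly in the cell means of Table 2. The sketch below subtracts the mean RT for object pairs from the mean RT for complete arrangements within each group, averaging the four study/test material combinations without weighting (an assumption; these are descriptive differences, not the tested effects). `PAIRS`, `ARRANGEMENTS`, and `complexity_disadvantage` are hypothetical names.

```python
# Table 2 cell means (ms). For each (verbal, spatial) group, the four values
# follow the column order: study pictures/test pictures, study pictures/test
# words, study words/test pictures, study words/test words.
PAIRS = {
    ("low", "low"): [3012, 3898, 3535, 3788],
    ("low", "high"): [2546, 3222, 2852, 3224],
    ("high", "low"): [2618, 3305, 3223, 3357],
    ("high", "high"): [2472, 2825, 2579, 2783],
}
ARRANGEMENTS = {
    ("low", "low"): [4111, 5270, 5184, 5234],
    ("low", "high"): [3406, 4418, 4110, 3952],
    ("high", "low"): [3593, 3912, 4164, 4040],
    ("high", "high"): [2791, 3418, 3039, 2790],
}


def mean(values):
    return sum(values) / len(values)


def complexity_disadvantage(pairs, arrangements):
    """Extra RT (ms) for complete arrangements over object pairs, per group,
    averaged without weighting over the four material combinations."""
    return {g: mean(arrangements[g]) - mean(pairs[g]) for g in pairs}


for group, cost in complexity_disadvantage(PAIRS, ARRANGEMENTS).items():
    print(group, cost)
```

The differences fall from about 1392 ms (low verbal, low spatial) to about 345 ms (high verbal, high spatial), mirroring the pattern plotted in Fig. 5.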
The multiple source hypothesis predicts that evaluating word diagrams should be more demanding than evaluating picture diagrams. Indeed, participants responded more quickly to pictures than to words, yielding a significant main effect of test materials [F(1,92) = 27.49, p < .001]. We also observed a marginally significant interaction of test materials and verbal span [F(1,92) = 3.79, p = .055]: the disadvantage for words was smaller for participants with higher verbal working memory resources. However, no interaction of test materials and spatial span was observed [F(1,92) < 1]. This pattern of interactions is illustrated by Fig. 6, and it was corroborated by post hoc tests: both groups with low verbal span showed a highly significant disadvantage for words [both F > 8.7, both p < .01], whereas the two groups with high verbal span did not [both F < 2.87, n.s.]. Finally, a study-test compatibility effect occurred: RTs were shorter when study materials and test materials (pictures vs. words) were identical, yielding a significant interaction of study materials and test materials [F(1,92) = 54.73, p < .001].

Table 2
Mean RTs (in ms) in the test phase (with standard deviations), broken down by reading span group, spatial span group, test complexity, study materials, and test materials

                      Study pictures                Study words
                      Test pictures  Test words     Test pictures  Test words
Verbal low, spatial low
  Test object pairs   3012 (944)     3898 (1316)    3535 (1308)    3788 (1353)
  Test arrangements   4111 (1934)    5270 (2370)    5184 (2452)    5234 (2408)
Verbal low, spatial high
  Test object pairs   2546 (813)     3222 (1170)    2852 (806)     3224 (1269)
  Test arrangements   3406 (1444)    4418 (2181)    4110 (1721)    3952 (1754)
Verbal high, spatial low
  Test object pairs   2618 (1024)    3305 (1052)    3223 (1383)    3357 (1210)
  Test arrangements   3593 (1627)    3912 (1333)    4164 (2087)    4040 (2256)
Verbal high, spatial high
  Test object pairs   2472 (1005)    2825 (969)     2579 (954)     2783 (961)
  Test arrangements   2791 (1125)    3418 (1715)    3039 (1659)    2790 (1455)

Table 3
Mean percent error in the test phase (with standard deviations), broken down by reading span group, spatial span group, test complexity, study materials, and test materials

                      Study pictures                Study words
                      Test pictures  Test words     Test pictures  Test words
Verbal low, spatial low
  Test object pairs   5.9 (9.5)      5.0 (8.5)      10.1 (12.5)    9.5 (12.0)
  Test arrangements   4.8 (8.1)      6.6 (8.8)      7.1 (8.1)      6.9 (7.9)
Verbal low, spatial high
  Test object pairs   2.3 (5.9)      3.1 (6.6)      4.3 (8.4)      4.3 (8.6)
  Test arrangements   2.4 (5.7)      4.3 (6.7)      4.2 (6.2)      3.0 (5.6)
Verbal high, spatial low
  Test object pairs   2.0 (5.3)      2.5 (5.5)      6.0 (10.2)     4.9 (9.1)
  Test arrangements   2.2 (5.7)      2.5 (7.0)      4.4 (7.6)      4.2 (5.8)
Verbal high, spatial high
  Test object pairs   1.8 (5.8)      2.7 (5.8)      3.8 (7.8)      3.4 (7.8)
  Test arrangements   1.3 (3.5)      1.1 (3.3)      2.5 (5.2)      3.1 (5.7)
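Both remaining effects (the disadvantage for test words and the study-test compatibility effect) can be recomputed descriptively from the cell means in Table 2. In the sketch below, each group's eight cells are averaged without weighting (an assumption; the reported F tests are based on participant-level data). `COLUMNS`, `TABLE2`, `word_disadvantage`, and `compatibility_effect` are hypothetical names.

```python
# Table 2 cell means (ms), one row per test complexity level. The column order
# within each row follows COLUMNS: (study materials, test materials).
COLUMNS = [("pictures", "pictures"), ("pictures", "words"),
           ("words", "pictures"), ("words", "words")]
TABLE2 = {
    ("low", "low"): {"pairs": [3012, 3898, 3535, 3788],
                     "arrangements": [4111, 5270, 5184, 5234]},
    ("low", "high"): {"pairs": [2546, 3222, 2852, 3224],
                      "arrangements": [3406, 4418, 4110, 3952]},
    ("high", "low"): {"pairs": [2618, 3305, 3223, 3357],
                      "arrangements": [3593, 3912, 4164, 4040]},
    ("high", "high"): {"pairs": [2472, 2825, 2579, 2783],
                       "arrangements": [2791, 3418, 3039, 2790]},
}


def mean(values):
    return sum(values) / len(values)


def cells(rows):
    """Yield (study, test, rt) for all eight cells of one group."""
    for row in rows.values():
        for (study, test), rt in zip(COLUMNS, row):
            yield study, test, rt


def word_disadvantage(table):
    """Mean RT for test words minus test pictures (ms) per group, unweighted."""
    result = {}
    for group, rows in table.items():
        words = [rt for _, test, rt in cells(rows) if test == "words"]
        pictures = [rt for _, test, rt in cells(rows) if test == "pictures"]
        result[group] = mean(words) - mean(pictures)
    return result


def compatibility_effect(table):
    """Mismatching minus matching study/test materials (ms), all groups pooled."""
    same, different = [], []
    for rows in table.values():
        for study, test, rt in cells(rows):
            (same if study == test else different).append(rt)
    return mean(different) - mean(same)


print(word_disadvantage(TABLE2))
print(compatibility_effect(TABLE2))
```

The word disadvantage is clearly larger in the two low-verbal groups (587 and 475.5 ms) than in the two high-verbal groups (254 and 233.75 ms), whereas it hardly differs between the spatial levels within each verbal level; the pooled compatibility difference is about 327 ms.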
3. Discussion
The present experiment was designed to test hypotheses about the demands on working memory resources in integrating verbal and/or pictorial information into a coherent mental model. As predicted by the integration hypothesis, evaluating complete spatial arrangements of words or pictures denoting objects required more working memory resources than evaluating pairs of words or pictures. Whereas a main effect with longer reaction times for complete arrangements may be predicted from other theories of human memory as well, we also found that test complexity interacted with both reading span and spatial span: integrating five objects into a single representation was most difficult for participants with lower verbal capacity and lower visuospatial capacity. This result corresponds specifically to Schnotz and Bannert's (2003) model of multimedia learning because the interaction between test complexity and working memory capacity held for both the descriptive and the depictive processing branch: study and test materials involved words and pictures, and test complexity interacted with both verbal and visuospatial working memory capacity. As far as we know, this result is new, because it is based on a full combination of verbal vs. pictorial elements in the study and the test phase. As assumed by Schnotz and Bannert (2003), learning descriptive and depictive external representations of spatial arrangements of objects seems to require a stepwise and capacity-consuming construction process.

Fig. 5. Mean reaction times in ms depending on test complexity, verbal span, and spatial span. [Figure not reproduced: mean RTs (y-axis 2500–5000 ms) for complete arrangements and object pairs in the four groups low verbal/low spatial, low verbal/high spatial, high verbal/low spatial, and high verbal/high spatial (x-axis: verbal span and spatial span).]

Fig. 6. Mean reaction times in ms depending on test materials, verbal span, and spatial span. [Figure not reproduced: mean RTs (y-axis 2500–5000 ms) for pictures and words in the same four groups (x-axis: verbal span and spatial span).]
The results also provide evidence for the multiple source hypothesis. Participants were faster to judge diagrams (pairs or arrangements) consisting of pictures rather than words. We argued that this effect is specific to Schnotz and Bannert's model, because processing word diagrams would require the coordination of the depictive and the descriptive processing branch, whereas picture diagrams only tax the depictive branch. Even more corroborative is the finding that the effect of test materials interacted with verbal span, but not with spatial span. This is quite plausible because visuospatial resources are needed for processing both word diagrams and picture diagrams, whereas verbal resources are needed only for word diagrams. Thus, low spatial resources will reduce performance with both materials, whereas low verbal resources can reduce performance only with word diagrams. This pattern fits the architecture of the model proposed by Schnotz and Bannert better than the multimedia learning theory by Mayer and Moreno, because the latter does not specify whether and how the auditory/verbal processing channel and the visual/pictorial processing channel may interact.
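The argument above is essentially a small resource model, and writing it out shows why the word-picture difference should depend on verbal but not on spatial capacity. The sketch below is a hypothetical toy formalization, not part of Schnotz and Bannert's model: it assumes that verbal span limits the descriptive branch, that spatial span limits the depictive branch, and that each taxed branch with low capacity adds one unit of cost. It covers only this capacity-dependent part; a constant coordination cost for word diagrams, which would explain why even high-verbal groups were somewhat slower with words, is deliberately left out. `DEMANDS`, `penalty`, and `word_minus_picture` are illustrative names.

```python
# Hypothetical toy model (an illustration, not Schnotz and Bannert's model):
# the processing branches each diagram type taxes.
DEMANDS = {
    "picture": {"depictive"},
    "word": {"depictive", "descriptive"},
}


def penalty(diagram, verbal, spatial):
    """Number of taxed branches whose capacity is low (0, 1, or 2).

    Assumption: verbal span limits the descriptive branch and spatial span
    limits the depictive branch; each low-capacity branch costs one unit.
    """
    low = {"descriptive": verbal == "low", "depictive": spatial == "low"}
    return sum(1 for branch in DEMANDS[diagram] if low[branch])


def word_minus_picture(verbal, spatial):
    """Predicted extra cost of word diagrams for one capacity profile."""
    return penalty("word", verbal, spatial) - penalty("picture", verbal, spatial)


for verbal in ("low", "high"):
    for spatial in ("low", "high"):
        print(verbal, spatial, word_minus_picture(verbal, spatial))
```

Low spatial capacity raises the penalty for both diagram types and therefore cancels out of the difference, whereas low verbal capacity adds cost only to word diagrams; this is the interaction with verbal but not spatial span that the text describes.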
In summary, the results reported here supply evidence for two hypotheses derived from the internal structure of Schnotz and Bannert's cognitive model of multimedia learning. Both the integration hypothesis and the multiple source hypothesis were supported by main effects of experimental variations and by interactions of the experimental factors with verbal and visuospatial working memory capacity. In addition, the present experiment provides information about the relation between Schnotz and Bannert's model and the competing theory of multimedia learning (Mayer & Moreno, 2002, 2003). While in the latter model the auditory/verbal processing channel and the visual/pictorial processing channel are defined by a combination of perceptual modality and representational format, the distinction between the descriptive and the depictive processing branch in Schnotz and Bannert's model is defined exclusively by representational format. In the present experiment, modality was kept constant: verbal and visuospatial materials were presented visually, in the study phase as well as in the test phase. As all effects are compatible with Schnotz and Bannert's model although only representational format, but not modality, was varied, Mayer and Moreno's theory seems to require some additional specifications with regard to the role of differences in modality.
The theories by Schnotz and Bannert, Mayer and Moreno, and many others are used to derive recommendations for the design of multimedia learning systems, in experimental as well as real-world settings. One may argue that the simple spatial arrangements employed here do not adequately reflect learning from texts and pictures in multimedia environments because they are not complex enough. However, picture diagrams represent spatiality, the central dimension on which nearly all sorts of graphical representations are based, and word diagrams are prototypical instances of a large class of graphical representations incorporating verbal labels or explanations, such as those used in science, education, business communication, public information, and many other fields. Moreover, we believe that it is precisely the simplicity of the spatial stimuli employed here that allows for a strict test of the model of text and picture comprehension proposed by Schnotz and Bannert (2003). First, picture and word diagrams represented exactly the same spatial scenes and differed only in the way the objects were denoted. Second, we were able to avoid complications introduced by differences in prior knowledge between participants. Earlier research has shown that large knowledge-dependent effects in mental model construction may occur during learning of spatial scenes (e.g., Dutke, 1993, 1996). Thus, simple experimental materials that nevertheless allow theoretically relevant manipulations may enhance the power of empirical tests. The more rigorously such theories are empirically tested (and eventually modified), the more reliably they can serve as a basis for design decisions.
Beyond theory evaluation, the present work has some implications for research on educational and instructional design. The first is related to Sweller's (1994) concept of element interactivity. Material that can be understood and learned only by considering several elements simultaneously (high element interactivity) is hypothesized to contribute to intrinsic cognitive load. Usually, effects of the degree of element interactivity are investigated by varying the learning materials (e.g., Pollock, Chandler, & Sweller, 2002). In the present experiment, element interactivity was varied not in the learning materials, but in the way learning performance was tested. During the learning phase, only pairs of words or pictures (isolated elements) were presented. However, during the learning test, participants evaluated elements (pairs) as well as complete spatial arrangements. Even in this setting, the element interactivity effect was replicated: testing complete arrangements (high element interactivity) required more working memory capacity than testing elements alone (low element interactivity). The practical conclusion is that high element interactivity (and thus increased load) can be established both by the way the learning task is designed and by the way the learning test is designed. Moreover, the element interactivity effect also held for pure picture diagrams, which involve no processes other than identifying pictures of objects and relating them spatially in a way that represents all pair relations correctly. This corroborates the assumption that element interactivity is a quite basic feature of learning materials that should be attended to in designing learning tasks. This interpretation is also in line with working memory research identifying the coordination of formerly independent elements as a specific class of demands on working memory (Dutke, 2005; Hagendorf & Sá, 1996; Halford, Wilson, & Phillips, 1998; Mayr & Kliegl, 1993; Oberauer, 1993; Oberauer, Süß, Schulze, Wilhelm, & Wittmann, 2000).
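The contrast between testing isolated elements and testing complete arrangements can be made concrete by counting relations: judging a pair concerns one stored relation, whereas an arrangement of n objects implies n(n - 1)/2 pair relations, that is, 10 for the five-object arrangements used here. The sketch below assumes a hypothetical encoding (each object placed at a grid cell, relations described as left/right and above/below) that is not taken from the source, and it does not claim that participants actually verified arrangements relation by relation. Object names, positions, and the helpers `relations` and `verify_arrangement` are invented for illustration.

```python
from itertools import combinations


def relations(layout):
    """All pairwise spatial relations implied by a layout.

    Hypothetical encoding: layout maps object -> (column, row) on a grid,
    with row numbers increasing downward.
    """
    result = {}
    for a, b in combinations(sorted(layout), 2):
        (xa, ya), (xb, yb) = layout[a], layout[b]
        horizontal = "left of" if xa < xb else "right of" if xa > xb else "same column"
        vertical = "above" if ya < yb else "below" if ya > yb else "same row"
        result[(a, b)] = (horizontal, vertical)
    return result


def verify_arrangement(studied, candidate):
    """Check a complete arrangement relation by relation against memory.

    Returns (consistent, number_of_relations_checked); stops at the first
    mismatch. Judging a single pair, by contrast, needs only one lookup.
    """
    checked = 0
    for pair, relation in relations(candidate).items():
        checked += 1
        if studied.get(pair) != relation:
            return False, checked
    return True, checked


# Hypothetical five-object scene (object names and positions are invented).
scene = {"apple": (0, 0), "ball": (1, 0), "cup": (2, 1), "drum": (0, 2), "egg": (1, 2)}
memory = relations(scene)
print(len(memory))  # 10 pair relations for 5 objects
print(verify_arrangement(memory, scene))
swapped = dict(scene, apple=scene["egg"], egg=scene["apple"])
print(verify_arrangement(memory, swapped))
```

A correct arrangement only passes after all 10 relations have been checked, while a single pair judgment involves one; under this toy encoding, that asymmetry is what "high" versus "low" element interactivity amounts to in the test phase.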
Second, the result that evaluating word diagrams required more working memory resources than evaluating picture diagrams is of high practical relevance. As mentioned above, word diagrams, that is, depictions of spatial relations combined with verbal descriptions of the elements constituting such a diagram, are frequently used forms of diagrams (e.g., see Pollock et al., 2002). Based on the Schnotz and Bannert model, this effect can be interpreted as costs emerging from the coordination of the depictive and the descriptive processing branch. However, the present study was not designed to maximize this effect: the presentation time of 3 s per pair was chosen to ensure sufficient learning even of word diagrams and even for participants with lower working memory spans. This relatively long presentation time was chosen because the present study focused mainly on performance in the test phase of the experiment. Future studies should try to complement the present results by employing self-paced presentation times. According to the multiple source hypothesis, presentation times should then be longer for word diagrams than for picture diagrams, especially in participants with low reading span.
Third, with regard to research strategies, the present experiment demonstrated a fruitful way to test hypotheses involving assumptions about the cognitive load of learning tasks. Particularly in the context of cognitive load theory (Sweller, 1994), it has become increasingly important to assess the load caused by differently designed learning tasks or learning environments (e.g., Mayer & Moreno, 2003; Paas et al., 2003). Usually, subjective ratings of learners are used as indicators of cognitive load (e.g., van Merriënboer, Schuurman, de Croock, & Paas, 2002). As an alternative, Brünken, Plass, and Leutner (2003) proposed a dual task technique. In the present experiment, however, a third strategy was demonstrated: we measured individual differences in cognitive resources and investigated how this variation affected learning performance, in conjunction with experimental manipulations of the learning task and the test situation. Although this combined approach is costly in terms of number of participants, it has at least two advantages. Compared to the subjective rating approach, assessing individual differences with validated instruments enhances construct validity and allows more precise conclusions about the type of load generated by a learning task. In the present study, for example, we could differentiate between task demands tapping verbal and visuospatial working memory capacity, respectively. Compared with the dual task approach, assessing individual differences does not encounter problems arising from task-specific (and hard to predict) interactions between the primary and the secondary task.
Acknowledgements
We are grateful to Ulrich Herzberg, Saskia Schanz, and Karin Wolf for their invaluable help in preparing and conducting the experiment. We would also like to thank two anonymous reviewers for helpful comments on an earlier version of this article. This research was supported by grants Du 312/1-1 and Ri 600/9-1 from the German Research Foundation (DFG) to the authors.
References
Baddeley, A. D. (1986). Working memory. Oxford, UK: Clarendon Press.
Baddeley, A. D. (1992). Is working memory working? The fifteenth Bartlett lecture. The Quarterly Journal of Experimental Psychology, 44A, 1–31.
Brünken, R., Plass, J. L., & Leutner, D. (2003). Direct measurement of cognitive load in multimedia learning. Educational Psychologist, 38, 53–61.
Daneman, M., & Carpenter, P. A. (1980). Individual differences in working memory and reading. Journal of Verbal Learning and Verbal Behavior, 19, 450–466.
De Jonge, P., & De Jonge, P. F. (1996). Working memory, intelligence and reading ability in children. Personality and Individual Differences, 21, 1007–1020.
van Dijk, T. A., & Kintsch, W. (1983). Strategies of discourse comprehension. New York: Academic Press.
Dutke, S. (1993). Mentale Modelle beim Erinnern sprachlich beschriebener räumlicher Anordnungen: Zur Interaktion von Gedächtnisschemata und Textrepräsentation. [Mental models in remembering verbally described spatial arrangements: Towards the interaction of memory schemata and text representation]. Zeitschrift für experimentelle und angewandte Psychologie, 40, 44–71.
Dutke, S. (1996). Generic and generative knowledge: memory schemata in the construction of mental models. In W. Battmann, & S. Dutke (Eds.), Processes of the molar regulation of behavior (pp. 35–54). Lengerich: Pabst Science Publishers.
Dutke, S. (2005). Remembered duration: working memory and the reproduction of intervals. Perception & Psychophysics, 67, 1404–1422.
Engle, R. W., Cantor, J., & Carullo, J. (1992). Individual differences in working memory and comprehension: a test of four hypotheses. Journal of Experimental Psychology: Learning, Memory and Cognition, 18, 972–992.
Friedman, N. P., & Miyake, A. (2000). Differential roles for visuospatial and verbal working memory in situation model construction. Journal of Experimental Psychology: General, 129, 61–83.
Gernsbacher, M. A. (1990). Language comprehension as structure building. Hillsdale, NJ: Erlbaum.
Glenberg, A. M., Meyer, M., & Lindem, K. (1987). Mental models contribute to foregrounding during text comprehension. Journal of Memory and Language, 26, 69–83.
Graesser, A. C., Millis, K. K., & Zwaan, R. A. (1997). Discourse comprehension. Annual Review of Psychology, 48, 163–189.
Graesser, A. C., Singer, M., & Trabasso, T. (1994). Constructing inferences during narrative text comprehension. Psychological Review, 101, 371–395.
Hacker, W., & Osterland, D. (1995). Mentale Koordinationskapazität: Einfluß von Text- und Arbeitsgedächtnismerkmalen auf das Verstehen von Instruktionstexten. [Mental capacity for coordination: Effects of text features and working memory resources on the comprehension of instructional texts]. Zeitschrift für Experimentelle Psychologie, 42, 646–671.
Hagendorf, H., & Sá, B. (1996). Coordination in visual working memory. Psychological Research, 58, 294–306.
Halford, G. S., Wilson, W. H., & Phillips, S. (1998). Processing capacity defined by relational complexity: implications for comparative, developmental, and cognitive psychology. Behavioral and Brain Sciences, 21, 803–864.
Johnson-Laird, P. N. (1983). Mental models. Cambridge, Great Britain: Cambridge University Press.
Johnson-Laird, P. N. (1996). Images, models, and propositional representations. In M. de Vega, M. J. Intons-Peterson, P. N. Johnson-Laird, M. Denis, & M. Marschark (Eds.), Models of visuospatial cognition (pp. 90–127). Oxford: Oxford University Press.
Kintsch, W. (1998). Comprehension. New York: Cambridge University Press.
Mayer, R. E. (2003). The promise of multimedia learning: using the same instructional design methods across different media. Learning and Instruction, 13, 125–139.
Mayer, R. E., & Moreno, R. (2002). Aids to computer-based multimedia learning. Learning and Instruction, 12, 107–119.
Mayer, R. E., & Moreno, R. (2003). Nine ways to reduce cognitive load in multimedia learning. Educational Psychologist, 38, 43–52.
Mayr, U., & Kliegl, R. (1993). Sequential and coordinative complexity: age-based processing limitations in figural transformations. Journal of Experimental Psychology: Learning, Memory, and Cognition, 19, 1297–1320.
van Merriënboer, J. J. G., Schuurman, J. G., de Croock, M. B. M., & Paas, F. G. W. C. (2002). Redirecting learners' attention during training: effects on cognitive load, transfer test performance and training efficiency. Learning and Instruction, 12, 11–37.
Oberauer, K. (1993). Die Koordination kognitiver Operationen – eine Studie über die Beziehung zwischen Intelligenz und "working memory". [Coordination of cognitive operations – a study on the relation of intelligence and working memory]. Zeitschrift für Psychologie, 201, 57–84.
Oberauer, K., Süß, H.-M., Schulze, R., Wilhelm, O., & Wittmann, W. W. (2000). Working memory capacity – facets of a cognitive ability construct. Personality and Individual Differences, 29, 1017–1045.
Paas, F., Renkl, A., & Sweller, J. (2003). Cognitive load theory and instructional design: recent developments. Educational Psychologist, 38, 1–4.
Palladino, P., Cornoldi, C., De Beni, R., & Pazzaglia, F. (2001). Working memory and updating processes in reading comprehension. Memory and Cognition, 29, 344–354.
Pollock, E., Chandler, P., & Sweller, J. (2002). Assimilating complex information. Learning and Instruction, 12, 61–86.
Schnotz, W., & Bannert, M. (1999). Einflüsse der Visualisierungsform auf die Konstruktion mentaler Modelle beim Text- und Bildverstehen. [Influence of the type of visualization on the construction of mental models during picture and text comprehension]. Zeitschrift für Experimentelle Psychologie, 46, 217–236.
Schnotz, W., & Bannert, M. (2003). Construction and interference in learning from multiple representation. Learning and Instruction, 13, 141–156.
Shah, P., & Miyake, A. (1996). The separability of working memory resources for spatial thinking and language processing: an individual differences approach. Journal of Experimental Psychology: General, 125, 4–27.
Sweller, J. (1994). Cognitive load theory, learning difficulty, and instructional design. Learning and Instruction, 4, 295–312.
Zwaan, R. A., Langston, M. C., & Graesser, A. C. (1995). The construction of situation models in narrative comprehension: an event-indexing model. Psychological Science, 6, 292–297.