Spatiotemporal coupling between speech and manual motor actions (Parrell, Goldstein, Lee & Byrd, 2014) – HIBAR

Another summary and HIBAR post on our most recent lab group reading.

Co-ordination between speech and motor activity has long been observed and has been demonstrated empirically across the lifespan, from the co-occurrence of babbling and rhythmic limb movement in infancy (e.g. Iverson, Hall, Nickel & Wozniak, 2007) to the finely timed co-production of gesture and speech in adults. For example, pointing gestures tend to align with the stressed syllable of a word (Rochet-Capellan, Laboissière, Galván, & Schwartz, 2008).

The importance of motor coupling. Donald showing us how it's done.


Parrell et al. (2014) aimed to probe further the role of prosodic structure in the spatiotemporal coordination of speech and gesture.


The experimenters had four participants repeatedly tap their right finger on their left shoulder while synchronously repeating a monosyllable. Midway through each 30 s trial, participants had to stress either one finger tap or one spoken repetition, while keeping their performance in the other domain unchanged. The timing of the stress was cued by a clock dial: participants watched the dial and placed the stress when it reached one of the quarter markers.

The kinematics monitored were lip aperture (LA) and fingertip movement (FT), tracked by transducers attached to the articulators. To assess spatial effects, the magnitude of the emphasised repetition was compared with the mean magnitude of the unemphasised repetitions in the trial, with magnitude defined as the amplitude of the movement (the size of the lip opening or the height of the tap).
Inter-response interval (IRI), the time between the onsets of consecutive repetitions of an articulator, was used to measure the temporal effects of the emphasis.
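To make the two measures concrete, here is a minimal Python sketch, assuming each trial yields a list of movement onset times and a list of per-repetition magnitudes for one articulator. The data and the function names (`inter_response_intervals`, `emphasis_ratio`) are our own hypothetical illustrations, not the authors' analysis code, and expressing the spatial effect as a ratio is just one convenient choice.

```python
from statistics import mean

def inter_response_intervals(onsets):
    """Intervals (s) between consecutive movement onsets of one articulator."""
    return [later - earlier for earlier, later in zip(onsets, onsets[1:])]

def emphasis_ratio(magnitudes, emphasised_index):
    """Magnitude of the emphasised repetition relative to the mean of the others."""
    others = [m for i, m in enumerate(magnitudes) if i != emphasised_index]
    return magnitudes[emphasised_index] / mean(others)

# Hypothetical lip-aperture data for one trial: onset times (s) and
# peak opening (mm) per repetition; repetition 3 carries the emphasis.
la_onsets = [0.00, 0.50, 1.00, 1.62, 2.12]
la_magnitudes = [10.0, 10.5, 9.5, 15.0, 10.0]

print([round(iri, 2) for iri in inter_response_intervals(la_onsets)])  # [0.5, 0.5, 0.62, 0.5]
print(emphasis_ratio(la_magnitudes, 3))  # 1.5
```

A ratio above 1 indicates an augmented emphasised repetition; the same functions could be applied to the fingertip data.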

Spatial Effects

Although participants had been instructed not to emphasise movement in the uninstructed domain, three of the four showed increased magnitude in the repetition of one articulator that coincided with an emphasis in the other. A significant correlation was also found between the general movement magnitudes of the two articulators (i.e. during unstressed repetitions).
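A correlation of this kind could be computed along the following lines: a minimal Python sketch of Pearson's r between paired unstressed-repetition magnitudes from the two articulators. The numbers are invented, and we are assuming a simple repetition-by-repetition pairing; the paper's exact pairing and statistics may differ.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical magnitudes of unstressed repetitions, paired by repetition:
# lip aperture (mm) and fingertip tap height (mm).
lip = [10.0, 10.4, 9.8, 10.9, 10.2, 9.6]
finger = [30.0, 31.0, 29.5, 32.5, 30.4, 29.0]
print(round(pearson_r(lip, finger), 3))
```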

Temporal Effects

IRIs lengthened near the stress boundary for both LA and FT regardless of stress domain, but the lengthening in the unstressed domain tended to be delayed by one repetition. As with the spatial effects, temporal effects in the two articulators were generally correlated. Consistent with previous literature, the authors also found more robust effects of speech emphasis on finger tapping than vice versa, suggesting closer coupling in this direction.
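The one-repetition delay can be illustrated with a toy Python sketch that locates the repetition with the longest IRI in each domain and takes the difference. The IRI values and this peak-based definition of "lag" are our own assumptions for illustration, not how the authors quantified the effect.

```python
def peak_index(values):
    """Index of the largest value (the first one if tied)."""
    return max(range(len(values)), key=values.__getitem__)

def lengthening_lag(stressed_iris, unstressed_iris):
    """Repetitions by which peak lengthening in the unstressed domain trails the stressed one."""
    return peak_index(unstressed_iris) - peak_index(stressed_iris)

# Hypothetical IRIs (s) around a speech emphasis at repetition 3.
speech_iris = [0.50, 0.50, 0.51, 0.62, 0.52, 0.50]
finger_iris = [0.50, 0.50, 0.50, 0.53, 0.58, 0.51]
print(lengthening_lag(speech_iris, finger_iris))  # 1
```

A single shared clock driving both articulators would instead predict a lag of 0 on this measure.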

Theoretical Implications

The authors venture some possible interpretations of the data, including the π-gesture model of speech prosody, which proposes that all motor activity is controlled by a single internal clock modulated in accordance with prosodic structure. While comparable IRI lengthening within and across modalities might indicate a common clock, a single driver does not account for the observed delay in the unstressed modality. In addition, the asymmetrical effects of stress across domains indicate that simple coupling of the speech and finger articulators is an insufficient explanation.

The alternative hypothesis is that emphasis acts as a perturbation to the coordinative dynamics of the articulators, with the delayed IRI lengthening in the unstressed domain reflecting a restoration of relative phase between them. Because prosody is thought to group certain information and make it salient, it is possible that this process recruits bodily resources beyond the speech apparatus. This hypothesis would also account for the asymmetry in the effect of stress domain, since prosodic stress belongs to the speech domain: stress in speech calls on a wider set of motor resources forming part of a larger prosodic architecture, whereas stress in finger tapping does not do the reverse (1).

There are some points we would have raised, had we been reviewers of this paper:

1. We were unsure why the sample size (four) was so small, especially given the large individual variation in performance and the relatively small observed effects. This makes it harder to interpret “majority” (3/4) effects such as the augmented cross-modal effect for speech stress (see the paper for several such instances).

2. The authors note that for each condition (stressed finger tap/syllable) they presented two blocks with the syllables /ma/ and /mop/, in order to investigate effects of the syllable coda (the optional final part of a syllable) on the amplitude and timing of articulator movement. This was not elaborated on, and as no previous literature was cited, it is unclear to us what justified including this factor (2).

3. While the authors aimed to reproduce a more natural prosodic context than previous studies (3), it could be argued that timing a stress to an external stimulus (a point on a clock dial) is simply unnatural in a different way. Although the emphasis is ostensibly quasi-linguistic (i.e. placed at a specific point in a speech string), explicitly constraining where the stress falls could have dynamical consequences for the preceding and following movements that would not be present in natural language use. The authors cite some previous literature that used this type of stress, but I currently don’t have access to these references, so it is unclear whether participants in those studies chose volitionally which gesture to stress or also had to time it to a clock.

(1) General coupling principles do explain the domain-general effects, in which a smaller amplitude change is seen in the unstressed domain, and also the correlation of spatial and temporal effects between domains during unstressed repetitions.
(2) N.B. The authors found no effect of the presence of a coda on spatial or temporal effects.
(3) These typically employed alternating stressed-unstressed patterns, which imposed a rhythm, something the current authors were keen to avoid.


Iverson, J. M., Hall, A. J., Nickel, L., & Wozniak, R. H. (2007). The relationship between reduplicated babble onset and laterality biases in infant rhythmic arm movements. Brain and Language, 101(3), 198-207.

Parrell, B., Goldstein, L., Lee, S., & Byrd, D. (2014). Spatiotemporal coupling between speech and manual motor actions. Journal of Phonetics, 42, 1-11.

Rochet-Capellan, A., Laboissière, R., Galván, A., & Schwartz, J. L. (2008). The speech focus position effect on jaw–finger coordination in a pointing task. Journal of Speech, Language, and Hearing Research, 51(6), 1507-1521.


Olmstead, Viswanathan, Aicher & Fowler (2009)

Fellow Leeds Met PhD student Liam Cross and I have collaborated this week to review a paper investigating a simulation theory approach to language comprehension.  This theory comes under the umbrella term of embodiment, but is distinct from RECS and will be revisited in future posts.

Liam conducts research on the role of synchrony in social behaviour and his blog can be found at:

HIBAR – Sentence comprehension affects the dynamics of bimanual co-ordination: Implications for embodied cognition

Simulation theory (Barsalou, 1999) is a subset of embodied approaches that attempts to reconcile embodiment with a representational framework. This account seeks to ground high-level cognition in sensorimotor representations, in an attempt to overcome issues of how purely symbolic or abstract representations might be instantiated in the brain. Within the domain of language, it predicts that mental simulation of the appropriate sensorimotor representation is integral to the process of comprehension. This would predict interaction between activities that draw on the same representations, in either an interfering or a facilitatory manner (e.g. Glenberg & Kaschak, 2002). For example, motions congruent with an action specified in a statement should facilitate comprehension, while incongruent motions should interfere.


Olmstead et al. (2009) used a well-established bimanual rhythmic co-ordination task (Kugler & Turvey, 1987), with well-understood behavioural measures, to investigate how comprehending sentences that describe performable actions affects the organisation of the movement.

[Figure: potential landscape, with φ = relative phase and V = potential]

The task involved bimanual movements, in which participants swung a hand-held pendulum from each wrist. This was tested with the pendulums swung in-phase (0°) and anti-phase (180°). Without interference, people show stable relative phase for both patterns, but 0° is more stable: 180° shows more variability around its attractor location, and this variability is measured as the standard deviation of relative phase (SDRP).
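For readers unfamiliar with these measures, here is a minimal Python sketch of how the mean deviation from the intended phase (corresponding to the relative phase shift, RPS, measure) and SDRP might be computed from two pendulum signals. It assumes each limb's phase angle is estimated from position and frequency-scaled velocity (a common approach in the coordination literature, though we can't confirm it is the estimator used here), takes relative phase as left minus right, and uses synthetic data: an anti-phase pattern with a hypothetical 10° lead plus sinusoidal jitter standing in for variability. It uses an ordinary mean and SD of the wrapped deviations, which is only reasonable when the deviations stay well away from ±180°.

```python
import math
from statistics import mean, stdev

def phase_angle(x, v, omega):
    """Phase angle (radians) of an oscillator from position x and velocity v.

    Velocity is divided by the angular frequency omega so both terms share a scale.
    """
    return math.atan2(-v / omega, x)

def wrap_degrees(angle):
    """Wrap an angle in degrees into the interval [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0

def phase_summary(left, right, omega, target_deg):
    """Return (RPS, SDRP) in degrees for paired (x, v) samples from two limbs.

    Relative phase is taken as left minus right (an assumed convention).
    RPS = mean deviation from the intended phase; SDRP = its standard deviation.
    """
    deviations = []
    for (xl, vl), (xr, vr) in zip(left, right):
        rel = math.degrees(phase_angle(xl, vl, omega) - phase_angle(xr, vr, omega))
        deviations.append(wrap_degrees(rel - target_deg))
    return mean(deviations), stdev(deviations)

# Synthetic data: 2 s sampled at 100 Hz, hypothetical 1 Hz swinging frequency.
omega = 2 * math.pi * 1.0
left, right = [], []
for i in range(200):
    t = i * 0.01
    jitter = math.radians(5 * math.sin(2 * math.pi * 0.5 * t))  # stand-in for variability
    lead = math.radians(180 + 10) + jitter                       # anti-phase plus a 10 degree lead
    left.append((math.cos(omega * t + lead), -omega * math.sin(omega * t + lead)))
    right.append((math.cos(omega * t), -omega * math.sin(omega * t)))

rps, sdrp = phase_summary(left, right, omega, target_deg=180)
print(f"RPS {rps:.1f} deg, SDRP {sdrp:.1f} deg")
```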

While performing this task, participants read sentences on a screen and indicated verbally whether each was plausible or implausible. Although plausibility was varied, the variable under investigation was whether sentences were performable or inanimate. Performable sentences implied movements of the hands, fingers or arms. Movement dynamics were then compared with a baseline in which participants only swung the pendulums.

The researchers also included a detuned condition, in which the preferred frequencies of the limbs were manipulated by giving participants pendulums with different natural frequencies. Detuning shifts the attractors away from 0° and 180°, an effect captured by “relative phase shift” (RPS), and also increases SDRP.
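Why detuning shifts the attractors can be seen in the Haken-Kelso-Bunz (HKB) model of bimanual coordination, which we are swapping in here as a standard illustration; the paper itself may use a different formulation. In its commonly written detuned form the relative phase φ evolves as dφ/dt = Δω - a sin φ - 2b sin 2φ, where Δω is the difference between the limbs' preferred frequencies; this is the negative slope of the potential V(φ) = -Δω φ - a cos φ - b cos 2φ. The sketch below finds the stable fixed point near 0 or π by bisection. The parameter values a = b = 1 and the detuning values are arbitrary assumptions, and the sign of the shift depends on which limb's frequency Δω is taken from.

```python
import math

def hkb_rate(phi, delta_omega, a=1.0, b=1.0):
    """dphi/dt in the detuned HKB model (phi in radians)."""
    return delta_omega - a * math.sin(phi) - 2 * b * math.sin(2 * phi)

def stable_point(center, delta_omega, half_width=math.pi / 8, tol=1e-10):
    """Locate the zero of hkb_rate near `center` by bisection.

    Assumes the rate is positive at the left edge of the bracket and negative
    at the right edge, which holds for the stable attractors with a = b = 1
    and small detuning.
    """
    lo, hi = center - half_width, center + half_width
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if hkb_rate(mid, delta_omega) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for target in (0.0, math.pi):
    for detuning in (0.25, 0.5):
        shift = math.degrees(stable_point(target, detuning) - target)
        print(f"intended {math.degrees(target):5.1f} deg, detuning {detuning}: shift {shift:.2f} deg")
```

With these particular settings the anti-phase attractor shifts further than the in-phase one for the same detuning, in line with 180° being the less stable pattern.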

Under simulation theory, movements would be expected to differ in the performable sentences condition compared with both the inanimate sentences condition and the baseline (swinging only). Since comprehending performable sentences and performing the movement task should draw on overlapping neural resources for the limbs involved, simulation theory predicts a change in SDRP, but not a shift in RPS: sentence comprehension is presumed to intermittently interrupt the continuous neural oscillators that control the movement task (Grossberg, Pribe, & Cohen, 1997). In other words, comprehending performable sentences should act as a perturbation to the system and thus increase variability (SDRP).


Performability of sentences did not interact with detuning or required phase for either RPS or SDRP, so the researchers aggregated the detuned and non-detuned conditions and compared the single- and dual-task conditions.

Further analysis revealed a main effect of comprehension task on RPS, but none on SDRP. Judging performable sentences shifted the relative phase (RP) of the co-ordination, whereas judging inanimate sentences produced no shift from baseline RP. Because SDRP was unaffected, the shape of the attractor did not change; it merely shifted significantly from baseline when participants judged performable sentences, but not when they judged inanimate sentences.

The authors conclude that the results are inconsistent with predictions made by simulation theory as it currently stands, since an unexpected shift in attractor occurred and no increase in SDRP was observed.
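The contrast between the predicted and observed patterns can be shown with toy numbers: a minimal Python sketch in which symmetric, intermittent perturbations raise the SD of relative phase while leaving its mean alone, whereas a constant offset moves the mean while leaving the SD unchanged. All values are invented for illustration and are not the paper's data.

```python
from statistics import mean, stdev

# Hypothetical baseline deviations from the intended phase (degrees).
baseline = [2.0, -1.0, 0.5, -2.5, 1.5, -0.5, 1.0, -1.0]

# Pattern predicted by simulation theory: intermittent kicks on every other
# sample, alternating in direction, so they cancel out in the mean.
perturbed = [d + (6.0 if i % 4 == 0 else -6.0 if i % 4 == 2 else 0.0)
             for i, d in enumerate(baseline)]

# Pattern reported by Olmstead et al.: the whole distribution moves.
shifted = [d + 4.0 for d in baseline]

for label, series in (("baseline", baseline), ("perturbed", perturbed), ("shifted", shifted)):
    print(f"{label:9s} mean {mean(series):+.2f}  SD {stdev(series):.2f}")
```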

While this is indeed an interesting effect, there are some questions and issues we would have raised, had we been reviewers of this paper:

Nature of Landscape Shift

For clarity, the authors plot difference scores between the single- and dual-task conditions for each sentence type. We argue that the figure is in fact misleading, especially given the lines connecting the conditions, which give the impression of RPS directionality, i.e. that the significant shift in RP for performable sentences followed on from the non-significant rightward deviation in the inanimate condition. We find it odd that the authors did not explicitly state in the results that the significant shift was actually a leftward shift from baseline.

Without access to the raw data for the detuned and non-detuned conditions at baseline and during sentence comprehension, it is also unclear whether this can be characterised as a leftward shift or as facilitation towards the desired phase, because the raw RPS scores reported are aggregates of the two conditions. As a result, the aggregated baseline RPS lies at -8.66°, not 0°, while inanimate sentence comprehension lies at -9.07° and performable sentence comprehension at -4.78°. The leftward shift is briefly mentioned in the discussion, but could have benefited from further explication in the results. The authors report that right-handed participants appeared to be responsible for the left-leading behaviour in the performable condition, but this is not substantiated by any formal analysis.
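Because the sign of a "shift" depends on how the difference score is defined, a few lines of Python using the aggregated RPS values reported in the paragraph above show how the same numbers can read either way. We don't know which convention the figure used; both are computed here as an assumption.

```python
# Aggregated RPS values (degrees) as reported for Olmstead et al. (2009).
baseline = -8.66
dual_task = {"inanimate": -9.07, "performable": -4.78}

for label, rps in dual_task.items():
    baseline_minus_dual = baseline - rps   # one possible difference score
    dual_minus_baseline = rps - baseline   # the opposite convention
    closer_to_zero = abs(rps) < abs(baseline)
    print(f"{label:12s} baseline-dual {baseline_minus_dual:+.2f}  "
          f"dual-baseline {dual_minus_baseline:+.2f}  closer to 0: {closer_to_zero}")
```

Under a baseline-minus-dual convention the performable condition comes out at -3.88° (a "leftward" shift) and the inanimate condition at +0.41° (a "rightward" one), yet the raw performable RPS is closer to 0° than baseline, which is why facilitation towards the intended phase can't be ruled out.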

Temporal Resolution of Movement Analysis

The authors indicate that they chose to measure the global effects of the sentence types (averaging across all performable sentences vs. all inanimate sentences) rather than adopt an event-based design, but give no explanation for this choice, whether practical or otherwise. The current design does not allow analysis of potentially interesting data about how the movement changes over time during comprehension.

An event-based design would allow observation of when movement changes occur and how they relate, moment to moment, to presentation of the stimulus. Here, sentences are treated not as discrete events but as a continuous activity averaged over three sentence instances. Temporal resolution could be further improved with auditory stimuli, or with eye-tracking for visual stimuli, as movement data could then be accurately time-stamped relative to the information present in the environment at any one time. It is curious that the lack of temporal resolution was not mentioned, given the time spent in the discussion considering what influence transient perturbations (comprehension of performable sentences) would have on the continuous control task under a simulation account.

Control of Facilitatory/Competition Effects

It would be helpful to investigate comprehension of sentences describing both congruent and incongruent actions that involve the same effector; simulation theory makes key predictions about how these should affect neural resource-sharing. Although it is unclear why the shift took the form it did in the performable sentences condition, it would have been interesting to see whether actions predicted under simulation theory to facilitate or to compete for resources would show qualitatively distinct shifts in behaviour.

The authors present this as a paradigm to be used in the study of embodied cognition. While we applaud the use of a well-defined movement task with clear behavioural measures, we feel that the design could be refined for future use.


Barsalou, L. W. (1999). Perceptual symbol systems. Behavioral and Brain Sciences, 22(4), 577-660.

Grossberg, S., Pribe, C., & Cohen, M. (1997). Neural control of interlimb oscillations: 1. Human bimanual coordination. Biological Cybernetics, 77, 131-140.

Kugler, P. N., & Turvey, M. T. (1987). Information, natural law, and the self-assembly of rhythmic movement. Lawrence Erlbaum Associates, Inc.

Olmstead, A. J., Viswanathan, N., Aicher, K. A., & Fowler, C. A. (2009). Sentence comprehension affects the dynamics of bimanual coordination: Implications for embodied cognition. The Quarterly Journal of Experimental Psychology, 62(12), 2409-2417.