[2009.06591] Understanding Gesture and Speech Multimodal Interactions for Manipulation Tasks in Augmented Reality Using Unconstrained Elicitation
This research establishes a better understanding of the syntax choices in
speech interactions and of how speech, gesture, and multimodal gesture and
speech interactions are produced by users in unconstrained object manipulation
environments using augmented reality. The work presents a multimodal
elicitation study conducted with 24 participants. The canonical referents for
translation, rotation, and scale were used along with some abstract referents
(create, destroy, and select). In this study, time windows for gesture and
speech multimodal interactions are developed using the start and stop times of
gestures and speech, as well as the stroke times of gestures. While gestures
commonly precede speech by 81 ms, we find that the stroke of the gesture
commonly falls within 10 ms of the start of speech, indicating that the
information content of a gesture and its co-occurring speech are well aligned
with each other. Lastly, the trends across the most common proposals for each
modality are examined, showing that disagreement between proposals is often
caused by variation in hand posture or syntax. This allows us to present
aliasing recommendations that increase the percentage of users' natural
interactions captured by future multimodal interactive systems.
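The temporal analysis described above can be illustrated with a minimal sketch. It computes gesture-speech alignment offsets from annotated event times; the function name, data layout, and example values are illustrative assumptions, not the paper's actual dataset or analysis code.

```python
# Hypothetical sketch of gesture-speech temporal alignment.
# All times are in milliseconds on a shared annotation timeline.

def alignment_offsets(gesture_start, gesture_stroke, speech_start):
    """Return (onset_offset, stroke_offset) in ms.

    Positive values mean the gesture event precedes speech onset,
    matching the convention "gestures precede speech by 81 ms".
    """
    onset_offset = speech_start - gesture_start
    stroke_offset = speech_start - gesture_stroke
    return onset_offset, stroke_offset

# Illustrative example: a gesture that starts 81 ms before speech,
# with its stroke landing 5 ms before speech onset.
onset_off, stroke_off = alignment_offsets(0, 76, 81)
print(onset_off, stroke_off)  # 81 5
```

Averaging such per-interaction offsets across participants is one way a study like this could derive the reported time windows between gesture phases and co-occurring speech.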