Automated Co-Speech Gesture Recognition using Gemini 2.0 Flash and 2.5 Preview

Talk given at the 10th Conference of the International Society for Gesture Studies (ISGS), Radboud University, Nijmegen, Netherlands, July 10, 2025.

Presentation

Link to the GitHub repository for this project

References

Arnheim, R., & McNeill, D. (1994). Hand and mind: What gestures reveal about thought. Leonardo, 27(4), 358. https://doi.org/10.2307/1576015

Bressem, J. (2016). Overview of Notation Conventions For Notation of Form In Gestures. Retrieved November 4, 2024, from http://www.janabressem.de/wp-content/uploads/2016/10/Bressem_notational-system-overview_final.pdf

Bulcaen, C. (1995). Rethinking Context: Language as an Interactive Phenomenon by Alessandro Duranti & Charles Goodwin (eds), 1992, Cambridge University Press, Cambridge, (Studies in the Social and Cultural Foundations of Language II), pp. 363, ISBN 0 521 42288 4. Language and Literature, 4(1), 61–64. https://doi.org/10.1177/096394709500400105

Cao, Z., Hidalgo, G., Simon, T., Wei, S.-E., & Sheikh, Y. (2018). OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields. arXiv. https://arxiv.org/abs/1812.08008

Carston, R. (1999). Herbert H. Clark, Using language. Cambridge: Cambridge University Press, 1996. Pp. xi+432. Journal of Linguistics, 35(1), 167–222. https://doi.org/10.1017/s0022226798217361

Creider, C. (1994). Hand and Mind: What Gestures Reveal about Thought. Journal of Linguistic Anthropology, 4(1), 81–82. https://doi.org/10.1525/jlin.1994.4.1.81

ELAN (Version 6.8) [Computer software]. (2024). Nijmegen: Max Planck Institute for Psycholinguistics, The Language Archive. Retrieved from https://archive.mpi.nl/tla/elan

Goldin-Meadow, S. (2003). Hearing Gesture: How Our Hands Help Us Think. Harvard University Press. https://www.jstor.org/stable/j.ctv1w9m9ds

Gullberg, M., & De Bot, K. (2010). Gestures in Language Development. John Benjamins Publishing Company. https://doi.org/10.1075/bct.28

Hegde, S., Prajwal, K. R., Kwon, T., & Zisserman, A. (2025). Understanding Co-speech Gestures in-the-wild. arXiv. https://arxiv.org/pdf/2503.22668

Kendon, A. (1980). Gesticulation and speech: Two aspects of the process of utterance. In M. R. Key (Ed.), The Relationship of Verbal and Nonverbal Communication (pp. 207–228). Berlin: De Gruyter. https://doi.org/10.1515/9783110813098.207

McNeill, D. (1992). Hand and Mind: What Gestures Reveal About Thought. University of Chicago Press. https://psycnet.apa.org/record/1992-98214-000

Mittelberg, I. (2018). Gestures as image schemas and force gestalts: A dynamic systems approach augmented with motion-capture data analyses. Cognitive Semiotics, 11(1). https://doi.org/10.1515/cogsem-2018-0002

Steen, F. F., Hougaard, A., Joo, J., Olza, I., Cánovas, C. P., Pleshakova, A., Ray, S., Uhrig, P., Valenzuela, J., Woźny, J., & Turner, M. (2018). Toward an infrastructure for data-driven multimodal communication research. Linguistics Vanguard, 4(1). https://doi.org/10.1515/lingvan-2017-0041

Turchyn, S., Olza Moreno, I., Pagán Cánovas, C., Steen, F., Turner, M., Valenzuela, J., & Ray, S. (2018). Gesture Annotation With a Visual Search Engine for Multimodal Communication Research. Proceedings of the AAAI Conference on Artificial Intelligence, 32(1). https://doi.org/10.1609/aaai.v32i1.11421

Turner, M. B., & Steen, F. F. (2012). Multimodal Construction Grammar. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.2168035

Slides

