El big data en los estudios del lenguaje

Author: Valenzuela, Javier

Affiliation: Universidad de Murcia, Murcia, Spain

ROR: https://ror.org/03p3aeb86

Journal: Estudios de Lingüística del Español (ELiEs)

ISSN: 1139-8736

Year of publication: 2022

Issue Title: Metodologías lingüísticas: de los datos empíricos a la teoría del lenguaje

Issue: 45

Pages: 241-260

Type: Article

DOI: 10.36950/ELIES.2022.45.8857

Access: Open access

Abstract

This paper examines the possibilities that big data-based approaches offer to language research. In a nutshell, the term “big data” refers to the massive amount of data that users generate in their digital interactions, whose great volume and heterogeneous nature typically require specialized treatment. The paper starts by reviewing the main characteristics of big data and then focuses on the problems that can arise from using big data in linguistic analysis. The following section reviews specific studies that apply this big-data approach to the study of multimodality, an approach to language study that includes not only the verbal component but also multimodal aspects such as gestures or intonation. The paper concludes with a review of the advantages and problems of using this type of data.

Bibliographic References

  • Alcaraz Carrión, Daniel; Valenzuela, Javier. 2021. Distant time, distant gesture: speech and gesture correlate to express temporal distance. Semiotica 241. DOI: 10.1515/sem-2019-0120
  • Álvarez García, Esther. 2022. Lo que esconden tus ojos: la metodología eye-tracking aplicada al estudio del lenguaje. Estudios de Lingüística del Español 45: 205-239.
  • Atkins, Sue; Clear, Jeremy; Ostler, Nicholas. 1992. Corpus design criteria. Literary and Linguistic Computing 7.1: 1-16.
  • Biber, Douglas. 1993. Representativeness in Corpus Design. Literary and Linguistic Computing 8.4: 243-257.
  • Boersma, Paul; Weenink, David. 2021. Praat: doing phonetics by computer [Computer program]. Version 6.1.50.
  • Brunner, Marie-Louise; Diemer, Stefan. 2018. “You are struggling forwards, and you don’t know, and then you … you do code-switching…” – Code-switching in ELF Skype conversations. Journal of English as a Lingua Franca 7.1: 59-88.
  • Cao, Zhe; Hidalgo, Gines; Simon, Tomas; Wei, Shih-En; Sheikh, Yaser. 2021. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE Transactions on Pattern Analysis and Machine Intelligence 43.1: 172-186.
  • García-Miguel, José M. 2022. Lingüística de corpus: de los datos textuales a la teoría lingüística. Estudios de Lingüística del Español 45: 11-42.
  • Hardie, Andrew. 2010. Big data in language studies: from cargo-cult science to phantom revolution. Plenary lecture at the 7th AELINCO Conference 2015, Universidad de Valladolid.
  • Keevallik, Leelo; Ogden, Richard. 2020. Sounds on the Margins of Language at the Heart of Interaction. Research on Language and Social Interaction 53.1: 1-18. DOI: 10.1080/08351813.2020.1712961
  • Knight, Dawn. 2010. The future of multimodal corpora. Revista Brasileira de Linguística Aplicada 11.2: 391-415.
  • Krishnamurthy, Ramesh. 2001. Size Matters: creating Dictionaries from the World’s Largest Corpus. 8th Annual KOTESOL Conference Proceedings. Taegu: KOTESOL: 169-180.
  • Igoa, José Manuel. 2022. Las tareas conductuales en la investigación sobre el procesamiento del lenguaje. Estudios de Lingüística del Español 45: 133-158.
  • Leech, Geoffrey. 1991. The state of the art in corpus linguistics. In K. Aijmer and B. Altenberg, eds. English Corpus Linguistics. London: Longman, pp. 8-29.
  • Olza, Inés; Valenzuela, Javier; Pagán-Cánovas, Cristóbal. 2017. Automatic visual analysis and gesture recognition: Two preliminary pilots. Universidad de Navarra: Instituto Cultura Sociedad.
  • Pagán Cánovas, Cristóbal; Valenzuela, Javier; Alcaraz Carrión, Daniel; Olza, Inés; Ramscar, Michael. 2020. Quantifying the speech-gesture relation with massive multimodal datasets: Informativity in time expressions. PLOS ONE 15.6: e0233892.
  • Rumelhart, David E.; McClelland, James L.; PDP Research Group. 1986. Parallel distributed processing: Explorations in the microstructure of cognition. Vol. 1. Cambridge: MIT Press.
  • Sinclair, John. 1991. Corpus, Concordance, Collocation. Oxford: Oxford University Press.
  • Tognini-Bonelli, Elena. 2001. Corpus Linguistics at Work. Amsterdam: Benjamins.
  • Turchyn, Sergiy; Olza Moreno, Inés; Pagán Cánovas, Cristóbal; Steen, Francis F.; Turner, Mark; Valenzuela, Javier; Ray, Soumya. 2018. Gesture Annotation with a Visual Search Engine for Multimodal Communication Research. In The Thirtieth AAAI Conference on Innovative Applications of Artificial Intelligence (IAAI-18). https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/viewFile/16703/16398
  • Valenzuela, Javier; Pagán-Cánovas, Cristóbal; Olza, Inés; Alcaraz, Daniel. 2020. Gesturing in the wild: spontaneous gestures co-occurring with temporal demarcative expressions provide evidence for a flexible mental timeline. Review of Cognitive Linguistics 18.2: 289-316. DOI: 10.1075/rcl.00061.val