Overview of EmoSPeech at IberLEF 2024: Multimodal Speech-text Emotion Recognition in Spanish

  1. Pan, Ronghao
  2. García-Díaz, José Antonio
  3. Rodríguez-García, Miguel Ángel
  4. García-Sánchez, Francisco
  5. Valencia-García, Rafael
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Publication date: 2024

Issue: 73

Pages: 359-368

Type: Article

Other publications in: Procesamiento del lenguaje natural

Abstract

This article summarizes the EmoSPeech 2024 task, organized at the IberLEF 2024 workshop as part of the 40th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2024). The task investigates Automatic Emotion Recognition, a field of growing importance because of its impact on areas such as healthcare, psychology, the social sciences, and marketing. Two subtasks are proposed and evaluated separately. The first addresses emotion analysis from text, focusing on feature extraction and on identifying the most representative features of each emotion in a dataset built from real-life situations. The second addresses emotion analysis from a multimodal perspective, which requires building a more complex architecture to solve the classification problem. The ranking includes the results of 13 different teams, each of which proposed a novel approach to the problem.

Bibliographic references

  • Almela, A., P. Cantos-Gómez, D. García-Muñoz, and G. Alcaraz-Mármol. 2024. LACELL at EmoSPeech-IberLEF2024: Combining Linguistic Features and Contextual Sentence Embeddings for Detecting Emotions from Audio Transcriptions. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Boyd, R. L., A. Ashokkumar, S. Seraj, and J. W. Pennebaker. 2022. The development and psychometric properties of LIWC-22. Austin, TX: University of Texas at Austin, 10:1–47.
  • Cañete, J., G. Chaperon, R. Fuentes, J. Ho, H. Kang, and J. Pérez. 2023. Spanish Pre-trained BERT Model and Evaluation Data. CoRR, abs/2308.02976.
  • Casals-Salvador, M., F. Costa, M. India, and J. Hernando. 2024. BSC-UPC at EmoSPeech-IberLEF2024: Attention Pooling for Emotion Recognition. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Cedeño-Moreno, D., M. Vargas-Lombardo, A. Delgado-Herrera, C. Caparrós-Láiz, and T. Bernal-Beltrán. 2024. UTP at EmoSPeech–IberLEF2024: Using Random Forest with FastText and Wav2Vec 2.0 for Emotion Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Chaves-Villota, A., A. Jimenez, and A. Bahillo. 2024. UAH-UVA at EmoSPeech-IberLEF2024: A Transfer Learning Approach for Emotion Recognition in Spanish Texts based on a Pre-trained DistilBERT Model. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Chenchah, F. and Z. Lachiri. 2016. Speech emotion recognition in noisy environment. In 2nd International Conference on Advanced Technologies for Signal and Image Processing, ATSIP 2016, Monastir, Tunisia, March 21-23, 2016, pages 788–792. IEEE.
  • Chiruzzo, L., S. M. Jiménez-Zafra, and F. Rangel. 2024. Overview of IberLEF 2024: Natural Language Processing Challenges for Spanish and other Iberian Languages. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Conneau, A., A. Baevski, R. Collobert, A. Mohamed, and M. Auli. 2021. Unsupervised Cross-Lingual Representation Learning for Speech Recognition. In H. Hermansky, H. Černocký, L. Burget, L. Lamel, O. Scharenborg, and P. Motlíček, editors, 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 - September 3, 2021, pages 2426–2430. ISCA.
  • Conneau, A., K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov. 2020. Unsupervised Cross-lingual Representation Learning at Scale. In D. Jurafsky, J. Chai, N. Schluter, and J. R. Tetreault, editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, pages 8440–8451. Association for Computational Linguistics.
  • de la Rosa, J., E. G. Ponferrada, M. Romero, P. Villegas, P. G. de Prado Salas, and M. Grandury. 2022. BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling. Proces. del Leng. Natural, 68:13–23.
  • Ekman, P. 1992. Facial expressions of emotion: New findings, new questions. Psychological Science, 3(1):34–38.
  • Esteban-Romero, S., J. Bellver-Soler, I. Martín-Fernández, M. Gil-Martín, L. F. D’Haro, and F. Fernández-Martínez. 2024. THAU-UPM at EmoSPeech-IberLEF2024: Efficient Adaptation of Mono-modal and Multi-modal Large Language Models for Automatic Speech Emotion Recognition. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Fahad, M. S., A. Ranjan, J. Yadav, and A. Deepak. 2021. A survey of speech emotion recognition in natural environment. Digit. Signal Process., 110:102951.
  • García-Baena, D., M. A. García-Cumbreras, and S. M. Jiménez-Zafra. 2024. SINAI at EmoSPeech-IberLEF2024: Evaluating Popular Tools and Transformers Models for Multimodal Speech-Text Emotion Recognition in Spanish. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Gladun, A., J. Rogushina, and R. Martínez-Béjar. 2024. UKR at EmoSPeech–IberLEF2024: Using Fine-tuning with BERT and MFCC Features for Emotion Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Hu, E. J., Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net.
  • Lagos-Ortiz, K., J. Medina-Moreira, and O. Apolinario-Arzube. 2024. UAE at EmoSPeech–IberLEF2024: Integrating Text and Audio Features with SVM for Emotion Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Liu, Y., M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR, abs/1907.11692.
  • Lugovic, S., I. Dunder, and M. Horvat. 2016. Techniques and applications of emotion recognition in speech. In P. Biljanovic, Z. Butkovic, K. Skala, T. G. Grbac, M. Cicin-Sain, V. Sruk, S. Ribaric, S. Gros, B. Vrdoljak, M. Mauher, E. Tijan, and D. Lukman, editors, 39th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2016, Opatija, Croatia, May 30 - June 3, 2016, pages 1278–1283. IEEE.
  • Martinez-Romo, J., J. F. Huesca-Barril, L. Araujo, and E. de La Cal Marin. 2024. UNED-UNIOVI at EmoSPeech-IberLEF2024: Emotion Identification in Spanish by Combining Multimodal Textual Analysis and Machine Learning Methods. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Mohammad, S. M. and F. Bravo-Marquez. 2017. WASSA-2017 Shared Task on Emotion Intensity. In A. Balahur, S. M. Mohammad, and E. van der Goot, editors, Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@EMNLP 2017, Copenhagen, Denmark, September 8, 2017, pages 34–49. Association for Computational Linguistics.
  • Nguyen, N., X. Vu, C. Rigaud, L. Jiang, and J. Burie. 2021. ICDAR 2021 Competition on Multimodal Emotion Recognition on Comics Scenes. In J. Lladós, D. Lopresti, and S. Uchida, editors, 16th International Conference on Document Analysis and Recognition, ICDAR 2021, Lausanne, Switzerland, September 5-10, 2021, Proceedings, Part IV, volume 12824 of Lecture Notes in Computer Science, pages 767–782. Springer.
  • Pan, R., J. A. García-Díaz, M. Á. Rodríguez-García, and R. Valencia-García. 2024. Spanish MEACorpus 2023: A multimodal speech-text corpus for emotion analysis in Spanish from natural environments. Computer Standards & Interfaces, page 103856.
  • Paredes-Valverde, M. A. and M. d. P. Salas-Zárate. 2024. Team ITST at EmoSPeech-IberLEF2024: Multimodal Speech-text Emotion Recognition in Spanish Forum. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Pérez, J. M., D. A. Furman, L. A. Alemany, and F. M. Luque. 2022. RoBERTuito: a pre-trained language model for social media text in Spanish. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, and S. Piperidis, editors, Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022, Marseille, France, 20-25 June 2022, pages 7235–7243. European Language Resources Association.
  • Plaza del Arco, F. M., S. M. Jiménez-Zafra, A. Montejo-Ráez, M. D. Molina-González, L. A. Ureña López, and M. T. Martín-Valdivia. 2021. Overview of the EmoEvalEs task on emotion detection for Spanish at IberLEF 2021. Proces. del Leng. Natural, 67:155–161.
  • Sanh, V., L. Debut, J. Chaumond, and T. Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR, abs/1910.01108.
  • Soto, M., C. Macias, M. Cardoso-Moreno, T. Alcántara, O. García, and H. Calvo. 2024. CogniCIC at EmoSPeech-IberLEF2024: Exploring Multimodal Emotion Recognition in Spanish: Deep Learning Approaches for Speech-Text Analysis. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Varghese, A. A., J. P. Cherian, and J. J. Kizhakkethottam. 2015. Overview on emotion recognition system. In 2015 international conference on soft-computing and networks security (ICSNS), pages 1–5. IEEE.
  • Villegas, M. 2023. MarIA: Spanish Language Models. In A. P. Rocha, L. Steels, and H. J. van den Herik, editors, Proceedings of the 15th International Conference on Agents and Artificial Intelligence, ICAART 2023, Volume 1, Lisbon, Portugal, February 22-24, 2023, page 9. SCITEPRESS.
  • Zhang, D., Y. Yu, C. Li, J. Dong, D. Su, C. Chu, and D. Yu. 2024. MM-LLMs: Recent Advances in MultiModal Large Language Models. CoRR, abs/2401.13601.
  • Zheng, T. F., G. Zhang, and Z. Song. 2001. Comparison of Different Implementations of MFCC. J. Comput. Sci. Technol., 16(6):582–589.