Overview of EmoSPeech at IberLEF 2024: Multimodal Speech-text Emotion Recognition in Spanish

  1. Pan, Ronghao
  2. García-Díaz, José Antonio
  3. Rodríguez-García, Miguel Ángel
  4. García-Sánchez, Francisco
  5. Valencia-García, Rafael
Journal:
Procesamiento del lenguaje natural

ISSN: 1135-5948

Year of publication: 2024

Issue: 73

Pages: 359-368

Type: Article


Abstract

This paper presents the EmoSPeech 2024 shared task, which was organized in the IberLEF 2024 workshop within the framework of the 40th International Conference of the Spanish Society for Natural Language Processing (SEPLN 2024). The objective of this shared task is to study the field of Automatic Emotion Recognition (AER), which is becoming increasingly important due to its impact on various fields, such as healthcare, psychology, social sciences, and marketing. Specifically, two tasks are proposed and evaluated separately. The first task deals with AER from text, which focuses on feature extraction and identifying the most representative feature of each emotion in a dataset created from real-life situations. The second task deals with AER from a multimodal perspective, which requires the construction of a more complex architecture to solve this classification problem. The ranking includes the results of 13 different teams, each of which proposed a novel approach to the problem.

Bibliographic References

  • Almela, A., P. Cantos-Gómez, D. G.-M. no, and G. Alcaraz-Mármol. 2024. LACELL at EmoSPeech-IberLEF2024: Combining Linguistic Features and Contextual Sentence Embeddings for Detecting Emotions from Audio Transcriptions. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Boyd, R. L., A. Ashokkumar, S. Seraj, and J. W. Pennebaker. 2022. The development and psychometric properties of LIWC-22. Austin, TX: University of Texas at Austin, 10:1–47.
  • Cañete, J., G. Chaperon, R. Fuentes, J. Ho, H. Kang, and J. Pérez. 2023. Spanish Pre-trained BERT Model and Evaluation Data. CoRR, abs/2308.02976.
  • Casals-Salvador, M., F. Costa, M. India, and J. Hernando. 2024. BSC-UPC at EmoSPeech-IberLEF2024: Attention Pooling for Emotion Recognition. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Cedeño-Moreno, D., M. Vargas-Lombardo, A. Delgado-Herrera, C. Caparrós-Láiz, and T. Bernal-Beltrán. 2024. UTP at EmoSPeech–IberLEF2024: Using Random Forest with FastText and Wav2Vec 2.0 for Emotion Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Chaves-Villota, A., A. Jimenez, and A. Bahillo. 2024. UAH-UVA at EmoSPeech-IberLEF2024: A Transfer Learning Approach for Emotion Recognition in Spanish Texts based on a Pre-trained DistilBERT Model. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Chenchah, F. and Z. Lachiri. 2016. Speech emotion recognition in noisy environment. In 2nd International Conference on Advanced Technologies for Signal and Image Processing, ATSIP 2016, Monastir, Tunisia, March 21-23, 2016, pages 788–792. IEEE.
  • Chiruzzo, L., S. M. Jiménez-Zafra, and F. Rangel. 2024. Overview of IberLEF 2024: Natural Language Processing Challenges for Spanish and other Iberian Languages. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Conneau, A., A. Baevski, R. Collobert, A. Mohamed, and M. Auli. 2021. Unsupervised Cross-Lingual Representation Learning for Speech Recognition. In H. Hermansky, H. Cernocký, L. Burget, L. Lamel, O. Scharenborg, and P. Motlícek, editors, 22nd Annual Conference of the International Speech Communication Association, Interspeech 2021, Brno, Czechia, August 30 - September 3, 2021, pages 2426–2430. ISCA.
  • Conneau, A., K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov. 2020. Unsupervised Crosslingual Representation Learning at Scale. In D. Jurafsky, J. Chai, N. Schluter, and J. R. Tetreault, editors, Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020, pages 8440–8451. Association for Computational Linguistics.
  • de la Rosa, J., E. G. Ponferrada, M. Romero, P. Villegas, P. G. de Prado Salas, and M. Grandury. 2022. BERTIN: Efficient Pre-Training of a Spanish Language Model using Perplexity Sampling. Proces. del Leng. Natural, 68:13–23.
  • Ekman, P. 1992. Facial expressions of emotion: New findings, new questions. Psychological Science, 3(1):34–38.
  • Esteban-Romero, S., J. Bellver-Soler, I. Martín-Fernández, M. Gil-Martín, L. F. D’Haro, and F. Fernández-Martínez. 2024. THAU-UPM at EmoSPeech-IberLEF2024: Efficient Adaptation of Mono-modal and Multi-modal Large Language Models for Automatic Speech Emotion Recognition. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Fahad, M. S., A. Ranjan, J. Yadav, and A. Deepak. 2021. A survey of speech emotion recognition in natural environment. Digit. Signal Process., 110:102951.
  • García-Baena, D., M. A. García-Cumbreras, and S. M. Jiménez-Zafra. 2024. SINAI at EmoSPeech-IberLEF2024: Evaluating Popular Tools and Transformers Models for Multimodal Speech-Text Emotion Recognition in Spanish. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Gladun, A., J. Rogushina, and R. Martínez-Béjar. 2024. UKR at EmoSPeech–IberLEF2024: Using Fine-tuning with BERT and MFCC Features for Emotion Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Hu, E. J., Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, L. Wang, and W. Chen. 2022. LoRA: Low-Rank Adaptation of Large Language Models. In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net.
  • Lagos-Ortiz, K., J. Medina-Moreira, and O. Apolinario-Arzube. 2024. UAE at EmoSPeech–IberLEF2024: Integrating Text and Audio Features with SVM for Emotion Detection. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Liu, Y., M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V. Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. CoRR, abs/1907.11692.
  • Lugovic, S., I. Dunder, and M. Horvat. 2016. Techniques and applications of emotion recognition in speech. In P. Biljanovic, Z. Butkovic, K. Skala, T. G. Grbac, M. Cicin-Sain, V. Sruk, S. Ribaric, S. Gros, B. Vrdoljak, M. Mauher, E. Tijan, and D. Lukman, editors, 39th International Convention on Information and Communication Technology, Electronics and Microelectronics, MIPRO 2016, Opatija, Croatia, May 30 - June 3, 2016, pages 1278–1283. IEEE.
  • Martinez-Romo, J., J. F. Huesca-Barril, L. Araujo, and E. de La Cal Marin. 2024. UNED-UNIOVI at EmoSPeech-IberLEF2024: Emotion Identification in Spanish by Combining Multimodal Textual Analysis and Machine Learning Methods. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Mohammad, S. M. and F. Bravo-Marquez. 2017. WASSA-2017 Shared Task on Emotion Intensity. In A. Balahur, S. M. Mohammad, and E. van der Goot, editors, Proceedings of the 8th Workshop on Computational Approaches to Subjectivity, Sentiment and Social Media Analysis, WASSA@EMNLP 2017, Copenhagen, Denmark, September 8, 2017, pages 34–49. Association for Computational Linguistics.
  • Nguyen, N., X. Vu, C. Rigaud, L. Jiang, and J. Burie. 2021. ICDAR 2021 Competition on Multimodal Emotion Recognition on Comics Scenes. In J. Lladós, D. Lopresti, and S. Uchida, editors, 16th International Conference on Document Analysis and Recognition, ICDAR 2021, Lausanne, Switzerland, September 5-10, 2021, Proceedings, Part IV, volume 12824 of Lecture Notes in Computer Science, pages 767–782. Springer.
  • Pan, R., J. A. García-Díaz, M. Á. Rodríguez-García, and R. Valencia-García. 2024. Spanish MEACorpus 2023: A multimodal speech-text corpus for emotion analysis in Spanish from natural environments. Computer Standards & Interfaces, page 103856.
  • Paredes-Valverde, M. A. and M. d. P. Salas-Zárate. 2024. Team ITST at EmoSPeech-IberLEF2024: Multimodal Speech-text Emotion Recognition in Spanish Forum. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Pérez, J. M., D. A. Furman, L. A. Alemany, and F. M. Luque. 2022. RoBERTuito: a pre-trained language model for social media text in Spanish. In N. Calzolari, F. Béchet, P. Blache, K. Choukri, C. Cieri, T. Declerck, S. Goggi, H. Isahara, B. Maegaard, J. Mariani, H. Mazo, J. Odijk, and S. Piperidis, editors, Proceedings of the Thirteenth Language Resources and Evaluation Conference, LREC 2022, Marseille, France, 20-25 June 2022, pages 7235–7243. European Language Resources Association.
  • Plaza del Arco, F. M., S. M. Jiménez-Zafra, A. Montejo-Ráez, M. D. Molina-González, L. A. Ureña López, and M. T. Martín-Valdivia. 2021. Overview of the EmoEvalEs task on emotion detection for Spanish at IberLEF 2021. Proces. del Leng. Natural, 67:155–161.
  • Sanh, V., L. Debut, J. Chaumond, and T. Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. CoRR, abs/1910.01108.
  • Soto, M., C. Macias, M. Cardoso-Moreno, T. Alcántara, O. García, and H. Calvo. 2024. CogniCIC at EmoSPeech-IberLEF2024: Exploring Multimodal Emotion Recognition in Spanish: Deep Learning Approaches for Speech-Text Analysis. In Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2024), co-located with the 40th Conference of the Spanish Society for Natural Language Processing (SEPLN 2024), CEUR-WS.org.
  • Varghese, A. A., J. P. Cherian, and J. J. Kizhakkethottam. 2015. Overview on emotion recognition system. In 2015 international conference on soft-computing and networks security (ICSNS), pages 1–5. IEEE.
  • Villegas, M. 2023. MarIA: Spanish Language Models. In A. P. Rocha, L. Steels, and H. J. van den Herik, editors, Proceedings of the 15th International Conference on Agents and Artificial Intelligence, ICAART 2023, Volume 1, Lisbon, Portugal, February 22-24, 2023, page 9. SCITEPRESS.
  • Zhang, D., Y. Yu, C. Li, J. Dong, D. Su, C. Chu, and D. Yu. 2024. MM-LLMs: Recent Advances in MultiModal Large Language Models. CoRR, abs/2401.13601.
  • Zheng, T. F., G. Zhang, and Z. Song. 2001. Comparison of Different Implementations of MFCC. J. Comput. Sci. Technol., 16(6):582–589.