Análisis de sentimientos en español en tuits relacionados con las enfermedades infecciosas

  1. Apolinario Arzube, Oscar Omar
Supervised by:
  1. Rafael Valencia García (Director)

Defense university: Universidad de Murcia

Defense date: 11 November 2021

Examination committee:
  1. Juan Miguel Gómez-Berbís (Chair)
  2. Francisco García Sánchez (Secretary)
  3. María del Pilar Salas-Zárate (Member)
Department:
  1. Informática y Sistemas

Type: Thesis

Abstract

Language engineering is the set of computational tools that lets us investigate and understand what is expressed in natural language on social networks. Natural language processing is an area of artificial intelligence focused on understanding and modeling human language. Within this field, opinion mining, or sentiment analysis, combines techniques from natural language processing, computational linguistics and text mining to extract subjective information from content generated on social networks. The study of the state of the art concludes that, although related work on natural language processing in health exists, it is not sufficient to verify new classification techniques on the proposed corpora, techniques that would allow both researchers and health professionals to get the most out of predictive sentiment analysis models. The motivation of this study is to provide new resources for sentiment analysis in medicine through the creation of two corpora, one for infectious diseases such as Zika and another for COVID-19; to apply different technologies to see how sentiment can be classified in these domains; and to extend the study of these same technologies to the detection of satire.

Objectives. The main objective of this doctoral thesis is to apply sentiment classification techniques in predictive models for processing corpora in the domain of infectious diseases such as Zika and COVID-19, and to extend the same analysis to a corpus of the satire genre, in order to predict more accurately the sentiment expressed on social networks and to improve natural language understanding. To achieve this objective, the following sub-objectives were proposed:
• Obtaining a corpus in the domain of the infectious diseases Zika, dengue and chikungunya.
• Obtaining a corpus in the domain of the infectious disease COVID-19.
• Obtaining a corpus in the literary genre of satire.
• Obtaining classification models for sentiment prediction on each corpus.
• Identifying the best-performing classifier for each corpus, by type of classifier.

Methodology. This doctoral thesis was developed in three main phases: first, the study of the state of the art; second, the development of classification and prediction methods and artifacts for processing the proposed corpora; and third, the validation of the proposal.
• Study of the state of the art: a review of the concepts and terminology of artificial intelligence applied to natural language, supervised and unsupervised machine learning techniques, sentiment prediction models and text classification tools. In addition, existing research in opinion mining was analyzed, covering the models used and their applicability in different domains.
• Development of experiments to obtain models that predict sentiment on the corpora studied in this doctoral thesis. Confusion matrices were also built to compare the actual sentiment with the prediction obtained on each corpus.

Results. Validation of the experiments: the hyperparameters used and the prediction results are detailed for each run. The results of the best model on the validation and test partitions are reported, each with a classification report giving the accuracy and the precision, recall and F1 score of each class.
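
The confusion matrices and classification reports mentioned above can be illustrated with a minimal sketch. This is not the thesis's code: the function names, the three sentiment labels and the toy data are all assumptions chosen for illustration, and only the Python standard library is used.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, labels):
    """Return matrix[i][j]: items whose true label is labels[i] and predicted label is labels[j]."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]


def classification_report(y_true, y_pred, labels):
    """Compute overall accuracy plus per-class precision, recall and F1."""
    matrix = confusion_matrix(y_true, y_pred, labels)
    n = len(labels)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    report = {"accuracy": correct / total if total else 0.0}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(matrix[r][i] for r in range(n))  # column sum: everything predicted as this class
        actual = sum(matrix[i])                          # row sum: everything truly in this class
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


# Hypothetical toy data: real sentiment vs. a classifier's prediction for six tweets.
LABELS = ["positive", "neutral", "negative"]
y_true = ["positive", "positive", "neutral", "negative", "negative", "negative"]
y_pred = ["positive", "neutral", "neutral", "negative", "negative", "positive"]

if __name__ == "__main__":
    for label, row in zip(LABELS, confusion_matrix(y_true, y_pred, LABELS)):
        print(f"{label:>8}: {row}")
    print(classification_report(y_true, y_pred, LABELS))
```

Rows of the matrix correspond to the real sentiment and columns to the predicted one, so the diagonal holds the correct predictions. In practice a library routine would likely be used instead of hand-written functions.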