Agrupamiento y descubrimiento de subgrupos para fenotipado de pacientes (Clustering and subgroup discovery for patient phenotyping)

López Martínez-Carrasco, Antonio

Supervised by:

Manuel Campos Martínez (Director)
José Manuel Juarez Herrero (Director)

Defence university: Universidad de Murcia

Defence date: 30 January 2024

Committee:

Daniel Ruiz Fernández (Chair)
Raquel Martínez España (Secretary)
Gregor Stiglic (Committee member)

Department:

Computer Science and Systems Engineering

Type: Thesis

Teseo: 833692

Abstract

According to the principal healthcare organisations, antimicrobial resistance (AMR) is a significant threat to human health worldwide and a critical issue in the medical field. AMR occurs when microorganisms become resistant to antimicrobial treatments, which are then unable to combat infections effectively. Among the principal causes of AMR are the inappropriate use of antimicrobials and the transfer of resistant microorganisms among humans, animals and the environment. Thus, although antimicrobial drugs are used to treat patients infected with resistant microorganisms, their excessive use and inadequate regulation promote the spread of these microorganisms. On the one hand, from the healthcare and hospital point of view, it is essential to have resources, tools and procedures with which to monitor, detect and control possible cases of AMR, and to eliminate potential threats to both patients and the rest of society. On the other hand, considerable effort has been made in clinical research to address the AMR problem and to mitigate its effects. In this context, finding sets of patients with interesting characteristics has become a core issue. This task is known as patient phenotyping, and the characteristics in question are called phenotypes. Machine Learning (ML) is a promising area of computer science for this purpose, since it provides mechanisms with which to research and develop new solutions to problems such as the one described in this work. More precisely, ML can be used to generate patient phenotypes automatically. The hypothesis of this PhD thesis is that clustering and subgroup discovery (SD), two ML techniques, are effective in supporting the patient phenotyping process in the clinical context of antibiotic resistance.
We hypothesize that refined and adapted versions of these techniques can generate phenotypes that are helpful and understandable for clinicians. In order to prove this hypothesis, we establish the following objectives:
(1) to use clustering or SD as the basis for proposing ML techniques for phenotyping whose results are useful for clinical experts and easy for them to understand;
(2) to generate patient phenotypes by designing a new unsupervised ML technique based on clustering;
(3) to identify patient phenotypes by proposing a new methodology that allows clinical experts to become involved in the process;
(4) to extract phenotypes by creating a new and efficient SD algorithm;
(5) to define patient phenotypes by proposing the new problem of mining diverse top-k subgroup lists;
(6) to facilitate the use of all the SD algorithms developed in this research, along with others already existing in the literature, by developing a public, accessible and open-source Python library; and
(7) to guarantee the reproducibility of the research by extracting and using clinical data related to the antibiotic resistance problem from a public repository.
Finally, the main conclusions of this PhD thesis in relation to the proposed objectives are that:
(1) the new ML techniques created in this work can be successfully applied to the antibiotic resistance problem, and their results are easy for clinicians to interpret;
(2) the Trace-based clustering technique generates patient phenotypes;
(3) the new 5-step methodology provides a straightforward guide with which to identify and rank patient phenotypes, and allows clinical experts to be involved in the discovery process;
(4) the VLSD algorithm can be used either to extract patient phenotypes directly or as part of other phenotyping techniques;
(5) the new problem of mining diverse top-k subgroup lists provides a new approach to patient phenotyping;
(6) the 'subgroups' library is easily accessible, since it is available on GitHub and PyPI, and can be used by data scientists, ML researchers and end-users for tasks such as phenotyping; and
(7) the MIMIC-III database is an excellent data source that provides rich data on the antibiotic resistance problem, helps researchers in this field, and ensures the reproducibility of research.
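The abstract does not describe how subgroup discovery itself works. As a minimal sketch of the general idea only, assuming a tiny categorical dataset, the code below exhaustively scores conjunctions of attribute=value conditions with weighted relative accuracy (WRAcc), a standard SD quality measure, and keeps the k best. It is a plain brute-force search, not the VLSD algorithm and not the API of the 'subgroups' library; the records and attribute names (ward, prior_antibiotics, resistant) are hypothetical and are not MIMIC-III data.

```python
from itertools import combinations


def wracc(covered, data, target):
    """Weighted relative accuracy: coverage times (subgroup positive rate - overall positive rate)."""
    n, n_s = len(data), len(covered)
    if n_s == 0:
        return 0.0
    p = sum(row[target] for row in data)
    p_s = sum(row[target] for row in covered)
    return (n_s / n) * (p_s / n_s - p / n)


def subgroup_discovery(data, target, max_depth=2, k=3):
    """Score every conjunction of attribute=value selectors up to max_depth; return the top-k by WRAcc.

    Illustrative brute-force search only (an assumption for this sketch), not VLSD.
    """
    attributes = [a for a in data[0] if a != target]
    selectors = sorted({(a, row[a]) for row in data for a in attributes})
    results = []
    for depth in range(1, max_depth + 1):
        for description in combinations(selectors, depth):
            # Two different values of the same attribute can never hold together: skip.
            if len({a for a, _ in description}) < depth:
                continue
            covered = [row for row in data if all(row[a] == v for a, v in description)]
            results.append((wracc(covered, data, target), description))
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:k]


# Hypothetical toy records (not MIMIC-III); resistant = 1 marks a resistant infection.
patients = [
    {"ward": "ICU", "prior_antibiotics": "yes", "resistant": 1},
    {"ward": "ICU", "prior_antibiotics": "yes", "resistant": 1},
    {"ward": "ICU", "prior_antibiotics": "no", "resistant": 0},
    {"ward": "general", "prior_antibiotics": "yes", "resistant": 1},
    {"ward": "general", "prior_antibiotics": "no", "resistant": 0},
    {"ward": "general", "prior_antibiotics": "no", "resistant": 0},
]

top = subgroup_discovery(patients, "resistant", max_depth=2, k=2)
for score, description in top:
    label = " AND ".join(f"{a}={v}" for a, v in description)
    print(f"{score:.3f}  {label}")
```

On this toy data the best subgroup is prior_antibiotics=yes (WRAcc 0.25), followed by prior_antibiotics=yes AND ward=ICU (WRAcc about 0.167). Each result is a readable rule over patient attributes, which is the kind of interpretable description the abstract refers to as a phenotype; exhaustive enumeration grows combinatorially with the number of selectors, which is why efficient SD algorithms matter on real clinical data.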