Plataforma inteligente de diseño para todos para control de teléfonos móviles mediante habla en lenguaje natural

Vivancos Vicente, Pedro Jose

Supervised by:

Rafael Valencia García (supervisor)
Jesualdo Tomás Fernández Breis (supervisor)

Defended at: Universidad de Murcia

Date of defense: 8 January 2016

Examination committee:

Carlos Alberto Cruz Corona (chair)
Francisco Garcia Sanchez (secretary)
Dagoberto Castellanos Nieves (member)

Subject areas:

Computer Science and Systems

Type: Doctoral thesis

Teseo: 121554

Abstract

GOALS The main goal of this dissertation is the development of a natural language speech recognition system for interacting with several predefined applications on a mobile device. Smartphones have improved greatly in recent years and have become powerful devices that offer users a wide range of apps and functionalities that, a few years ago, were available only on full-size computers. However, as smartphones and their applications become more powerful, they also become more complex to use. Although user interfaces have improved considerably (touch screens, new mobile operating systems such as iOS and Android, and so on), these devices are difficult to use in certain circumstances (for example, while walking) and their use is even prohibited in others (for example, handling the phone while driving). Therefore, the objective of this PhD thesis is to develop a system based on natural language speech recognition that lets users control and manage certain aspects of the phone with their own voice, through simple commands in natural language, which is the most common form of communication for people. This also makes the device accessible to people with disabilities and to elderly people who find it hard to interact with technology.

METHODOLOGY To fulfill these objectives, the following actions were taken:

Analysis of the state of the art in ontologies, natural language processing and speech recognition, which involved an in-depth study of the background of the technologies to be incorporated into this work.

Definition and formalization of a general architecture for a design-for-all interface based on natural language voice commands. This was done by defining and interconnecting independent modules that interact with each other to accomplish the overall objective of the system. The modules defined are:

1. Natural language processing module, whose main purpose is to analyze the text of the voice commands and extract the information they contain.
2. Temporal expressions module, which uses the TIMEX2 standard to represent temporal expressions and can detect, annotate and represent them using rules.
3. Speech recognition module. Tests were carried out to determine the best recognition technology, and the one selected was a speaker-independent transcription (dictation) engine. The module therefore processes the captured audio without knowing the identity of the speaker (there is no prior voice profile) and produces a literal transcription of the audio, using language resources (dictionaries) created for this dissertation.

RESULT This work began in 2010. At that time Apple and Google smartphones were starting to penetrate the market and the undisputed leader was Blackberry. Today, after the launch of Apple's Siri in early 2013 and of Google Now a year later, voice-driven virtual assistants for smartphones are commonplace and accessible to most users. Inevitably, many of the decisions made in this thesis were taken when the available technology was very different from the current situation, in which any smartphone from Microsoft, Apple or Android incorporates an intelligent assistant.

However, this thesis has produced a highly modular framework for virtual voice assistants, including a natural language module for processing temporal expressions, that can easily be extended with new commands and functionalities. It can therefore serve as a basis not only for assistants aimed at mobile phones but also for other intelligent services, such as more "intelligent" IVR systems for call centers, and it has already been used to develop an intelligent appointment-scheduling assistant for an IVR call center.
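
As a rough illustration of the rule-based temporal expressions module mentioned in the methodology, the following minimal Python sketch detects a few relative date expressions in a transcribed command and wraps them in TIMEX2-style tags whose VAL attribute holds the resolved date. It is not the implementation developed in the thesis: the rule table, the function names (`annotate`, `_next_weekday`) and the use of English phrases are assumptions made only for this example.

```python
import re
from datetime import date, timedelta

# Hypothetical vocabulary for this sketch (not taken from the thesis).
WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def _next_weekday(ref, name):
    """Return the first date strictly after `ref` that falls on weekday `name`."""
    delta = (WEEKDAYS.index(name) - ref.weekday() - 1) % 7 + 1
    return ref + timedelta(days=delta)


# Each rule pairs a regular expression with a function that resolves
# the match to a calendar date relative to a reference date.
RULES = [
    (re.compile(r"\btoday\b", re.I),
     lambda m, ref: ref),
    (re.compile(r"\btomorrow\b", re.I),
     lambda m, ref: ref + timedelta(days=1)),
    (re.compile(r"\bnext (" + "|".join(WEEKDAYS) + r")\b", re.I),
     lambda m, ref: _next_weekday(ref, m.group(1).lower())),
]


def annotate(text, ref):
    """Wrap every recognised expression in a TIMEX2-style tag with a VAL date."""
    for pattern, resolve in RULES:
        text = pattern.sub(
            lambda m: '<TIMEX2 VAL="%s">%s</TIMEX2>'
            % (resolve(m, ref).isoformat(), m.group(0)),
            text,
        )
    return text


if __name__ == "__main__":
    # Reference date: 8 January 2016, a Friday.
    print(annotate("Remind me to call Ana tomorrow", date(2016, 1, 8)))
```

In a pipeline like the one described above, the reference date would typically be the moment the command is spoken, and the annotated text would then be passed on to the module that interprets the command.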