Annual Conference 2022

Research talks

Venue: Lecture Theatre 1, Diamond Building

Note: all research talks will be delivered in-person.

Development of a Benchmark Corpus to Support Entity Recognition in Job Descriptions

Author: Tom Green
Co-authors: Dr Diana Maynard and Dr Chenghua Lin

Talk scheduled for: 12:30pm - 12:50pm on Tuesday 7 June

Abstract: We present the development of a benchmark suite consisting of an annotation schema, training corpus and baseline model for Entity Recognition (ER) in job descriptions, published under a Creative Commons license. This was created to address the distinct lack of resources available to the community for the extraction of salient entities, such as skills, from job descriptions. The dataset contains 18.6k entities comprising five types (Skill, Qualification, Experience, Occupation, and Domain). We include a benchmark CRF-based ER model which achieves an F_1 score of 0.59. Through the establishment of a standard definition of entities and training/testing corpus, the suite is designed as a foundation for future work on tasks such as the development of job recommender systems.

Speech and the Effects of the Comorbidity of Depression and Mild Cognitive Impairment

Author: Meg Thomas
Co-authors: Prof Heidi Christensen, Dr Traci Walker, and Dr Panos Georgiou

Talk scheduled for: 11:20am - 11:40am on Wednesday 8 June

Abstract: Mild cognitive impairment (MCI) is a precursor to Alzheimer's Disease and other cognitive deficiencies. The comorbidity of depression and MCI needs further investigation, although it is generally accepted that there is a link between the two. Often, depression goes untreated in people with MCI as their symptoms are missed, or thought to be caused by the MCI due to the overlap in symptoms of the two diseases. This results in a lower quality of life for patients. As with depression, there are certain vocal phenomena associated with MCI that may aid in the diagnosis and assessment of the disease. Changes to a person’s speech are commonly found to be present in both depression and MCI, with many experts noting that speech changes (such as problems remembering words, taking longer pauses whilst speaking, and generally talking more slowly) are often the first signs of cognitive impairment. This project aims to investigate these vocal phenomena, in particular speech disfluencies, which have successfully been used to aid in the automatic assessment and diagnosis of depression. Differences in speech disfluencies could hold the key to untangling the effects of depression in patients with MCI.

Representation Learning for Relational and Cross-Lingual Data

Author: Xutan Peng
Co-authors: Dr Mark Stevenson and Prof Aline Villavicencio

Talk scheduled for: 2:20pm - 2:40pm on Wednesday 8 June

Abstract: Utilising real-world relational signals (e.g., structured knowledge) and understanding different languages are two (of the many) fundamental goals of Artificial General Intelligence. The corresponding explorations, however, have been relatively separate. In this talk, I will introduce our recent research outputs on bridging this gap:

  • We discovered the unnoticed connections between embedding algorithms for entities and cross-lingual lexicons;

  • We theoretically and empirically justified that the success of cross-lingual encodings relies on the consistency of relational encodings;

  • Based on the above findings and insights, we accomplished complex tasks such as Cross-Lingual Knowledge Graph Alignment and Knowledge-Aware Cross-Lingual Language Model Pretraining.

Speech Analytics for Detecting Neurological Conditions in Global English

Author: Samuel Hollands
Co-authors: Prof Heidi Christensen and Dr Daniel Blackburn

Talk scheduled for: 11:40am - 12pm on Wednesday 8 June

Abstract: It has been discovered that both the content of an individual’s discourse and the acoustics of their produced speech can be automatically analysed to detect dementia and other neurological conditions. Existing research into the conventional cognitive tests used to detect dementia in clinics has demonstrated consistently poorer efficacy for individuals of different language, educational, and socioeconomic backgrounds. Similarly, experiments evaluating ASR performance, as a modular component of a detection pipeline, on L1 and L2 English speakers generally show strong ASR performance in countries with English as their primary native language, with much poorer performance for individuals from countries where English is a secondary language. This research aims to understand these inequalities and propose potential solutions, with the following key aims: to explore the impact of different evaluation metrics on ASR and dementia classification performance; to understand and expand upon the impact of language background diversity on state-of-the-art approaches to automatic dementia detection through speech; to uncover and employ different methods for improving the functionality of speech technologies across language-diverse groups; and finally, to understand the degree to which language-agnostic methodologies can be used in an automatic environment to elicit detectable features of dementia.

Commonsense Reasoning from Multimodal Data

Author: Peter Vickers
Co-authors: Dr Emilio Monti, Dr Loïc Barrault, and Dr Nikolaos Aletras

Talk scheduled for: 2pm - 2:20pm on Wednesday 8 June

Abstract: Following the astonishing success of recent Machine Learning approaches to the NLP and Computer Vision domains, focus has increasingly turned to multimodal approaches. Such models, which interpret and reason over multiple input modalities, come closer to the human experience of interpreting, interacting, and communicating in a multimodal manner. Furthermore, leading theories of cognition stress the importance of inter-modal coordination. Building on these theories and ML research, we study machine multimodal reasoning in an area long known to be problematic to AI: commonsense reasoning, the task of making ordinary, everyday judgements. Through the investigation of vision, language, and Knowledge Graphs (KG), we aim to quantify the contributions of these additional modalities to models performing commonsense reasoning tasks. We explicitly respond to the following questions: Can existing models for commonsense reasoning, which operate on synthetic data, be transferred to real-world datasets and achieve acceptable accuracy? Can external knowledge bases improve the accuracy of state-of-the-art models for multimodal AI Commonsense tasks? To what extent can AI Commonsense models be made interpretable, allowing for their process of inference to be examined?

Deep Learning Approaches for Automatic Sung Speech Recognition

Author: Gerardo Roa Dabike
Co-author: Prof Jon Barker

Talk scheduled for: 10:20am - 10:40am on Wednesday 8 June

Abstract: Automatic sung speech recognition is a challenge that remains largely unsolved. The difficulty stems from the poor intelligibility of singing and masking by the accompaniment. Deep neural network techniques have revolutionised speech recognition systems through acoustic modelling, and have advanced audio source separation. This work evaluated the adaptation of these techniques for sung speech recognition. First, the lack of a large and standardised sung speech dataset was addressed by employing Smule's novel DAMP-MVP dataset. However, this weakly-labelled and weakly-annotated data presented many challenges, for which solutions are presented in this work. Second, the problem of sung speech acoustic modelling was reconsidered. Musically-motivated features, like pitch, voicing degree, voice quality, and beat-based features, were considered. The pitch and voicing degree features proved helpful for recognition performance. Third, accompanied sung speech recognition poses a challenging source separation problem. I investigated modern time-domain source separation networks and whether instrument embedding, motivated by speaker embedding, can be employed for music source separation. Finally, a system that combines the source separation and speech recognition components was jointly evaluated, dealing with the mismatch between the distorted sung speech originating from the separation network and the clean sung speech used for acoustic modelling.

Multi-Modal Acoustic-Articulatory Feature Fusion For Dysarthric Speech Recognition

Author: Zhengjun Yue
Co-authors: Dr Erfan Loweimi, Prof Zoran Cvetkovic, Prof Heidi Christensen, and Prof Jon Barker

Talk scheduled for: 10:40am - 11am on Wednesday 8 June

Abstract: Building automatic speech recognition (ASR) systems for speakers with dysarthria is a very challenging task. Although multi-modal ASR has received increasing attention recently, incorporating real articulatory data with acoustic features has not been widely explored in the dysarthric speech community. This paper investigates the effectiveness of multi-modal acoustic modelling for dysarthric speech recognition using acoustic features along with articulatory information. The proposed multi-stream architectures consist of convolutional, recurrent and fully-connected layers allowing for bespoke per-stream pre-processing, fusion at the optimal level of abstraction and post-processing. We study the optimal fusion level/scheme as well as training dynamics in terms of cross-entropy and WER using the popular TORGO dysarthric speech database. Experimental results show that fusing the acoustic and articulatory features at the empirically found optimal level of abstraction achieves a remarkable performance gain, leading to up to 4.6% absolute (9.6% relative) WER reduction for speakers with dysarthria.

Improving Temporal Convolutional Networks for Reverberant Speech Processing

Author: William Ravenscroft
Co-authors: Dr Stefan Goetze and Prof Thomas Hain

Talk scheduled for: 10am - 10:20am on Wednesday 8 June

Abstract: Reverberation is a problem in many areas of speech processing due to interferences introduced into the speech signal. Temporal convolutional networks have demonstrated good performance at sequence modelling for a number of signal enhancement tasks including speech dereverberation. In this work, temporal convolutional networks are analysed in terms of how their inherent properties affect their performance, such as how their receptive field relates to their performance at speech dereverberation given a specific range of RT60 values. From this analysis, further improvements, such as training the network to focus on specific parts of the receptive field, are proposed and demonstrated to lead to improved performance for monaural speech dereverberation.

Controlling Extra-Textual Attributes about Dialogue Participants: A Case Study of English-to-Polish Neural Machine Translation

Author: Sebastian Vincent
Co-authors: Dr Carolina Scarton and Dr Loïc Barrault

Talk scheduled for: 2:40pm - 3pm on Wednesday 8 June

Abstract: Unlike English, morphologically rich languages can reveal characteristics of speakers or their conversational partners, such as gender and number, via pronouns, morphological endings of words and syntax. When translating from English to such languages, a machine translation model needs to opt for a certain interpretation of textual context, which may lead to serious translation errors if extra-textual information is unavailable. We investigate this challenge in the English-to-Polish language direction. We focus on the under-researched problem of utilising external metadata in automatic translation of TV dialogue, proposing a case study where a wide range of approaches for controlling attributes in translation is employed in a multi-attribute scenario. The best model achieves an improvement of +5.81 chrF++/+6.03 BLEU, with other models achieving competitive performance. We additionally contribute a novel attribute-annotated dataset of Polish TV dialogue and a morphological analysis script used to evaluate attribute control in models.