Alumni

Our alumni

Tom Green

PhD Thesis: Using NLP to Resolve Mismatches Between Jobseekers and Positions in Recruitment

Supervisor: Dr Diana Maynard

Industry partner: Tribepad

Cohort: 1

Thesis examiners: Dr Mark Stevenson (University of Sheffield) and Dr Rudy Arthur (University of Exeter)

Viva date: 29 November 2023

Current employer: Department for Work and Pensions, Government of the UK

Biography

Tom joined the CDT with a background in Psychology and Product Management. Keen to apply NLP techniques to live settings, Tom collaborated with recruitment software provider TribePad to investigate job recommendation and candidate ranking algorithms. He was supervised by Diana Maynard and Chenghua Lin.

PhD Summary

Recruiting through online portals has seen a dramatic increase in recent decades and it is challenging for jobseekers to evaluate the overwhelming amount of data to efficiently identify positions that align with their skills and qualifications. This research addresses this issue by investigating automatic approaches that leverage recent developments in Natural Language Processing (NLP) that search, parse, and evaluate the often unstructured data in order to find appropriate matches.

Impact

This research informs the early stages of the recruitment process when candidates are searching for jobs to submit an application for that align with their particular skills and experience. This research improves upon simplistic methods that are commonly used, by leveraging state-of-the-art NLP techniques that are able to learn from past application outcomes, and considering the semantic content of salient entities in candidate profiles and job descriptions. Jobseekers will benefit from this research in terms of a more appropriate ranking of the available jobs which will reduce the amount of data to review to find appropriate matches, and recruiting agents will benefit from this research in terms of more appropriate ranking of candidates which will reduce the amount of data to review to find appropriate matches.

Links

Will Ravenscroft

PhD Thesis: Speech Separation in Noisy Reverberant Acoustic Environments

Supervisor: Professor Thomas Hain

Industry partner: 3M Health Information Systems

Cohort: 1

Thesis examiners: Dr Yoshi Gotoh (University of Sheffield) and Prof Patrick Naylor (Imperial College London)

Viva date: 2 May 2024

Current employer: ConnexAI (previously Bose)

Biography

I got my start in academia in electronic engineering and mathematics. I also had a keen interest in audio signal processing. I came to Sheffield to dive deeper into the machine learning of audio processing with respect to speech. My thesis was focused on speech enhancement and separation technologies with a view to multi speaker speech recognition.

PhD Summary

Speech separation has seen significant advancements in recent years due to advanced deep learning techniques. However speech and reverberation still degrade the performance of these models. In this thesis, some inherent assumptions about the design of these models is challenged and a number new techniques for designing and training these models are proposed. These techniques result in improved model generalization, robustness and computational efficiency.

Impact

This thesis demonstrated the massive redundancies in some common approaches to speech separation research. It showed that performance of lightweight models can by improved by more intelligent analysis instead of massively increasing computational requirements. It also showed the same from the opposite end where large models contained significant redundancies because researchers had as yet failed to challenge preconceived notions about other model architectures.

The main benefit here is in the reduced training requirements for these models. Firstly, this is better for the environment which is better for society as a whole. Secondly, it demonstrates to researchers and practitioners they can save time and cost if they design their models more intelligently and take the time to challenge preconceived notions that have dominated this research field for many years now.

Links

Meg Thomas

PhD Thesis: A multidisciplinary investigation of conversation and disfluencies in cognitive decline

Supervisors: Dr Traci Walker and Professor Heidi Christensen

Industry partner: Apple

Cohort: 1

Thesis Examiners: Dr Stuart Cunningham (University of Sheffield) and Dr Leendert Plug (University of Leeds)

Viva date: 13 June 2024

Current employer: Forensic Voice Centre, York, UK

Cohort 1 Student Representative

Biography

Dr Megan Thomas is a forensic speech and audio analyst at the Forensic Voice Centre. She recently completed her PhD in speech and language technology from the University of Sheffield where she was investigating the viability of using speech disfluencies as an early indicator of cognitive decline. Before this, she completed an MSc in Forensic Speech Science at the University of York, and a BSc in Arabic and Linguistics from the University of Westminster.

PhD Summary

This thesis examined cognitive decline, with a focus on neurodegenerative dementias such as Alzheimer’s disease, which often present with language impairments. Speech was explored as a non-invasive biomarker, with disfluencies like pauses offering insights into disease progression. The study investigated the diagnostic utility of disfluency analysis and task-specific speech elicitation. Advances in machine learning have enabled automatic cognitive decline classification systems, but challenges in generalisation and transparency persist. By integrating disfluency features, this research enhances the accuracy of these systems. Additionally, conversational analysis was explored to distinguish dementias from mild cognitive impairment.

Impact

This PhD allowed me to explore my interests in speech and language technology, and introduced me to the key concepts of machine learning and artificial intelligence. The CDT gave me opportunities to work with industry leaders such as Apple and Zoo Digital, which gave me invaluable insight into work outside of academia.

Links

Peter Vickers

PhD Thesis: Navigating Multimodal Complexity: Advances in Model Design, Dataset Creation, and Evaluation Techniques

Supervisor: Professor Nikos Aletras and Dr Loïc Barrault

Industry partner: Amazon

Cohort: 1

Thesis Examiners: Dr Mark Stevenson (University of Sheffield) and Prof Yulan He (King’s College London)

Viva date: 7 June 2024

Current employer: AI Solutions Hub, Northeastern University, USA

Biography

Peter Vickers is a Computer Scientist specialising in Natural Language Processing and Multimodal AI. He completed his Ph.D. at The University of Sheffield, focusing on augmenting language models with multimodal information. His research enhances models' capabilities for tasks like Visual Question Answering and Text-to-Image Retrieval. Peter's Ph.D. was dual-funded by the UKRI Centre for Doctoral Training in Speech and Language Technologies and an Amazon Studentship Grant.

Currently, Peter works as a Data Scientist at the AI Solutions Hub, Northeastern University, applying cutting-edge Machine Learning research to business problems. He has participated in JSALT summer workshops and interned as an Applied Scientist at Amazon UK. Peter holds an MSc in Computer Science with Speech and NLP from the University of Sheffield and a BA in English Language and Literature from Magdalen College, Oxford.

Outside of his academic pursuits, Peter is an avid backcountry skier, with experience in Norway and the High Arctic. He lead the 2024 UK Stauning Alps expedition to Eastern Greenland.

PhD Summary

My thesis explores how artificial intelligence systems can integrate information from diverse data types, ranging from highly structured knowledge graphs to unstructured images and text. We investigate this through three tasks: visual question answering, eye-tracking prediction, and citation recommendation. Our research focuses on developing novel multimodal AI models, creating diverse and diagnostic datasets, and improving evaluation metrics for complex classification tasks. We introduce the concept of "knowledge density" to categorize different data modalities and examine how models perform when combining information across this spectrum. Our work aims to advance multimodal AI systems' ability to reason with heterogeneous data sources for real-world applications.

Impact

The impact of this PhD project could be significant in several ways:

Improving AI assistants and search engines: By integrating knowledge from diverse sources more effectively, AI systems could provide more accurate and contextually relevant information to users. This could lead to improved virtual assistants, more intelligent search engines, and better information retrieval systems that the general public interacts with daily.
Enhancing accessibility technologies: The research on Visual Question Answering could contribute to developing more advanced systems for visually impaired individuals, allowing them to interact with their environment more effectively through AI-powered image description and question-answering tools.
Advancing educational technologies: The insights from eye-tracking prediction could be applied to create more effective e-learning platforms that adapt to individual reading patterns and comprehension levels, potentially revolutionising personalised education.
Boosting scientific research efficiency: The work on citation prediction could lead to improved literature recommendation systems for researchers, potentially accelerating scientific discovery by helping scholars find relevant work more easily.

Links

Danae Sánchez Villegas

PhD Thesis: Beyond Words: Analyzing Social Media with Text and Images

Supervisor: Professor Nikos Aletras

Industry supporter: Emotech

Cohort: 1

Thesis Examiners: Dr Carolina Scarton (University of Sheffield) and Prof Andreas Vlachos (University of Cambridge)

Viva date: 23 October 2023

Current employer: University of Copenhagen, Denmark

Links

Sebastian Vincent

PhD Thesis: Context-Based Personalisation in Neural Machine Translation of Dialogue

Supervisor: Dr Carolina Scarton

Industry partner: ZOO Digital

Cohort: 1

Thesis Examiners: Prof Nikos Aletras (University of Sheffield) and Dr Alexandra Birch-Mayne (University of Edinburgh)

Viva date: 28 November 2023

Current employer: Cohere (previously ZOO Digital)

Biography

I was born in Poland, moved to study in the UK in 2016. I have a BSc degree in Computer Science and AI. My PhD thesis is entitled "Context-Based Personalisation in Neural Machine Translation of Dialogue”. After the PhD I became an AI Research Scientist at ZOO Digital.

PhD Summary

My thesis explores personalisation of neural machine translation through the use of extra-textual information such as character and production metadata, in the domain of scripted dialogue. It puts forward a novel framework for working with such information and described an evaluation scheme for capturing how specific the queried translation (human or machine sourced) are to the provided context.

Impact

Through publications at reputable MT and NLP venues such as EAMT and ACL people learned about my work. My work has already been cited several times, and in some cases directly inspired applications built by other researchers. My published models and datasets have been downloaded hundreds and thousands of times, suggesting that many readers deeply engage with my work. My thorough investigation into the problem of extra-textual context in translation and progress made may have a positive impact on any industry or academic lab working with this kind of data.
ZOO Digital, my industrial partner who I am currently employed by, has expressed interest in my continuation of the project in my Research Scientist role in 2024. We will be working towards implementing an improved contextual machine translation system to help professional translators when working with scripted content.
My work was disseminated at multiple venues within and outside UK and has attracted interest of students and academics who enjoyed learning about my ideas. I believe my work could have inspired them in their own academic or professional endeavours.

Links

Guanyu Huang

PhD Thesis: My Fair Robot: Shaping a Mismatched Conversational Partner via Affordance Design

Supervisor: Professor Roger K Moore

Cohort: 2

Thesis Examiners: Dr Chaona Chen (University of Sheffield) and Dr Mary Ellen Foster (University of Glasgow)

Viva date: 19 December 2024

Viva passed with minor corrections

Biography

My journey as a researcher has been guided by a fascination with communication, particularly in the areas of human-robot interaction, sociality and inclusivity. I have been fortunate to work on designing affordances for social robots that promote meaningful, human-centred interactions. My background in applied linguistics, with a focus on sociolinguistics and second language acquisition, has also allowed me to explore how second language teaching (SLT) methodologies can enhance language acquisition research.

Much of my perspective has been shaped by my experiences as a foreign language user and teacher. These experiences taught me how easily interactions between mismatched partners can falter, and inspired me to seek ways to bridge these gaps. Drawing on insights from different disciplines, I aim to contribute to user-centred, value-driven and evidence-based approaches that improve communication and foster better understanding - whether between humans or between humans and robots.

I am particularly passionate about exploring human-centred, technology-driven solutions that address real-world challenges in communication and interaction. By focusing on the needs and values of individuals, I strive to develop technologies that not only bridge gaps, but also promote inclusivity and understanding in meaningful ways.

PhD Summary

My thesis examines how the design of a social robot's appearance, sound and behaviour - its affordances - shapes human interaction. It investigates communication mismatches, anthropomorphic features, and proposes 'honest affordance design' to align robot capabilities with user expectations. Results show that while human likeness increases likeability, it is not the most influential factor; users prefer robots that embrace their non-human identity. Contextual adaptability in affordance design is crucial to meet expectations of warmth and competence in different social roles. This research provides theoretical, experimental and computational insights for transparent and consistent robot design to enhance user experience.

Impact

The results of my PhD research have implications for improving the design and use of social robots in various domains. By addressing communication mismatches and proposing 'honest affordance design', my work contributes to the creation of robots that align their external signals with their internal capabilities. This alignment increases user trust, comfort, and engagement, ensuring that robots are better equipped to meet human expectations in diverse contexts.

My research challenges the overemphasis on anthropomorphism and provides evidence-based guidelines that prioritise transparency and contextually adaptive affordances. These insights are particularly valuable for industries developing robots for healthcare, education, customer service, and other social roles, where robots must perform effectively while fostering positive human interactions. The proposed frameworks, such as the Theory of Lenses and the Interactive Lemon model, provide a conceptual guidance to improve user experience, contributing to the advancement of human-centred robotics. The reach of this research spans multiple sectors and stakeholders, including robot manufacturers, policy makers and the general public. Beyond industry, the public will benefit from a better understanding of how to interact with and use robots in everyday life, fostering acceptance and reducing fear of automation.

Overall, this research has the potential to shape the development of more trustworthy, effective and widely accepted social robots, enriching the lives of individuals and supporting innovation across sectors.

Links

Rhiannon Mogridge

PhD Thesis: An exemplar-informed approach to Speech and Language Tasks

Supervisor: Dr Anton Ragni

Cohort: 2

Thesis Examiners: Professor Roger K Moore (University of Sheffield) and Dr Sam Kirkham (Lancaster University)

Viva date: 7 November 2024

Industry partner: Toshiba

Biography

Rhiannon worked as a research scientist for fifteen years, before changing fields to do an MSc in statistics and a PhD in Speech and language Technologies. She is interested in the links between human psychology and computer science.

PhD Summary

The field of machine learning has drawn heavily from the fields of psychology and neuroscience, in particular in the development of neural network architectures, which are based on simplified versions of structures in the brain. While effective for many tasks, neural networks do not, in general, incorporate any way of storing specific experiences, instead using training data to parameterise a model, and then discarding the training date prior to inference. We explore an alternative option: a simple, explainable model from the field of human psychology called Minerva 2, which uses previously seen examples to perform classification or regression. By comparing Minerva 2 with neural networks, we demonstrate that Minerva 2 is in fact a neural network itself, with parameters taken directly from the data, rather than being trained by backpropagation. We propose new architectures, which are based on Minerva 2 and incorporate both a memory of previous examples and parameterisation that allows model flexibility. We show that feature representation is crucial for this type of model, which might explain the lack of representation of this type of model in the literature. Speech and text representations have improved rapidly in recent years, however, and if this trend continues, simple, interpretable models such those proposed here will become more competitive. As evidence of this, we use high quality speech representations in conjunction with a Minerva-based model to demonstrate state-of-the-art performance on a speech intelligibility task.

Impact

The models described in this work have the following benefits, which have the potential to impact products based on them:

Interpretability: the parameters of the model have meaning, so that the model's reasoning for decisions can be deduced.
No/reduced training: some models require no training, which leads to shorter time to deployment and a reduction in the resources required.

Links

George Close

PhD thesis: Perceptually Motivated Speech Enhancement

Supervisors: Professor Stefan Goetze and Professor Thomas Hain

Cohort: 2

Thesis Examiners: Professor Roger K Moore (University of Sheffield) and Prof. Dr.-Ing. Timo Gerkmann (University of Hamburg)

Viva date: 5 March 2025

Industry partner: Toshiba

PhD Summary

Speech Enhancement (SE) is a vital technology for online human communication. Applications of Deep Neural Network (DNN) technologies in concert with traditional signal processing approaches to the task have revolutionised both the research and implementation of SE in recent years. However, the training objective of these Neural Network Speech Enhancement (NNSE) systems generally do not consider the psychoacoustic processing which occurs in the human auditory system. As a result, enhanced audio can often contain auditory artefacts which degrade the perceptual quality or intelligibility of the speech. To overcome this, systems which directly incorporate psychoacoustically motivated measures into the training objectives of NNSE systems have been proposed.

A key development in speech audio processing in recent years is the emergence of Self Supervised Speech Representation (SSSR) models. These are powerful foundational DNN models which can be utilised for a number of more specific speech processing tasks, such as speech recognition, emotion detection as well as SE. Finally, the methods of evaluation of SE systems have been revolutionised by DNN technology, that is to say the creation of systems which are able to directly predict Mean Option Score (MOS) ratings of Speech Quality (SQ) or Speech Intelligibility (SI) derived from human listening tests.

This thesis aims to investigate these three areas; psychoacoustic training objectives of NNSE, the incorporation of SSSR features and the prediction of human derived labels of speech directly from audio signals. Further, the intersection of these areas and combined use of techniques from these areas will be investigated.

A widely adopted approach for psychoacoustically motivated NNSE training is the MetricGAN framework. Here, a NNSE network is trained as generator adversarially (pitted against in competition) with a metric prediction discriminator. The discriminator is tasked with predicting the score assigned to the input audio by a (typically non-differentiable and thus unable to be used as a loss function directly) metric function, while the generator uses inference of the discriminator to obtain a loss value for its outputs. While MetricGAN has proved effective and is becoming a widely adopted technique, there is scope to improve it in several areas. Several of the contributions of this thesis are related to these improvements including the introduction of an additional DNN tasked with improving the range of inputs to the metric prediction Discriminator, changes to the Neural Network (NN) structure of both components and the prediction of non-intrusive measures among others. A key finding of this work is that perceptually motivated NNSE systems tend to overfit towards the target perceptual metric, resulting in degraded ”real world” enhancement performance. The concept of the metric prediction is further developed into systems proposed for the related task of DNN based human MOS prediction. This can be done intrusively meaning that the system has access to a non-distorted version of the signal under test as a reference or non-intrusively meaning that only the signal under test is available. Here, human labels of SQ or SI are directly predicted from the audio signal stimulus. SI prediction is mainly investigated in the domain of hearing aid SE system evaluation in this work. State of the art performance is achieved by SQ prediction systems developed and presented in this work.

Two novel applications of SSSR are presented. Firstly, as feature space representations in the loss function of NNSE systems. In particular, it is found that using earlier intermediate DNN layer outputs in this application is particularly effective, and a strong correlation between the SSSR distance measure and psychoacoustic metrics and MOS labels is shown. Secondly, SSSR representations are proposed for use as feature extractors for the discriminator DNN components of the MetricGAN framework, as well as for MOS estimators.

Links

Tomas Goldsack

PhD thesis: Adaptive Natural Language Generation for Technical Documents

Supervisors: Prof Chenghua Lin and Dr Carolina Scarton

Cohort: 2

Thesis Examiners: Dr Diana Maynard (University of Sheffield) and Dr Stuart Middleton (University of Southampton)

Viva date: 10 April 2025

Industry partner: Advanced Manufacturing Research Centre (AMRC)

Viva passed with minor corrections

PhD Summary

Technical and specialist texts play a crucial role in documenting and advancing various fields, institutions, and academic disciplines. However, these texts often assume a certain level of prior knowledge, relying on domain-specific concepts and language that can be difficult for non-experts to understand. Text Summarisation techniques aim to retrieve and express the key points of a text within a coherent, informative summary. When trained or prompted appropriately, these techniques can also make textual information more accessible by adapting content to suit different audiences. Therefore, when applied to technical documents, summarisation has the potential to help bridge the gap between experts and non-experts, expanding access to specialised knowledge. However, the complexity and unique characteristics of such texts present significant challenges for automatic processing, limiting the effectiveness of modern summarisation systems.

This thesis explores strategies to enhance the application of summarisation models to technical texts. Furthermore, significant emphases are placed on broadening accessibility to non-technical audiences, improving content understanding through detailed analyses, and exploring how relevant external knowledge can be leveraged to improve model outputs. It is structured around five individual publications that separately investigate several pertinent directions. Publication I: Domain-driven and Discourse-guided Scientific Summarisation analyses the structure of research paper abstracts and introduces a lightweight approach for abstract generation. Publication II: Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature introduces, analyses, and benchmarks two high-quality datasets for the Lay Summarisation of research articles, enabling further research on the development of Summarisation models that can cater to a non-expert audience. Utilising one of these datasets, Publication III: Enhancing Biomedical Lay Summarisation with External Knowledge Graphs investigates the potential benefits of incorporating external knowledge (in the form of knowledge graphs) within Lay Summarisation models. Similarly, Publication IV: Leveraging Large Language Models for Zero-shot Lay Summarisation in Biomedicine and Beyond proposes a prompting technique for Lay Summarisation with Large Language Models (LLMs) that leverages newfound zero-shot abilities and knowledge of real-world industrial practices. Finally, Publication V: From Facts to Insights: A Study on the Generation and Evaluation of Analytical Reports for Deciphering Earnings Calls explores a conversational multi-agent approach for the novel task of analytical report generation with LLMs.

Links

Sam Schmück (formerly Sam Hollands)

PhD Thesis: Speech Analytics for the Detection of Neurological Conditions in Global Varieties of English

Supervisor: Professor Heidi Christensen and Dr Daniel J Blackburn (School of Medicine and Population Health)

Cohort: 2

Thesis examiners: Dr Ning Ma (University of Sheffield) and Dr Catherine Lai (University of Edinburgh)

Viva date: 6 May 2025

Viva passed with corrections

PhD Summary

Dementia is an umbrella term for the loss of cognitive and memory abilities caused by a wide variety of neurological conditions. It has been discovered that both the content of an individual’s discourse and the acoustics of their produced speech can be automatically analysed to detect dementia and other neurological conditions. If we can demonstrate robust methods for dementia classification using speech, there is potential to detect cognitive impairment years ahead of traditional approaches at a much lower cost. In addition, there are opportunities here to decrease the UK’s staggeringly high rate of undiagnosed dementia through a more streamlined reports generation approach that grants more time to clinicians to perform expert analyses. Whilst the current cutting edge demonstrates effective diagnostic capabilities on L1 (native) speakers, this thesis is interested in exploring and tackling the potential issues that may arise from the language diversity of L2 (non-native) English speakers.

The thesis explores interpretable speech analytics classification pipelines through modular, inter-modular, and system wide experiments, evaluations, and analyses. We explore automatic speech recognition (ASR) and the impact on performance of global varieties of English, identifying a stark disconnect between advertised and real-world system performances on commercialised and open-source models alike. Similarly, we identify reductionist patterns in the way ASR systems are targeted at particular L2 language varieties, describing fine-tuned derivations of the same models as unique systems. This discussion surrounding ASR is enhanced by a detailed critical analysis of ASR evaluation methods, proposing a new multi-metric standard inspired by innovations in the field of Machine Translation. Our analysis of speech analytics on L1 speech informs a new approach to automatic disfluency feature capture whilst also proposing using speech analytic fluctuation as a metric for evaluating the accuracy of ASR systems. In analysing the effects of L2 speech in our dementia corpora using these methods we observe only a small deviation in ASR performance between L1 and L2 speech using Whisper, demonstrating potential promise for the reliability of speech transcription on non-native speech, though it must be noted WERs for L1 speakers in this domain are consistently elevated above 20%. Conversely, however, we observe that L2 speech generates many analytics that more closely cluster with the speech of those with dementia than healthy L1 speakers. We find this effect to be particularly strong in lexical and syntactic complexity metrics, key speech analytics for dementia classification. Overall, this thesis establishes an insufficiency in current interpretable analytics for the accurate classification of L2 healthy controls (HCs).

This interpretable method of system design and evaluation is intended to aid in the construction of clinician facing reports (CFRs) as well as strengthen methods for assurance with regards to EDI outcomes. In order to deploy speech technologies such as these within a medical context we need a far richer understanding of the way minority voices might be impacted by our system configurations. We propose future work exploring longitudinal, speaker dependent analytics collection as a potential method for calibrating dementia classification pipelines to a function or profile of speech analytics, rather than relying on single session recording data. In addition, we suggest a transition beyond traditional definitions of HCs to improve our understanding of the impact of memory concerns on our data. Finally, we suggest a need for a greater synergy in the development of speech technologies for dementia detection in the context of developing these systems whilst actively considering how the pipeline results will be communicated in the final CFRs, patient facing reports (PFRs), and assurance protocols.

Links

Cliodhna Hughes

MPhil Thesis: An Initial Investigation into the Effects of Facial Feminisation Surgery on the Voice

Supervisor: Professor Guy Brown and Dr Ning Ma

Cohort: 4

Thesis examiners: Professor Heidi Christensen (University of Sheffield) and Dr Christian Ilbury (University of Edinburgh)

Viva date: 6 June 2025

Current employer: University of Edinburgh

Viva passed with minor corrections

Biography

Cliodhna completed an MA in linguistics before joining the CDT to gain further training in speech technology. She has a holistic interest in the voice, having been involved in projects in acoustic and articulatory phonetics, sociolinguistics, speech perception and cough analysis.

PhD Summary

Transfeminine people may choose to undergo facial feminisation surgery: an umbrella term covering a range of procedures that aim to alter the appearance of facial features, thereby potentially affecting characteristics of the vocal tract. However, effects of facial feminisation surgery on the voice are relatively understudied. This means that little information is available to people considering undergoing these procedures, about the potential impact on their voice. Such information is of particular relevance to people who use their voice professionally, such as singers. Previous work on the vocal effects of other facial surgeries also neglects to acknowledge the variety of factors that scholars in linguistics, and in particular sociolinguistics, have shown to affect the voice. This thesis presents an analysis of the effects of facial feminisation surgery on a transfeminine professional singer through a single case study. This task is approached from three different angles. First, the acoustic changes in the participant's voice are identified, through an acoustic analysis of speech and singing data collected from her before and after surgery. Then, the results of an experiment which aimed to identify the extent to which other people could perceive a change in her voice following the surgery are presented. Finally, the participant's own perception of the effects that the surgery had on her voice are explored, through a thematic analysis of longitudinal interview data. The results of the acoustic analysis and perceptual experiment suggest that facial feminisation surgery can have an impact on the voice, and the qualitative analysis suggests this may not only be as a result of the altered characteristics of the vocal tract, but also as a result of the altered social context. This research, then, provides an argument for including sociolinguistic methods in analysis of the impact of surgery on the voice.

Impact

The main intended impacts of my thesis are: (1) for trans people, to provide more information about the potential ways in which the voice could change following facial feminisation surgery, in order to allow them to make informed decisions about their healthcare; (2) for speech scientists who work with trans people, and with other people who may be undergoing facial surgeries, this thesis aims to highlight the inadequacy of the methods used in previous work to evaluate the impact of facial surgeries on the voice; (3) to draw attention to the lack of information available on the potential impact of facial feminisation surgery on the voice, and inspire further research in this area.

Links

Edward Gow-Smith

PhD Thesis: Space-Aware Subword Tokenisation and Complex Word Processing in Language Models

Supervisor: Professor Aline Villavicencio

Cohort: 2

Thesis examiners: Dr Nafise S Moosavi (University of Sheffield) and Professor Dr Hinrich Schütze (Ludwig-Maximilians-Universität)

Viva date: 28 May 2025

Current employer: Météo France, Toulouse, France

Viva passed with minor corrections

Biography

I'm a researcher at Météo France, working on machine learning for climate science.

PhD Summary

This work investigates the limitations of how subword tokenisers handle spaces, and how the processing of spaces impacts the performance of language models, with a particular focus on the processing of complex words. We propose a simple and effective modification to state-of-the-art subword tokenisers—treating spaces as individual tokens—that ameliorates known issues with the morphological validity of tokenisers such as BPE, Unigram, and WordPiece, especially regarding the splitting of prefixes. We extrinsically evaluate our space-aware tokeniser variants (BPE′, Unigram′, WordPiece′) through pretraining a number of encoder-only transformer language models, and finetuning them on a range of downstream tasks, in the general domain, and for the processing of complex words. For the latter, to extend the analysis of complex word processing beyond English for the first time, we introduce a new dataset (mCWIF) in English, German, Turkish, and Finnish. Across datasets, our space-aware tokenisers substantially improve performance on complex word classification for English, German, and Finnish, with inconsistent results in Turkish. We suggest that the Turkish results arise because topically-relevant subwords in Turkish occur either at the start of a word or not, but very rarely in both positions. In the general domain, we also find that we can remove all word boundary information from sequences and retain equivalent performance, and that such information doesn’t boost performance when included either explicitly through the input or implicitly through the pretraining task. Our work contributes to understanding of the impact of subword tokenisation, the limitations of state-of-the-art subword tokenisers, and how they can be improved for complex word processing.

Impact

My work has contributed to the understanding of how language models tokenise words, which is of benefit to the NLP community. It has been cited over 20 times. Outside of academia, my work has potential impacts for areas such a query processing, where single words are used as search terms.

Links

Ahmed Alajrami

PhD Thesis: Understanding how Language Models Learn Under Constrained or Noisy Settings

Supervisor: Professor Nikos Aletras

Cohort: 2

Thesis examiners: tbc

Viva date: tbc

PhD Summary

Language models (LMs) have become the foundation of modern Natural Language Processing (NLP), demonstrating impressive capabilities across a wide range of tasks. These models are typically trained on massive datasets with carefully curated objectives and clean supervision. However, as LMs are increasingly deployed in diverse and noisy real-world environments, it becomes critical to understand what drives their learning, what information they truly require, and how robust they are to imperfect inputs.

This thesis investigates the relationship between training signals, including pre-training objectives, input structure, and instruction quality, and the resulting generalization behavior of LMs. Through a series of empirical studies grounded in three publications, the work explores how models perform when traditional assumptions are relaxed or deliberately violated. The first study examines the role of the pre-training objective, comparing linguistically intuitive objectives like masked language modeling with arbitrary or non-linguistic objectives. Surprisingly, we find that even non-intuitive pre-training objectives can lead to models that encode meaningful linguistic information, suggesting that architecture and data distribution may play a greater role than objective semantics. The second study focuses on the internal structure of input tokens. Motivated by psycholinguistic findings, we investigate whether models can still learn effectively when only partial character-level information is available. Results show that models retain strong performance even with severely degraded input, revealing an unexpected robustness to information loss. The third and final study explores the effects of instruction noise during fine-tuning. We show that training LMs on noisy or perturbed instructions can improve generalization and robustness, even enhancing performance on standard benchmarks. These findings suggest that noise, when applied strategically, can act as an effective form of regularization.

Together, these contributions advance our understanding of how language models learn under constrained or imperfect conditions. They highlight that models are not only powerful when given large-scale, clean data, but also surprisingly resilient, and sometimes improved, when faced with noisy or incomplete information. This thesis offers practical insights for building LMs that are robust, data-efficient, and better suited for deployment in real-world and dynamic environments.

Links

Jonathan Clayton

PhD Thesis: Graphical Summarisation of Argumentative Text

Supervisor: Professor Rob Gaizauskas

Industry partner: Amazon

Cohort: 2

Thesis examiners: Dr Diana Maynard (University of Sheffield) and Professor Chris Reed (University of Dundee)

Viva date: 30 September 2025

PhD Summary

This work investigates the summarisation of argumentative text. Our main focus is on the generation of graphical summaries from dialogical argumentative texts such as online news comment sections.

We develop a novel type of graph structure to summarise argument, which we refer to as Argument Summary Graphs or ASGs. Our contributions related to these structures are twofold. Firstly, we develop two new data resources to investigate the generation of ASGs, the Debatabase-ASG dataset (created from a curated collection of online debates, idebate.org), and the SENSEIASG dataset, which we have developed by adding annotations to the SENSEI dataset of online news article comments (Barker et al., 2016).

Secondly, we investigate alternative methods for generating ASGs from both of these datasets using Large Language Models (LLMs). We carry out experiments on the Debatabase-ASG dataset and find that an end-to-end text-to-text method performs better than a pipeline approach. Additionally, we carry out experiments to test how well the end-to-end approach generalises across corpora, using both of the two corpora that we have created, and a third argument mining corpus, the Argument Annotated Essays corpus (AAEC). We find that additional fine-tuning on a monological dataset from a distinct Argument Mining task provides similar benefits to fine-tuning on a second in-genre AGSP dataset.

We also carry out two strands of work which are not narrowly focused on ASGs. Firstly, we investigate the prediction of Reasoning Markers, which can be used for the detection of argumentative text. We create a corpus for this task, and implement multiple baselines, with the best one achieving an F1 score of 0.69. Secondly, we investigate Argument Structure Parsing: the well known task of extracting argumentative components and their relations from text (Stab and Gurevych, 2017). We fine-tune several LLMs which achieve near state-of-the-art scores for carrying out this task in the end-to-end setting. We propose a novel method of formatting the output for this task, which we find to be competitive with the existing state-of-the art approach in terms of evaluation metrics, while being significantly faster to generate.

Links