Every year, our first-year students undertake a mini-project.
The mini-project is a six-month activity which starts within your first few weeks at the CDT. It offers a fantastic opportunity to bond with your fellow cohort members by creating an SLT research prototype system together.
The mini-projects are designed to span both speech and language sub-areas so you get exposure to different technologies and approaches right from the start.
As a team, you will follow a conventional software engineering project approach, from scoping through to implementation and evaluation. You also get the chance to take on different management roles over the lifespan of the project.
Example topics include speech-to-speech translation, spoken question answering and call centre analytics.
Cohort four (2022 intake): University Library Spoken Language Information Assistant
The University of Sheffield Library offers a number of information services to support staff, students and others. These include an on-line text chat-with-a-librarian service and a web-based interface to a database of pre-composed answers to frequently asked questions (FAQs) (see https://libraryhelp.shef.ac.uk).
The FAQ database may be accessed in several ways. It can be browsed, optionally by topic, or it can be searched by the user formulating an arbitrary query, which is then “best-matched” against the FAQ answers, with the FAQ answer deemed most likely to answer the user’s query being returned.
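As a rough illustration of the "best-match" idea, the sketch below scores each FAQ answer by word overlap with the query and returns the highest-scoring one, with optional filtering by topic. It is not the Library's actual matching method, which is not described here: the Jaccard overlap measure, the `best_match` function and the FAQ entries are all assumptions made up for this example.

```python
import re

# Hypothetical FAQ entries as (topic, answer) pairs; not taken from the real database.
FAQS = [
    ("borrowing", "You can renew a loan online unless another user has requested the item."),
    ("printing", "Printing credit can be added to your account at any library kiosk."),
    ("access", "Visitors can enter the library with a day pass from the front desk."),
]


def tokens(text):
    """Lowercase the text and return its set of alphabetic word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def best_match(query, faqs=FAQS, topic=None):
    """Return the FAQ answer that overlaps most with the query, or None."""
    # Optional topic filter, mirroring browsing "by topic".
    candidates = [entry for entry in faqs if topic is None or entry[0] == topic]
    if not candidates:
        return None
    query_tokens = tokens(query)
    scored = [(jaccard(query_tokens, tokens(answer)), answer) for _, answer in candidates]
    score, answer = max(scored, key=lambda pair: pair[0])
    # Treat zero overlap as "no match" rather than returning an arbitrary answer.
    return answer if score > 0 else None


print(best_match("How do I renew a loan?"))
```

A real service would likely use a stronger similarity measure and handle spelling variation; word overlap is used here only because it makes the ranking step easy to follow.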
Additionally, the library has commissioned a number of information kiosks, which are placed in various locations around the University. These feature both a soft keyboard interface and a speech interface that allows users to pose queries to the kiosk and to receive spoken and/or written feedback. Both these interfaces access the FAQ information.
Our students are working with the Library to develop more effective and efficient information services for its users. Their focus is on research into the speech and language tools needed to produce an improved spoken language interface to the service.
The cohort is split into two teams to add a little competitive excitement.
Cohort three (2021 intake): Enhanced access to oral history repositories
“Oral History” is the collection and study of recordings in which the spoken memories of people are captured as a record for future generations. Oral History thus focuses on personal and historical experiences, and its production and study involve both public and academic input. Much of the recorded information cannot be found in written sources.
Oral history is an important way to capture the true spirit present at historical events. Stories told by different participants in the same situation give a rich and multifaceted record of complex events that can reveal great insight and allow us to relive the time described.
Our students are working with the UK charity Legasee, which is creating a large repository of video and audio recordings of armed forces veterans with the aim "that future generations can learn about our military history through the personal recollections of the men and women who witnessed it first hand".
The purpose of this project is to apply Speech and Language Technologies to make oral history collections more accessible through automatic processing. There are many challenges when dealing with oral history recordings (especially within a military context): accents, dialects, conversational speech, emotion, specialised terminology and names (codenames, geographical locations, battles, ships, vehicles, comrades, etc.).
The students will produce a set of tools that can automatically process audio-visual archive recordings in order to pre-produce an enriched version that can be further processed efficiently by archive producers and users to enhance search. The students have access to a large portion of Legasee audio-visual recordings. Some are annotated with transcripts, metadata, and summaries; others are untranscribed recordings.
The cohort is split into two teams to add a little competitive excitement.
Check out the news postings (Jan 2022, June 2022) about this project.
Following completion of the mini-project (April 2022), Martin Bisiker (Founder & Trustee of the Legasee Educational Trust) said: "I convey my sincere thanks for the effort that you've taken to enhance Legasee's search potential. There's no doubt that the charity now has an opportunity to take a big step forward in respect of what we offer."
Cohort two (2020 intake): A reader’s companion
The purpose of this project is to create a ‘reader’s companion’.
Imagine you are reading a bulky novel, or a biography, or indeed anything that has a complicated story to tell. It would be really useful to be able to ask questions like ‘Remind me who Ivan is’ or ‘How is Ivan related to Petrula?’ or ‘Was it Ivan who met Molotov in Moscow?’.
The aim of this project is to design, build and evaluate software capable of answering such questions: a reader’s companion.
The reader’s questions will be input by voice, because you don’t want to have to look away from the book and certainly not to type anything. The companion will speak its answers.
The client has specified that there should be dialogues as opposed to isolated questions. For example:
Reader: Was Ivan previously married to Katerina?
Companion: Yes, would you like to know more?
Reader: Yes, where did they live?
Companion: In Stalingrad, and before that in Kharkov.
Reader: What happened to her?
The teams have been asked to consider additional functionality such as:
providing a ‘new readers start here’ summary at the start of a session
linking to other books by the same author, which might share characters or settings
speaking passages from the book, perhaps in different voices
allowing the reader to make notes
Because cohort two was larger than cohort one, it was split into two teams, which added a little competitive excitement.
Cohort one (2019 intake): A conversational persona for an AI pop musician
This project focused on the design and development of a spoken dialogue system to act as a persona for an Artificial Intelligence pop musician.
The project was commissioned by the frontman of a successful English rock band (shhh – we can’t tell you who), who worked closely with the student team throughout the project.
The system is capable of conversing with music industry journalists about its ‘life’ as an AI musician in a semi-natural manner. We didn’t want it to sound too natural, otherwise it wouldn’t sound like AI – think more audio-only Max Headroom for the 21st century with a grittier Northern attitude.
The team studied music and general-purpose interviews by collecting and analysing a corpus of 1266 interviews.
From this analysis, they proposed to establish idiosyncrasy in the system by incorporating characteristic behaviours in four ways:
giving ‘gnomic responses’;
incorporating incoherent and undirected nonsense in a number of responses (waffling);
predicting the emotional state of the conversational agent for each response;
giving the system a Northern accent.
The students undertook a literature review of dialogue systems including dialogue modelling, question answering in a conversational manner, speech recognition, speech synthesis, and emotion prediction.
Their final system combined a variety of technologies including Kaldi (Automatic Speech Recognition – ASR), LPCNet and Tacotron (Speech Synthesis), a Finite State Machine (Question Answering state), a TF-IDF system (retrieval-based QA), Keras and PyTorch (waffle generation and emotion prediction).
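To give a flavour of what retrieval-based QA with TF-IDF involves, here is a minimal sketch: stored questions are turned into TF-IDF vectors, the incoming question is vectorised in the same way, and the answer attached to the most similar stored question is returned. This is not the team's implementation. The question/answer pairs, the function names and the particular smoothed IDF formula are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical question/answer pairs; the team's real data is not shown here.
QA_PAIRS = [
    ("What is your favourite song?", "The one I have not written yet."),
    ("Where did you grow up?", "On a server rack somewhere up North."),
    ("Who writes your lyrics?", "I do, one token at a time."),
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(questions):
    """Compute IDF weights and a TF-IDF vector for every stored question."""
    docs = [Counter(tokenize(q)) for q in questions]
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed IDF (one common variant, assumed here): rarer terms weigh more.
    idf = {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in doc_freq.items()}
    vectors = [{t: tf * idf[t] for t, tf in doc.items()} for doc in docs]
    return idf, vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def answer(query, qa_pairs=QA_PAIRS):
    """Return the answer paired with the most similar stored question, or None."""
    idf, vectors = build_index([q for q, _ in qa_pairs])
    counts = Counter(tokenize(query))
    # Terms never seen in the stored questions carry no weight.
    query_vec = {t: tf * idf[t] for t, tf in counts.items() if t in idf}
    scores = [cosine(query_vec, vec) for vec in vectors]
    best = max(range(len(scores)), key=scores.__getitem__)
    return qa_pairs[best][1] if scores[best] > 0 else None


print(answer("Who writes the lyrics for you?"))
```

In a full dialogue system, a `None` result would be a natural point to hand over to another component, such as a fallback response generator.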
The team’s evaluation showed that their chatbot was ranked the same as Cleverbot (a well-known text-based conversational AI) in terms of humanness, hitting our goal of human-like performance (a significant achievement).
The implementation of the waffle machine is completely bespoke to their system and provides an extra element of humanness for systems that aim to emulate real speech.
They also created a bespoke dataset of Northern speech which could potentially be used by anyone looking for Northern-accented speech from a male speaker. It would certainly be large enough to train a Northern-accented speech synthesiser.