Mini project

Overview

Every year, our first-year students undertake a mini-project.

The mini-project is a six-month activity which starts within your first few weeks in the CDT. It offers a fantastic opportunity to bond with your fellow cohort members through the shared work of creating an SLT research prototype system.

The mini-projects are designed to span both speech and language sub-areas so you get exposure to different technologies and approaches right from the start.

Working as a team, you will follow a conventional software engineering approach, from scoping through to implementation and evaluation. You will also get the chance to take on different management roles over the lifespan of the project.

Example topics include speech-to-speech translation, spoken question answering and call-centre analytics.

Cohort two (2020 intake): A reader’s companion

The purpose of this project is to create a ‘reader’s companion’.

Imagine you are reading a bulky novel, or a biography, or indeed anything that has a complicated story to tell. It would be really useful to be able to ask questions like ‘Remind me who Ivan is’, ‘How is Ivan related to Petrula?’ or ‘Was it Ivan who met Molotov in Moscow?’.

The aim of this project is to design, build and evaluate software capable of answering such questions: a reader’s companion.

The reader’s questions will be input by voice, because you don’t want to have to look away from the book, and you certainly don’t want to type anything. The companion will speak its answers.

The client has specified that there should be dialogues as opposed to isolated questions. For example:

  • Reader: Was Ivan previously married to Katerina?

  • Companion: Yes, would you like to know more?

  • Reader: Yes, where did they live?

  • Companion: In Stalingrad, and before that in Kharkov.

  • Reader: What happened to her?
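To make the multi-turn requirement concrete, here is a minimal, illustrative sketch (in Python) of the kind of loop such a companion might run. The fact table, entity list and coreference heuristic are invented for the example and stand in for the teams’ real ASR, QA and TTS components; this is not the students’ actual code.

    # Minimal sketch of a multi-turn reader's companion turn (illustrative only).
    # ASR and TTS are replaced by plain strings so the sketch runs on its own;
    # BOOK_FACTS is a toy stand-in for knowledge extracted from the book.

    BOOK_FACTS = {
        "was ivan previously married to katerina?": "Yes, would you like to know more?",
        "where did ivan and katerina live?": "In Stalingrad, and before that in Kharkov.",
    }

    class DialogueState:
        """Remembers recently mentioned entities so follow-up questions
        ('Where did they live?') can be resolved against earlier turns."""
        def __init__(self):
            self.recent_entities = []

    def track_entities(question, state):
        # Record any known character names mentioned in this turn.
        for name in ("Ivan", "Katerina", "Molotov", "Petrula"):
            if name.lower() in question.lower():
                state.recent_entities.append(name)

    def resolve_references(question, state):
        # Naive placeholder for coreference: swap 'they' for the last-mentioned pair.
        if "they" in question.lower().split() and len(state.recent_entities) >= 2:
            pair = " and ".join(state.recent_entities[-2:])
            question = question.lower().replace("they", pair)
        return question

    def companion_turn(spoken_question, state):
        # In a real system the question would come from ASR and the reply would
        # be passed to a speech synthesiser; both are plain text here.
        track_entities(spoken_question, state)
        resolved = resolve_references(spoken_question, state)
        return BOOK_FACTS.get(resolved.lower(), "I don't know that yet.")

    state = DialogueState()
    print(companion_turn("Was Ivan previously married to Katerina?", state))
    print(companion_turn("Where did they live?", state))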

The teams have been asked to consider additional functionality such as:

  • providing a ‘new readers start here’ summary at the start of a session

  • linking to other books by the same author, which might share characters or settings

  • speaking passages from the book, perhaps in different voices

  • allowing the reader to make notes

Approach

Since cohort two is larger than its predecessor, it has been split into two teams, which adds a little competitive excitement.

As of March 2021, the teams were in the final stages of the project.

Cohort one (2019 intake): A conversational persona for an AI pop musician

This project focused on the design and development of a spoken dialogue system to act as a persona for an Artificial Intelligence pop musician.

The project was commissioned by the frontman of a successful English rock band (shhh, we can’t tell you who), who worked closely with the student team throughout the project.

The system is capable of conversing with music industry journalists about its ‘life’ as an AI musician in a semi-natural manner. We didn’t want it to sound too natural, otherwise it wouldn’t sound like an AI: think of an audio-only Max Headroom for the 21st century, with a grittier Northern attitude.

Approach

The team studied music and general-purpose interviews, collecting and analysing a corpus of 1266 interviews.

From this analysis, they proposed to give the system an idiosyncratic character by incorporating four characteristic behaviours:

  • giving ‘gnomic responses’;

  • incorporating incoherent and undirected nonsense in a number of responses (waffling);

  • predicting the emotional state of the conversational agent for each response;

  • giving the system a Northern accent.

The students undertook a literature review of dialogue systems including dialogue modelling, question answering in a conversational manner, speech recognition, speech synthesis, and emotion prediction.

Their final system combined a variety of technologies, including Kaldi (automatic speech recognition, ASR), LPCNet and Tacotron (speech synthesis), a finite state machine (question-answering dialogue state), a TF-IDF system (retrieval-based QA), and Keras and PyTorch (waffle generation and emotion prediction).
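As an illustration of the retrieval-based QA step, here is a minimal sketch of TF-IDF matching using scikit-learn. The question bank, answers and similarity threshold are invented placeholders for the sake of the example, not the team’s actual code or data.

    # Minimal sketch of retrieval-based QA with TF-IDF (illustrative only).
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    # A tiny bank of canned interview questions paired with scripted answers.
    question_bank = [
        "how did you get into music",
        "what inspires your songwriting",
        "what is it like being an ai musician",
    ]
    answer_bank = [
        "I was trained on rather a lot of records, most of them better than mine.",
        "Mostly electricity, and the occasional power cut.",
        "Less glamorous than you'd think; nobody ever offers you a drink.",
    ]

    # Fit TF-IDF over the question bank, then match an incoming (ASR-transcribed)
    # question against it and return the answer for the closest match.
    vectoriser = TfidfVectorizer()
    bank_vectors = vectoriser.fit_transform(question_bank)

    def retrieve_answer(user_question, threshold=0.2):
        query_vector = vectoriser.transform([user_question])
        scores = cosine_similarity(query_vector, bank_vectors)[0]
        best = scores.argmax()
        if scores[best] < threshold:
            # Low confidence: a hook where a 'waffle' response could be generated.
            return "Ask me something about the music."
        return answer_bank[best]

    print(retrieve_answer("so how did you first get into making music"))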

Outcome

The team’s evaluation showed that their chatbot was ranked on a par with Cleverbot (a well-known text-based conversational AI) in terms of humanness, hitting our goal of human-like performance (a significant achievement).

The implementation of the waffle machine is completely bespoke to their system and provides an extra element of humanness for systems that aim to emulate real speech.

They also created a bespoke dataset of Northern-accented speech which could be used by anyone looking for Northern-accented speech from a male speaker; it is certainly large enough to train a Northern-accented speech synthesiser.