Annual Conference 2021 – Keynotes

Professor Susan M Fitzmaurice

University of Sheffield, UK

9.30am – 10.00am on 15 June 2021


Language and Speech Research at Sheffield: transdisciplinary approaches and perspectives


In this talk, I offered a wide survey of the work on language and speech undertaken across the University of Sheffield, initially from the perspective of the Arts and Humanities. The aim was to discuss approaches to the study of language and speech whose driving motivation is the question of how to understand human conceptual systems through the analysis of language, both speech and text.

In the context of the Sheffield Institute for Language Analytics (SILAS), I discussed some of the projects that represent research in this domain, thus illustrating the transdisciplinary and very open nature of language and speech research at Sheffield.

Speaker Bio

Susan Fitzmaurice is Vice President and Head of the Faculty of Arts and Humanities at the University of Sheffield, UK. Her home academic department is the School of English where she is Professor and Chair of English Language. Fitzmaurice has been at the University of Sheffield since 2006; she served as Head of the School of English from 2011 till 2015.

She was previously Professor of English and Head of Department, and then Dean of the College of Arts and Letters at Northern Arizona University until December 2005. From 1987 to 1995, she was University Lecturer in English and Fellow of St. Catharine’s College, Cambridge, and from 1984 to 1986, she was Lecturer in Linguistics at the University of Cape Town.

Fitzmaurice is co-editor with Bernd Kortmann of the Topics in English Linguistics (TiEL) series for Mouton de Gruyter and she serves on the Council of the Philological Society.

Professor Mark Hasegawa-Johnson

University of Illinois, USA

3.30pm – 4.30pm on 16 June 2021


Providing Speech Technology to Under-Resourced Communities


In this talk I presented results of two manuscripts currently in preparation: one on the subject of automatic discovery of phoneme inventories, one on the subject of counterfactually fair automatic speech recognition. The first paper addressed the problem of developing an ASR (automatic speech recognizer) that can be used to produce meaningful, useful transcriptions of languages that the ASR has never previously encountered.

End-to-end neural ASR can be trained to listen to audio in a large number of training languages, and to generate output transcriptions in the international phonetic alphabet. When the ASR is presented with a previously unknown language, error rates skyrocket, but not without pattern: errors tend to replace each phoneme with a phoneme from some other language that has similar articulation. Results suggest that usable transcriptions in a previously unknown language could be obtained in this way.

The second paper addressed demographic disparities in the accuracy of ASR: women tend to have higher error rates than men, Black speakers than white speakers, high school graduates than college graduates, and younger speakers than older speakers. One of the most stringent criteria for fairness in artificial intelligence is the criterion of counterfactual fairness: counterfactual modification of the gender, race, education or age of a person should not modify the outcome of the classifier.

It's possible to train a voice conversion algorithm to counterfactually modify the gender, race, education or age of a speaker. Simply adding counterfactual data to the training set does not reduce gender, race, education and age disparities, but by training the network to ignore counterfactual differences, it is possible to reduce gender, race, education and age disparities in the accuracy of ASR.
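As a rough illustration of the criterion described above, the following Python sketch measures how much the word error rate changes between an utterance and a counterfactually converted version of it. This is not the method from the manuscripts: the function names and the example transcripts are hypothetical, the voice conversion and recognition steps are assumed to have already happened, and a change in error rate is used here only as a simple proxy for a change in the recognizer's output.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1] / max(len(ref), 1)


def counterfactual_gap(reference: str,
                       hyp_original: str,
                       hyp_counterfactual: str) -> float:
    """Absolute WER difference between original and converted audio."""
    return abs(word_error_rate(reference, hyp_original)
               - word_error_rate(reference, hyp_counterfactual))


# Hypothetical transcripts, invented for illustration only (not real data).
reference = "turn on the kitchen light"
hyp_original = "turn on the kitchen light"        # ASR output, original voice
hyp_counterfactual = "turn on the chicken light"  # ASR output, converted voice

gap = counterfactual_gap(reference, hyp_original, hyp_counterfactual)
print(f"counterfactual WER gap: {gap:.2f}")
```

Under this sketch, a recognizer that satisfied the criterion on this utterance would give a gap of zero; averaging the gap over many utterances would give one possible summary of the disparity.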

Speaker Bio

Mark Hasegawa-Johnson has been on the faculty at the University of Illinois since 1999, where he is currently a Professor of Electrical and Computer Engineering. He received his Ph.D. in 1996 at MIT, with a thesis titled 'Formant and Burst Spectral Measures with Quantitative Error Models for Speech Sound Classification', after which he was a postdoc at UCLA from 1996 to 1999.

Professor Hasegawa-Johnson is a Fellow of the Acoustical Society of America, and a Senior Member of IEEE and ACM. He is currently Treasurer of ISCA, and Senior Area Editor of the IEEE Transactions on Audio, Speech and Language.

He has published 280 peer-reviewed journal articles and conference papers in the general area of automatic speech analysis, including machine learning models of articulatory and acoustic phonetics, prosody, dysarthria, non-speech acoustic events, audio source separation, and under-resourced languages.

Professor Milica Gašić

Heinrich-Heine-Universität Düsseldorf, Germany

2.00pm – 3.00pm on 15 June 2021


Towards dynamic dialogue models


Current dialogue models are unnatural, narrow in domain and frustrating for users. Ultimately, we would like to converse with continuously evolving, human-like dialogue models that are at ease with large and expanding domains. In this talk I focussed on three central challenges in reaching this goal: adaptable dialogue state tracking, learnable policy action spaces and rich reward functions. I showed how solutions to each of these challenges bring us a step closer to building dynamic dialogue models capable of natural conversation.

Speaker Bio

Milica Gašić is a Professor of the Dialog Systems and Machine Learning Group at the Heinrich Heine University Düsseldorf. Her research focuses on fundamental questions of human-computer dialogue modelling and lies at the intersection of Natural Language Processing and Machine Learning. Prior to her current position she was a Lecturer in Spoken Dialogue Systems at the Department of Engineering, University of Cambridge, where she led the Dialogue Systems Group. Previously, she was a Research Associate and a Senior Research Associate in the same group and a Research Fellow at Murray Edwards College.

She completed her PhD under the supervision of Professor Steve Young; the topic of her thesis was Statistical Dialogue Modelling, for which she received an EPSRC PhD Plus Award. She holds an MPhil degree in Computer Speech, Text and Internet Technology from the University of Cambridge and a Diploma (BSc equivalent) in Mathematics and Computer Science from the University of Belgrade. She is a member of ACL, a member of ELLIS and a senior member of IEEE.

Professor Mark Sanderson

RMIT University, Australia

9.00am – 10.00am on 16 June 2021


Creating a Conversational Search System


The Conversational Search Paradigm promises to satisfy information needs using human-like dialogs, be it in spoken or in written form. In order to achieve a system that works seamlessly, we need research that covers a wide range of disciplines. In this talk, I described some of the work that my PhD students have conducted over the last few years to examine different aspects of conversational search, including some of the distinct human and interface elements that one needs to consider when building a fully operational conversational search system.

Speaker Bio

Mark Sanderson is a Professor of information retrieval (IR) at RMIT University, Australia and works in the Centre for Information Discovery & Data Analytics (CIDDA). He has published on topics such as cross language IR (CLIR), summarization, human interaction and search, image retrieval by captions, and word sense ambiguity. His specialist area of research is the evaluation of search systems. He is a CI of the ADMS Centre of Excellence. He is also the Dean for Research and Innovation for the School of Engineering and School of Computing Technologies at RMIT University.

Emilio Monti

Amazon, UK

11.15am – 11.45am on 15 June 2021


Natural Language Understanding as Machine Translation


To use Natural Language as a Human-Machine Interface, we need systems capable of translating the ambiguities of a natural language utterance into an unambiguous representation of its meaning. This task is called Semantic Parsing. The state of the art solutions for Semantic Parsing resemble the ones used for Machine Translation. In this talk we highlighted some of the similarities and some of the differences between Semantic Parsing and Neural Machine Translation.

Speaker Bio

Emilio Monti is a Senior Applied Scientist at Amazon UK, where he has been since 2014. He is part of a team of scientists and developers working on audio, speech and natural language solutions that will revolutionize how customers interact with Amazon’s products and services. Prior to joining Amazon, he worked at ARM Cambridge, where he was part of the IoT team and led the mbed platform’s embedded software team, enabling developers to create ARM-based IoT devices on a massive scale. He has an MSc in Physics from the University of Bologna, Italy.

Dr Giuseppe 'Pino' Di Fabbrizio


2.00pm – 2.30pm on 15 June 2021


Conversational agents for e-commerce


Recent advances in deep learning and natural language processing are fuelling a new generation of conversational agents that are accurate and articulate in interacting with users on a broad range of subjects. Several conversational systems are also making their way to specific verticals such as banking, financing, healthcare, and e-commerce, and targeting consumers while supplanting real agents at scale in email, live chats, or other types of communications.

Although the use of conversational agents is growing rapidly, some verticals such as e-commerce are still far from a frictionless user experience in which conversions are seamlessly completed with voice-only 'zero-click' purchases. In this talk, we illustrated the VUI conversational AI platform, which has been used successfully both to optimize e-commerce conversational systems and to road-test the challenges that currently limit scalability and adoption.

Speaker Bio

Pino Di Fabbrizio is VUI, Inc.’s Chief Technology Officer and co-founder. Before VUI, he was a principal research scientist and group leader at the Rakuten Institute of Technology in Boston. He was previously a senior research scientist at Amazon Alexa Science and a lead research scientist at AT&T Labs Research. His research interests and publication topics include conversational agents, machine learning, natural language understanding, natural language generation, and large-scale speech system architectures. He has published more than 70 papers and has been awarded 30 patents. As a senior member of IEEE, he regularly contributes to international scientific committees and editorial boards.