Venue: Lecture Theatre 7, Diamond Building
Note: all keynote talks will be delivered to an in-person audience. There will be no facility to join online.
Professor in Signal Processing and Machine Learning; Associate Head (External Engagement), School of Computer Science and Electronic Engineering, University of Surrey
Talk scheduled for: 10:15am - 11:15am
Title: Large Audio-Language Models and Applications
Abstract: Large Language Models (LLMs) are increasingly being used in audio processing to interpret and generate meaningful patterns in complex sound signals such as speech, music, and environmental sounds. When integrated with acoustic models, LLMs provide strong capabilities for addressing many challenges in audio understanding, processing, and generation. This talk will present recent progress in large audio-language models (LALMs), with a focus on new algorithms and their applications in audio-focused tasks. Key topics include audio–text fusion and alignment, cross-modal audio applications, the creation of audio-language datasets, and emerging research directions in audio-language learning. We will highlight our recent work applying LALMs to areas such as audio generation and storytelling (e.g., AudioLDM, AudioLDM2, WavJourney), audio source separation (e.g., AudioSep, FlowSep), audio captioning and reasoning/question answering (e.g., ACTUAL and APT-LLMs), audio editing (e.g., RFM-Editing, WavCraft), neural audio coding (e.g., SemantiCodec), and music-to-dance generation (e.g., GCDance). We will also discuss key datasets, including WavCaps, Sound-VECaps, and AudioSetCaps, used to train and evaluate large audio-language models. Particular focus will be placed on audio generation, source separation, and audio editing.
Speaker Bio: Wenwu Wang is a Professor in Signal Processing and Machine Learning and Associate Head of External Engagement in the School of Computer Science and Electronic Engineering, University of Surrey, UK. He is also an AI Fellow at the Surrey Institute for People Centred Artificial Intelligence. His current research interests include signal processing, machine learning/AI, and machine audition (listening). He has (co-)authored over 400 papers in these areas. His work has been recognized with more than 15 accolades, including the Meta Distinguished Faculty Award (2026), Audio Engineering Society Best Technical Paper Award (2025), IEEE Signal Processing Society Young Author Best Paper Award (2022), DCASE Judge’s Award (2020, 2023, and 2024), DCASE Reproducible System Award (2019 and 2020), and LVA/ICA Best Student Paper Award (2018). He was elected an IEEE Fellow in 2026 for contributions to audio classification, generation, and source separation. He is a Senior Area Editor (2025-2027) of the IEEE Open Journal of Signal Processing and an Associate Editor (2024-2028) of IEEE Transactions on Multimedia. He was a Senior Area Editor (2019-2023) and Associate Editor (2014-2018) of IEEE Transactions on Signal Processing, and an Associate Editor (2020-2025) of IEEE/ACM Transactions on Audio, Speech, and Language Processing. He is the elected Chair (2025-2027) of the EURASIP Technical Area Committee on Acoustic, Speech and Music Signal Processing, and a Board Member (2026-2028) of the IEEE Signal Processing Society (SPS) Conferences Board. He was the elected Chair (2023-2024) of the IEEE SPS Machine Learning for Signal Processing Technical Committee and a Board Member (2023-2024) of the IEEE SPS Technical Directions Board. He has served on the organising committees of INTERSPEECH 2022 and IEEE ICASSP 2019 and 2024. He has been an invited keynote or plenary speaker at more than 20 international conferences and workshops.
Director of Research at Google DeepMind, and Honorary Professor at UCL
Talk scheduled for: 2:00pm - 3:00pm
Title: tbc
Abstract: tbc
Speaker Bio: Ed Grefenstette was previously Head of Machine Learning at Cohere. Before that, he was a Research Scientist and RL Area Lead at Facebook AI Research (now Meta), and before that a Staff Research Scientist at DeepMind, which he joined after a (short) period as CTO of Dark Blue Labs, a company he co-founded. Before moving to industry, Ed worked at the University of Oxford’s Department of Computer Science and was a Fulford Junior Research Fellow at Somerville College, while also lecturing at Hertford College to students taking Oxford’s new computer science and philosophy course. Before coming to Oxford’s DCS in 2008 to do the MSc that led to his DPhil, Ed did graduate work in philosophy at the University of St Andrews, covering mathematical logic, the foundations of mathematics, and some philosophy of language. Prior to this, he obtained a BSc in Physics & Philosophy from the University of Sheffield. Before that, he grew up in France (Tours and Grenoble) and the US (Pittsburgh, PA).