Speech technology for a safer future
“For the past decade, the focus of my research has largely been on spoofing attacks in the context of automatic speaker recognition,” Professor of Speech Technology Tomi Kinnunen says.
“Originally, of course, I was interested in computers. In the 1980s, I was into computer games and I also learned the BASIC programming language on a Commodore-64 computer, which was popular back then. Later on, music became a hobby, which is why I’m interested in everything that has to do with sound,” Professor Tomi Kinnunen says.
“Since programming was also a hobby of mine, it was a natural choice to start studying computer science at the then University of Joensuu. In the last year of my Master’s level studies, I was able to bring these different interests together and I wrote my Master’s thesis on automatic speaker recognition. I continued on the same topic in my postgraduate studies: I obtained my Licentiate of Philosophy degree in 2004 and my PhD in 2005 – and the topic is still being actively researched. After defending my PhD, I spent two years at the Institute for Infocomm Research in Singapore. My research has mainly been funded by the Academy of Finland and I’ve also secured one H2020 project (OCTAVE).”
Speaker recognition is used, for example, in smart speakers and smartphones (personal profiles, voice login), in telephone switchboards (is the caller who they claim to be), in forensic voice comparison (is the person speaking the suspect), and in access control.
“For the past decade, the focus of my research has largely been on various spoofing attacks. Among other things, we have studied the impact of replay attacks, text-to-speech synthesis and voice conversion on speaker recognition. The latter two can be used to ‘put words in someone’s mouth’, and it is becoming increasingly difficult to tell synthetic speech apart from real speech, at least by ear alone. We have also studied, for example, the impact of imitation and the effects of deliberate changes to a speaker’s voice.”
Professor Kinnunen says that in the future, you may well find yourself in a situation where you think you are getting a call from your mother, supervisor or colleague, but the caller is someone else entirely.
“I’m certain that we’ll also see more and more manipulated image, text, audio and video material on social media, and the recent years’ famous deepfake videos are just the beginning. We simply must learn to live with this new reality.”
“From the viewpoint of methodological research, it is interesting to study what kind of attacks and manipulations can be detected automatically, and how vulnerable different recognition systems are to different types of attacks,” Professor Kinnunen says.
“For instance, we have developed new machine learning-based methods for identifying synthetic and modified speech (i.e. whether the speaker is a machine or a human). I’m also one of the founders and organisers of the ASVspoof challenge (www.asvspoof.org), which is an internationally acknowledged research challenge. The aim is not only to discover vulnerabilities in speaker recognition technology, but also to find solutions together. The competition is open to everyone and the related research data is openly accessible.”
“The ASVspoof competition has become well-known among the field’s researchers and companies. A significant number of researchers worldwide are currently working to identify and fix recognizer vulnerabilities. But since the field is constantly evolving and reforming, I’m not too keen to speculate about the future. More and more speech technology will certainly be seen in the consumer electronics sector. However, we need to be able to advance the underlying methodological research through basic research.”
“I think people in Finland are well informed when it comes to IT skills. I believe that the role of machine learning and data will grow in all sectors in the future. It is always worthwhile to study computer science.”
For further information, please contact:
Professor Tomi Kinnunen, tkinnu (a) cs.uef.fi
Tomi Kinnunen appointed as Professor of Computer Science, especially Speech Technology, from 1 January 2021 onwards (invitation procedure)
Master of Science (Computer Science), University of Joensuu, 1999
Licentiate of Philosophy (Computer Science), University of Joensuu, 2004
Doctor of Philosophy (Computer Science), University of Joensuu, 2005
Docent (Speaker and Language Recognition), Aalto University, 2014
Most important roles:
Professor of Speech Technology, University of Eastern Finland, 2021–
Associate Professor (Tenure Track), University of Eastern Finland, 2017–2020
Assistant Professor (Tenure Track), University of Eastern Finland, 2013–2016
Visiting Scholar, National Institute of Informatics (NII), Japan, 2015–2016
Academy of Finland Postdoctoral Researcher, University of Eastern Finland, 2010–2012
Various research and teaching roles, University of Joensuu, 2007–2009
Researcher, Institute for Infocomm Research (I2R), Singapore, 2005–2007