Exploring infrastructure for Dutch speech recognition

Recent advancements in AI have reshaped the landscape of automatic speech recognition (ASR). Models like Whisper are praised for their accurate transcriptions. However, questions remain about their performance under atypical conditions. How well do these systems handle dialects, or speech from children or non-native speakers? How accurate are they in environments with multiple speakers or significant background noise? Handling large volumes of speech data also places heavy demands on infrastructure. What are the best strategies for managing large-scale transcription tasks in a structured and efficient manner?

17 July 2024

On June 25th, we turned the SURF office in Utrecht into a speech technology hub with a seminar on both the current state of the art and the infrastructure needed for its broader adoption. The seminar was a joint effort, organized by SURF and Stichting Open Spraaktechnologie. It was open to anyone interested in speech technology, with a focus on research and education.

The seminar featured six talks. Two of them gave a more technical perspective on the current state and future needs of ASR. The other four presented use cases that showcased real-world applications of speech technology: preserving veterans' life stories, using ASR to collect data such as surveys, transcribing Dutch podcasts with the supercomputer Snellius, and making educational content more accessible through speech-to-text in videos. Each presentation addressed both the potential impact and the current limitations of ASR technology in diverse settings. We wrapped up with a panel discussion featuring all speakers, who shared some final thoughts on the future of speech technology. This gave the audience an opportunity to engage in a thoughtful dialogue about the paths and challenges ahead, not only technical but also ethical.

The seminar highlighted the growing interest in speech technology and its varied applications. While much progress has been made, the discussions and presentations also pointed to areas that still need improvement before this technology can become an integral part of our digital landscape. The enthusiasm we encountered signals a promising future for speech technology.

Below you can find the slides for the six presentations.

User perspectives on ASR