ITG – Informationstechnische Gesellschaft im VDE (VDE ITG) (Ed.)

ITG-Fb. 312: Speech Communication

15th ITG Conference, 20–22 September 2023, Aachen, Germany

ITG-Fachberichte

2023, 283 pages, 140 x 124 mm, Slimline box, CD-ROM
ISBN 978-3-8007-6164-7, e-book: ISBN 978-3-8007-6165-4
Personal VDE Members are entitled to a 10% discount on this title

Foreword

The 15th ITG conference on Speech Communication solicits contributions on theory, algorithms, and applications in the following areas of speech, audio, and spoken language processing.

Topics:
- Speech Enhancement and Separation
- Source Localization and Tracking
- Detection and Classification of Acoustic Scenes and Events
- Automatic Speech and Speaker Recognition
- Spoken Dialogue, Diarization, and Spoken Document Retrieval Systems
- Speech Synthesis
- Speech Modeling, Coding, and Transmission
- Privacy in Speech Technologies
- Speech Production and Perception
- Speech and Audio Quality Assessment
- Paralinguistics, Speech Diagnostics and Speech-related Biosignals
- Speech in Automotive, Mobile, and Multimodal Applications
- Acoustic Interfaces, Assistive Devices, and Hearing Aids
- Hardware and Software Tools
- Emerging Topics and Applications

The VDE ITG is an interdisciplinary scientific society and part of the cross-disciplinary VDE network. It acts as an interface for experts in information and communication technology (ICT) in industry, administration, teaching, and research. Its members have close international ties and pool German expertise in ICT. The VDE ITG promotes research into and application of this key technology. It also promotes its efficient use in data and communication technology and systems, environmental protection, medicine, and transport.

The VDE ITG has an extensive international network. It sees itself as a platform for innovation and knowledge transfer that supports successful cooperation between industrial partners and research institutions. To this end, the ITG organizes a wide range of conferences, discussion sessions, and workshops. Through its studies and recommendations, the VDE ITG contributes its expertise to politics and society, and it participates in funding programs.

Content

Hearing Impairment in Crowdsourced Speech Quality Assessments: Its Effect and Screening with Digit Triplet Hearing Test

Authors:
Schuh, Benedikt; Wardah, Wafaa; Naderi, Babak; Michal, Thilo; Moeller, Sebastian

Long-term Conversation Analysis: Exploring Utility and Privacy

Authors:
Nespoli, Francesco; Pohlhausen, Jule; Naylor, Patrick A.; Bitzer, Joerg

Screening of Alzheimer’s Dementia up to 12 Years ahead from Conversational Speech of ILSE Study

Authors:
Ablimit, Ayimnisagul; Brausse, Elisa; Schultz, Tanja

Speech-based Age and Gender Prediction with Transformers

Authors:
Burkhardt, Felix; Wagner, Johannes; Wierstorf, Hagen; Eyben, Florian; Schuller, Bjoern

Transfer Learning using Musical/Non-Musical Mixtures for Multi-Instrument Recognition

Authors:
Bradl, Hannes; Huber, Markus; Pernkopf, Franz

U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech

Authors:
Jing, Xin; Chang, Yi; Yang, Zijiang; Xie, Jiangjian; Triantafyllopoulos, Andreas; Schuller, Bjoern W.

Advances In End-to-End Conversational Speech Quality Prediction

Authors:
Bleiholder, Stefan; Kettler, Christian; Rohrer, Nils; Weyer, Steffen

Comparative Study of LC3plus and Lyra codec on DNN-based Source Localisation for Hearing Aids

Authors:
Song, Siyuan; Kindt, Stijn; Maes, Jasper; Bohlender, Alexander; Madhu, Nilesh

Comparison of Different Neural Network Architectures for Spoken Language Identification

Authors:
Bazazo, Tala; Zeineldeen, Mohammad; Plahl, Christian; Schlueter, Ralf; Ney, Hermann

Exploring Shapley Values for Blood Glucose Level Prediction from Speech

Authors:
Pompe, Simone; Mallol-Ragolta, Adria; Schauer, Nicolas; Schuller, Bjoern W.

LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices

Authors:
Schmalenstroeer, Joerg; Gburrek, Tobias; Haeb-Umbach, Reinhold

Reduced-Complexity Binaural Source Localization for Headphones and Hearing Aids using Low-Rank DRTF Approximations

Authors:
Foerster, Jonas; Janning, Helena; Nagel, Sebastian; Jax, Peter

Single Channel Source Separation in the Wild – Conversational Speech in Realistic Environments

Authors:
Berger, Emil; Schuppler, Barbara; Pernkopf, Franz; Hagmueller, Martin

Subjective Performance Evaluation of Single-channel Speaker-conditioned Target Speaker Extraction Algorithms for Complex Acoustic Scenes

Authors:
Sinha, Ragini; Scherer, Ann-Christin; Doclo, Simon; Rollwage, Christian; Rennies, Jan

Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model

Authors:
Lemercier, Jean-Marie; Thiemann, Joachim; Koning, Raphael; Gerkmann, Timo

Investigating Speaker Embedding Disentanglement on Natural Read Speech

Authors:
Kuhlmann, Michael; Meise, Adrian; Seebauer, Fritz; Wagner, Petra; Haeb-Umbach, Reinhold

Comparative Analysis of the wav2vec 2.0 Feature Extractor

Authors:
Vieting, Peter; Schlueter, Ralf; Ney, Hermann

Fast Tracking of Time-Variant Systems Using Local Affine Subspaces

Authors:
Hardenbicker, Till; Jax, Peter

Generalized Wiener Filter for Nonlinear Acoustic Echo Control

Authors:
Voit, Svantje; Enzner, Gerald

Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices

Authors:
Srinivasagan, Gokul; Deisher, Michael; Georges, Munir

CRNN-based Multi-DOA Estimator: Comparing Classification and Regression

Authors:
Cooreman, Pieter; Bohlender, Alexander; Madhu, Nilesh

Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech

Authors:
Luescher, Christoph; Zeineldeen, Mohammad; Yang, Zijian; Raissi, Tina; Vieting, Peter; Le-Duc, Khai; Wang, Weiyue; Schlueter, Ralf; Ney, Hermann

Evaluation of HRTF Models for Binaural Cue Adaptation

Authors:
Nagel, Sebastian; Jax, Peter

Global vs. Local Federated Learning in Heterogeneous Acoustic Environments

Authors:
Glitza, Rene; Becker, Luca; Martin, Rainer

Low-complexity Real-time Single-channel Speech Enhancement Based on Skip-GRUs

Authors:
Sinha, Ragini; Rollwage, Christian; Doclo, Simon

Multi-Speaker Text-to-Speech Using ForwardTacotron with Improved Duration Prediction

Authors:
Kayyar Lakshminarayana, Kishor; Dittmar, Christian; Pia, Nicola; Habets, Emanuel A.P.

On Feature Importance and Interpretability of Speaker Representations

Authors:
Rautenberg, Frederik; Kuhlmann, Michael; Wiechmann, Jana; Seebauer, Fritz; Wagner, Petra; Haeb-Umbach, Reinhold

Stream-ETS: Low-latency End-to-end Speech Synthesis from Electromyography Signals

Authors:
Scheck, Kevin; Ivucic, Darius; Ren, Zhao; Schultz, Tanja

Analyzing And Improving Neural Speaker Embeddings for ASR

Authors:
Luescher, Christoph; Xu, Jingjing; Zeineldeen, Mohammad; Schlueter, Ralf; Ney, Hermann

Distribution Mismatch Correction for Acoustic Scene Classification

Authors:
Maier, Lukas; Fuchs, Alexander; Pernkopf, Franz

Exploratory Evaluation of Speech Content Masking

Authors:
Williams, Jennifer; Pizzi, Karla; Noe, Paul-Gauthier; Das, Sneha

Language Recognition for SSB modulated HF Radio Signals of Short Duration

Authors:
Cornaggia-Urrigshardt, Alessia; Fritz, Fabian; Henneke, Lukas; Kurth, Frank; Schlich, Christian; Wilkinghoff, Kevin

Self-Learning and Active-Learning for Electromyography-to-Speech Conversion

Authors:
Ren, Zhao; Scheck, Kevin; Schultz, Tanja

Target-Speaker Voice Activity Detection in Multi-Talker Scenarios: An Empirical Study

Authors:
Aloradi, Ahmad; Elminshawi, Mohamed; Chetupalli, Srikanth Raj; Habets, Emanuel A. P.

Uncertainty-Driven Hybrid Fusion for Audio-Visual Phoneme Recognition

Authors:
Fang, Huajian; Frintrop, Simone; Gerkmann, Timo

On the Behavior of Intrusive and Non-intrusive Speech Enhancement Metrics in Predictive and Generative Settings

Authors:
de Oliveira, Danilo; Richter, Julius; Lemercier, Jean-Marie; Peer, Tal; Gerkmann, Timo

Evaluation Metrics for Generative Speech Enhancement Methods: Issues and Perspectives

Authors:
Pirklbauer, Jan; Sach, Marvin; Fluyt, Kristoff; Tirry, Wouter; Wardah, Wafaa; Moeller, Sebastian; Fingscheidt, Tim

Improving the Naturalness of Synthesized Spectrograms for TTS Using GAN-Based Post-Processing

Authors:
Sani, Paolo; Bauer, Judith; Zalkow, Frank; Habets, Emanuel A. P.; Dittmar, Christian

Audio-Visual Speech Enhancement with Score-Based Generative Models

Authors:
Richter, Julius; Frintrop, Simone; Gerkmann, Timo

A Maximum Entropy Information Bottleneck (MEIB) Regularization for Generative Speech Enhancement with HiFi-GAN

Authors:
Sach, Marvin; Pirklbauer, Jan; Fluyt, Kristoff; Tirry, Wouter; Fingscheidt, Tim