Category: Proceedings - ITG Reports

ITG – Informationstechnische Gesellschaft im VDE (VDE ITG) (Ed.)

ITG-Fb. 312: Speech Communication

15th ITG Conference, 20–22 September 2023 in Aachen, Germany

ITG-Fachberichte

2023, 283 pages, 140 x 124 mm, slimline box, CD-ROM
ISBN 978-3-8007-6164-7, e-book: ISBN 978-3-8007-6165-4
Personal VDE Members are entitled to a 10% discount on this title

149.00 € (CD-ROM), 149.00 € (e-book/PDF)

The 15th ITG conference on Speech Communication solicits contributions on theory, algorithms, and applications in the following areas of speech, audio, and spoken language processing.

Topics:
- Speech Enhancement and Separation
- Source Localization and Tracking
- Detection and Classification of Acoustic Scenes and Events
- Automatic Speech and Speaker Recognition
- Spoken Dialogue, Diarization, and Spoken Document Retrieval Systems
- Speech Synthesis
- Speech Modeling, Coding, and Transmission
- Privacy in Speech Technologies
- Speech Production and Perception
- Speech and Audio Quality Assessment
- Paralinguistics, Speech Diagnostics and Speech-related Biosignals
- Speech in Automotive, Mobile, and Multimodal Applications
- Acoustic Interfaces, Assistive Devices, and Hearing Aids
- Hardware and Software Tools
- Emerging Topics and Applications

The VDE ITG is an interdisciplinary scientific society embedded in the cross-disciplinary network of the VDE. It acts as an interface for experts in information technology (ICT) in industry, public administration, teaching, and research. Its members, working in close international cooperation, pool German expertise in the field of ICT. The VDE ITG promotes research into and application of this key technology, as well as its efficient use in data and communication technology and systems, environmental protection, medicine, and transport.
With its wide-ranging international network, the VDE ITG sees itself as a platform for innovation and knowledge transfer that supports successful cooperation between industry partners and research institutions. To this end, the ITG organizes numerous conferences, discussion sessions, and workshops. Through its studies and recommendations, the VDE ITG contributes its expertise to politics and society and participates in funding programs.

These conference proceedings contain the following papers, which can be purchased as PDF downloads with payment by credit card or PayPal:

Ad Hoc Distributed Microphones Clustering: A Comparative Analysis on Using Coherence and Signal-Specific Features

Authors:

Kindt, Stijn; Meeldijk, Martijn; Madhu, Nilesh

Exploiting an External Microphone to Improve Time-Difference-of-Arrival Estimates for Euclidean Distance Matrix-Based Source Localization

Authors:

Bruemann, Klaus; Doclo, Simon

Hearing Impairment in Crowdsourced Speech Quality Assessments: Its Effect and Screening with Digit Triplet Hearing Test

Authors:

Schuh, Benedikt; Wardah, Wafaa; Naderi, Babak; Michal, Thilo; Moeller, Sebastian

Long-term Conversation Analysis: Exploring Utility and Privacy

Authors:

Nespoli, Francesco; Pohlhausen, Jule; Naylor, Patrick A.; Bitzer, Joerg

Towards a Natural Reproduction of Binaural Recordings: Combining Binaural Cue Adaptation and Adaptive Crosstalk Cancellation

Authors:

Kabzinski, Tobias; Nagel, Sebastian; Jax, Peter

Screening of Alzheimer’s Dementia up to 12 Years ahead from Conversational Speech of ILSE Study

Authors:

Ablimit, Ayimnisagul; Brausse, Elisa; Schultz, Tanja

Speaker’s Articulatory Strategy Analysis: Theoretical Framework and Preliminary Experiment

Authors:

Serrurier, Antoine

Speech-based Age and Gender Prediction with Transformers

Authors:

Burkhardt, Felix; Wagner, Johannes; Wierstorf, Hagen; Eyben, Florian; Schuller, Bjoern

Transfer Learning using Musical/Non-Musical Mixtures for Multi-Instrument Recognition

Authors:

Bradl, Hannes; Huber, Markus; Pernkopf, Franz

U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech

Authors:

Jing, Xin; Chang, Yi; Yang, Zijiang; Xie, Jiangjian; Triantafyllopoulos, Andreas; Schuller, Bjoern W.

Using Perceptual Evaluation of Speech Quality (PESQ) Loss for DNN-Based Speech Enhancement

Authors:

Thieling, Lars; Nippert, Lars; Jax, Peter

Advances In End-to-End Conversational Speech Quality Prediction

Authors:

Bleiholder, Stefan; Kettler, Christian; Rohrer, Nils; Weyer, Steffen

Comparative Study of LC3plus and Lyra codec on DNN-based Source Localisation for Hearing Aids

Authors:

Song, Siyuan; Kindt, Stijn; Maes, Jasper; Bohlender, Alexander; Madhu, Nilesh

Comparison of Different Neural Network Architectures for Spoken Language Identification

Authors:

Bazazo, Tala; Zeineldeen, Mohammad; Plahl, Christian; Schlueter, Ralf; Ney, Hermann

Exploring Shapley Values for Blood Glucose Level Prediction from Speech

Authors:

Pompe, Simone; Mallol-Ragolta, Adria; Schauer, Nicolas; Schuller, Bjoern W.

LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices

Authors:

Schmalenstroeer, Joerg; Gburrek, Tobias; Haeb-Umbach, Reinhold

Reduced-Complexity Binaural Source Localization for Headphones and Hearing Aids using Low-Rank DRTF Approximations

Authors:

Foerster, Jonas; Janning, Helena; Nagel, Sebastian; Jax, Peter

Single Channel Source Separation in the Wild – Conversational Speech in Realistic Environments

Authors:

Berger, Emil; Schuppler, Barbara; Pernkopf, Franz; Hagmueller, Martin

Subjective Performance Evaluation of Single-channel Speaker-conditioned Target Speaker Extraction Algorithms for Complex Acoustic Scenes

Authors:

Sinha, Ragini; Scherer, Ann-Christin; Doclo, Simon; Rollwage, Christian; Rennies, Jan

Toward Semi-supervised Transcription of NAKO+ILSE: Influence of Automatic Speech Recognition Performance on Manual Transcription Effort

Authors:

Brausse, Elisa; Scheck, Kevin; Schultz, Tanja

Towards a Brain Computer Interface for Speech Perception

Authors:

Hoege, Harald

Wind Noise Reduction with a Diffusion-based Stochastic Regeneration Model

Authors:

Lemercier, Jean-Marie; Thiemann, Joachim; Koning, Raphael; Gerkmann, Timo

Investigating Speaker Embedding Disentanglement on Natural Read Speech

Authors:

Kuhlmann, Michael; Meise, Adrian; Seebauer, Fritz; Wagner, Petra; Haeb-Umbach, Reinhold

BRUDEX Database: Binaural Room Impulse Responses with Uniformly Distributed External Microphones

Authors:

Fejgin, Daniel; Middelberg, Wiebke; Doclo, Simon

Comparative Analysis of the wav2vec 2.0 Feature Extractor

Authors:

Vieting, Peter; Schlueter, Ralf; Ney, Hermann

Design of Low-Order IIR Filters Based on Hankel Nuclear Norm Regularization for Achieving Acoustic Transparency

Authors:

Hilgemann, Florian; Weyer, Christoph; Jax, Peter

Fast Tracking of Time-Variant Systems Using Local Affine Subspaces

Authors:

Hardenbicker, Till; Jax, Peter

Generalized Wiener Filter for Nonlinear Acoustic Echo Control

Authors:

Voit, Svantje; Enzner, Gerald

Compression of end-to-end non-autoregressive image-to-speech system for low-resourced devices

Authors:

Srinivasagan, Gokul; Deisher, Michael; Georges, Munir

CRNN-based Multi-DOA Estimator: Comparing Classification and Regression

Authors:

Cooreman, Pieter; Bohlender, Alexander; Madhu, Nilesh

Development of Hybrid ASR Systems for Low Resource Medical Domain Conversational Telephone Speech

Authors:

Luescher, Christoph; Zeineldeen, Mohammad; Yang, Zijian; Raissi, Tina; Vieting, Peter; Le-Duc, Khai; Wang, Weiyue; Schlueter, Ralf; Ney, Hermann

Evaluation of HRTF Models for Binaural Cue Adaptation

Authors:

Nagel, Sebastian; Jax, Peter

Global vs. Local Federated Learning in Heterogeneous Acoustic Environments

Authors:

Glitza, Rene; Becker, Luca; Martin, Rainer

In-the-wild Speech Emotion Conversion Using Disentangled Self-Supervised Representations and Neural Vocoder-based Resynthesis

Authors:

Raj Prabhu, Navin; Lehmann-Willenbrock, Nale; Gerkmann, Timo

Low-complexity Real-time Single-channel Speech Enhancement Based on Skip-GRUs

Authors:

Sinha, Ragini; Rollwage, Christian; Doclo, Simon

Multi-Speaker Text-to-Speech Using ForwardTacotron with Improved Duration Prediction

Authors:

Kayyar Lakshminarayana, Kishor; Dittmar, Christian; Pia, Nicola; Habets, Emanuel A.P.

On Feature Importance and Interpretability of Speaker Representations

Authors:

Rautenberg, Frederik; Kuhlmann, Michael; Wiechmann, Jana; Seebauer, Fritz; Wagner, Petra; Haeb-Umbach, Reinhold

Quantifying Harmonic Distortions in Audio Playback Systems

Authors:

Schaefer, Magnus

Stream-ETS: Low-latency End-to-end Speech Synthesis from Electromyography Signals

Authors:

Scheck, Kevin; Ivucic, Darius; Ren, Zhao; Schultz, Tanja

Analyzing And Improving Neural Speaker Embeddings for ASR

Authors:

Luescher, Christoph; Xu, Jingjing; Zeineldeen, Mohammad; Schlueter, Ralf; Ney, Hermann

Distribution Mismatch Correction for Acoustic Scene Classification

Authors:

Maier, Lukas; Fuchs, Alexander; Pernkopf, Franz

Exploratory Evaluation of Speech Content Masking

Authors:

Williams, Jennifer; Pizzi, Karla; Noe, Paul-Gauthier; Das, Sneha

Exploring Visualization Techniques for Interpretable Learning in Speech Enhancement Deep Neural Networks

Authors:

Nustede, Eike J.; Anemueller, Joern

Feedback-Aware Design of an Occlusion Effect Reduction System Using an Earbud-Mounted Vibration Sensor

Authors:

Weyer, Christoph; Jax, Peter

Fuzzy-clustering-supported Assignment of Smart-Speaker-based Microphone Arrays to Acoustic Sources in Reverberant Acoustic Environments

Authors:

Becker, Luca; Kindt, Stijn; Martin, Rainer
