Phoneme Boundary Detection using Deep Bidirectional LSTMs
Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
10/05/2016 - 10/07/2016 at Paderborn, Deutschland
Proceedings: Speech Communication
Pages: 5Language: englishTyp: PDF
Personal VDE Members are entitled to a 10% discount on this title
Authors:
Franke, Joerg; Mueller, Markus; Stueker, Sebastian; Waibel, Alex (Karlsruhe Institute of Technology, Karlsruhe, Germany)
Hamlaoui, Fatima (Zentrum für Allgemeine Sprachwissenschaft, Berlin, Germany)
Abstract:
In this paper we investigate the automatic detection of phoneme boundaries in audio recordings with the help of deep bidirectional LSTMs. This work is motivated by the needs of the project BULB which aims to support linguists in documenting unwritten languages. The automatic detection of phoneme boundaries in audio recordings of a new language is part of the technical requirements of the BULB project. For our first experiments with LSTMs for this task, we worked on TIMIT and BUCKEYE and measured the performance of our LSTMs using accuracy, precision, recall and F-measure. We then applied the trained networks crosslingually to Basaa, one of the Bantu languages addressed in BULB. With the LSTMs trained for this paper we achieve a phoneme segmentation performance on TIMIT that, to the best of our knowledge, outperforms the systems reported in literature so far.