Optimal temporal dynamics of MFCCs for low-complexity VAD systems – a case study

Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation
10/10/2018 - 10/12/2018 in Oldenburg, Germany

Proceedings: Speech Communication

Pages: 5
Language: English
Type: PDF


Authors:
Craciun, Alexandra; Baeckstroem, Tom (XMOS Ltd, UK; Aalto University, Department of Signal Processing and Acoustics, Finland)

Abstract:
Recent advances in machine learning strategies for speech classification require increasingly complex classifiers and large numbers of features. For practical application in low-resource systems, such methods use prohibitively large numbers of operations. A better approach involves reducing the features to the fewest, most salient ones, while simplifying the classifier structure to a minimum. The mel-frequency cepstral coefficients (MFCCs) are often used in speech-related classification tasks, which suggests the compressed information therein is highly informative. They are computed by warping the spectral energy to a mel scale, followed by a logarithm and a discrete cosine transformation. To better understand the properties governing such features, we examine different MFCC configurations using a simple neural network classifier for a low-complexity voice activity detector. In particular, we investigate the optimal number of MFCCs, the extent of the required temporal information and the best compression rate for different analysis settings, with varying frequency resolutions.
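
The MFCC computation described in the abstract (mel warping of the spectral energy, logarithm, discrete cosine transform) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the Hann window, the triangular filter shape, the common 2595/700 mel formula, and the defaults of 20 mel bands and 8 coefficients are all assumptions chosen for illustration only.

```python
import numpy as np

def hz_to_mel(f):
    # One common mel-scale formula (assumed here; the paper may use another).
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    n_bins = n_fft // 2 + 1
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0),
                             n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    bin_freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (bin_freqs - lo) / (mid - lo)
        falling = (hi - bin_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def dct_ii(x, n_out):
    """Unnormalised DCT-II of a 1-D vector, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    m = np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * m + 1) / (2 * n))
    return x @ basis.T

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=8):
    """MFCCs of one frame: power spectrum -> mel warp -> log -> DCT."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    mel_energy = mel_filterbank(n_filters, len(frame), sample_rate) @ power
    log_energy = np.log(mel_energy + 1e-10)  # small offset avoids log(0)
    return dct_ii(log_energy, n_coeffs)
```

In this sketch, `n_coeffs` corresponds to the number of MFCCs kept and `n_filters` to the frequency resolution of the mel analysis, the two kinds of setting the abstract says are varied; the values shown are placeholders, not results from the paper.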
