- Title:
- Audio-Visual Speech Processing System for Polish Applicable to Human-Computer Interaction
- Authors:
- Jadczyk, T.
- Subjects:
- audio-visual speech recognition
visual features extraction
human-computer interaction
- Publisher:
- Akademia Górniczo-Hutnicza im. Stanisława Staszica w Krakowie. Wydawnictwo AGH
- Links:
- https://bibliotekanauki.pl/articles/305828.pdf
- Description:
- This paper describes an audio-visual speech recognition system for the Polish language and a set of performance tests under various acoustic conditions. We first present the overall structure of AVASR (audio-visual automatic speech recognition) systems, which comprises three main areas: audio feature extraction, visual feature extraction, and audio-visual speech integration. We present MFCC features for the audio stream with a standard HMM modeling technique, then describe appearance-based and shape-based visual features. Next, we present two feature integration techniques: feature concatenation and model fusion. We also discuss the results of a set of experiments conducted to select the best system setup for Polish under noisy audio conditions. The experiments simulate human-computer interaction in a computer-control scenario with voice commands in difficult audio environments. With an Active Appearance Model (AAM) and a multistream Hidden Markov Model (HMM), we can improve system accuracy, reducing the Word Error Rate by more than 30% compared to audio-only speech recognition when the Signal-to-Noise Ratio drops to 0 dB.
- Content provider:
- Biblioteka Nauki
Article
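
The two integration techniques named in the description, feature concatenation and model fusion, can be illustrated with a minimal sketch. It is not the paper's implementation: the function names, feature dimensions, stream weight, and toy values below are all hypothetical, and the model-fusion part only assumes the common multistream HMM formulation in which per-stream log-likelihoods are combined with weights that sum to one.

```python
import numpy as np


def concatenate_features(audio_feats, visual_feats):
    """Feature concatenation: join per-frame audio and visual vectors.

    Both inputs are (frames, dims) arrays that are assumed to be
    already synchronized to the same frame rate.
    """
    if audio_feats.shape[0] != visual_feats.shape[0]:
        raise ValueError("both streams must have the same number of frames")
    return np.hstack([audio_feats, visual_feats])


def multistream_log_likelihood(audio_loglik, visual_loglik, audio_weight):
    """Model fusion: weighted sum of per-stream log-likelihoods.

    audio_weight is a hypothetical stream weight in [0, 1]; the visual
    stream receives the remainder (1 - audio_weight).
    """
    if not 0.0 <= audio_weight <= 1.0:
        raise ValueError("audio_weight must lie in [0, 1]")
    return audio_weight * audio_loglik + (1.0 - audio_weight) * visual_loglik


# Toy data (hypothetical): 5 frames, 13 audio dims, 6 visual dims.
audio = np.zeros((5, 13))
visual = np.ones((5, 6))
joint = concatenate_features(audio, visual)
print(joint.shape)

# Toy per-state log-likelihoods for two HMM states (hypothetical values).
audio_ll = np.array([-10.0, -12.0])
visual_ll = np.array([-8.0, -4.0])
fused = multistream_log_likelihood(audio_ll, visual_ll, audio_weight=0.5)
print(fused)
```

In a setup like this, lowering `audio_weight` as the Signal-to-Noise Ratio drops would shift the decision toward the visual stream; how the weights were actually chosen is not stated in the description above.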