Subject: speech recognition - Prolib Integro

Item 1.

Title:: Enhancing Speech Recognition in Adverse Listening Environments: The Impact of Brief Musical Training on Older Adults
Authors:: Nandakumar, Akhila R
Somashekara, Haralakatta Shivananjappa
Kanagokar, Vibha
Pitchaimuthu, Arivudai Nambi
Subjects:: musical training
carnatic music
speech recognition in noise
speech recognition in reverberation
Publisher:: Polska Akademia Nauk. Czasopisma i Monografie PAN
Links:: https://bibliotekanauki.pl/articles/31339763.pdf
Description:: The present research investigated the effects of short-term musical training on speech recognition in adverse listening conditions in older adults. A total of 30 Kannada-speaking participants with no history of gross otologic, neurologic, or cognitive problems were divided equally into experimental (mean age 63 years) and control (mean age 65 years) groups. Baseline and follow-up assessments of speech recognition in noise (SNR50) and in reverberation were carried out for both groups. Participants in the experimental group received Carnatic classical music training lasting seven days. Bayesian likelihood estimates revealed no difference in SNR50 or in speech recognition scores in reverberation between the baseline and follow-up assessments for the control group. In the experimental group, by contrast, SNR50 decreased and speech recognition scores improved after musical training, suggesting a positive effect of the training. This improvement suggests that short-term musical training using Carnatic music could serve as a tool for improving speech recognition in adverse listening conditions in older adults.
Content provider:: Biblioteka Nauki

Article
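
SNR50 in the record above is the signal-to-noise ratio at which a listener recognises 50% of the speech material, so a lower value means better performance. The abstract does not say how SNR50 was obtained, so the following is only a minimal sketch, assuming percent-correct scores measured at fixed SNR levels and linear interpolation between the two levels that bracket 50%; the function name and the scores are hypothetical.

```python
def estimate_snr50(snrs_db, percent_correct):
    """Estimate SNR50 by linear interpolation.

    snrs_db: SNR levels in dB, sorted in ascending order.
    percent_correct: recognition score (%) measured at each level.
    Returns the SNR (dB) at which the score crosses 50%, or None if the
    measured scores never bracket 50%.
    """
    points = list(zip(snrs_db, percent_correct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # The first pair of adjacent levels whose scores straddle 50%.
        if (y0 - 50.0) * (y1 - 50.0) <= 0 and y0 != y1:
            return x0 + (50.0 - y0) * (x1 - x0) / (y1 - y0)
    return None


# Hypothetical scores for one listener (not data from the study).
baseline = estimate_snr50([-6, -3, 0, 3, 6], [10, 30, 60, 80, 95])
print(baseline)  # -1.0
```

A follow-up estimate lower than the baseline one is the direction of change the abstract reports for the experimental group.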

Item 2.

Title:: Allphones in automatic speech recognition
Alofony w automatycznym rozpoznawaniu mowy
Authors:: Giernacki, Wojciech
Dąbrowski, Adam
Sadalla, Talar
Kozierski, Piotr
Publisher:: Wydawnictwo Poznańskiego Towarzystwa Przyjaciół Nauk
Publisher's citation:: P. Kozierski, T. Sadalla, A. Dąbrowski, W. Giernacki: Allphones in automatic speech recognition. Studia z Automatyki i Informatyki, Vol. 41, 2016, pp. 47-53.
Description:: The common approach to speech recognition is to treat phonemes as the basic units of speech. The authors propose using allophones instead. The rarest allophones are replaced with other allophones, and four methods of selecting the replacement sounds are proposed. Based on the results, the authors conclude that the additional information carried by allophones cannot be used effectively without modifying currently available algorithms.
Content provider:: Repozytorium Centrum Otwartej Nauki

Article
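
The abstract does not describe the four selection methods for replacing rare allophones. A minimal sketch of the general idea, assuming one hypothetical rule (map each rare allophone to the most frequent sufficiently common allophone of the same phoneme), could look like this; the function name, the `min_count` threshold and the SAMPA-like toy symbols are illustrative, not the authors'.

```python
from collections import Counter


def merge_rare_allophones(transcripts, phoneme_of, min_count):
    """Replace allophones that occur fewer than min_count times.

    transcripts: list of allophone sequences (lists of symbols).
    phoneme_of: dict mapping every allophone symbol to its phoneme.
    A rare allophone is mapped to the most frequent non-rare allophone of
    the same phoneme (a hypothetical rule; ties go to the first symbol in
    sorted order). Rare allophones with no such sibling are kept unchanged.
    """
    counts = Counter(a for seq in transcripts for a in seq)
    replacement = {}
    for allophone, n in counts.items():
        if n >= min_count:
            continue
        siblings = [a for a in counts
                    if counts[a] >= min_count
                    and phoneme_of[a] == phoneme_of[allophone]]
        if siblings:
            replacement[allophone] = min(siblings, key=lambda a: (-counts[a], a))
    merged = [[replacement.get(a, a) for a in seq] for seq in transcripts]
    return merged, replacement


# Toy example: "N" stands for a velar allophone of /n/.
data = [["n", "a", "n"], ["N", "a", "n"], ["n", "a"]]
merged, mapping = merge_rare_allophones(data, {"n": "n", "N": "n", "a": "a"}, 2)
print(mapping)  # {'N': 'n'}
```

Merging within the same phoneme keeps the phonemic transcription intact while reducing the number of acoustic units with too little training data.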

Item 3.

Title:: Phoneme Segmentation Based on Wavelet Spectra Analysis
Authors:: Ziółko, B.
Manandhar, S.
Wilson, R. C.
Ziółko, M.
Subjects:: speech recognition
speech segmentation
discrete wavelet transform
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Links:: https://bibliotekanauki.pl/articles/177480.pdf
Description:: A phoneme segmentation method based on the analysis of discrete wavelet transform spectra is described. Locating phoneme boundaries is particularly useful in speech recognition because it enables the use of more accurate acoustic models: phoneme durations provide additional information for parametrization. Our method relies on the values of power envelopes and their first derivatives in six frequency subbands. It searches for specific scenarios that are typical of phoneme boundaries. The discrete times at which such events occur are recorded and graded using a distribution-like event function, which represents the change in the energy distribution in the frequency domain. The exact definition of the method is given in the paper. The final decision on boundary locations is made by analysing the event function, so boundaries are extracted using information from all subbands. The method was developed on a small set of hand-segmented Polish words and tested on a separate large corpus of 16 425 utterances. A recall and precision measure designed specifically for evaluating speech segmentation was adapted using fuzzy sets; with this measure, an F-score of 72.49% was obtained.
Content provider:: Biblioteka Nauki

Article
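
The pipeline in this abstract (subband power envelopes, their first derivatives, and an event function combining all subbands) can be illustrated with a simplified sketch. It assumes a Haar wavelet with 5 decomposition levels, which happens to give six dyadic subbands, and replaces the paper's scenario search and grading with a plain rule: sum the absolute log-envelope derivatives over subbands and keep local maxima above a threshold. The wavelet, frame length and threshold are hypothetical choices, not the authors' definition.

```python
import math


def haar_subband_energies(frame, levels=5):
    """Energy in each band of a `levels`-deep Haar DWT of one frame.

    Returns levels + 1 values: detail bands from finest (highest
    frequency) to coarsest, then the final approximation band.
    """
    approx = list(frame)
    energies = []
    for _ in range(levels):
        n = len(approx) // 2 * 2  # drop an odd trailing sample
        low = [(approx[i] + approx[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
        high = [(approx[i] - approx[i + 1]) / math.sqrt(2) for i in range(0, n, 2)]
        energies.append(sum(h * h for h in high))
        approx = low
    energies.append(sum(a * a for a in approx))
    return energies


def boundary_candidates(signal, frame_len=256, threshold=1.0):
    """Frame indices where the subband energy distribution changes sharply."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    # Log power envelope per subband; the offset avoids log10(0) in silence.
    env = [[math.log10(e + 1e-10) for e in haar_subband_energies(f)] for f in frames]
    # Event function: summed magnitude of the first derivative over subbands.
    events = [0.0] + [sum(abs(b - a) for a, b in zip(env[t - 1], env[t]))
                      for t in range(1, len(env))]
    # Local maxima above the threshold are taken as boundary candidates.
    return [t for t in range(1, len(events) - 1)
            if events[t] > threshold
            and events[t] >= events[t - 1] and events[t] >= events[t + 1]]


# Synthetic signal: 4 silent frames followed by 4 frames of a tone.
signal = [0.0] * 1024 + [math.sin(2 * math.pi * n / 8) for n in range(1024)]
print(boundary_candidates(signal))  # [4]
```

Index 4 marks the change between frame 3 and frame 4, i.e. sample 1024, where the tone starts.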

Item 4.

Title:: Accents in Speech Recognition through the Lens of a World Englishes Evaluation Set
Authors:: Del Río, Miguel
Miller, Corey
Profant, Ján
Drexler-Fox, Jennifer
Mcnamara, Quinn
Bhandari, Nishchal
Delworth, Natalie
Pirkin, Ilya
Jetté, Migüel
Chandra, Shipra
Ha, Peter
Westerman, Ryan
Subjects:: accents
dialects
speech recognition
bias
multilingual
Publisher:: Uniwersytet Łódzki. Wydawnictwo Uniwersytetu Łódzkiego
Links:: https://bibliotekanauki.pl/articles/57119913.pdf
Description:: Automatic Speech Recognition (ASR) systems generalize poorly to accented speech, creating bias issues for users and providers. The phonetic and linguistic variability of accents presents challenges for ASR systems in both data collection and modeling strategies. We present two promising approaches to accented speech recognition, custom vocabulary and multilingual modeling, and highlight key challenges in the space. Among these, the lack of a standard benchmark makes research and comparison difficult. We address this with a novel corpus of accented speech: Earnings-22, a 125-file, 119-hour corpus of English-language earnings calls gathered from global companies. We compare commercial models, showing variation in performance when country of origin is taken into consideration, and demonstrate targeted improvements using the methods we introduce.
Content provider:: Biblioteka Nauki

Article
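
Comparing models "when taking country of origin into consideration" amounts to breaking an error metric down by speaker group. The abstract does not name the metric or its text normalisation; the sketch below assumes word error rate (WER), pooled per group as total word-level edit distance divided by total reference words. The group labels and transcripts are hypothetical.

```python
from collections import defaultdict


def word_errors(reference, hypothesis):
    """Return (edit distance in words, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances for an empty reference prefix
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,             # deletion
                         cur[j - 1] + 1,          # insertion
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1], len(ref)


def wer_by_group(samples):
    """samples: iterable of (group, reference, hypothesis) -> {group: WER}."""
    errors, words = defaultdict(int), defaultdict(int)
    for group, ref, hyp in samples:
        e, n = word_errors(ref, hyp)
        errors[group] += e
        words[group] += n
    return {g: errors[g] / words[g] for g in errors if words[g]}


# Hypothetical transcripts grouped by the speaker's country of origin.
print(wer_by_group([
    ("A", "the call is now open", "the call is open"),
    ("B", "revenue grew ten percent", "revenue grew ten percent"),
]))  # {'A': 0.2, 'B': 0.0}
```

Pooling errors per group, rather than averaging per-file WER, keeps long files from being under-weighted.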

Item 5.

Title:: Two-Microphone Dereverberation for Automatic Speech Recognition of Polish
Authors:: Kundegorski, M.
Jackson, P. J. B.
Ziółko, B.
Subjects:: speech enhancement
reverberation
automatic speech recognition
ASR
Polish
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Links:: https://bibliotekanauki.pl/articles/176431.pdf
Description:: Reverberation is a common problem for many speech technologies, such as automatic speech recognition (ASR) systems. This paper investigates a novel combination of precedence, binaural and statistical independence cues for enhancing reverberant speech prior to ASR under these adverse acoustic conditions, when two microphone signals are available. The enhancement is evaluated in terms of relevant signal measures and recognition accuracy for both English and Polish ASR tasks. The results show inconsistencies between the signal and recognition measures, although in recognition the proposed method consistently outperforms all other combinations and the spectral-subtraction baseline.
Content provider:: Biblioteka Nauki

Article
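
The record names a spectral-subtraction baseline. The sketch below shows classic single-channel magnitude spectral subtraction on one frame; it is not the paper's two-microphone method, and it assumes a per-bin noise magnitude estimate is already available. The over-subtraction factor `alpha` and floor `beta` are hypothetical defaults, and a naive DFT is used only to keep the example dependency-free.

```python
import cmath


def dft(x):
    """Naive O(N^2) discrete Fourier transform (keeps the sketch self-contained)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft(); returns the real part, assuming a real input signal."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]


def spectral_subtract_frame(frame, noise_mag, alpha=1.0, beta=0.01):
    """Magnitude spectral subtraction for one frame.

    noise_mag: estimated noise magnitude per DFT bin (same length as frame).
    alpha: over-subtraction factor; beta: spectral floor as a fraction of the
    noisy magnitude, which keeps magnitudes non-negative.
    """
    cleaned = []
    for X, n in zip(dft(frame), noise_mag):
        mag = abs(X)
        new_mag = max(mag - alpha * n, beta * mag)
        # Reuse the noisy phase, as classic spectral subtraction does.
        cleaned.append(cmath.rect(new_mag, cmath.phase(X)))
    return idft(cleaned)


frame = [1.0, 2.0, 0.5, -1.0]
restored = spectral_subtract_frame(frame, [0.0] * 4)  # zero noise: frame up to rounding
```

In practice this runs on overlapping windowed frames with an FFT, and the floor `beta` trades residual noise against the "musical noise" artefacts that hard zeroing produces.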

Item 6.

Title:: Estimation of Hardware Requirements for Isolated Speech Recognition on an Embedded Systems
Authors:: Kłobucki, K.
Mąka, T.
Subjects:: isolated speech recognition
ASR
resources estimation
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Links:: https://bibliotekanauki.pl/articles/227186.pdf
Description:: In recent years, speech recognition functionality has increasingly been added to embedded devices. Because these devices have limited resources, there is a need to assess whether a given speech recognition system is feasible within the available constraints and to estimate how many resources it needs. This paper attempts to define a technique for estimating hardware resource usage in the speech recognition task. To determine the relevant parameters and their dependencies, two systems were tested: the first used Dynamic Time Warping (DTW) pattern matching, and the second used Hidden Markov Models (HMMs). For each system, the recognition rate, recognition time, vocabulary database size and learning time were measured. The results were used to define linear and polynomial regression models, and an estimation algorithm was then developed from these models. Testing the proposed approach showed that even low-end mobile phones have sufficient hardware resources to run an isolated speech recognition system.
Content provider:: Biblioteka Nauki

Article
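
The estimation step described above fits regression models to measurements and uses them to predict requirements. The abstract does not give the variables or polynomial degrees, so the sketch below assumes one predictor (vocabulary size) and one requirement (recognition time), fitted by ordinary least squares through the normal equations. `fits_device`, the budget and the measurements are hypothetical.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations.

    Returns coefficients [c0, c1, ..., c_degree] of c0 + c1*x + ...
    Fine for low degrees and small data sets; ill-conditioned otherwise.
    """
    m = degree + 1
    A = [[sum(x ** (i + j) for x in xs) for j in range(m)] for i in range(m)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(m)]
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            f = A[r][col] / A[col][col]
            for c in range(col, m):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        tail = sum(A[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (b[i] - tail) / A[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))


def fits_device(coeffs, x, budget):
    """True if the predicted requirement at x stays within the device budget."""
    return predict(coeffs, x) <= budget


# Hypothetical measurements: recognition time (ms) vs vocabulary size (words).
vocab = [10, 20, 40, 80]
time_ms = [12.0, 22.0, 42.0, 82.0]
linear = polyfit(vocab, time_ms, 1)
print(fits_device(linear, 200, budget=250.0))  # True
```

Passing `degree=2` or higher gives the polynomial variant; comparing fitted models on held-out measurements is one way to choose between them.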

Item 7.

Title:: Sentence Recognition in the Presence of Competing Speech Messages Presented in Audiometric Booths with Reverberation Times of 0.4 and 0.6 Seconds
Authors:: Abouchacra, K. S.
Koehnke, J.
Besing, J.
Letowski, T.
Subjects:: sound field testing
reverberation
speech recognition
Publisher:: Polska Akademia Nauk. Czytelnia Czasopism PAN
Links:: https://bibliotekanauki.pl/articles/177482.pdf
Description:: This study examined whether differences in reverberation time (RT) between typical sound field test rooms used in audiology clinics affect speech recognition in multi-talker environments. Separate groups of participants listened to target sentences presented simultaneously with 0 to 3 competing sentences through four spatially separated loudspeakers in two sound field test rooms with RT = 0.6 s (Site 1: N = 16) and RT = 0.4 s (Site 2: N = 12). Speech recognition scores (SRSs) for the Synchronized Sentence Set (S3) test and subjective estimates of perceived task difficulty were recorded. The results indicate that the change in room RT from 0.4 to 0.6 s did not significantly influence SRSs in quiet or in the presence of one competing sentence. However, this small change in RT affected SRSs when 2 and 3 competing sentences were present: mean SRSs were about 8–10% better in the room with RT = 0.4 s. Perceived task difficulty ratings increased with task complexity, and average ratings were similar across test sites for each level of sentence competition. These results suggest that site-specific normative data must be collected for sound field rooms if clinicians wish to use two or more directional speech maskers during routine sound field testing.
Content provider:: Biblioteka Nauki

Article

Item 8.

Title:: System rozpoznawania mowy z ograniczonym słownikiem
Speech recognition system with limited dictionary
Authors:: Grabowski, D.
Kwiatkowska, M.
Świerczewski, Ł.
Subjects:: speech recognition
ASR
MFCC
Publisher:: Wrocławska Wyższa Szkoła Informatyki Stosowanej Horyzont
Links:: https://bibliotekanauki.pl/articles/131953.pdf
Description:: The motivation for this work is to discuss and compare popular speech recognition algorithms on various systems. The collected information is presented in a relatively concise form, without an in-depth analysis of mathematical proofs, which would in any case require reference to separate specialist sources. Selected problems associated with ASR (Automatic Speech Recognition) and prospects for solving them are discussed. Based on available solutions, an application module was developed that compares collected recordings in terms of speech-signal similarity and presents the results in tabular form. The library created for demonstration purposes was used in a complete application that executes commands based on words spoken into a microphone. The results are intended less as final conclusions on speech recognition than as pointers for further analysis and research. Despite advances in ASR research, there are still no algorithms whose effectiveness exceeds 95%. One motivation for further work may be, for example, the social exclusion of people who cannot use sight-based communication.
Content provider:: Biblioteka Nauki

Article
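
The module described above compares recordings by speech-signal similarity and maps spoken words to commands; the subject list mentions MFCC. The abstract does not name the matching algorithm, so the sketch below assumes Dynamic Time Warping (DTW), a common choice for small-vocabulary command recognition, over feature sequences such as MFCC frames that are already extracted (extraction not shown). The command labels and feature values are hypothetical.

```python
import math


def dtw_distance(a, b):
    """DTW cost between two sequences of feature vectors (e.g. MFCC frames)."""
    inf = float("inf")
    n, m = len(a), len(b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean frame distance
            # Extend the cheapest of: insertion, deletion, or diagonal match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognise(utterance, templates):
    """Return the command label whose template is closest under DTW."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Hypothetical one-dimensional "features" for two recorded command templates.
templates = {"open": [[0.0], [1.0], [2.0]], "close": [[2.0], [1.0], [0.0]]}
spoken = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # a slower "open"
print(recognise(spoken, templates))  # open
```

DTW absorbs differences in speaking rate: the slower utterance above still matches its template at zero cost.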

Item 9.

Title:: Recognition of the numbers in the Polish language
Authors:: Plichta, A.
Gąciarz, T.
Krzywdziński, T.
Subjects:: Automatic Speech Recognition
compressed sensing
Sparse Classification
Publisher:: Instytut Łączności - Państwowy Instytut Badawczy
Links:: https://bibliotekanauki.pl/articles/308844.pdf
Description:: Automatic Speech Recognition is one of the most active research and application areas in today's ICT. Rapid progress in the development of intelligent mobile systems calls for new services in which users communicate with devices by issuing audio commands. Such systems must also be integrated with highly distributed infrastructures such as computational and mobile clouds, Wireless Sensor Networks (WSNs), and many others. This paper presents recent research results on recognising separate words and words in short contexts (limited to numbers) spoken in Polish. Compressed Sensing Theory (CST) is applied for the first time as a speech recognition methodology. The effectiveness of the proposed methodology is demonstrated in numerical tests for both separate words and short sentences.
Content provider:: Biblioteka Nauki

Article
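
The record's subjects are compressed sensing and sparse classification, but the abstract does not specify the algorithm. A common pattern in this family is to code a test feature vector as a sparse combination of labelled training vectors and pick the class whose vectors best reconstruct it. The sketch below uses plain matching pursuit, a simpler greedy relative of the orthogonal matching pursuit and l1 solvers typical in compressed sensing. It assumes feature vectors are already extracted; the dictionary and the labels ("jeden" = one, "dwa" = two) are hypothetical.

```python
import math


def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def _unit(v):
    norm = math.sqrt(_dot(v, v))
    return [x / norm for x in v]


def sparse_classify(y, atoms, labels, n_iter=10):
    """Classify feature vector y by a sparse code over labelled examples.

    atoms: training feature vectors (the dictionary); labels[i] is the
    class of atoms[i]. Plain matching pursuit builds the sparse code; the
    predicted class is the one whose atoms alone best reconstruct y.
    """
    atoms = [_unit(a) for a in atoms]
    residual = list(y)
    coef = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Greedy step: the atom most correlated with what is still unexplained.
        k = max(range(len(atoms)), key=lambda i: abs(_dot(residual, atoms[i])))
        c = _dot(residual, atoms[k])
        coef[k] += c
        residual = [r - c * a for r, a in zip(residual, atoms[k])]

    def class_error(cls):
        recon = [0.0] * len(y)
        for a, w, lab in zip(atoms, coef, labels):
            if lab == cls:
                recon = [r + w * x for r, x in zip(recon, a)]
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(y, recon)))

    return min(sorted(set(labels)), key=class_error)


# Toy dictionary: two training vectors per spoken number.
atoms = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.9, 0.1]]
labels = ["jeden", "jeden", "dwa", "dwa"]
print(sparse_classify([0.95, 0.05, 0.0], atoms, labels))  # jeden
```

Restricting the reconstruction to one class at a time is what turns a sparse approximation into a classifier.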

Item 10.

Title:: Generalisation gap of keyword spotters in a cross-speaker low-resource scenario
Authors:: Nowak, Robert
Radzikowski, Kacper
Piczak, Karol
Lepak, Łukasz
Description:: Models for keyword spotting in continuous recordings can significantly improve the experience of navigating vast libraries of audio recordings. In this paper, we describe the development of such a keyword spotting system, which detects regions of interest in Polish call centre conversations. Unfortunately, despite recent advances in automatic speech recognition, the human-level transcription accuracy reported on English benchmarks does not reflect the performance achievable in low-resource languages such as Polish. In this work, we therefore shift our focus from complete speech-to-text conversion to acoustic similarity matching, in the hope of reducing the demand for data annotation. As our primary approach, we evaluate Siamese and prototypical neural networks trained on several datasets of English and Polish recordings. While we obtain usable results in English, our models' performance remains unsatisfactory when applied to Polish speech, after both monolingual and cross-lingual training. This performance gap shows that generalisation with limited training resources is a significant obstacle to actual deployments in low-resource languages. As a potential countermeasure, we implement a detector using audio embeddings generated by a generic pre-trained model provided by Google. It has a much more favourable profile when applied in a cross-lingual setup to detect Polish audio patterns. Nevertheless, despite these promising results, its performance on out-of-distribution data is still far from satisfactory. This suggests that, despite the richness of the internal representations created by more generic models, such speech embeddings do not transfer fully across languages.
Content provider:: Repozytorium Uniwersytetu Jagiellońskiego

Article
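
The embedding-based detector in this record matches recording windows against a few examples of a keyword. A minimal sketch, assuming embeddings are already computed by some pre-trained model (not shown), averages the example embeddings into a prototype (as prototypical networks do) and flags windows whose cosine similarity to it exceeds a threshold. Prototypical networks usually train with squared Euclidean distance, so the cosine score and the 0.8 threshold are assumptions here, as are the toy 2-D vectors.

```python
import math


def mean_vector(vectors):
    """Element-wise mean of equally long vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0  # treat all-zero embeddings as dissimilar
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def detect_keyword(window_embeddings, support_embeddings, threshold=0.8):
    """Indices of windows whose embedding is close to the keyword prototype.

    support_embeddings: embeddings of a few example utterances of the keyword;
    their mean is the prototype, as in prototypical networks.
    window_embeddings: one embedding per sliding window of a recording.
    """
    prototype = mean_vector(support_embeddings)
    return [i for i, w in enumerate(window_embeddings)
            if cosine(w, prototype) >= threshold]


# Hypothetical 2-D embeddings; real models produce much longer vectors.
support = [[1.0, 0.0], [0.9, 0.1]]
windows = [[0.0, 1.0], [1.0, 0.05], [0.5, 0.5]]
print(detect_keyword(windows, support))  # [1]
```

Because only the prototype depends on the keyword, adding a new keyword needs a few labelled examples rather than retraining, which is the annotation saving the abstract aims for.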

Information

Searching for the phrase "speech recognition" by criterion: Subject