<!DOCTYPE html> <html lang="en" dir="ltr"> <head> <!-- Google tag (gtag.js) --> <script async src="https://www.googletagmanager.com/gtag/js?id=G-P63WKM1TM1"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-P63WKM1TM1'); </script> <!-- Yandex.Metrika counter --> <script type="text/javascript" > (function(m,e,t,r,i,k,a){m[i]=m[i]||function(){(m[i].a=m[i].a||[]).push(arguments)}; m[i].l=1*new Date(); for (var j = 0; j < document.scripts.length; j++) {if (document.scripts[j].src === r) { return; }} k=e.createElement(t),a=e.getElementsByTagName(t)[0],k.async=1,k.src=r,a.parentNode.insertBefore(k,a)}) (window, document, "script", "https://mc.yandex.ru/metrika/tag.js", "ym"); ym(55165297, "init", { clickmap:false, trackLinks:true, accurateTrackBounce:true, webvisor:false }); </script> <noscript><div><img src="https://mc.yandex.ru/watch/55165297" style="position:absolute; left:-9999px;" alt="" /></div></noscript> <!-- /Yandex.Metrika counter --> <!-- Matomo --> <!-- End Matomo Code --> <title>Search results for: audio-visual speech recognition</title> <meta name="description" content="Search results for: audio-visual speech recognition"> <meta name="keywords" content="audio-visual speech recognition"> <meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1, user-scalable=no"> <meta charset="utf-8"> <link href="https://cdn.waset.org/favicon.ico" type="image/x-icon" rel="shortcut icon"> <link href="https://cdn.waset.org/static/plugins/bootstrap-4.2.1/css/bootstrap.min.css" rel="stylesheet"> <link href="https://cdn.waset.org/static/plugins/fontawesome/css/all.min.css" rel="stylesheet"> <link href="https://cdn.waset.org/static/css/site.css?v=150220211555" rel="stylesheet"> </head> <body> <header> <div class="container"> <nav class="navbar navbar-expand-lg navbar-light"> <a class="navbar-brand" href="https://waset.org"> <img 
src="https://cdn.waset.org/static/images/wasetc.png" alt="Open Science Research Excellence" title="Open Science Research Excellence" /> </a> <button class="d-block d-lg-none navbar-toggler ml-auto" type="button" data-toggle="collapse" data-target="#navbarMenu" aria-controls="navbarMenu" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button> <div class="w-100"> <div class="d-none d-lg-flex flex-row-reverse"> <form method="get" action="https://waset.org/search" class="form-inline my-2 my-lg-0"> <input class="form-control mr-sm-2" type="search" placeholder="Search Conferences" value="audio-visual speech recognition" name="q" aria-label="Search"> <button class="btn btn-light my-2 my-sm-0" type="submit"><i class="fas fa-search"></i></button> </form> </div> <div class="collapse navbar-collapse mt-1" id="navbarMenu"> <ul class="navbar-nav ml-auto align-items-center" id="mainNavMenu"> <li class="nav-item"> <a class="nav-link" href="https://waset.org/conferences" title="Conferences in 2024/2025/2026">Conferences</a> </li> <li class="nav-item"> <a class="nav-link" href="https://waset.org/disciplines" title="Disciplines">Disciplines</a> </li> <li class="nav-item"> <a class="nav-link" href="https://waset.org/committees" rel="nofollow">Committees</a> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownPublications" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> Publications </a> <div class="dropdown-menu" aria-labelledby="navbarDropdownPublications"> <a class="dropdown-item" href="https://publications.waset.org/abstracts">Abstracts</a> <a class="dropdown-item" href="https://publications.waset.org">Periodicals</a> <a class="dropdown-item" href="https://publications.waset.org/archive">Archive</a> </div> </li> <li class="nav-item"> <a class="nav-link" href="https://waset.org/page/support" title="Support">Support</a> </li> </ul> </div> </div> </nav> 
</div> </header> <main> <div class="container mt-4"> <div class="row"> <div class="col-md-9 mx-auto"> <form method="get" action="https://publications.waset.org/abstracts/search"> <div id="custom-search-input"> <div class="input-group"> <i class="fas fa-search"></i> <input type="text" class="search-query" name="q" placeholder="Author, Title, Abstract, Keywords" value="audio-visual speech recognition"> <input type="submit" class="btn_search" value="Search"> </div> </div> </form> </div> </div> <div class="row mt-3"> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Commenced</strong> in January 2007</div> </div> </div> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Frequency:</strong> Monthly</div> </div> </div> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Edition:</strong> International</div> </div> </div> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Paper Count:</strong> 2370</div> </div> </div> </div> <h1 class="mt-3 mb-3 text-center" style="font-size:1.6rem;">Search results for: audio-visual speech recognition</h1> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2370</span> Multimodal Data Fusion Techniques in Audiovisual Speech Recognition</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Hadeer%20M.%20Sayed">Hadeer M. Sayed</a>, <a href="https://publications.waset.org/abstracts/search?q=Hesham%20E.%20El%20Deeb"> Hesham E. El Deeb</a>, <a href="https://publications.waset.org/abstracts/search?q=Shereen%20A.%20Taie"> Shereen A. Taie</a> </p> <p class="card-text"><strong>Abstract:</strong></p> In the big data era, we are facing a diversity of datasets from different sources in different domains that describe a single life event. 
These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. Multimodal fusion is the concept of integrating information from multiple modalities in a joint representation with the goal of predicting an outcome through a classification task or regression task. In this paper, multimodal fusion techniques are classified into two main classes: model-agnostic techniques and model-based approaches. It provides a comprehensive study of recent research in each class and outlines the benefits and limitations of each of them. Furthermore, the audiovisual speech recognition task is expressed as a case study of multimodal data fusion approaches, and the open issues through the limitations of the current studies are presented. This paper can be considered a powerful guide for interested researchers in the field of multimodal data fusion and audiovisual speech recognition particularly. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=multimodal%20data" title="multimodal data">multimodal data</a>, <a href="https://publications.waset.org/abstracts/search?q=data%20fusion" title=" data fusion"> data fusion</a>, <a href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition" title=" audio-visual speech recognition"> audio-visual speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=neural%20networks" title=" neural networks"> neural networks</a> </p> <a href="https://publications.waset.org/abstracts/157362/multimodal-data-fusion-techniques-in-audiovisual-speech-recognition" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/157362.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">111</span> </span> </div> </div> <div class="card paper-listing 
mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2369</span> Possibilities, Challenges and the State of the Art of Automatic Speech Recognition in Air Traffic Control</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Van%20Nhan%20Nguyen">Van Nhan Nguyen</a>, <a href="https://publications.waset.org/abstracts/search?q=Harald%20Holone"> Harald Holone</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Over the past few years, a lot of research has been conducted to bring Automatic Speech Recognition (ASR) into various areas of Air Traffic Control (ATC), such as air traffic control simulation and training, monitoring live operators with the aim of improving safety, air traffic controller workload measurement, and analysis of large quantities of controller-pilot speech. Due to the high accuracy requirements of the ATC context and its unique challenges, automatic speech recognition has not been widely adopted in this field. With the aim of providing a good starting point for researchers who are interested in bringing automatic speech recognition into ATC, this paper gives an overview of the possibilities and challenges of applying automatic speech recognition in air traffic control. To provide this overview, we present an updated literature review of speech recognition technologies in general, as well as specific approaches relevant to the ATC context. Based on this literature review, criteria for selecting speech recognition approaches for the ATC domain are presented, and remaining challenges and possible solutions are discussed. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=automatic%20speech%20recognition" title="automatic speech recognition">automatic speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=asr" title=" asr"> asr</a>, <a href="https://publications.waset.org/abstracts/search?q=air%20traffic%20control" title=" air traffic control"> air traffic control</a>, <a href="https://publications.waset.org/abstracts/search?q=atc" title=" atc"> atc</a> </p> <a href="https://publications.waset.org/abstracts/31004/possibilities-challenges-and-the-state-of-the-art-of-automatic-speech-recognition-in-air-traffic-control" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/31004.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">399</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2368</span> Review of Speech Recognition Research on Low-Resource Languages</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=XuKe%20Cao">XuKe Cao</a> </p> <p class="card-text"><strong>Abstract:</strong></p> This paper reviews the current state of research on low-resource languages in the field of speech recognition, focusing on the challenges faced by low-resource language speech recognition, including the scarcity of data resources, the lack of linguistic resources, and the diversity of dialects and accents. The article reviews recent progress in low-resource language speech recognition, including techniques such as data augmentation, end-to-end models, transfer learning, and multi-task learning. 
Based on the challenges currently faced, the paper also provides an outlook on future research directions. Through these studies, it is expected that the performance of speech recognition for low-resource languages can be improved, promoting the widespread application and adoption of related technologies. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=low-resource%20languages" title="low-resource languages">low-resource languages</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition" title=" speech recognition"> speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=data%20augmentation%20techniques" title=" data augmentation techniques"> data augmentation techniques</a>, <a href="https://publications.waset.org/abstracts/search?q=NLP" title=" NLP"> NLP</a> </p> <a href="https://publications.waset.org/abstracts/193863/review-of-speech-recognition-research-on-low-resource-languages" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/193863.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">12</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2367</span> Modern Machine Learning Conniptions for Automatic Speech Recognition</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=S.%20Jagadeesh%20Kumar">S. Jagadeesh Kumar</a> </p> <p class="card-text"><strong>Abstract:</strong></p> This exposé presents a lucid overview of recent machine learning practices as employed in current, and as pertinent to prospective, automatic speech recognition schemes. 
The aim is to promote more cross-fertilization between the machine learning and automatic speech recognition communities than has occurred in the past. The manuscript is structured according to the chief machine learning archetypes that are either already popular or have the potential to make significant contributions to automatic speech recognition technology. The archetypes presented and elaborated in this article include adaptive and multi-task learning, active learning, Bayesian learning, discriminative learning, generative learning, and supervised and unsupervised learning. These learning archetypes are motivated and discussed in the context of automatic speech recognition tools and applications. This manuscript presents and surveys recent advances in deep learning and learning with sparse representations, with a further focus on their continuing significance in the evolution of automatic speech recognition. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=automatic%20speech%20recognition" title="automatic speech recognition">automatic speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=deep%20learning%20methods" title=" deep learning methods"> deep learning methods</a>, <a href="https://publications.waset.org/abstracts/search?q=machine%20learning%20archetypes" title=" machine learning archetypes"> machine learning archetypes</a>, <a href="https://publications.waset.org/abstracts/search?q=Bayesian%20learning" title=" Bayesian learning"> Bayesian learning</a>, <a href="https://publications.waset.org/abstracts/search?q=supervised%20and%20unsupervised%20learning" title=" supervised and unsupervised learning"> supervised and unsupervised learning</a> </p> <a href="https://publications.waset.org/abstracts/71467/modern-machine-learning-conniptions-for-automatic-speech-recognition" class="btn btn-primary btn-sm">Procedia</a> <a 
href="https://publications.waset.org/abstracts/71467.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">447</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2366</span> Advances in Artificial Intelligence Using Speech Recognition</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Khaled%20M.%20Alhawiti">Khaled M. Alhawiti</a> </p> <p class="card-text"><strong>Abstract:</strong></p> This research study aims to present a retrospective study of speech recognition systems and artificial intelligence. Speech recognition has become one of the most widely used technologies, as it offers a great opportunity to interact and communicate with automated machines. Specifically, speech recognition helps users perform their daily routine tasks in a more convenient and effective manner. This research intends to illustrate recent technological advancements associated with artificial intelligence. Recent research has revealed that speech recognition is the foremost issue affecting the decoding of speech. To overcome these issues, researchers have developed different statistical models. Some of the most prominent statistical models include the acoustic model (AM), language model (LM), lexicon model, and hidden Markov models (HMM). The research will help in understanding all of these statistical models of speech recognition. Researchers have also formulated different decoding methods, which are being utilized for realistic decoding tasks and constrained artificial languages. These decoding methods include pattern recognition, acoustic phonetic, and artificial intelligence. 
It has been recognized that artificial intelligence is the most efficient and reliable of the methods used in speech recognition. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition" title="speech recognition">speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=acoustic%20phonetic" title=" acoustic phonetic"> acoustic phonetic</a>, <a href="https://publications.waset.org/abstracts/search?q=artificial%20intelligence" title=" artificial intelligence"> artificial intelligence</a>, <a href="https://publications.waset.org/abstracts/search?q=hidden%20markov%20models%20%28HMM%29" title=" hidden markov models (HMM)"> hidden markov models (HMM)</a>, <a href="https://publications.waset.org/abstracts/search?q=statistical%20models%20of%20speech%20recognition" title=" statistical models of speech recognition"> statistical models of speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=human%20machine%20performance" title=" human machine performance"> human machine performance</a> </p> <a href="https://publications.waset.org/abstracts/26319/advances-in-artificial-intelligence-using-speech-recognition" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/26319.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">477</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2365</span> The Capacity of Mel Frequency Cepstral Coefficients for Speech Recognition</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Fawaz%20S.%20Al-Anzi">Fawaz S. 
Al-Anzi</a>, <a href="https://publications.waset.org/abstracts/search?q=Dia%20AbuZeina"> Dia AbuZeina</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Speech recognition makes an important contribution to promoting new technologies in human-computer interaction. Today, there is a growing need to employ speech technology in daily life and business activities. However, speech recognition is a challenging task that requires several stages before the desired output is obtained. Among the components of automatic speech recognition (ASR) is the feature extraction process, which parameterizes the speech signal to produce the corresponding feature vectors. The feature extraction process aims at approximating the linguistic content that is conveyed by the input speech signal. In the speech processing field, there are several methods for extracting speech features; however, Mel Frequency Cepstral Coefficients (MFCC) is the most popular technique. It has long been observed that MFCC is predominantly used in well-known recognizers such as the Carnegie Mellon University (CMU) Sphinx and the Hidden Markov Model Toolkit (HTK). Hence, this paper focuses on the MFCC method as the standard choice to identify the different speech segments in order to obtain the language phonemes for further training and decoding steps. Owing to the good performance of MFCC, previous studies show that it dominates Arabic ASR research. In this paper, we demonstrate MFCC as well as the intermediate steps that are performed to get these coefficients using the HTK toolkit. 
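<p class="card-text">As an illustration of the kind of intermediate steps such a front end performs, the following is a minimal textbook-style sketch of MFCC extraction for a single frame, written in Python with the standard library only. It is not HTK's implementation and is not taken from the paper: the pre-emphasis factor 0.97, the 20 filters, the 12 coefficients, the 16 kHz test tone, and all function names are assumptions chosen for illustration.</p>

```python
import cmath
import math


def hz_to_mel(f: float) -> float:
    """Common mel-scale mapping (one of several conventions in use)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: list[float]) -> list[float]:
    """Naive DFT power spectrum for bins 0..N/2 (slow, but dependency-free)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> list[list[float]]:
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * f / sample_rate)) for f in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):   # rising slope
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):   # falling slope
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc_frame(frame: list[float], sample_rate: int,
               n_filters: int = 20, n_ceps: int = 12) -> list[float]:
    n = len(frame)
    # 1. Pre-emphasis: first-order high-pass filter (0.97 is an assumed value).
    emphasized = [frame[0]] + [frame[i] - 0.97 * frame[i - 1] for i in range(1, n)]
    # 2. Hamming window to reduce spectral leakage.
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(emphasized)]
    # 3. Power spectrum of the windowed frame.
    spec = power_spectrum(windowed)
    # 4. Log of the mel filterbank energies (floored to avoid log(0)).
    bank = mel_filterbank(n_filters, n, sample_rate)
    log_energies = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
                    for row in bank]
    # 5. Unnormalised DCT-II; keep coefficients 1..n_ceps.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_energies))
            for c in range(1, n_ceps + 1)]


# Hypothetical input: one 256-sample frame of a 440 Hz tone sampled at 16 kHz.
frame = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(256)]
coeffs = mfcc_frame(frame, 16000)
```

<p class="card-text">A full front end would typically also slide overlapping frames across the utterance and may append energy and delta features; those steps are omitted here.</p>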
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition" title="speech recognition">speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=acoustic%20features" title=" acoustic features"> acoustic features</a>, <a href="https://publications.waset.org/abstracts/search?q=mel%20frequency" title=" mel frequency"> mel frequency</a>, <a href="https://publications.waset.org/abstracts/search?q=cepstral%20coefficients" title=" cepstral coefficients"> cepstral coefficients</a> </p> <a href="https://publications.waset.org/abstracts/78382/the-capacity-of-mel-frequency-cepstral-coefficients-for-speech-recognition" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/78382.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">259</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2364</span> Automatic Speech Recognition Systems Performance Evaluation Using Word Error Rate Method</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Jo%C3%A3o%20Rato">João Rato</a>, <a href="https://publications.waset.org/abstracts/search?q=Nuno%20Costa"> Nuno Costa</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Human verbal communication is a two-way process that requires mutual understanding, which gives rise to certain considerations. This kind of communication, also called dialogue, can take place not only between human agents but also between human agents and machines. Interaction between humans and machines by means of natural language plays an important role in improving communication between them. 
To assess the performance of several speech recognition systems, this document presents the results of tests conducted using the Word Error Rate evaluation method. In addition, it provides background information on man-machine communication systems. From this work, conclusions were drawn about the speech recognition systems, among them their poor performance in interpreting speech in noisy environments. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=automatic%20speech%20recognition" title="automatic speech recognition">automatic speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=man-machine%20conversation" title=" man-machine conversation"> man-machine conversation</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition" title=" speech recognition"> speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=spoken%20dialogue%20systems" title=" spoken dialogue systems"> spoken dialogue systems</a>, <a href="https://publications.waset.org/abstracts/search?q=word%20error%20rate" title=" word error rate"> word error rate</a> </p> <a href="https://publications.waset.org/abstracts/62274/automatic-speech-recognition-systems-performance-evaluation-using-word-error-rate-method" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/62274.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">322</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2363</span> Voice Commands Recognition of Mentor Robot in Noisy Environment Using HTK</h5> <div class="card-body"> <p 
class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Khenfer-Koummich%20Fatma">Khenfer-Koummich Fatma</a>, <a href="https://publications.waset.org/abstracts/search?q=Hendel%20Fatiha"> Hendel Fatiha</a>, <a href="https://publications.waset.org/abstracts/search?q=Mesbahi%20Larbi"> Mesbahi Larbi </a> </p> <p class="card-text"><strong>Abstract:</strong></p> This paper presents an approach based on Hidden Markov Models (HMM) using HTK tools. The goal is to create a man-machine interface with a voice recognition system that allows the operator to teleoperate a mentor robot to execute specific tasks such as rotate, raise, close, etc. This system should take into account different levels of environmental noise. This approach has been applied to isolated words representing the robot commands spoken in two languages: French and Arabic. The recognition rate obtained is the same for both languages, Arabic and French, on the neutral words. However, there is a slight difference in favor of the Arabic speech when Gaussian white noise is added with a Signal to Noise Ratio (SNR) equal to 30 dB: the Arabic speech recognition rate is 69%, and the French speech recognition rate is 80%. This can be explained by the ability of the phonetic context of each language when the noise is added. 
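<p class="card-text">The noise condition described above (Gaussian white noise at a given SNR) can be sketched as follows. This is a minimal standard-library Python illustration, not the authors' procedure: the function names, the fixed random seed, and the 440 Hz test tone sampled at 16 kHz are assumptions. It relies only on the definition SNR(dB) = 10 log10(P_signal / P_noise).</p>

```python
import math
import random


def add_white_noise(signal: list[float], snr_db: float, seed: int = 0) -> list[float]:
    """Return the signal plus zero-mean Gaussian white noise at the requested SNR (dB)."""
    rng = random.Random(seed)  # fixed seed (assumed) so the sketch is reproducible
    # Average power of the clean signal.
    signal_power = sum(x * x for x in signal) / len(signal)
    # SNR_dB = 10 * log10(P_signal / P_noise)  =>  P_noise = P_signal / 10^(SNR_dB / 10)
    noise_power = signal_power / (10 ** (snr_db / 10))
    sigma = math.sqrt(noise_power)
    return [x + rng.gauss(0.0, sigma) for x in signal]


def measured_snr_db(clean: list[float], noisy: list[float]) -> float:
    """Estimate the SNR actually realised by comparing noisy and clean samples."""
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum((n - c) ** 2 for c, n in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(signal_power / noise_power)


# Hypothetical test signal: one second of a 440 Hz tone sampled at 16 kHz.
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noisy = add_white_noise(clean, snr_db=30.0)
```

<p class="card-text">Because the noise is random, the realised SNR only approximates the target; <code>measured_snr_db(clean, noisy)</code> can be used to check how close a particular draw is.</p>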
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=voice%20command" title="voice command">voice command</a>, <a href="https://publications.waset.org/abstracts/search?q=HMM" title=" HMM"> HMM</a>, <a href="https://publications.waset.org/abstracts/search?q=TIMIT" title=" TIMIT"> TIMIT</a>, <a href="https://publications.waset.org/abstracts/search?q=noise" title=" noise"> noise</a>, <a href="https://publications.waset.org/abstracts/search?q=HTK" title=" HTK"> HTK</a>, <a href="https://publications.waset.org/abstracts/search?q=Arabic" title=" Arabic"> Arabic</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition" title=" speech recognition"> speech recognition</a> </p> <a href="https://publications.waset.org/abstracts/24454/voice-commands-recognition-of-mentor-robot-in-noisy-environment-using-htk" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/24454.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">382</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2362</span> Recognition of Voice Commands of Mentor Robot in Noisy Environment Using Hidden Markov Model</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Khenfer%20Koummich%20Fatma">Khenfer Koummich Fatma</a>, <a href="https://publications.waset.org/abstracts/search?q=Hendel%20Fatiha"> Hendel Fatiha</a>, <a href="https://publications.waset.org/abstracts/search?q=Mesbahi%20Larbi"> Mesbahi Larbi</a> </p> <p class="card-text"><strong>Abstract:</strong></p> This paper presents an approach based on Hidden Markov Models (HMM: Hidden Markov Model) using HTK tools. 
The goal is to create a human-machine interface with a voice recognition system that allows the operator to teleoperate a mentor robot to execute specific tasks such as rotate, raise, close, etc. This system should take into account different levels of environmental noise. This approach has been applied to isolated words representing the robot commands pronounced in two languages: French and Arabic. The obtained recognition rate is the same for both languages, Arabic and French, on the neutral words. However, there is a slight difference in favor of the Arabic speech when Gaussian white noise is added with a Signal to Noise Ratio (SNR) equal to 30 dB; in this case, the Arabic speech recognition rate is 69%, and the French speech recognition rate is 80%. This can be explained by the ability of the phonetic context of each language when the noise is added. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=Arabic%20speech%20recognition" title="Arabic speech recognition">Arabic speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=Hidden%20Markov%20Model%20%28HMM%29" title=" Hidden Markov Model (HMM)"> Hidden Markov Model (HMM)</a>, <a href="https://publications.waset.org/abstracts/search?q=HTK" title=" HTK"> HTK</a>, <a href="https://publications.waset.org/abstracts/search?q=noise" title=" noise"> noise</a>, <a href="https://publications.waset.org/abstracts/search?q=TIMIT" title=" TIMIT"> TIMIT</a>, <a href="https://publications.waset.org/abstracts/search?q=voice%20command" title=" voice command"> voice command</a> </p> <a href="https://publications.waset.org/abstracts/67988/recognition-of-voice-commands-of-mentor-robot-in-noisy-environment-using-hidden-markov-model" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/67988.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads 
<span class="badge badge-light">385</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2361</span> An Automatic Speech Recognition Tool for the Filipino Language Using the HTK System</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=John%20Lorenzo%20Bautista">John Lorenzo Bautista</a>, <a href="https://publications.waset.org/abstracts/search?q=Yoon-Joong%20Kim"> Yoon-Joong Kim</a> </p> <p class="card-text"><strong>Abstract:</strong></p> This paper presents the development of a Filipino speech recognition tool using the HTK System. The system was trained on a subset of the Filipino Speech Corpus developed by the DSP Laboratory of the University of the Philippines-Diliman. The speech corpus was used in both training and testing the system by estimating the parameters for phonetic HMM-based (Hidden Markov Model) acoustic models. Experiments on different mixture weights were incorporated in the study. The phoneme-level word-based recognition of a 5-state HMM resulted in an average accuracy rate of 80.13% for a single-Gaussian mixture model, 81.13% after implementing phoneme alignment, and 87.19% for the increased Gaussian-mixture weight model. The highest accuracy rate of 88.70% was obtained from a 5-state model with 6 Gaussian mixtures. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=Filipino%20language" title="Filipino language">Filipino language</a>, <a href="https://publications.waset.org/abstracts/search?q=Hidden%20Markov%20Model" title=" Hidden Markov Model"> Hidden Markov Model</a>, <a href="https://publications.waset.org/abstracts/search?q=HTK%20system" title=" HTK system"> HTK system</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition" title=" speech recognition"> speech recognition</a> </p> <a href="https://publications.waset.org/abstracts/10240/an-automatic-speech-recognition-tool-for-the-filipino-language-using-the-htk-system" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/10240.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">480</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2360</span> Bidirectional Dynamic Time Warping Algorithm for the Recognition of Isolated Words Impacted by Transient Noise Pulses</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=G.%20Tamulevi%C4%8Dius">G. Tamulevičius</a>, <a href="https://publications.waset.org/abstracts/search?q=A.%20Serackis"> A. Serackis</a>, <a href="https://publications.waset.org/abstracts/search?q=T.%20Sledevi%C4%8D"> T. Sledevič</a>, <a href="https://publications.waset.org/abstracts/search?q=D.%20Navakauskas"> D. Navakauskas</a> </p> <p class="card-text"><strong>Abstract:</strong></p> We consider the biggest challenge in speech recognition – noise reduction. Traditionally detected transient noise pulses are removed with the corrupted speech using pulse models. 
In this paper we propose to cope with the problem directly in the Dynamic Time Warping domain. A Bidirectional Dynamic Time Warping algorithm for the recognition of isolated words impacted by transient noise pulses is proposed. It uses a simple transient noise pulse detector, employs bidirectional computation of dynamic time warping, and directly manipulates the warping results. Experimental investigation with several alternative solutions confirms the effectiveness of the proposed algorithm in reducing the impact of noise on the recognition process: a 3.9% increase in noisy speech recognition is achieved. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=transient%20noise%20pulses" title="transient noise pulses">transient noise pulses</a>, <a href="https://publications.waset.org/abstracts/search?q=noise%20reduction" title=" noise reduction"> noise reduction</a>, <a href="https://publications.waset.org/abstracts/search?q=dynamic%20time%20warping" title=" dynamic time warping"> dynamic time warping</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition" title=" speech recognition"> speech recognition</a> </p> <a href="https://publications.waset.org/abstracts/7831/bidirectional-dynamic-time-warping-algorithm-for-the-recognition-of-isolated-words-impacted-by-transient-noise-pulses" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/7831.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">558</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2359</span> Speech Detection Model Based on Deep Neural Networks Classifier for Speech Emotions Recognition</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a 
href="https://publications.waset.org/abstracts/search?q=A.%20Shoiynbek">A. Shoiynbek</a>, <a href="https://publications.waset.org/abstracts/search?q=K.%20Kozhakhmet"> K. Kozhakhmet</a>, <a href="https://publications.waset.org/abstracts/search?q=P.%20Menezes"> P. Menezes</a>, <a href="https://publications.waset.org/abstracts/search?q=D.%20Kuanyshbay"> D. Kuanyshbay</a>, <a href="https://publications.waset.org/abstracts/search?q=D.%20Bayazitov"> D. Bayazitov</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Speech emotion recognition (SER) has received increasing research interest in recent years. Most research work has used emotional speech collected under controlled conditions, recorded by actors imitating and artificially producing emotions in front of a microphone. There are four issues related to that approach: (1) emotions are not natural, which means that machines are learning to recognize fake emotions; (2) emotions are very limited in quantity and poor in their variety of speaking; (3) SER is language dependent; and (4) consequently, each time researchers want to start work on SER, they need to find a good emotional database in their language. In this paper, we propose an approach to create an automatic tool for speech emotion extraction based on facial emotion recognition and describe the sequence of actions of the proposed approach. One of the first objectives in the sequence of actions is speech detection. The paper gives a detailed description of the speech detection model based on a fully connected deep neural network for the Kazakh and Russian languages. Despite the high results in speech detection for Kazakh and Russian, the described process is suitable for any language. To illustrate the working capacity of the developed model, we have performed an analysis of speech detection and extraction from real tasks. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=deep%20neural%20networks" title="deep neural networks">deep neural networks</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20detection" title=" speech detection"> speech detection</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20emotion%20recognition" title=" speech emotion recognition"> speech emotion recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=Mel-frequency%20cepstrum%20coefficients" title=" Mel-frequency cepstrum coefficients"> Mel-frequency cepstrum coefficients</a>, <a href="https://publications.waset.org/abstracts/search?q=collecting%20speech%20emotion%20corpus" title=" collecting speech emotion corpus"> collecting speech emotion corpus</a>, <a href="https://publications.waset.org/abstracts/search?q=collecting%20speech%20emotion%20dataset" title=" collecting speech emotion dataset"> collecting speech emotion dataset</a>, <a href="https://publications.waset.org/abstracts/search?q=Kazakh%20speech%20dataset" title=" Kazakh speech dataset"> Kazakh speech dataset</a> </p> <a href="https://publications.waset.org/abstracts/152814/speech-detection-model-based-on-deep-neural-networks-classifier-for-speech-emotions-recognition" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/152814.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">101</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2358</span> Distant Speech Recognition Using Laser Doppler Vibrometer</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Yunbin%20Deng">Yunbin Deng</a> </p> <p 
class="card-text"><strong>Abstract:</strong></p> Most existing applications of automatic speech recognition rely on cooperative subjects at a short distance from a microphone. Standoff speech recognition using microphone arrays can extend the subject-to-sensor distance somewhat, but it is still limited to only a few feet. As such, most deployed applications of standoff speech recognition are limited to indoor use at short range. Moreover, these applications require an air passageway between the subject and the sensor to achieve a reasonable signal-to-noise ratio. This study reports long-range (50 feet) automatic speech recognition experiments using a Laser Doppler Vibrometer (LDV) sensor. This study shows that the LDV sensor modality can extend the speech acquisition standoff distance far beyond microphone arrays to hundreds of feet. In addition, LDV enables 'listening' through windows for uncooperative subjects. This enables new capabilities in automatic audio and speech intelligence, surveillance, and reconnaissance (ISR) for law enforcement, homeland security, and counter-terrorism applications. The Polytec LDV model OFV-505 is used in this study. To investigate the impact of different vibrating materials, five parallel LDV speech corpora, each consisting of 630 speakers, are collected from the vibrations of a glass window, a metal plate, a plastic box, a wood slate, and a concrete wall. These are the common materials the application could encounter in daily life. These data were compared with their microphone counterpart to show the impact of various materials on the spectrum of the LDV speech signal. State-of-the-art deep neural network modeling approaches are used to conduct continuous speaker-independent speech recognition on these LDV speech datasets. Preliminary phoneme recognition results using a time-delay neural network, bi-directional long short-term memory, and model fusion show great promise for using LDV for long-range speech recognition. 
To the author’s best knowledge, this is the first time an LDV has been reported for a long-distance speech recognition application. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=covert%20speech%20acquisition" title="covert speech acquisition">covert speech acquisition</a>, <a href="https://publications.waset.org/abstracts/search?q=distant%20speech%20recognition" title=" distant speech recognition"> distant speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=DSR" title=" DSR"> DSR</a>, <a href="https://publications.waset.org/abstracts/search?q=laser%20Doppler%20vibrometer" title=" laser Doppler vibrometer"> laser Doppler vibrometer</a>, <a href="https://publications.waset.org/abstracts/search?q=LDV" title=" LDV"> LDV</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20intelligence%20surveillance%20and%20reconnaissance" title=" speech intelligence surveillance and reconnaissance"> speech intelligence surveillance and reconnaissance</a>, <a href="https://publications.waset.org/abstracts/search?q=ISR" title=" ISR"> ISR</a> </p> <a href="https://publications.waset.org/abstracts/99091/distant-speech-recognition-using-laser-doppler-vibrometer" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/99091.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">179</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2357</span> Speech Detection Model Based on Deep Neural Networks Classifier for Speech Emotions Recognition</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Aisultan%20Shoiynbek">Aisultan Shoiynbek</a>, <a 
href="https://publications.waset.org/abstracts/search?q=Darkhan%20Kuanyshbay"> Darkhan Kuanyshbay</a>, <a href="https://publications.waset.org/abstracts/search?q=Paulo%20Menezes"> Paulo Menezes</a>, <a href="https://publications.waset.org/abstracts/search?q=Akbayan%20Bekarystankyzy"> Akbayan Bekarystankyzy</a>, <a href="https://publications.waset.org/abstracts/search?q=Assylbek%20Mukhametzhanov"> Assylbek Mukhametzhanov</a>, <a href="https://publications.waset.org/abstracts/search?q=Temirlan%20Shoiynbek"> Temirlan Shoiynbek</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Speech emotion recognition (SER) has received increasing research interest in recent years. It is a common practice to utilize emotional speech collected under controlled conditions recorded by actors imitating and artificially producing emotions in front of a microphone. There are four issues related to that approach: emotions are not natural, meaning that machines are learning to recognize fake emotions; emotions are very limited in quantity and poor in variety of speaking; there is some language dependency in SER; consequently, each time researchers want to start work with SER, they need to find a good emotional database in their language. This paper proposes an approach to create an automatic tool for speech emotion extraction based on facial emotion recognition and describes the sequence of actions involved in the proposed approach. One of the first objectives in the sequence of actions is the speech detection issue. The paper provides a detailed description of the speech detection model based on a fully connected deep neural network for Kazakh and Russian. Despite the high results in speech detection for Kazakh and Russian, the described process is suitable for any language. To investigate the working capacity of the developed model, an analysis of speech detection and extraction from real tasks has been performed. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=deep%20neural%20networks" title="deep neural networks">deep neural networks</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20detection" title=" speech detection"> speech detection</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20emotion%20recognition" title=" speech emotion recognition"> speech emotion recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=Mel-frequency%20cepstrum%20coefficients" title=" Mel-frequency cepstrum coefficients"> Mel-frequency cepstrum coefficients</a>, <a href="https://publications.waset.org/abstracts/search?q=collecting%20speech%20emotion%20corpus" title=" collecting speech emotion corpus"> collecting speech emotion corpus</a>, <a href="https://publications.waset.org/abstracts/search?q=collecting%20speech%20emotion%20dataset" title=" collecting speech emotion dataset"> collecting speech emotion dataset</a>, <a href="https://publications.waset.org/abstracts/search?q=Kazakh%20speech%20dataset" title=" Kazakh speech dataset"> Kazakh speech dataset</a> </p> <a href="https://publications.waset.org/abstracts/189328/speech-detection-model-based-on-deep-neural-networks-classifier-for-speech-emotions-recognition" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/189328.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">26</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2356</span> A Two-Stage Adaptation towards Automatic Speech Recognition System for Malay-Speaking Children</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a 
href="https://publications.waset.org/abstracts/search?q=Mumtaz%20Begum%20Mustafa">Mumtaz Begum Mustafa</a>, <a href="https://publications.waset.org/abstracts/search?q=Siti%20Salwah%20Salim"> Siti Salwah Salim</a>, <a href="https://publications.waset.org/abstracts/search?q=Feizal%20Dani%20Rahman"> Feizal Dani Rahman</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Recently, Automatic Speech Recognition (ASR) systems have been used to assist children in language acquisition, as they are able to detect human speech signals. Despite the benefits offered by ASR systems, there is a lack of ASR systems for Malay-speaking children. One of the contributing factors is the lack of a continuous speech database for the target users. Though cross-lingual adaptation is a common solution for developing ASR systems for under-resourced languages, it is not viable for children, as there are very few speech databases to serve as a source model. In this research, we propose a two-stage adaptation for the development of an ASR system for Malay-speaking children using a very limited database. The two-stage adaptation comprises cross-lingual adaptation (first stage) and cross-age adaptation (second stage). In the first stage, a well-known speech database that is phonetically rich and balanced is adapted to a medium-sized database of Malay adults using supervised MLLR. The second-stage adaptation uses the speech acoustic model generated from the first adaptation, and the target database is a small-sized database of the target users. We measured the performance of the proposed technique using word error rate and compared it with the conventional benchmark adaptation. The two-stage adaptation proposed in this research has better recognition accuracy than the benchmark adaptation in recognizing children’s speech. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=Automatic%20Speech%20Recognition%20System" title="Automatic Speech Recognition System">Automatic Speech Recognition System</a>, <a href="https://publications.waset.org/abstracts/search?q=children%20speech" title=" children speech"> children speech</a>, <a href="https://publications.waset.org/abstracts/search?q=adaptation" title=" adaptation"> adaptation</a>, <a href="https://publications.waset.org/abstracts/search?q=Malay" title=" Malay"> Malay</a> </p> <a href="https://publications.waset.org/abstracts/46534/a-two-stage-adaptation-towards-automatic-speech-recognition-system-for-malay-speaking-children" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/46534.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">397</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2355</span> The Combination of the Mel Frequency Cepstral Coefficients, Perceptual Linear Prediction, Jitter and Shimmer Coefficients for the Improvement of Automatic Recognition System for Dysarthric Speech</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Brahim%20Fares%20Zaidi">Brahim Fares Zaidi</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Our work aims to improve our Automatic Recognition System for Dysarthria Speech based on the Hidden Models of Markov and the Hidden Markov Model Toolkit to help people who are sick. 
To address pronunciation problems, we applied two speech parameterization techniques based on Mel Frequency Cepstral Coefficients and Perceptual Linear Prediction and concatenated them with jitter and shimmer coefficients in order to increase the recognition rate of dysarthric speech. For our tests, we used the NEMOURS database, which contains speakers with dysarthria and normal speakers. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=ARSDS" title="ARSDS">ARSDS</a>, <a href="https://publications.waset.org/abstracts/search?q=HTK" title=" HTK"> HTK</a>, <a href="https://publications.waset.org/abstracts/search?q=HMM" title=" HMM"> HMM</a>, <a href="https://publications.waset.org/abstracts/search?q=MFCC" title=" MFCC"> MFCC</a>, <a href="https://publications.waset.org/abstracts/search?q=PLP" title=" PLP"> PLP</a> </p> <a href="https://publications.waset.org/abstracts/158636/the-combination-of-the-mel-frequency-cepstral-coefficients-perceptual-linear-prediction-jitter-and-shimmer-coefficients-for-the-improvement-of-automatic-recognition-system-for-dysarthric-speech" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/158636.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">108</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2354</span> Environmentally Adaptive Acoustic Echo Suppression for Barge-in Speech Recognition</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Jong%20Han%20Joo">Jong Han Joo</a>, <a href="https://publications.waset.org/abstracts/search?q=Jung%20Hoon%20Lee"> Jung Hoon Lee</a>, <a 
href="https://publications.waset.org/abstracts/search?q=Young%20Sun%20Kim"> Young Sun Kim</a>, <a href="https://publications.waset.org/abstracts/search?q=Jae%20Young%20Kang"> Jae Young Kang</a>, <a href="https://publications.waset.org/abstracts/search?q=Seung%20Ho%20Choi"> Seung Ho Choi</a> </p> <p class="card-text"><strong>Abstract:</strong></p> In this study, we propose a novel technique for acoustic echo suppression (AES) during speech recognition under barge-in conditions. Conventional AES methods based on spectral subtraction apply fixed weights to the estimated echo path transfer function (EPTF) at the current signal segment and to the EPTF estimated until the previous time interval. We propose a new approach that adaptively updates weight parameters in response to abrupt changes in the acoustic environment due to background noises or double-talk. Furthermore, we devised a voice activity detector and an initial time-delay estimator for barge-in speech recognition in communication networks. The initial time delay is estimated using log-spectral distance measure, as well as cross-correlation coefficients. The experimental results show that the developed techniques can be successfully applied in barge-in speech recognition systems. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=acoustic%20echo%20suppression" title="acoustic echo suppression">acoustic echo suppression</a>, <a href="https://publications.waset.org/abstracts/search?q=barge-in" title=" barge-in"> barge-in</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition" title=" speech recognition"> speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=echo%20path%20transfer%20function" title=" echo path transfer function"> echo path transfer function</a>, <a href="https://publications.waset.org/abstracts/search?q=initial%20delay%20estimator" title=" initial delay estimator"> initial delay estimator</a>, <a href="https://publications.waset.org/abstracts/search?q=voice%20activity%20detector" title=" voice activity detector"> voice activity detector</a> </p> <a href="https://publications.waset.org/abstracts/17151/environmentally-adaptive-acoustic-echo-suppression-for-barge-in-speech-recognition" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/17151.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">372</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2353</span> Exploratory Analysis of A Review of Nonexistence Polarity in Native Speech</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Deawan%20Rakin%20Ahamed%20Remal">Deawan Rakin Ahamed Remal</a>, <a href="https://publications.waset.org/abstracts/search?q=Sinthia%20Chowdhury"> Sinthia Chowdhury</a>, <a href="https://publications.waset.org/abstracts/search?q=Sharun%20Akter%20Khushbu"> Sharun Akter Khushbu</a>, <a 
href="https://publications.waset.org/abstracts/search?q=Sheak%20Rashed%20Haider%20Noori"> Sheak Rashed Haider Noori</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Native speech-to-text synthesis has its own value for people. Speaking in different accents is common, but communication between people with different accents is quite difficult. This problem arises from misperception of the intended meaning. Thus, many automatic speech recognition systems have been deployed to detect text. This paper presents a review of NSTTR (Native Speech Text to Text Recognition) synthesis compared with text-to-text recognition. The review shows that many text-to-text recognition systems are at a very early stage with respect to native speech recognition. Discussions have covered the progression of chatbots, linguistic theory, and rule-based approaches. In recent years, deep learning has become a dominant approach for text-to-text learning to detect the nature of language. To the best of our knowledge, a huge number of people in the subcontinent speak Bangla, but with different accents in different regions; this study therefore discusses the contrasting achievements of existing works and identifies future needs regarding acoustic accents in Bangla. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=TTR" title="TTR">TTR</a>, <a href="https://publications.waset.org/abstracts/search?q=NSTTR" title=" NSTTR"> NSTTR</a>, <a href="https://publications.waset.org/abstracts/search?q=text%20to%20text%20recognition" title=" text to text recognition"> text to text recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=deep%20learning" title=" deep learning"> deep learning</a>, <a href="https://publications.waset.org/abstracts/search?q=natural%20language%20processing" title=" natural language processing"> natural language processing</a> </p> <a href="https://publications.waset.org/abstracts/149060/exploratory-analysis-of-a-review-of-nonexistence-polarity-in-native-speech" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/149060.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">132</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2352</span> Theory and Practice of Wavelets in Signal Processing</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Jalal%20Karam">Jalal Karam</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The methods of Fourier, Laplace, and Wavelet Transforms provide transfer functions and relationships between the input and the output signals in linear time invariant systems. This paper shows the equivalence among these three methods and in each case presenting an application of the appropriate (Fourier, Laplace or Wavelet) to the convolution theorem. In addition, it is shown that the same holds for a direct integration method. 
The biorthogonal wavelets Bior3.5 and Bior3.9 are examined, and the zero distributions of the polynomials of their associated filters are located. This paper also presents the significance of utilizing wavelets as effective tools in processing speech signals for common multimedia applications in general, and for recognition and compression in particular. Theoretically and practically, wavelets have proved to be effective and competitive. The practical use of the Continuous Wavelet Transform (CWT) in the processing and analysis of speech is then presented, along with explanations of how the human ear can be thought of as a natural wavelet transformer of speech. This generates a variety of approaches for applying the CWT to many paradigms for analysing speech, sound, and music. For perception, the flexibility of implementation of this transform allows the construction of numerous scales, and we include two of them. Results for speech recognition and speech compression are then included. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=continuous%20wavelet%20transform" title="continuous wavelet transform">continuous wavelet transform</a>, <a href="https://publications.waset.org/abstracts/search?q=biorthogonal%20wavelets" title=" biorthogonal wavelets"> biorthogonal wavelets</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20perception" title=" speech perception"> speech perception</a>, <a href="https://publications.waset.org/abstracts/search?q=recognition%20and%20compression" title=" recognition and compression"> recognition and compression</a> </p> <a href="https://publications.waset.org/abstracts/5822/theory-and-practice-of-wavelets-in-signal-processing" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/5822.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge 
badge-light">416</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2351</span> Influence of Auditory Visual Information in Speech Perception in Children with Normal Hearing and Cochlear Implant</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Sachin">Sachin</a>, <a href="https://publications.waset.org/abstracts/search?q=Shantanu%20Arya"> Shantanu Arya</a>, <a href="https://publications.waset.org/abstracts/search?q=Gunjan%20Mehta"> Gunjan Mehta</a>, <a href="https://publications.waset.org/abstracts/search?q=Md.%20Shamim%20Ansari"> Md. Shamim Ansari</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The cross-modal influence of visual information on speech perception can be illustrated by the McGurk effect, an illusion of hearing the syllable /ta/ when a listener hears one syllable, e.g., /pa/, while watching a synchronized video recording of another syllable, /ka/. The McGurk effect is an excellent tool to investigate multisensory integration in speech perception in both normal-hearing and hearing-impaired populations. As the visual cue is unaffected by noise, individuals with hearing impairment rely more than normal listeners on visual cues. However, when non-congruent visual and auditory cues are processed together, audiovisual interaction seems to occur differently in normal-hearing persons and persons with hearing impairment. Therefore, this study aims to observe the audiovisual interaction in speech perception in cochlear implant users and compare it with that of normal-hearing children. Auditory stimuli were routed through a calibrated clinical audiometer in sound field condition, and visual stimuli were presented on a laptop screen placed at a distance of 1 m at 0 degree azimuth. Out of 4 presentations, if 3 responses were a fusion, then the McGurk effect was considered to be present. 
The congruent audiovisual stimuli /pa/ /pa/ and /ka/ /ka/ were perceived correctly as “pa” and “ka,” respectively, by both groups. For the non-congruent stimulus /da/ /pa/, 23 children out of 35 with normal hearing and 9 children out of 35 with cochlear implants had a fusion of sounds, i.e., the McGurk effect was present. For the non-congruent stimulus /pa/ /ka/, 25 children out of 35 with normal hearing and 8 children out of 35 with cochlear implants had a fusion of sounds. The children who had used cochlear implants for less than three years did not exhibit fusion of sounds, i.e., the McGurk effect was absent in this group of children. To conclude, the results demonstrate that consistent fusion of visual with auditory information for speech perception is shaped by experience with bimodal spoken language during early life. When auditory experience with speech is mediated by a cochlear implant, the likelihood of acquiring bimodal fusion is increased, and it greatly depends on the age of implantation. All the above results strongly support the need for screening children for hearing capabilities and providing cochlear implants and aural rehabilitation as early as possible. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=cochlear%20implant" title="cochlear implant">cochlear implant</a>, <a href="https://publications.waset.org/abstracts/search?q=congruent%20stimuli" title=" congruent stimuli"> congruent stimuli</a>, <a href="https://publications.waset.org/abstracts/search?q=mcgurk%20effect" title=" mcgurk effect"> mcgurk effect</a>, <a href="https://publications.waset.org/abstracts/search?q=non-congruent%20stimuli" title=" non-congruent stimuli"> non-congruent stimuli</a> </p> <a href="https://publications.waset.org/abstracts/52237/influence-of-auditory-visual-information-in-speech-perception-in-children-with-normal-hearing-and-cochlear-implant" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/52237.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">308</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2350</span> Recognition by the Voice and Speech Features of the Emotional State of Children by Adults and Automatically</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Elena%20E.%20Lyakso">Elena E. Lyakso</a>, <a href="https://publications.waset.org/abstracts/search?q=Olga%20V.%20Frolova"> Olga V. Frolova</a>, <a href="https://publications.waset.org/abstracts/search?q=Yuri%20N.%20Matveev"> Yuri N. Matveev</a>, <a href="https://publications.waset.org/abstracts/search?q=Aleksey%20S.%20Grigorev"> Aleksey S. Grigorev</a>, <a href="https://publications.waset.org/abstracts/search?q=Alexander%20S.%20Nikolaev"> Alexander S. Nikolaev</a>, <a href="https://publications.waset.org/abstracts/search?q=Viktor%20A.%20Gorodnyi"> Viktor A. 
Gorodnyi</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The study of children’s emotional sphere in relation to age and psychoneurological state is of great importance for the design of educational programs for children and for their social adaptation. Atypical development may be accompanied by disturbances or specificities of the emotional sphere. To study how the emotional state is reflected in the voice and speech features of children, a perceptual study with adult listeners and automatic recognition of speech were conducted. Speech of children with typical development (TD), with Down syndrome (DS), and with autism spectrum disorders (ASD) aged 6-12 years was recorded. To obtain emotional speech in children, model situations were created, including a dialogue between the child and the experimenter containing questions that can cause various emotional states in the child, and play with a standard set of toys. The questions and toys were selected taking into account the child’s age, developmental characteristics, and speech skills. For the perceptual experiment by adults, test sequences containing speech material of 30 children (TD, DS, and ASD) were created. The listeners were 100 adults (age 19.3 ± 2.3 years). The listeners were tasked with determining the children’s emotional state as “comfort – neutral – discomfort” while listening to the test material. Spectrographic analysis of speech signals was conducted. For automatic recognition of the emotional state, 6594 speech files containing speech material of children were prepared. Automatic recognition of three states, “comfort – neutral – discomfort,” was performed using two automatically extracted sets of acoustic features: the Geneva Minimalistic Acoustic Parameter Set (GeMAPS) and the extended Geneva Minimalistic Acoustic Parameter Set (eGeMAPS). 
The results showed that listeners determined the emotional state less accurately from the speech of TD children (comfort – 58% of correct answers, discomfort – 56%). Listeners recognized discomfort in children with ASD and DS (78% of answers) better than comfort (70% and 67%, respectively, for children with DS and ASD). The neutral state was recognized better from the speech of children with ASD (67%) than from the speech of children with DS (52%) and TD children (54%). With the acoustic feature set GeMAPSv01b, the accuracy of automatic recognition of emotional states was 0.687 for children with ASD, 0.725 for children with DS, and 0.641 for TD children. With the acoustic feature set eGeMAPSv01b, the accuracy was 0.671 for children with ASD, 0.717 for children with DS, and 0.631 for TD children. The use of different models showed similar results, with better recognition of emotional states from the speech of children with DS than from the speech of children with ASD. The state of comfort was automatically determined better from the speech of TD children (precision – 0.546) and children with ASD (0.523), and discomfort from the speech of children with DS (0.504). The data on how adults recognize children’s emotional state from their speech may be used in recruitment for work with children with atypical development. Automatic recognition data can be used to create alternative communication systems and automatic human-computer interfaces for social-emotional learning. Acknowledgment: This work was financially supported by the Russian Science Foundation (project 18-18-00063). 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=autism%20spectrum%20disorders" title="autism spectrum disorders">autism spectrum disorders</a>, <a href="https://publications.waset.org/abstracts/search?q=automatic%20recognition%20of%20speech" title=" automatic recognition of speech"> automatic recognition of speech</a>, <a href="https://publications.waset.org/abstracts/search?q=child%E2%80%99s%20emotional%20speech" title=" child’s emotional speech"> child’s emotional speech</a>, <a href="https://publications.waset.org/abstracts/search?q=Down%20syndrome" title=" Down syndrome"> Down syndrome</a>, <a href="https://publications.waset.org/abstracts/search?q=perceptual%20experiment" title=" perceptual experiment"> perceptual experiment</a> </p> <a href="https://publications.waset.org/abstracts/134192/recognition-by-the-voice-and-speech-features-of-the-emotional-state-of-children-by-adults-and-automatically" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/134192.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">189</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2349</span> Speech Emotion Recognition with Bi-GRU and Self-Attention based Feature Representation</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Bubai%20Maji">Bubai Maji</a>, <a href="https://publications.waset.org/abstracts/search?q=Monorama%20Swain"> Monorama Swain</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Speech is considered an essential and most natural medium for the interaction between machines and humans. 
However, extracting effective features for speech emotion recognition (SER) remains challenging. Existing studies capture temporal information, but high-level temporal-feature learning is yet to be investigated. In this paper, we present an efficient novel method that uses the Self-attention (SA) mechanism in combination with a Convolutional Neural Network (CNN) and a Bi-directional Gated Recurrent Unit (Bi-GRU) network to learn high-level temporal features. To further enhance the representation of the high-level temporal features, we weight the Bi-GRU output features with learnable weights obtained by SA, which improves performance. We evaluate our proposed method on our created SITB-OSED database and on the IEMOCAP database. We report that our proposed method achieves state-of-the-art performance on both databases. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=Bi-GRU" title="Bi-GRU">Bi-GRU</a>, <a href="https://publications.waset.org/abstracts/search?q=1D-CNNs" title=" 1D-CNNs"> 1D-CNNs</a>, <a href="https://publications.waset.org/abstracts/search?q=self-attention" title=" self-attention"> self-attention</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20emotion%20recognition" title=" speech emotion recognition"> speech emotion recognition</a> </p> <a href="https://publications.waset.org/abstracts/148332/speech-emotion-recognition-with-bi-gru-and-self-attention-based-feature-representation" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/148332.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">113</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2348</span> Speech Recognition Performance by Adults: 
A Proposal for a Battery for Marathi</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=S.%20B.%20Rathna%20Kumar">S. B. Rathna Kumar</a>, <a href="https://publications.waset.org/abstracts/search?q=Pranjali%20A%20Ujwane"> Pranjali A Ujwane</a>, <a href="https://publications.waset.org/abstracts/search?q=Panchanan%20Mohanty"> Panchanan Mohanty</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The present study aimed to develop a battery for assessing speech recognition performance by adults in Marathi. A total of four word lists were developed by considering word frequency, word familiarity, words in common use, and phonemic balance. Each word list consists of 25 words (15 monosyllabic words in CVC structure and 10 monosyllabic words in CVCV structure). Equivalence analysis and performance-intensity function testing were carried out using the four word lists on a total of 150 native speakers of Marathi belonging to different regions of Maharashtra (Vidarbha, Marathwada, Khandesh and Northern Maharashtra, Pune, and Konkan). The subjects were divided equally into five groups based on the above-mentioned regions. There was no significant difference (p > 0.05) in speech recognition performance between groups for each word list or between word lists for each group. Hence, the four word lists developed were equally difficult for all the groups and can be used interchangeably. The performance-intensity (PI) function curve showed a semi-linear function, and the groups’ mean slope of the linear portions of the curve indicated an average increase in word recognition score of 4.64%, 4.73%, 4.68%, and 4.85% per dB for list 1, list 2, list 3, and list 4, respectively. Although no data are available on speech recognition tests for adults in Marathi, most of the findings of the study are in line with the findings of research reports on other languages. 
The four word lists, thus developed, were found to have sufficient reliability and validity in assessing speech recognition performance by adults in Marathi. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition%20performance" title="speech recognition performance">speech recognition performance</a>, <a href="https://publications.waset.org/abstracts/search?q=phonemic%20balance" title=" phonemic balance"> phonemic balance</a>, <a href="https://publications.waset.org/abstracts/search?q=equivalence%20analysis" title=" equivalence analysis"> equivalence analysis</a>, <a href="https://publications.waset.org/abstracts/search?q=performance-intensity%20function%20testing" title=" performance-intensity function testing"> performance-intensity function testing</a>, <a href="https://publications.waset.org/abstracts/search?q=reliability" title=" reliability"> reliability</a>, <a href="https://publications.waset.org/abstracts/search?q=validity" title=" validity"> validity</a> </p> <a href="https://publications.waset.org/abstracts/41329/speech-recognition-performance-by-adults-a-proposal-for-a-battery-for-marathi" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/41329.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">356</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2347</span> Speech Enhancement Using Wavelet Coefficients Masking with Local Binary Patterns</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Christian%20Arcos">Christian Arcos</a>, <a href="https://publications.waset.org/abstracts/search?q=Marley%20Vellasco"> Marley Vellasco</a>, <a 
href="https://publications.waset.org/abstracts/search?q=Abraham%20Alcaim"> Abraham Alcaim</a> </p> <p class="card-text"><strong>Abstract:</strong></p> In this paper, we present a wavelet coefficients masking based on Local Binary Patterns (WLBP) approach to enhance the temporal spectra of the wavelet coefficients for speech enhancement. This technique exploits the wavelet denoising scheme, which splits the degraded speech into pyramidal subband components and extracts frequency information without losing temporal information. Speech enhancement in each high-frequency subband is performed by binary labels through the local binary pattern masking that encodes the ratio between the original value of each coefficient and the values of the neighbour coefficients. This approach enhances the high-frequency spectra of the wavelet transform instead of eliminating them through a threshold. A comparative analysis is carried out with conventional speech enhancement algorithms, demonstrating that the proposed technique achieves significant improvements in terms of PESQ, an international recommendation of objective measure for estimating subjective speech quality. Informal listening tests also show that the proposed method in an acoustic context improves the quality of speech, avoiding the annoying musical noise present in other speech enhancement techniques. Experimental results obtained with a DNN based speech recognizer in noisy environments corroborate the superiority of the proposed scheme in the robust speech recognition scenario. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=binary%20labels" title="binary labels">binary labels</a>, <a href="https://publications.waset.org/abstracts/search?q=local%20binary%20patterns" title=" local binary patterns"> local binary patterns</a>, <a href="https://publications.waset.org/abstracts/search?q=mask" title=" mask"> mask</a>, <a href="https://publications.waset.org/abstracts/search?q=wavelet%20coefficients" title=" wavelet coefficients"> wavelet coefficients</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20enhancement" title=" speech enhancement"> speech enhancement</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition" title=" speech recognition"> speech recognition</a> </p> <a href="https://publications.waset.org/abstracts/79985/speech-enhancement-using-wavelet-coefficients-masking-with-local-binary-patterns" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/79985.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">229</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2346</span> The Combination of the Mel Frequency Cepstral Coefficients (MFCC), Perceptual Linear Prediction (PLP), JITTER and SHIMMER Coefficients for the Improvement of Automatic Recognition System for Dysarthric Speech</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Brahim-Fares%20Zaidi">Brahim-Fares Zaidi</a>, <a href="https://publications.waset.org/abstracts/search?q=Malika%20Boudraa"> Malika Boudraa</a>, <a href="https://publications.waset.org/abstracts/search?q=Sid-Ahmed%20Selouani"> Sid-Ahmed Selouani</a> </p> <p 
class="card-text"><strong>Abstract:</strong></p> Our work aims to improve our Automatic Recognition System for Dysarthric Speech (ARSDS), based on Hidden Markov Models (HMM) and the Hidden Markov Model Toolkit (HTK), to help people who suffer from pronunciation problems. We applied two techniques of speech parameterization, based on Mel Frequency Cepstral Coefficients (MFCCs) and Perceptual Linear Prediction (PLP) coefficients, and concatenated them with jitter and shimmer coefficients in order to increase the recognition rate of dysarthric speech. For our tests, we used the NEMOURS database, which contains speakers with dysarthria and normal speakers. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=hidden%20Markov%20model%20toolkit%20%28HTK%29" title="hidden Markov model toolkit (HTK)">hidden Markov model toolkit (HTK)</a>, <a href="https://publications.waset.org/abstracts/search?q=hidden%20models%20of%20Markov%20%28HMM%29" title=" hidden models of Markov (HMM)"> hidden models of Markov (HMM)</a>, <a href="https://publications.waset.org/abstracts/search?q=Mel-frequency%20cepstral%20coefficients%20%28MFCC%29" title=" Mel-frequency cepstral coefficients (MFCC)"> Mel-frequency cepstral coefficients (MFCC)</a>, <a href="https://publications.waset.org/abstracts/search?q=perceptual%20linear%20prediction%20%28PLP%E2%80%99s%29" title=" perceptual linear prediction (PLP’s)"> perceptual linear prediction (PLP’s)</a> </p> <a href="https://publications.waset.org/abstracts/143303/the-combination-of-the-mel-frequency-cepstral-coefficients-mfcc-perceptual-linear-prediction-plp-jitter-and-shimmer-coefficients-for-the-improvement-of-automatic-recognition-system-for-dysarthric-speech" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/143303.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span 
class="badge badge-light">161</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2345</span> Comparative Methods for Speech Enhancement and the Effects on Text-Independent Speaker Identification Performance</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=R.%20Ajgou">R. Ajgou</a>, <a href="https://publications.waset.org/abstracts/search?q=S.%20Sbaa"> S. Sbaa</a>, <a href="https://publications.waset.org/abstracts/search?q=S.%20Ghendir"> S. Ghendir</a>, <a href="https://publications.waset.org/abstracts/search?q=A.%20Chemsa"> A. Chemsa</a>, <a href="https://publications.waset.org/abstracts/search?q=A.%20Taleb-Ahmed"> A. Taleb-Ahmed</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Speech enhancement algorithms aim to improve speech quality. In this paper, we review some speech enhancement methods and evaluate their performance based on Perceptual Evaluation of Speech Quality scores (PESQ, ITU-T P.862). All methods were evaluated in the presence of different kinds of noise using the TIMIT database and the NOIZEUS noisy speech corpus. The noise was taken from the AURORA database and includes suburban train noise, babble, car, exhibition hall, restaurant, street, airport and train station noise. Simulation results showed that the approach based on tracking of non-stationary noise improved speech enhancement performance in comparison with the other methods in terms of the PESQ measure. Moreover, we evaluated the effects of the speech enhancement technique on a speaker identification system based on an autoregressive (AR) model and Mel-frequency cepstral coefficients (MFCC). 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=speech%20enhancement" title="speech enhancement">speech enhancement</a>, <a href="https://publications.waset.org/abstracts/search?q=pesq" title=" pesq"> pesq</a>, <a href="https://publications.waset.org/abstracts/search?q=speaker%20recognition" title=" speaker recognition"> speaker recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=MFCC" title=" MFCC"> MFCC</a> </p> <a href="https://publications.waset.org/abstracts/31102/comparative-methods-for-speech-enhancement-and-the-effects-on-text-independent-speaker-identification-performance" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/31102.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">424</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2344</span> An Automatic Speech Recognition of Conversational Telephone Speech in Malay Language</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=M.%20Draman">M. Draman</a>, <a href="https://publications.waset.org/abstracts/search?q=S.%20Z.%20Muhamad%20Yassin"> S. Z. Muhamad Yassin</a>, <a href="https://publications.waset.org/abstracts/search?q=M.%20S.%20Alias"> M. S. Alias</a>, <a href="https://publications.waset.org/abstracts/search?q=Z.%20Lambak"> Z. Lambak</a>, <a href="https://publications.waset.org/abstracts/search?q=M.%20I.%20Zulkifli"> M. I. Zulkifli</a>, <a href="https://publications.waset.org/abstracts/search?q=S.%20N.%20Padhi"> S. N. Padhi</a>, <a href="https://publications.waset.org/abstracts/search?q=K.%20N.%20Baharim"> K. N. 
Baharim</a>, <a href="https://publications.waset.org/abstracts/search?q=F.%20Maskuriy"> F. Maskuriy</a>, <a href="https://publications.waset.org/abstracts/search?q=A.%20I.%20A.%20Rahim"> A. I. A. Rahim</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The performance of a Malay automatic speech recognition (ASR) system for the call centre environment is presented. The system uses the Kaldi toolkit as the platform for all the libraries and algorithms used in performing the ASR task. The acoustic model implemented in this system uses a deep neural network (DNN) to model the acoustic signal, and a standard n-gram model is used for language modelling. With 80 hours of training data from call centre recordings, the ASR system achieves 72% accuracy, which corresponds to a 28% word error rate (WER). The testing was done using 20 hours of audio data. Despite the use of a DNN, the system shows low accuracy owing to the variety of noise, accents, and dialects that typically occur in the Malaysian call centre environment. This significant variation among speakers is reflected by the large standard deviation of the average word error rate (WERav) (i.e., ~ 10%). The lowest WER (13.8%) was obtained from a recording of a native speaker with a standard Malay dialect (central Malaysia), whereas the highest WER (49%) was obtained from a recording of a conversation with a speaker who uses a non-standard Malay dialect. 
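The relation stated in this abstract between accuracy and word error rate (72% accuracy, 28% WER) follows the common convention that word accuracy equals 1 minus WER. As a minimal sketch of how WER is usually computed (word-level edit distance divided by the number of reference words), the snippet below may help; the function name `word_error_rate` and the sample sentences are illustrative assumptions and are not taken from the paper or from Kaldi.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper for illustration; not the scoring tool used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one substituted word out of four reference words.
    wer = word_error_rate("a b c d", "a x c d")
    print(f"WER = {wer:.2f}, word accuracy = {1 - wer:.2f}")
```

Because insertions count as errors, WER computed this way can exceed 100% when the hypothesis is much longer than the reference, in which case "accuracy = 1 − WER" becomes negative.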
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=conversational%20speech%20recognition" title="conversational speech recognition">conversational speech recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=deep%20neural%20network" title=" deep neural network"> deep neural network</a>, <a href="https://publications.waset.org/abstracts/search?q=Malay%20language" title=" Malay language"> Malay language</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition" title=" speech recognition"> speech recognition</a> </p> <a href="https://publications.waset.org/abstracts/93292/an-automatic-speech-recognition-of-conversational-telephone-speech-in-malay-language" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/93292.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">322</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2343</span> Hindi Speech Synthesis by Concatenation of Recognized Hand Written Devnagri Script Using Support Vector Machines Classifier</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Saurabh%20Farkya">Saurabh Farkya</a>, <a href="https://publications.waset.org/abstracts/search?q=Govinda%20Surampudi"> Govinda Surampudi</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Optical Character Recognition is one of the current major research areas. This paper is focussed on recognition of Devanagari script and its sound generation. This Paper consists of two parts. First, Optical Character Recognition of Devnagari handwritten Script. Second, speech synthesis of the recognized text. 
This paper shows an implementation of support vector machines for the purpose of Devnagari script recognition. The support vector machines were trained with multi-domain features: Transform Domain features and Spatial Domain (or Structural Domain) features. The Transform Domain includes the wavelet feature of the character. The Structural Domain consists of the Distance Profile feature and the Gradient feature. The segmentation of the text document was done at three levels: line segmentation, word segmentation, and character segmentation. The pre-processing of the characters was done with the help of various morphological operations: Otsu's algorithm, erosion, dilation, filtration, and thinning techniques. The algorithm was tested on a self-prepared database, a collection of various handwriting samples. Further, Unicode was used to convert the recognized Devnagari text into a computer-readable document. The document so obtained is an array of codes, which was used to generate digitized text and to synthesize Hindi speech. Phonemes from the self-prepared database were used to generate the speech of the scanned document using a concatenation technique. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=Character%20Recognition%20%28OCR%29" title="Character Recognition (OCR)">Character Recognition (OCR)</a>, <a href="https://publications.waset.org/abstracts/search?q=Text%20to%20Speech%20%28TTS%29" title=" Text to Speech (TTS)"> Text to Speech (TTS)</a>, <a href="https://publications.waset.org/abstracts/search?q=Support%20Vector%20Machines%20%28SVM%29" title=" Support Vector Machines (SVM)"> Support Vector Machines (SVM)</a>, <a href="https://publications.waset.org/abstracts/search?q=Library%20of%20Support%20Vector%20Machines%20%28LIBSVM%29" title=" Library of Support Vector Machines (LIBSVM)"> Library of Support Vector Machines (LIBSVM)</a> </p> <a href="https://publications.waset.org/abstracts/19232/hindi-speech-synthesis-by-concatenation-of-recognized-hand-written-devnagri-script-using-support-vector-machines-classifier" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/19232.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">499</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2342</span> Recognition of Noisy Words Using the Time Delay Neural Networks Approach</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Khenfer-Koummich%20Fatima">Khenfer-Koummich Fatima</a>, <a href="https://publications.waset.org/abstracts/search?q=Mesbahi%20Larbi"> Mesbahi Larbi</a>, <a href="https://publications.waset.org/abstracts/search?q=Hendel%20Fatiha"> Hendel Fatiha</a> </p> <p class="card-text"><strong>Abstract:</strong></p> This paper presents a recognition system for isolated words like robot commands. 
It is implemented with Time Delay Neural Networks (TDNN) in order to teleoperate a robot for specific tasks such as turn, close, etc., in an industrial environment, taking into account the noise coming from the machine. The choice of TDNN is based on its generalization in terms of accuracy; moreover, it acts as a filter that allows the passage of certain desirable frequency characteristics of speech. The goal is to determine the parameters of this filter so that the system adapts to the variability of the speech signal and especially to noise; for this, the back propagation technique was used in the learning phase. The approach was applied to commands pronounced in two languages separately: French and Arabic. The results for two test bases of 300 spoken words each are 87% and 97.6% in a neutral environment, and 77.67% and 92.67% when white Gaussian noise was added at an SNR of 35 dB. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=TDNN" title="TDNN">TDNN</a>, <a href="https://publications.waset.org/abstracts/search?q=neural%20networks" title=" neural networks"> neural networks</a>, <a href="https://publications.waset.org/abstracts/search?q=noise" title=" noise"> noise</a>, <a href="https://publications.waset.org/abstracts/search?q=speech%20recognition" title=" speech recognition"> speech recognition</a> </p> <a href="https://publications.waset.org/abstracts/13254/recognition-of-noisy-words-using-the-time-delay-neural-networks-approach" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/13254.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">289</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">2341</span> Perceiving Casual Speech: A Gating Experiment with French 
Listeners of L2 English</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Naouel%20Zoghlami">Naouel Zoghlami</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Spoken-word recognition involves the simultaneous activation of potential word candidates which compete with each other for final correct recognition. In continuous speech, the activation-competition process gets more complicated due to speech reductions existing at word boundaries. Lexical processing is more difficult in L2 than in L1 because L2 listeners often lack phonetic, lexico-semantic, syntactic, and prosodic knowledge in the target language. In this study, we investigate the on-line lexical segmentation hypotheses that French listeners of L2 English form and then revise as subsequent perceptual evidence is revealed. Our purpose is to shed further light on the processes of L2 spoken-word recognition in context and to better understand L2 listening difficulties through a comparison of skilled and unskilled listeners' reactions at the point where their working hypothesis is rejected. We use a variant of the gating experiment in which subjects transcribe an English sentence presented in increments of progressively greater duration. The spoken sentence was “And this amazing athlete has just broken another world record”, chosen mainly because it included common reductions and phonetic features in English, such as elision and assimilation. Our preliminary results show that there is an important difference in the manner in which proficient and less-proficient L2 listeners handle connected speech. Less-proficient listeners delay recognition of words as they wait for lexical and syntactic evidence to appear in the gates. Further statistical analyses are currently under way. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=gating%20paradigm" title="gating paradigm">gating paradigm</a>, <a href="https://publications.waset.org/abstracts/search?q=spoken%20word%20recognition" title=" spoken word recognition"> spoken word recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=online%20lexical%20segmentation" title=" online lexical segmentation"> online lexical segmentation</a>, <a href="https://publications.waset.org/abstracts/search?q=L2%20listening" title=" L2 listening "> L2 listening </a> </p> <a href="https://publications.waset.org/abstracts/19791/perceiving-casual-speech-a-gating-experiment-with-french-listeners-of-l2-english" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/19791.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">464</span> </span> </div> </div> <ul class="pagination"> <li class="page-item disabled"><span class="page-link">&lsaquo;</span></li> <li class="page-item active"><span class="page-link">1</span></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=2">2</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=3">3</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=4">4</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=5">5</a></li> <li class="page-item"><a class="page-link" 
href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=6">6</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=7">7</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=8">8</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=9">9</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=10">10</a></li> <li class="page-item disabled"><span class="page-link">...</span></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=78">78</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=79">79</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=audio-visual%20speech%20recognition&amp;page=2" rel="next">&rsaquo;</a></li> </ul> </div> </main> <footer> <div id="infolinks" class="pt-3 pb-2"> <div class="container"> <div style="background-color:#f5f5f5;" class="p-3"> <div class="row"> <div class="col-md-2"> <ul class="list-unstyled"> About <li><a href="https://waset.org/page/support">About Us</a></li> <li><a href="https://waset.org/page/support#legal-information">Legal</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/WASET-16th-foundational-anniversary.pdf">WASET celebrates its 16th foundational anniversary</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Account <li><a href="https://waset.org/profile">My Account</a></li> 
</ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Explore <li><a href="https://waset.org/disciplines">Disciplines</a></li> <li><a href="https://waset.org/conferences">Conferences</a></li> <li><a href="https://waset.org/conference-programs">Conference Program</a></li> <li><a href="https://waset.org/committees">Committees</a></li> <li><a href="https://publications.waset.org">Publications</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Research <li><a href="https://publications.waset.org/abstracts">Abstracts</a></li> <li><a href="https://publications.waset.org">Periodicals</a></li> <li><a href="https://publications.waset.org/archive">Archive</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Open Science <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Open-Science-Philosophy.pdf">Open Science Philosophy</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Open-Science-Award.pdf">Open Science Award</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Open-Society-Open-Science-and-Open-Innovation.pdf">Open Innovation</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Postdoctoral-Fellowship-Award.pdf">Postdoctoral Fellowship Award</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Scholarly-Research-Review.pdf">Scholarly Research Review</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Support <li><a href="https://waset.org/page/support">Support</a></li> <li><a href="https://waset.org/profile/messages/create">Contact Us</a></li> <li><a href="https://waset.org/profile/messages/create">Report Abuse</a></li> </ul> </div> </div> </div> </div> </div> <div class="container text-center"> <hr style="margin-top:0;margin-bottom:.3rem;"> <a 
href="https://creativecommons.org/licenses/by/4.0/" target="_blank" class="text-muted small">Creative Commons Attribution 4.0 International License</a> <div id="copy" class="mt-2">&copy; 2024 World Academy of Science, Engineering and Technology</div> </div> </footer> <a href="javascript:" id="return-to-top"><i class="fas fa-arrow-up"></i></a> <div class="modal" id="modal-template"> <div class="modal-dialog"> <div class="modal-content"> <div class="row m-0 mt-1"> <div class="col-md-12"> <button type="button" class="close" data-dismiss="modal" aria-label="Close"><span aria-hidden="true">&times;</span></button> </div> </div> <div class="modal-body"></div> </div> </div> </div> <script src="https://cdn.waset.org/static/plugins/jquery-3.3.1.min.js"></script> <script src="https://cdn.waset.org/static/plugins/bootstrap-4.2.1/js/bootstrap.bundle.min.js"></script> <script src="https://cdn.waset.org/static/js/site.js?v=150220211556"></script> <script> jQuery(document).ready(function() { /*jQuery.get("https://publications.waset.org/xhr/user-menu", function (response) { jQuery('#mainNavMenu').append(response); });*/ jQuery.get({ url: "https://publications.waset.org/xhr/user-menu", cache: false }).then(function(response){ jQuery('#mainNavMenu').append(response); }); }); </script> </body> </html>
