<!DOCTYPE html> <html> <head> <meta charset="UTF-8"> <title>ISCA Archive</title> <meta name="viewport" content="width=device-width, initial-scale=1"> <link rel="stylesheet" href="../resources/jquery.dataTables.min.css"> <script src="../resources/jquery-3.5.1.min.js"></script> <script src="../resources/jquery.dataTables.min.js"></script> <script src="../resources/accent-neutralise.js"></script> <!-- Fonts --> <link rel="stylesheet" href="../resources/fontawesome-free-subset/style.css"> <!-- Overall theme --> <link rel="stylesheet" href="../resources/w3.css"> <link rel="stylesheet" href="../resources/w3-theme-blue.css"> <script src="../resources/w3.js"></script> <!-- ISCA Archive modifications --> <link rel="stylesheet" href="../resources/is.css"> </head> <body> <!-- Top menu bar, fixed on large/medium screens --> <div class="w3-top w3-hide-small"> <div class="w3-bar w3-grayscale-min w3-theme-d4 w3-center"> <a href="../../index.html" class="w3-bar-item w3-button w3-theme-d2 w3-mobile"> <i class="icon-home w3-margin-right"></i>ISCA </a> <a href="../index.html" class="w3-bar-item w3-button w3-mobile">Archive</a> <a href="#" class="w3-bar-item w3-button w3-mobile">Interspeech 2014</a> <a class="w3-bar-item w3-button w3-mobile" onclick="document.getElementById('sessionchooser').style.display='block'">Sessions </a> <a href="#bypaper" class="w3-bar-item w3-button w3-mobile"><i class="icon-search" style='margin-right:5px'></i>Search</a> <a href="interspeech_2014.pdf" class="w3-bar-item w3-button w3-mobile w3-right">Booklet</a> </div> </div> <!-- Top menu bar, scrollable on small screens --> <div class="w3-hide-large w3-hide-medium"> <div class="w3-bar w3-grayscale-min w3-theme-d4 w3-center"> <span class="w3-bar-item w3-button w3-mobile w3-opacity-max"> </span> <a href="../../index.html" class="w3-bar-item w3-button w3-mobile"> <i class="icon-home w3-margin-right"></i>ISCA </a> <a href="../index.html" class="w3-bar-item w3-button w3-mobile">Archive</a> <a href="#" 
class="w3-bar-item w3-button w3 w3-mobile" onclick="document.getElementById('sessionchooser').style.display='block'">Sessions </a> <a href="#bypaper" class="w3-bar-item w3-button w3-mobile"><i class="icon-search" style='margin-right:5px'></i>Search</a> <a href="interspeech_2014.pdf" class="w3-bar-item w3-button w3-mobile w3-right">Booklet</a> </div> </div> <!-- Papers help popup --> <div id="help_papers" class="w3-modal"> <div class="w3-modal-content w3-card-4 w3-greyscale w3-theme-d4 w3-padding w3-bordered"> <div class="w3-container"> <span onclick="document.getElementById('help_papers').style.display='none'" class="w3-button w3-display-topright">×</span> <div class="w3-container"> <p class="w3-text">Click on column names to sort.</p> <p class="w3-text">Searching uses the 'and' of terms e.g. <span class='w3-monospace'>Smith Interspeech</span> matches all papers by Smith in any Interspeech. The order of terms is not significant.</p> <p class="w3-text">Use double quotes for exact phrasal matches e.g. <span class='w3-monospace'>"acoustic features"</span>.</p> <p class="w3-text">Case is ignored.</p> <p class="w3-text">Diacritics are optional e.g. 
<span class='w3-monospace'>lefevre</span> also matches <span class='w3-monospace'>lefèvre</span> (but not vice versa).</p> <p class="w3-text">It can be useful to turn off spell-checking for the search box in your browser preferences.</p> <p class="w3-text">If you prefer to scroll rather than page, increase the number in the show entries dropdown.</p> </div> </div> </div> </div> <div class="w3-top w3-hide-medium w3-hide-large"> <div class="w3-bar w3-grayscale-min w3-theme-d4 w3-opacity-max"> <a href="#" class="w3-bar-item w3-button w3-theme-d2 w3-left">top</a> </div> </div> <div class="w3-grayscale w3-theme-l5"> <!-- Conference header --> <div class="w3-container" id="about"> <div class="w3-content" style="max-width:1100px;margin-top:50px; margin-bottom: 10px"> <h2 class="w3-center w3-padding-16"> <span class="w3-text">Interspeech 2014</span> </h2> <h5 class="w3-text w3-center"> Singapore<br> 14-18 September 2014</h5> <br> <h5 class="w3-text w3-center">General Chair: Haizhou Li; General Co-Chair: Pak-Chung Ching</h5> <pre class="w3-text w3-center">doi: 10.21437/Interspeech.2014</pre> <pre class="w3-text w3-center">ISSN: 2958-1796</pre> </div> </div> <!-- Sessions --> <div class="w3-container"> <div class="w3-content" style="max-width:1200px;margin-top: 10px"> <div class="w3-content" style="height:10px" id="Keynote"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Keynote</h4> <hr> <a class="w3-text" href="cutler14_interspeech.html"> <p> Learning about speech <br> <span class="w3-text w3-text-theme"> Anne Cutler </span> </p> </a> <a class="w3-text" href="liu14_interspeech.html"> <p> Decision learning in data science: where John Nash meets social media <br> <span class="w3-text w3-text-theme"> K. J. 
Ray Liu </span> </p> </a> <a class="w3-text" href="lamel14_interspeech.html"> <p> Language diversity: speech processing in a multi-lingual context <br> <span class="w3-text w3-text-theme"> Lori Lamel </span> </p> </a> <a class="w3-text" href="wang14_interspeech.html"> <p> Sound patterns in language <br> <span class="w3-text w3-text-theme"> William S.-Y. Wang </span> </p> </a> <a class="w3-text" href="deng14_interspeech.html"> <p> Achievements and challenges of deep learning — from speech analysis and recognition to language and multimodal processing <br> <span class="w3-text w3-text-theme"> Li Deng </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Multi-Lingual ASR"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Multi-Lingual ASR</h4> <hr> <a class="w3-text" href="zhang14_interspeech.html"> <p> Language ID-based training of multilingual stacked bottleneck features <br> <span class="w3-text w3-text-theme"> Yu Zhang, Ekapol Chuangsuwanich, James R. Glass </span> </p> </a> <a class="w3-text" href="do14_interspeech.html"> <p> Kernel density-based acoustic model with cross-lingual bottleneck features for resource limited LVCSR <br> <span class="w3-text w3-text-theme"> Van Hai Do, Xiong Xiao, Eng Siong Chng, Haizhou Li </span> </p> </a> <a class="w3-text" href="vu14_interspeech.html"> <p> Improving ASR performance on non-native speech using multilingual and crosslingual information <br> <span class="w3-text w3-text-theme"> Ngoc Thang Vu, Yuanfan Wang, Marten Klose, Zlatka Mihaylova, Tanja Schultz </span> </p> </a> <a class="w3-text" href="knill14_interspeech.html"> <p> Language independent and unsupervised acoustic models for speech recognition and keyword spotting <br> <span class="w3-text w3-text-theme"> Kate M. Knill, Mark J. F. Gales, Anton Ragni, Shakti P. 
Rath </span> </p> </a> <a class="w3-text" href="bell14_interspeech.html"> <p> Cross-lingual adaptation with multi-task adaptive networks <br> <span class="w3-text w3-text-theme"> Peter Bell, Joris Driesen, Steve Renals </span> </p> </a> <a class="w3-text" href="razavi14_interspeech.html"> <p> On recognition of non-native speech using probabilistic lexical model <br> <span class="w3-text w3-text-theme"> Marzieh Razavi, Mathew Magimai Doss </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Prosody Processing"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Prosody Processing</h4> <hr> <a class="w3-text" href="tanaka14_interspeech.html"> <p> Direct F<SUB>0</SUB> control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation <br> <span class="w3-text w3-text-theme"> Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura </span> </p> </a> <a class="w3-text" href="niekerk14_interspeech.html"> <p> A target approximation intonation model for Yorùbá TTS <br> <span class="w3-text w3-text-theme"> Daniel R. 
van Niekerk, Etienne Barnard </span> </p> </a> <a class="w3-text" href="vadapalli14_interspeech.html"> <p> Learning continuous-valued word representations for phrase break prediction <br> <span class="w3-text w3-text-theme"> Anandaswarup Vadapalli, Kishore Prahallad </span> </p> </a> <a class="w3-text" href="che14_interspeech.html"> <p> Improving Mandarin prosodic boundary prediction with rich syntactic features <br> <span class="w3-text w3-text-theme"> Hao Che, Jianhua Tao, Ya Li </span> </p> </a> <a class="w3-text" href="dall14_interspeech.html"> <p> Investigating automatic & human filled pause insertion for speech synthesis <br> <span class="w3-text w3-text-theme"> Rasmus Dall, Marcus Tomalin, Mirjam Wester, William Byrne, Simon King </span> </p> </a> <a class="w3-text" href="dall14b_interspeech.html"> <p> The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech <br> <span class="w3-text w3-text-theme"> Rasmus Dall, Mirjam Wester, Martin Corley </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speaker Recognition — Applications"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speaker Recognition — Applications</h4> <hr> <a class="w3-text" href="khoury14_interspeech.html"> <p> Introducing i-vectors for joint anti-spoofing and speaker verification <br> <span class="w3-text w3-text-theme"> Elie Khoury, Tomi Kinnunen, Aleksandr Sizov, Zhizheng Wu, Sébastien Marcel </span> </p> </a> <a class="w3-text" href="leary14_interspeech.html"> <p> Random projections for large-scale speaker search <br> <span class="w3-text w3-text-theme"> Ryan Leary, Walter Andrews </span> </p> </a> <a class="w3-text" href="fredouille14_interspeech.html"> <p> Analysis of i-vector framework for speaker identification in TV-shows <br> <span class="w3-text w3-text-theme"> Corinne Fredouille, Delphine Charlet </span> </p> </a> <a 
class="w3-text" href="laurent14_interspeech.html"> <p> Boosting bonsai trees for efficient features combination: application to speaker role identification <br> <span class="w3-text w3-text-theme"> Antoine Laurent, Nathalie Camelin, Christian Raymond </span> </p> </a> <a class="w3-text" href="raimond14_interspeech.html"> <p> Identifying contributors in the BBC World Service archive <br> <span class="w3-text w3-text-theme"> Yves Raimond, Thomas Nixon </span> </p> </a> <a class="w3-text" href="kelly14_interspeech.html"> <p> Effect of long-term ageing on i-vector speaker verification <br> <span class="w3-text w3-text-theme"> Finnian Kelly, Rahim Saeidi, Naomi Harte, David A. van Leeuwen </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Phonetics and Phonology 1, 2"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Phonetics and Phonology 1, 2</h4> <hr> <a class="w3-text" href="versteegh14_interspeech.html"> <p> Acoustic correlates of phonological status <br> <span class="w3-text w3-text-theme"> Maarten Versteegh, Amanda Seidl, Alejandrina Cristia </span> </p> </a> <a class="w3-text" href="airaksinen14_interspeech.html"> <p> Parameterization of the glottal source with the phase plane plot <br> <span class="w3-text w3-text-theme"> Manu Airaksinen, Paavo Alku </span> </p> </a> <a class="w3-text" href="rose14_interspeech.html"> <p> Transcribing tone — a likelihood-based quantitative evaluation of Chao's tone letters <br> <span class="w3-text w3-text-theme"> Phil Rose </span> </p> </a> <a class="w3-text" href="hamzah14_interspeech.html"> <p> Intonational phonology and prosodic hierarchy in Malay <br> <span class="w3-text w3-text-theme"> Diyana Hamzah, James Sneed German </span> </p> </a> <a class="w3-text" href="reichel14_interspeech.html"> <p> Comparing parameterizations of pitch register and its discontinuities at prosodic boundaries for Hungarian <br> 
<span class="w3-text w3-text-theme"> Uwe D. Reichel, Katalin Mády </span> </p> </a> <a class="w3-text" href="christodoulides14_interspeech.html"> <p> An evaluation of machine learning methods for prominence detection in French <br> <span class="w3-text w3-text-theme"> George Christodoulides, Mathieu Avanzi </span> </p> </a> <a class="w3-text" href="chen14_interspeech.html"> <p> Investigating the effect of F0 and vocal intensity on harmonic magnitudes: data from high-speed laryngeal videoendoscopy <br> <span class="w3-text w3-text-theme"> Gang Chen, Soo Jin Park, Jody Kreiman, Abeer Alwan </span> </p> </a> <a class="w3-text" href="delaisroussarie14_interspeech.html"> <p> Adapting prosodic chunking algorithm and synthesis system to specific style: the case of dictation <br> <span class="w3-text w3-text-theme"> Elisabeth Delais-Roussarie, Damien Lolive, Hiyon Yoo, Nelly Barbot, Olivier Rosec </span> </p> </a> <a class="w3-text" href="sung14_interspeech.html"> <p> The articulation of lexical and post-lexical palatalization in Korean <br> <span class="w3-text w3-text-theme"> Jae-Hyun Sung </span> </p> </a> <a class="w3-text" href="archangeli14_interspeech.html"> <p> Articulation and neutralization: a preliminary study of lenition in Scottish Gaelic <br> <span class="w3-text w3-text-theme"> Diana Archangeli, Samuel Johnston, Jae-Hyun Sung, Muriel Fisher, Michael Hammond, Andrew Carnie </span> </p> </a> <a class="w3-text" href="amino14_interspeech.html"> <p> Nasality in speech and its contribution to speaker individuality <br> <span class="w3-text w3-text-theme"> Kanae Amino, Hisanori Makinae, Tatsuya Kitamura </span> </p> </a> <a class="w3-text" href="brown14_interspeech.html"> <p> Is speech rhythm an intrinsic property of language? <br> <span class="w3-text w3-text-theme"> Jason Brown, Eden Matene </span> </p> </a> <a class="w3-text" href="jackschina14_interspeech.html"> <p> Where /ar/ the /r/s in standard Austrian German? 
<br> <span class="w3-text w3-text-theme"> Anke Jackschina, Barbara Schuppler, Rudolf Muhr </span> </p> </a> <a class="w3-text" href="hu14_interspeech.html"> <p> Diphthongized vowels in the Yi County Hui Chinese dialect <br> <span class="w3-text w3-text-theme"> Fang Hu, Minghui Zhang </span> </p> </a> <a class="w3-text" href="dellwo14_interspeech.html"> <p> Rhythmic variability between some Asian languages: results from an automatic analysis of temporal characteristics <br> <span class="w3-text w3-text-theme"> Volker Dellwo, Peggy Mok, Mathias Jenny </span> </p> </a> <a class="w3-text" href="braun14_interspeech.html"> <p> Listener estimation of speaker age based on whispered speech <br> <span class="w3-text w3-text-theme"> Angelika Braun, Daniela Decker </span> </p> </a> <a class="w3-text" href="kasisopa14_interspeech.html"> <p> The Lombard effect with Thai lexical tones: an acoustic analysis of articulatory modifications in noise <br> <span class="w3-text w3-text-theme"> Benjawan Kasisopa, Virginie Attina, Denis Burnham </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Open Domain Situated Conversational Interaction (Special Session)"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Open Domain Situated Conversational Interaction (Special Session)</h4> <hr> <a class="w3-text" href="pappu14_interspeech.html"> <p> Learning situated knowledge bases through dialog <br> <span class="w3-text w3-text-theme"> Aasish Pappu, Alexander I. 
Rudnicky </span> </p> </a> <a class="w3-text" href="misu14_interspeech.html"> <p> Crowdsourcing for situated dialog systems in a moving car <br> <span class="w3-text w3-text-theme"> Teruhisa Misu </span> </p> </a> <a class="w3-text" href="higashinaka14_interspeech.html"> <p> Evaluating coherence in open domain conversational systems <br> <span class="w3-text w3-text-theme"> Ryuichiro Higashinaka, Toyomi Meguro, Kenji Imamura, Hiroaki Sugiyama, Toshiro Makino, Yoshihiro Matsuo </span> </p> </a> <a class="w3-text" href="bechet14_interspeech.html"> <p> Adapting dependency parsing to spontaneous speech for open domain spoken language understanding <br> <span class="w3-text w3-text-theme"> Frederic Bechet, Alexis Nasr, Benoit Favre </span> </p> </a> <a class="w3-text" href="gasic14_interspeech.html"> <p> Incremental on-line adaptation of POMDP-based dialogue managers to extended domains <br> <span class="w3-text w3-text-theme"> M. Gašić, Dongho Kim, Pirros Tsiakoulis, Catherine Breslin, Matthew Henderson, M. Szummer, B. Thomson, Steve Young </span> </p> </a> <a class="w3-text" href="robichaud14_interspeech.html"> <p> Hypotheses ranking for robust domain classification and tracking in dialogue systems <br> <span class="w3-text w3-text-theme"> Jean-Philippe Robichaud, Paul A. Crook, Puyang Xu, Omar Zia Khan, Ruhi Sarikaya </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Production: Models and Acoustics"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech Production: Models and Acoustics</h4> <hr> <a class="w3-text" href="ramanarayanan14_interspeech.html"> <p> Motor control primitives arising from a learned dynamical systems model of speech articulation <br> <span class="w3-text w3-text-theme"> Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. 
Narayanan </span> </p> </a> <a class="w3-text" href="yeh14_interspeech.html"> <p> Nonword repetition of Taiwanese disyllabic tonal sequences in adults with language attrition <br> <span class="w3-text w3-text-theme"> Chia-Hsin Yeh, Chiung-Yao Wang, Jung-Yueh Tu </span> </p> </a> <a class="w3-text" href="windmann14_interspeech.html"> <p> A unified account of prominence effects in an optimization-based model of speech timing <br> <span class="w3-text w3-text-theme"> Andreas Windmann, Juraj Šimko, Petra Wagner </span> </p> </a> <a class="w3-text" href="kim14_interspeech.html"> <p> Estimation of the movement trajectories of non-crucial articulators based on the detection of crucial moments and physiological constraints <br> <span class="w3-text w3-text-theme"> Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan </span> </p> </a> <a class="w3-text" href="sudhakar14_interspeech.html"> <p> Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: benefit to speech recognition <br> <span class="w3-text w3-text-theme"> Prasad Sudhakar, Prasanta Kumar Ghosh </span> </p> </a> <a class="w3-text" href="wang14b_interspeech.html"> <p> Contribution of tongue lateral to consonant production <br> <span class="w3-text w3-text-theme"> Jun Wang, William Katz, Thomas F. Campbell </span> </p> </a> <a class="w3-text" href="liu14b_interspeech.html"> <p> A preliminary study on acoustic correlates of tone2+tone2 disyllabic word stress in Mandarin <br> <span class="w3-text w3-text-theme"> Min Liu, Shuju Shi, Jinsong Zhang </span> </p> </a> <a class="w3-text" href="abuoudeh14_interspeech.html"> <p> Vowel length impact on locus equation parameters: an investigation on Jordanian Arabic <br> <span class="w3-text w3-text-theme"> Mohammad Abuoudeh, Olivier Crouzet </span> </p> </a> <a class="w3-text" href="roberts14_interspeech.html"> <p> Corpus-testing a fricative discriminator; or, just how invariant is this invariant? 
<br> <span class="w3-text w3-text-theme"> Philip J. Roberts, Henning Reetz, Aditi Lahiri </span> </p> </a> <a class="w3-text" href="bush14_interspeech.html"> <p> Modeling coarticulation in continuous speech <br> <span class="w3-text w3-text-theme"> Brian O. Bush, Alexander Kain </span> </p> </a> <a class="w3-text" href="daoudi14_interspeech.html"> <p> On classification between normal and pathological voices using the MEEI-KayPENTAX database: issues and consequences <br> <span class="w3-text w3-text-theme"> Khalid Daoudi, Blaise Bertrac </span> </p> </a> <a class="w3-text" href="bukmaier14_interspeech.html"> <p> Synchronic variation in the articulation and the acoustics of the Polish three-way place distinction in sibilants and its implications for diachronic change <br> <span class="w3-text w3-text-theme"> Véronique Bukmaier, Jonathan Harrington, Ulrich Reubold, Felicitas Kleber </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Extraction of Para-Linguistic Information"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Extraction of Para-Linguistic Information</h4> <hr> <a class="w3-text" href="gupta14_interspeech.html"> <p> Predicting client's inclination towards target behavior change in motivational interviewing and investigating the role of laughter <br> <span class="w3-text w3-text-theme"> Rahul Gupta, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. Narayanan </span> </p> </a> <a class="w3-text" href="xiao14_interspeech.html"> <p> Modeling therapist empathy through prosody in drug addiction counseling <br> <span class="w3-text w3-text-theme"> Bo Xiao, Daniel Bone, Maarten Van Segbroeck, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. 
Narayanan </span> </p> </a> <a class="w3-text" href="bone14_interspeech.html"> <p> An investigation of vocal arousal dynamics in child-psychologist interactions using synchrony measures and a conversation-based model <br> <span class="w3-text w3-text-theme"> Daniel Bone, Chi-Chun Lee, Alexandros Potamianos, Shrikanth S. Narayanan </span> </p> </a> <a class="w3-text" href="han14_interspeech.html"> <p> Speech emotion recognition using deep neural network and extreme learning machine <br> <span class="w3-text w3-text-theme"> Kun Han, Dong Yu, Ivan Tashev </span> </p> </a> <a class="w3-text" href="truong14_interspeech.html"> <p> An annotation scheme for sighs in spontaneous dialogue <br> <span class="w3-text w3-text-theme"> Khiet P. Truong, Gerben J. Westerhof, Franciska de Jong, Dirk Heylen </span> </p> </a> <a class="w3-text" href="he14_interspeech.html"> <p> Speaker idiosyncratic variability of intensity across syllables <br> <span class="w3-text w3-text-theme"> Lei He, Volker Dellwo </span> </p> </a> <a class="w3-text" href="mariooryad14_interspeech.html"> <p> Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora <br> <span class="w3-text w3-text-theme"> Soroosh Mariooryad, Reza Lotfian, Carlos Busso </span> </p> </a> <a class="w3-text" href="safavi14_interspeech.html"> <p> Identification of age-group from children's speech by computers and humans <br> <span class="w3-text w3-text-theme"> Saeid Safavi, Martin Russell, Peter Jančovič </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Spoken Language Understanding"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Spoken Language Understanding</h4> <hr> <a class="w3-text" href="morchid14_interspeech.html"> <p> Theme identification in human-human conversations with features from specific speaker type hidden spaces <br> <span class="w3-text 
w3-text-theme"> Mohamed Morchid, Richard Dufour, Mohamed Bouallegue, Georges Linarès, Renato De Mori </span> </p> </a> <a class="w3-text" href="marin14_interspeech.html"> <p> Learning phrase patterns for text classification using a knowledge graph and unlabeled data <br> <span class="w3-text w3-text-theme"> Alex Marin, Roman Holenstein, Ruhi Sarikaya, Mari Ostendorf </span> </p> </a> <a class="w3-text" href="xu14_interspeech.html"> <p> Targeted feature dropout for robust slot filling in natural language understanding <br> <span class="w3-text w3-text-theme"> Puyang Xu, Ruhi Sarikaya </span> </p> </a> <a class="w3-text" href="shiang14_interspeech.html"> <p> Spoken question answering using tree-structured conditional random fields and two-layer random walk <br> <span class="w3-text w3-text-theme"> Sz-Rung Shiang, Hung-yi Lee, Lin-shan Lee </span> </p> </a> <a class="w3-text" href="sarikaya14_interspeech.html"> <p> Shrinkage based features for slot tagging with conditional random fields <br> <span class="w3-text w3-text-theme"> Ruhi Sarikaya, Asli Celikyilmaz, Anoop Deoras, Minwoo Jeong </span> </p> </a> <a class="w3-text" href="shi14_interspeech.html"> <p> Cluster based Chinese abbreviation modeling <br> <span class="w3-text w3-text-theme"> Yangyang Shi, Yi-Cheng Pan, Mei-Yuh Hwang </span> </p> </a> <a class="w3-text" href="zhang14b_interspeech.html"> <p> Parsing named entity as syntactic structure <br> <span class="w3-text w3-text-theme"> Xiantao Zhang, Dongchen Li, Xihong Wu </span> </p> </a> <a class="w3-text" href="tur14_interspeech.html"> <p> Detecting out-of-domain utterances addressed to a virtual personal assistant <br> <span class="w3-text w3-text-theme"> Gokhan Tur, Anoop Deoras, Dilek Hakkani-Tür </span> </p> </a> <a class="w3-text" href="georgiladakis14_interspeech.html"> <p> Fusion of knowledge-based and data-driven approaches to grammar induction <br> <span class="w3-text w3-text-theme"> Spiros Georgiladakis, Christina Unger, Elias Iosif, Sebastian 
Walter, Philipp Cimiano, Euripides Petrakis, Alexandros Potamianos </span> </p> </a> <a class="w3-text" href="katerenchuk14_interspeech.html"> <p> Improving named entity recognition with prosodic features <br> <span class="w3-text w3-text-theme"> Denys Katerenchuk, Andrew Rosenberg </span> </p> </a> <a class="w3-text" href="ravuri14_interspeech.html"> <p> Neural network models for lexical addressee detection <br> <span class="w3-text w3-text-theme"> Suman V. Ravuri, Andreas Stolcke </span> </p> </a> <a class="w3-text" href="freeman14_interspeech.html"> <p> Manipulating stance and involvement using collaborative tasks: an exploratory comparison <br> <span class="w3-text w3-text-theme"> Valerie Freeman, Julian Chan, Gina-Anne Levow, Richard Wright, Mari Ostendorf, Victoria Zayats </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Spoken Dialogue Systems"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Spoken Dialogue Systems</h4> <hr> <a class="w3-text" href="ghigi14_interspeech.html"> <p> Incremental dialog processing in a task-oriented dialog <br> <span class="w3-text w3-text-theme"> Fabrizio Ghigi, Maxine Eskenazi, M. 
Ines Torres, Sungjin Lee </span> </p> </a> <a class="w3-text" href="hotta14_interspeech.html"> <p> Detecting incorrectly-segmented utterances for posteriori restoration of turn-taking and ASR results <br> <span class="w3-text w3-text-theme"> Naoki Hotta, Kazunori Komatani, Satoshi Sato, Mikio Nakano </span> </p> </a> <a class="w3-text" href="hassan14_interspeech.html"> <p> Segmentation and disfluency removal for conversational speech translation <br> <span class="w3-text w3-text-theme"> Hany Hassan, Lee Schwartz, Dilek Hakkani-Tür, Gokhan Tur </span> </p> </a> <a class="w3-text" href="watanabe14_interspeech.html"> <p> Cost-level integration of statistical and rule-based dialog managers <br> <span class="w3-text w3-text-theme"> Shinji Watanabe, John R. Hershey, Tim K. Marks, Youichi Fujii, Yusuke Koji </span> </p> </a> <a class="w3-text" href="kim14b_interspeech.html"> <p> Inverse reinforcement learning for micro-turn management <br> <span class="w3-text w3-text-theme"> Dongho Kim, Catherine Breslin, Pirros Tsiakoulis, M. 
Gašić, Matthew Henderson, Steve Young </span> </p> </a> <a class="w3-text" href="kane14_interspeech.html"> <p> Analysing the prosodic characteristics of speech-chunks preceding silences in task-based interactions <br> <span class="w3-text w3-text-theme"> John Kane, Irena Yanushevskaya, Céline de Looze, Brian Vaughan, Ailbhe Ní Chasaide </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="DNN Architectures and Robust Recognition"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">DNN Architectures and Robust Recognition</h4> <hr> <a class="w3-text" href="sak14_interspeech.html"> <p> Long short-term memory recurrent neural network architectures for large scale acoustic modeling <br> <span class="w3-text w3-text-theme"> Haşim Sak, Andrew Senior, Françoise Beaufays </span> </p> </a> <a class="w3-text" href="saon14_interspeech.html"> <p> Unfolded recurrent neural networks for speech recognition <br> <span class="w3-text w3-text-theme"> George Saon, Hagen Soltau, Ahmad Emami, Michael Picheny </span> </p> </a> <a class="w3-text" href="tomar14_interspeech.html"> <p> Manifold regularized deep neural networks <br> <span class="w3-text w3-text-theme"> Vikrant Singh Tomar, Richard C. Rose </span> </p> </a> <a class="w3-text" href="li14_interspeech.html"> <p> Modeling long temporal contexts for robust DNN-based speech recognition <br> <span class="w3-text w3-text-theme"> Bo Li, Khe Chai Sim </span> </p> </a> <a class="w3-text" href="li14b_interspeech.html"> <p> A long, deep and wide artificial neural net for robust speech recognition in unknown noise <br> <span class="w3-text w3-text-theme"> Feipeng Li, Phani S. 
Nidadavolu, Hynek Hermansky </span> </p> </a> <a class="w3-text" href="seps14_interspeech.html"> <p> Investigation of deep neural networks for robust recognition of nonlinearly distorted speech <br> <span class="w3-text w3-text-theme"> Ladislav Seps, Jiri Malek, Petr Cerva, Jan Nouza </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speaker Recognition — Evaluation and Forensics"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speaker Recognition — Evaluation and Forensics</h4> <hr> <a class="w3-text" href="banse14_interspeech.html"> <p> Summary and initial results of the 2013-2014 speaker recognition i-vector machine learning challenge <br> <span class="w3-text w3-text-theme"> Désiré Bansé, George R. Doddington, Daniel Garcia-Romero, John J. Godfrey, Craig S. Greenberg, Alvin F. Martin, Alan McCree, Mark Przybocki, Douglas A. Reynolds </span> </p> </a> <a class="w3-text" href="leeuwen14_interspeech.html"> <p> Constrained speaker linking <br> <span class="w3-text w3-text-theme"> David A. van Leeuwen, Niko Brümmer </span> </p> </a> <a class="w3-text" href="novoselov14_interspeech.html"> <p> RBM-PLDA subsystem for the NIST i-vector challenge <br> <span class="w3-text w3-text-theme"> Sergey Novoselov, Timur Pekhovsky, Konstantin Simonchik, Andrey Shulipa </span> </p> </a> <a class="w3-text" href="shum14_interspeech.html"> <p> Limited labels for unlimited data: active learning for speaker recognition <br> <span class="w3-text w3-text-theme"> Stephen H. Shum, Najim Dehak, James R. 
Glass </span> </p> </a> <a class="w3-text" href="brummer14_interspeech.html"> <p> Bayesian calibration for forensic evidence reporting <br> <span class="w3-text w3-text-theme"> Niko Brümmer, Albert Swart </span> </p> </a> <a class="w3-text" href="ishihara14_interspeech.html"> <p> Replicate mismatch between test/background and development databases: the impact on the performance of likelihood ratio-based forensic voice comparison <br> <span class="w3-text w3-text-theme"> Shunichi Ishihara </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Production I, II"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech Production I, II</h4> <hr> <a class="w3-text" href="airaksinen14b_interspeech.html"> <p> Automatic estimation of the lip radiation effect in glottal inverse filtering <br> <span class="w3-text w3-text-theme"> Manu Airaksinen, Tom Bäckström, Paavo Alku </span> </p> </a> <a class="w3-text" href="rosa14_interspeech.html"> <p> Simulation of 3d larynges with asymmetric distribution of viscoelastic properties in their vocal folds <br> <span class="w3-text w3-text-theme"> Marcelo de Oliveira Rosa </span> </p> </a> <a class="w3-text" href="takemoto14_interspeech.html"> <p> Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methods <br> <span class="w3-text w3-text-theme"> Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura </span> </p> </a> <a class="w3-text" href="kim14c_interspeech.html"> <p> A study of invariant properties and variation patterns in the converter/distributor model for emotional speech <br> <span class="w3-text w3-text-theme"> Jangwon Kim, Donna Erickson, Sungbok Lee, Shrikanth S. 
Narayanan </span> </p> </a> <a class="w3-text" href="hewer14_interspeech.html"> <p> A hybrid approach to 3D tongue modeling from vocal tract MRI using unsupervised image segmentation and mesh deformation <br> <span class="w3-text w3-text-theme"> Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer </span> </p> </a> <a class="w3-text" href="kaburagi14_interspeech.html"> <p> Estimation of vocal-tract shape from speech spectrum and speech resynthesis based on a generative model <br> <span class="w3-text w3-text-theme"> Tokihiko Kaburagi </span> </p> </a> <a class="w3-text" href="benitez14_interspeech.html"> <p> A real-time MRI study of articulatory setting in second language speech <br> <span class="w3-text w3-text-theme"> Andrés Benítez, Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan </span> </p> </a> <a class="w3-text" href="arai14_interspeech.html"> <p> Retroflex and bunched English /r/ with physical models of the human vocal tract <br> <span class="w3-text w3-text-theme"> Takayuki Arai </span> </p> </a> <a class="w3-text" href="rong14_interspeech.html"> <p> Parameterization of articulatory pattern in speakers with ALS <br> <span class="w3-text w3-text-theme"> Panying Rong, Yana Yunusova, James D. Berry, Lorne Zinman, Jordan R. Green </span> </p> </a> <a class="w3-text" href="p14_interspeech.html"> <p> Missing samples estimation in electromagnetic articulography data using equality constrained Kalman smoother <br> <span class="w3-text w3-text-theme"> Sujith P, Prasanta Kumar Ghosh </span> </p> </a> <a class="w3-text" href="ji14_interspeech.html"> <p> Palate-referenced articulatory features for acoustic-to-articulator inversion <br> <span class="w3-text w3-text-theme"> An Ji, Michael T.
Johnson, Jeff Berry </span> </p> </a> <a class="w3-text" href="uchida14_interspeech.html"> <p> A study on the improvement of measurement accuracy of the three-dimensional electromagnetic articulography <br> <span class="w3-text w3-text-theme"> Hidetsugu Uchida, Kohei Wakamiya, Tokihiko Kaburagi </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="INTERSPEECH 2014 Computational Paralinguistics ChallengE (ComParE)"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">INTERSPEECH 2014 Computational Paralinguistics ChallengE (ComParE)</h4> <hr> <a class="w3-text" href="schuller14_interspeech.html"> <p> The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load <br> <span class="w3-text w3-text-theme"> Björn Schuller, Stefan Steidl, Anton Batliner, Julien Epps, Florian Eyben, Fabien Ringeval, Erik Marchi, Yue Zhang </span> </p> </a> <a class="w3-text" href="pohjalainen14_interspeech.html"> <p> Filtering and subspace selection for spectral features in detecting speech under physical stress <br> <span class="w3-text w3-text-theme"> Jouni Pohjalainen, Paavo Alku </span> </p> </a> <a class="w3-text" href="li14c_interspeech.html"> <p> Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens <br> <span class="w3-text w3-text-theme"> Ming Li </span> </p> </a> <a class="w3-text" href="kaya14_interspeech.html"> <p> Canonical correlation analysis and local Fisher discriminant analysis based multi-view acoustic feature reduction for physical load prediction <br> <span class="w3-text w3-text-theme"> Heysem Kaya, Tuğçe Özkaptan, Albert Ali Salah, Sadık Fikret Gürgen </span> </p> </a> <a class="w3-text" href="jing14_interspeech.html"> <p> Ensemble of machine learning algorithms for cognitive and physical speaker load detection <br> <span class="w3-text w3-text-theme"> How Jing,
Ting-Yao Hu, Hung-Shin Lee, Wei-Chen Chen, Chi-Chun Lee, Yu Tsao, Hsin-Min Wang </span> </p> </a> <a class="w3-text" href="gosztolya14_interspeech.html"> <p> Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networks <br> <span class="w3-text w3-text-theme"> Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth </span> </p> </a> <a class="w3-text" href="montacie14_interspeech.html"> <p> High-level speech event analysis for cognitive load classification <br> <span class="w3-text w3-text-theme"> Claude Montacié, Marie-José Caraty </span> </p> </a> <a class="w3-text" href="nwe14_interspeech.html"> <p> On the use of Bhattacharyya based GMM distance and neural net features for identification of cognitive load levels <br> <span class="w3-text w3-text-theme"> Tin Lay Nwe, Trung Hieu Nguyen, Bin Ma </span> </p> </a> <a class="w3-text" href="huckvale14_interspeech.html"> <p> Prediction of cognitive load from speech with the VOQAL voice quality toolbox for the INTERSPEECH 2014 computational paralinguistics challenge <br> <span class="w3-text w3-text-theme"> Mark Huckvale </span> </p> </a> <a class="w3-text" href="kua14_interspeech.html"> <p> The UNSW submission to INTERSPEECH 2014 ComParE cognitive load challenge <br> <span class="w3-text w3-text-theme"> Jia Min Karen Kua, Vidhyasaharan Sethu, Phu Le, Eliathamby Ambikairajah </span> </p> </a> <a class="w3-text" href="segbroeck14_interspeech.html"> <p> Classification of cognitive load from speech using an i-vector framework <br> <span class="w3-text w3-text-theme"> Maarten Van Segbroeck, Ruchir Travadi, Colin Vaz, Jangwon Kim, Matthew P. Black, Alexandros Potamianos, Shrikanth S.
Narayanan </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Hearing and Perception"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Hearing and Perception</h4> <hr> <a class="w3-text" href="iyer14_interspeech.html"> <p> Revisiting the right-ear advantage for speech: implications for speech displays <br> <span class="w3-text w3-text-theme"> Nandini Iyer, Eric Thompson, Brian Simpson, Griffin Romigh </span> </p> </a> <a class="w3-text" href="bosch14_interspeech.html"> <p> Comparing reaction time sequences from human participants and computational models <br> <span class="w3-text w3-text-theme"> L. ten Bosch, Miriam Ernestus, Lou Boves </span> </p> </a> <a class="w3-text" href="andrei14_interspeech.html"> <p> Detecting the number of competing speakers — human selective hearing versus spectrogram distance based estimator <br> <span class="w3-text w3-text-theme"> Valentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu </span> </p> </a> <a class="w3-text" href="li14d_interspeech.html"> <p> The influence of sensory memory and attention on the context effect in talker normalization <br> <span class="w3-text w3-text-theme"> Guo Li, Gang Peng </span> </p> </a> <a class="w3-text" href="lin14_interspeech.html"> <p> Automatic speech recognition with primarily temporal envelope information <br> <span class="w3-text w3-text-theme"> Payton Lin, Fei Chen, Syu Siang Wang, Ying-Hui Lai, Yu Tsao </span> </p> </a> <a class="w3-text" href="lai14_interspeech.html"> <p> An adaptive envelope compression strategy for speech processing in cochlear implants <br> <span class="w3-text w3-text-theme"> Ying-Hui Lai, Fei Chen, Yu Tsao </span> </p> </a> <a class="w3-text" href="helfer14_interspeech.html"> <p> Articulatory dynamics and coordination in classifying cognitive change with preclinical mTBI <br> <span class="w3-text w3-text-theme"> Brian S. Helfer, Thomas F. 
Quatieri, James R. Williamson, Laurel Keyes, Benjamin Evans, W. Nicholas Greene, Trina Vian, Joseph Lacirignola, Trey Shenk, Thomas Talavage, Jeff Palmer, Kristin Heaton </span> </p> </a> <a class="w3-text" href="jinbo14_interspeech.html"> <p> A hearing impairment simulation method using audiogram-based approximation of auditory characteristics <br> <span class="w3-text w3-text-theme"> Nozomi Jinbo, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura </span> </p> </a> <a class="w3-text" href="wang14c_interspeech.html"> <p> Investigation of the relative perceptual importance of temporal envelope and temporal fine structure between tonal and non-tonal languages <br> <span class="w3-text w3-text-theme"> Dongmei Wang, James M. Kates, John H. L. Hansen </span> </p> </a> <a class="w3-text" href="fogerty14_interspeech.html"> <p> Vowel spectral contributions to English and Mandarin sentence intelligibility <br> <span class="w3-text w3-text-theme"> Daniel Fogerty, Fei Chen </span> </p> </a> <a class="w3-text" href="mittal14_interspeech.html"> <p> Significance of aperiodicity in the pitch perception of expressive voices <br> <span class="w3-text w3-text-theme"> Vinay Kumar Mittal, B.
Yegnanarayana </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Cross-Linguistic Studies"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Cross-Linguistic Studies</h4> <hr> <a class="w3-text" href="wester14_interspeech.html"> <p> DIAPIX-FL: a symmetric corpus of problem-solving dialogues in first and second languages <br> <span class="w3-text w3-text-theme"> Mirjam Wester, María Luisa García Lecumberri, Martin Cooke </span> </p> </a> <a class="w3-text" href="coupe14_interspeech.html"> <p> Cross-linguistic investigations of oral and silent reading <br> <span class="w3-text w3-text-theme"> Christophe Coupé, Yoon Mi Oh, François Pellegrino, Egidio Marsico </span> </p> </a> <a class="w3-text" href="coumans14_interspeech.html"> <p> Non-native word recognition in noise: the role of word-initial and word-final information <br> <span class="w3-text w3-text-theme"> Juul Coumans, Roeland van Hout, Odette Scharenborg </span> </p> </a> <a class="w3-text" href="wong14_interspeech.html"> <p> The effects of high and low variability phonetic training on the perception and production of English vowels /e/-/æ/ by Cantonese ESL learners with high and low L2 proficiency levels <br> <span class="w3-text w3-text-theme"> Janice Wing Sze Wong </span> </p> </a> <a class="w3-text" href="burgos14_interspeech.html"> <p> Dutch vowel production by Spanish learners: duration and spectral features <br> <span class="w3-text w3-text-theme"> Pepi Burgos, Mátyás Jani, Catia Cucchiarini, Roeland van Hout, Helmer Strik </span> </p> </a> <a class="w3-text" href="lengeris14_interspeech.html"> <p> English consonant confusions by Greek listeners in quiet and noise and the role of phonological short-term memory <br> <span class="w3-text w3-text-theme"> Angelos Lengeris, Katerina Nicolaidis </span> </p> </a> <a class="w3-text" href="detey14_interspeech.html"> <p> Corpus-based L2 phonological 
data and semi-automatic perceptual analysis: the case of nasal vowels produced by beginner Japanese learners of French <br> <span class="w3-text w3-text-theme"> Sylvain Detey, Isabelle Racine, Julien Eychenne, Yuji Kawaguchi </span> </p> </a> <a class="w3-text" href="pinter14_interspeech.html"> <p> Perception of prosodic prominence and boundaries by L1 and L2 speakers of English <br> <span class="w3-text w3-text-theme"> Gábor Pintér, Shinobu Mizuguchi, Koichi Tateishi </span> </p> </a> <a class="w3-text" href="kalathottukaren14_interspeech.html"> <p> Prosody perception, reading accuracy, nonliteral language comprehension, and music and tonal pitch discrimination in school aged children <br> <span class="w3-text w3-text-theme"> Rose Thomas Kalathottukaren, Suzanne C. Purdy, Elaine Ballard </span> </p> </a> <a class="w3-text" href="drozdova14_interspeech.html"> <p> Phoneme category retuning in a non-native language <br> <span class="w3-text w3-text-theme"> Polina Drozdova, Roeland van Hout, Odette Scharenborg </span> </p> </a> <a class="w3-text" href="chiou14_interspeech.html"> <p> Speech emotion recognition with cross-lingual databases <br> <span class="w3-text w3-text-theme"> Bo-Chang Chiou, Chia-Ping Chen </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speaker Diarization"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speaker Diarization</h4> <hr> <a class="w3-text" href="inoue14_interspeech.html"> <p> Speaker diarization using eye-gaze information in multi-party conversations <br> <span class="w3-text w3-text-theme"> Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Tatsuya Kawahara </span> </p> </a> <a class="w3-text" href="huang14_interspeech.html"> <p> Unsupervised speaker diarization using Riemannian manifold clustering <br> <span class="w3-text w3-text-theme"> Che-Wei Huang, Bo Xiao, Panayiotis G. Georgiou, Shrikanth S.
Narayanan </span> </p> </a> <a class="w3-text" href="delgado14_interspeech.html"> <p> Towards a complete binary key system for the speaker diarization task <br> <span class="w3-text w3-text-theme"> Héctor Delgado, Corinne Fredouille, Javier Serrano </span> </p> </a> <a class="w3-text" href="ghaemmaghami14_interspeech.html"> <p> An iterative speaker re-diarization scheme for improving speaker-based entity extraction in multimedia archives <br> <span class="w3-text w3-text-theme"> Houman Ghaemmaghami, David Dean, Sridha Sridharan </span> </p> </a> <a class="w3-text" href="gebre14_interspeech.html"> <p> Speaker diarization using gesture and speech <br> <span class="w3-text w3-text-theme"> Binyam Gebrekidan Gebre, Peter Wittenburg, Sebastian Drude, Marijn Huijbregts, Tom Heskes </span> </p> </a> <a class="w3-text" href="dupuy14_interspeech.html"> <p> Is incremental cross-show speaker diarization efficient for processing large volumes of data? <br> <span class="w3-text w3-text-theme"> Grégor Dupuy, Sylvain Meignier, Yannick Estève </span> </p> </a> <a class="w3-text" href="dighe14_interspeech.html"> <p> Detecting and labeling speakers on overlapping speech using vector Taylor series <br> <span class="w3-text w3-text-theme"> Pranay Dighe, Marc Ferràs, Hervé Bourlard </span> </p> </a> <a class="w3-text" href="yella14_interspeech.html"> <p> Phoneme background model for information bottleneck based speaker diarization <br> <span class="w3-text w3-text-theme"> Sree Harsha Yella, Petr Motlicek, Hervé Bourlard </span> </p> </a> <a class="w3-text" href="ferras14_interspeech.html"> <p> Diarizing large corpora using multi-modal speaker linking <br> <span class="w3-text w3-text-theme"> Marc Ferràs, Stefano Masneri, Oliver Schreer, Hervé Bourlard </span> </p> </a> <a class="w3-text" href="bechet14b_interspeech.html"> <p> Multimodal understanding for person recognition in video broadcasts <br> <span class="w3-text w3-text-theme"> Frederic Bechet, Meriem Bendris, Delphine Charlet,
Géraldine Damnati, Benoit Favre, Mickael Rouvier, Remi Auguste, Benjamin Bigot, Richard Dufour, Corinne Fredouille, Georges Linarès, Jean Martinet, Gregory Senay, Pierre Tirilly </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Robust ASR 1, 2"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Robust ASR 1, 2</h4> <hr> <a class="w3-text" href="gibson14_interspeech.html"> <p> Comparing time-frequency representations for directional derivative features <br> <span class="w3-text w3-text-theme"> James Gibson, Maarten Van Segbroeck, Shrikanth S. Narayanan </span> </p> </a> <a class="w3-text" href="du14_interspeech.html"> <p> Robust speech recognition with speech enhanced deep neural networks <br> <span class="w3-text w3-text-theme"> Jun Du, Qing Wang, Tian Gao, Yong Xu, Li-Rong Dai, Chin-Hui Lee </span> </p> </a> <a class="w3-text" href="vincent14_interspeech.html"> <p> An investigation of likelihood normalization for robust ASR <br> <span class="w3-text w3-text-theme"> Emmanuel Vincent, Aggelos Gkiokas, Dominik Schnitzer, Arthur Flexer </span> </p> </a> <a class="w3-text" href="spille14_interspeech.html"> <p> Identifying the human-machine differences in complex binaural scenes: what can be learned from our auditory system <br> <span class="w3-text w3-text-theme"> Constantin Spille, Bernd T. Meyer </span> </p> </a> <a class="w3-text" href="geiger14_interspeech.html"> <p> Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling <br> <span class="w3-text w3-text-theme"> Jürgen T. 
Geiger, Zixing Zhang, Felix Weninger, Björn Schuller, Gerhard Rigoll </span> </p> </a> <a class="w3-text" href="liu14c_interspeech.html"> <p> Joint adaptation and adaptive training of TVWR for robust automatic speech recognition <br> <span class="w3-text w3-text-theme"> Shilin Liu, Khe Chai Sim </span> </p> </a> <a class="w3-text" href="park14_interspeech.html"> <p> Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppression <br> <span class="w3-text w3-text-theme"> Hyung-Min Park, Matthew Maciejewski, Chanwoo Kim, Richard M. Stern </span> </p> </a> <a class="w3-text" href="zhao14_interspeech.html"> <p> Variable-component deep neural network for robust speech recognition <br> <span class="w3-text w3-text-theme"> Rui Zhao, Jinyu Li, Yifan Gong </span> </p> </a> <a class="w3-text" href="kao14_interspeech.html"> <p> Effective modulation spectrum factorization for robust speech recognition <br> <span class="w3-text w3-text-theme"> Yu-Chen Kao, Yi-Ting Wang, Berlin Chen </span> </p> </a> <a class="w3-text" href="ravuri14b_interspeech.html"> <p> Hybrid MLP/structured-SVM tandem systems for large vocabulary and robust ASR <br> <span class="w3-text w3-text-theme"> Suman V. Ravuri </span> </p> </a> <a class="w3-text" href="kim14d_interspeech.html"> <p> Robust speech recognition using temporal masking and thresholding algorithm <br> <span class="w3-text w3-text-theme"> Chanwoo Kim, Kean K. Chin, Michiel Bacchiani, Richard M. 
Stern </span> </p> </a> <a class="w3-text" href="xie14_interspeech.html"> <p> Deep neural network bottleneck features for generalized variable parameter HMMs <br> <span class="w3-text w3-text-theme"> Xurong Xie, Rongfeng Su, Xunying Liu, Lan Wang </span> </p> </a> <a class="w3-text" href="bu14_interspeech.html"> <p> A novel dynamic parameters calculation approach for model compensation <br> <span class="w3-text w3-text-theme"> Suliang Bu, Yanmin Qian, Kai Yu </span> </p> </a> <a class="w3-text" href="hashimoto14_interspeech.html"> <p> Speech recognition based on Itakura-Saito divergence and dynamics/sparseness constraints from mixed sound of speech and music by non-negative matrix factorization <br> <span class="w3-text w3-text-theme"> Naoaki Hashimoto, Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa </span> </p> </a> <a class="w3-text" href="chung14_interspeech.html"> <p> Noise robust speech recognition based on noise-adapted HMMs using speech feature compensation <br> <span class="w3-text w3-text-theme"> Yong-Joo Chung </span> </p> </a> <a class="w3-text" href="alam14_interspeech.html"> <p> Noise spectrum estimation using Gaussian mixture model-based speech presence probability for robust speech recognition <br> <span class="w3-text w3-text-theme"> M. J. Alam, Patrick Kenny, Pierre Dumouchel, Douglas O'Shaughnessy </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Implementation of Language Model Algorithms"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Implementation of Language Model Algorithms</h4> <hr> <a class="w3-text" href="chen14b_interspeech.html"> <p> Efficient GPU-based training of recurrent neural network language models using spliced sentence bunch <br> <span class="w3-text w3-text-theme"> X. Chen, Y. Wang, X. Liu, Mark J. F. Gales, Philip C. 
Woodland </span> </p> </a> <a class="w3-text" href="nolden14_interspeech.html"> <p> Word pair approximation for more efficient decoding with high-order language models <br> <span class="w3-text w3-text-theme"> David Nolden, Ralf Schlüter, Hermann Ney </span> </p> </a> <a class="w3-text" href="adel14_interspeech.html"> <p> Comparing approaches to convert recurrent neural networks into backoff language models for efficient decoding <br> <span class="w3-text w3-text-theme"> Heike Adel, Katrin Kirchhoff, Ngoc Thang Vu, Dominic Telaar, Tanja Schultz </span> </p> </a> <a class="w3-text" href="nolden14b_interspeech.html"> <p> Removing redundancy from lattices <br> <span class="w3-text w3-text-theme"> David Nolden, Hagen Soltau, Daniel Povey, Pegah Ghahremani, Lidia Mangu, Hermann Ney </span> </p> </a> <a class="w3-text" href="sundermeyer14_interspeech.html"> <p> Lattice decoding and rescoring with long-span neural network language models <br> <span class="w3-text w3-text-theme"> Martin Sundermeyer, Zoltán Tüske, Ralf Schlüter, Hermann Ney </span> </p> </a> <a class="w3-text" href="levit14_interspeech.html"> <p> Word-phrase-entity language models: getting more mileage out of n-grams <br> <span class="w3-text w3-text-theme"> Michael Levit, Sarangarajan Parthasarathy, Shuangyu Chang, Andreas Stolcke, Benoît Dumoulin </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speaker Recognition — Noise and Channel Robustness"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speaker Recognition — Noise and Channel Robustness</h4> <hr> <a class="w3-text" href="sarkar14_interspeech.html"> <p> A novel boosting algorithm for improved i-vector based speaker verification in noisy environments <br> <span class="w3-text w3-text-theme"> Sourjya Sarkar, K.
Sreenivasa Rao </span> </p> </a> <a class="w3-text" href="campbell14_interspeech.html"> <p> Using deep belief networks for vector-based speaker recognition <br> <span class="w3-text w3-text-theme"> W. M. Campbell </span> </p> </a> <a class="w3-text" href="lei14_interspeech.html"> <p> A deep neural network speaker verification system targeting microphone speech <br> <span class="w3-text w3-text-theme"> Yun Lei, Luciana Ferrer, Mitchell McLaren, Nicolas Scheffer </span> </p> </a> <a class="w3-text" href="mclaren14_interspeech.html"> <p> Application of convolutional neural networks to speaker recognition in noisy conditions <br> <span class="w3-text w3-text-theme"> Mitchell McLaren, Yun Lei, Nicolas Scheffer, Luciana Ferrer </span> </p> </a> <a class="w3-text" href="pelecanos14_interspeech.html"> <p> SVM based speaker recognition: harnessing trials with multiple enrollment sessions <br> <span class="w3-text w3-text-theme"> Jason Pelecanos, Weizhong Zhu, Sibel Yaman </span> </p> </a> <a class="w3-text" href="gallardo14_interspeech.html"> <p> I-vector speaker verification based on phonetic information under transmission channel effects <br> <span class="w3-text w3-text-theme"> Laura Fernández Gallardo, Michael Wagner, Sebastian Möller </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Synthesis I-III"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech Synthesis I-III</h4> <hr> <a class="w3-text" href="zang14_interspeech.html"> <p> Using conditional random fields to predict focus word pair in spontaneous spoken English <br> <span class="w3-text w3-text-theme"> Xiao Zang, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai </span> </p> </a> <a class="w3-text" href="sproat14_interspeech.html"> <p> Applications of maximum entropy rankers to problems in spoken language processing <br> <span class="w3-text w3-text-theme"> Richard Sproat, Keith Hall </span> 
</p> </a> <a class="w3-text" href="gonzalvo14_interspeech.html"> <p> Text-to-speech with cross-lingual neural network-based grapheme-to-phoneme models <br> <span class="w3-text w3-text-theme"> Xavi Gonzalvo, Monika Podsiadło </span> </p> </a> <a class="w3-text" href="nagahama14_interspeech.html"> <p> Transform mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis <br> <span class="w3-text w3-text-theme"> Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi </span> </p> </a> <a class="w3-text" href="ramani14_interspeech.html"> <p> Cross-lingual voice conversion-based polyglot speech synthesizer for Indian languages <br> <span class="w3-text w3-text-theme"> B. Ramani, M. P. Actlin Jeeva, P. Vijayalakshmi, T. Nagarajan </span> </p> </a> <a class="w3-text" href="hu14b_interspeech.html"> <p> An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis <br> <span class="w3-text w3-text-theme"> Qiong Hu, Yannis Stylianou, Ranniery Maia, Korin Richmond, Junichi Yamagishi, Javier Latorre </span> </p> </a> <a class="w3-text" href="patil14_interspeech.html"> <p> Chaotic mixed excitation source for speech synthesis <br> <span class="w3-text w3-text-theme"> Hemant A. Patil, Tanvina B.
Patel </span> </p> </a> <a class="w3-text" href="sorin14_interspeech.html"> <p> Refined inter-segment joining in multi-form speech synthesis <br> <span class="w3-text w3-text-theme"> Alexander Sorin, Slava Shechtman, Vincent Pollet </span> </p> </a> <a class="w3-text" href="zhang14c_interspeech.html"> <p> A hierarchical Viterbi algorithm for Mandarin hybrid speech synthesis system <br> <span class="w3-text w3-text-theme"> Ran Zhang, Zhengqi Wen, Jianhua Tao, Ya Li, Bing Liu, Xiaoyan Lou </span> </p> </a> <a class="w3-text" href="fabre14_interspeech.html"> <p> Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression <br> <span class="w3-text w3-text-theme"> Diandra Fabre, Thomas Hueber, Pierre Badin </span> </p> </a> <a class="w3-text" href="tobing14_interspeech.html"> <p> Articulatory controllable speech modification based on statistical feature mapping with Gaussian mixture models <br> <span class="w3-text w3-text-theme"> Patrick Lumban Tobing, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura, Ayu Purwarianti </span> </p> </a> <a class="w3-text" href="ding14_interspeech.html"> <p> Speech-driven head motion synthesis using neural networks <br> <span class="w3-text w3-text-theme"> Chuang Ding, Pengcheng Zhu, Lei Xie, Dongmei Jiang, Zhong-Hua Fu </span> </p> </a> <a class="w3-text" href="song14_interspeech.html"> <p> Text-independent voice conversion using speaker model alignment method from non-parallel speech <br> <span class="w3-text w3-text-theme"> Peng Song, Yun Jin, Wenming Zheng, Li Zhao </span> </p> </a> <a class="w3-text" href="chen14c_interspeech.html"> <p> Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes <br> <span class="w3-text w3-text-theme"> Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai </span> </p> </a> <a class="w3-text" href="sanchez14_interspeech.html"> <p> Hierarchical modeling of F0 contours for voice conversion <br> <span
class="w3-text w3-text-theme"> Gerard Sanchez, Hanna Silen, Jani Nurminen, Moncef Gabbouj </span> </p> </a> <a class="w3-text" href="kadowaki14_interspeech.html"> <p> Speech prosody generation for text-to-speech synthesis based on generative model of F<SUB>0</SUB> contours <br> <span class="w3-text w3-text-theme"> Kento Kadowaki, Tatsuma Ishihara, Nobukatsu Hojo, Hirokazu Kameoka </span> </p> </a> <a class="w3-text" href="chen14d_interspeech.html"> <p> An iterative approach to decision tree training for context dependent speech synthesis <br> <span class="w3-text w3-text-theme"> Xiayu Chen, Yang Zhang, Mark Hasegawa-Johnson </span> </p> </a> <a class="w3-text" href="nguyen14_interspeech.html"> <p> Prosodic phrasing modeling for Vietnamese TTS using syntactic information <br> <span class="w3-text w3-text-theme"> Thi Thu Trang Nguyen, Albert Rilliard, Do Dat Tran, Christophe d'Alessandro </span> </p> </a> <a class="w3-text" href="koriyama14_interspeech.html"> <p> Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling <br> <span class="w3-text w3-text-theme"> Tomoki Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi </span> </p> </a> <a class="w3-text" href="fang14_interspeech.html"> <p> Reconstruction of mistracked articulatory trajectories <br> <span class="w3-text w3-text-theme"> Qiang Fang, Jianguo Wei, Fang Hu </span> </p> </a> <a class="w3-text" href="chen14e_interspeech.html"> <p> Enabling controllability for continuous expression space <br> <span class="w3-text w3-text-theme"> Langzhou Chen, Norbert Braunschweiler </span> </p> </a> <a class="w3-text" href="nose14_interspeech.html"> <p> Analysis of spectral enhancement using global variance in HMM-based speech synthesis <br> <span class="w3-text w3-text-theme"> Takashi Nose, Akinori Ito </span> </p> </a> <a class="w3-text" href="valentinibotinhao14_interspeech.html"> <p> Intelligibility analysis of fast synthesized speech <br> <span
class="w3-text w3-text-theme"> Cassia Valentini-Botinhao, Markus Toman, Michael Pucher, Dietmar Schabus, Junichi Yamagishi </span> </p> </a> <a class="w3-text" href="lopezpelaez14_interspeech.html"> <p> Speech synthesis reactive to dynamic noise environmental conditions <br> <span class="w3-text w3-text-theme"> Susana Palmaz López-Peláez, Robert A. J. Clark </span> </p> </a> <a class="w3-text" href="baumann14_interspeech.html"> <p> Partial representations improve the prosody of incremental speech synthesis <br> <span class="w3-text w3-text-theme"> Timo Baumann </span> </p> </a> <a class="w3-text" href="tsiakoulis14_interspeech.html"> <p> Dialogue context sensitive speech synthesis using factorized decision trees <br> <span class="w3-text w3-text-theme"> Pirros Tsiakoulis, Catherine Breslin, M. Gašić, Matthew Henderson, Dongho Kim, Steve Young </span> </p> </a> <a class="w3-text" href="wang14d_interspeech.html"> <p> Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis <br> <span class="w3-text w3-text-theme"> Xin Wang, Zhen-Hua Ling, Li-Rong Dai </span> </p> </a> <a class="w3-text" href="gowda14_interspeech.html"> <p> On the role of missing data imputation and NMF feature enhancement in building synthetic voices using reverberant speech <br> <span class="w3-text w3-text-theme"> Dhananjaya Gowda, Heikki Kallasjoki, Reima Karhila, Cristian Contan, Kalle Palomäki, Mircea Giurgiu, Mikko Kurimo </span> </p> </a> <a class="w3-text" href="do14b_interspeech.html"> <p> Objective evaluation of HMM-based speech synthesis system using Kullback-Leibler divergence <br> <span class="w3-text w3-text-theme"> C.-T. Do, M. Evrard, A. Leman, Christophe d'Alessandro, Albert Rilliard, J.-L.
Crebouw </span> </p> </a> <a class="w3-text" href="latorre14_interspeech.html"> <p> Speech intonation for TTS: study on evaluation methodology <br> <span class="w3-text w3-text-theme"> Javier Latorre, Kayoko Yanagisawa, Vincent Wan, BalaKrishna Kolluru, Mark J. F. Gales </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Multi-Lingual Cross-Lingual and Low-Resource ASR"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Multi-Lingual Cross-Lingual and Low-Resource ASR</h4> <hr> <a class="w3-text" href="miao14_interspeech.html"> <p> Improving language-universal feature extraction with deep maxout and convolutional neural networks <br> <span class="w3-text w3-text-theme"> Yajie Miao, Florian Metze </span> </p> </a> <a class="w3-text" href="fernandez14_interspeech.html"> <p> Exploiting vocal-source features to improve ASR accuracy for low-resource languages <br> <span class="w3-text w3-text-theme"> Raul Fernandez, Jia Cui, Andrew Rosenberg, Bhuvana Ramabhadran, Xiaodong Cui </span> </p> </a> <a class="w3-text" href="ragni14_interspeech.html"> <p> Data augmentation for low resource languages <br> <span class="w3-text w3-text-theme"> Anton Ragni, Kate M. Knill, Shakti P. Rath, Mark J. F. 
Gales </span> </p> </a> <a class="w3-text" href="jouvet14_interspeech.html"> <p> About combining forward and backward-based decoders for selecting data for unsupervised training of acoustic models <br> <span class="w3-text w3-text-theme"> Denis Jouvet, Dominique Fohr </span> </p> </a> <a class="w3-text" href="grezl14_interspeech.html"> <p> Combination of multilingual and semi-supervised training for under-resourced languages <br> <span class="w3-text w3-text-theme"> František Grézl, Martin Karafiát </span> </p> </a> <a class="w3-text" href="vu14b_interspeech.html"> <p> Investigating the learning effect of multilingual bottle-neck features for ASR <br> <span class="w3-text w3-text-theme"> Ngoc Thang Vu, Jochen Weiner, Tanja Schultz </span> </p> </a> <a class="w3-text" href="miao14b_interspeech.html"> <p> Distributed learning of multilingual DNN feature extractors using GPUs <br> <span class="w3-text w3-text-theme"> Yajie Miao, Hao Zhang, Florian Metze </span> </p> </a> <a class="w3-text" href="rath14_interspeech.html"> <p> Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages <br> <span class="w3-text w3-text-theme"> Shakti P. Rath, Kate M. Knill, Anton Ragni, Mark J. F. Gales </span> </p> </a> <a class="w3-text" href="cui14_interspeech.html"> <p> Recent improvements in neural network acoustic modeling for LVCSR in low resource languages <br> <span class="w3-text w3-text-theme"> Jia Cui, Bhuvana Ramabhadran, Xiaodong Cui, Andrew Rosenberg, Brian Kingsbury, Abhinav Sethy </span> </p> </a> <a class="w3-text" href="huang14b_interspeech.html"> <p> Towards better performance with heterogeneous training data in acoustic modeling using deep neural networks <br> <span class="w3-text w3-text-theme"> Yan Huang, Malcolm Slaney, Michael L. 
Seltzer, Yifan Gong </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Estimation and Sound Source Separation"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech Estimation and Sound Source Separation</h4> <hr> <a class="w3-text" href="higuchi14_interspeech.html"> <p> A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov models <br> <span class="w3-text w3-text-theme"> Takuya Higuchi, Hirofumi Takeda, Tomohiko Nakamura, Hirokazu Kameoka </span> </p> </a> <a class="w3-text" href="vaz14_interspeech.html"> <p> Enhancing audio source separability using spectro-temporal regularization with NMF <br> <span class="w3-text w3-text-theme"> Colin Vaz, Dimitrios Dimitriadis, Shrikanth S. Narayanan </span> </p> </a> <a class="w3-text" href="mirzaei14_interspeech.html"> <p> Blind speech source localization, counting and separation for 2-channel convolutive mixtures in a reverberant environment <br> <span class="w3-text w3-text-theme"> Sayeh Mirzaei, Hugo Van hamme, Yaser Norouzi </span> </p> </a> <a class="w3-text" href="weninger14_interspeech.html"> <p> Discriminative NMF and its application to single-channel source separation <br> <span class="w3-text w3-text-theme"> Felix Weninger, Jonathan Le Roux, John R. 
Hershey, Shinji Watanabe </span> </p> </a> <a class="w3-text" href="kawahara14_interspeech.html"> <p> Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information <br> <span class="w3-text w3-text-theme"> Hideki Kawahara, Tatsuya Kitamura, Hironori Takemoto, Ryuichi Nisimura, Toshio Irino </span> </p> </a> <a class="w3-text" href="wang14e_interspeech.html"> <p> A graph-based Gaussian component clustering approach to unsupervised acoustic modeling <br> <span class="w3-text w3-text-theme"> Haipeng Wang, Tan Lee, Cheung-Chi Leung, Bin Ma, Haizhou Li </span> </p> </a> <a class="w3-text" href="ziaei14_interspeech.html"> <p> A speech system for estimating daily word counts <br> <span class="w3-text w3-text-theme"> Ali Ziaei, Abhijeet Sangwan, John H. L. Hansen </span> </p> </a> <a class="w3-text" href="lu14_interspeech.html"> <p> Ensemble modeling of denoising autoencoder for speech spectrum restoration <br> <span class="w3-text w3-text-theme"> Xugang Lu, Yu Tsao, Shigeki Matsuda, Chiori Hori </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Feature Extraction and Modeling for ASR 1, 2"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Feature Extraction and Modeling for ASR 1, 2</h4> <hr> <a class="w3-text" href="tuske14_interspeech.html"> <p> Acoustic modeling with deep neural networks using raw time signal for LVCSR <br> <span class="w3-text w3-text-theme"> Zoltán Tüske, Pavel Golik, Ralf Schlüter, Hermann Ney </span> </p> </a> <a class="w3-text" href="mitra14_interspeech.html"> <p> Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions <br> <span class="w3-text w3-text-theme"> Vikramjit Mitra, Wen Wang, Horacio Franco, Yun Lei, Chris Bartels, Martin Graciarena </span> </p> </a> <a class="w3-text" 
href="sainath14_interspeech.html"> <p> Deep scattering spectra with deep neural networks for LVCSR tasks <br> <span class="w3-text w3-text-theme"> Tara N. Sainath, Vijayaditya Peddinti, Brian Kingsbury, Petr Fousek, Bhuvana Ramabhadran, David Nahamoo </span> </p> </a> <a class="w3-text" href="chang14_interspeech.html"> <p> Robust CNN-based speech recognition with Gabor filter kernels <br> <span class="w3-text w3-text-theme"> Shuo-Yiin Chang, Nelson Morgan </span> </p> </a> <a class="w3-text" href="lu14b_interspeech.html"> <p> Probabilistic linear discriminant analysis with bottleneck features for speech recognition <br> <span class="w3-text w3-text-theme"> Liang Lu, Steve Renals </span> </p> </a> <a class="w3-text" href="schatz14_interspeech.html"> <p> Evaluating speech features with the minimal-pair ABX task (II): resistance to noise <br> <span class="w3-text w3-text-theme"> Thomas Schatz, Vijayaditya Peddinti, Xuan-Nga Cao, Francis Bach, Hynek Hermansky, Emmanuel Dupoux </span> </p> </a> <a class="w3-text" href="geiger14b_interspeech.html"> <p> Investigating NMF speech enhancement for neural network based acoustic models <br> <span class="w3-text w3-text-theme"> Jürgen T. Geiger, Jort F. Gemmeke, Björn Schuller, Gerhard Rigoll </span> </p> </a> <a class="w3-text" href="lilley14_interspeech.html"> <p> Automatic speech feature classification for children with cochlear implants <br> <span class="w3-text w3-text-theme"> Jason Lilley, James Mahshie, H. Timothy Bunnell </span> </p> </a> <a class="w3-text" href="tachioka14_interspeech.html"> <p> Sequential maximum mutual information linear discriminant analysis for speech recognition <br> <span class="w3-text w3-text-theme"> Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. 
Hershey </span> </p> </a> <a class="w3-text" href="ghaffarzadegan14_interspeech.html"> <p> Model and feature based compensation for whispered speech recognition <br> <span class="w3-text w3-text-theme"> Shabnam Ghaffarzadegan, Hynek Bořil, John H. L. Hansen </span> </p> </a> <a class="w3-text" href="moghimi14_interspeech.html"> <p> Post-masking: a hybrid approach to array processing for speech recognition <br> <span class="w3-text w3-text-theme"> Amir R. Moghimi, Bhiksha Raj, Richard M. Stern </span> </p> </a> <a class="w3-text" href="delacallesilos14_interspeech.html"> <p> ASR feature extraction with morphologically-filtered power-normalized cochleograms <br> <span class="w3-text w3-text-theme"> F. de-la-Calle-Silos, F. J. Valverde-Albacete, A. Gallardo-Antolín, C. Peláez-Moreno </span> </p> </a> <a class="w3-text" href="martinez14_interspeech.html"> <p> Should deep neural nets have ears? The role of auditory features in deep learning approaches <br> <span class="w3-text w3-text-theme"> Angel Mario Castro Martinez, Niko Moritz, Bernd T. Meyer </span> </p> </a> <a class="w3-text" href="fox14_interspeech.html"> <p> Extending Limabeam with discrimination and coarse gradients <br> <span class="w3-text w3-text-theme"> Charles Fox, Thomas Hain </span> </p> </a> <a class="w3-text" href="mukherjee14_interspeech.html"> <p> Generation of F0 contour using deep Boltzmann machine and twin Gaussian process hybrid model for Bengali language <br> <span class="w3-text w3-text-theme"> Sankar Mukherjee, Shyamal Kumar Das Mandal </span> </p> </a> <a class="w3-text" href="moralescordovilla14_interspeech.html"> <p> Room localization for distant speech recognition <br> <span class="w3-text w3-text-theme"> Juan A.
Morales-Cordovilla, Hannes Pessentheiner, Martin Hagmüller, Gernot Kubin </span> </p> </a> <a class="w3-text" href="bahaadini14_interspeech.html"> <p> Posterior-based sparse representation for automatic speech recognition <br> <span class="w3-text w3-text-theme"> Sara Bahaadini, Afsaneh Asaei, David Imseng, Hervé Bourlard </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Analysis I, II"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech Analysis I, II</h4> <hr> <a class="w3-text" href="tabain14_interspeech.html"> <p> Lateral formants in three Central Australian languages <br> <span class="w3-text w3-text-theme"> Marija Tabain, Andrew Butcher, Gavan Breen, Richard Beare </span> </p> </a> <a class="w3-text" href="khasanova14_interspeech.html"> <p> Detecting articulatory compensation in acoustic data through linear regression modeling <br> <span class="w3-text w3-text-theme"> Alina Khasanova, Jennifer Cole, Mark Hasegawa-Johnson </span> </p> </a> <a class="w3-text" href="guo14_interspeech.html"> <p> The relationship between the second subglottal resonance and vowel class, standing height, trunk length, and F0 variation for Mandarin speakers <br> <span class="w3-text w3-text-theme"> Jinxi Guo, Angli Liu, Harish Arsikere, Abeer Alwan, Steven M. Lulich </span> </p> </a> <a class="w3-text" href="meenakshi14_interspeech.html"> <p> Comparison of speech quality with and without sensors in electromagnetic articulograph AG 501 recording <br> <span class="w3-text w3-text-theme"> Nisha Meenakshi, Chiranjeevi Yarra, B. K.
Yamini, Prasanta Kumar Ghosh </span> </p> </a> <a class="w3-text" href="albuquerque14_interspeech.html"> <p> Impact of age in the production of European Portuguese vowels <br> <span class="w3-text w3-text-theme"> Luciana Albuquerque, Catarina Oliveira, António Teixeira, Pedro Sa-Couto, João Freitas, Miguel Sales Dias </span> </p> </a> <a class="w3-text" href="yu14_interspeech.html"> <p> 'Houston, we have a solution': a case study of the analysis of astronaut speech during NASA Apollo 11 for long-term speaker modeling <br> <span class="w3-text w3-text-theme"> Chengzhu Yu, John H. L. Hansen, Douglas W. Oard </span> </p> </a> <a class="w3-text" href="luan14_interspeech.html"> <p> Relating automatic vowel space estimates to talker intelligibility <br> <span class="w3-text w3-text-theme"> Yi Luan, Richard Wright, Mari Ostendorf, Gina-Anne Levow </span> </p> </a> <a class="w3-text" href="kawahara14b_interspeech.html"> <p> Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation <br> <span class="w3-text w3-text-theme"> Hideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno, Ryuichi Nisimura, Toshio Irino </span> </p> </a> <a class="w3-text" href="pedersen14_interspeech.html"> <p> Sparse time-frequency representation of speech by the Vandermonde transform <br> <span class="w3-text w3-text-theme"> Christian Fischer Pedersen, Tom Bäckström </span> </p> </a> <a class="w3-text" href="nandwana14_interspeech.html"> <p> Analysis and identification of human scream: implications for speaker recognition <br> <span class="w3-text w3-text-theme"> Mahesh Kumar Nandwana, John H. L. Hansen </span> </p> </a> <a class="w3-text" href="wang14f_interspeech.html"> <p> F0 estimation in noisy speech based on long-term harmonic feature analysis combined with neural network classification <br> <span class="w3-text w3-text-theme"> Dongmei Wang, Philipos C. Loizou, John H. L.
Hansen </span> </p> </a> <a class="w3-text" href="slaney14_interspeech.html"> <p> The influence of pitch and noise on the discriminability of filterbank features <br> <span class="w3-text w3-text-theme"> Malcolm Slaney, Michael L. Seltzer </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Technologies and Applications"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech Technologies and Applications</h4> <hr> <a class="w3-text" href="harwath14_interspeech.html"> <p> Choosing useful word alternates for automatic speech recognition correction interfaces <br> <span class="w3-text w3-text-theme"> David Harwath, Alexander Gruenstein, Ian McGraw </span> </p> </a> <a class="w3-text" href="chen14f_interspeech.html"> <p> An initial investigation of long-term adaptation for meeting transcription <br> <span class="w3-text w3-text-theme"> X. Chen, Mark J. F. Gales, Kate M. Knill, Catherine Breslin, Langzhou Chen, K. K. 
Chin, Vincent Wan </span> </p> </a> <a class="w3-text" href="ng14_interspeech.html"> <p> Progress in the BBN keyword search system for the DARPA RATS program <br> <span class="w3-text w3-text-theme"> Tim Ng, Roger Hsiao, Le Zhang, Damianos Karakos, Sri Harish Mallidi, Martin Karafiát, Karel Veselý, Igor Szőke, Bing Zhang, Long Nguyen, Richard Schwartz </span> </p> </a> <a class="w3-text" href="nouza14_interspeech.html"> <p> Speech-to-text technology to transcribe and disclose 100,000+ hours of bilingual documents from historical Czech and Czechoslovak radio archive <br> <span class="w3-text w3-text-theme"> Jan Nouza, Petr Cerva, Jindrich Zdansky, Karel Blavka, Marek Bohac, Jan Silovsky, Josef Chaloupka, Michaela Kucharova, Ladislav Seps, Jiri Malek, Michal Rott </span> </p> </a> <a class="w3-text" href="ylmaz14_interspeech.html"> <p> Automatic assessment of children's reading with the FLaVoR decoding using a phone confusion model <br> <span class="w3-text w3-text-theme"> Emre Yılmaz, Joris Pelemans, Hugo Van hamme </span> </p> </a> <a class="w3-text" href="shaik14_interspeech.html"> <p> RWTH LVCSR systems for Quaero and EU-Bridge: German, Polish, Spanish and Portuguese <br> <span class="w3-text w3-text-theme"> M. Ali Basha Shaik, Zoltán Tüske, M.
Ali Tahir, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Source Separation and Computational Auditory Scene Analysis"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Source Separation and Computational Auditory Scene Analysis</h4> <hr> <a class="w3-text" href="zohrer14_interspeech.html"> <p> Single channel source separation with general stochastic networks <br> <span class="w3-text w3-text-theme"> Matthias Zöhrer, Franz Pernkopf </span> </p> </a> <a class="w3-text" href="yeung14_interspeech.html"> <p> Large-margin conditional random fields for single-microphone speech separation <br> <span class="w3-text w3-text-theme"> Yu Ting Yeung, Tan Lee, Cheung-Chi Leung </span> </p> </a> <a class="w3-text" href="jafari14_interspeech.html"> <p> On the use of the Watson mixture model for clustering-based under-determined blind source separation <br> <span class="w3-text w3-text-theme"> Ingrid Jafari, Roberto Togneri, Sven Nordholm </span> </p> </a> <a class="w3-text" href="hsu14_interspeech.html"> <p> Binary mask estimation based on frequency modulations <br> <span class="w3-text w3-text-theme"> Chung-Chien Hsu, Jen-Tzung Chien, Tai-Shih Chi </span> </p> </a> <a class="w3-text" href="yang14_interspeech.html"> <p> Bayesian factorization and selection for speech and music separation <br> <span class="w3-text w3-text-theme"> Po-Kai Yang, Chung-Chien Hsu, Jen-Tzung Chien </span> </p> </a> <a class="w3-text" href="wohlmayr14_interspeech.html"> <p> Self-adaption in single-channel source separation <br> <span class="w3-text w3-text-theme"> Michael Wohlmayr, Ludwig Mohr, Franz Pernkopf </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Technologies for Ambient Assisted Living (Special Session)"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" 
style="margin-top:40px"> <h4 class="w3-center">Speech Technologies for Ambient Assisted Living (Special Session)</h4> <hr> <a class="w3-text" href="vacher14_interspeech.html"> <p> Multichannel automatic recognition of voice command in a multi-room smart home: an experiment involving seniors and users with visual impairment <br> <span class="w3-text w3-text-theme"> Michel Vacher, Benjamin Lecouteux, François Portet </span> </p> </a> <a class="w3-text" href="walter14_interspeech.html"> <p> An evaluation of unsupervised acoustic model training for a dysarthric speech interface <br> <span class="w3-text w3-text-theme"> Oliver Walter, Vladimir Despotovic, Reinhold Haeb-Umbach, Jort F. Gemmeke, Bart Ons, Hugo Van hamme </span> </p> </a> <a class="w3-text" href="gonzalez14_interspeech.html"> <p> Analysis of phonetic similarity in a silent speech interface based on permanent magnetic articulography <br> <span class="w3-text w3-text-theme"> Jose A. Gonzalez, Lam A. Cheah, Jie Bai, Stephen R. Ell, James M. Gilbert, Roger K. Moore, Phil D. Green </span> </p> </a> <a class="w3-text" href="karpov14_interspeech.html"> <p> Audio-visual signal processing in a multimodal assisted living environment <br> <span class="w3-text w3-text-theme"> Alexey Karpov, Lale Akarun, Hülya Yalçın, Alexander Ronzhin, Barış Evrim Demiröz, Aysun Çoban, Miloš Železný </span> </p> </a> <a class="w3-text" href="ravanelli14_interspeech.html"> <p> On the selection of the impulse responses for distant-speech recognition based on contaminated speech training <br> <span class="w3-text w3-text-theme"> Mirco Ravanelli, Maurizio Omologo </span> </p> </a> <a class="w3-text" href="casanueva14_interspeech.html"> <p> Adaptive speech recognition and dialogue management for users with speech disorders <br> <span class="w3-text w3-text-theme"> I. Casanueva, H. Christensen, Thomas Hain, Phil D. 
Green </span> </p> </a> <a class="w3-text" href="yu14b_interspeech.html"> <p> Prediction of cognitive performance in an animal fluency task based on rate and articulatory markers <br> <span class="w3-text w3-text-theme"> Bea Yu, Thomas F. Quatieri, James R. Williamson, James C. Mundt </span> </p> </a> <a class="w3-text" href="ishi14_interspeech.html"> <p> Analysis of laughter events in real science classes by using multiple environment sensor data <br> <span class="w3-text w3-text-theme"> Carlos Ishi, Hiroaki Hatano, Norihiro Hagita </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="DNN for ASR"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">DNN for ASR</h4> <hr> <a class="w3-text" href="sainath14b_interspeech.html"> <p> Parallel deep neural network training for LVCSR tasks using Blue Gene/Q <br> <span class="w3-text w3-text-theme"> Tara N. Sainath, I-hsin Chung, Bhuvana Ramabhadran, Michael Picheny, John Gunnels, Brian Kingsbury, George Saon, Vernon Austel, Upendra Chaudhari </span> </p> </a> <a class="w3-text" href="bengio14_interspeech.html"> <p> Word embeddings for speech recognition <br> <span class="w3-text w3-text-theme"> Samy Bengio, Georg Heigold </span> </p> </a> <a class="w3-text" href="seide14_interspeech.html"> <p> 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs <br> <span class="w3-text w3-text-theme"> Frank Seide, Hao Fu, Jasha Droppo, Gang Li, Dong Yu </span> </p> </a> <a class="w3-text" href="takeda14_interspeech.html"> <p> Boundary contraction training for acoustic models based on discrete deep neural networks <br> <span class="w3-text w3-text-theme"> Ryu Takeda, Naoyuki Kanda, Nobuo Nukaga </span> </p> </a> <a class="w3-text" href="kubo14_interspeech.html"> <p> Restructuring output layers of deep neural networks using minimum risk parameter clustering <br> <span
class="w3-text w3-text-theme"> Yotaro Kubo, Jun Suzuki, Takaaki Hori, Atsushi Nakamura </span> </p> </a> <a class="w3-text" href="chan14_interspeech.html"> <p> Distributed asynchronous optimization of convolutional neural networks <br> <span class="w3-text w3-text-theme"> William Chan, Ian Lane </span> </p> </a> <a class="w3-text" href="toth14_interspeech.html"> <p> Convolutional deep maxout networks for phone recognition <br> <span class="w3-text w3-text-theme"> László Tóth </span> </p> </a> <a class="w3-text" href="chen14g_interspeech.html"> <p> Joint sequence training of phone and grapheme acoustic model based on multi-task learning deep neural networks <br> <span class="w3-text w3-text-theme"> Dongpeng Chen, Brian Mak, Sunil Sivadas </span> </p> </a> <a class="w3-text" href="hsiao14_interspeech.html"> <p> Improving semi-supervised deep neural network for keyword search in low resource languages <br> <span class="w3-text w3-text-theme"> Roger Hsiao, Tim Ng, Le Zhang, Shivesh Ranjan, Stavros Tsakalidis, Long Nguyen, Richard Schwartz </span> </p> </a> <a class="w3-text" href="liu14d_interspeech.html"> <p> Pruning deep neural networks by optimal brain damage <br> <span class="w3-text w3-text-theme"> Chao Liu, Zhiyong Zhang, Dong Wang </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speaker Recognition — General Topics"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speaker Recognition — General Topics</h4> <hr> <a class="w3-text" href="avila14_interspeech.html"> <p> Improving the performance of far-field speaker verification using multi-condition training: the case of GMM-UBM and i-vector systems <br> <span class="w3-text w3-text-theme"> Anderson R. Avila, Milton Sarria-Paja, Francisco J. Fraga, Douglas O'Shaughnessy, Tiago H. 
Falk </span> </p> </a> <a class="w3-text" href="lee14_interspeech.html"> <p> Clustering-based i-vector formulation for speaker recognition <br> <span class="w3-text w3-text-theme"> Hung-Shin Lee, Yu Tsao, Hsin-Min Wang, Shyh-Kang Jeng </span> </p> </a> <a class="w3-text" href="arsikere14_interspeech.html"> <p> Speaker recognition via fusion of subglottal features and MFCCs <br> <span class="w3-text w3-text-theme"> Harish Arsikere, Hitesh Anand Gupta, Abeer Alwan </span> </p> </a> <a class="w3-text" href="sun14_interspeech.html"> <p> The NIST SRE summed channel speaker recognition system <br> <span class="w3-text w3-text-theme"> Hanwu Sun, Bin Ma </span> </p> </a> <a class="w3-text" href="gallardo14b_interspeech.html"> <p> Advantages of wideband over narrowband channels for speaker verification employing MFCCs and LFCCs <br> <span class="w3-text w3-text-theme"> Laura Fernández Gallardo, Michael Wagner, Sebastian Möller </span> </p> </a> <a class="w3-text" href="li14e_interspeech.html"> <p> Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features <br> <span class="w3-text w3-text-theme"> Ming Li, Wenbo Liu </span> </p> </a> <a class="w3-text" href="asha14_interspeech.html"> <p> Feature Switching in the i-vector framework for speaker verification <br> <span class="w3-text w3-text-theme"> T. Asha, M. S. Saranya, D. S. Karthik Pandia, Srikanth Madikeri, Hema A. Murthy </span> </p> </a> <a class="w3-text" href="zhong14_interspeech.html"> <p> PLDA modeling in the fishervoice subspace for speaker verification <br> <span class="w3-text w3-text-theme"> Jinghua Zhong, Weiwu Jiang, Wei Rao, Man-Wai Mak, Helen Meng </span> </p> </a> <a class="w3-text" href="martin14_interspeech.html"> <p> Performance factor analysis for the 2012 NIST speaker recognition evaluation <br> <span class="w3-text w3-text-theme"> Alvin F. Martin, Craig S. Greenberg, Vincent M. Stanford, John M. Howard, George R. 
Doddington, John J. Godfrey </span> </p> </a> <a class="w3-text" href="fujimura14_interspeech.html"> <p> Simultaneous gender classification and voice activity detection using deep neural networks <br> <span class="w3-text w3-text-theme"> Hiroshi Fujimura </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Processing with Multi-Modalities"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech Processing with Multi-Modalities</h4> <hr> <a class="w3-text" href="abdelaziz14_interspeech.html"> <p> Dynamic stream weight estimation in coupled-HMM-based audio-visual speech recognition using multilayer perceptrons <br> <span class="w3-text w3-text-theme"> Ahmed Hussen Abdelaziz, Dorothea Kolossa </span> </p> </a> <a class="w3-text" href="noda14_interspeech.html"> <p> Lipreading using convolutional neural network <br> <span class="w3-text w3-text-theme"> Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. Okuno, Tetsuya Ogata </span> </p> </a> <a class="w3-text" href="tao14_interspeech.html"> <p> Lipreading approach for isolated digits recognition under whisper and neutral speech <br> <span class="w3-text w3-text-theme"> Fei Tao, Carlos Busso </span> </p> </a> <a class="w3-text" href="masaka14_interspeech.html"> <p> Multimodal exemplar-based voice conversion using lip features in noisy environments <br> <span class="w3-text w3-text-theme"> Kenta Masaka, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki </span> </p> </a> <a class="w3-text" href="deng14b_interspeech.html"> <p> Towards a practical silent speech recognition system <br> <span class="w3-text w3-text-theme"> Yunbin Deng, James T. Heaton, Geoffrey S. 
Meltzner </span> </p> </a> <a class="w3-text" href="freitas14_interspeech.html"> <p> Enhancing multimodal silent speech interfaces with feature selection <br> <span class="w3-text w3-text-theme"> João Freitas, Artur Ferreira, Mário Figueiredo, António Teixeira, Miguel Sales Dias </span> </p> </a> <a class="w3-text" href="katz14_interspeech.html"> <p> Opti-speech: a real-time, 3D visual feedback system for speech training <br> <span class="w3-text w3-text-theme"> William Katz, Thomas F. Campbell, Jun Wang, Eric Farrar, J. Coleman Eubanks, Arvind Balasubramanian, Balakrishnan Prabhakaran, Rob Rennaker </span> </p> </a> <a class="w3-text" href="wang14g_interspeech.html"> <p> Across-speaker articulatory normalization for speaker-independent silent speech recognition <br> <span class="w3-text w3-text-theme"> Jun Wang, Ashok Samal, Jordan R. Green </span> </p> </a> <a class="w3-text" href="zahner14_interspeech.html"> <p> Conversion from facial myoelectric signals to speech: a unit selection approach <br> <span class="w3-text w3-text-theme"> Marlene Zahner, Matthias Janke, Michael Wand, Tanja Schultz </span> </p> </a> <a class="w3-text" href="wand14_interspeech.html"> <p> Towards real-life application of EMG-based speech recognition by using unsupervised adaptation <br> <span class="w3-text w3-text-theme"> Michael Wand, Tanja Schultz </span> </p> </a> <a class="w3-text" href="liang14_interspeech.html"> <p> Simple gesture-based error correction interface for smartphone speech recognition <br> <span class="w3-text w3-text-theme"> Yuan Liang, Koji Iwano, Koichi Shinoda </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Normalization and Discriminative Training Methods"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Normalization and Discriminative Training Methods</h4> <hr> <a class="w3-text" href="kumar14_interspeech.html"> <p> Normalization of ASR
confidence classifier scores via confidence mapping <br> <span class="w3-text w3-text-theme"> Kshitiz Kumar, Chaojun Liu, Yifan Gong </span> </p> </a> <a class="w3-text" href="alumae14_interspeech.html"> <p> Neural network phone duration model for speech recognition <br> <span class="w3-text w3-text-theme"> Tanel Alumäe </span> </p> </a> <a class="w3-text" href="sak14b_interspeech.html"> <p> Sequence discriminative distributed training of long short-term memory recurrent neural networks <br> <span class="w3-text w3-text-theme"> Haşim Sak, Oriol Vinyals, Georg Heigold, Andrew Senior, Erik McDermott, Rajat Monga, Mark Mao </span> </p> </a> <a class="w3-text" href="huang14c_interspeech.html"> <p> Beyond cross-entropy: towards better frame-level objective functions for deep neural network training in automatic speech recognition <br> <span class="w3-text w3-text-theme"> Zhen Huang, Jinyu Li, Chao Weng, Chin-Hui Lee </span> </p> </a> <a class="w3-text" href="tang14_interspeech.html"> <p> A comparison of training approaches for discriminative segmental models <br> <span class="w3-text w3-text-theme"> Hao Tang, Kevin Gimpel, Karen Livescu </span> </p> </a> <a class="w3-text" href="mcdermott14_interspeech.html"> <p> Asynchronous stochastic optimization for sequence training of deep neural networks: towards big data <br> <span class="w3-text w3-text-theme"> Erik McDermott, Georg Heigold, Pedro J. Moreno, Andrew Senior, Michiel Bacchiani </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Paralinguistic and Extralinguistic Information"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Paralinguistic and Extralinguistic Information</h4> <hr> <a class="w3-text" href="rao14_interspeech.html"> <p> Detection of children's paralinguistic events in interaction with caregivers <br> <span class="w3-text w3-text-theme"> Hrishikesh Rao, Jonathan C. Kim, Mark A. 
Clements, Agata Rozga, Daniel S. Messinger </span> </p> </a> <a class="w3-text" href="pettorino14_interspeech.html"> <p> Age and rhythmic variations: a study on Italian <br> <span class="w3-text w3-text-theme"> Massimo Pettorino, Elisa Pellegrino </span> </p> </a> <a class="w3-text" href="cummins14_interspeech.html"> <p> Probabilistic acoustic volume analysis for speech affected by depression <br> <span class="w3-text w3-text-theme"> Nicholas Cummins, Vidhyasaharan Sethu, Julien Epps, Jarek Krajewski </span> </p> </a> <a class="w3-text" href="bozkurt14_interspeech.html"> <p> Exploring modulation spectrum features for speech-based depression level classification <br> <span class="w3-text w3-text-theme"> Elif Bozkurt, Orith Toledo-Ronen, Alexander Sorin, Ron Hoory </span> </p> </a> <a class="w3-text" href="honig14_interspeech.html"> <p> Automatic modelling of depressed speech: relevant features and relevance of gender <br> <span class="w3-text w3-text-theme"> Florian Hönig, Anton Batliner, Elmar Nöth, Sebastian Schnieder, Jarek Krajewski </span> </p> </a> <a class="w3-text" href="gangamohan14_interspeech.html"> <p> Excitation source features for discrimination of anger and happy emotions <br> <span class="w3-text w3-text-theme"> P. Gangamohan, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, B. 
Yegnanarayana </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Text Processing for Speech Synthesis"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Text Processing for Speech Synthesis</h4> <hr> <a class="w3-text" href="wu14_interspeech.html"> <p> Encoding linear models as weighted finite-state transducers <br> <span class="w3-text w3-text-theme"> Ke Wu, Cyril Allauzen, Keith Hall, Michael Riley, Brian Roark </span> </p> </a> <a class="w3-text" href="kubo14b_interspeech.html"> <p> Structured soft margin confidence weighted learning for grapheme-to-phoneme conversion <br> <span class="w3-text w3-text-theme"> Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura </span> </p> </a> <a class="w3-text" href="zhang14d_interspeech.html"> <p> Unsupervised language filtering using the latent Dirichlet allocation <br> <span class="w3-text w3-text-theme"> Wei Zhang, Robert A. J. Clark, Yongyuan Wang </span> </p> </a> <a class="w3-text" href="kolluru14_interspeech.html"> <p> Generating multiple-accent pronunciations for TTS using joint sequence model interpolation <br> <span class="w3-text w3-text-theme"> BalaKrishna Kolluru, Vincent Wan, Javier Latorre, Kayoko Yanagisawa, Mark J. F. Gales </span> </p> </a> <a class="w3-text" href="mendonca14_interspeech.html"> <p> Using a hybrid approach to build a pronunciation dictionary for Brazilian Portuguese <br> <span class="w3-text w3-text-theme"> Gustavo Mendonça, Sandra Aluisio </span> </p> </a> <a class="w3-text" href="aylett14_interspeech.html"> <p> A flexible front-end for HTS <br> <span class="w3-text w3-text-theme"> Matthew P.
Aylett, Rasmus Dall, Arnab Ghoshal, Gustav Eje Henter, Thomas Merritt </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Cross-language Perception and Production"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Cross-language Perception and Production</h4> <hr> <a class="w3-text" href="tsukada14_interspeech.html"> <p> Cross-language perception of Japanese singleton and geminate consonants: preliminary data from non-native learners of Japanese and native speakers of Italian and Australian English <br> <span class="w3-text w3-text-theme"> Kimiko Tsukada, Felicity Cox, John Hajek </span> </p> </a> <a class="w3-text" href="alispahic14_interspeech.html"> <p> Difficulty in discriminating non-native vowels: are Dutch vowels easier for Australian English than Spanish listeners? <br> <span class="w3-text w3-text-theme"> Samra Alispahic, Paola Escudero, Karen E. Mulak </span> </p> </a> <a class="w3-text" href="yang14b_interspeech.html"> <p> Acoustic properties of shared vowels in bilingual Mandarin-English children <br> <span class="w3-text w3-text-theme"> Jing Yang, Robert Allen Fox </span> </p> </a> <a class="w3-text" href="lecumberri14_interspeech.html"> <p> Generating segmental foreign accent <br> <span class="w3-text w3-text-theme"> María Luisa García Lecumberri, Roberto Barra-Chicote, Rubén Pérez Ramón, Junichi Yamagishi, Martin Cooke </span> </p> </a> <a class="w3-text" href="andreeva14_interspeech.html"> <p> Differences of pitch profiles in Germanic and Slavic languages <br> <span class="w3-text w3-text-theme"> Bistra Andreeva, Grażyna Demenko, Bernd Möbius, Frank Zimmerer, Jeanin Jügler, Magdalena Oleskowicz-Popiel </span> </p> </a> <a class="w3-text" href="avanzi14_interspeech.html"> <p> The obligatory contour principle in African and European varieties of French <br> <span class="w3-text w3-text-theme"> Mathieu Avanzi, Guri Bordal, Gélase Nimbona 
</span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Text-Dependent Speaker Verification With Short Utterances (Special"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Text-Dependent Speaker Verification With Short Utterances (Special Session)</h4> <hr> <a class="w3-text" href="scheffer14_interspeech.html"> <p> Content matching for short duration speaker recognition <br> <span class="w3-text w3-text-theme"> Nicolas Scheffer, Yun Lei </span> </p> </a> <a class="w3-text" href="larcher14_interspeech.html"> <p> Extended RSR2015 for text-dependent speaker verification over VHF channel <br> <span class="w3-text w3-text-theme"> Anthony Larcher, Kong Aik Lee, Pablo L. Sordo Martínez, Trung Hieu Nguyen, Bin Ma, Haizhou Li </span> </p> </a> <a class="w3-text" href="fu14_interspeech.html"> <p> Tandem deep features for text-dependent speaker verification <br> <span class="w3-text w3-text-theme"> Tianfan Fu, Yanmin Qian, Yuan Liu, Kai Yu </span> </p> </a> <a class="w3-text" href="kenny14_interspeech.html"> <p> In-domain versus out-of-domain training for text-dependent JFA <br> <span class="w3-text w3-text-theme"> Patrick Kenny, Themos Stafylakis, M. J.
Alam, Pierre Ouellet, Marcel Kockmann </span> </p> </a> <a class="w3-text" href="aronowitz14_interspeech.html"> <p> Domain adaptation for text dependent speaker verification <br> <span class="w3-text w3-text-theme"> Hagai Aronowitz, Asaf Rendel </span> </p> </a> <a class="w3-text" href="miguel14_interspeech.html"> <p> Factor analysis with sampling methods for text dependent speaker recognition <br> <span class="w3-text w3-text-theme"> Antonio Miguel, Jesús Villalba, Alfonso Ortega, Eduardo Lleida, Carlos Vaquero </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech and Audio Analysis"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech and Audio Analysis</h4> <hr> <a class="w3-text" href="berg14_interspeech.html"> <p> Dictionary-based pitch tracking with dynamic programming <br> <span class="w3-text w3-text-theme"> Ewout van den Berg, Bhuvana Ramabhadran </span> </p> </a> <a class="w3-text" href="hu14c_interspeech.html"> <p> Acoustic features for robust classification of Mandarin tones <br> <span class="w3-text w3-text-theme"> Hongbing Hu, Stephen A. Zahorian, Peter Guzewich, Jiang Wu </span> </p> </a> <a class="w3-text" href="karlsson14_interspeech.html"> <p> Preservation of lexical tones in singing in a tone language <br> <span class="w3-text w3-text-theme"> Anastasia Karlsson, Håkan Lundström, Jan-Olof Svantesson </span> </p> </a> <a class="w3-text" href="yakoumaki14_interspeech.html"> <p> Emotional speech classification using adaptive sinusoidal modelling <br> <span class="w3-text w3-text-theme"> Theodora Yakoumaki, George P. 
Kafentzis, Yannis Stylianou </span> </p> </a> <a class="w3-text" href="wang14h_interspeech.html"> <p> Formant enhancement based speech watermarking for tampering detection <br> <span class="w3-text w3-text-theme"> Shengbei Wang, Masashi Unoki, Nam Soo Kim </span> </p> </a> <a class="w3-text" href="barker14_interspeech.html"> <p> Modelling primitive streaming of simple tone sequences through factorisation of modulation pattern tensors <br> <span class="w3-text w3-text-theme"> Tom Barker, Hugo Van hamme, Tuomas Virtanen </span> </p> </a> <a class="w3-text" href="sarma14_interspeech.html"> <p> Detection of vowel onset points in voiced aspirated sounds of Indian languages <br> <span class="w3-text w3-text-theme"> Biswajit Dev Sarma, S. R. M. Prasanna </span> </p> </a> <a class="w3-text" href="sasou14_interspeech.html"> <p> Accuracy evaluation of esophageal voice analysis based on automatic topology generated-voicing source HMM <br> <span class="w3-text w3-text-theme"> Akira Sasou </span> </p> </a> <a class="w3-text" href="zhang14e_interspeech.html"> <p> Audio watermarking based on multiple echoes hiding for FM radio <br> <span class="w3-text w3-text-theme"> Xuejun Zhang, Xiang Xie </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Cross-Lingual and Adaptive Language Modeling"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Cross-Lingual and Adaptive Language Modeling</h4> <hr> <a class="w3-text" href="motlicek14_interspeech.html"> <p> Development of bilingual ASR system for MediaParl corpus <br> <span class="w3-text w3-text-theme"> Petr Motlicek, David Imseng, Milos Cernak, Namhoon Kim </span> </p> </a> <a class="w3-text" href="li14f_interspeech.html"> <p> Investigation of cross-lingual bottleneck features in hybrid ASR systems <br> <span class="w3-text w3-text-theme"> Jie Li, Rong Zheng, Bo Xu </span> </p> </a> <a class="w3-text"
href="giwa14_interspeech.html"> <p> Language identification of individual words with joint sequence models <br> <span class="w3-text w3-text-theme"> Oluwapelumi Giwa, Marelie H. Davel </span> </p> </a> <a class="w3-text" href="anguera14_interspeech.html"> <p> Audio-to-text alignment for speech recognition with very limited resources <br> <span class="w3-text w3-text-theme"> Xavier Anguera, Jordi Luque, Ciro Gracia </span> </p> </a> <a class="w3-text" href="ngo14_interspeech.html"> <p> A minimal-resource transliteration framework for Vietnamese <br> <span class="w3-text w3-text-theme"> Hoang Gia Ngo, Nancy F. Chen, Sunil Sivadas, Bin Ma, Haizhou Li </span> </p> </a> <a class="w3-text" href="adel14b_interspeech.html"> <p> Combining recurrent neural networks and factored language models during decoding of code-switching speech <br> <span class="w3-text w3-text-theme"> Heike Adel, Dominic Telaar, Ngoc Thang Vu, Katrin Kirchhoff, Tanja Schultz </span> </p> </a> <a class="w3-text" href="tuske14b_interspeech.html"> <p> Data augmentation, feature combination, and multilingual neural networks to improve ASR and KWS performance for low-resource languages <br> <span class="w3-text w3-text-theme"> Zoltán Tüske, Pavel Golik, David Nolden, Ralf Schlüter, Hermann Ney </span> </p> </a> <a class="w3-text" href="masumura14_interspeech.html"> <p> Mixture of latent words language models for domain adaptation <br> <span class="w3-text w3-text-theme"> Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi </span> </p> </a> <a class="w3-text" href="herms14_interspeech.html"> <p> Improving spoken document retrieval by unsupervised language model adaptation using utterance-based web search <br> <span class="w3-text w3-text-theme"> Robert Herms, Marc Ritter, Thomas Wilhelm-Stein, Maximilian Eibl </span> </p> </a> <a class="w3-text" href="chien14_interspeech.html"> <p> The nested Indian buffet process for flexible topic modeling <br> <span class="w3-text
w3-text-theme"> Jen-Tzung Chien, Ying-Lan Chang </span> </p> </a> <a class="w3-text" href="levin14_interspeech.html"> <p> Automated closed captioning for Russian live broadcasting <br> <span class="w3-text w3-text-theme"> K. Levin, I. Ponomareva, A. Bulusheva, G. Chernykh, I. Medennikov, N. Merkin, A. Prudnikov, Natalia Tomashenko </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Pronunciation Modeling and Learning"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Pronunciation Modeling and Learning</h4> <hr> <a class="w3-text" href="wang14i_interspeech.html"> <p> Pronunciation modeling of foreign words for Mandarin ASR by considering the effect of language transfer <br> <span class="w3-text w3-text-theme"> Lei Wang, Rong Tong </span> </p> </a> <a class="w3-text" href="rutherford14_interspeech.html"> <p> Pronunciation learning for named-entities through crowd-sourcing <br> <span class="w3-text w3-text-theme"> Attapol T. Rutherford, Fuchun Peng, Françoise Beaufays </span> </p> </a> <a class="w3-text" href="schuppler14_interspeech.html"> <p> Pronunciation variation in read and conversational Austrian German <br> <span class="w3-text w3-text-theme"> Barbara Schuppler, Martine Adda-Decker, Juan A.
Morales-Cordovilla </span> </p> </a> <a class="w3-text" href="lehr14_interspeech.html"> <p> Discriminative pronunciation modeling for dialectal speech recognition <br> <span class="w3-text w3-text-theme"> Maider Lehr, Kyle Gorman, Izhak Shafran </span> </p> </a> <a class="w3-text" href="pellegrini14_interspeech.html"> <p> The goodness of pronunciation algorithm applied to disordered speech <br> <span class="w3-text w3-text-theme"> Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Jérôme Farinas, Marina Robert </span> </p> </a> <a class="w3-text" href="metallinou14_interspeech.html"> <p> Using deep neural networks to improve proficiency assessment for children English language learners <br> <span class="w3-text w3-text-theme"> Angeliki Metallinou, Jian Cheng </span> </p> </a> <a class="w3-text" href="lu14c_interspeech.html"> <p> Alignment of spoken utterances with slide content for easier learning with recorded lectures using structured support vector machine (SVM) <br> <span class="w3-text w3-text-theme"> Han Lu, Sheng-syun Shen, Sz-Rung Shiang, Hung-yi Lee, Lin-shan Lee </span> </p> </a> <a class="w3-text" href="duan14_interspeech.html"> <p> A preliminary study on ASR-based detection of Chinese mispronunciation by Japanese learners <br> <span class="w3-text w3-text-theme"> Richeng Duan, Jinsong Zhang, Wen Cao, Yanlu Xie </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Show and Tell Session 1, 1"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Show and Tell Session 1, 1</h4> <hr> <a class="w3-text" href="xu14b_interspeech.html"> <p> 3D tongue motion visualization based on ultrasound image sequences <br> <span class="w3-text w3-text-theme"> Kele Xu, Yin Yang, A. Jaumard-Hakoun, Martine Adda-Decker, A. Amelot, S. K. Al Kork, L. Crevier-Buchman, P. Chawah, G. Dreyfus, T. Fux, C. Pillot-Loiseau, P. Roussel, M. Stone, B.
Denby </span> </p> </a> <a class="w3-text" href="derrick14_interspeech.html"> <p> Listen with your skin: Aerotak speech perception enhancement system <br> <span class="w3-text w3-text-theme"> Donald Derrick, Tom De Rybel, Greg A. O'Beirne, Jennifer Hay </span> </p> </a> <a class="w3-text" href="czap14_interspeech.html"> <p> Speech assistant system <br> <span class="w3-text w3-text-theme"> László Czap </span> </p> </a> <a class="w3-text" href="banchs14_interspeech.html"> <p> Spoken dialogue system for restaurant recommendation and reservation <br> <span class="w3-text w3-text-theme"> Rafael E. Banchs, Seokhwan Kim </span> </p> </a> <a class="w3-text" href="akira14_interspeech.html"> <p> Interlingual map task corpus collection <br> <span class="w3-text w3-text-theme"> Hayakawa Akira, Nick Campbell, Saturnino Luz </span> </p> </a> <a class="w3-text" href="centelles14_interspeech.html"> <p> A client mobile application for Chinese-Spanish statistical machine translation <br> <span class="w3-text w3-text-theme"> Jordi Centelles, Marta R. Costa-jussà, Rafael E. Banchs </span> </p> </a> <a class="w3-text" href="benin14_interspeech.html"> <p> LuciawebGL: a new WebGL-Based talking head <br> <span class="w3-text w3-text-theme"> Alberto Benin, Piero Cosi, Giuseppe Riccardo Leone, Giulio Paci </span> </p> </a> <a class="w3-text" href="naderi14_interspeech.html"> <p> Crowdee: mobile crowdsourcing micro-task platform for celebrating the diversity of languages <br> <span class="w3-text w3-text-theme"> Babak Naderi, Tim Polzehl, André Beyer, Tibor Pilz, Sebastian Möller </span> </p> </a> <a class="w3-text" href="moore14_interspeech.html"> <p> On the use of the 'Pure Data' programming language for teaching and public outreach in speech processing <br> <span class="w3-text w3-text-theme"> Roger K.
Moore </span> </p> </a> <a class="w3-text" href="dubinsky14_interspeech.html"> <p> Syncwords: a platform for semi-automated closed captioning and subtitles <br> <span class="w3-text w3-text-theme"> Aleksandr Dubinsky </span> </p> </a> <a class="w3-text" href="clark14_interspeech.html"> <p> Simple<SUP>4</SUP>all <br> <span class="w3-text w3-text-theme"> Robert A. J. Clark </span> </p> </a> <a class="w3-text" href="chawah14_interspeech.html"> <p> An educational platform to capture, visualize and analyze rare singing <br> <span class="w3-text w3-text-theme"> P. Chawah, S. K. Al Kork, T. Fux, Martine Adda-Decker, A. Amelot, N. Audibert, B. Denby, G. Dreyfus, A. Jaumard-Hakoun, C. Pillot-Loiseau, P. Roussel, M. Stone, Kele Xu, L. Crevier-Buchman </span> </p> </a> <a class="w3-text" href="jeon14_interspeech.html"> <p> Single-channel speech enhancement based on non-negative matrix factorization and online noise adaptation <br> <span class="w3-text w3-text-theme"> Kwang Myung Jeon, Chan Jun Chun, Woo Kyeong Seong, Hong Kook Kim, Myung Kyu Choi </span> </p> </a> <a class="w3-text" href="maurer14_interspeech.html"> <p> Intelligibility of high-pitched vowel sounds in the singing and speaking of a female Cantonese opera singer <br> <span class="w3-text w3-text-theme"> Dieter Maurer, Peggy Mok, Daniel Friedrichs, Volker Dellwo </span> </p> </a> <a class="w3-text" href="mowlaee14_interspeech.html"> <p> Iterative refinement of amplitude and phase in single-channel speech enhancement <br> <span class="w3-text w3-text-theme"> Pejman Mowlaee, Mario Kaoru Watanabe, Rahim Saeidi </span> </p> </a> <a class="w3-text" href="roekhaut14_interspeech.html"> <p> elite-HTS: a NLP tool for French HMM-based speech synthesis <br> <span class="w3-text w3-text-theme"> Sophie Roekhaut, Sandrine Brognaux, Richard Beaufort, Thierry Dutoit </span> </p> </a> <a class="w3-text" href="niculescu14_interspeech.html"> <p> SARA — Singapore's automated responsive assistant for the touristic domain <br> <span
class="w3-text w3-text-theme"> Andreea I. Niculescu, Rafael E. Banchs, Ridong Jiang, Seokhwan Kim, Kheng Hui Yeo, Arthur Niswar </span> </p> </a> <a class="w3-text" href="plummer14_interspeech.html"> <p> The speech recognition virtual kitchen: launch party <br> <span class="w3-text w3-text-theme"> Andrew Plummer, Eric Riebling, Anuj Kumar, Florian Metze, Eric Fosler-Lussier, Rebecca Bates </span> </p> </a> <a class="w3-text" href="marekspartz14_interspeech.html"> <p> System for automated speech and language analysis (SALSA) <br> <span class="w3-text w3-text-theme"> Kyle Marek-Spartz, Benjamin Knoll, Robert Bill, Thomas Christie, Serguei Pakhomov </span> </p> </a> <a class="w3-text" href="masudakatsuse14_interspeech.html"> <p> Pronunciation practice support system for children who have difficulty correctly pronouncing words <br> <span class="w3-text w3-text-theme"> Ikuyo Masuda-Katsuse </span> </p> </a> <a class="w3-text" href="driesen14_interspeech.html"> <p> Automated production of true-cased punctuated subtitles for weather and news broadcasts <br> <span class="w3-text w3-text-theme"> Joris Driesen, Alexandra Birch, Simon Grimsey, Saeid Safarfashandi, Juliet Gauthier, Matt Simpson, Steve Renals </span> </p> </a> <a class="w3-text" href="dong14_interspeech.html"> <p> I<SUP>2</SUP>R Speech2Singing perfects everyone's singing <br> <span class="w3-text w3-text-theme"> Minghui Dong, S. W.
Lee, Haizhou Li, Paul Chan, Xuejian Peng, Jochen Walter Ehnes, Dongyan Huang </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Statistical Parametric Speech Synthesis"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Statistical Parametric Speech Synthesis</h4> <hr> <a class="w3-text" href="henter14_interspeech.html"> <p> Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech <br> <span class="w3-text w3-text-theme"> Gustav Eje Henter, Thomas Merritt, Matt Shannon, Catherine Mayo, Simon King </span> </p> </a> <a class="w3-text" href="merritt14_interspeech.html"> <p> Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis <br> <span class="w3-text w3-text-theme"> Thomas Merritt, Tuomo Raitio, Simon King </span> </p> </a> <a class="w3-text" href="latorre14b_interspeech.html"> <p> Voice expression conversion with factorised HMM-TTS models <br> <span class="w3-text w3-text-theme"> Javier Latorre, Vincent Wan, Kayoko Yanagisawa </span> </p> </a> <a class="w3-text" href="yanagisawa14_interspeech.html"> <p> Noise-robust TTS speaker adaptation with statistics smoothing <br> <span class="w3-text w3-text-theme"> Kayoko Yanagisawa, Langzhou Chen, Mark J. F. 
Gales </span> </p> </a> <a class="w3-text" href="brognaux14_interspeech.html"> <p> Speech synthesis in various communicative situations: impact of pronunciation variations <br> <span class="w3-text w3-text-theme"> Sandrine Brognaux, Benjamin Picart, Thomas Drugman </span> </p> </a> <a class="w3-text" href="cai14_interspeech.html"> <p> Formant-controlled speech synthesis using hidden trajectory model <br> <span class="w3-text w3-text-theme"> Ming-Qi Cai, Zhen-Hua Ling, Li-Rong Dai </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Voice Activity Detection"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Voice Activity Detection</h4> <hr> <a class="w3-text" href="zhang14f_interspeech.html"> <p> Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection <br> <span class="w3-text w3-text-theme"> Xiao-Lei Zhang, DeLiang Wang </span> </p> </a> <a class="w3-text" href="prasad14_interspeech.html"> <p> Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detection <br> <span class="w3-text w3-text-theme"> Abhay Prasad, Prasanta Kumar Ghosh, Shrikanth S. Narayanan </span> </p> </a> <a class="w3-text" href="ziaei14b_interspeech.html"> <p> Speech activity detection for NASA Apollo space missions: challenges and solutions <br> <span class="w3-text w3-text-theme"> Ali Ziaei, Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen, Douglas W.
Oard </span> </p> </a> <a class="w3-text" href="tu14_interspeech.html"> <p> Towards improving statistical model based voice activity detection <br> <span class="w3-text w3-text-theme"> Ming Tu, Xiang Xie, Yishan Jiao </span> </p> </a> <a class="w3-text" href="mcloughlin14_interspeech.html"> <p> The use of low-frequency ultrasound for voice activity detection <br> <span class="w3-text w3-text-theme"> Ian Vince McLoughlin </span> </p> </a> <a class="w3-text" href="ma14_interspeech.html"> <p> Improving the speech activity detection for the DARPA RATS phase-3 evaluation <br> <span class="w3-text w3-text-theme"> Jeff Ma </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Disordered Speech"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Disordered Speech</h4> <hr> <a class="w3-text" href="le14_interspeech.html"> <p> Modeling pronunciation, rhythm, and intonation for automatic assessment of speech quality in aphasia rehabilitation <br> <span class="w3-text w3-text-theme"> Duc Le, Emily Mower Provost </span> </p> </a> <a class="w3-text" href="strombergsson14_interspeech.html"> <p> Ranking severity of speech errors by their phonological impact in context <br> <span class="w3-text w3-text-theme"> Sofia Strömbergsson, Christina Tånnander, Jens Edlund </span> </p> </a> <a class="w3-text" href="orozcoarroyave14_interspeech.html"> <p> Automatic detection of Parkinson's disease from words uttered in three different languages <br> <span class="w3-text w3-text-theme"> J. R. Orozco-Arroyave, Florian Hönig, J. D. Arias-Londoño, J. F. Vargas-Bonilla, S. Skodda, J. Rusz, Elmar Nöth </span> </p> </a> <a class="w3-text" href="lilley14b_interspeech.html"> <p> Automating an objective measure of pediatric speech intelligibility <br> <span class="w3-text w3-text-theme"> Jason Lilley, Susan Nittrouer, H.
Timothy Bunnell </span> </p> </a> <a class="w3-text" href="shahin14_interspeech.html"> <p> A comparison of GMM-HMM and DNN-HMM based pronunciation verification techniques for use in the assessment of childhood apraxia of speech <br> <span class="w3-text w3-text-theme"> Mostafa Shahin, Beena Ahmed, Jacqueline McKechnie, Kirrie Ballard, Ricardo Gutierrez-Osuna </span> </p> </a> <a class="w3-text" href="berry14_interspeech.html"> <p> Acoustic and kinematic characteristics of vowel production through a virtual vocal tract in dysarthria <br> <span class="w3-text w3-text-theme"> Jeff Berry, Andrew Kolb, Cassandra North, Michael T. Johnson </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech and Multimodal Resources"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech and Multimodal Resources</h4> <hr> <a class="w3-text" href="wand14b_interspeech.html"> <p> The EMG-UKA corpus for electromyographic speech processing <br> <span class="w3-text w3-text-theme"> Michael Wand, Matthias Janke, Tanja Schultz </span> </p> </a> <a class="w3-text" href="lee14b_interspeech.html"> <p> A whispered Mandarin corpus for speech technology applications <br> <span class="w3-text w3-text-theme"> Pei Xuan Lee, Darren Wee, Hilary Si Yin Toh, Boon Pang Lim, Nancy F. 
Chen, Bin Ma </span> </p> </a> <a class="w3-text" href="gretter14_interspeech.html"> <p> Euronews: a multilingual benchmark for ASR and LID <br> <span class="w3-text w3-text-theme"> Roberto Gretter </span> </p> </a> <a class="w3-text" href="tsiami14_interspeech.html"> <p> ATHENA: a Greek multi-sensory database for home automation control <br> <span class="w3-text w3-text-theme"> Antigoni Tsiami, Isidoros Rodomagoulakis, Panagiotis Giannoulis, Athanasios Katsamanis, Gerasimos Potamianos, Petros Maragos </span> </p> </a> <a class="w3-text" href="matassoni14_interspeech.html"> <p> The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones <br> <span class="w3-text w3-text-theme"> Marco Matassoni, Ramón Fernandez Astudillo, Athanasios Katsamanis, Mirco Ravanelli </span> </p> </a> <a class="w3-text" href="henriques14_interspeech.html"> <p> Verbal description of LEGO blocks <br> <span class="w3-text w3-text-theme"> Diogo Henriques, Isabel Trancoso, Daniel Mendes, Alfredo Ferreira </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Phase Importance in Speech Processing Applications (Special Session)"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Phase Importance in Speech Processing Applications (Special Session)</h4> <hr> <a class="w3-text" href="mowlaee14b_interspeech.html"> <p> Phase importance in speech processing applications <br> <span class="w3-text w3-text-theme"> Pejman Mowlaee, Rahim Saeidi, Yannis Stylianou </span> </p> </a> <a class="w3-text" href="cano14_interspeech.html"> <p> Phase-based harmonic/percussive separation <br> <span class="w3-text w3-text-theme"> Estefanía Cano, Mark Plumbley, Christian Dittmar </span> </p> </a> <a class="w3-text" href="degottex14_interspeech.html"> <p> Phase distortion statistics as a representation of the
glottal source: application to the classification of voice qualities <br> <span class="w3-text w3-text-theme"> Gilles Degottex, Nicolas Obin </span> </p> </a> <a class="w3-text" href="degottex14b_interspeech.html"> <p> A measure of phase randomness for the harmonic model in speech synthesis <br> <span class="w3-text w3-text-theme"> Gilles Degottex, Daniel Erro </span> </p> </a> <a class="w3-text" href="jokinen14_interspeech.html"> <p> Enhancement of speech intelligibility in near-end noise conditions with phase modification <br> <span class="w3-text w3-text-theme"> Emma Jokinen, Marko Takanen, Hannu Pulakka, Paavo Alku </span> </p> </a> <a class="w3-text" href="shanmugam14_interspeech.html"> <p> A hybrid approach to segmentation of speech using group delay processing and HMM based embedded reestimation <br> <span class="w3-text w3-text-theme"> S. Aswin Shanmugam, Hema Murthy </span> </p> </a> <a class="w3-text" href="koutsogiannaki14_interspeech.html"> <p> The importance of phase on voice quality assessment <br> <span class="w3-text w3-text-theme"> Maria Koutsogiannaki, Olympia Simantiraki, Gilles Degottex, Yannis Stylianou </span> </p> </a> <a class="w3-text" href="vijayan14_interspeech.html"> <p> Feature extraction from analytic phase of speech signals for speaker verification <br> <span class="w3-text w3-text-theme"> Karthika Vijayan, Vinay Kumar, K. 
Sri Rama Murty </span> </p> </a> <a class="w3-text" href="sanchez14b_interspeech.html"> <p> A cross-vocoder study of speaker independent synthetic speech detection using phase information <br> <span class="w3-text w3-text-theme"> Jon Sanchez, Ibon Saratxaga, Inma Hernaez, Eva Navas, Daniel Erro </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Spoken Term Detection and Document Retrieval"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Spoken Term Detection and Document Retrieval</h4> <hr> <a class="w3-text" href="yang14c_interspeech.html"> <p> Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection <br> <span class="w3-text w3-text-theme"> Peng Yang, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li </span> </p> </a> <a class="w3-text" href="hout14_interspeech.html"> <p> Recent improvements in SRI's keyword detection system for noisy audio <br> <span class="w3-text w3-text-theme"> Julien van Hout, Vikramjit Mitra, Yun Lei, Dimitra Vergyri, Martin Graciarena, Arindam Mandal, Horacio Franco </span> </p> </a> <a class="w3-text" href="makino14_interspeech.html"> <p> Utilizing state-level distance vector representation for improved spoken term detection by text and spoken queries <br> <span class="w3-text w3-text-theme"> Mitsuaki Makino, Naoki Yamamoto, Atsuhiko Kai </span> </p> </a> <a class="w3-text" href="pappagari14_interspeech.html"> <p> Unsupervised spoken word retrieval using Gaussian-Bernoulli restricted Boltzmann machines <br> <span class="w3-text w3-text-theme"> Raghavendra Reddy Pappagari, Shekhar Nayak, K.
Sri Rama Murty </span> </p> </a> <a class="w3-text" href="george14_interspeech.html"> <p> Unsupervised query-by-example spoken term detection using bag of acoustic words and non-segmental dynamic time warping <br> <span class="w3-text w3-text-theme"> Basil George, Abhijeet Saxena, Gautam Mantena, Kishore Prahallad, B. Yegnanarayana </span> </p> </a> <a class="w3-text" href="li14g_interspeech.html"> <p> An empirical study of multilingual and low-resource spoken term detection using deep neural networks <br> <span class="w3-text w3-text-theme"> Jie Li, Xiaorui Wang, Bo Xu </span> </p> </a> <a class="w3-text" href="schulam14_interspeech.html"> <p> Diagnostic techniques for spoken keyword discovery <br> <span class="w3-text w3-text-theme"> Peter Schulam, Murat Akbacak </span> </p> </a> <a class="w3-text" href="kawasaki14_interspeech.html"> <p> Robust retrieval models for false positive errors in spoken documents <br> <span class="w3-text w3-text-theme"> Sho Kawasaki, Tomoyosi Akiba </span> </p> </a> <a class="w3-text" href="liou14_interspeech.html"> <p> Semantic retrieval of personal photos using matrix factorization and two-layer random walk fusing sparse speech annotations with visual features <br> <span class="w3-text w3-text-theme"> Yuan-ming Liou, Yi-sheng Fu, Hung-yi Lee, Lin-shan Lee </span> </p> </a> <a class="w3-text" href="gravier14_interspeech.html"> <p> Audio thumbnails for spoken content without transcription based on a maximum motif coverage criterion <br> <span class="w3-text w3-text-theme"> Guillaume Gravier, Nathan Souviraà-Labastie, Sébastien Campion, Frédéric Bimbot </span> </p> </a> <a class="w3-text" href="garcia14_interspeech.html"> <p> Semantically based search in a social speech task <br> <span class="w3-text w3-text-theme"> Fernando García, Emilio Sanchis, Ferran Pla </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Prosody and Paralinguistic Information"></div> <div class="w3-card w3-round w3-white 
w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Prosody and Paralinguistic Information</h4> <hr> <a class="w3-text" href="mittal14b_interspeech.html"> <p> Study of changes in glottal vibration characteristics during laughter <br> <span class="w3-text w3-text-theme"> Vinay Kumar Mittal, B. Yegnanarayana </span> </p> </a> <a class="w3-text" href="ntalampiras14_interspeech.html"> <p> On predicting the unpleasantness level of a sound event <br> <span class="w3-text w3-text-theme"> Stavros Ntalampiras, Ilyas Potamitis </span> </p> </a> <a class="w3-text" href="piot14_interspeech.html"> <p> Predicting when to laugh with structured classification <br> <span class="w3-text w3-text-theme"> Bilal Piot, Olivier Pietquin, Matthieu Geist </span> </p> </a> <a class="w3-text" href="weiss14_interspeech.html"> <p> Conversational structures affecting auditory likeability <br> <span class="w3-text w3-text-theme"> Benjamin Weiss, Katrin Schoenenberg </span> </p> </a> <a class="w3-text" href="avanzi14b_interspeech.html"> <p> Towards the adaptation of prosodic models for expressive text-to-speech synthesis <br> <span class="w3-text w3-text-theme"> Mathieu Avanzi, George Christodoulides, Damien Lolive, Elisabeth Delais-Roussarie, Nelly Barbot </span> </p> </a> <a class="w3-text" href="matsumiya14_interspeech.html"> <p> Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus <br> <span class="w3-text w3-text-theme"> Sho Matsumiya, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura </span> </p> </a> <a class="w3-text" href="tseng14_interspeech.html"> <p> Learning L2 prosody is more difficult than you realize — F0 characteristics and chunking size of L1 English, TW L2 English and TW L1 Mandarin <br> <span class="w3-text w3-text-theme"> Chiu-yu Tseng, Chao-yu Su </span> </p> </a> <a class="w3-text" href="truong14b_interspeech.html"> <p> Investigating prosodic relations between initiating and 
responding laughs <br> <span class="w3-text w3-text-theme"> Khiet P. Truong, Jürgen Trouvain </span> </p> </a> <a class="w3-text" href="prylipko14_interspeech.html"> <p> Application of image processing methods to filled pauses detection from spontaneous speech <br> <span class="w3-text w3-text-theme"> Dmytro Prylipko, Olga Egorow, Ingo Siegert, Andreas Wendemuth </span> </p> </a> <a class="w3-text" href="kakouros14_interspeech.html"> <p> Perception of sentence stress in English infant directed speech <br> <span class="w3-text w3-text-theme"> Sofoklis Kakouros, Okko Räsänen </span> </p> </a> <a class="w3-text" href="madzlan14_interspeech.html"> <p> Automatic recognition of attitudes in video blogs — prosodic and visual feature analysis <br> <span class="w3-text w3-text-theme"> Noor Alhusna Madzlan, JingGuang Han, Francesca Bonin, Nick Campbell </span> </p> </a> <a class="w3-text" href="katerenchuk14b_interspeech.html"> <p> “Was that your mother on the phone?”: classifying interpersonal relationships between dialog participants with lexical and acoustic properties <br> <span class="w3-text w3-text-theme"> Denys Katerenchuk, David Guy Brizan, Andrew Rosenberg </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Features and Robustness in Speaker and Language Recognition"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Features and Robustness in Speaker and Language Recognition</h4> <hr> <a class="w3-text" href="das14_interspeech.html"> <p> Combining source and system information for limited data speaker verification <br> <span class="w3-text w3-text-theme"> Rohan Kumar Das, S. Abhiram, S. R. M. Prasanna, A. G.
Ramakrishnan </span> </p> </a> <a class="w3-text" href="diez14_interspeech.html"> <p> New insight into the use of phone log-likelihood ratios as features for language recognition <br> <span class="w3-text w3-text-theme"> Mireia Diez, Amparo Varona, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, German Bordel </span> </p> </a> <a class="w3-text" href="ganapathy14_interspeech.html"> <p> Robust language identification using convolutional neural network features <br> <span class="w3-text w3-text-theme"> Sriram Ganapathy, Kyu Han, Samuel Thomas, Mohamed Omar, Maarten Van Segbroeck, Shrikanth S. Narayanan </span> </p> </a> <a class="w3-text" href="yu14c_interspeech.html"> <p> Acoustic feature transformation using UBM-based LDA for speaker recognition <br> <span class="w3-text w3-text-theme"> Chengzhu Yu, Gang Liu, John H. L. Hansen </span> </p> </a> <a class="w3-text" href="mak14_interspeech.html"> <p> SNR-dependent mixture of PLDA for noise robust speaker verification <br> <span class="w3-text w3-text-theme"> Man-Wai Mak </span> </p> </a> <a class="w3-text" href="sadjadi14_interspeech.html"> <p> Nearest neighbor discriminant analysis for robust speaker recognition <br> <span class="w3-text w3-text-theme"> Seyed Omid Sadjadi, Jason Pelecanos, Weizhong Zhu </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Topic Spotting and Summarization of Spoken Documents"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Topic Spotting and Summarization of Spoken Documents</h4> <hr> <a class="w3-text" href="liu14e_interspeech.html"> <p> Enhanced language modeling for extractive speech summarization with sentence relatedness information <br> <span class="w3-text w3-text-theme"> Shih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu </span> </p> </a> <a class="w3-text" href="morchid14b_interspeech.html"> <p> I-vector based 
representation of highly imperfect automatic transcriptions <br> <span class="w3-text w3-text-theme"> Mohamed Morchid, Mohamed Bouallegue, Richard Dufour, Georges Linarès, Driss Matrouf, Renato De Mori </span> </p> </a> <a class="w3-text" href="lai14b_interspeech.html"> <p> Incorporating lexical and prosodic information at different levels for meeting summarization <br> <span class="w3-text w3-text-theme"> Catherine Lai, Steve Renals </span> </p> </a> <a class="w3-text" href="bouallegue14_interspeech.html"> <p> Subspace Gaussian mixture models for dialogues classification <br> <span class="w3-text w3-text-theme"> Mohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linarès, Renato De Mori </span> </p> </a> <a class="w3-text" href="bouallegue14b_interspeech.html"> <p> Factor analysis based semantic variability compensation for automatic conversation representation <br> <span class="w3-text w3-text-theme"> Mohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linarès, Renato De Mori </span> </p> </a> <a class="w3-text" href="bouchekif14_interspeech.html"> <p> Speech cohesion for topic segmentation of spoken contents <br> <span class="w3-text w3-text-theme"> Abdessalam Bouchekif, Géraldine Damnati, Delphine Charlet </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="DNN Learning"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">DNN Learning</h4> <hr> <a class="w3-text" href="huang14d_interspeech.html"> <p> A comparative analytic study on the Gaussian mixture and context dependent deep neural network hidden Markov models <br> <span class="w3-text w3-text-theme"> Yan Huang, Dong Yu, Chaojun Liu, Yifan Gong </span> </p> </a> <a class="w3-text" href="bacchiani14_interspeech.html"> <p> Asynchronous, online, GMM-free training of a context dependent acoustic model for speech recognition <br> <span class="w3-text 
w3-text-theme"> Michiel Bacchiani, Andrew Senior, Georg Heigold </span> </p> </a> <a class="w3-text" href="jaitly14_interspeech.html"> <p> Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models <br> <span class="w3-text w3-text-theme"> Navdeep Jaitly, Vincent Vanhoucke, Geoffrey Hinton </span> </p> </a> <a class="w3-text" href="li14h_interspeech.html"> <p> Learning small-size DNN with output-distribution-based criteria <br> <span class="w3-text w3-text-theme"> Jinyu Li, Rui Zhao, Jui-Ting Huang, Yifan Gong </span> </p> </a> <a class="w3-text" href="deng14c_interspeech.html"> <p> Ensemble deep learning for speech recognition <br> <span class="w3-text w3-text-theme"> Li Deng, John C. Platt </span> </p> </a> <a class="w3-text" href="zhou14_interspeech.html"> <p> Learning conditional random field with hierarchical representations for dialogue act recognition <br> <span class="w3-text w3-text-theme"> Yucan Zhou, Qinghua Hu, Jie Liu, Yuan Jia </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Perception of Emotion and Prosody"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Perception of Emotion and Prosody</h4> <hr> <a class="w3-text" href="hsu14b_interspeech.html"> <p> Can adolescents with autism perceive emotional prosody? <br> <span class="w3-text w3-text-theme"> Cristiane Hsu, Yi Xu </span> </p> </a> <a class="w3-text" href="schmidt14_interspeech.html"> <p> Age, hearing loss and the perception of affective utterances in conversational speech <br> <span class="w3-text w3-text-theme"> Juliane Schmidt, Esther Janse, Odette Scharenborg </span> </p> </a> <a class="w3-text" href="yang14d_interspeech.html"> <p> Analysis of emotional effect on speech-body gesture interplay <br> <span class="w3-text w3-text-theme"> Zhaojun Yang, Shrikanth S. 
Narayanan </span> </p> </a> <a class="w3-text" href="chappuis14_interspeech.html"> <p> When voices get emotional: a study of emotion-enhanced memory and impairment during emotional prosody exposure <br> <span class="w3-text w3-text-theme"> Cyrielle Chappuis, Didier Grandjean </span> </p> </a> <a class="w3-text" href="zellers14_interspeech.html"> <p> Perception of pitch tails at potential turn boundaries in Swedish <br> <span class="w3-text w3-text-theme"> Margaret Zellers </span> </p> </a> <a class="w3-text" href="fuchs14_interspeech.html"> <p> Towards a perceptual model of speech rhythm: integrating the influence of f0 on perceived duration <br> <span class="w3-text w3-text-theme"> Robert Fuchs </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Deep Neural Networks for Speech Generation and Synthesis (Special"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Deep Neural Networks for Speech Generation and Synthesis (Special Session)</h4> <hr> <a class="w3-text" href="chen14h_interspeech.html"> <p> DNN-based stochastic postfilter for HMM-based speech synthesis <br> <span class="w3-text w3-text-theme"> Ling-Hui Chen, Tuomo Raitio, Cassia Valentini-Botinhao, Junichi Yamagishi, Zhen-Hua Ling </span> </p> </a> <a class="w3-text" href="kang14_interspeech.html"> <p> Statistical parametric speech synthesis using weighted multi-distribution deep belief network <br> <span class="w3-text w3-text-theme"> Shiyin Kang, Helen Meng </span> </p> </a> <a class="w3-text" href="fan14_interspeech.html"> <p> TTS synthesis with bidirectional LSTM based recurrent neural networks <br> <span class="w3-text w3-text-theme"> Yuchen Fan, Yao Qian, Feng-Long Xie, Frank K. 
Soong </span> </p> </a> <a class="w3-text" href="raitio14_interspeech.html"> <p> Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort <br> <span class="w3-text w3-text-theme"> Tuomo Raitio, Antti Suni, Lauri Juvela, Martti Vainio, Paavo Alku </span> </p> </a> <a class="w3-text" href="yu14d_interspeech.html"> <p> An introduction to computational networks and the computational network toolkit (invited talk) <br> <span class="w3-text w3-text-theme"> Dong Yu, Adam Eversole, Michael L. Seltzer, Kaisheng Yao, Brian Guenter, Oleksii Kuchaiev, Frank Seide, Huaming Wang, Jasha Droppo, Zhiheng Huang, Geoff Zweig, Chris Rossbach, Jon Currey </span> </p> </a> <a class="w3-text" href="fernandez14b_interspeech.html"> <p> Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks <br> <span class="w3-text w3-text-theme"> Raul Fernandez, Asaf Rendel, Bhuvana Ramabhadran, Ron Hoory </span> </p> </a> <a class="w3-text" href="yin14_interspeech.html"> <p> Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree <br> <span class="w3-text w3-text-theme"> Xiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling, Li-Rong Dai </span> </p> </a> <a class="w3-text" href="nakashika14_interspeech.html"> <p> High-order sequence modeling using speaker-dependent recurrent temporal restricted Boltzmann machines for voice conversion <br> <span class="w3-text w3-text-theme"> Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki </span> </p> </a> <a class="w3-text" href="xie14b_interspeech.html"> <p> Sequence error (SE) minimization training of neural network for voice conversion <br> <span class="w3-text w3-text-theme"> Feng-Long Xie, Yao Qian, Yuchen Fan, Frank K. 
Soong, Haifeng Li </span> </p> </a> <a class="w3-text" href="bocquelet14_interspeech.html"> <p> Robust articulatory speech synthesis using deep neural networks for BCI applications <br> <span class="w3-text w3-text-theme"> Florent Bocquelet, Thomas Hueber, Laurent Girin, Pierre Badin, Blaise Yvert </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Analysis and Perception"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech Analysis and Perception</h4> <hr> <a class="w3-text" href="xu14c_interspeech.html"> <p> Acoustic investigation of /t<SUP>h</SUP>/ lenition in Brunei Mandarin <br> <span class="w3-text w3-text-theme"> Shufang Xu </span> </p> </a> <a class="w3-text" href="wang14j_interspeech.html"> <p> Mapping emotions into acoustic space: the role of voice quality <br> <span class="w3-text w3-text-theme"> Ting Wang, Hongwei Ding, Jianjing Kuang, Qiuwu Ma </span> </p> </a> <a class="w3-text" href="mahajan14_interspeech.html"> <p> Principal components of auditory spectro-temporal receptive fields <br> <span class="w3-text w3-text-theme"> Nagaraj Mahajan, Nima Mesgarani, Hynek Hermansky </span> </p> </a> <a class="w3-text" href="thlithi14_interspeech.html"> <p> Segmentation in singer turns with the Bayesian information criterion <br> <span class="w3-text w3-text-theme"> Marwa Thlithi, Thomas Pellegrini, Julien Pinquier, Régine André-Obrecht </span> </p> </a> <a class="w3-text" href="watson14_interspeech.html"> <p> Mappings between vocal tract area functions, vocal tract resonances and speech formants for multiple speakers <br> <span class="w3-text w3-text-theme"> Catherine I. 
Watson </span> </p> </a> <a class="w3-text" href="arndt14_interspeech.html"> <p> A next step towards measuring perceived quality of speech through physiology <br> <span class="w3-text w3-text-theme"> Sebastian Arndt, Markus Wenzel, Jan-Niklas Antons, Friedemann Köster, Sebastian Möller, Gabriel Curio </span> </p> </a> <a class="w3-text" href="chen14i_interspeech.html"> <p> Effect of spectral degradation to the intelligibility of vowel sentences <br> <span class="w3-text w3-text-theme"> Fei Chen, Sharon W. K. Wong, Lena L. N. Wong </span> </p> </a> <a class="w3-text" href="berry14b_interspeech.html"> <p> Consonant context effects on vowel sensorimotor adaptation <br> <span class="w3-text w3-text-theme"> Jeff Berry, John Jaeger, Melissa Wiedenhoeft, Brittany Bernal, Michael T. Johnson </span> </p> </a> <a class="w3-text" href="bailly14_interspeech.html"> <p> Assessing objective characterizations of phonetic convergence <br> <span class="w3-text w3-text-theme"> Gérard Bailly, Amélie Martin </span> </p> </a> <a class="w3-text" href="mandel14_interspeech.html"> <p> Generalizing time-frequency importance functions across noises, talkers, and phonemes <br> <span class="w3-text w3-text-theme"> Michael I. Mandel, Sarah E. Yoho, Eric W. Healy </span> </p> </a> <a class="w3-text" href="mahajan14b_interspeech.html"> <p> Does elderly speech recognition in noise benefit from spectral and visual cues? 
<br> <span class="w3-text w3-text-theme"> Yatin Mahajan, Jeesun Kim, Chris Davis </span> </p> </a> <a class="w3-text" href="laskowski14_interspeech.html"> <p> On the conversant-specificity of stochastic turn-taking models <br> <span class="w3-text w3-text-theme"> Kornel Laskowski </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Intelligibility Enhancement and Predictive Measures"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Intelligibility Enhancement and Predictive Measures</h4> <hr> <a class="w3-text" href="sakano14_interspeech.html"> <p> Single-ended estimation of speech intelligibility using the ITU P.563 feature set <br> <span class="w3-text w3-text-theme"> Toshihiro Sakano, Yosuke Kobayashi, Kazuhiro Kondo </span> </p> </a> <a class="w3-text" href="jokinen14b_interspeech.html"> <p> Spectral tilt modelling with GMMs for intelligibility enhancement of narrowband telephone speech <br> <span class="w3-text w3-text-theme"> Emma Jokinen, Ulpu Remes, Marko Takanen, Kalle Palomäki, Mikko Kurimo, Paavo Alku </span> </p> </a> <a class="w3-text" href="koster14_interspeech.html"> <p> Analyzing perceptual dimensions of conversational speech quality <br> <span class="w3-text w3-text-theme"> Friedemann Köster, Sebastian Möller </span> </p> </a> <a class="w3-text" href="aubanel14_interspeech.html"> <p> Interplay of informational content and energetic masking in speech perception in noise <br> <span class="w3-text w3-text-theme"> Vincent Aubanel, Chris Davis, Jeesun Kim </span> </p> </a> <a class="w3-text" href="zorila14_interspeech.html"> <p> On spectral and time domain energy reallocation for speech-in-noise intelligibility enhancement <br> <span class="w3-text w3-text-theme"> Tudor-Cătălin Zorilă, Yannis Stylianou </span> </p> </a> <a class="w3-text" href="chen14j_interspeech.html"> <p> Objective quality evaluation of noise-suppressed speech: effects of 
temporal envelope and fine-structure cues <br> <span class="w3-text w3-text-theme"> Fei Chen, Yi Hu </span> </p> </a> <a class="w3-text" href="wang14k_interspeech.html"> <p> Noisy speech enhancement based on long term harmonic model to improve speech intelligibility for hearing impaired listeners <br> <span class="w3-text w3-text-theme"> Dongmei Wang, Philipos C. Loizou, John H. L. Hansen </span> </p> </a> <a class="w3-text" href="valentinibotinhao14b_interspeech.html"> <p> Using linguistic predictability and the Lombard effect to increase the intelligibility of synthetic speech in noise <br> <span class="w3-text w3-text-theme"> Cassia Valentini-Botinhao, Mirjam Wester </span> </p> </a> <a class="w3-text" href="dabel14_interspeech.html"> <p> Speech pre-enhancement using a discriminative microscopic intelligibility model <br> <span class="w3-text w3-text-theme"> Maryam Al Dabel, Jon Barker </span> </p> </a> <a class="w3-text" href="harvilla14_interspeech.html"> <p> Least squares signal declipping for robust speech recognition <br> <span class="w3-text w3-text-theme"> Mark J. Harvilla, Richard M. Stern </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech and Language Processing — General Topics"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech and Language Processing — General Topics</h4> <hr> <a class="w3-text" href="xu14d_interspeech.html"> <p> Semi-supervised training for bottle-neck feature based DNN-HMM hybrid systems <br> <span class="w3-text w3-text-theme"> Haihua Xu, Hang Su, Eng Siong Chng, Haizhou Li </span> </p> </a> <a class="w3-text" href="kapralova14_interspeech.html"> <p> A big data approach to acoustic model training corpus selection <br> <span class="w3-text w3-text-theme"> Olga Kapralova, John Alex, Eugene Weinstein, Pedro J. 
Moreno, Olivier Siohan </span> </p> </a> <a class="w3-text" href="cardinal14_interspeech.html"> <p> Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera <br> <span class="w3-text w3-text-theme"> Patrick Cardinal, Ahmed Ali, Najim Dehak, Yu Zhang, Tuka Al Hanai, Yifan Zhang, James R. Glass, Stephan Vogel </span> </p> </a> <a class="w3-text" href="sundermeyer14b_interspeech.html"> <p> rwthlm — the RWTH Aachen University neural network language modeling toolkit <br> <span class="w3-text w3-text-theme"> Martin Sundermeyer, Ralf Schlüter, Hermann Ney </span> </p> </a> <a class="w3-text" href="cheng14_interspeech.html"> <p> Language modeling with sum-product networks <br> <span class="w3-text w3-text-theme"> Wei-Chen Cheng, Stanley Kok, Hoai Vu Pham, Hai Leong Chieu, Kian Ming A. Chai </span> </p> </a> <a class="w3-text" href="cui14b_interspeech.html"> <p> Improving deep neural network acoustic modeling for audio corpus indexing under the IARPA Babel program <br> <span class="w3-text w3-text-theme"> Xiaodong Cui, Brian Kingsbury, Jia Cui, Bhuvana Ramabhadran, Andrew Rosenberg, Mohammad Sadegh Rasooli, Owen Rambow, Nizar Habash, Vaibhava Goel </span> </p> </a> <a class="w3-text" href="chowdhury14_interspeech.html"> <p> Cross-language transfer of semantic annotation via targeted crowdsourcing <br> <span class="w3-text w3-text-theme"> Shammur Absar Chowdhury, Arindam Ghosh, Evgeny A. 
Stepanov, Ali Orkan Bayer, Giuseppe Riccardi, Ioannis Klasinas </span> </p> </a> <a class="w3-text" href="hakkanitur14_interspeech.html"> <p> Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding <br> <span class="w3-text w3-text-theme"> Dilek Hakkani-Tür, Asli Celikyilmaz, Larry Heck, Gokhan Tur, Geoff Zweig </span> </p> </a> <a class="w3-text" href="garner14_interspeech.html"> <p> Automatic speech recognition and translation of a Swiss German dialect: Walliserdeutsch <br> <span class="w3-text w3-text-theme"> Philip N. Garner, David Imseng, Thomas Meyer </span> </p> </a> <a class="w3-text" href="harrat14_interspeech.html"> <p> Building resources for Algerian Arabic dialects <br> <span class="w3-text w3-text-theme"> S. Harrat, K. Meftouh, M. Abbas, K. Smaili </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Language, Dialect and Accent Recognition"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Language, Dialect and Accent Recognition</h4> <hr> <a class="w3-text" href="ferrer14_interspeech.html"> <p> Spoken language recognition based on senone posteriors <br> <span class="w3-text w3-text-theme"> Luciana Ferrer, Yun Lei, Mitchell McLaren, Nicolas Scheffer </span> </p> </a> <a class="w3-text" href="gonzalezdominguez14_interspeech.html"> <p> Automatic language identification using long short-term memory recurrent neural networks <br> <span class="w3-text w3-text-theme"> Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Haşim Sak, Joaquin Gonzalez-Rodriguez, Pedro J. 
Moreno </span> </p> </a> <a class="w3-text" href="desplanques14_interspeech.html"> <p> Robust language recognition via adaptive language factor extraction <br> <span class="w3-text w3-text-theme"> Brecht Desplanques, Kris Demuynck, Jean-Pierre Martens </span> </p> </a> <a class="w3-text" href="behravan14_interspeech.html"> <p> Dialect levelling in Finnish: a universal speech attribute approach <br> <span class="w3-text w3-text-theme"> Hamid Behravan, Ville Hautamäki, Sabato Marco Siniscalchi, Elie Khoury, Tommi Kurki, Tomi Kinnunen, Chin-Hui Lee </span> </p> </a> <a class="w3-text" href="chen14k_interspeech.html"> <p> Improving native accent identification using deep neural networks <br> <span class="w3-text w3-text-theme"> Mingming Chen, Zhanlei Yang, Hao Zheng, Wenju Liu </span> </p> </a> <a class="w3-text" href="kolly14_interspeech.html"> <p> Foreign accent recognition based on temporal information contained in lowpass-filtered speech <br> <span class="w3-text w3-text-theme"> Marie-José Kolly, Adrian Leemann, Volker Dellwo </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Adaptation 1, 2"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Adaptation 1, 2</h4> <hr> <a class="w3-text" href="karanasou14_interspeech.html"> <p> Adaptation of deep neural network acoustic models using factorised i-vectors <br> <span class="w3-text w3-text-theme"> Penny Karanasou, Yongqiang Wang, Mark J. F. Gales, Philip C. Woodland </span> </p> </a> <a class="w3-text" href="fukuda14_interspeech.html"> <p> Regularized feature-space discriminative adaptation for robust ASR <br> <span class="w3-text w3-text-theme"> Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura, Steven J. 
Rennie, Vaibhava Goel </span> </p> </a> <a class="w3-text" href="miao14c_interspeech.html"> <p> Towards speaker adaptive training of deep neural network acoustic models <br> <span class="w3-text w3-text-theme"> Yajie Miao, Hao Zhang, Florian Metze </span> </p> </a> <a class="w3-text" href="gorin14_interspeech.html"> <p> Component structuring and trajectory modeling for speech recognition <br> <span class="w3-text w3-text-theme"> Arseniy Gorin, Denis Jouvet </span> </p> </a> <a class="w3-text" href="doddipatla14_interspeech.html"> <p> Speaker dependent bottleneck layer training for speaker adaptation in automatic speech recognition <br> <span class="w3-text w3-text-theme"> Rama Doddipatla, Madina Hasan, Thomas Hain </span> </p> </a> <a class="w3-text" href="you14_interspeech.html"> <p> Improving wideband acoustic models using mixed-bandwidth training data via DNN adaptation <br> <span class="w3-text w3-text-theme"> Zhao You, Bo Xu </span> </p> </a> <a class="w3-text" href="pellegrini14b_interspeech.html"> <p> Speaker age estimation for elderly speech recognition in European Portuguese <br> <span class="w3-text w3-text-theme"> Thomas Pellegrini, Vahid Hedayati, Isabel Trancoso, Annika Hämäläinen, Miguel Sales Dias </span> </p> </a> <a class="w3-text" href="najafian14_interspeech.html"> <p> Unsupervised model selection for recognition of regional accented speech <br> <span class="w3-text w3-text-theme"> Maryam Najafian, Andrea DeMarco, Stephen Cox, Martin Russell </span> </p> </a> <a class="w3-text" href="zhang14g_interspeech.html"> <p> Speaker adaptation based on sparse and low-rank eigenphone matrix estimation <br> <span class="w3-text w3-text-theme"> Wen-Lin Zhang, Dan Qu, Wei-Qiang Zhang, Bi-Cheng Li </span> </p> </a> <a class="w3-text" href="huang14e_interspeech.html"> <p> Multi-accent deep neural network acoustic model with accent-specific top layer using the KLD-regularized model adaptation <br> <span class="w3-text w3-text-theme"> Yan Huang, Dong Yu, Chaojun 
Liu, Yifan Gong </span> </p> </a> <a class="w3-text" href="shahnawazuddin14_interspeech.html"> <p> A low complexity model adaptation approach involving sparse coding over multiple dictionaries <br> <span class="w3-text w3-text-theme"> S. Shahnawazuddin, Rohit Sinha </span> </p> </a> <a class="w3-text" href="kubota14_interspeech.html"> <p> Effect of frequency weighting on MLP-based speaker canonicalization <br> <span class="w3-text w3-text-theme"> Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta </span> </p> </a> <a class="w3-text" href="huang14f_interspeech.html"> <p> Feature space maximum a posteriori linear regression for adaptation of deep neural networks <br> <span class="w3-text w3-text-theme"> Zhen Huang, Jinyu Li, Sabato Marco Siniscalchi, I-Fan Chen, Chao Weng, Chin-Hui Lee </span> </p> </a> <a class="w3-text" href="tomashenko14_interspeech.html"> <p> Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing <br> <span class="w3-text w3-text-theme"> Natalia Tomashenko, Yuri Khokhlov </span> </p> </a> <a class="w3-text" href="karafiat14_interspeech.html"> <p> BUT 2014 Babel system: analysis of adaptation in NN based systems <br> <span class="w3-text w3-text-theme"> Martin Karafiát, František Grézl, Karel Veselý, Mirko Hannemann, Igor Szőke, Jan Černocký </span> </p> </a> <a class="w3-text" href="rouvier14_interspeech.html"> <p> Speaker adaptation of DNN-based ASR with i-vectors: does it actually adapt models to speakers? 
<br> <span class="w3-text w3-text-theme"> Mickael Rouvier, Benoit Favre </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speaker Localization"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speaker Localization</h4> <hr> <a class="w3-text" href="singhal14_interspeech.html"> <p> A sparse reconstruction method for speech source localization using partial dictionaries over a spherical microphone array <br> <span class="w3-text w3-text-theme"> Kushagra Singhal, Rajesh M. Hegde </span> </p> </a> <a class="w3-text" href="cui14c_interspeech.html"> <p> A robust TDOA estimation method for in-car-noise environments <br> <span class="w3-text w3-text-theme"> Weiwei Cui, Jaeyeon Cho, Seungyeol Lee </span> </p> </a> <a class="w3-text" href="netsch14_interspeech.html"> <p> Robust low-resource sound localization in correlated noise <br> <span class="w3-text w3-text-theme"> Lorin Netsch, Jacek Stachurski </span> </p> </a> <a class="w3-text" href="ying14_interspeech.html"> <p> Direction-of-arrival estimation of multiple speakers using a planar array <br> <span class="w3-text w3-text-theme"> Dongwen Ying, Ruohua Zhou, Junfeng Li, Jielin Pan, Yonghong Yan </span> </p> </a> <a class="w3-text" href="xue14_interspeech.html"> <p> Weighted spatial bispectrum correlation matrix for DOA estimation in the presence of interferences <br> <span class="w3-text w3-text-theme"> Wei Xue, Shan Liang, Wenju Liu </span> </p> </a> <a class="w3-text" href="bouafif14_interspeech.html"> <p> Multi-sources separation for sound source localization <br> <span class="w3-text w3-text-theme"> Mariem Bouafif, Zied Lachiri </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Representation, Detection and Classification"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech 
Representation, Detection and Classification</h4> <hr> <a class="w3-text" href="zhang14h_interspeech.html"> <p> Phone classification by a hierarchy of invariant representation layers <br> <span class="w3-text w3-text-theme"> Chiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio </span> </p> </a> <a class="w3-text" href="sinclair14_interspeech.html"> <p> A semi-Markov model for speech segmentation with an utterance-break prior <br> <span class="w3-text w3-text-theme"> Mark Sinclair, Peter Bell, Alexandra Birch, Fergus McInnes </span> </p> </a> <a class="w3-text" href="aneeja14_interspeech.html"> <p> Speech detection in transient noises <br> <span class="w3-text w3-text-theme"> G. Aneeja, B. Yegnanarayana </span> </p> </a> <a class="w3-text" href="he14b_interspeech.html"> <p> Evaluation of dictionary for sparse coding in speech processing <br> <span class="w3-text w3-text-theme"> Yongjun He, Guanglu Sun, Guibin Zheng, Jiqing Han </span> </p> </a> <a class="w3-text" href="vaz14b_interspeech.html"> <p> Joint filtering and factorization for recovering latent structure from noisy speech data <br> <span class="w3-text w3-text-theme"> Colin Vaz, Vikram Ramanarayanan, Shrikanth S. Narayanan </span> </p> </a> <a class="w3-text" href="gallardoantolin14_interspeech.html"> <p> A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis <br> <span class="w3-text w3-text-theme"> A. Gallardo-Antolín, J. M. 
Montero, Simon King </span> </p> </a> <a class="w3-text" href="asami14_interspeech.html"> <p> Read and spontaneous speech classification based on variance of GMM supervectors <br> <span class="w3-text w3-text-theme"> Taichi Asami, Ryo Masumura, Hirokazu Masataki, Sumitaka Sakauchi </span> </p> </a> <a class="w3-text" href="shokouhi14_interspeech.html"> <p> Co-channel speech detection via spectral analysis of frequency modulated sub-bands <br> <span class="w3-text w3-text-theme"> Navid Shokouhi, Seyed Omid Sadjadi, John H. L. Hansen </span> </p> </a> <a class="w3-text" href="voinea14_interspeech.html"> <p> Word-level invariant representations from acoustic waveforms <br> <span class="w3-text w3-text-theme"> Stephen Voinea, Chiyuan Zhang, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio </span> </p> </a> <a class="w3-text" href="dalsgaard14_interspeech.html"> <p> On closed form calculation of line spectral frequencies (LSF) <br> <span class="w3-text w3-text-theme"> Paul Dalsgaard, Ove Andersen </span> </p> </a> <a class="w3-text" href="ouali14_interspeech.html"> <p> Robust features for content-based audio copy detection <br> <span class="w3-text w3-text-theme"> Chahid Ouali, Pierre Dumouchel, Vishwa Gupta </span> </p> </a> <a class="w3-text" href="jiang14_interspeech.html"> <p> Binaural deep neural network classification for reverberant speech segregation <br> <span class="w3-text w3-text-theme"> Yi Jiang, DeLiang Wang, RunSheng Liu </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Spoken Term Detection for Low-Resource Languages I, II"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Spoken Term Detection for Low-Resource Languages I, II</h4> <hr> <a class="w3-text" href="anguera14b_interspeech.html"> <p> Query-by-example spoken term detection on multilingual unconstrained speech <br> <span class="w3-text w3-text-theme"> Xavier Anguera, Luis 
Javier Rodriguez-Fuentes, Igor Szőke, Andi Buzo, Florian Metze, Mikel Penagarikano </span> </p> </a> <a class="w3-text" href="soto14_interspeech.html"> <p> A comparison of multiple methods for rescoring keyword search lists for low resource languages <br> <span class="w3-text w3-text-theme"> Victor Soto, Lidia Mangu, Andrew Rosenberg, Julia Hirschberg </span> </p> </a> <a class="w3-text" href="karakos14_interspeech.html"> <p> Subword and phonetic search for detecting out-of-vocabulary keywords <br> <span class="w3-text w3-text-theme"> Damianos Karakos, Richard Schwartz </span> </p> </a> <a class="w3-text" href="wang14l_interspeech.html"> <p> An in-depth comparison of keyword specific thresholding and sum-to-one score normalization <br> <span class="w3-text w3-text-theme"> Yun Wang, Florian Metze </span> </p> </a> <a class="w3-text" href="lee14c_interspeech.html"> <p> Graph-based re-ranking using acoustic feature similarity between search results for spoken term detection on low-resource languages <br> <span class="w3-text w3-text-theme"> Hung-yi Lee, Yu Zhang, Ekapol Chuangsuwanich, James R. 
Glass </span> </p> </a> <a class="w3-text" href="le14b_interspeech.html"> <p> Developing STT and KWS systems using limited language resources <br> <span class="w3-text w3-text-theme"> Viet-Bac Le, Lori Lamel, Abdel Messaoudi, William Hartmann, Jean-Luc Gauvain, Cécile Woehrling, Julien Despres, Anindya Roy </span> </p> </a> <a class="w3-text" href="hartmann14_interspeech.html"> <p> Comparing decoding strategies for subword-based keyword spotting in low-resourced languages <br> <span class="w3-text w3-text-theme"> William Hartmann, Viet-Bac Le, Abdel Messaoudi, Lori Lamel, Jean-Luc Gauvain </span> </p> </a> <a class="w3-text" href="ma14b_interspeech.html"> <p> Strategies for rescoring keyword search results using word-burst and acoustic features <br> <span class="w3-text w3-text-theme"> Min Ma, Justin Richards, Victor Soto, Julia Hirschberg, Andrew Rosenberg </span> </p> </a> <a class="w3-text" href="xu14e_interspeech.html"> <p> Word-based probabilistic phonetic retrieval for low-resource spoken term detection <br> <span class="w3-text w3-text-theme"> Di Xu, Florian Metze </span> </p> </a> <a class="w3-text" href="chen14l_interspeech.html"> <p> A keyword-boosted sMBR criterion to enhance keyword search performance in deep neural network based acoustic modeling <br> <span class="w3-text w3-text-theme"> I-Fan Chen, Nancy F. Chen, Chin-Hui Lee </span> </p> </a> <a class="w3-text" href="chiu14_interspeech.html"> <p> Combination of FST and CN search in spoken term detection <br> <span class="w3-text w3-text-theme"> Justin Chiu, Yun Wang, Jan Trmal, Daniel Povey, Guoguo Chen, Alexander I. 
Rudnicky </span> </p> </a> <a class="w3-text" href="liu14f_interspeech.html"> <p> Low-resource open vocabulary keyword search using point process models <br> <span class="w3-text w3-text-theme"> Chunxi Liu, Aren Jansen, Guoguo Chen, Keith Kintzley, Jan Trmal, Sanjeev Khudanpur </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Voice Conversion"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Voice Conversion</h4> <hr> <a class="w3-text" href="ohtani14_interspeech.html"> <p> GMM-based bandwidth extension using sub-band basis spectrum model <br> <span class="w3-text w3-text-theme"> Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Masami Akamine </span> </p> </a> <a class="w3-text" href="nakamura14_interspeech.html"> <p> A mel-cepstral analysis technique restoring high frequency components from low-sampling-rate speech <br> <span class="w3-text w3-text-theme"> Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda </span> </p> </a> <a class="w3-text" href="lee14d_interspeech.html"> <p> A comparative study of spectral transformation techniques for singing voice synthesis <br> <span class="w3-text w3-text-theme"> S. W. 
Lee, Zhizheng Wu, Minghui Dong, Xiaohai Tian, Haizhou Li </span> </p> </a> <a class="w3-text" href="saito14_interspeech.html"> <p> Application of matrix variate Gaussian mixture model to statistical voice conversion <br> <span class="w3-text w3-text-theme"> Daisuke Saito, Hidenobu Doi, Nobuaki Minematsu, Keikichi Hirose </span> </p> </a> <a class="w3-text" href="wu14b_interspeech.html"> <p> Joint nonnegative matrix factorization for exemplar-based voice conversion <br> <span class="w3-text w3-text-theme"> Zhizheng Wu, Eng Siong Chng, Haizhou Li </span> </p> </a> <a class="w3-text" href="kobayashi14_interspeech.html"> <p> Statistical singing voice conversion with direct waveform modification based on the spectrum differential <br> <span class="w3-text w3-text-theme"> Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech and Audio Segmentation and Classification"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech and Audio Segmentation and Classification</h4> <hr> <a class="w3-text" href="ellis14_interspeech.html"> <p> Detecting proximity from personal audio recordings <br> <span class="w3-text w3-text-theme"> Daniel P. W. 
Ellis, Hiroyuki Satoh, Zhuo Chen </span> </p> </a> <a class="w3-text" href="phan14_interspeech.html"> <p> Acoustic event detection and localization with regression forests <br> <span class="w3-text w3-text-theme"> Huy Phan, Marco Maaß, Radoslaw Mazur, Alfred Mertins </span> </p> </a> <a class="w3-text" href="ferras14b_interspeech.html"> <p> Multi-source posteriors for speech activity detection on public talks <br> <span class="w3-text w3-text-theme"> Marc Ferràs, Hervé Bourlard </span> </p> </a> <a class="w3-text" href="dennis14_interspeech.html"> <p> Analysis of spectrogram image methods for sound event classification <br> <span class="w3-text w3-text-theme"> Jonathan Dennis, Huy Dat Tran, Eng Siong Chng </span> </p> </a> <a class="w3-text" href="satt14_interspeech.html"> <p> Speech-based automatic and robust detection of very early dementia <br> <span class="w3-text w3-text-theme"> Aharon Satt, Ron Hoory, Alexandra König, Pauline Aalten, Philippe H. Robert </span> </p> </a> <a class="w3-text" href="raboshchuk14_interspeech.html"> <p> On the acoustic environment of a neonatal intensive care unit: initial description, and detection of equipment alarms <br> <span class="w3-text w3-text-theme"> Ganna Raboshchuk, Climent Nadeu, Omid Ghahabi, Sergi Solvez, Blanca Muñoz Mahamud, Ana Riverola de Veciana, Santiago Navarro Hervas </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Language Acquisition"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Language Acquisition</h4> <hr> <a class="w3-text" href="fox14b_interspeech.html"> <p> Non-native perception of regionally accented speech in a multitalker context <br> <span class="w3-text w3-text-theme"> Robert Allen Fox, Ewa Jacewicz, Florence Hardjono </span> </p> </a> <a class="w3-text" href="turco14_interspeech.html"> <p> A crosslinguistic and acquisitional perspective on intonational rises in French <br> 
<span class="w3-text w3-text-theme"> Giuseppina Turco, Elisabeth Delais-Roussarie </span> </p> </a> <a class="w3-text" href="tu14b_interspeech.html"> <p> Error patterns of Mandarin disyllabic tones by Japanese learners <br> <span class="w3-text w3-text-theme"> Jung-Yueh Tu, Yuwen Hsiung, Min-Da Wu, Yao-Ting Sung </span> </p> </a> <a class="w3-text" href="leong14_interspeech.html"> <p> Infant-directed speech enhances temporal rhythmic structure in the envelope <br> <span class="w3-text w3-text-theme"> Victoria Leong, Marina Kalashnikova, Denis Burnham, Usha Goswami </span> </p> </a> <a class="w3-text" href="wewalaarachchi14_interspeech.html"> <p> Influences of tone sandhi on word recognition in preschool children <br> <span class="w3-text w3-text-theme"> Dilu Wewalaarachchi, Leher Singh </span> </p> </a> <a class="w3-text" href="goh14_interspeech.html"> <p> Lexical representation of consonant, vowels and tones in early childhood <br> <span class="w3-text w3-text-theme"> Hwee Hwee Goh, Charlene Hu, Kheng Hui Yeo, Leher Singh </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Perception"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech Perception</h4> <hr> <a class="w3-text" href="francisco14_interspeech.html"> <p> Audiovisual temporal sensitivity in typical and dyslexic adult readers <br> <span class="w3-text w3-text-theme"> Ana A. Francisco, Alexandra Jesse, Margriet A. Groen, James M. McQueen </span> </p> </a> <a class="w3-text" href="derrick14b_interspeech.html"> <p> Aero-tactile integration in fricatives: converting audio to air flow information for speech perception enhancement <br> <span class="w3-text w3-text-theme"> Donald Derrick, Greg A. 
O'Beirne, Tom De Rybel, Jennifer Hay </span> </p> </a> <a class="w3-text" href="mai14_interspeech.html"> <p> Relative importance of AM and FM cues for speech comprehension: effects of speaking rate and their implications for neurophysiological processing of speech <br> <span class="w3-text w3-text-theme"> Guangting Mai </span> </p> </a> <a class="w3-text" href="stringer14_interspeech.html"> <p> The effect of regional and non-native accents on word recognition processes: a comparison of EEG responses in quiet to speech recognition in noise <br> <span class="w3-text w3-text-theme"> Louise Stringer, Paul Iverson </span> </p> </a> <a class="w3-text" href="fong14_interspeech.html"> <p> Towards a neural measure of perceptual distance — classification of electroencephalographic responses to synthetic vowels <br> <span class="w3-text w3-text-theme"> Manson C. -M. Fong, James W. Minett, Thierry Blu, William S. -Y. Wang </span> </p> </a> <a class="w3-text" href="scharenborg14_interspeech.html"> <p> Collecting a corpus of Dutch noise-induced `slips of the ear' <br> <span class="w3-text w3-text-theme"> Odette Scharenborg, Eric Sanders, Bert Cranen </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Language and Lexical Modeling"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Language and Lexical Modeling</h4> <hr> <a class="w3-text" href="hanai14_interspeech.html"> <p> Lexical modeling for Arabic ASR: a systematic approach <br> <span class="w3-text w3-text-theme"> Tuka Al Hanai, James R. 
Glass </span> </p> </a> <a class="w3-text" href="orosanu14_interspeech.html"> <p> Hybrid language models for speech transcription <br> <span class="w3-text w3-text-theme"> Luiza Orosanu, Denis Jouvet </span> </p> </a> <a class="w3-text" href="gandhe14_interspeech.html"> <p> Neural network language models for low resource languages <br> <span class="w3-text w3-text-theme"> Ankur Gandhe, Florian Metze, Ian Lane </span> </p> </a> <a class="w3-text" href="gangireddy14_interspeech.html"> <p> Feed forward pre-training for recurrent neural network language models <br> <span class="w3-text w3-text-theme"> Siva Reddy Gangireddy, Fergus McInnes, Steve Renals </span> </p> </a> <a class="w3-text" href="roy14_interspeech.html"> <p> Grounding language models in spatiotemporal context <br> <span class="w3-text w3-text-theme"> Brandon C. Roy, Soroush Vosoughi, Deb Roy </span> </p> </a> <a class="w3-text" href="jalalvand14_interspeech.html"> <p> Direct word graph rescoring using A* search and RNNLM <br> <span class="w3-text w3-text-theme"> Shahab Jalalvand, Daniele Falavigna </span> </p> </a> <a class="w3-text" href="chelba14_interspeech.html"> <p> One billion word benchmark for measuring progress in statistical language modeling <br> <span class="w3-text w3-text-theme"> Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Philipp Koehn, Tony Robinson </span> </p> </a> <a class="w3-text" href="schnall14_interspeech.html"> <p> Integrating sequence information in the audio-visual detection of word prominence in a human-machine interaction scenario <br> <span class="w3-text w3-text-theme"> Andrea Schnall, Martin Heckmann </span> </p> </a> <a class="w3-text" href="biadsy14_interspeech.html"> <p> Backoff inspired features for maximum entropy language models <br> <span class="w3-text w3-text-theme"> Fadi Biadsy, Keith Hall, Pedro J.
Moreno, Brian Roark </span> </p> </a> <a class="w3-text" href="telaar14_interspeech.html"> <p> BioKIT — real-time decoder for biosignal processing <br> <span class="w3-text w3-text-theme"> Dominic Telaar, Michael Wand, Dirk Gehrig, Felix Putze, Christoph Amma, Dominic Heger, Ngoc Thang Vu, Mark Erhardt, Tim Schlippe, Matthias Janke, Christian Herff, Tanja Schultz </span> </p> </a> <a class="w3-text" href="harwath14b_interspeech.html"> <p> Speech recognition without a lexicon — bridging the gap between graphemic and phonetic systems <br> <span class="w3-text w3-text-theme"> David Harwath, James R. Glass </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Enhancement (Single- and Multi-Channel) 1, 2"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech Enhancement (Single- and Multi-Channel) 1, 2</h4> <hr> <a class="w3-text" href="zhao14b_interspeech.html"> <p> A new auxiliary-vector algorithm with conjugate orthogonality for speech enhancement <br> <span class="w3-text w3-text-theme"> Shengkui Zhao, Douglas L. 
Jones </span> </p> </a> <a class="w3-text" href="jathar14_interspeech.html"> <p> Acoustic characteristics of critical message utterances in noise applied to speech intelligibility enhancement <br> <span class="w3-text w3-text-theme"> Neehar Jathar, Preeti Rao </span> </p> </a> <a class="w3-text" href="xu14f_interspeech.html"> <p> Dynamic noise aware training for speech enhancement based on deep neural networks <br> <span class="w3-text w3-text-theme"> Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee </span> </p> </a> <a class="w3-text" href="pertila14_interspeech.html"> <p> Microphone array post-filtering using supervised machine learning for speech enhancement <br> <span class="w3-text w3-text-theme"> Pasi Pertilä, Joonas Nikunen </span> </p> </a> <a class="w3-text" href="mani14_interspeech.html"> <p> Novel speech duration modifier for packet based communication system <br> <span class="w3-text w3-text-theme"> Senthil Kumar Mani, Jitendra Kumar Dhiman, K. Sri Rama Murty </span> </p> </a> <a class="w3-text" href="liu14g_interspeech.html"> <p> Experiments on deep learning for speech denoising <br> <span class="w3-text w3-text-theme"> Ding Liu, Paris Smaragdis, Minje Kim </span> </p> </a> <a class="w3-text" href="mohammadiha14_interspeech.html"> <p> Single-channel dynamic exemplar-based speech enhancement <br> <span class="w3-text w3-text-theme"> Nasser Mohammadiha, Simon Doclo </span> </p> </a> <a class="w3-text" href="kato14_interspeech.html"> <p> Using hidden Markov models for speech enhancement <br> <span class="w3-text w3-text-theme"> Akihiro Kato, Ben Milner </span> </p> </a> <a class="w3-text" href="pfeifenberger14_interspeech.html"> <p> Blind source extraction based on a direction-dependent a-priori SNR <br> <span class="w3-text w3-text-theme"> Lukas Pfeifenberger, Franz Pernkopf </span> </p> </a> <a class="w3-text" href="chacon14_interspeech.html"> <p> Least squares phase estimation of mixed signals <br> <span class="w3-text w3-text-theme"> Carlos Eduardo Cancino 
Chacón, Pejman Mowlaee </span> </p> </a> <a class="w3-text" href="ming14_interspeech.html"> <p> Speech enhancement from additive noise and channel distortion — a corpus-based approach <br> <span class="w3-text w3-text-theme"> Ji Ming, Danny Crookes </span> </p> </a> <a class="w3-text" href="zhou14b_interspeech.html"> <p> Multi-channel speech enhancement using sparse coding on local time-frequency structures <br> <span class="w3-text w3-text-theme"> Zhiyuan Zhou, Zhaogui Ding, Weifeng Li, Zhiyong Wu, Longbiao Wang, Qingmin Liao </span> </p> </a> <a class="w3-text" href="mirsamadi14_interspeech.html"> <p> Multichannel speech dereverberation based on convolutive nonnegative tensor factorization for ASR applications <br> <span class="w3-text w3-text-theme"> Seyedmahdad Mirsamadi, John H. L. Hansen </span> </p> </a> <a class="w3-text" href="chen14m_interspeech.html"> <p> Speech enhancement by low-rank and convolutive dictionary spectrogram decomposition <br> <span class="w3-text w3-text-theme"> Zhuo Chen, Brian McFee, Daniel P. W. 
Ellis </span> </p> </a> <a class="w3-text" href="jaureguiberry14_interspeech.html"> <p> Multiple-order non-negative matrix factorization for speech enhancement <br> <span class="w3-text w3-text-theme"> Xabier Jaureguiberry, Emmanuel Vincent, Gaël Richard </span> </p> </a> <a class="w3-text" href="kang14b_interspeech.html"> <p> NMF-based speech enhancement incorporating deep neural network <br> <span class="w3-text w3-text-theme"> Tae Gyoon Kang, Kisoo Kwon, Jong Won Shin, Nam Soo Kim </span> </p> </a> <a class="w3-text" href="sonowal14_interspeech.html"> <p> A data-driven approach to speech enhancement using Gaussian process <br> <span class="w3-text w3-text-theme"> Sukanya Sonowal, Kisoo Kwon, Nam Soo Kim, Jong Won Shin </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Speech Coding and Transmission"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Speech Coding and Transmission</h4> <hr> <a class="w3-text" href="backstrom14_interspeech.html"> <p> Decorrelated innovative codebooks for ACELP using factorization of autocorrelation matrix <br> <span class="w3-text w3-text-theme"> Tom Bäckström, Christian R. Helmrich </span> </p> </a> <a class="w3-text" href="cernak14_interspeech.html"> <p> Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding <br> <span class="w3-text w3-text-theme"> Milos Cernak, Alexandros Lazaridis, Philip N. 
Garner, Petr Motlicek </span> </p> </a> <a class="w3-text" href="pulakka14_interspeech.html"> <p> Subjective voice quality evaluation of artificial bandwidth extension: comparing different audio bandwidths and speech codecs <br> <span class="w3-text w3-text-theme"> Hannu Pulakka, Anssi Rämö, Ville Myllylä, Henri Toukomaa, Paavo Alku </span> </p> </a> <a class="w3-text" href="fu14b_interspeech.html"> <p> Stereo acoustic echo suppression using widely linear filtering in the frequency domain <br> <span class="w3-text w3-text-theme"> Zhong-Hua Fu, Lei Xie </span> </p> </a> <a class="w3-text" href="lee14e_interspeech.html"> <p> Enhanced muting method in packet loss concealment of ITU-T G.722 using sigmoid function with on-line optimized parameters <br> <span class="w3-text w3-text-theme"> Bong-Ki Lee, Inyoung Hwang, Jihwan Park, Joon-Hyuk Chang </span> </p> </a> <a class="w3-text" href="wu14c_interspeech.html"> <p> A robust step-size control algorithm for frequency domain acoustic echo cancellation <br> <span class="w3-text w3-text-theme"> Chao Wu, Kaiyu Jiang, Yanmeng Guo, Qiang Fu, Yonghong Yan </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Unsupervised or Corrective Lexical Modeling"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Unsupervised or Corrective Lexical Modeling</h4> <hr> <a class="w3-text" href="byambakhishig14_interspeech.html"> <p> Error correction of automatic speech recognition based on normalized web distance <br> <span class="w3-text w3-text-theme"> E. Byambakhishig, K.
Tanaka, Ryo Aihara, Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki </span> </p> </a> <a class="w3-text" href="dikici14_interspeech.html"> <p> Unsupervised training methods for discriminative language modeling <br> <span class="w3-text w3-text-theme"> Erinç Dikici, Murat Saraçlar </span> </p> </a> <a class="w3-text" href="qin14_interspeech.html"> <p> Building a vocabulary self-learning speech recognition system <br> <span class="w3-text w3-text-theme"> Long Qin, Alexander I. Rudnicky </span> </p> </a> <a class="w3-text" href="schlippe14_interspeech.html"> <p> Methods for efficient semi-automatic pronunciation dictionary bootstrapping <br> <span class="w3-text w3-text-theme"> Tim Schlippe, Matthias Merz, Tanja Schultz </span> </p> </a> <a class="w3-text" href="akbacak14_interspeech.html"> <p> Rapidly building domain-specific entity-centric language models using semantic web knowledge sources <br> <span class="w3-text w3-text-theme"> Murat Akbacak, Dilek Hakkani-Tür, Gokhan Tur </span> </p> </a> <a class="w3-text" href="lee14f_interspeech.html"> <p> Context-dependent pronunciation error pattern discovery with limited annotations <br> <span class="w3-text w3-text-theme"> Ann Lee, James R. 
Glass </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Meta Data"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Meta Data</h4> <hr> <a class="w3-text" href="sapru14_interspeech.html"> <p> Detecting speaker roles and topic changes in multiparty conversations using latent topic models <br> <span class="w3-text w3-text-theme"> Ashtosh Sapru, Hervé Bourlard </span> </p> </a> <a class="w3-text" href="xu14g_interspeech.html"> <p> A deep neural network approach for sentence boundary detection in broadcast news <br> <span class="w3-text w3-text-theme"> Chenglin Xu, Lei Xie, Guangpu Huang, Xiong Xiao, Eng Siong Chng, Haizhou Li </span> </p> </a> <a class="w3-text" href="gupta14b_interspeech.html"> <p> Variable Span disfluency detection in ASR transcripts <br> <span class="w3-text w3-text-theme"> Rahul Gupta, Sankaranarayanan Ananthakrishnan, Zhaojun Yang, Shrikanth S. Narayanan </span> </p> </a> <a class="w3-text" href="dutrey14_interspeech.html"> <p> A CRF-based approach to automatic disfluency detection in a French call-centre corpus <br> <span class="w3-text w3-text-theme"> Camille Dutrey, Chloé Clavel, Sophie Rosset, Ioana Vasilescu, Martine Adda-Decker </span> </p> </a> <a class="w3-text" href="hasan14_interspeech.html"> <p> Multi-pass sentence-end detection of lecture speech <br> <span class="w3-text w3-text-theme"> Madina Hasan, Rama Doddipatla, Thomas Hain </span> </p> </a> <a class="w3-text" href="zayats14_interspeech.html"> <p> Multi-domain disfluency and repair detection <br> <span class="w3-text w3-text-theme"> Victoria Zayats, Mari Ostendorf, Hannaneh Hajishirzi </span> </p> </a> </div> </div> <br> <div class="w3-content" style="height:10px" id="Language Recognition"></div> <div class="w3-card w3-round w3-white w3-padding"> <div class="w3-container" style="margin-top:40px"> <h4 class="w3-center">Language Recognition</h4> <hr> <a 
class="w3-text" href="jiang14b_interspeech.html"> <p> Task-aware deep bottleneck features for spoken language identification <br> <span class="w3-text w3-text-theme"> Bing Jiang, Yan Song, Si Wei, Ian Vince McLoughlin, Li-Rong Dai </span> </p> </a> <a class="w3-text" href="tong14_interspeech.html"> <p> Virtual example for phonotactic language recognition <br> <span class="w3-text w3-text-theme"> Rong Tong, Bin Ma, Haizhou Li </span> </p> </a> <a class="w3-text" href="liu14h_interspeech.html"> <p> Phonotactic language recognition based on time-gap-weighted lattice kernels <br> <span class="w3-text w3-text-theme"> Wei-Wei Liu, Wei-Qiang Zhang, Jia Liu </span> </p> </a> <a class="w3-text" href="segbroeck14b_interspeech.html"> <p> UBM fused total variability modeling for language identification <br> <span class="w3-text w3-text-theme"> Maarten van Segbroeck, Ruchir Travadi, Shrikanth S. Narayanan </span> </p> </a> <a class="w3-text" href="diez14b_interspeech.html"> <p> On the complementarity of short-time Fourier analysis windows of different lengths for improved language recognition <br> <span class="w3-text w3-text-theme"> Mireia Diez, Mikel Penagarikano, German Bordel, Amparo Varona, Luis Javier Rodriguez-Fuentes </span> </p> </a> <a class="w3-text" href="travadi14_interspeech.html"> <p> Modified-prior i-vector estimation for language identification of short duration utterances <br> <span class="w3-text w3-text-theme"> Ruchir Travadi, Maarten Van Segbroeck, Shrikanth S.
Narayanan </span> </p> </a> <a class="w3-text" href="dharo14_interspeech.html"> <p> Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers <br> <span class="w3-text w3-text-theme"> Luis Fernando D'Haro, Ricardo Cordoba, Christian Salamea, Javier Ferreiros </span> </p> </a> <a class="w3-text" href="plchot14_interspeech.html"> <p> PLLR features in language recognition system for RATS <br> <span class="w3-text w3-text-theme"> Oldřich Plchot, Mireia Diez, Mehdi Soufifar, Lukáš Burget </span> </p> </a> <a class="w3-text" href="yeong14_interspeech.html"> <p> Language identification of code switching sentences and multilingual sentences of under-resourced languages by using multi structural word information <br> <span class="w3-text w3-text-theme"> Yin-Lai Yeong, Tien-Ping Tan </span> </p> </a> </div> </div> <br> </div> </div> <!-- Paper search table --> <div class="w3-container" id="bypaper"> <div class="w3-content" style="max-width:1200px;margin-top:60px"> <div class="w3-container w3-card w3-padding w3-white"> <div class="w3-text w3-center"> <span class='w3-large'> <b>Search papers</b> </span> <button class='w3-text w3-button w3-right' onclick="document.getElementById('help_papers').style.display='block'"> <i class='icon-question-circle'></i> </button> </div> <table id="paper_table" class="display" style="width:95%"> <thead> <tr> <th width="100%">Article</th> <th width="0%"></th> <th width="0%"></th> <th width="0%"></th> </tr> </thead> </table> </div> <!-- <p class="w3-small" style="margin-bottom: 50px"></p> --> </div> </div> </div> <!-- Session chooser --> <div id="sessionchooser" class="w3-modal" > <div class="w3-modal-content w3-card-4 w3-greyscale w3-theme-d4 w3-padding w3-bordered" onclick="document.getElementById('sessionchooser').style.display='none'"> <span onclick="document.getElementById('sessionchooser').style.display='none'" class="w3-button w3-display-topright">×</span> <p><a class="w3-text"
href="#Keynote">Keynote</a></p> <p><a class="w3-text" href="#Multi-Lingual ASR">Multi-Lingual ASR</a></p> <p><a class="w3-text" href="#Prosody Processing">Prosody Processing</a></p> <p><a class="w3-text" href="#Speaker Recognition — Applications">Speaker Recognition — Applications</a></p> <p><a class="w3-text" href="#Phonetics and Phonology 1, 2">Phonetics and Phonology 1, 2</a></p> <p><a class="w3-text" href="#Open Domain Situated Conversational Interaction (Special Session)">Open Domain Situated Conversational Interaction (Special Session)</a></p> <p><a class="w3-text" href="#Speech Production: Models and Acoustics">Speech Production: Models and Acoustics</a></p> <p><a class="w3-text" href="#Extraction of Para-Linguistic Information">Extraction of Para-Linguistic Information</a></p> <p><a class="w3-text" href="#Spoken Language Understanding">Spoken Language Understanding</a></p> <p><a class="w3-text" href="#Spoken Dialogue Systems">Spoken Dialogue Systems</a></p> <p><a class="w3-text" href="#DNN Architectures and Robust Recognition">DNN Architectures and Robust Recognition</a></p> <p><a class="w3-text" href="#Speaker Recognition — Evaluation and Forensics">Speaker Recognition — Evaluation and Forensics</a></p> <p><a class="w3-text" href="#Speech Production I, II">Speech Production I, II</a></p> <p><a class="w3-text" href="#INTERSPEECH 2014 Computational Paralinguistics ChallengE (ComParE)">INTERSPEECH 2014 Computational Paralinguistics ChallengE (ComParE)</a></p> <p><a class="w3-text" href="#Hearing and Perception">Hearing and Perception</a></p> <p><a class="w3-text" href="#Cross-Linguistic Studies">Cross-Linguistic Studies</a></p> <p><a class="w3-text" href="#Speaker Diarization">Speaker Diarization</a></p> <p><a class="w3-text" href="#Robust ASR 1, 2">Robust ASR 1, 2</a></p> <p><a class="w3-text" href="#Implementation of Language Model Algorithms">Implementation of Language Model Algorithms</a></p> <p><a class="w3-text" href="#Speaker Recognition — Noise and 
Channel Robustness">Speaker Recognition — Noise and Channel Robustness</a></p> <p><a class="w3-text" href="#Speech Synthesis I-III">Speech Synthesis I-III</a></p> <p><a class="w3-text" href="#Multi-Lingual Cross-Lingual and Low-Resource ASR">Multi-Lingual Cross-Lingual and Low-Resource ASR</a></p> <p><a class="w3-text" href="#Speech Estimation and Sound Source Separation">Speech Estimation and Sound Source Separation</a></p> <p><a class="w3-text" href="#Feature Extraction and Modeling for ASR 1, 2">Feature Extraction and Modeling for ASR 1, 2</a></p> <p><a class="w3-text" href="#Speech Analysis I, II">Speech Analysis I, II</a></p> <p><a class="w3-text" href="#Speech Technologies and Applications">Speech Technologies and Applications</a></p> <p><a class="w3-text" href="#Source Separation and Computational Auditory Scene Analysis">Source Separation and Computational Auditory Scene Analysis</a></p> <p><a class="w3-text" href="#Speech Technologies for Ambient Assisted Living (Special Session)">Speech Technologies for Ambient Assisted Living (Special Session)</a></p> <p><a class="w3-text" href="#DNN for ASR">DNN for ASR</a></p> <p><a class="w3-text" href="#Speaker Recognition — General Topics">Speaker Recognition — General Topics</a></p> <p><a class="w3-text" href="#Speech Processing with Multi-Modalities">Speech Processing with Multi-Modalities</a></p> <p><a class="w3-text" href="#Normalization and Discriminative Training Methods">Normalization and Discriminative Training Methods</a></p> <p><a class="w3-text" href="#Paralinguistic and Extralinguistic Information">Paralinguistic and Extralinguistic Information</a></p> <p><a class="w3-text" href="#Text Processing for Speech Synthesis">Text Processing for Speech Synthesis</a></p> <p><a class="w3-text" href="#Cross-language Perception and Production">Cross-language Perception and Production</a></p> <p><a class="w3-text" href="#Text-Dependent Speaker Verification With Short Utterances (Special">Text-Dependent Speaker 
Verification With Short Utterances (Special</a></p> <p><a class="w3-text" href="#Speech and Audio Analysis">Speech and Audio Analysis</a></p> <p><a class="w3-text" href="#Cross-Lingual and Adaptive Language Modeling">Cross-Lingual and Adaptive Language Modeling</a></p> <p><a class="w3-text" href="#Pronunciation Modeling and Learning">Pronunciation Modeling and Learning</a></p> <p><a class="w3-text" href="#Show and Tell Session 1, 1">Show and Tell Session 1, 1</a></p> <p><a class="w3-text" href="#Statistical Parametric Speech Synthesis">Statistical Parametric Speech Synthesis</a></p> <p><a class="w3-text" href="#Voice Activity Detection">Voice Activity Detection</a></p> <p><a class="w3-text" href="#Disordered Speech">Disordered Speech</a></p> <p><a class="w3-text" href="#Speech and Multimodal Resources">Speech and Multimodal Resources</a></p> <p><a class="w3-text" href="#Phase Importance in Speech Processing Applications (Special Session)">Phase Importance in Speech Processing Applications (Special Session)</a></p> <p><a class="w3-text" href="#Spoken Term Detection and Document Retrieval">Spoken Term Detection and Document Retrieval</a></p> <p><a class="w3-text" href="#Prosody and Paralinguistic Information">Prosody and Paralinguistic Information</a></p> <p><a class="w3-text" href="#Features and Robustness in Speaker and Language Recognition">Features and Robustness in Speaker and Language Recognition</a></p> <p><a class="w3-text" href="#Topic Spotting and Summarization of Spoken Documents">Topic Spotting and Summarization of Spoken Documents</a></p> <p><a class="w3-text" href="#DNN Learning">DNN Learning</a></p> <p><a class="w3-text" href="#Perception of Emotion and Prosody">Perception of Emotion and Prosody</a></p> <p><a class="w3-text" href="#Deep Neural Networks for Speech Generation and Synthesis (Special">Deep Neural Networks for Speech Generation and Synthesis (Special</a></p> <p><a class="w3-text" href="#Speech Analysis and Perception">Speech Analysis and 
Perception</a></p> <p><a class="w3-text" href="#Intelligibility Enhancement and Predictive Measures">Intelligibility Enhancement and Predictive Measures</a></p> <p><a class="w3-text" href="#Speech and Language Processing — General Topics">Speech and Language Processing — General Topics</a></p> <p><a class="w3-text" href="#Language, Dialect and Accent Recognition">Language, Dialect and Accent Recognition</a></p> <p><a class="w3-text" href="#Adaptation 1, 2">Adaptation 1, 2</a></p> <p><a class="w3-text" href="#Speaker Localization">Speaker Localization</a></p> <p><a class="w3-text" href="#Speech Representation, Detection and Classification">Speech Representation, Detection and Classification</a></p> <p><a class="w3-text" href="#Spoken Term Detection for Low-Resource Languages I, II">Spoken Term Detection for Low-Resource Languages I, II</a></p> <p><a class="w3-text" href="#Voice Conversion">Voice Conversion</a></p> <p><a class="w3-text" href="#Speech and Audio Segmentation and Classification">Speech and Audio Segmentation and Classification</a></p> <p><a class="w3-text" href="#Language Acquisition">Language Acquisition</a></p> <p><a class="w3-text" href="#Speech Perception">Speech Perception</a></p> <p><a class="w3-text" href="#Language and Lexical Modeling">Language and Lexical Modeling</a></p> <p><a class="w3-text" href="#Speech Enhancement (Single- and Multi-Channel) 1, 2">Speech Enhancement (Single- and Multi-Channel) 1, 2</a></p> <p><a class="w3-text" href="#Speech Coding and Transmission">Speech Coding and Transmission</a></p> <p><a class="w3-text" href="#Unsupervised or Corrective Lexical Modeling">Unsupervised or Corrective Lexical Modeling</a></p> <p><a class="w3-text" href="#Meta Data">Meta Data</a></p> <p><a class="w3-text" href="#Language Recognition">Language Recognition</a></p> </div> </div> <script> function myFunction() { var x = document.getElementById("smallnav"); if (x.className.indexOf("w3-show") == -1) { x.className += " w3-show"; } else { 
x.className = x.className.replace(" w3-show", ""); } } // Get the modal var modal = document.getElementById('sessionchooser'); // When the user clicks anywhere outside of the modal, close it window.onclick = function(event) { if (event.target == modal) { modal.style.display = "none"; } } $(document).ready(function() { $('#paper_table').DataTable( { data: [['Anne Cutler', 'Learning about speech', 'cutler14_interspeech', 'life language learn birth acquire later efficiency child inhibits pronunciation'], ['K. J. Ray Liu', 'Decision learning in data science: where John Nash meets social media', 'liu14_interspeech', 'user big user-generated strategic analyze local machine notion growing outcome'], ['Lori Lamel', 'Language diversity: speech processing in a multi-lingual context', 'lamel14_interspeech', 'technology spoken said downstream program acoustic-phonetics variety quaero transcription analytics'], ['William S.-Y. Wang', 'Sound patterns in language', 'wang14_interspeech', 'specie organization world tone faculty infrastructure mutually biological comment ease'], ['Li Deng', 'Achievements and challenges of deep learning — from speech analysis and recognition to language and multimodal processing', 'deng14_interspeech', 'vision frontier big machine knowledge part artificial community neural success'], ['Yu Zhang, Ekapol Chuangsuwanich, James R. 
Glass', 'Language ID-based training of multilingual stacked bottleneck features', 'zhang14_interspeech', 'similar source iarpa-babel feature-level available pick target trained sharing dnn'], ['Van Hai Do, Xiong Xiao, Eng Siong Chng, Haizhou Li', 'Kernel density-based acoustic model with cross-lingual bottleneck features for resource limited LVCSR', 'do14_interspeech', 'density dnn gmm non-parametric emission street speech wall journal reliably'], ['Ngoc Thang Vu, Yuanfan Wang, Marten Klose, Zlatka Mihaylova, Tanja Schultz', 'Improving ASR performance on non-native speech using multilingual and crosslingual information', 'vu14_interspeech', 'chinese german accent bulgarian indian english adaptation globalphone lingual case'], ['Kate M. Knill, Mark J. F. Gales, Anton Ragni, Shakti P. Rath', 'Language independent and unsupervised acoustic models for speech recognition and keyword spotting', 'knill14_interspeech', 'multi-language dependent haitian creole training babel resource zero data target'], ['Peter Bell, Joris Driesen, Steve Renals', 'Cross-lingual adaptation with multi-task adaptive networks', 'bell14_interspeech', 'applied language target dnn network posterior-based parliament multitask ted data'], ['Marzieh Razavi, Mathew Magimai Doss', 'On recognition of non-native speech using probabilistic lexical model', 'razavi14_interspeech', 'kl-hmm ann precisely context-dependent unit asr uttered framework hybrid pronunciation'], ['Kou Tanaka, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura', 'Direct F<SUB>0</SUB> control of an electrolarynx based on statistical excitation feature prediction and its evaluation through simulation', 'tanaka14_interspeech', 'speech laryngectomees face-to-face produced enhanced electrolaryngeal enhancement naturalness laryngectomee device'], ['Daniel R. 
van Niekerk, Etienne Barnard', 'A target approximation intonation model for Yorùbá TTS', 'niekerk14_interspeech', 'hts analytically analytical under-resourced encouraging perceptually quantitative complete efficiently typical'], ['Anandaswarup Vadapalli, Kishore Prahallad', 'Learning continuous-valued word representations for phrase break prediction', 'vadapalli14_interspeech', 'valued discrete po tag continuous fixing lsa specific modeling induce'], ['Hao Che, Jianhua Tao, Ya Li', 'Improving Mandarin prosodic boundary prediction with rich syntactic features', 'che14_interspeech', 'phrase dependency benefited in-depth usefulness labeling indicated yet scale performance'], ['Rasmus Dall, Marcus Tomalin, Mirjam Wester, William Byrne, Simon King', 'Investigating automatic & human filled pause insertion for speech synthesis', 'dall14_interspeech', 'insert seek fluent right practice actual system several people conversational'], ['Rasmus Dall, Mirjam Wester, Martin Corley', 'The effect of filled pauses and speaking rate on speech comprehension in natural, vocoded and synthetic speech', 'dall14b_interspeech', 'reaction slower pause listener time silent synthesis target experiment following'], ['Elie Khoury, Tomi Kinnunen, Aleksandr Sizov, Zhizheng Wu, Sébastien Marcel', 'Introducing i-vectors for joint anti-spoofing and speaker verification', 'khoury14_interspeech', 'attack asv spoofing countermeasure conversion provoke unless stand-alone proof replay'], ['Ryan Leary, Walter Andrews', 'Random projections for large-scale speaker search', 'leary14_interspeech', 'vector miss space returning distance hashing derived locality feature reduction'], ['Corinne Fredouille, Delphine Charlet', 'Analysis of i-vector framework for speaker identification in TV-shows', 'fredouille14_interspeech', 'monomodal repere joint task challenge verification i-vector-based win like invited'], ['Antoine Laurent, Nathalie Camelin, Christian Raymond', 'Boosting bonsai trees for efficient features 
combination: application to speaker role identification', 'laurent14_interspeech', 'decision weak learning learner algorithm machine mention denoted many coming'], ['Yves Raimond, Thomas Nixon', 'Identifying contributors in the BBC World Service archive', 'raimond14_interspeech', 'identity speaker publishing hashing locality propagate crowdsourced crowdsourcing programme refine'], ['Finnian Kelly, Rahim Saeidi, Naomi Harte, David A. van Leeuwen', 'Effect of long-term ageing on i-vector speaker verification', 'kelly14_interspeech', 'age gmm-ubm difference increase degrades system absolute trinity impact dublin'], ['Maarten Versteegh, Amanda Seidl, Alejandrina Cristia', 'Acoustic correlates of phonological status', 'versteegh14_interspeech', 'tenseness divergence nasality french allophonic larger measurement english contrast sound'], ['Manu Airaksinen, Paavo Alku', 'Parameterization of the glottal source with the phase plane plot', 'airaksinen14_interspeech', 'naq pps normalized flow represented parameter graphically tie symmetry spanned'], ['Phil Rose', "Transcribing tone — a likelihood-based quantitative evaluation of Chao's tone letters", 'rose14_interspeech', 'tonal equidistant five-point lying integer conform pitch reflection target quantitatively'], ['Diyana Hamzah, James Sneed German', 'Intonational phonology and prosodic hierarchy in Malay', 'hamzah14_interspeech', 'edge phrase bear interview right story-telling tone beckman singapore pierrehumbert'], ['Uwe D. 
Reichel, Katalin Mády', 'Comparing parameterizations of pitch register and its discontinuities at prosodic boundaries for Hungarian', 'reichel14_interspeech', 'topline stylization strength pool boundary base valley fitted declination turned'], ['George Christodoulides, Mathieu Avanzi', 'An evaluation of machine learning methods for prominence detection in French', 'christodoulides14_interspeech', 'prominent syllable automatic demarcation different prosodic sex analysing regional grouping'], ['Gang Chen, Soo Jin Park, Jody Kreiman, Abeer Alwan', 'Investigating the effect of F0 and vocal intensity on harmonic magnitudes: data from high-speed laryngeal videoendoscopy', 'chen14_interspeech', 'glottal assumed skewness quotient decreasing pulse measure decrease covary increasing'], ['Elisabeth Delais-Roussarie, Damien Lolive, Hiyon Yoo, Nelly Barbot, Olivier Rosec', 'Adapting prosodic chunking algorithm and synthesis system to specific style: the case of dictation', 'delaisroussarie14_interspeech', 'chunk speaking realization tts-system confronting conformity preliminary module elaboration pupil'], ['Jae-Hyun Sung', 'The articulation of lexical and post-lexical palatalization in Korean', 'sung14_interspeech', 'morpheme tongue ultrasound boundary morphological gesture intervenes evidence mounting consonant'], ['Diana Archangeli, Samuel Johnston, Jae-Hyun Sung, Muriel Fisher, Michael Hammond, Andrew Carnie', 'Articulation and neutralization: a preliminary study of lenition in Scottish Gaelic', 'archangeli14_interspeech', 'sound mutation merging distinct neutralizing neutralisation idiosyncratic distinguishes aspiration ultrasound'], ['Kanae Amino, Hisanori Makinae, Tatsuya Kitamura', 'Nasality in speech and its contribution to speaker individuality', 'amino14_interspeech', 'nasal phonemic incidental similarity characterises existed speaker-related sound assimilation type'], ['Jason Brown, Eden Matene', 'Is speech rhythm an intrinsic property of language?', 
'brown14_interspeech', 'rhythmic byproduct phonotactic implicitly accepted linked traditionally profile fit inherent'], ['Anke Jackschina, Barbara Schuppler, Rudolf Muhr', 'Where /ar/ the /r/s in Standard Austrian German?', 'jackschina14_interspeech', 'realization absent tends read reduced trill word part inform stem'], ['Fang Hu, Minghui Zhang', 'Diphthongized vowels in the Yi County Hui Chinese dialect', 'hu14_interspeech', 'diphthong monophthongs neutralized target offset distinctive falling rising intermediate onset'], ['Volker Dellwo, Peggy Mok, Mathias Jenny', 'Rhythmic variability between some Asian languages: results from an automatic analysis of temporal characteristics', 'dellwo14_interspeech', 'amplitude thai retrieved cantonese organization interval envelope vary peak mandarin'], ['Angelika Braun, Daniela Decker', 'Listener estimation of speaker age based on whispered speech', 'braun14_interspeech', 'assessed disguise phonated whispering group forensic belonging rest laryngeal chance'], ['Benjawan Kasisopa, Virginie Attina, Denis Burnham', 'The Lombard effect with Thai lexical tones: an acoustic analysis of articulatory modifications in noise', 'kasisopa14_interspeech', 'contour speech heightened tone exaggerated citation quiet realisation loudness modify'], ['Aasish Pappu, Alexander I. 
Rudnicky', 'Learning situated knowledge bases through dialog', 'pappu14_interspeech', 'commonsense base user augment domain query information solicits freebase seminar'], ['Teruhisa Misu', 'Crowdsourcing for situated dialog systems in a moving car', 'misu14_interspeech', 'query collected user similarity surroundings real smartphones prompting worker crowd'], ['Ryuichiro Higashinaka, Toyomi Meguro, Kenji Imamura, Hiroaki Sugiyama, Toshiro Makino, Yoshihiro Matsuo', 'Evaluating coherence in open domain conversational systems', 'higashinaka14_interspeech', 'utterance generated dialogue incoherent breakdown ascertain possible user distinguishes avoided'], ['Frederic Bechet, Alexis Nasr, Benoit Favre', 'Adapting dependency parsing to spontaneous speech for open domain spoken language understanding', 'bechet14_interspeech', 'syntactic conversation message non-canonical semantic parser annotation framenet call-centre superfluous'], ['M. Gašić, Dongho Kim, Pirros Tsiakoulis, Catherine Breslin, Matthew Henderson, M. Szummer, B. Thomson, Steve Young', 'Incremental on-line adaptation of POMDP-based dialogue managers to extended domains', 'gasic14_interspeech', 'concept manager domain previously new gaussian accommodated recursively repeatedly restaurant'], ['Jean-Philippe Robichaud, Paul A. Crook, Puyang Xu, Omar Zia Khan, Ruhi Sarikaya', 'Hypotheses ranking for robust domain classification and tracking in dialogue systems', 'robichaud14_interspeech', 'slu ranker multi-turn upfront non-contextual gbdt detection multi-domain boosted relative'], ['Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. 
Narayanan', 'Motor control primitives arising from a learned dynamical systems model of speech articulation', 'ramanarayanan14_interspeech', 'articulatory movement produce derive dictionary input optimal envision stochastically convolutive'], ['Chia-Hsin Yeh, Chiung-Yao Wang, Jung-Yueh Tu', 'Nonword repetition of Taiwanese disyllabic tonal sequences in adults with language attrition', 'yeh14_interspeech', 'tone mid-level low-rising high-falling accuracy american learner picture-naming participant low-falling'], ['Andreas Windmann, Juraj Šimko, Petra Wagner', 'A unified account of prominence effects in an optimization-based model of speech timing', 'windmann14_interspeech', 'pitch-accented reproduces underline non-prominent polysyllabic shortening replicate differential suprasegmental durational'], ['Jangwon Kim, Sungbok Lee, Shrikanth S. Narayanan', 'Estimation of the movement trajectories of non-crucial articulators based on the detection of crucial moments and physiological constraints', 'kim14_interspeech', 'articulatory forced-alignment speech governed production realizing nearest develops mathematical linguistically'], ['Prasad Sudhakar, Prasanta Kumar Ghosh', 'Sparse smoothing of articulatory features from Gaussian mixture model based acoustic-to-articulatory inversion: benefit to speech recognition', 'sudhakar14_interspeech', 'aai low-pass mmse estimate optimality postprocessing using gmm accuracy without'], ['Jun Wang, William Katz, Thomas F. 
Campbell', 'Contribution of tongue lateral to consonant production', 'wang14b_interspeech', 'lip sensor movement body articulator tip attached upper side motion'], ['Min Liu, Shuju Shi, Jinsong Zhang', 'A preliminary study on acoustic correlates of tone2+tone2 disyllabic word stress in Mandarin', 'liu14b_interspeech', 'initial-stressed nucleus difference boundary prosodic level segment rising duration enlargement'], ['Mohammad Abuoudeh, Olivier Crouzet', 'Vowel length impact on locus equation parameters: an investigation on Jordanian Arabic', 'abuoudeh14_interspeech', 'change though place articulation duration-related consonant intercept contrast role temporal'], ['Philip J. Roberts, Henning Reetz, Aditi Lahiri', 'Corpus-testing a fricative discriminator; or, just how invariant is this invariant?', 'roberts14_interspeech', 'dft khz sibilant distinction token caveat english excess kiel rate'], ['Brian O. Bush, Alexander Kain', 'Modeling coarticulation in continuous speech', 'bush14_interspeech', 'vocoded slope trajectory grid-search sigmoidal local position limited combination sentence-level'], ['Khalid Daoudi, Blaise Bertrac', 'On classification between normal and pathological voices using the MEEI-KayPENTAX database: issues and consequences', 'daoudi14_interspeech', 'dysphonic sustained voice meei infirmary massachusetts corp pursuit gabor dozen'], ['Véronique Bukmaier, Jonathan Harrington, Ulrich Reubold, Felicitas Kleber', 'Synchronic variation in the articulation and the acoustics of the Polish three-way place distinction in sibilants and its implications for diachronic change', 'bukmaier14_interspeech', 'instability orientation position tongue greater articulatory together similarity potential either'], ['Rahul Gupta, Panayiotis G. Georgiou, David C. Atkins, Shrikanth S. 
Narayanan', "Predicting client's inclination towards target behavior change in motivational interviewing and investigating the role of laughter", 'gupta14_interspeech', 'changetalk valence unweighted coding addiction counselor cure specifically psychotherapy observational'], ['Bo Xiao, Daniel Bone, Maarten Van Segbroeck, Zac E. Imel, David C. Atkins, Panayiotis G. Georgiou, Shrikanth S. Narayanan', 'Modeling therapist empathy through prosody in drug addiction counseling', 'xiao14_interspeech', 'distribution prosodic energy disposition quantize pitch intuition pattern normalize negatively'], ['Daniel Bone, Chi-Chun Lee, Alexandros Potamianos, Shrikanth S. Narayanan', 'An investigation of vocal arousal dynamics in child-psychologist interactions using synchrony measures and a conversation-based model', 'bone14_interspeech', 'asd affective psychologist child severity modulation conversational bone responsive seemingly'], ['Kun Han, Dong Yu, Ivan Tashev', 'Speech emotion recognition using deep neural network and extreme learning machine', 'han14_interspeech', 'utterance-level feature dnns elm probability distribution effective segment-level partly low-level'], ['Khiet P. Truong, Gerben J. 
Westerhof, Franciska de Jong, Dirk Heylen', 'An annotation scheme for sighs in spontaneous dialogue', 'truong14_interspeech', 'emotional kappa cohen associated annotated emotion sigh content relief vocalisation'], ['Lei He, Volker Dellwo', 'Speaker idiosyncratic variability of intensity across syllables', 'he14_interspeech', 'tevoid holistically idiosyncrasy change sixteen forensic pairwise locally syllabic deviation'], ['Soroosh Mariooryad, Reza Lotfian, Carlos Busso', 'Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora', 'mariooryad14_interspeech', 'database conversation build reaction recording semaine conversational non-emotional collected sample'], ['Saeid Safavi, Martin Russell, Peter Jančovič', "Identification of age-group from children's speech by computers and humans", 'safavi14_interspeech', 'age-id gmm-svm i-vector gmm-ubm khz gender-independent system band-limited kid gender-dependent'], ['Mohamed Morchid, Richard Dufour, Mohamed Bouallegue, Georges Linarès, Renato De Mori', 'Theme identification in human-human conversations with features from specific speaker type hidden spaces', 'morchid14_interspeech', 'customer separate agent topic semantic probability strategy possible survey word-based'], ['Alex Marin, Roman Holenstein, Ruhi Sarikaya, Mari Ostendorf', 'Learning phrase patterns for text classification using a knowledge graph and unlabeled data', 'marin14_interspeech', 'fully-supervised self-training feature pattern intent conjunction filtered employing confidence mapping'], ['Puyang Xu, Ruhi Sarikaya', 'Targeted feature dropout for robust slot filling in natural language understanding', 'xu14_interspeech', 'entity dictionary dangerous crf boosting degrading tradeoff coverage extends exist'], ['Sz-Rung Shiang, Hung-yi Lee, Lin-shan Lee', 'Spoken question answering using tree-structured conditional random fields and two-layer random walk', 'shiang14_interspeech', 'webpage re-ranking query 
formulated n-best retrieval answer information noisy inevitably'], ['Ruhi Sarikaya, Asli Celikyilmaz, Anoop Deoras, Minwoo Jeong', 'Shrinkage based features for slot tagging with conditional random fields', 'sarikaya14_interspeech', 'crfs shrinking generated class-based overhead exponential performance expect behind sum'], ['Yangyang Shi, Yi-Cheng Pan, Mei-Yuh Hwang', 'Cluster based Chinese abbreviation modeling', 'shi14_interspeech', 'generation latent sparseness topic clustering crf data using training character-based'], ['Xiantao Zhang, Dongchen Li, Xihong Wu', 'Parsing named entity as syntactic structure', 'zhang14b_interspeech', 'ner nested chinese recognizing differentiates recognition natural output previous processing'], ['Gokhan Tur, Anoop Deoras, Dilek Hakkani-Tür', 'Detecting out-of-domain utterances addressed to a virtual personal assistant', 'tur14_interspeech', 'grammar fusion induction late rule using lexicalized non-terminal ontology knowledge-based'], ['Spiros Georgiladakis, Christina Unger, Elias Iosif, Sebastian Walter, Philipp Cimiano, Euripides Petrakis, Alexandros Potamianos', 'Fusion of knowledge-based and data-driven approaches to grammar induction', 'georgiladakis14_interspeech', 'late rule using lexicalized non-terminal ontology mid f-measure weakness fusing'], ['Denys Katerenchuk, Andrew Rosenberg', 'Improving named entity recognition with prosodic features', 'katerenchuk14_interspeech', 'crf-based ne ace asr leaving tobi unable nlp oov system'], ['Suman V. 
Ravuri, Andreas Stolcke', 'Neural network models for lexical addressee detection', 'ravuri14_interspeech', 'n-gram classification phase dialog utterance combining likelihood neural-network class someone'], ['Valerie Freeman, Julian Chan, Gina-Anne Levow, Richard Wright, Mari Ostendorf, Victoria Zayats', 'Manipulating stance and involvement using collaborative tasks: an exploratory comparison', 'freeman14_interspeech', 'stance-taking elicit speaking consistent corpus style unscripted design subjectivity dyad'], ['Fabrizio Ghigi, Maxine Eskenazi, M. Ines Torres, Sungjin Lee', 'Incremental dialog processing in a task-oriented dialog', 'ghigi14_interspeech', 'idp user system month interrupted shorten strategy eliminating gradually utterance'], ['Naoki Hotta, Kazunori Komatani, Satoshi Sato, Mikio Nakano', 'Detecting incorrectly-segmented utterances for posteriori restoration of turn-taking and ASR results', 'hotta14_interspeech', 'domain-independent feature cross-domain cast incorrectly inappropriate method erroneous fragment utterance'], ['Hany Hassan, Lee Schwartz, Dilek Hakkani-Tür, Gokhan Tur', 'Segmentation and disfluency removal for conversational speech translation', 'hassan14_interspeech', 'latency on-line followed allowable newer ahead sentential best off-line second'], ['Shinji Watanabe, John R. Hershey, Tim K. Marks, Youichi Fujii, Yusuke Koji', 'Cost-level integration of statistical and rule-based dialog managers', 'watanabe14_interspeech', 'manager penalizes cost action system decision automobile existing use inefficient'], ['Dongho Kim, Catherine Breslin, Pirros Tsiakoulis, M. 
Gašić, Matthew Henderson, Steve Young', 'Inverse reinforcement learning for micro-turn management', 'kim14b_interspeech', 'turn-taking interaction optimise reward dialogue human-human natural irl responsive interleaved'], ['John Kane, Irena Yanushevskaya, Céline de Looze, Brian Vaughan, Ailbhe Ní Chasaide', 'Analysing the prosodic characteristics of speech-chunks preceding silences in task-based interactions', 'kane14_interspeech', 'gap pause within- predict automatic contour dyadic descriptive tune parameter'], ['Haşim Sak, Andrew Senior, Françoise Beaufays', 'Long short-term memory recurrent neural network architectures for large scale acoustic modeling', 'sak14_interspeech', 'lstm rnns rnn layer two-layer deep machine architecture converges effective'], ['George Saon, Hagen Soltau, Ahmad Emami, Michael Picheny', 'Unfolded recurrent neural networks for speech recognition', 'saon14_interspeech', 'layer depth dnns minibatches non-recurrent unfolding fmllr gpu feedforward tied'], ['Vikrant Singh Tomar, Richard C. Rose', 'Manifold regularized deep neural networks', 'tomar14_interspeech', 'regularization asr dnn constraint extraction locality speech-in-noise feature imposing amongst'], ['Bo Li, Khe Chai Sim', 'Modeling long temporal contexts for robust DNN-based speech recognition', 'li14_interspeech', 'dnns single dnn layer hidden boltzmann multi-style initialized softmax independence'], ['Feipeng Li, Phani S. Nidadavolu, Hynek Hermansky', 'A long, deep and wide artificial neural net for robust speech recognition in unknown noise', 'li14b_interspeech', 'subbands ensemble speech-shaped frequency subband multiple adapts accommodate approximated trained'], ['Ladislav Seps, Jiri Malek, Petr Cerva, Jan Nouza', 'Investigation of deep neural networks for robust recognition of nonlinearly distorted speech', 'seps14_interspeech', 'dnn-hmm heq architecture context-dependent compensation distortion low-bit-rate wer via shallower'], ['Désiré Bansé, George R. 
Doddington, Daniel Garcia-Romero, John J. Godfrey, Craig S. Greenberg, Alvin F. Martin, Alan McCree, Mark Przybocki, Douglas A. Reynolds', 'Summary and initial results of the 2013-2014 speaker recognition i-vector machine learning challenge', 'banse14_interspeech', 'sre nist evaluation participant sres increase saw coordinated audio fixed-length'], ['David A. van Leeuwen, Niko Brümmer', 'Constrained speaker linking', 'leeuwen14_interspeech', 'assignment channel partitioning identity cgn constraint unspecified intractable distribution tractable'], ['Sergey Novoselov, Timur Pekhovsky, Konstantin Simonchik, Andrey Shulipa', 'RBM-PLDA subsystem for the NIST i-vector challenge', 'novoselov14_interspeech', 'stc mindcf nist- rbm pseudo reaching inferior system examining extractor'], ['Stephen H. Shum, Najim Dehak, James R. Glass', 'Limited labels for unlimited data: active learning for speaker recognition', 'shum14_interspeech', 'query pairwise state-of-the-art needed anecdotal noiseless nearest-neighbor mere generalizability fraction'], ['Niko Brümmer, Albert Swart', 'Bayesian calibration for forensic evidence reporting', 'brummer14_interspeech', 'likelihood-ratio problem solution plugin principled vehicle score recipe experimentally treatment'], ['Shunichi Ishihara', 'Replicate mismatch between test/background and development databases: the impact on the performance of likelihood ratio-based forensic voice comparison', 'ishihara14_interspeech', 'fvc validity reliability number term monte carlo report repeatedly within-speaker'], ['Manu Airaksinen, Tom Bäckström, Paavo Alku', 'Automatic estimation of the lip radiation effect in glottal inverse filtering', 'airaksinen14b_interspeech', 'method non-invasive minimization approximated blind parameter tuning obtaining determining proved'], ['Marcelo de Oliveira Rosa', 'Simulation of 3D larynges with asymmetric distribution of viscoelastic properties in their vocal folds', 'rosa14_interspeech', 'glottal larynx closure 
superficial left-right initiate tissue mass airflow altered'], ['Hironori Takemoto, Parham Mokhtari, Tatsuya Kitamura', 'Comparison of vocal tract transfer functions calculated using one-dimensional and three-dimensional acoustic simulation methods', 'takemoto14_interspeech', 'khz valid dip method frequency fossa transverse pharynx subject range'], ['Jangwon Kim, Donna Erickson, Sungbok Lee, Shrikanth S. Narayanan', 'A study of invariant properties and variation patterns in the converter/distributor model for emotional speech', 'kim14c_interspeech', 'emotion-dependent articulatory parameter surface-level movement relationship shadow reported two-fold production'], ['Alexander Hewer, Ingmar Steiner, Stefanie Wuhrer', 'A hybrid approach to 3D tongue modeling from vocal tract MRI using unsupervised image segmentation and mesh deformation', 'hewer14_interspeech', 'cloud imaging deforming fallback extract point minimally scan even registered'], ['Tokihiko Kaburagi', 'Estimation of vocal-tract shape from speech spectrum and speech resynthesis based on a generative model', 'kaburagi14_interspeech', 'optimizes area parameter perturbation determined length estimated average time-series cross-sectional'], ['Andrés Benítez, Vikram Ramanarayanan, Louis Goldstein, Shrikanth S. Narayanan', 'A real-time MRI study of articulatory setting in second language speech', 'benitez14_interspeech', 'posture inter-speech non-native speaker tract english german vocal imaged phonological'], ['Takayuki Arai', 'Retroflex and bunched English /r/ with physical models of the human vocal tract', 'arai14_interspeech', 'sound produced lip teach produce see tongue american student clear'], ['Panying Rong, Yana Yunusova, James D. Berry, Lorne Zinman, Jordan R. 
Green', 'Parameterization of articulatory pattern in speakers with ALS', 'rong14_interspeech', 'parafac pca subject tongue mode two-factor amyotrophic parameterize sclerosis individualized'], ['Sujith P, Prasanta Kumar Ghosh', 'Missing samples estimation in electromagnetic articulography data using equality constrained kalman smoother', 'p14_interspeech', 'ema articulatory movement sensor square dynamic give a-posteriori subject estimate'], ['An Ji, Michael T. Johnson, Jeff Berry', 'Palate-referenced articulatory features for acoustic-to-articulator inversion', 'ji14_interspeech', 'normalized sensor separation palate space height working lip variance direct'], ['Hidetsugu Uchida, Kohei Wakamiya, Tokihiko Kaburagi', 'A study on the improvement of measurement accuracy of the three-dimensional electromagnetic articulography', 'uchida14_interspeech', 'coil receiver transmitter d-ema position alignment magnetic estimation signal region'], ['Björn Schuller, Stefan Steidl, Anton Batliner, Julien Epps, Florian Eyben, Fabien Ringeval, Erik Marchi, Yue Zhang', 'The INTERSPEECH 2014 computational paralinguistics challenge: cognitive & physical load', 'schuller14_interspeech', 'test-bed opensmile sub-challenges baseline toolkit unified participant provided procedure generated'], ['Jouni Pohjalainen, Paavo Alku', 'Filtering and subspace selection for spectral features in detecting speech under physical stress', 'pohjalainen14_interspeech', 'short-time long-term predictive time timbral method interrelationship spectrum detection challenge'], ['Ming Li', 'Automatic recognition of speaker physical load using posterior probability based features from acoustic and phonetic tokens', 'li14c_interspeech', 'phoneme gmm zero-order munich opensmile component paralinguistics tandem calculating histogram'], ['Heysem Kaya, Tuğçe Özkaptan, Albert Ali Salah, Sadık Fikret Gürgen', 'Canonical correlation analysis and local fisher discriminant analysis based multi-view acoustic feature reduction 
for physical load prediction', 'kaya14_interspeech', 'projection pls cca single-view selection lld selector virtue hint uar'], ['How Jing, Ting-Yao Hu, Hung-Shin Lee, Wei-Chen Chen, Chi-Chun Lee, Yu Tsao, Hsin-Min Wang', 'Ensemble of machine learning algorithms for cognitive and physical speaker load detection', 'jing14_interspeech', 'crbm rectified knn blending prediction boltzmann challenge k-nearest dropout paralinguistics'], ['Gábor Gosztolya, Tamás Grósz, Róbert Busa-Fekete, László Tóth', 'Detecting the intensity of cognitive and physical load using AdaBoost and deep rectifier neural networks', 'gosztolya14_interspeech', 'hidden machine sub-challenges learning neuron experimented interspeech former svm besides'], ['Claude Montacié, Marie-José Caraty', 'High-level speech event analysis for cognitive load classification', 'montacie14_interspeech', 'pause assessed overload nonintrusive three-class fatigue narrowing breathing uar egg'], ['Tin Lay Nwe, Trung Hieu Nguyen, Bin Ma', 'On the use of Bhattacharyya based GMM distance and neural net features for identification of cognitive load levels', 'nwe14_interspeech', 'stress speech nonlinear response gmm-svm bootstrapped acoustic particular challenge characteristic'], ['Mark Huckvale', 'Prediction of cognitive load from speech with the VOQAL voice quality toolbox for the INTERSPEECH 2014 Computational Paralinguistics Challenge', 'huckvale14_interspeech', 'ucl set system test stroop sub-task subtasks feature development best'], ['Jia Min Karen Kua, Vidhyasaharan Sethu, Phu Le, Eliathamby Ambikairajah', 'The UNSW submission to INTERSPEECH 2014 ComParE cognitive load challenge', 'kua14_interspeech', 'system estimation maturity utilised evaluated supervector uar best framework outlined'], ['Maarten Van Segbroeck, Ruchir Travadi, Colin Vaz, Jangwon Kim, Matthew P. Black, Alexandros Potamianos, Shrikanth S. 
Narayanan', 'Classification of cognitive load from speech using an i-vector framework', 'segbroeck14_interspeech', 'factorize variability affords heterogeneity battery manifested adopting experimented inter-speaker level'], ['Nandini Iyer, Eric Thompson, Brian Simpson, Griffin Romigh', 'Revisiting the right-ear advantage for speech: implications for speech displays', 'iyer14_interspeech', 'ear unattended dichotic multichannel presented uncertainty listener target personnel factor'], ['L. ten Bosch, Miriam Ernestus, Lou Boves', 'Comparing reaction time sequences from human participants and computational models', 'bosch14_interspeech', 'speed local participant question correlation fatigue increase observed variation slowly'], ['Valentin Andrei, Horia Cucu, Andi Buzo, Corneliu Burileanu', 'Detecting the number of competing speakers — human selective hearing versus spectrogram distance based estimator', 'andrei14_interspeech', 'establish listener cumulated journalist politician counted detection produced spaced volunteer'], ['Guo Li, Gang Peng', 'The influence of sensory memory and attention on the context effect in talker normalization', 'li14d_interspeech', 'tone block interruption secondary attentional visual identification cue encoded noise'], ['Payton Lin, Fei Chen, Syu Siang Wang, Ying-Hui Lai, Yu Tsao', 'Automatic speech recognition with primarily temporal envelope information', 'lin14_interspeech', 'asr training-test computational vocoded devise human implant screening cochlear displayed'], ['Ying-Hui Lai, Fei Chen, Yu Tsao', 'An adaptive envelope compression strategy for speech processing in cochlear implants', 'lai14_interspeech', 'aec dynamic range hearing-impaired static hearing patient confine narrowed vocoded'], ['Brian S. Helfer, Thomas F. Quatieri, James R. Williamson, Laurel Keyes, Benjamin Evans, W. 
Nicholas Greene, Trina Vian, Joseph Lacirignola, Trey Shenk, Thomas Talavage, Jeff Palmer, Kristin Heaton', 'Articulatory dynamics and coordination in classifying cognitive change with preclinical mTBI', 'helfer14_interspeech', 'brain roc auc feature curve track formant trauma formant-frequency athlete'], ['Nozomi Jinbo, Shinnosuke Takamichi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura', 'A hearing impairment simulation method using audiogram-based approximation of auditory characteristics', 'jinbo14_interspeech', 'person hearing-impaired individual characteristic normal-hearing impaired easily educate person-to-person audiogram'], ['Dongmei Wang, James M. Kates, John H. L. Hansen', 'Investigation of the relative perceptual importance of temporal envelope and temporal fine structure between tonal and non-tonal languages', 'wang14c_interspeech', 'tfs american mandarin chinese english language perception interference sensitive noise'], ['Daniel Fogerty, Fei Chen', 'Vowel spectral contributions to English and Mandarin sentence intelligibility', 'fogerty14_interspeech', 'cue interrupted ensured distribution confined high-pass segmentally higher similar interruption'], ['Vinay Kumar Mittal, B. 
Yegnanarayana', 'Significance of aperiodicity in the pitch perception of expressive voices', 'mittal14_interspeech', 'impulse subharmonics sequence excitation noh signal zero-frequency saliency laughter voice'], ['Mirjam Wester, María Luisa García Lecumberri, Martin Cooke', 'DIAPIX-FL: a symmetric corpus of problem-solving dialogues in first and second languages', 'wester14_interspeech', 'elongation speaking spanish incomplete language voicing english turn diapixuk diapix'], ['Christophe Coupé, Yoon Mi Oh, François Pellegrino, Egidio Marsico', 'Cross-linguistic investigations of oral and silent reading', 'coupe14_interspeech', 'rate syllable language text serbian arose subject mixed-effects amount different'], ['Juul Coumans, Roeland van Hout, Odette Scharenborg', 'Non-native word recognition in noise: the role of word-initial and word-final information', 'coumans14_interspeech', 'listening harder condition important masked proficiency listener onset become forty-seven'], ['Janice Wing Sze Wong', 'The effects of high and low variability phonetic training on the perception and production of English vowels /e/-/æ/ by Cantonese ESL learners with high and low L2 proficiency levels', 'wong14_interspeech', 'lvpt hvpt subject group learning learned whereas perceptually-based classroom transferred'], ['Pepi Burgos, Mátyás Jani, Catia Cucchiarini, Roeland van Hout, Helmer Strik', 'Dutch vowel production by Spanish learners: duration and spectral features', 'burgos14_interspeech', 'capt detailed pronunciation contrast property study realizing assisted relate vocalic'], ['Angelos Lengeris, Katerina Nicolaidis', 'English consonant confusions by Greek listeners in quiet and noise and the role of phonological short-term memory', 'lengeris14_interspeech', 'pstm identification snr non-word -speaker serial score vcv type twenty'], ['Sylvain Detey, Isabelle Racine, Julien Eychenne, Yuji Kawaguchi', 'Corpus-based L2 phonological data and semi-automatic perceptual analysis: the case 
of nasal vowels produced by beginner Japanese learners of French', 'detey14_interspeech', 'non-native nasality repetition contemporain wordlist phonetic-phonological scrutiny procedure coarse-grained production'], ['Gábor Pintér, Shinobu Mizuguchi, Koichi Tateishi', 'Perception of prosodic prominence and boundaries by L1 and L2 speakers of English', 'pinter14_interspeech', 'rpt native cole japanese efl learner boundary identical direct stimulus'], ['Rose Thomas Kalathottukaren, Suzanne C. Purdy, Elaine Ballard', 'Prosody perception, reading accuracy, nonliteral language comprehension, and music and tonal pitch discrimination in school aged children', 'kalathottukaren14_interspeech', 'hearing paralanguage normal poorer musical score child subtests twenty-five loss'], ['Polina Drozdova, Roeland van Hout, Odette Scharenborg', 'Phoneme category retuning in a non-native language', 'drozdova14_interspeech', 'native listener lexically-guided ambiguous dutch nonnative retune nonnatives pronunciation encountering'], ['Bo-Chang Chiou, Chia-Ping Chen', 'Speech emotion recognition with cross-lingual databases', 'chiou14_interspeech', 'emo-db normalization emotional accuracy system berlin different equalization best data'], ['Koji Inoue, Yukoh Wakabayashi, Hiromasa Yoshimoto, Tatsuya Kawahara', 'Speaker diarization using eye-gaze information in multi-party conversations', 'inoue14_interspeech', 'method poster stochastically presumed acoustic real turn-taking ambient multi-modal distant'], ['Che-Wei Huang, Bo Xiao, Panayiotis G. Georgiou, Shrikanth S. 
Narayanan', 'Unsupervised speaker diarization using riemannian manifold clustering', 'huang14_interspeech', 'pdfs gaussian segment single-gaussian lle problem fitness k-nearest impose multivariate'], ['Héctor Delgado, Corinne Fredouille, Javier Serrano', 'Towards a complete binary key system for the speaker diarization task', 'delgado14_interspeech', 'der time-critical time preliminary technique still repere modeling audio partitioning'], ['Houman Ghaemmaghami, David Dean, Sridha Sridharan', 'An iterative speaker re-diarization scheme for improving speaker-based entity extraction in multimedia archives', 'ghaemmaghami14_interspeech', 'diarization recording across corpus technique recurring reuse purity repeating within'], ['Binyam Gebrekidan Gebre, Peter Wittenburg, Sebastian Drude, Marijn Huijbregts, Tom Heskes', 'Speaker diarization using gesture and speech', 'gebre14_interspeech', 'model ami parametric indicates person problem solution speech-only gamma segment'], ['Grégor Dupuy, Sylvain Meignier, Yannick Estève', 'Is incremental cross-show speaker diarization efficient for processing large volumes of data?', 'dupuy14_interspeech', 'collection processed process clustering oct applicative sept etape chronological repere'], ['Pranay Dighe, Marc Ferràs, Hervé Bourlard', 'Detecting and labeling speakers on overlapping speech using vector taylor series', 'dighe14_interspeech', 'diarization speaker modeling multi-class bootstrapping far-field oracle precisely ongoing meeting'], ['Sree Harsha Yella, Petr Motlicek, Hervé Bourlard', 'Phoneme background model for information bottleneck based speaker diarization', 'yella14_interspeech', 'done hypothesizes ignores characteristic pronounce ground-truth arises ami roughly reflected'], ['Marc Ferràs, Stefano Masneri, Oliver Schreer, Hervé Bourlard', 'Diarizing large corpora using multi-modal speaker linking', 'ferras14_interspeech', 'diarization recording involving scale ward inter-session system multiparty uniquely factor'], 
['Frederic Bechet, Meriem Bendris, Delphine Charlet, Géraldine Damnati, Benoit Favre, Mickael Rouvier, Remi Auguste, Benjamin Bigot, Richard Dufour, Corinne Fredouille, Georges Linarès, Jean Martinet, Gregory Senay, Pierre Tirilly', 'Multimodal understanding for person recognition in video broadcasts', 'bechet14b_interspeech', 'identification challenge modality speaker scene team face overlay disposition extraction'], ['James Gibson, Maarten Van Segbroeck, Shrikanth S. Narayanan', 'Comparing time-frequency representations for directional derivative features', 'gibson14_interspeech', 'compression spectrogram recognition filter-banks dynamic range extracted log-mel gammatone bin'], ['Jun Du, Qing Wang, Tian Gao, Yong Xu, Li-Rong Dai, Chin-Hui Lee', 'Robust speech recognition with speech enhanced deep neural networks', 'du14_interspeech', 'pre-processing post-processing aurora training-testing enhancement coherently clean-condition pre-processor system showcase'], ['Emmanuel Vincent, Aggelos Gkiokas, Dominik Schnitzer, Arthur Flexer', 'An investigation of likelihood normalization for robust ASR', 'vincent14_interspeech', 'compensation operate s-norm technique feature symmetric noise-robust chime log-likelihood motivation'], ['Constantin Spille, Bernd T. Meyer', 'Identifying the human-machine differences in complex binaural scenes: what can be learned from our auditory system', 'spille14_interspeech', 'hsr srt comparison gap asr monaural man-machine recognition signal response'], ['Jürgen T. 
Geiger, Zixing Zhang, Felix Weninger, Björn Schuller, Gerhard Rigoll', 'Robust speech recognition using long short-term memory recurrent neural networks for hybrid acoustic modelling', 'geiger14_interspeech', 'setup phoneme multi-stream prediction state lstm predicting medium-vocabulary network hmm'], ['Shilin Liu, Khe Chai Sim', 'Joint adaptation and adaptive training of TVWR for robust automatic speech recognition', 'liu14c_interspeech', 'dnn gmm dnns unsupervised system unadapted multi-style speaker noise temporally'], ['Hyung-Min Park, Matthew Maciejewski, Chanwoo Kim, Richard M. Stern', 'Robust speech recognition in reverberant environments using subband-based steady-state monaural and binaural suppression', 'park14_interspeech', 'component precedence later onset processing direction place enhancement accuracy monaurally'], ['Rui Zhao, Jinyu Li, Yifan Gong', 'Variable-component deep neural network for robust speech recognition', 'zhao14_interspeech', 'vcdnn werr dnn layer snr hidden bias relative vphmm per'], ['Yu-Chen Kao, Yi-Ting Wang, Berlin Chen', 'Effective modulation spectrum factorization for robust speech recognition', 'kao14_interspeech', 'nmf empirical cluster-specific noise-invariant basis sparser cluster-based robustness distill nonnegative'], ['Suman V. Ravuri', 'Hybrid MLP/structured-SVM tandem systems for large vocabulary and robust ASR', 'ravuri14b_interspeech', 'nd-order mlps layer large-vocabulary noisy corpus temporal model sequence icsi'], ['Chanwoo Kim, Kean K. Chin, Michiel Bacchiani, Richard M. 
Stern', 'Robust speech recognition using temporal masking and thresholding algorithm', 'kim14d_interspeech', 'ssf tmt dereverberation reverberant characterize slowly-varying precedence highpass called environment'], ['Xurong Xie, Rongfeng Su, Xunying Liu, Lan Wang', 'Deep neural network bottleneck features for generalized variable parameter HMMs', 'xie14_interspeech', 'tandem acoustic gvp-hmms gvp-hmm hmm multi-style learnt inherently augment modelled'], ['Suliang Bu, Yanmin Qian, Kai Yu', 'A novel dynamic parameters calculation approach for model compensation', 'bu14_interspeech', 'approximation mismatch first-order vt taylor pseudo dct formula function calculating'], ['Naoaki Hashimoto, Shoichi Nakano, Kazumasa Yamamoto, Seiichi Nakagawa', 'Speech recognition based on Itakura-Saito divergence and dynamics/sparseness constraints from mixed sound of speech and music by non-negative matrix factorization', 'hashimoto14_interspeech', 'remove nmf kullback-leibler matched composed word isolated weight instead cost'], ['Yong-Joo Chung', 'Noise robust speech recognition based on noise-adapted HMMs using speech feature compensation', 'chung14_interspeech', 'noisy mtr clean method additive adapted vts-based multi-model vt multi-condition'], ["M. J. Alam, Patrick Kenny, Pierre Dumouchel, Douglas O'Shaughnessy", 'Noise spectrum estimation using Gaussian mixture model-based speech presence probability for robust speech recognition', 'alam14_interspeech', 'rmfcc etsi-afe pncc cepstral subband coefficient posteriori snr mfcc spp'], ['X. Chen, Y. Wang, X. Liu, Mark J. F. Gales, Philip C. 
Woodland', 'Efficient GPU-based training of recurrent neural network language models using spliced sentence bunch', 'chen14b_interspeech', 'rnnlms unclustered gpus class-based quantity output trained full cpu-based layer'], ['David Nolden, Ralf Schlüter, Hermann Ney', 'Word pair approximation for more efficient decoding with high-order language models', 'nolden14_interspeech', 'lattice recombination lower-order rescoring search recombined unigram expand pipeline efficiency'], ['Heike Adel, Katrin Kirchhoff, Ngoc Thang Vu, Dominic Telaar, Tanja Schultz', 'Comparing approaches to convert recurrent neural networks into backoff language models for efficient decoding', 'adel14_interspeech', 'conversion seame rnnlms perplexity mixed computational unsuitable code-switching text relative'], ['David Nolden, Hagen Soltau, Daniel Povey, Pegah Ghahremani, Lidia Mangu, Hermann Ney', 'Removing redundancy from lattices', 'nolden14b_interspeech', 'filtering algorithm transducer-based size degrading oracle site lvcsr coverage remove'], ['Martin Sundermeyer, Zoltán Tüske, Ralf Schlüter, Hermann Ney', 'Lattice decoding and rescoring with long-Span neural network language models', 'sundermeyer14_interspeech', 'lstms search assamese lattice-based babel -best relative refined pruning keyword'], ['Michael Levit, Sarangarajan Parthasarathy, Shuangyu Chang, Andreas Stolcke, Benoît Dumoulin', 'Word-phrase-entity language models: getting more mileage out of n-grams', 'levit14_interspeech', 'n-gram token word-level word-only re-estimates departs calendar ascribed class cold'], ['Sourjya Sarkar, K. Sreenivasa Rao', 'A novel boosting algorithm for improved i-vector based speaker verification in noisy environments', 'sarkar14_interspeech', 'svm utterance significance degraded adaptive disproportionate misclassifications classifier factory noisex-'], ['W. M. 
Campbell', 'Using deep belief networks for vector-based speaker recognition', 'campbell14_interspeech', 'dbns method dbn modeling ubm qualitative replace sre work start'], ['Yun Lei, Luciana Ferrer, Mitchell McLaren, Nicolas Scheffer', 'A deep neural network speaker verification system targeting microphone speech', 'lei14_interspeech', 'dnn gmm telephone condition sre i-vector framework nist data training'], ['Mitchell McLaren, Yun Lei, Nicolas Scheffer, Luciana Ferrer', 'Application of convolutional neural networks to speaker recognition in noisy conditions', 'mclaren14_interspeech', 'cnn i-vector ubm front end traditional sid miss opposed heavily'], ['Jason Pelecanos, Weizhong Zhu, Sibel Yaman', 'SVM based speaker recognition: harnessing trials with multiple enrollment sessions', 'pelecanos14_interspeech', 'plda kernel pruning multi-session work order exploit examine stacking score'], ['Laura Fernández Gallardo, Michael Wagner, Sebastian Möller', 'I-vector speaker verification based on phonetic information under transmission channel effects', 'gallardo14_interspeech', 'narrowband wideband fricative bandwidth speech clean principally different aforementioned gaining'], ['Xiao Zang, Zhiyong Wu, Helen Meng, Jia Jia, Lianhong Cai', 'Using conditional random fields to predict focus word pair in spontaneous spoken English', 'zang14_interspeech', 'crf prediction recall predictor svm syntactic semantic automatically neglect crfs'], ['Richard Sproat, Keith Hall', 'Applications of maximum entropy rankers to problems in spoken language processing', 'sproat14_interspeech', 'stress notoriously nato alphabetic entropy-based report usa non-standard second morphology'], ['Xavi Gonzalvo, Monika Podsiadło', 'Text-to-speech with cross-lingual neural network-based grapheme-to-phoneme models', 'gonzalvo14_interspeech', 'foreign voice tt phonotactic sparsity monolingual inventory grammar lexicon large'], ['Daiki Nagahama, Takashi Nose, Tomoki Koriyama, Takao Kobayashi', 'Transform 
mapping using shared decision tree context clustering for HMM-based cross-lingual speech synthesis', 'nagahama14_interspeech', 'stc language-independent average keeping voice speaker mismatch technique adaptation language'], ['B. Ramani, M. P. Actlin Jeeva, P. Vijayalakshmi, T. Nagarajan', 'Cross-lingual voice conversion-based polyglot speech synthesizer for indian languages', 'ramani14_interspeech', 'speaker target corpus listening using phoneme synthesizes proficient abx switching'], ['Qiong Hu, Yannis Stylianou, Ranniery Maia, Korin Richmond, Junichi Yamagishi, Javier Latorre', 'An investigation of the application of dynamic sinusoidal models to statistical parametric speech synthesis', 'hu14b_interspeech', 'sinusoid hts regularised cepstral selected mel-cepstra utilise synchronous reconstruct preferred'], ['Hemant A. Patil, Tanvina B. Patel', 'Chaotic mixed excitation source for speech synthesis', 'patil14_interspeech', 'region unvoiced voiced lp-based synthesized dmos relatively chaos analysis mcd'], ['Alexander Sorin, Slava Shechtman, Vincent Pollet', 'Refined inter-segment joining in multi-form speech synthesis', 'sorin14_interspeech', 'segment smoothing parametric waveform pitch statistically joint frame splicing generated'], ['Ran Zhang, Zhengqi Wen, Jianhua Tao, Ya Li, Bing Liu, Xiaoyan Lou', 'A hierarchical viterbi algorithm for Mandarin hybrid speech synthesis system', 'zhang14c_interspeech', 'cv region voiced naturalness tolerated synthetic traditional connecting use round'], ['Diandra Fabre, Thomas Hueber, Pierre Badin', 'Automatic animation of an articulatory tongue model from ultrasound images using Gaussian mixture regression', 'fabre14_interspeech', 'speaker converted phonetic decoding movement biofeedback visual pca-based animating reference'], ['Patrick Lumban Tobing, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura, Ayu Purwarianti', 'Articulatory controllable speech modification based on statistical feature mapping with Gaussian 
mixture models', 'tobing14_interspeech', 'inter-dimensional manipulating movement inversion manipulation phonemic capable toda parameter tokuda'], ['Chuang Ding, Pengcheng Zhu, Lei Xie, Dongmei Jiang, Zhong-Hua Fu', 'Speech-driven head motion synthesis using neural networks', 'ding14_interspeech', 'network generatively cca hmm fbank avatar mel-scale filter-bank initialized discover'], ['Peng Song, Yun Jin, Wenming Zheng, Li Zhao', 'Text-independent voice conversion using speaker model alignment method from non-parallel speech', 'song14_interspeech', 'gmm transformation sma inca local spectral source aligning target function'], ['Ling-Hui Chen, Zhen-Hua Ling, Li-Rong Dai', 'Voice conversion using generative trained deep neural networks with multiple frame spectral envelopes', 'chen14c_interspeech', 'dnn bam layer-by-layer generatively rbms associative boltzmann cascade superiority divergence'], ['Gerard Sanchez, Hanna Silen, Jani Nurminen, Moncef Gabbouj', 'Hierarchical modeling of F0 contours for voice conversion', 'sanchez14_interspeech', 'prosody wavelet scale microprosody cross-gender temporal speaker spectral pitch shifting'], ['Kento Kadowaki, Tatsuma Ishihara, Nobukatsu Hojo, Hirokazu Kameoka', 'Speech prosody generation for text-to-speech synthesis based on generative model of F<SUB>0</SUB> contours', 'kadowaki14_interspeech', 'fujisaki-model statistical generating parameter discrete-time tree-based text fujisaki associate input'], ['Xiayu Chen, Yang Zhang, Mark Hasegawa-Johnson', 'An iterative approach to decision tree training for context dependent speech synthesis', 'chen14d_interspeech', 'algorithm construction edhmm safeguard probability procedure generalizable boston ensures assuming'], ["Thi Thu Trang Nguyen, Albert Rilliard, Do Dat Tran, Christophe d'Alessandro", 'Prosodic phrasing modeling for vietnamese TTS using syntactic information', 'nguyen14_interspeech', 'break clause rule lengthening file syllable element pause vted level'], ['Tomoki 
Koriyama, Hiroshi Suzuki, Takashi Nose, Takahiro Shinozaki, Takao Kobayashi', 'Accent type and phrase boundary estimation using acoustic and language models for automatic prosodic labeling', 'koriyama14_interspeech', 'speech synthesis polygonal technique hmm contour use crf corpus-based time-consuming'], ['Qiang Fang, Jianguo Wei, Fang Hu', 'Reconstruction of mistracked articulatory trajectories', 'fang14_interspeech', 'ema coil kinematic additional speech articulograph data attached instrument inversion'], ['Langzhou Chen, Norbert Braunschweiler', 'Enabling controllability for continuous expression space', 'chen14e_interspeech', 'emotion expressive discrete label individual infinite information synthesising thus cat'], ['Takashi Nose, Akinori Ito', 'Analysis of spectral enhancement using global variance in HMM-based speech synthesis', 'nose14_interspeech', 'parameter generation fixed correlation generated objective affine undesirable conventional clarity'], ['Cassia Valentini-Botinhao, Markus Toman, Michael Pucher, Dietmar Schabus, Junichi Yamagishi', 'Intelligibility analysis of fast synthesized speech', 'valentinibotinhao14_interspeech', 'compression rate normal voice linear speaking intelligible hsmm-based german applying'], ['Susana Palmaz López-Peláez, Robert A. J. Clark', 'Speech synthesis reactive to dynamic noise environmental conditions', 'lopezpelaez14_interspeech', 'style changing modulate suit maintains synthesiser changed generating adaptive normal'], ['Timo Baumann', 'Partial representations improve the prosody of incremental speech synthesis', 'baumann14_interspeech', 'speak symbolic information utterance missing full yet unfolds lower-level find'], ['Pirros Tsiakoulis, Catherine Breslin, M. 
Gašić, Matthew Henderson, Dongho Kim, Steve Young', 'Dialogue context sensitive speech synthesis using factorized decision trees', 'tsiakoulis14_interspeech', 'domain voice appointment booking context-sensitive restaurant significant coverage preference fd-cat'], ['Xin Wang, Zhen-Hua Ling, Li-Rong Dai', 'Concept-to-speech generation by integrating syntagmatic features into HMM-based speech synthesis', 'wang14d_interspeech', 'prosodic ct kpml prediction modelling method linguistic text annotation performs'], ['Dhananjaya Gowda, Heikki Kallasjoki, Reima Karhila, Cristian Contan, Kalle Palomäki, Mircea Giurgiu, Mikko Kurimo', 'On the role of missing data imputation and NMF feature enhancement in building synthetic voices using reverberant speech', 'gowda14_interspeech', 'voice dependent clean technique cleaning noisy speaker non-negative factorization hmm-based'], ["C. -T. Do, M. Evrard, A. Leman, Christophe d'Alessandro, Albert Rilliard, J. -L. Crebouw", 'Objective evaluation of HMM-based speech synthesis system using kullback-leibler divergence', 'do14b_interspeech', 'kld synthetic likelihood difference msd multi-space discontinuous essence distribution model'], ['Javier Latorre, Kayoko Yanagisawa, Vincent Wan, BalaKrishna Kolluru, Mark J. F. 
Gales', 'Speech intonation for TTS: study on evaluation methodology', 'latorre14_interspeech', 'reference subject preference pattern sentence asked variance modification appropriate test'], ['Yajie Miao, Florian Metze', 'Improving language-universal feature extraction with deep maxout and convolutional neural networks', 'miao14_interspeech', 'lufes extractor dnns nice nonlinearity sigmoid asr acting dnn-based cross-language'], ['Raul Fernandez, Jia Cui, Andrew Rosenberg, Bhuvana Ramabhadran, Xiaodong Cui', 'Exploiting vocal-source features to improve ASR accuracy for low-resource languages', 'fernandez14_interspeech', 'vocal-tract excitation front-end impart configured simplification babel augmenting system speech'], ['Anton Ragni, Kate M. Knill, Shakti P. Rath, Mark J. F. Gales', 'Data augmentation for low resource languages', 'ragni14_interspeech', 'babel scheme semi-supervised perturbation down-stream consistent gain examined zulu assamese'], ['Denis Jouvet, Dominique Fohr', 'About combining forward and backward-based decoders for selecting data for unsupervised training of acoustic models', 'jouvet14_interspeech', 'transcribed julius automatically sphinx manually selection processed speech adaptation hypothesis'], ['František Grézl, Martin Karafiát', 'Combination of multilingual and semi-supervised training for under-resourced languages', 'grezl14_interspeech', 'cmllr network data neural initialization unlabeled asr random better bottle-neck'], ['Ngoc Thang Vu, Jochen Weiner, Tanja Schultz', 'Investigating the learning effect of multilingual bottle-neck features for ASR', 'vu14b_interspeech', 'characterize neural network normalize neighbor tandem level nasal preprocessing fricative'], ['Yajie Miao, Hao Zhang, Florian Metze', 'Distributed learning of multilingual DNN feature extractors using GPUs', 'miao14b_interspeech', 'distmodel gpu card tolerating distribute enlarged sgd deep nice infrequent'], ['Shakti P. Rath, Kate M. Knill, Anton Ragni, Mark J. F. 
Gales', 'Combining tandem and hybrid systems for improved speech recognition and keyword spotting on low resource languages', 'rath14_interspeech', 'kw asr babel approximately configuration hour gain project haitian lao'], ['Jia Cui, Bhuvana Ramabhadran, Xiaodong Cui, Andrew Rosenberg, Brian Kingsbury, Abhinav Sethy', 'Recent improvements in neural network acoustic modeling for LVCSR in low resource languages', 'cui14_interspeech', 'babel program period feature tonal fundamental-frequency ffv un-transcribed explore deep'], ['Yan Huang, Malcolm Slaney, Michael L. Seltzer, Yifan Gong', 'Towards better performance with heterogeneous training data in acoustic modeling using deep neural networks', 'huang14b_interspeech', 'multi-condition confusability dnn classifier model learning problem formalization fundamental blindly'], ['Takuya Higuchi, Hirofumi Takeda, Tomohiko Nakamura, Hirokazu Kameoka', 'A unified approach for underdetermined blind signal separation and source activity detection by multichannel factorial hidden Markov models', 'higuchi14_interspeech', 'nmf inactive density bs power non-negative non-stationary active spectral mode'], ['Colin Vaz, Dimitrios Dimitriadis, Shrikanth S. Narayanan', 'Enhancing audio source separability using spectro-temporal regularization with NMF', 'vaz14_interspeech', 'spectral separation sar sir tested sdr optimally recovered modest pesq'], ['Sayeh Mirzaei, Hugo Van\xa0hamme, Yaser Norouzi', 'Blind speech source localization, counting and separation for 2-channel convolutive mixtures in a reverberant environment', 'mirzaei14_interspeech', 'time-frequency gcc-phat diffuse anechoic predominant angular stereo angle bin arrival'], ['Felix Weninger, Jonathan Le Roux, John R. 
Hershey, Shinji Watanabe', 'Discriminative NMF and its application to single-channel source separation', 'weninger14_interspeech', 'reconstruction objective function basis optimization source-to-distortion criterion mixture exemplar-based multiplicative'], ['Hideki Kawahara, Tatsuya Kitamura, Hironori Takemoto, Ryuichi Nisimura, Toshio Irino', 'Vocal tract length estimation based on vowels using a database consisting of 385 speakers and a database with MRI-based vocal tract shape information', 'kawahara14_interspeech', 'vtl independent text interference-free method calibrate matlab biasing vowel physically'], ['Haipeng Wang, Tan Lee, Cheung-Chi Leung, Bin Ma, Haizhou Li', 'A graph-based Gaussian component clustering approach to unsupervised acoustic modeling', 'wang14e_interspeech', 'untranscribed sub-word cluster phoneme-like unit several induced initially timit different'], ['Ali Ziaei, Abhijeet Sangwan, John H. L. Hansen', 'A speech system for estimating daily word counts', 'ziaei14_interspeech', 'prof-life-log wang detection count estimation syllable audio wce lang device'], ['Xugang Lu, Yu Tsao, Shigeki Matsuda, Chiori Hori', 'Ensemble modeling of denoising autoencoder for speech spectrum restoration', 'lu14_interspeech', 'dae transform ddae local function learned noisy daes clean learn'], ['Zoltán Tüske, Pavel Golik, Ralf Schlüter, Hermann Ney', 'Acoustic modeling with deep neural networks using raw time signal for LVCSR', 'tuske14_interspeech', 'dnn feature mfcc layer extraction band multi-resolutional weight filter step'], ['Vikramjit Mitra, Wen Wang, Horacio Franco, Yun Lei, Chris Bartels, Martin Graciarena', 'Evaluating robust features on deep neural networks for speech recognition in noisy and channel mismatched conditions', 'mitra14_interspeech', 'mel-filterbank dnns dnn gmm energy study noise-robust noise performance cnns'], ['Tara N. 
Sainath, Vijayaditya Peddinti, Brian Kingsbury, Petr Fousek, Bhuvana Ramabhadran, David Nahamoo', 'Deep scattering spectra with deep neural networks for LVCSR tasks', 'sainath14_interspeech', 'ds log-mel feature multi-resolution higher-resolution speaker-adaptation technique explore similar cnns'], ['Shuo-Yiin Chang, Nelson Morgan', 'Robust CNN-based speech recognition with Gabor filter kernels', 'chang14_interspeech', 'feature neural network convolutional wsj architecture learned gcnn etsi-afe layer'], ['Liang Lu, Steve Renals', 'Probabilistic linear discriminant analysis with bottleneck features for speech recognition', 'lu14b_interspeech', 'plda-based plda gmms acoustic enjoys intra-frame deep model higher neural'], ['Thomas Schatz, Vijayaditya Peddinti, Xuan-Nga Cao, Francis Bach, Hynek Hermansky, Emmanuel Dupoux', 'Evaluating speech features with the minimal-pair ABX task (II): resistance to noise', 'schatz14_interspeech', 'axis time-domain smoothing short-term robustness adaptation compressive fdlp non-linearity zero-resource'], ['Jürgen T. Geiger, Jort F. Gemmeke, Björn Schuller, Gerhard Rigoll', 'Investigating NMF speech enhancement for neural network based acoustic models', 'geiger14b_interspeech', 'question gmms dnns robustness furthermore beat lstms multi-condition noise-robust non-negative'], ['Jason Lilley, James Mahshie, H. Timothy Bunnell', 'Automatic speech feature classification for children with cochlear implants', 'lilley14_interspeech', 'pediatric recipient perception administered stimulus normally agreement clinical scoring hearing'], ['Yuuki Tachioka, Shinji Watanabe, Jonathan Le Roux, John R. Hershey', 'Sequential maximum mutual information linear discriminant analysis for speech recognition', 'tachioka14_interspeech', 'lda discriminative transformation simple variance between-class criterion technique closed-form within-class'], ['Shabnam Ghaffarzadegan, Hynek Bořil, John H. L. 
Hansen', 'Model and feature based compensation for whispered speech recognition', 'ghaffarzadegan14_interspeech', 'whisper neutral vt towards background recognizer scenario sample strategy rough'], ['Amir R. Moghimi, Bhiksha Raj, Richard M. Stern', 'Post-masking: a hybrid approach to array processing for speech recognition', 'moghimi14_interspeech', 'masking beamforming t-f larger time-frequency benefit unsuccessful technique accepting rejecting'], ['F. de-la-Calle-Silos, F. J. Valverde-Albacete, A. Gallardo-Antolín, C. Peláez-Moreno', 'ASR feature extraction with morphologically-filtered power-normalized cochleograms', 'delacallesilos14_interspeech', 'spectro-temporal masking representation element contribution pncc cochlea resembles structuring morphology'], ['Angel Mario Castro Martinez, Niko Moritz, Bernd T. Meyer', 'Should deep neural nets have ears? the role of auditory features in deep learning approaches', 'martinez14_interspeech', 'bank asr processing mfcc dnn self-learned task-relevant filter fbank gabor'], ['Charles Fox, Thomas Hain', 'Extending Limabeam with discrimination and coarse gradients', 'fox14_interspeech', 'rel radial local ami gradient extension minimum adapted coarser stick'], ['Sankar Mukherjee, Shyamal Kumar Das Mandal', 'Generation of F0 contour using deep boltzmann machine and twin Gaussian process hybrid model for bengali language', 'mukherjee14_interspeech', 'dbm synthesizing text output input readout plugged delta-delta prediction linguistic'], ['Juan A. 
Morales-Cordovilla, Hannes Pessentheiner, Martin Hagmüller, Gernot Kubin', 'Room localization for distant speech recognition', 'moralescordovilla14_interspeech', 'microphone multi-room energy house synergy installed different cross-correlation calibration creates'], ['Sara Bahaadini, Afsaneh Asaei, David Imseng, Hervé Bourlard', 'Posterior-based sparse representation for automatic speech recognition', 'bahaadini14_interspeech', 'posterior exemplar-based high-dimensional feature space sub-phone hiwire phonebook property context'], ['Marija Tabain, Andrew Butcher, Gavan Breen, Richard Beare', 'Lateral formants in three central australian languages', 'tabain14_interspeech', 'retroflex palatal dental alveolar lower higher australia light slightly examines'], ['Alina Khasanova, Jennifer Cole, Mark Hasegawa-Johnson', 'Detecting articulatory compensation in acoustic data through linear regression modeling', 'khasanova14_interspeech', 'cue stop pattern co-variation variation compensatory indirectly spontaneously relates observing'], ['Jinxi Guo, Angli Liu, Harish Arsikere, Abeer Alwan, Steven M. Lulich', 'The relationship between the second subglottal resonance and vowel class, standing height, trunk length, and F0 variation for Mandarin speakers', 'guo14_interspeech', 'sitting t-tests statistically-significant well categorically space indicate english hungarian divide'], ['Nisha Meenakshi, Chiranjeevi Yarra, B. K. 
Yamini, Prasanta Kumar Ghosh', 'Comparison of speech quality with and without sensors in electromagnetic articulograph AG 501 recording', 'meenakshi14_interspeech', 'subject attached articulator stimulus glued subjective recorded objective speaks dissimilarity'], ['Luciana Albuquerque, Catarina Oliveira, António Teixeira, Pedro Sa-Couto, João Freitas, Miguel Sales Dias', 'Impact of age in the production of European Portuguese vowels', 'albuquerque14_interspeech', 'ageing elderly analyse change mldc decrease formant female centralization acoustic'], ['Chengzhu Yu, John H. L. Hansen, Douglas W. Oard', "`houston, we have a solution': a case study of the analysis of astronaut speech during NASA apollo 11 for long-term speaker modeling", 'yu14_interspeech', 'mission space technology future non-neutral language man automating replicate voice'], ['Yi Luan, Richard Wright, Mari Ostendorf, Gina-Anne Levow', 'Relating automatic vowel space estimates to talker intelligibility', 'luan14_interspeech', 'including talker-dependent hull hand-corrected correlation underlie convex impractical score projected'], ['Hideki Kawahara, Masanori Morise, Tomoki Toda, Hideki Banno, Ryuichi Nisimura, Toshio Irino', 'Excitation source analysis for high-quality speech manipulation systems based on an interference-free representation of group delay with minimum phase response compensation', 'kawahara14b_interspeech', 'delay-based tandem-straight unification instantaneous weak introduction vocoder solve extension modification'], ['Christian Fischer Pedersen, Tom Bäckström', 'Sparse time-frequency representation of speech by the vandermonde transform', 'pedersen14_interspeech', 'karhunen-loève signal efficient fourier decorrelates atom pursuit gabor decorrelation uncorrelated'], ['Mahesh Kumar Nandwana, John H. L. 
Hansen', 'Analysis and identification of human scream: implications for speaker recognition', 'nandwana14_interspeech', 'discriminating system classify neutral distinguish reveal situation reliable real-time implementation'], ['Dongmei Wang, Philipos C. Loizou, John H. L. Hansen', 'F0 estimation in noisy speech based on long-term harmonic feature analysis combined with neural network classification', 'wang14f_interspeech', 'ann candidate long spectrum pitch noise term fluctuating output inconsistent'], ['Malcolm Slaney, Michael L. Seltzer', 'The influence of pitch and noise on the discriminability of filterbank features', 'slaney14_interspeech', 'filter filterbanks fisher triangular gammatone shape parameter help explore speech'], ['David Harwath, Alexander Gruenstein, Ian McGraw', 'Choosing useful word alternates for automatic speech recognition correction interfaces', 'harwath14_interspeech', 'user reranking unavoidable alternate logistic increasingly display mobile computing greatly'], ['X. Chen, Mark J. F. Gales, Kate M. Knill, Catherine Breslin, Langzhou Chen, K. K. 
Chin, Vincent Wan', 'An initial investigation of long-term adaptation for meeting transcription', 'chen14f_interspeech', 'period impact model neural tandem network-based deployment yielding date technical'], ['Tim Ng, Roger Hsiao, Le Zhang, Damianos Karakos, Sri Harish Mallidi, Martin Karafiát, Karel Veselý, Igor Szőke, Bing Zhang, Long Nguyen, Richard Schwartz', 'Progress in the BBN keyword search system for the DARPA RATS program', 'ng14_interspeech', 'false rate levantine farsi score stacked target perceptrons speech-to-text kw'], ['Jan Nouza, Petr Cerva, Jindrich Zdansky, Karel Blavka, Marek Bohac, Jan Silovsky, Josef Chaloupka, Michaela Kucharova, Ladislav Seps, Jiri Malek, Michal Rott', 'Speech-to-text technology to transcribe and disclose 100,000+ hours of bilingual documents from historical Czech and Czechoslovak radio archive', 'nouza14_interspeech', 'detector platform public etc wer search searchable -year slovak tape'], ['Emre Yılmaz, Joris Pelemans, Hugo Van\xa0hamme', "Automatic assessment of children's reading with the FLaVoR decoding using a phone confusion model", 'ylmaz14_interspeech', 'miscue skill art lattice child two-layered detection task-dependent task-independent appealing'], ['M. Ali Basha Shaik, Zoltán Tüske, M. 
Ali Tahir, Markus Nußbaum-Thom, Ralf Schlüter, Hermann Ney', 'RWTH LVCSR systems for quaero and EU-bridge: German, Polish, Spanish and Portuguese', 'shaik14_interspeech', 'aachen morphemic adaptation vocabulary podcasts aforementioned domain developed mentioned lecture'], ['Matthias Zöhrer, Franz Pernkopf', 'Single channel source separation with general stochastic networks', 'zohrer14_interspeech', 'gsns sc deep architecture network ill-posed unmatched dbn chime noise'], ['Yu Ting Yeung, Tan Lee, Cheung-Chi Leung', 'Large-margin conditional random fields for single-microphone speech separation', 'yeung14_interspeech', 'formulation crf state source acoustic factorial estimation without performance better'], ['Ingrid Jafari, Roberto Togneri, Sven Nordholm', 'On the use of the Watson mixture model for clustering-based under-determined blind source separation', 'jafari14_interspeech', 'wmm clustering time-frequency touched bin-wise c-means full-band permutation fuzzy superiority'], ['Chung-Chien Hsu, Jen-Tzung Chien, Tai-Shih Chi', 'Binary mask estimation based on frequency modulations', 'hsu14_interspeech', 'modulation hit-fa estimate multi-resolution segregation spectro-temporal analytical algorithm speech beginning'], ['Po-Kai Yang, Chung-Chien Hsu, Jen-Tzung Chien', 'Bayesian factorization and selection for speech and music separation', 'yang14_interspeech', 'nmf variational basis matrix exponential adaptive parameter prior poisson signal-to-distortion'], ['Michael Wohlmayr, Ludwig Mohr, Franz Pernkopf', 'Self-adaption in single-channel source separation', 'wohlmayr14_interspeech', 'adaption sc reverberated model changed utterance mixture speaker pre-trained adapt'], ['Michel Vacher, Benjamin Lecouteux, François Portet', 'Multichannel automatic recognition of voice command in a multi-room smart home: an experiment involving seniors and users with visual impairment', 'vacher14_interspeech', 'condition autonomy assisting elderly off-line daily visually distant 
on-line impaired'], ['Oliver Walter, Vladimir Despotovic, Reinhold Haeb-Umbach, Jort F. Gemmeke, Bart Ons, Hugo Van\xa0hamme', 'An evaluation of unsupervised acoustic model training for a dysarthric speech interface', 'walter14_interspeech', 'posteriorgrams recognition aud gaussian phone-like posteriorgram utter unit automation delivered'], ['Jose A. Gonzalez, Lam A. Cheah, Jie Bai, Stephen R. Ell, James M. Gilbert, Roger K. Moore, Phil D. Green', 'Analysis of phonetic similarity in a silent speech interface based on permanent magnetic articulography', 'gonzalez14_interspeech', 'pma articulation voicing ssi place manner connected-digits consonant production isolated-word'], ['Alexey Karpov, Lale Akarun, Hülya Yalçın, Alexander Ronzhin, Barış Evrim Demiröz, Aysun Çoban, Miloš Železný', 'Audio-visual signal processing in a multimodal assisted living environment', 'karpov14_interspeech', 'tracking audio event video smart monitoring non-speech object user enterface'], ['Mirco Ravanelli, Maurizio Omologo', 'On the selection of the impulse responses for distant-speech recognition based on contaminated speech training', 'ravanelli14_interspeech', 'critical phone-loop microphone apartment purpose set position unobtrusive contamination interaction'], ['I. Casanueva, H. Christensen, Thomas Hain, Phil D. Green', 'Adaptive speech recognition and dialogue management for users with speech disorders', 'casanueva14_interspeech', 'severe adaptation data speaker non-probabilistic known asrs control amount asr'], ['Bea Yu, Thomas F. Quatieri, James R. Williamson, James C. 
Mundt', 'Prediction of cognitive performance in an animal fluency task based on rate and articulatory markers', 'yu14b_interspeech', 'formant elderly speaking coordination score measure assessment predict disrupt least-squares'], ['Carlos Ishi, Hiroaki Hatano, Norihiro Hagita', 'Analysis of laughter events in real science classes by using multiple environment sensor data', 'ishi14_interspeech', 'sound spatial-temporal communication installed appearing conveys appropriateness classroom developed function'], ['Tara N. Sainath, I-hsin Chung, Bhuvana Ramabhadran, Michael Picheny, John Gunnels, Brian Kingsbury, George Saon, Vernon Austel, Upendra Chaudhari', 'Parallel deep neural network training for LVCSR tasks using blue gene/Q', 'sainath14b_interspeech', 'dnns sgd gpu processor slow dnn serially communication explore tremendous'], ['Samy Bengio, Georg Heigold', 'Word embeddings for speech recognition', 'bengio14_interspeech', 'state relax alike nearby projected decompose euclidean sub-word number rescoring'], ['Frank Seide, Hao Fu, Jasha Droppo, Gang Li, Dong Yu', '1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs', 'seide14_interspeech', 'sgd gpus per server quantization minibatches buffering aggressively accuracy speed-ups'], ['Ryu Takeda, Naoyuki Kanda, Nobuo Nukaga', 'Boundary contraction training for acoustic models based on discrete deep neural networks', 'takeda14_interspeech', 'dnns quantizing parameter continuous degrades bit simply trained accuracy shrink'], ['Yotaro Kubo, Jun Suzuki, Takaaki Hori, Atsushi Nakamura', 'Restructuring output layers of deep neural networks using minimum risk parameter clustering', 'kubo14_interspeech', 'discriminative hmms topology optimization emission dnn-based optimizing state density dnn'], ['William Chan, Ian Lane', 'Distributed asynchronous optimization of convolutional neural networks', 'chan14_interspeech', 'gradient gpus across deep training model minibatches 
asynchronously momentum excessively'], ['László Tóth', 'Convolutional deep maxout networks for phone recognition', 'toth14_interspeech', 'neuron activation neural pooling operation network rectified till recently sigmoid'], ['Dongpeng Chen, Brian Mak, Sunil Sivadas', 'Joint sequence training of phone and grapheme acoustic model based on multi-task learning deep neural networks', 'chen14g_interspeech', 'mtl-dnns mtl frame-wise dnns recognition stl sequence-discriminative singly single-task phoneme'], ['Roger Hsiao, Tim Ng, Le Zhang, Shivesh Ranjan, Stavros Tsakalidis, Long Nguyen, Richard Schwartz', 'Improving semi-supervised deep neural network for keyword search in low resource languages', 'hsiao14_interspeech', 'dnn training iarpa babel spotting mlp cross entropy work confidence'], ['Chao Liu, Zhiyong Zhang, Dong Wang', 'Pruning deep neural networks by optimal brain damage', 'liu14d_interspeech', 'obd unimportant pruned dnns connection dnn network magnitude-based highly parsimonious'], ["Anderson R. Avila, Milton Sarria-Paja, Francisco J. Fraga, Douglas O'Shaughnessy, Tiago H. 
Falk", 'Improving the performance of far-field speaker verification using multi-condition training: the case of GMM-UBM and i-vector systems', 'avila14_interspeech', 'reverberation asv detrimental effect room model clean resource-constrained mild lastly'], ['Hung-Shin Lee, Yu Tsao, Hsin-Min Wang, Shyh-Kang Jeng', 'Clustering-based i-vector formulation for speaker recognition', 'lee14_interspeech', 'umfa subspace analyzer ubm i-vectors conventional cluster utterance called reformulate'], ['Harish Arsikere, Hitesh Anand Gupta, Abeer Alwan', 'Speaker recognition via fusion of subglottal features and MFCCs', 'arsikere14_interspeech', 'sgccs sid mmse speaker-specificity mfcc-only accelerometer identification error acoustic respectively'], ['Hanwu Sun, Bin Ma', 'The NIST SRE summed channel speaker recognition system', 'sun14_interspeech', 'summed-channel excerpt trial purification common segregate multiple algorithm adopted eer'], ['Laura Fernández Gallardo, Michael Wagner, Sebastian Möller', 'Advantages of wideband over narrowband channels for speaker verification employing MFCCs and LFCCs', 'gallardo14b_interspeech', 'khz bandwidth extended filterbanks speech mel-scaled codecs linearly transmitted permit'], ['Ming Li, Wenbo Liu', 'Speaker verification and spoken language identification using a generalized i-vector framework with phonetic tokenizations and tandem features', 'li14e_interspeech', 'zero-order statistic token first-order posterior mfcc extended gmm nist -grams'], ['T. Asha, M. S. Saranya, D. S. Karthik Pandia, Srikanth Madikeri, Hema A. 
Murthy', 'Feature Switching in the i-vector framework for speaker verification', 'asha14_interspeech', 'fusion type late early paradigm ubm-gmm tv discriminates conventionally normalisation'], ['Jinghua Zhong, Weiwu Jiang, Wei Rao, Man-Wai Mak, Helen Meng', 'PLDA modeling in the fishervoice subspace for speaker verification', 'zhong14_interspeech', 'suppressed discriminant channel variability performing stage analysis cd nonparametric denoted'], ['Alvin F. Martin, Craig S. Greenberg, Vincent M. Stanford, John M. Howard, George R. Doddington, John J. Godfrey', 'Performance factor analysis for the 2012 NIST speaker recognition evaluation', 'martin14_interspeech', 'noise crowd segment added type examine channel test synthetic observed'], ['Hiroshi Fujimura', 'Simultaneous gender classification and voice activity detection using deep neural networks', 'fujimura14_interspeech', 'vad dnn dnns posterior female male executing probability classifier classifies'], ['Ahmed Hussen Abdelaziz, Dorothea Kolossa', 'Dynamic stream weight estimation in coupled-HMM-based audio-visual speech recognition using multilayer perceptrons', 'abdelaziz14_interspeech', 'perceptron reliability video mlp-based signal-based feature audio measure maximization grid'], ['Kuniaki Noda, Yuki Yamaguchi, Kazuhiro Nakadai, Hiroshi G. 
Okuno, Tetsuya Ogata', 'Lipreading using convolutional neural network', 'noda14_interspeech', 'cnn visual vsr feature phoneme acquired isolated extraction label mechanism'], ['Fei Tao, Carlos Busso', 'Lipreading approach for isolated digits recognition under whisper and neutral speech', 'tao14_interspeech', 'decrease condition performance visual confidential speaker evaluate feature appealing protect'], ['Kenta Masaka, Ryo Aihara, Tetsuya Takiguchi, Yasuo Ariki', 'Multimodal exemplar-based voice conversion using lip features in noisy environments', 'masaka14_interspeech', 'exemplar source method weight audio-visual target previous visual aam effectiveness'], ['Yunbin Deng, James T. Heaton, Geoffrey S. Meltzner', 'Towards a practical silent speech recognition system', 'deng14b_interspeech', 'semg sensor advance algorithmic optimum mono-phone extraction gold-standard electromyography modeling'], ['João Freitas, Artur Ferreira, Mário Figueiredo, António Teixeira, Miguel Sales Dias', 'Enhancing multimodal silent speech interfaces with feature selection', 'freitas14_interspeech', 'modality dimensionality well-known supervised unsupervised video doppler curse ultrasonic technique'], ['William Katz, Thomas F. Campbell, Jun Wang, Eric Farrar, J. Coleman Eubanks, Arvind Balasubramanian, Balakrishnan Prabhakaran, Rob Rennaker', 'Opti-speech: a real-time, 3d visual feedback system for speech training', 'katz14_interspeech', 'tongue talker place magnetometer articulation ontario movement canada subject synchronously'], ['Jun Wang, Ashok Samal, Jordan R. 
Green', 'Across-speaker articulatory normalization for speaker-independent silent speech recognition', 'wang14g_interspeech', 'procrustes ssis talker person tongue matching lip rotational translational using'], ['Marlene Zahner, Matthias Janke, Michael Wand, Tanja Schultz', 'Conversion from facial myoelectric signals to speech: a unit selection approach', 'zahner14_interspeech', 'emg signal electromyographic tackled struggle audio output frame-based selects input'], ['Michael Wand, Tanja Schultz', 'Towards real-life application of EMG-based speech recognition by using unsupervised adaptation', 'wand14_interspeech', 'session electrode reattachment paving electric recording electromyography unsuitable emg decode'], ['Yuan Liang, Koji Iwano, Koichi Shinoda', 'Simple gesture-based error correction interface for smartphone speech recognition', 'liang14_interspeech', 'region user word candidate mark operation rerank succeeding corrected n-grams'], ['Kshitiz Kumar, Chaojun Liu, Yifan Gong', 'Normalization of ASR confidence classifier scores via confidence mapping', 'kumar14_interspeech', 'threshold developer tuning histogram-based necessitating fix alters cost-effective often tradeoff'], ['Tanel Alumäe', 'Neural network phone duration model for speech recognition', 'alumae14_interspeech', 'speaker-adapted estonian calculates finnish rescoring lattice word density contextual dnn'], ['Haşim Sak, Oriol Vinyals, Georg Heigold, Andrew Senior, Erik McDermott, Rajat Monga, Mark Mao', 'Sequence discriminative distributed training of long short-term memory recurrent neural networks', 'sak14b_interspeech', 'lstm rnns criterion dnns trained scale state-level model modeling asynchronous'], ['Zhen Huang, Jinyu Li, Chao Weng, Chin-Hui Lee', 'Beyond cross-entropy: towards better frame-level objective functions for deep neural network training in automatic speech recognition', 'huang14c_interspeech', 'dnn softmax function lvcsr dnns activation target posterior tied-triphone pairing'], 
['Hao Tang, Kevin Gimpel, Karen Livescu', 'A comparison of training approaches for discriminative segmental models', 'tang14_interspeech', 'loss rescoring conditional lattice surrogate cost proxy function maximizing optimizing'], ['Erik McDermott, Georg Heigold, Pedro J. Moreno, Andrew Senior, Michiel Bacchiani', 'Asynchronous stochastic optimization for sequence training of deep neural networks: towards big data', 'mcdermott14_interspeech', 'dnns efficiency hour across small-scale proof different scalability descent condition'], ['Hrishikesh Rao, Jonathan C. Kim, Mark A. Clements, Agata Rozga, Daniel S. Messinger', "Detection of children's paralinguistic events in interaction with caregivers", 'rao14_interspeech', 'fussing cry laughter detecting infant speech age wrapper-based behavior tertiary'], ['Massimo Pettorino, Elisa Pellegrino', 'Age and rhythmic variations: a study on Italian', 'pettorino14_interspeech', 'vtov speech vowel portion rhythm rate interval scarcely accompanies age-related'], ['Nicholas Cummins, Vidhyasaharan Sethu, Julien Epps, Jarek Krajewski', 'Probabilistic acoustic volume analysis for speech affected by depression', 'cummins14_interspeech', 'measure spectral level variability reflective monte carlo speaker depressed alteration'], ['Elif Bozkurt, Orith Toledo-Ronen, Alexander Sorin, Ron Hoory', 'Exploring modulation spectrum features for speech-based depression level classification', 'bozkurt14_interspeech', 'depressed frequency spectral domain feature manageable spectrum-based speech provides feature-level'], ['Florian Hönig, Anton Batliner, Elmar Nöth, Sebastian Schnieder, Jarek Krajewski', 'Automatic modelling of depressed speech: relevant features and relevance of gender', 'honig14_interspeech', 'depression sleepy sleepiness retardation psychomotor relaxation group complemented reduction enriched'], ['P. Gangamohan, Sudarsana Reddy Kadiri, Suryakanth V. Gangashetty, B. 
Yegnanarayana', 'Excitation source features for discrimination of anger and happy emotions', 'gangamohan14_interspeech', 'emotion like state spectral discriminating zero short-time discriminate activation strength'], ['Ke Wu, Cyril Allauzen, Keith Hall, Michael Riley, Brian Roark', 'Encoding linear models as weighted finite-state transducers', 'wu14_interspeech', 'automaton library general conversion extension shortest solid passed cmu grapheme-to-phoneme'], ['Keigo Kubo, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura', 'Structured soft margin confidence weighted learning for grapheme-to-phoneme conversion', 'kubo14b_interspeech', 'hypothesis multi-class update extends n-best method allowing discriminative parameter overfitted'], ['Wei Zhang, Robert A. J. Clark, Yongyuan Wang', 'Unsupervised language filtering using the latent dirichlet allocation', 'zhang14d_interspeech', 'identification identifying collapsed reformulated gibbs data scratch pure count pruning'], ['BalaKrishna Kolluru, Vincent Wan, Javier Latorre, Kayoko Yanagisawa, Mark J. F. Gales', 'Generating multiple-accent pronunciations for TTS using joint sequence model interpolation', 'kolluru14_interspeech', 'accent space possible graphone defined scottish point synthesis homogeneous specify'], ['Gustavo Mendonça, Sandra Aluisio', 'Using a hybrid approach to build a pronunciation dictionary for Brazilian Portuguese', 'mendonca14_interspeech', 'grapheme transcription wikipedia word tokenized machine-readable rule loanword machine -fold'], ['Matthew P. 
Aylett, Rasmus Dall, Arnab Ghoshal, Gustav Eje Henter, Thomas Merritt', 'A flexible front-end for HTS', 'aylett14_interspeech', 'idlak festival parametric synthesis full front-ends context mo experimenting mature'], ['Kimiko Tsukada, Felicity Cox, John Hajek', 'Cross-language perception of Japanese singleton and geminate consonants: preliminary data from non-native learners of Japanese and native speakers of Italian and australian English', 'tsukada14_interspeech', 'nnj consonant length listener accurate contrast contrastively group knowledge naïve'], ['Samra Alispahic, Paola Escudero, Karen E. Mulak', 'Difficulty in discriminating non-native vowels: are Dutch vowels easier for australian English than Spanish listeners?', 'alispahic14_interspeech', 'ause contrast vowel fewer whose contains compared monolingual discriminate language'], ['Jing Yang, Robert Allen Fox', 'Acoustic properties of shared vowels in bilingual Mandarin-English children', 'yang14b_interspeech', 'proficiency assimilatory monolingual age-matched english equidistant dynamic group low vowel'], ['María Luisa García Lecumberri, Roberto Barra-Chicote, Rubén Pérez Ramón, Junichi Yamagishi, Martin Cooke', 'Generating segmental foreign accent', 'lecumberri14_interspeech', 'localised non-native language native accented word cross-splicing naturally-produced deviating weakest'], ['Bistra Andreeva, Grażyna Demenko, Bernd Möbius, Frank Zimmerer, Jeanin Jügler, Magdalena Oleskowicz-Popiel', 'Differences of pitch profiles in Germanic and slavic languages', 'andreeva14_interspeech', 'polish span group bulgarian skewness kurtosis language succeeded variation distributional'], ['Mathieu Avanzi, Guri Bordal, Gélase Nimbona', 'The obligatory contour principle in african and European varieties of French', 'avanzi14_interspeech', 'ocp tone contact respected role lexical clash substrate permitted language'], ['Nicolas Scheffer, Yun Lei', 'Content matching for short duration speaker recognition', 
'scheffer14_interspeech', 'enrollment text-independent verification text-dependent run utterance backends pave rsr mining'], ['Anthony Larcher, Kong Aik Lee, Pablo L. Sordo Martínez, Trung Hieu Nguyen, Bin Ma, Haizhou Li', 'Extended RSR2015 for text-dependent speaker verification over VHF channel', 'larcher14_interspeech', 'database transmitted khz marine infocomm singapore recording research extending institute'], ['Tianfan Fu, Yanmin Qian, Yuan Liu, Kai Yu', 'Tandem deep features for text-dependent speaker verification', 'fu14_interspeech', 'gmm-ubm discriminant network rbm boltzmann rsr feature thoroughly type used'], ['Patrick Kenny, Themos Stafylakis, M. J. Alam, Pierre Ouellet, Marcel Kockmann', 'In-domain versus out-of-domain training for text-dependent JFA', 'kenny14_interspeech', 'cslu rsr ubm adapting type cheating adaptation error rate model'], ['Hagai Aronowitz, Asaf Rendel', 'Domain adaptation for text dependent speaker verification', 'aronowitz14_interspeech', 'text-dependent target data mismatch technique content lexical authentication discarded supervectors'], ['Antonio Miguel, Jesús Villalba, Alfonso Ortega, Eduardo Lleida, Carlos Vaquero', 'Factor analysis with sampling methods for text dependent speaker recognition', 'miguel14_interspeech', 'responsibility mfa latent tied independent sample rest dimensional computing approximation'], ['Ewout van den Berg, Bhuvana Ramabhadran', 'Dictionary-based pitch tracking with dynamic programming', 'berg14_interspeech', 'swipe performance low-pitched atom halving yin high-pitched doubling detection facilitating'], ['Hongbing Hu, Stephen A. 
Zahorian, Peter Guzewich, Jiang Wu', 'Acoustic features for robust classification of Mandarin tones', 'hu14c_interspeech', 'yaapt pitch tone tracker yin track trajectory praat shape all-voiced'], ['Anastasia Karlsson, Håkan Lundström, Jan-Olof Svantesson', 'Preservation of lexical tones in singing in a tone language', 'karlsson14_interspeech', 'melodic template recitation mon-khmer two-tone contrast strengthening ignores vowel syllable'], ['Theodora Yakoumaki, George P. Kafentzis, Yannis Stylianou', 'Emotional speech classification using adaptive sinusoidal modelling', 'yakoumaki14_interspeech', 'eaqhm feature amplitude quasi-harmonic asm susa scheme quantizers model synthesis'], ['Shengbei Wang, Masashi Unoki, Nam Soo Kim', 'Formant enhancement based speech watermarking for tampering detection', 'wang14h_interspeech', 'scheme signal digital unauthorized symmetrically criterion watermark originality lsfs integrity'], ['Tom Barker, Hugo Van\xa0hamme, Tuomas Virtanen', 'Modelling primitive streaming of simple tone sequences through factorisation of modulation pattern tensors', 'barker14_interspeech', 'component learned predict segregated either tensor segregation alternating percept organisation'], ['Biswajit Dev Sarma, S. R. M. 
Prasanna', 'Detection of vowel onset points in voiced aspirated sounds of indian languages', 'sarma14_interspeech', 'vop vops accurately manual end-point electroglottograph locating egg found method'], ['Akira Sasou', 'Accuracy evaluation of esophageal voice analysis based on automatic topology generated-voicing source HMM', 'sasou14_interspeech', 'vs-hmm ar-hmm voicing ring accurately auto-regressive tract vocal glottal separate'], ['Xuejun Zhang, Xiang Xie', 'Audio watermarking based on multiple echoes hiding for FM radio', 'zhang14e_interspeech', 'imperceptibility re-sampling guaranteeing robustness premise broadcasting higher echo recovered receiver'], ['Petr Motlicek, David Imseng, Milos Cernak, Namhoon Kim', 'Development of bilingual ASR system for MediaParl corpus', 'motlicek14_interspeech', 'accented fdlp speech entropy-based tandem reverberant language-specific reverberation approximately reason'], ['Jie Li, Rong Zheng, Bo Xu', 'Investigation of cross-lingual bottleneck features in hybrid ASR systems', 'li14f_interspeech', 'dnns two-level extractor feature second-level first-level target-language non-target holistic data'], ['Oluwapelumi Giwa, Marelie H. Davel', 'Language identification of individual words with joint sequence models', 'giwa14_interspeech', 'f-measure origin importance required pronunciation -language system code-switched task asr'], ['Xavier Anguera, Jordi Luque, Ciro Gracia', 'Audio-to-text alignment for speech recognition with very limited resources', 'anguera14_interspeech', 'recognizer aligned approximate phoneme audio text transcript trained harvested initial'], ['Hoang Gia Ngo, Nancy F. 
Chen, Sunil Sivadas, Bin Ma, Haizhou Li', 'A minimal-resource transliteration framework for vietnamese', 'ngo14_interspeech', 'language statistical keyword word adopted suitable search training outperforms scarce'], ['Heike Adel, Dominic Telaar, Ngoc Thang Vu, Katrin Kirchhoff, Tanja Schultz', 'Combining recurrent neural networks and factored language models during decoding of code-Switching speech', 'adel14b_interspeech', 'seame cluster brown backoff interpolate modeling identifier relative weakness linearly'], ['Zoltán Tüske, Pavel Golik, David Nolden, Ralf Schlüter, Hermann Ney', 'Data augmentation, feature combination, and multilingual neural networks to improve ASR and KWS performance for low-resource languages', 'tuske14b_interspeech', 'mtwv mlps project mlp unilingual absolute high-performing multilingually haitian lao'], ['Ryo Masumura, Taichi Asami, Takanobu Oba, Hirokazu Masataki, Sumitaka Sakauchi', 'Mixture of latent words language models for domain adaptation', 'masumura14_interspeech', 'lwlms space lm lwlm modeling variable merger n-gram out-of weight'], ['Robert Herms, Marc Ritter, Thomas Wilhelm-Stein, Maximilian Eibl', 'Improving spoken document retrieval by unsupervised language model adaptation using utterance-based web search', 'herms14_interspeech', 'lwlms mixture latent space lm lwlm modeling variable merger n-gram'], ['Jen-Tzung Chien, Ying-Lan Chang', 'The nested indian buffet process for flexible topic modeling', 'chien14_interspeech', 'document tree hierarchy relaxing mixture nonparametric chooses flexibly representation heterogeneous'], ['K. Levin, I. Ponomareva, A. Bulusheva, G. Chernykh, I. Medennikov, N. Merkin, A. 
Prudnikov, Natalia Tomashenko', 'Automated closed captioning for Russian live broadcasting', 'levin14_interspeech', 'editing asr real-time respeaking wer inflected exceeding reporting caption punctuation'], ['Lei Wang, Rong Tong', 'Pronunciation modeling of foreign words for Mandarin ASR by considering the effect of language transfer', 'wang14i_interspeech', 'english rule lexicon harming lexical speech recognition re-estimation phonetic without'], ['Attapol T. Rutherford, Fuchun Peng, Françoise Beaufays', 'Pronunciation learning for named-entities through crowd-sourcing', 'rutherford14_interspeech', 'named-entity accurate turk cheap prof mechanical origin day collecting quickly'], ['Barbara Schuppler, Martine Adda-Decker, Juan A. Morales-Cordovilla', 'Pronunciation variation in read and conversational austrian German', 'schuppler14_interspeech', 'rule speech creation realization whereas grass style tool specific phonetic'], ['Maider Lehr, Kyle Gorman, Izhak Shafran', 'Discriminative pronunciation modeling for dialectal speech recognition', 'lehr14_interspeech', 'aave model canonical dialect lexicon mismatch tease vernacular phone acoustic'], ['Thomas Pellegrini, Lionel Fontan, Julie Mauclair, Jérôme Farinas, Marina Robert', 'The goodness of pronunciation algorithm applied to disordered speech', 'pellegrini14_interspeech', 'gop mispronunciation phone detect grade speaker false realization deviance unilateral'], ['Angeliki Metallinou, Jian Cheng', 'Using deep neural networks to improve proficiency assessment for children English language learners', 'metallinou14_interspeech', 'ell dnn-based recognition spoken gmm-hmms cd-dnn-hmms relu rectified speech open-ended'], ['Han Lu, Sheng-syun Shen, Sz-Rung Shiang, Hung-yi Lee, Lin-shan Lee', 'Alignment of spoken utterances with slide content for easier learning with recorded lectures using structured support vector machine (SVM)', 'lu14c_interspeech', 'course entropy-based taiwan massive top-down offered word-based align 
on-line helpful'], ['Richeng Duan, Jinsong Zhang, Wen Cao, Yanlu Xie', 'A preliminary study on ASR-based detection of Chinese mispronunciation by Japanese learners', 'duan14_interspeech', 'detecting articulation pronunciation false feedback instructive moderately capt assisted erroneous'], ['Kele Xu, Yin Yang, A. Jaumard-Hakoun, Martine Adda-Decker, A. Amelot, S. K. Al Kork, L. Crevier-Buchman, P. Chawah, G. Dreyfus, T. Fux, C. Pillot-Loiseau, P. Roussel, M. Stone, B. Denby', '3d tongue motion visualization based on ultrasound image sequences', 'xu14b_interspeech', 'driven real-time visualizing mid-sagittal deformation used educational demonstration modal follow'], ["Donald Derrick, Tom De Rybel, Greg A. O'Beirne, Jennifer Hay", 'Listen with your skin: aerotak speech perception enhancement system', 'derrick14_interspeech', 'audio channel headphone drive air voiceless stored left signal right'], ['László Czap', 'Speech assistant system', 'czap14_interspeech', 'university head developed transcoder project granted union cooperation three-dimensional deaf'], ['Rafael E. Banchs, Seokhwan Kim', 'Spoken dialogue system for restaurant recommendation and reservation', 'banchs14_interspeech', 'phased singapore booking deploy demo tell city regional service easily'], ['Hayakawa Akira, Nick Campbell, Saturnino Luz', 'Interlingual map task corpus collection', 'akira14_interspeech', 'used hcrc prototyping data speech-to-speech demonstration accessible collecting system explanation'], ['Jordi Centelles, Marta R. Costa-jussà, Rafael E. 
Banchs', 'A client mobile application for Chinese-Spanish statistical machine translation', 'centelles14_interspeech', 'server-based ocr optical smt off-line tell supporting engine recognition operation'], ['Alberto Benin, Piero Cosi, Giuseppe Riccardo Leone, Giulio Paci', 'LuciawebGL: a new WebGL-Based talking head', 'benin14_interspeech', 'webgl ipad io iphone apple worldwide demo running first mobile'], ['Babak Naderi, Tim Polzehl, André Beyer, Tibor Pilz, Sebastian Möller', 'Crowdee: mobile crowdsourcing micro-task platform for celebrating the diversity of languages', 'naderi14_interspeech', 'community mobility underrepresented stimulate versatile foster scalability research covering operates'], ['Roger K. Moore', "On the use of the `pure data' programming language for teaching and public outreach in speech processing", 'moore14_interspeech', 'course non-real-time non-specialists scripting toolbox example prescribed specialised lend toolkits'], ['Aleksandr Dubinsky', 'Syncwords: a platform for semi-automated closed captioning and subtitles', 'dubinsky14_interspeech', 'caption captioned workplace well-structured legislation submitting post-processed progressing easy-to-use breaking'], ['Robert A. J. Clark', 'Simple<SUP>4</SUP>all', 'clark14_interspeech', 'synthesis system speech language skill expert lack widely-spoken building resource'], ['P. Chawah, S. K. Al Kork, T. Fux, Martine Adda-Decker, A. Amelot, N. Audibert, B. Denby, G. Dreyfus, A. Jaumard-Hakoun, C. Pillot-Loiseau, P. Roussel, M. Stone, Kele Xu, L. 
Crevier-Buchman', 'An educational platform to capture, visualize and analyze rare singing', 'chawah14_interspeech', 'module learning helmet extendable interfaced style mastering greece versatility auto'], ['Kwang Myung Jeon, Chan Jun Chun, Woo Kyeong Seong, Hong Kook Kim, Myung Kyu Choi', 'Single-channel speech enhancement based on non-negative matrix factorization and online noise adaptation', 'jeon14_interspeech', 'nmf base simulator activated incoming decomposed non-speech adapting universal separation'], ['Dieter Maurer, Peggy Mok, Daniel Friedrichs, Volker Dellwo', 'Intelligibility of high-pitched vowel sounds in the singing and speaking of a female Cantonese opera singer', 'maurer14_interspeech', 'identifiable proved perception range sung illustration exceeding org debate matter'], ['Pejman Mowlaee, Mario Kaoru Watanabe, Rahim Saeidi', 'Iterative refinement of amplitude and phase in single-channel speech enhancement', 'mowlaee14_interspeech', 'tell spectrum closed-loop justified recent degrading demo proposal corrupted signal-to-noise'], ['Sophie Roekhaut, Sandrine Brognaux, Richard Beaufort, Thierry Dutoit', 'elite-HTS: a NLP tool for French HMM-based speech synthesis', 'roekhaut14_interspeech', 'hts toolkit web file generates synthesizer service stage input training'], ['Andreea I. Niculescu, Rafael E. 
Banchs, Ridong Jiang, Seokhwan Kim, Kheng Hui Yeo, Arthur Niswar', "SARA — singapore's automated responsive assistant for the touristic domain", 'niculescu14_interspeech', 'android http text direction iris application sightseeing gps scanned transportation'], ['Andrew Plummer, Eric Riebling, Anuj Kumar, Florian Metze, Eric Fosler-Lussier, Rebecca Bates', 'The speech recognition virtual kitchen: launch party', 'plummer14_interspeech', 'vms community tool experimentation solicit installation collaborate grow repository promotes'], ['Kyle Marek-Spartz, Benjamin Knoll, Robert Bill, Thomas Christie, Serguei Pakhomov', 'System for automated speech and language analysis (SALSA)', 'marekspartz14_interspeech', 'cognitive assessment drug traumatic automates test injury administration neurodegenerative kaldi'], ['Ikuyo Masuda-Katsuse', 'Pronunciation practice support system for children who have difficulty correctly pronouncing words', 'masudakatsuse14_interspeech', 'test volunteer therapist probable elementary collaboration exercise tailored prepared individually'], ['Joris Driesen, Alexandra Birch, Simon Grimsey, Saeid Safarfashandi, Juliet Gauthier, Matt Simpson, Steve Renals', 'Automated production of true-cased punctuated subtitles for weather and news broadcasts', 'driesen14_interspeech', 'subtitling module pipeline bee uninterrupted red content resulting forecast system'], ['Minghui Dong, S. W. 
Lee, Haizhou Li, Paul Chan, Xuejian Peng, Jochen Walter Ehnes, Dongyan Huang', "I<SUP>2</SUP>r speech2singing perfects everyone's singing", 'dong14_interspeech', 'personalized rhythm infocomm apps maybe vocal singapore nice technology trivial'], ['Gustav Eje Henter, Thomas Merritt, Matt Shannon, Catherine Mayo, Simon King', 'Measuring the perceptual effects of modelling assumptions in speech synthesis using stimuli constructed from repeated natural speech', 'henter14_interspeech', 'limit mean-based parameter naturalness conditionally question diagonal shortcoming synthesised aloud'], ['Thomas Merritt, Tuomo Raitio, Simon King', 'Investigating source and filter contributions, and their interaction, to statistical parametric speech synthesis', 'merritt14_interspeech', 'ask independence stimulus assumption perceptual vocoded optionally listener justified permutation'], ['Javier Latorre, Vincent Wan, Kayoko Yanagisawa', 'Voice expression conversion with factorised HMM-TTS models', 'latorre14b_interspeech', 'transforms identity altering speaker sample modify desired without combined speech'], ['Kayoko Yanagisawa, Langzhou Chen, Mark J. F. 
Gales', 'Noise-robust TTS speaker adaptation with statistics smoothing', 'yanagisawa14_interspeech', 'similarity quality synthesis severe reasonable audio attribute target sub-space low-quality'], ['Sandrine Brognaux, Benjamin Picart, Thomas Drugman', 'Speech synthesis in various communicative situations: impact of pronunciation variations', 'brognaux14_interspeech', 'commentary sport phonetic phonetization elision plausibility schwa genuine analyzes message'], ['Ming-Qi Cai, Zhen-Hua Ling, Li-Rong Dai', 'Formant-controlled speech synthesis using hidden trajectory model', 'cai14_interspeech', 'formant htm controllability bandwidth residual quinphone formant-related decision-tree-based synthetic phone-dependent'], ['Xiao-Lei Zhang, DeLiang Wang', 'Boosted deep neural networks and multi-resolution cochleagram features for voice activity detection', 'zhang14f_interspeech', 'mrcg vad prediction base frame vads spectrotemporal concatenates multiple many'], ['Abhay Prasad, Prasanta Kumar Ghosh, Shrikanth S. Narayanan', 'Selection of optimal vocal tract regions using real-time magnetic resonance imaging for robust voice activity detection', 'prasad14_interspeech', 'vad mri audio rtmri image signal multimodal selected scanner sequence'], ['Ali Ziaei, Lakshmish Kaushik, Abhijeet Sangwan, John H. L. Hansen, Douglas W. 
Oard', 'Speech activity detection for NASA apollo space missions: challenges and solutions', 'ziaei14b_interspeech', 'combo-sad sad hansen john audio well sparse segment perspective distortion'], ['Ming Tu, Xiang Xie, Yishan Jiao', 'Towards improving statistical model based voice activity detection', 'tu14_interspeech', 'power-law vad nonlinearity coefficient dft subband method existing comprehensively feature'], ['Ian Vince McLoughlin', 'The use of low-frequency ultrasound for voice activity detection', 'mcloughlin14_interspeech', 'excitation ultrasonic lip reflected located closed signal human open normal'], ['Jeff Ma', 'Improving the speech activity detection for the DARPA RATS phase-3 evaluation', 'ma14_interspeech', 'sad bottleneck channel phase effort nns design bigger feature performance'], ['Duc Le, Emily Mower Provost', 'Modeling pronunciation, rhythm, and intonation for automatic assessment of speech quality in aphasia rehabilitation', 'le14_interspeech', 'umap aphasic patient verbal unpredictability non-ideal collaborating human-level speech-language attending'], ['Sofia Strömbergsson, Christina Tånnander, Jens Edlund', 'Ranking severity of speech errors by their phonological impact in context', 'strombergsson14_interspeech', 'pcc disorder child edit clinician metric factor rating homonymy intelligibility'], ['J. R. Orozco-Arroyave, Florian Hönig, J. D. Arias-Londoño, J. F. Vargas-Bonilla, S. Skodda, J. Rusz, Elmar Nöth', "Automatic detection of parkinson's disease from words uttered in three different languages", 'orozcoarroyave14_interspeech', 'possible speech people develop imprecise aided symptom czech accuracy addressing'], ['Jason Lilley, Susan Nittrouer, H. 
Timothy Bunnell', 'Automating an objective measure of pediatric speech intelligibility', 'lilley14b_interspeech', 'csim setting clinical child response listener scoring human hearing research'], ['Mostafa Shahin, Beena Ahmed, Jacqueline McKechnie, Kirrie Ballard, Ricardo Gutierrez-Osuna', 'A comparison of GMM-HMM and DNN-HMM based pronunciation verification techniques for use in the assessment of childhood apraxia of speech', 'shahin14_interspeech', 'disordered correctness hybrid child phoneme conventional probable phone-based mispronunciation creates'], ['Jeff Berry, Andrew Kolb, Cassandra North, Michael T. Johnson', 'Acoustic and kinematic characteristics of vowel production through a virtual vocal tract in dysarthria', 'berry14_interspeech', 'sensorimotor talker learning working control idiosyncratic dysarthric speech space healthy'], ['Michael Wand, Matthias Janke, Tanja Schultz', 'The EMG-UKA corpus for electromyographic speech processing', 'wand14b_interspeech', 'emg data hour audibly silently recorded download whispered available articulated'], ['Pei Xuan Lee, Darren Wee, Hilary Si Yin Toh, Boon Pang Lim, Nancy F. 
Chen, Bin Ma', 'A whispered Mandarin corpus for speech technology applications', 'lee14b_interspeech', 'neutral absent human explicit dilemma tone unresolved hindered fruitful presently'], ['Roberto Gretter', 'Euronews: a multilingual benchmark for ASR and LID', 'gretter14_interspeech', 'polish language corpus data hour purpose training portal turkish iwslt'], ['Antigoni Tsiami, Isidoros Rodomagoulakis, Panagiotis Giannoulis, Athanasios Katsamanis, Gerasimos Potamianos, Petros Maragos', 'ATHENA: a Greek multi-sensory database for home automation control', 'tsiami14_interspeech', 'microphone command event activity condenser one-minute mem kinect close-talk ceiling'], ['Marco Matassoni, Ramón Fernandez Astudillo, Athanasios Katsamanis, Mirco Ravanelli', 'The DIRHA-GRID corpus: baseline and tools for multi-room distant speech recognition using distributed microphones', 'matassoni14_interspeech', 'multi-channel apartment distant-talking discussing environment accompanying grid highlighting initiative oriented'], ['Diogo Henriques, Isabel Trancoso, Daniel Mendes, Alfredo Ferreira', 'Verbal description of LEGO blocks', 'henriques14_interspeech', 'object query request corpus involving diminutive participant immersive resorting metaphor'], ['Pejman Mowlaee, Rahim Saeidi, Yannis Stylianou', 'Phase importance in speech processing applications', 'mowlaee14b_interspeech', 'spectrum proceed neglected spread giving dominant interspeech introduction overview separation'], ['Estefanía Cano, Mark Plumbley, Christian Dittmar', 'Phase-based harmonic/percussive separation', 'cano14_interspeech', 'discriminates method expectation music track element peak superior followed phase'], ['Gilles Degottex, Nicolas Obin', 'Phase distortion statistics as a representation of the glottal source: application to the classification of voice qualities', 'degottex14_interspeech', 'analytical pulse paramount para-linguistic thus mood attitude priori model 
assume'], ['Gilles Degottex, Daniel Erro', 'A measure of phase randomness for the harmonic model in speech synthesis', 'degottex14b_interspeech', 'voicing vocoders decision vocoder time-frequency feature randomization parametrizing aperiodicity statistical'], ['Emma Jokinen, Marko Takanen, Hannu Pulakka, Paavo Alku', 'Enhancement of speech intelligibility in near-end noise conditions with phase modification', 'jokinen14_interspeech', 'post-processing method mobile amplitude spectrum equalized amplification unprocessed modifies adverse'], ['S. Aswin Shanmugam, Hema Murthy', 'A hybrid approach to segmentation of speech using group delay processing and HMM based embedded reestimation', 'shanmugam14_interspeech', 'boundary syllable monophone accurate hmms reestimated ste synthesis vicinity spurious'], ['Maria Koutsogiannaki, Olympia Simantiraki, Gilles Degottex, Yannis Stylianou', 'The importance of phase on voice quality assessment', 'koutsogiannaki14_interspeech', 'glottal ranking spectrum irregularity reveals source component feature spasmodic minimum-phase'], ['Karthika Vijayan, Vinay Kumar, K. 
Sri Rama Murty', 'Feature extraction from analytic phase of speech signals for speaker verification', 'vijayan14_interspeech', 'ifcc mfcc instantaneous cepstral performance wrapping coefficient nist- strategy delivers'], ['Jon Sanchez, Ibon Saratxaga, Inma Hernaez, Eva Navas, Daniel Erro', 'A cross-vocoder study of speaker independent synthetic speech detection using phase information', 'sanchez14b_interspeech', 'rps parameterization technique vocoder human dependency conversion system allow minimum-phase'], ['Peng Yang, Cheung-Chi Leung, Lei Xie, Bin Ma, Haizhou Li', 'Intrinsic spectral analysis based on temporal context features for query-by-example spoken term detection', 'yang14c_interspeech', 'isa qbe-std manifold query utterance posteriorgram separability untranscribed baseline independence'], ['Julien van Hout, Vikramjit Mitra, Yun Lei, Dimitra Vergyri, Martin Graciarena, Arindam Mandal, Horacio Franco', "Recent improvements in SRI's keyword detection system for noisy audio", 'hout14_interspeech', 'feature logistic-regression keywords levantine farsi speech rat combination filter-bank output'], ['Mitsuaki Makino, Naoki Yamamoto, Atsuhiko Kai', 'Utilizing state-level distance vector representation for improved spoken term detection by text and spoken queries', 'makino14_interspeech', 'scoring std subword-level subword oov query expanded approximate matching confidence-based'], ['Raghavendra Reddy Pappagari, Shekhar Nayak, K. Sri Rama Murty', 'Unsupervised spoken word retrieval using Gaussian-bernoulli restricted boltzmann machines', 'pappagari14_interspeech', 'swr mfccs activation feature hidden spanned telugu critically data dtw'], ['Basil George, Abhijeet Saxena, Gautam Mantena, Kishore Prahallad, B. 
Yegnanarayana', 'Unsupervised query-by-example spoken term detection using bag of acoustic words and non-segmental dynamic time warping', 'george14_interspeech', 'retrieval quick search stage optimum document mediaeval boaw space prune'], ['Jie Li, Xiaorui Wang, Bo Xu', 'An empirical study of multilingual and low-resource spoken term detection using deep neural networks', 'li14g_interspeech', 'std cross-lingual dropout transfer system shared-hidden-layer mandarin shl-mdnn sgmms elevated'], ['Peter Schulam, Murat Akbacak', 'Diagnostic techniques for spoken keyword discovery', 'schulam14_interspeech', 'alignment feature unsupervised algorithm subsequence find technology intrinsically characteristic used'], ['Sho Kawasaki, Tomoyosi Akiba', 'Robust retrieval models for false positive errors in spoken documents', 'kawasaki14_interspeech', 'scr co-occur document query referred negative word deal pre-process std'], ['Yuan-ming Liou, Yi-sheng Fu, Hung-yi Lee, Lin-shan Lee', 'Semantic retrieval of personal photos using matrix factorization and two-layer random walk fusing sparse speech annotations with visual features', 'liou14_interspeech', 'retrieve latent collection topic bill technically columbia reinforced house mutually'], ['Guillaume Gravier, Nathan Souviraà-Labastie, Sébastien Campion, Frédéric Bimbot', 'Audio thumbnails for spoken content without transcription based on a maximum motif coverage criterion', 'gravier14_interspeech', 'discovery short keywords relevant authored resorting word-like extractive asr-based panel'], ['Fernando García, Emilio Sanchis, Ferran Pla', 'Semantically based search in a social speech task', 'garcia14_interspeech', 'segment audio semantic unbounded row lexical repository whatever enriched representation'], ['Vinay Kumar Mittal, B. 
Yegnanarayana', 'Study of changes in glottal vibration characteristics during laughter', 'mittal14b_interspeech', 'egg excitation normal examined signal zero-frequency electroglottograph correspondingly production source'], ['Stavros Ntalampiras, Ilyas Potamitis', 'On predicting the unpleasantness level of a sound event', 'ntalampiras14_interspeech', 'assessment baby cry animal mean framework squared mel-frequency characterize modulation'], ['Bilal Piot, Olivier Pietquin, Matthieu Geist', 'Predicting when to laugh with structured classification', 'piot14_interspeech', 'ecas interaction laughter eca laughing produce embodied human user-friendly imitation'], ['Benjamin Weiss, Katrin Schoenenberg', 'Conversational structures affecting auditory likeability', 'weiss14_interspeech', 'conference rating regression interlocutor manually exchanging back-channels pair-wise facilitation pre-defined'], ['Mathieu Avanzi, George Christodoulides, Damien Lolive, Elisabeth Delais-Roussarie, Nelly Barbot', 'Towards the adaptation of prosodic models for expressive text-to-speech synthesis', 'avanzi14b_interspeech', 'addressed style child adult audience register span differ fairy overtly'], ['Sho Matsumiya, Sakriani Sakti, Graham Neubig, Tomoki Toda, Satoshi Nakamura', 'Data-driven generation of text balloons based on linguistic and acoustic features of a comics-anime corpus', 'matsumiya14_interspeech', 'text-based said communication expressive emotional emotion comic system speech preferable'], ['Chiu-yu Tseng, Chao-yu Su', 'Learning L2 prosody is more difficult than you realize — F0 characteristics and chunking size of L1 English, TW L2 English and TW L1 Mandarin', 'tseng14_interspeech', 'prosodic unit transfer accentuating mastering contrast lower-level sharper higher-level difference'], ['Khiet P. 
Truong, Jürgen Trouvain', 'Investigating prosodic relations between initiating and responding laughs', 'truong14b_interspeech', 'laugh overlapping mimicry accommodation non-overlapping laughter uncommon point acoustic indication'], ['Dmytro Prylipko, Olga Egorow, Ingo Siegert, Andreas Wendemuth', 'Application of image processing methods to filled pauses detection from spontaneous speech', 'prylipko14_interspeech', 'interaction filter signal hereby short attitudinal react minimally utilising adaptable'], ['Sofoklis Kakouros, Okko Räsänen', 'Perception of sentence stress in English infant directed speech', 'kakouros14_interspeech', 'id ad acoustic prominence correlate adult corpus analysis property inter-annotator'], ['Noor Alhusna Madzlan, JingGuang Han, Francesca Bonin, Nick Campbell', 'Automatic recognition of attitudes in video blogs — prosodic and visual feature analysis', 'madzlan14_interspeech', 'attitude manifestation manual annotation relation cue prediction machine report state'], ['Denys Katerenchuk, David Guy Brizan, Andrew Rosenberg', '“was that your mother on the phone?”: classifying interpersonal relationships between dialog participants with lexical and acoustic properties', 'katerenchuk14b_interspeech', 'social understanding communication friend exploratory callhome spoken assignment chance distinguishing'], ['Rohan Kumar Das, S. Abhiram, S. R. M. Prasanna, A. G. 
Ramakrishnan', 'Combining source and system information for limited data speaker verification', 'das14_interspeech', 'mfcc feature using performance score-level characteristic found characterization drastically speaker-specific'], ['Mireia Diez, Amparo Varona, Mikel Penagarikano, Luis Javier Rodriguez-Fuentes, German Bordel', 'New insight into the use of phone log-likelihood ratios as features for language recognition', 'diez14_interspeech', 'pllr projected lre subspace nist made dataset shifted retrieved figure'], ['Sriram Ganapathy, Kyu Han, Samuel Thomas, Mohamed Omar, Maarten Van Segbroeck, Shrikanth S. Narayanan', 'Robust language identification using convolutional neural network features', 'ganapathy14_interspeech', 'lid bottleneck cnn rat acoustic complimentary segment task duration conjunction'], ['Chengzhu Yu, Gang Liu, John H. L. Hansen', 'Acoustic feature transformation using UBM-based LDA for speaker recognition', 'yu14c_interspeech', 'ubm baum-welch region trained mixture acoustical core statistic matrix estimated'], ['Man-Wai Mak', 'SNR-dependent mixture of PLDA for noise robust speaker verification', 'mak14_interspeech', 'i-vector utterance snr marginal test posterior wide soft-decision likelihood probability'], ['Seyed Omid Sadjadi, Jason Pelecanos, Weizhong Zhu', 'Nearest neighbor discriminant analysis for robust speaker recognition', 'sadjadi14_interspeech', 'nda lda channel inter-speaker i-vectors parametric non-speaker traditional gaussian scatter'], ['Shih-Hung Liu, Kuan-Yu Chen, Yu-Lun Hsieh, Berlin Chen, Hsin-Min Wang, Hsu-Chun Yen, Wen-Lian Hsu', 'Enhanced language modeling for extractive speech summarization with sentence relatedness information', 'liu14e_interspeech', 'document sentence-level structural unsupervised concisely method relationship resorting challenge viability'], ['Mohamed Morchid, Mohamed Bouallegue, Richard Dufour, Georges Linarès, Driss Matrouf, Renato De Mori', 'I-vector based representation of highly imperfect automatic 
transcriptions', 'morchid14b_interspeech', 'decoda multi-model transportation analytics paris dirichlet theme allocation reaching company'], ['Catherine Lai, Steve Renals', 'Incorporating lexical and prosodic information at different levels for meeting summarization', 'lai14b_interspeech', 'summary feature act icsi dialogue ami redundancy content prosody auroc'], ['Mohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linarès, Renato De Mori', 'Subspace Gaussian mixture models for dialogues classification', 'bouallegue14_interspeech', 'theme representation variability identification service multi-model homogenous transportation hidden paris'], ['Mohamed Bouallegue, Mohamed Morchid, Richard Dufour, Driss Matrouf, Georges Linarès, Renato De Mori', 'Factor analysis based semantic variability compensation for automatic conversation representation', 'bouallegue14b_interspeech', 'theme paradigm compensate additive service dialogue respect transportation super-vector vectorial'], ['Abdessalam Bouchekif, Géraldine Damnati, Delphine Charlet', 'Speech cohesion for topic segmentation of spoken contents', 'bouchekif14_interspeech', 'speaker distribution identity content confront lexical boundary reinforce mentioned format'], ['Yan Huang, Dong Yu, Chaojun Liu, Yifan Gong', 'A comparative analytic study on the Gaussian mixture and context dependent deep neural network hidden Markov models', 'huang14d_interspeech', 'speaking cd-dnn-hmm rate snr dnn robustness phone performance gap remains'], ['Michiel Bacchiani, Andrew Senior, Georg Heigold', 'Asynchronous, online, GMM-free training of a context dependent acoustic model for speech recognition', 'bacchiani14_interspeech', 'bootstrapped dnn gmm allows state bootstrap optimizes scratch flat process'], ['Navdeep Jaitly, Vincent Vanhoucke, Geoffrey Hinton', 'Autoregressive product of multi-frame predictions can improve the accuracy of hybrid models', 'jaitly14_interspeech', 'frame state neural dnn network-hidden 
predict using deep forced-alignment dropout'], ['Jinyu Li, Rui Zhao, Jui-Ting Huang, Yifan Gong', 'Learning small-size DNN with output-distribution-based criteria', 'li14h_interspeech', 'senone small standard number node training utilizing process large generation'], ['Li Deng, John C. Platt', 'Ensemble deep learning for speech recognition', 'deng14c_interspeech', 'stacking accuracy fully-connected convex log-linear formula analytical subsystem weakness dramatically'], ['Yucan Zhou, Qinghua Hu, Jie Liu, Yuan Jia', 'Learning conditional random field with hierarchical representations for dialogue act recognition', 'zhou14_interspeech', 'crf model utterance natural-language characterizes important baseline adapts intention verify'], ['Cristiane Hsu, Yi Xu', 'Can adolescents with autism perceive emotional prosody?', 'hsu14b_interspeech', 'asd perception emotion young finding sensitivity adult lack developing high-functioning'], ['Juliane Schmidt, Esther Janse, Odette Scharenborg', 'Age, hearing loss and the perception of affective utterances in conversational speech', 'schmidt14_interspeech', 'rating arousal participant group older valence aroused associated intensity younger'], ['Zhaojun Yang, Shrikanth S. 
Narayanan', 'Analysis of emotional effect on speech-body gesture interplay', 'yang14d_interspeech', 'speech-gesture valence body activation coupling emotion quantified low-level high-level communicative'], ['Cyrielle Chappuis, Didier Grandjean', 'When voices get emotional: a study of emotion-enhanced memory and impairment during emotional prosody exposure', 'chappuis14_interspeech', 'periphery expression recall enhancement fearful impairing consolidation vocal single-word facilitation'], ['Margaret Zellers', 'Perception of pitch tails at potential turn boundaries in Swedish', 'zellers14_interspeech', 'judgment height listener influence hold rising difference peak turn-transition wanted'], ['Robert Fuchs', 'Towards a perceptual model of speech rhythm: integrating the influence of f0 on perceived duration', 'fuchs14_interspeech', 'npvi-v dur vocalic difference interval quantifies prototypical account syllable-timing stress-timing'], ['Ling-Hui Chen, Tuomo Raitio, Cassia Valentini-Botinhao, Junichi Yamagishi, Zhen-Hua Ling', 'DNN-based stochastic postfilter for HMM-based speech synthesis', 'chen14h_interspeech', 'synthetic spectral reconstruct fine compared conditional modulation data-driven confirm variance'], ['Shiyin Kang, Helen Meng', 'Statistical parametric speech synthesis using weighted multi-distribution deep belief network', 'kang14_interspeech', 'flag spectrum synthesized weighting pitch contour generate vibrating smoother imbalance'], ['Yuchen Fan, Yao Qian, Feng-Long Xie, Frank K. 
Soong', 'TTS synthesis with bidirectional LSTM based recurrent neural networks', 'fan14_interspeech', 'feed-forward blstm-rnn trajectory dnn smooth speech outperform layer long hmm'], ['Tuomo Raitio, Antti Suni, Lauri Juvela, Martti Vainio, Paavo Alku', 'Deep neural network based trainable voice source model for synthesis of speech with varying vocal effort', 'raitio14_interspeech', 'dnn-based pulse flow glottal desired excitation matching dnn adapted waveform'], ['Dong Yu, Adam Eversole, Michael L. Seltzer, Kaisheng Yao, Brian Guenter, Oleksii Kuchaiev, Frank Seide, Huaming Wang, Jasha Droppo, Zhiheng Huang, Geoff Zweig, Chris Rossbach, Jon Currey', 'An introduction to computational networks and the computational network toolkit (invited talk)', 'yu14d_interspeech', 'generalization neural recurrence non-commercial introduce connectivity license microsoft topology functionality'], ['Raul Fernandez, Asaf Rendel, Bhuvana Ramabhadran, Ron Hoory', 'Prosody contour prediction with long short-term memory, bi-directional, deep recurrent neural networks', 'fernandez14b_interspeech', 'rnns dnns mean-opinion-score speech-synthesis fixed-size mean-square arbitrarily prosodic provide relative'], ['Xiang Yin, Ming Lei, Yao Qian, Frank K. Soong, Lei He, Zhen-Hua Ling, Li-Rong Dai', 'Modeling DCT parameterized F0 trajectory at intonation phrase level with DNN or decision tree', 'yin14_interspeech', 'state-level contour residual smoothed predictor model hpm predict capture usually'], ['Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki', 'High-order sequence modeling using speaker-dependent recurrent temporal restricted boltzmann machines for voice conversion', 'nakashika14_interspeech', 'rtrbms dependency speaker rtrbm network ant training abstraction source neural'], ['Feng-Long Xie, Yao Qian, Yuchen Fan, Frank K. 
Soong, Haifeng Li', 'Sequence error (SE) minimization training of neural network for voice conversion', 'xie14b_interspeech', 'frame converted target speaker trained nn-based mge source-target source stereo'], ['Florent Bocquelet, Thomas Hueber, Laurent Girin, Pierre Badin, Blaise Yvert', 'Robust articulatory speech synthesis using deep neural networks for BCI applications', 'bocquelet14_interspeech', 'dnn-based ema parameter assessed synthesizer bcis paralyzed auto-encoders envision robustness'], ['Shufang Xu', 'Acoustic investigation of /t<SUP>h</SUP>/ lenition in brunei Mandarin', 'xu14c_interspeech', 'spectrographic intensity reasonably judgment little chinese measurement tha perceptual increase'], ['Ting Wang, Hongwei Ding, Jianjing Kuang, Qiuwu Ma', 'Mapping emotions into acoustic space: the role of voice quality', 'wang14j_interspeech', 'cue listener explain revealed conjecture native perception component measure multi-dimensional'], ['Nagaraj Mahajan, Nima Mesgarani, Hynek Hermansky', 'Principal components of auditory spectro-temporal receptive fields', 'mahajan14_interspeech', 'spectral cortical relatively channel strfs mammalian octave processing engineering thousand'], ['Marwa Thlithi, Thomas Pellegrini, Julien Pinquier, Régine André-Obrecht', 'Segmentation in singer turns with the Bayesian information criterion', 'thlithi14_interspeech', 'bic value recording decision coefficient audio best kept penalty f-measure'], ['Catherine I. 
Watson', 'Mappings between vocal tract area functions, vocal tract resonances and speech formants for multiple speakers', 'watson14_interspeech', 'nze pc variance accounted first zealand backness combined individual speaker'], ['Sebastian Arndt, Markus Wenzel, Jan-Niklas Antons, Friedemann Köster, Sebastian Möller, Gabriel Curio', 'A next step towards measuring perceived quality of speech through physiology', 'arndt14_interspeech', 'eeg qoe subjective recorded itu expose electroencephalography provider study score'], ['Fei Chen, Sharon W. K. Wong, Lena L. N. Wong', 'Effect of spectral degradation to the intelligibility of vowel sentences', 'chen14i_interspeech', 'mandarin paradigm sinewave noise-vocoded perceptional flattening deleting flattened unprocessed spectrally'], ['Jeff Berry, John Jaeger, Melissa Wiedenhoeft, Brittany Bernal, Michael T. Johnson', 'Consonant context effects on vowel sensorimotor adaptation', 'berry14b_interspeech', 'formant coarticulation greater cvcs articulation articulatory perturb inter-articulator manipulates bulk'], ['Gérard Bailly, Amélie Martin', 'Assessing objective characterizations of phonetic convergence', 'bailly14_interspeech', 'domino lda interlocutor verbal llr interaction dyad characteristic dense comment'], ['Michael I. Mandel, Sarah E. Yoho, Eric W. 
Healy', 'Generalizing time-frequency importance functions across noises, talkers, and phonemes', 'mandel14_interspeech', 'generalize involving different mixture intelligibility novel trained utterance instance framework'], ['Yatin Mahajan, Jeesun Kim, Chris Davis', 'Does elderly speech recognition in noise benefit from spectral and visual cues?', 'mahajan14b_interspeech', 'young modulated group recognizing adult auditory participant effectiveness poorer relative'], ['Kornel Laskowski', 'On the conversant-specificity of stochastic turn-taking models', 'laskowski14_interspeech', 'conversant activity conditioning interlocutor asymptotically incipient variability contributor establishes relaxed'], ['Toshihiro Sakano, Yosuke Kobayashi, Kazuhiro Kondo', 'Single-ended estimation of speech intelligibility using the ITU p.563 feature set', 'sakano14_interspeech', 'sample degraded unknown noise require clean reference svr itu-t prof'], ['Emma Jokinen, Ulpu Remes, Marko Takanen, Kalle Palomäki, Mikko Kurimo, Paavo Alku', 'Spectral tilt modelling with GMMs for intelligibility enhancement of narrowband telephone speech', 'jokinen14b_interspeech', 'post-processing envelope estimation method word-error unprocessed near-end lombard noise condition'], ['Friedemann Köster, Sebastian Möller', 'Analyzing perceptual dimensions of conversational speech quality', 'koster14_interspeech', 'telecommunication reason conversation system overall deeply diagnose provider subjectively separating'], ['Vincent Aubanel, Chris Davis, Jeesun Kim', 'Interplay of informational content and energetic masking in speech perception in noise', 'aubanel14_interspeech', 'preview cse information region understanding convey defined distorts suppresses listener'], ['Tudor-Cătălin Zorilă, Yannis Stylianou', 'On spectral and time domain energy reallocation for speech-in-noise intelligibility enhancement', 'zorila14_interspeech', 'hybrid algorithm ssn clarification masker shaped preservation noise reviewed snrs'], 
['Fei Chen, Yi Hu', 'Objective quality evaluation of noise-suppressed speech: effects of temporal envelope and fine-structure cues', 'chen14j_interspeech', 'rating subjective distortion intelligibility measure noise-suppression ncm signal-to-noise-ratio noise objectively'], ['Dongmei Wang, Philipos C. Loizou, John H. L. Hansen', 'Noisy speech enhancement based on long term harmonic model to improve speech intelligibility for hearing impaired listeners', 'wang14k_interspeech', 'spectrum ea algorithm estimation recipient pitch implant target noise cochlear'], ['Cassia Valentini-Botinhao, Mirjam Wester', 'Using linguistic predictability and the lombard effect to increase the intelligibility of synthetic speech in noise', 'valentinibotinhao14b_interspeech', 'predictable word voice sentence spin audibility low speech-shaped unpredictable plain'], ['Maryam Al Dabel, Jon Barker', 'Speech pre-enhancement using a discriminative microscopic intelligibility model', 'dabel14_interspeech', 'interpretation maximise optimise uniquely strategy speech-shaped masker optimisation optimally noise'], ['Mark J. Harvilla, Richard M. Stern', 'Least squares signal declipping for robust speech recognition', 'harvilla14_interspeech', 'constrained blind reconstruction amplitude algorithm interpolates legitimate least-squares clipping ensuring'], ['Haihua Xu, Hang Su, Eng Siong Chng, Haizhou Li', 'Semi-supervised training for bottle-neck feature based DNN-HMM hybrid systems', 'xu14d_interspeech', 'sst atwv dnn tandem lattice technique shrink necessarily subspace keyword'], ['Olga Kapralova, John Alex, Eugene Weinstein, Pedro J. Moreno, Olivier Siohan', 'A big data approach to acoustic model training corpus selection', 'kapralova14_interspeech', 'transcript speech large logged flattening set voice-search vocabulary train state'], ['Patrick Cardinal, Ahmed Ali, Najim Dehak, Yu Zhang, Tuka Al Hanai, Yifan Zhang, James R. 
Glass, Stephan Vogel', 'Recent advances in ASR applied to an Arabic transcription system for Al-Jazeera', 'cardinal14_interspeech', 'broadcast news wer best i-vector-based report mpe test sequential error'], ['Martin Sundermeyer, Ralf Schlüter, Hermann Ney', 'rwthlm — the RWTH aachen university neural network language modeling toolkit', 'sundermeyer14b_interspeech', 'lstm easy software srilm allows download license treebank standard htk'], ['Wei-Chen Cheng, Stanley Kok, Hoai Vu Pham, Hai Leong Chieu, Kian Ming A. Chai', 'Language modeling with sum-product networks', 'cheng14_interspeech', 'spn node spns sum product tractable layer hidden inference interleaving'], ['Xiaodong Cui, Brian Kingsbury, Jia Cui, Bhuvana Ramabhadran, Andrew Rosenberg, Mohammad Sadegh Rasooli, Owen Rambow, Nizar Habash, Vaibhava Goel', 'Improving deep neural network acoustic modeling for audio corpus indexing under the IARPA babel program', 'cui14b_interspeech', 'dnn ffv sfm asr option keyword augmentation stochastic period already'], ['Shammur Absar Chowdhury, Arindam Ghosh, Evgeny A. Stepanov, Ali Orkan Bayer, Giuseppe Riccardi, Ioannis Klasinas', 'Cross-language transfer of semantic annotation via targeted crowdsourcing', 'chowdhury14_interspeech', 'language worker domain-specific reference source lack application coped skewed inter-annotator'], ['Dilek Hakkani-Tür, Asli Celikyilmaz, Larry Heck, Gokhan Tur, Geoff Zweig', 'Probabilistic enrichment of knowledge graph entities for relation detection in conversational understanding', 'hakkanitur14_interspeech', 'semantic unweighted natural usage type freebase weight version populated wikipedia'], ['Philip N. Garner, David Imseng, Thomas Meyer', 'Automatic speech recognition and translation of a Swiss German dialect: Walliserdeutsch', 'garner14_interspeech', 'feasible small size grapheme-based lends switzerland station west bootstrapping proximity'], ['S. Harrat, K. Meftouh, M. Abbas, K. 
Smaili', 'Building resources for Algerian Arabic dialects', 'harrat14_interspeech', 'msa translation diacritization corpus forum tool machine language restore analyzer'], ['Luciana Ferrer, Yun Lei, Mitchell McLaren, Nicolas Scheffer', 'Spoken language recognition based on senone posteriors', 'ferrer14_interspeech', 'senones backends sample depth sad dimensionality heavily compared degraded posterior'], ['Javier Gonzalez-Dominguez, Ignacio Lopez-Moreno, Haşim Sak, Joaquin Gonzalez-Rodriguez, Pedro J. Moreno', 'Automatic language identification using long short-term memory recurrent neural networks', 'gonzalezdominguez14_interspeech', 'rnns lstm feed forward dnn better lid system fewer i-vector'], ['Brecht Desplanques, Kris Demuynck, Jean-Pierre Martens', 'Robust language recognition via adaptive language factor extraction', 'desplanques14_interspeech', 'broadcast variability ivector-based background accent documentary belgian adaptation classifier flemish'], ['Hamid Behravan, Ville Hautamäki, Sabato Marco Siniscalchi, Elie Khoury, Tommi Kurki, Tomi Kinnunen, Chin-Hui Lee', 'Dialect levelling in Finnish: a universal speech attribute approach', 'behravan14_interspeech', 'adopt among finland recognition adversely characterisation age-related characterise nuisance harder'], ['Mingming Chen, Zhanlei Yang, Hao Zheng, Wenju Liu', 'Improving native accent identification using deep neural networks', 'chen14k_interspeech', 'dnns label frame method gmm discriminative mandarin identify voting english'], ['Marie-José Kolly, Adrian Leemann, Volker Dellwo', 'Foreign accent recognition based on temporal information contained in lowpass-filtered speech', 'kolly14_interspeech', 'monotonized lowpass condition suprasegmental recognized speaker german applying filter afc'], ['Penny Karanasou, Yongqiang Wang, Mark J. F. Gales, Philip C. 
Woodland', 'Adaptation of deep neural network acoustic models using factorised i-vectors', 'karanasou14_interspeech', 'factorisation speaker cat particular environment represent speaker-based orthogonality individual noise'], ['Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura, Steven J. Rennie, Vaibhava Goel', 'Regularized feature-space discriminative adaptation for robust ASR', 'fukuda14_interspeech', 'fmpe map map-adapted model-space automotive objective porting mpe general-purpose favorably'], ['Yajie Miao, Hao Zhang, Florian Metze', 'Towards speaker adaptive training of deep neural network acoustic models', 'miao14c_interspeech', 'dnn sat sat-dnn dnns speaker-normalized speaker-adapted space feature babel updating'], ['Arseniy Gorin, Denis Jouvet', 'Component structuring and trajectory modeling for speech recognition', 'gorin14_interspeech', 'speaker gaussian class mixture density class-dependent tidigits phonetic large gmms'], ['Rama Doddipatla, Madina Hasan, Thomas Hain', 'Speaker dependent bottleneck layer training for speaker adaptation in automatic speech recognition', 'doddipatla14_interspeech', 'cmllr sat transformation gain respectively relative supervised discriminative wer yield'], ['Zhao You, Bo Xu', 'Improving wideband acoustic models using mixed-bandwidth training data via DNN adaptation', 'you14_interspeech', 'dnns bandwidth log-linear speech transforms domain-adaptation layer intractable deep narrowband'], ['Thomas Pellegrini, Vahid Hedayati, Isabel Trancoso, Annika Hämäläinen, Miguel Sales Dias', 'Speaker age estimation for elderly speech recognition in European Portuguese', 'pellegrini14b_interspeech', 'asr adapted range young decrease adult age-specific absolute five-year wer'], ['Maryam Najafian, Andrea DeMarco, Stephen Cox, Martin Russell', 'Unsupervised model selection for recognition of regional accented speech', 'najafian14_interspeech', 'adaptation speaker accent aid accent-dependent phonotactic-based unadapted asr baseline using'], 
['Wen-Lin Zhang, Dan Qu, Wei-Qiang Zhang, Bi-Cheng Li', 'Speaker adaptation based on sparse and low-rank eigenphone matrix estimation', 'zhang14g_interspeech', 'constraint limited over-fitting eigenvoice data norm suffers rank nuclear regularization'], ['Yan Huang, Dong Yu, Chaojun Liu, Yifan Gong', 'Multi-accent deep neural network acoustic model with accent-specific top layer using the KLD-regularized model adaptation', 'huang14e_interspeech', 'accent bottom indian british shared baseline trained hour smd kl-divergence'], ['S. Shahnawazuddin, Rohit Sinha', 'A low complexity model adaptation approach involving sparse coding over multiple dictionaries', 'shahnawazuddin14_interspeech', 'eigenvoices pursuit on-line matching adapted cluster-specific parameter selection atom estimated'], ['Yuichi Kubota, Motoi Omachi, Tetsuji Ogawa, Tetsunori Kobayashi, Tsuneo Nitta', 'Effect of frequency weighting on MLP-based speaker canonicalization', 'kubota14_interspeech', 'vtln mapping optimal accurate spectrum intra-class vowel asr run-time function'], ['Zhen Huang, Jinyu Li, Sabato Marco Siniscalchi, I-Fan Chen, Chao Weng, Chin-Hui Lee', 'Feature space maximum a posteriori linear regression for adaptation of deep neural networks', 'huang14f_interspeech', 'lin dnn werr map parameter cd-dnn-hmms straight-forward cd-dnn-hmm robustness impair'], ['Natalia Tomashenko, Yuri Khokhlov', 'Speaker adaptation of context dependent deep neural networks based on MAP-adaptation and GMM-derived feature processing', 'tomashenko14_interspeech', 'gmm-hmm cd-dnn-hmm auxiliary dnn technique model performed multi-stream network regarded'], ['Martin Karafiát, František Grézl, Karel Veselý, Mirko Hannemann, Igor Szőke, Jan Černocký', 'BUT 2014 Babel system: analysis of adaptation in NN based systems', 'karafiat14_interspeech', 'bottle-neck sbn un-transcribed compressive feature stacked summarizes shown several dnn-based'], ['Mickael Rouvier, Benoit Favre', 'Speaker adaptation of DNN-based ASR with 
i-vectors: does it actually adapt models to speakers?', 'rouvier14_interspeech', 'dnns acoustic stacking i-vector feature dnn gain repere exactly concatenated'], ['Kushagra Singhal, Rajesh M. Hegde', 'A sparse reconstruction method for speech source localization using partial dictionaries over a spherical microphone array', 'singhal14_interspeech', 'doa complexity elevation estimation formulates circular mvdr angle arrival uniform'], ['Weiwei Cui, Jaeyeon Cho, Seungyeol Lee', 'A robust TDOA estimation method for in-car-noise environments', 'cui14c_interspeech', 'ratio anti-noise time highway anomaly in-car environment spatially difference arrival'], ['Lorin Netsch, Jacek Stachurski', 'Robust low-resource sound localization in correlated noise', 'netsch14_interspeech', 'gcc-phat direction cross-spectrum phase subtracted tdoa low-complexity cross-correlation augmenting arrival'], ['Dongwen Ying, Ruohua Zhou, Junfeng Li, Jielin Pan, Yonghong Yan', 'Direction-of-arrival estimation of multiple speakers using a planar array', 'ying14_interspeech', 'bin-wise doas interference efficiency concave method computational picking azimuth closed-form'], ['Wei Xue, Shan Liang, Wenju Liu', 'Weighted spatial bispectrum correlation matrix for DOA estimation in the presence of interferences', 'xue14_interspeech', 'wsbcm bpd undirected interference method decision-directed derived robust formulating eigenvalue'], ['Mariem Bouafif, Zied Lachiri', 'Multi-sources separation for sound source localization', 'bouafif14_interspeech', 'microphone distance mixing scatter azimuth tdoa plot big arrival delayed'], ['Chiyuan Zhang, Stephen Voinea, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio', 'Phone classification by a hierarchy of invariant representation layers', 'zhang14h_interspeech', 'template transformation neuron cortex learning speech assembling variability auditory computational'], ['Mark Sinclair, Peter Bell, Alexandra Birch, Fergus McInnes', 'A semi-Markov model for speech 
segmentation with an utterance-break prior', 'sinclair14_interspeech', 'asr translation impact passing optimised utterance downstream desired stream task'], ['G. Aneeja, B. Yegnanarayana', 'Speech detection in transient noises', 'aneeja14_interspeech', 'vad resolution variance noise frequency temporal method representation high filtering'], ['Yongjun He, Guanglu Sun, Guibin Zheng, Jiqing Han', 'Evaluation of dictionary for sparse coding in speech processing', 'he14b_interspeech', 'denoising reasonable measure elemental urgently signal sparsely good representation sparsity'], ['Colin Vaz, Vikram Ramanarayanan, Shrikanth S. Narayanan', 'Joint filtering and factorization for recovering latent structure from noisy speech data', 'vaz14b_interspeech', 'matrix nmf basis algorithm capture cost recovers mvdr distortionless optimizes'], ['A. Gallardo-Antolín, J. M. Montero, Simon King', 'A comparison of open-source segmentation architectures for dealing with imperfect data from the media in speech synthesis', 'gallardoantolin14_interspeech', 'recording music new non-expressive mono-speaker tt low-quality foreground scripted noise'], ['Taichi Asami, Ryo Masumura, Hirokazu Masataki, Sumitaka Sakauchi', 'Read and spontaneous speech classification based on variance of GMM supervectors', 'asami14_interspeech', 'style reading consecutive classify utterance unknown speaker-dependency discriminate extracting mixed'], ['Navid Shokouhi, Seyed Omid Sadjadi, John H. L. 
Hansen', 'Co-channel speech detection via spectral analysis of frequency modulated sub-bands', 'shokouhi14_interspeech', 'overlapped-speech sub-band gammatone filterbank segment overlap frequency-modulated disperse signal-to-interference feature'], ['Stephen Voinea, Chiyuan Zhang, Georgios Evangelopoulos, Lorenzo Rosasco, Tomaso Poggio', 'Word-level invariant representations from acoustic waveforms', 'voinea14_interspeech', 'raw extracting word speech waveform level variability projecting one-dimensional mfcc-based'], ['Paul Dalsgaard, Ove Andersen', 'On closed form calculation of line spectral frequencies (LSF)', 'dalsgaard14_interspeech', 'chebyshev polynomial recurrence zero lsfs function relationship expansion theory series'], ['Chahid Ouali, Pierre Dumouchel, Vishwa Gupta', 'Robust features for content-based audio copy detection', 'ouali14_interspeech', 'fingerprinting min trecvid system localization latest parameter lowest transformed compression'], ['Yi Jiang, DeLiang Wang, RunSheng Liu', 'Binaural deep neural network classification for reverberant speech segregation', 'jiang14_interspeech', 'dnn multisource untrained scene dnns systematically binary employing generalization configuration'], ['Xavier Anguera, Luis Javier Rodriguez-Fuentes, Igor Szőke, Andi Buzo, Florian Metze, Mikel Penagarikano', 'Query-by-example spoken term detection on multilingual unconstrained speech', 'anguera14b_interspeech', 'sw query condition mediaeval challenging search qbe-std top-performing proving edition'], ['Victor Soto, Lidia Mangu, Andrew Rosenberg, Julia Hirschberg', 'A comparison of multiple methods for rescoring keyword search lists for low resource languages', 'soto14_interspeech', 'rescored normalization alarm false confusion cns iarpa-babel stage lao zulu'], ['Damianos Karakos, Richard Schwartz', 'Subword and phonetic search for detecting out-of-vocabulary keywords', 'karakos14_interspeech', 'oov unit fuzzy atwv whole keyword term lattice recognized resulting'], 
['Yun Wang, Florian Metze', 'An in-depth comparison of keyword specific thresholding and sum-to-one score normalization', 'wang14l_interspeech', 'oov unit search fuzzy atwv whole term phonetic lattice recognized'], ['Hung-yi Lee, Yu Zhang, Ekapol Chuangsuwanich, James R. Glass', 'Graph-based re-ranking using acoustic feature similarity between search results for spoken term detection on low-resource languages', 'lee14c_interspeech', 'std graph score confidence lao assamese limited propagate subwords higher'], ['Viet-Bac Le, Lori Lamel, Abdel Messaoudi, William Hartmann, Jean-Luc Gauvain, Cécile Woehrling, Julien Despres, Anindya Roy', 'Developing STT and KWS systems using limited language resources', 'le14b_interspeech', 'oov unit keywords iarpa-babel haitian zulu assamese creole pack graphemic'], ['William Hartmann, Viet-Bac Le, Abdel Messaoudi, Lori Lamel, Jean-Luc Gauvain', 'Comparing decoding strategies for subword-based keyword spotting in low-resourced languages', 'hartmann14_interspeech', 'subword oov unit lattice separate haitian zulu assamese creole detection'], ['Min Ma, Justin Richards, Victor Soto, Julia Hirschberg, Andrew Rosenberg', 'Strategies for rescoring keyword search results using word-burst and acoustic features', 'ma14b_interspeech', 'hit four-way denotes identification rescored interpolating data shortcoming burst alarm'], ['Di Xu, Florian Metze', 'Word-based probabilistic phonetic retrieval for low-resource spoken term detection', 'xu14e_interspeech', 'std oov haitian lao vocabulary-independent assamese bengali iarpa babel framework'], ['I-Fan Chen, Nancy F. Chen, Chin-Hui Lee', 'A keyword-boosted sMBR criterion to enhance keyword search performance in deep neural network based acoustic modeling', 'chen14l_interspeech', 'atwv appearing keywords training state-level rationale iarpa babel vietnamese given'], ['Justin Chiu, Yun Wang, Jan Trmal, Daniel Povey, Guoguo Chen, Alexander I. 
Rudnicky', 'Combination of FST and CN search in spoken term detection', 'chiu14_interspeech', 'std asr query better performs find different two-step usual transducer'], ['Chunxi Liu, Aren Jansen, Guoguo Chen, Keith Kintzley, Jan Trmal, Sanjeev Khudanpur', 'Low-resource open vocabulary keyword search using point process models', 'liu14f_interspeech', 'ppm oov phonetic state-of-the-art event setting keywords pronunciation atwv computational'], ['Yamato Ohtani, Masatsune Tamura, Masahiro Morita, Masami Akamine', 'GMM-based bandwidth extension using sub-band basis spectrum model', 'ohtani14_interspeech', 'sbm bwe wideband method component frequency gmm low-band high-band signal'], ['Kazuhiro Nakamura, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda', 'A mel-cepstral analysis technique restoring high frequency components from low-sampling-rate speech', 'nakamura14_interspeech', 'sampling quality data synthesized restores rate recorded statistical process internet'], ['S. W. Lee, Zhizheng Wu, Minghui Dong, Xiaohai Tian, Haizhou Li', 'A comparative study of spectral transformation techniques for singing voice synthesis', 'lee14d_interspeech', 'conversion adaptation highest similarity spectrum dataset pleasant maximum-likelihood best delivers'], ['Daisuke Saito, Hidenobu Doi, Nobuaki Minematsu, Keikichi Hirose', 'Application of matrix variate Gaussian mixture model to statistical voice conversion', 'saito14_interspeech', 'space joint function construct vector gmm precise source proper target'], ['Zhizheng Wu, Eng Siong Chng, Haizhou Li', 'Joint nonnegative matrix factorization for exemplar-based voice conversion', 'wu14b_interspeech', 'exemplar low-resolution high-resolution activation employed spectrum spectral method one-frame detail'], ['Kazuhiro Kobayashi, Tomoki Toda, Graham Neubig, Sakriani Sakti, Satoshi Nakamura', 'Statistical singing voice conversion with direct waveform modification based on the spectrum differential', 'kobayashi14_interspeech', 
'singer svc converted arbitrary convert source target gmm directly vocoder-based'], ['Daniel P. W. Ellis, Hiroyuki Satoh, Zhuo Chen', 'Detecting proximity from personal audio recordings', 'ellis14_interspeech', 'body-worn close device fingerprinting poster privacy-preserving individual smartphone comparably virtually'], ['Huy Phan, Marco Maaß, Radoslaw Mazur, Alfred Mertins', 'Acoustic event detection and localization with regression forests', 'phan14_interspeech', 'forest random displacement offset onset location learned category-specific category densely'], ['Marc Ferràs, Hervé Bourlard', 'Multi-source posteriors for speech activity detection on public talks', 'ferras14b_interspeech', 'sad non-speech ted shout variety energy-based conceptually evaluate source large'], ['Jonathan Dennis, Huy Dat Tran, Eng Siong Chng', 'Analysis of spectrogram image methods for sound event classification', 'dennis14_interspeech', 'time-frequency popular field inspiration standardised signature representation recognise concentrated frame-based'], ['Aharon Satt, Ron Hoory, Alexandra König, Pauline Aalten, Philippe H. 
Robert', 'Speech-based automatic and robust detection of very early dementia', 'satt14_interspeech', 'ctrl mci respective spoken drug evidence vocal potential statistical cheap'], ['Ganna Raboshchuk, Climent Nadeu, Omid Ghahabi, Sergi Solvez, Blanca Muñoz Mahamud, Ana Riverola de Veciana, Santiago Navarro Hervas', 'On the acoustic environment of a neonatal intensive care unit: initial description, and detection of equipment alarms', 'raboshchuk14_interspeech', 'nicu sound dbn-based preterm harmful dbn neurological loud belief gmm-based'], ['Robert Allen Fox, Ewa Jacewicz, Florence Hardjono', 'Non-native perception of regionally accented speech in a multitalker context', 'fox14b_interspeech', 'competing listener talker american masking english snr detail level phonetic-acoustic'], ['Giuseppina Turco, Elisabeth Delais-Roussarie', 'A crosslinguistic and acquisitional perspective on intonational rises in French', 'turco14_interspeech', 'learnt accent native phonetic learner implementation final mon dip produced'], ['Jung-Yueh Tu, Yuwen Hsiung, Min-Da Wu, Yao-Ting Sung', 'Error patterns of Mandarin disyllabic tones by Japanese learners', 'tu14b_interspeech', 'tone dominate production syllable requested monosyllable previous found research hierarchy'], ['Victoria Leong, Marina Kalashnikova, Denis Burnham, Usha Goswami', 'Infant-directed speech enhances temporal rhythmic structure in the envelope', 'leong14_interspeech', 'modulation id multi-timescale infant ad synchronised brain sampling stress-based s-amph'], ['Dilu Wewalaarachchi, Leher Singh', 'Influences of tone sandhi on word recognition in preschool children', 'wewalaarachchi14_interspeech', 'year substitution morphophonemic condition recognize able change form correctly warranted'], ['Hwee Hwee Goh, Charlene Hu, Kheng Hui Yeo, Leher Singh', 'Lexical representation of consonant, vowels and tones in early childhood', 'goh14_interspeech', 'sensitivity tone vowel year lexicon age predominance dissociation point 
preschool'], ['Ana A. Francisco, Alexandra Jesse, Margriet A. Groen, James M. McQueen', 'Audiovisual temporal sensitivity in typical and dyslexic adult readers', 'francisco14_interspeech', 'dyslexia reading event deficit cross-modal developmental non-speech unsynchronized difficulty synchrony'], ["Donald Derrick, Greg A. O'Beirne, Tom De Rybel, Jennifer Hay", 'Aero-tactile integration in fricatives: converting audio to air flow information for speech perception enhancement', 'derrick14b_interspeech', 'sha air-flow pair enhance auditory enhanced contacting uninformed turbulent extracted'], ['Guangting Mai', 'Relative importance of AM and FM cues for speech comprehension: effects of speaking rate and their implications for neurophysiological processing of speech', 'mai14_interspeech', '-hz study intelligibility modulation previous mandarin slowly-varying contribution made current'], ['Louise Stringer, Paul Iverson', 'The effect of regional and non-native accents on word recognition processes: a comparison of EEG responses in quiet to speech recognition in noise', 'stringer14_interspeech', 'evoked distance perceptual classification trial least neurally cortically classifiable aeps'], ['Manson C. -M. Fong, James W. Minett, Thierry Blu, William S. -Y. Wang', 'Towards a neural measure of perceptual distance — classification of electroencephalographic responses to synthetic vowels', 'fong14_interspeech', 'evoked trial least neurally cortically classifiable aeps aep behaviorally using'], ['Odette Scharenborg, Eric Sanders, Bert Cranen', "Collecting a corpus of Dutch noise-induced `slips of the ear'", 'scharenborg14_interspeech', 'misperceptions confusion consistent listener speaker shed different onto collect light'], ['Tuka Al Hanai, James R. 
Glass', 'Lexical modeling for Arabic ASR: a systematic approach', 'hanai14_interspeech', 'diacritic pronunciation lexicon rule compose resolved geminate deriving displayed covering'], ['Luiza Orosanu, Denis Jouvet', 'Hybrid language models for speech transcription', 'orosanu14_interspeech', 'word threshold occurrence recognized syllabified setting size unit helping syllable'], ['Ankur Gandhe, Florian Metze, Ian Lane', 'Neural network language models for low resource languages', 'gandhe14_interspeech', 'nnlms n-gram feed-forward lower amount training perplexity token recurrent data'], ['Siva Reddy Gangireddy, Fergus McInnes, Steve Renals', 'Feed forward pre-training for recurrent neural network language models', 'gangireddy14_interspeech', 'rnnlm nnlm weight asr perplexity penn across treebank ted output'], ['Brandon C. Roy, Soroush Vosoughi, Deb Roy', 'Grounding language models in spatiotemporal context', 'roy14_interspeech', 'spatially regularity temporally spatial information word grammar folding extralinguistic govern'], ['Shahab Jalalvand, Daniele Falavigna', 'Direct word graph rescoring using a* search and RNNLM', 'jalalvand14_interspeech', 'n-best stack asr list decoding rescores method trellis directly rnnlms'], ['Ciprian Chelba, Tomas Mikolov, Mike Schuster, Qi Ge, Thorsten Brants, Phillipp Koehn, Tony Robinson', 'One billion word benchmark for measuring progress in statistical language modeling', 'chelba14_interspeech', 'held-out perplexity baseline rebuild unpruned log-probability technique kneser-ney model available'], ['Andrea Schnall, Martin Heckmann', 'Integrating sequence information in the audio-visual detection of word prominence in a human-machine interaction scenario', 'schnall14_interspeech', 'crf neighboring hypoarticulation hand classifier misunderstanding segment accompanied chain parameter'], ['Fadi Biadsy, Keith Hall, Pedro J. 
Moreno, Brian Roark', 'Backoff inspired features for maximum entropy language models', 'biadsy14_interspeech', 'maxent hundred katz word-error billion regularize even regularized avoiding shown'], ['Dominic Telaar, Michael Wand, Dirk Gehrig, Felix Putze, Christoph Amma, Dominic Heger, Ngoc Thang Vu, Mark Erhardt, Tim Schlippe, Matthias Janke, Christian Herff, Tanja Schultz', 'BioKIT — real-time decoder for biosignal processing', 'telaar14_interspeech', 'toolkit kaldi researcher povey blaming employ parallelization scripting thread biosignals'], ['David Harwath, James R. Glass', 'Speech recognition without a lexicon — bridging the gap between graphemic and phonetic systems', 'harwath14b_interspeech', 'pronunciation model low-resource whatsoever haitian lao acoustic featured language iarpa'], ['Shengkui Zhao, Douglas L. Jones', 'A new auxiliary-vector algorithm with conjugate orthogonality for speech enhancement', 'zhao14b_interspeech', 'mvdr convergence favorite filter establishes converges state-of-the-art orthogonal formal suffer'], ['Neehar Jathar, Preeti Rao', 'Acoustic characteristics of critical message utterances in noise applied to speech intelligibility enhancement', 'jathar14_interspeech', 'announcement modification modified increased listening marathi noisy articulatory-acoustic station shaping'], ['Yong Xu, Jun Du, Li-Rong Dai, Chin-Hui Lee', 'Dynamic noise aware training for speech enhancement based on deep neural networks', 'xu14f_interspeech', 'dnn non-stationary equalization enrich logarithmic type suppress squared quality pesq'], ['Pasi Pertilä, Joonas Nikunen', 'Microphone array post-filtering using supervised machine learning for speech enhancement', 'pertila14_interspeech', 'post-filters ann noise enhance beamforming suppression perceptual captured spatial high'], ['Senthil Kumar Mani, Jitendra Kumar Dhiman, K. 
Sri Rama Murty', 'Novel speech duration modifier for packet based communication system', 'mani14_interspeech', 'pitch real-time modification overlap mark add method wsola lp-psola frame'], ['Ding Liu, Paris Smaragdis, Minje Kim', 'Experiments on deep learning for speech denoising', 'liu14g_interspeech', 'denoised lightweight various single-channel generalize signal-to-noise choice predict clean impact'], ['Nasser Mohammadiha, Simon Doclo', 'Single-channel dynamic exemplar-based speech enhancement', 'mohammadiha14_interspeech', 'nonnegative supervised enhanced nmf-based model high-resolution manifold holistic stft ensures'], ['Akihiro Kato, Ben Milner', 'Using hidden Markov models for speech enhancement', 'kato14_interspeech', 'filtering-based surface time-frequency noisy conventional straight operates reconstruct motivation musical'], ['Lukas Pfeifenberger, Franz Pernkopf', 'Blind source extraction based on a direction-dependent a-priori SNR', 'pfeifenberger14_interspeech', 'pmwf psd diffuse noise speech pea near-field estimate encounter far-field'], ['Carlos Eduardo Cancino Chacón, Pejman Mowlaee', 'Least squares phase estimation of mixed signals', 'chacon14_interspeech', 'lspe estimator replacing monte-carlo noisy speech sinusoid reconstructing noise removed'], ['Ji Ming, Danny Crookes', 'Speech enhancement from additive noise and channel distortion — a corpus-based approach', 'ming14_interspeech', 'lm algorithm preprocessor longest present wideband single-channel pesq aurora add'], ['Zhiyuan Zhou, Zhaogui Ding, Weifeng Li, Zhiyong Wu, Longbiao Wang, Qingmin Liao', 'Multi-channel speech enhancement using sparse coding on local time-frequency structures', 'zhou14b_interspeech', 'target meantime interferer signal conventional noise bin method suppress interfering'], ['Seyedmahdad Mirsamadi, John H. L. 
Hansen', 'Multichannel speech dereverberation based on convolutive nonnegative tensor factorization for ASR applications', 'mirsamadi14_interspeech', 'array room microphone clean-trained algorithm channel meter source dsr stft'], ['Zhuo Chen, Brian McFee, Daniel P. W. Ellis', 'Speech enhancement by low-rank and convolutive dictionary spectrogram decomposition', 'chen14m_interspeech', 'noise pre-learned noisy template sparse activation assumption requires background highly'], ['Xabier Jaureguiberry, Emmanuel Vincent, Gaël Richard', 'Multiple-order non-negative matrix factorization for speech enhancement', 'jaureguiberry14_interspeech', 'nmf averaging model selection order separation interesting entropic nicely inefficient'], ['Tae Gyoon Kang, Kisoo Kwon, Jong Won Shin, Nam Soo Kim', 'NMF-based speech enhancement incorporating deep neural network', 'kang14b_interspeech', 'encoding vector algorithm dnn technique estimation noisy conventional non-negative nmf'], ['Sukanya Sonowal, Kisoo Kwon, Nam Soo Kim, Jong Won Shin', 'A data-driven approach to speech enhancement using Gaussian process', 'sonowal14_interspeech', 'mmse-lsa gain residual stage estimator module mean-square log-spectral cascaded snrs'], ['Tom Bäckström, Christian R. Helmrich', 'Decorrelated innovative codebooks for ACELP using factorization of autocorrelation matrix', 'backstrom14_interspeech', 'quantization optimization decorrelate optimal decorrelation eigenvalue guaranteed celp exhaustive avoided'], ['Milos Cernak, Alexandros Lazaridis, Philip N. 
Garner, Petr Motlicek', 'Stress and accent transmission in HMM-based syllable-context very low bit rate speech coding', 'cernak14_interspeech', 'symbol signal-based synthesis receiver codec energy syllable-level bitrate scalar measure'], ['Hannu Pulakka, Anssi Rämö, Ville Myllylä, Henri Toukomaa, Paavo Alku', 'Subjective voice quality evaluation of artificial bandwidth extension: comparing different audio bandwidths and speech codecs', 'pulakka14_interspeech', 'abe clean background acr test highpass cutoff noisy evaluated score'], ['Zhong-Hua Fu, Lei Xie', 'Stereo acoustic echo suppression using widely linear filtering in the frequency domain', 'fu14b_interspeech', 'lcmv near-end signal single-channel preserve filter duplex inter-channel double-talk teleconferencing'], ['Bong-Ki Lee, Inyoung Hwang, Jihwan Park, Joon-Hyuk Chang', 'Enhanced muting method in packet loss concealment of ITU-t g.722 using sigmoid function with on-line optimized parameters', 'lee14e_interspeech', 'adaptive algorithm itut search-based steepest unnecessary conventional click grid minimizes'], ['Chao Wu, Kaiyu Jiang, Yanmeng Guo, Qiang Fu, Yonghong Yan', 'A robust step-size control algorithm for frequency domain acoustic echo cancellation', 'wu14c_interspeech', 'filter adaptive interference path bin-wise presence constraint diverging double-talk change'], ['E. Byambakhishig, K. Tanaka, Ryo Aihara, Toru Nakashika, Tetsuya Takiguchi, Yasuo Ariki', 'Error correction of automatic speech recognition based on normalized web distance', 'byambakhishig14_interspeech', 'confusion null semantic problem availability solve network similarity calculating using'], ['Erinç Dikici, Murat Saraçlar', 'Unsupervised training methods for discriminative language modeling', 'dikici14_interspeech', 'dlm reranking supervised determine semi-supervised rank transcription asr confusion output'], ['Long Qin, Alexander I. 
Rudnicky', 'Building a vocabulary self-learning speech recognition system', 'qin14_interspeech', 'oov word recover unknown correctly recognized learned phoneme-to-grapheme detect recognizer'], ['Tim Schlippe, Matthias Merz, Tanja Schultz', 'Methods for efficient semi-automatic pronunciation dictionary bootstrapping', 'schlippe14_interspeech', 'editing grapheme-to-phoneme converter phoneme-level effort spanish wiktionary german haitian initial'], ['Murat Akbacak, Dilek Hakkani-Tür, Gokhan Tur', 'Rapidly building domain-specific entity-centric language models using semantic web knowledge sources', 'akbacak14_interspeech', 'query domain movie user recognition sourcing fulfilled statistical word-error-rate listed'], ['Ann Lee, James R. Glass', 'Context-dependent pronunciation error pattern discovery with limited annotations', 'lee14f_interspeech', 'nonnative deriving template cuhk propagates label imbalanced kong hong phone-level'], ['Ashtosh Sapru, Hervé Bourlard', 'Detecting speaker roles and topic changes in multiparty conversations using latent topic models', 'sapru14_interspeech', 'meeting labeling reveal multistream multinomial distribution governed feature browsing accessing'], ['Chenglin Xu, Lei Xie, Guangpu Huang, Xiong Xiao, Eng Siong Chng, Haizhou Li', 'A deep neural network approach for sentence boundary detection in broadcast news', 'xu14g_interspeech', 'non-boundary inter-word crf position transcript posterior feature dnn linear-chain label'], ['Rahul Gupta, Sankaranarayanan Ananthakrishnan, Zhaojun Yang, Shrikanth S. 
Narayanan', 'Variable Span disfluency detection in ASR transcripts', 'gupta14b_interspeech', 'revision repetition reference system interjection slt word roc receiver lexically'], ['Camille Dutrey, Chloé Clavel, Sophie Rosset, Ioana Vasilescu, Martine Adda-Decker', 'A CRF-based approach to automatic disfluency detection in a French call-centre corpus', 'dutrey14_interspeech', 'portion linguistic task reparandum acoustic readability mining feature edit follow-up'], ['Madina Hasan, Rama Doddipatla, Thomas Hain', 'Multi-pass sentence-end detection of lecture speech', 'hasan14_interspeech', 'sed ser feature backward yield absolute end crf-based distance give'], ['Victoria Zayats, Mari Ostendorf, Hannaneh Hajishirzi', 'Multi-domain disfluency and repair detection', 'zayats14_interspeech', 'self-repairs crf-based reparandum annotates space state government cross-domain casual context'], ['Bing Jiang, Yan Song, Si Wei, Ian Vince McLoughlin, Li-Rong Dai', 'Task-aware deep bottleneck features for spoken language identification', 'jiang14b_interspeech', 'dbf dnn discriminative lre lid gmm-mmi nist layer recently ivector'], ['Rong Tong, Bin Ma, Haizhou Li', 'Virtual example for phonotactic language recognition', 'tong14_interspeech', '-seconds open-set closed-set training respectively non-target data method condition lre'], ['Wei-Wei Liu, Wei-Qiang Zhang, Jia Liu', 'Phonotactic language recognition based on time-gap-weighted lattice kernels', 'liu14h_interspeech', 'phone kernel gap n-gram long-context slr decaying length subsequence permissible'], ['Maarten van Segbroeck, Ruchir Travadi, Shrikanth S. 
Narayanan', 'UBM fused total variability modeling for language identification', 'segbroeck14b_interspeech', 'ubms short-duration i-vector fusion class robust rat baum-welch boosting representation'], ['Mireia Diez, Mikel Penagarikano, German Bordel, Amparo Varona, Luis Javier Rodriguez-Fuentes', 'On the complementarity of short-time fourier analysis windows of different lengths for improved language recognition', 'diez14b_interspeech', 'system event combining provided short size sharper heterogeneous several attained'], ['Ruchir Travadi, Maarten Van Segbroeck, Shrikanth S. Narayanan', 'Modified-prior i-vector estimation for language identification of short duration utterances', 'travadi14_interspeech', 'lid small variability length rat amount segment available representation darpa'], ["Luis Fernando D'Haro, Ricardo Cordoba, Christian Salamea, Javier Ferreiros", 'Language recognition using phonotactic-based shifted delta coefficients and multiple phone recognizers', 'dharo14_interspeech', 'pllr incorporation i-vector system albayzin sdc long-span philosophy cavg asr'], ['Oldřich Plchot, Mireia Diez, Mehdi Soufifar, Lukáš Burget', 'PLLR features in language recognition system for RATS', 'plchot14_interspeech', 'plp experimenting favorable frame-by-frame based phonotactic darpa seek outcome degraded'], ['Yin-Lai Yeong, Tien-Ping Tan', 'Language identification of code Switching sentences and multilingual sentences of under-resourced languages by using multi structural word information', 'yeong14_interspeech', 'lid sentence reason intersentential writer habitual attract happen inability audience']], stateSave: true, columnDefs: [ { targets: [0], className: 'dt-left', "mRender": function (data, type, full) { return '<a class="w3-text" href="' + full[2] + '.html' + '">' + full[1] + '<br><span class="w3-text w3-text-theme">' + full[0] + '</span></a>'; } }, { targets: [1, 2, 3], visible: false, }, ], "lengthMenu": [7, 10, 20, 50, 100, 200, 500], "pageLength": 50, "order": 
[[ 0, 'asc' ]], scrollY: '60vh', "dom": '<"top"l>rft<"bottom"ip><"clear">', "pagingType": "full_numbers", paging: true }); }); </script> </body> </html>