<!DOCTYPE html> <html lang="en" dir="ltr"> <head> <!-- Google tag (gtag.js) --> <script async src="https://www.googletagmanager.com/gtag/js?id=G-P63WKM1TM1"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-P63WKM1TM1'); </script> <!-- Yandex.Metrika counter --> <script type="text/javascript" > (function(m,e,t,r,i,k,a){m[i]=m[i]||function(){(m[i].a=m[i].a||[]).push(arguments)}; m[i].l=1*new Date(); for (var j = 0; j < document.scripts.length; j++) {if (document.scripts[j].src === r) { return; }} k=e.createElement(t),a=e.getElementsByTagName(t)[0],k.async=1,k.src=r,a.parentNode.insertBefore(k,a)}) (window, document, "script", "https://mc.yandex.ru/metrika/tag.js", "ym"); ym(55165297, "init", { clickmap:false, trackLinks:true, accurateTrackBounce:true, webvisor:false }); </script> <noscript><div><img src="https://mc.yandex.ru/watch/55165297" style="position:absolute; left:-9999px;" alt="" /></div></noscript> <!-- /Yandex.Metrika counter --> <!-- Matomo --> <!-- End Matomo Code --> <title>Search results for: labeled corpus</title> <meta name="description" content="Search results for: labeled corpus"> <meta name="keywords" content="labeled corpus"> <meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1, user-scalable=no"> <meta charset="utf-8"> <link href="https://cdn.waset.org/favicon.ico" type="image/x-icon" rel="shortcut icon"> <link href="https://cdn.waset.org/static/plugins/bootstrap-4.2.1/css/bootstrap.min.css" rel="stylesheet"> <link href="https://cdn.waset.org/static/plugins/fontawesome/css/all.min.css" rel="stylesheet"> <link href="https://cdn.waset.org/static/css/site.css?v=150220211555" rel="stylesheet"> </head> <body> <header> <div class="container"> <nav class="navbar navbar-expand-lg navbar-light"> <a class="navbar-brand" href="https://waset.org"> <img src="https://cdn.waset.org/static/images/wasetc.png" alt="Open 
Science Research Excellence" title="Open Science Research Excellence" /> </a> <button class="d-block d-lg-none navbar-toggler ml-auto" type="button" data-toggle="collapse" data-target="#navbarMenu" aria-controls="navbarMenu" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button> <div class="w-100"> <div class="d-none d-lg-flex flex-row-reverse"> <form method="get" action="https://waset.org/search" class="form-inline my-2 my-lg-0"> <input class="form-control mr-sm-2" type="search" placeholder="Search Conferences" value="labeled corpus" name="q" aria-label="Search"> <button class="btn btn-light my-2 my-sm-0" type="submit"><i class="fas fa-search"></i></button> </form> </div> <div class="collapse navbar-collapse mt-1" id="navbarMenu"> <ul class="navbar-nav ml-auto align-items-center" id="mainNavMenu"> <li class="nav-item"> <a class="nav-link" href="https://waset.org/conferences" title="Conferences in 2024/2025/2026">Conferences</a> </li> <li class="nav-item"> <a class="nav-link" href="https://waset.org/disciplines" title="Disciplines">Disciplines</a> </li> <li class="nav-item"> <a class="nav-link" href="https://waset.org/committees" rel="nofollow">Committees</a> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownPublications" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> Publications </a> <div class="dropdown-menu" aria-labelledby="navbarDropdownPublications"> <a class="dropdown-item" href="https://publications.waset.org/abstracts">Abstracts</a> <a class="dropdown-item" href="https://publications.waset.org">Periodicals</a> <a class="dropdown-item" href="https://publications.waset.org/archive">Archive</a> </div> </li> <li class="nav-item"> <a class="nav-link" href="https://waset.org/page/support" title="Support">Support</a> </li> </ul> </div> </div> </nav> </div> </header> <main> <div class="container mt-4"> <div class="row"> <div 
class="col-md-9 mx-auto"> <form method="get" action="https://publications.waset.org/abstracts/search"> <div id="custom-search-input"> <div class="input-group"> <i class="fas fa-search"></i> <input type="text" class="search-query" name="q" placeholder="Author, Title, Abstract, Keywords" value="labeled corpus"> <input type="submit" class="btn_search" value="Search"> </div> </div> </form> </div> </div> <div class="row mt-3"> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Commenced</strong> in January 2007</div> </div> </div> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Frequency:</strong> Monthly</div> </div> </div> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Edition:</strong> International</div> </div> </div> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Paper Count:</strong> 617</div> </div> </div> </div> <h1 class="mt-3 mb-3 text-center" style="font-size:1.6rem;">Search results for: labeled corpus</h1> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">617</span> Developing an Intonation Labeled Dataset for Hindi</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Esha%20Banerjee">Esha Banerjee</a>, <a href="https://publications.waset.org/abstracts/search?q=Atul%20Kumar%20Ojha"> Atul Kumar Ojha</a>, <a href="https://publications.waset.org/abstracts/search?q=Girish%20Nath%20Jha"> Girish Nath Jha</a> </p> <p class="card-text"><strong>Abstract:</strong></p> This study aims to develop an intonation labeled database for Hindi. Although no single standard for prosody labeling exists in Hindi, researchers in the past have employed perceptual and statistical methods in literature to draw inferences about the behavior of prosody patterns in Hindi. 
Based on such existing research and largely agreed upon intonational theories in Hindi, this study attempts to develop a manually annotated prosodic corpus of Hindi speech data, which can be used for training speech models for natural-sounding speech in the future. 100 sentences ( 500 words) each for declarative and interrogative types have been labeled using Praat. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=speech%20dataset" title="speech dataset">speech dataset</a>, <a href="https://publications.waset.org/abstracts/search?q=Hindi" title=" Hindi"> Hindi</a>, <a href="https://publications.waset.org/abstracts/search?q=intonation" title=" intonation"> intonation</a>, <a href="https://publications.waset.org/abstracts/search?q=labeled%20corpus" title=" labeled corpus"> labeled corpus</a> </p> <a href="https://publications.waset.org/abstracts/142503/developing-an-intonation-labeled-dataset-for-hindi" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/142503.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">199</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">616</span> A Preliminary Study for Building an Arabic Corpus of Pair Questions-Texts from the Web: Aqa-Webcorp</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Wided%20Bakari">Wided Bakari</a>, <a href="https://publications.waset.org/abstracts/search?q=Patrce%20Bellot"> Patrce Bellot</a>, <a href="https://publications.waset.org/abstracts/search?q=Mahmoud%20Neji"> Mahmoud Neji</a> </p> <p class="card-text"><strong>Abstract:</strong></p> With the development of electronic media and the heterogeneity of Arabic data 
on the Web, the idea of building a clean corpus for certain applications of natural language processing, including machine translation, information retrieval, and question answering, has become more and more pressing. In this manuscript, we seek to create and develop our own corpus of question-text pairs. This corpus will then provide a better base for our experimentation step. We therefore model its construction with a method for Arabic that recovers texts from the web that could prove to be answers to our factual questions. To do this, we developed a java script that extracts a list of HTML pages for a given query. These pages are then cleaned to obtain a database of texts and a corpus of question-text pairs. In addition, we give preliminary results of our proposed method. Some investigations into the construction of Arabic corpora are also presented in this document. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=Arabic" title="Arabic">Arabic</a>, <a href="https://publications.waset.org/abstracts/search?q=web" title=" web"> web</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus" title=" corpus"> corpus</a>, <a href="https://publications.waset.org/abstracts/search?q=search%20engine" title=" search engine"> search engine</a>, <a href="https://publications.waset.org/abstracts/search?q=URL" title=" URL"> URL</a>, <a href="https://publications.waset.org/abstracts/search?q=question" title=" question"> question</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus%20building" title=" corpus building"> corpus building</a>, <a href="https://publications.waset.org/abstracts/search?q=script" title=" script"> script</a>, <a href="https://publications.waset.org/abstracts/search?q=Google" title=" Google"> Google</a>, <a href="https://publications.waset.org/abstracts/search?q=html" title=" html"> html</a>, <a 
href="https://publications.waset.org/abstracts/search?q=txt" title=" txt"> txt</a> </p> <a href="https://publications.waset.org/abstracts/46758/a-preliminary-study-for-building-an-arabic-corpus-of-pair-questions-texts-from-the-web-aqa-webcorp" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/46758.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">323</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">615</span> Native Language Identification with Cross-Corpus Evaluation Using Social Media Data: ’Reddit’</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Yasmeen%20Bassas">Yasmeen Bassas</a>, <a href="https://publications.waset.org/abstracts/search?q=Sandra%20Kuebler"> Sandra Kuebler</a>, <a href="https://publications.waset.org/abstracts/search?q=Allen%20Riddell"> Allen Riddell</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Native language identification is one of the growing subfields in natural language processing (NLP). The task of native language identification (NLI) is mainly concerned with predicting the native language of an author’s writing in a second language. In this paper, we investigate the performance of two types of features; content-based features vs. content independent features, when they are evaluated on a different corpus (using social media data “Reddit”). In this NLI task, the predefined models are trained on one corpus (TOEFL), and then the trained models are evaluated on different data using an external corpus (Reddit). Three classifiers are used in this task; the baseline, linear SVM, and logistic regression. 
Results show that content-based features are more accurate and robust than content independent ones when tested within the corpus and across corpus. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=NLI" title="NLI">NLI</a>, <a href="https://publications.waset.org/abstracts/search?q=NLP" title=" NLP"> NLP</a>, <a href="https://publications.waset.org/abstracts/search?q=content-based%20features" title=" content-based features"> content-based features</a>, <a href="https://publications.waset.org/abstracts/search?q=content%20independent%20features" title=" content independent features"> content independent features</a>, <a href="https://publications.waset.org/abstracts/search?q=social%20media%20corpus" title=" social media corpus"> social media corpus</a>, <a href="https://publications.waset.org/abstracts/search?q=ML" title=" ML"> ML</a> </p> <a href="https://publications.waset.org/abstracts/142396/native-language-identification-with-cross-corpus-evaluation-using-social-media-data-reddit" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/142396.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">137</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">614</span> Semantic Preference across Research Articles: A Corpus-Based Study of Adjectives in English</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Vald%C3%AAnia%20Carvalho%20e%20Almeida">Valdênia Carvalho e Almeida</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The goal of the present study is to investigate the semantic preference of the most frequent adjectives in research articles through a corpus-based 
analysis of texts published in journals in Applied Linguistics (AL). The corpus used in this study contains texts published from 2014 to 2018 in three journals: Language Learning and Technology, English for Academic Purposes, and TESOL Quarterly, totaling more than one million words. The corpus was analyzed to identify the most frequent adjectives that co-occurred in the three journals. By observing the concordance lines of the adjectives and analyzing the words they were associated with, the semantic preferences of each adjective were determined. The AL corpus analysis was then compared with an investigation of the same adjectives in a corpus of Chemistry. This second part of the study aimed to identify possible differences and similarities between the two corpora in the use of the adjectives in research articles from both areas. The results show that some preferences seem to be closely related not only to the academic genre of the texts but also to the specific domain of the discipline and, to a lesser extent, to the context of research in each journal. This research illustrates how Corpus Linguistics can help explore the concept of semantic preference in more detail, considering the complex nature of the phenomenon. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=applied%20linguistics" title="applied linguistics">applied linguistics</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus%20linguistics" title=" corpus linguistics"> corpus linguistics</a>, <a href="https://publications.waset.org/abstracts/search?q=chemistry" title=" chemistry"> chemistry</a>, <a href="https://publications.waset.org/abstracts/search?q=research%20article" title=" research article"> research article</a>, <a href="https://publications.waset.org/abstracts/search?q=semantic%20preference" title=" semantic preference"> semantic preference</a> </p> <a href="https://publications.waset.org/abstracts/107205/semantic-preference-across-research-articles-a-corpus-based-study-of-adjectives-in-english" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/107205.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">185</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">613</span> Specialized Translation Teaching Strategies: A Corpus-Based Approach</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Yingying%20Ding">Yingying Ding</a> </p> <p class="card-text"><strong>Abstract:</strong></p> This study presents a methodology of specialized translation with the objective of helping teachers to improve the strategies in teaching translation. In order to allow students to acquire skills to translate specialized texts, they need to become familiar with the semantic and syntactic features of source texts and target texts. 
The aim of our study is to use a corpus-based approach in the teaching of specialized translation between Chinese and Italian. This study proposes to construct a specialized Chinese-Italian comparable corpus consisting of 50 economic contracts from the food domain. With the help of AntConc, we propose to compile a comparable corpus for translation teaching purposes. This paper attempts to provide insight into how teachers could benefit from a comparable corpus in the teaching of specialized translation from Italian into Chinese and, through some examples of passive sentences, how students could learn to apply different strategies for translating voice appropriately. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=contrastive%20studies" title="contrastive studies">contrastive studies</a>, <a href="https://publications.waset.org/abstracts/search?q=specialised%20translation" title=" specialised translation"> specialised translation</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus-based%20approach" title=" corpus-based approach"> corpus-based approach</a>, <a href="https://publications.waset.org/abstracts/search?q=teaching" title=" teaching"> teaching</a> </p> <a href="https://publications.waset.org/abstracts/84027/specialized-translation-teaching-strategies-a-corpus-based-approach" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/84027.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">371</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">612</span> Grammatically Coded Corpus of Spoken Lithuanian: Methodology and Development</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a 
href="https://publications.waset.org/abstracts/search?q=L.%20Kamandulyt%C4%97-Merfeldien%C4%97">L. Kamandulytė-Merfeldienė</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The paper deals with the main methodological issues of the <em>Corpus of Spoken Lithuanian</em>, whose development began in 2006. At present, the corpus consists of 300,000 grammatically annotated word forms. The creation of the corpus consists of three main stages: collecting the data, transcribing the recorded data, and grammatical annotation. Data collection was based on the principles of balance and naturalness. The recorded speech was transcribed according to the CHAT requirements of CHILDES. The transcripts were double-checked and annotated grammatically using CHILDES. The development of the Corpus of Spoken Lithuanian has led to a steady increase in studies on spontaneous communication, and various papers have dealt with the distribution of parts of speech, the use of different grammatical forms, variation in inflectional paradigms, the distribution of fillers, syntactic functions of adjectives, and the mean length of utterances. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=CHILDES" title="CHILDES">CHILDES</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus%20of%20spoken%20Lithuanian" title=" corpus of spoken Lithuanian"> corpus of spoken Lithuanian</a>, <a href="https://publications.waset.org/abstracts/search?q=grammatical%20annotation" title=" grammatical annotation"> grammatical annotation</a>, <a href="https://publications.waset.org/abstracts/search?q=grammatical%20disambiguation" title=" grammatical disambiguation"> grammatical disambiguation</a>, <a href="https://publications.waset.org/abstracts/search?q=lexicon" title=" lexicon"> lexicon</a>, <a href="https://publications.waset.org/abstracts/search?q=Lithuanian" title=" Lithuanian"> Lithuanian</a> </p> <a href="https://publications.waset.org/abstracts/58169/grammatically-coded-corpus-of-spoken-lithuanian-methodology-and-development" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/58169.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">237</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">611</span> Corporate Cautionary Statement: A Genre of Professional Communication </h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Chie%20Urawa">Chie Urawa</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Cautionary statements or disclaimers in corporate annual reports need to be carefully designed because clear cautionary statements may protect a company in the case of legal disputes and may undermine positive impressions. 
This study compares the language of cautionary statements using two corpora, Sony’s cautionary statement corpus (S-corpus) and Panasonic’s cautionary statement corpus (P-corpus), illustrating the differences and similarities in relation to the use of meaningful cautionary statements and critically analyzing why practitioners word them the way they do. The findings describe distinct differences between the two companies in the presentation of risk factors and in the way they make the statements. The word ‘ability’ is used more for legal protection in the S-corpus, whereas the word ‘possibility’ is used more to convey a better impression in the P-corpus. The main similarities are found in the use of lexical words and pronouns, and in almost identical wording over eight years. The findings show how each company makes its statements unique in the presentation of risk factors, and reveal the characteristics of a specific genre of professional communication. An important implication of this study is that a more comprehensive approach can be applied in other contexts and used by companies to reflect upon their cautionary statements. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=cautionary%20statements" title="cautionary statements">cautionary statements</a>, <a href="https://publications.waset.org/abstracts/search?q=corporate%20annual%20reports" title=" corporate annual reports"> corporate annual reports</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus" title=" corpus"> corpus</a>, <a href="https://publications.waset.org/abstracts/search?q=risk%20factors" title=" risk factors"> risk factors</a> </p> <a href="https://publications.waset.org/abstracts/84605/corporate-cautionary-statement-a-genre-of-professional-communication" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/84605.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">171</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">610</span> A Corpus-Based Study on the Styles of Three Translators</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Wang%20Yunhong">Wang Yunhong</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The present paper is preoccupied with the different styles of three translators in their translating a Chinese classical novel Shuihu Zhuan. Based on a parallel corpus, it adopts a target-oriented approach to look into whether and what stylistic differences and shifts the three translations have revealed. The findings show that the three translators demonstrate different styles concerning their word choices and sentence preferences, which implies that identification of recurrent textual patterns may be a basic step for investigating the style of a translator. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=corpus" title="corpus">corpus</a>, <a href="https://publications.waset.org/abstracts/search?q=lexical%20choices" title=" lexical choices"> lexical choices</a>, <a href="https://publications.waset.org/abstracts/search?q=sentence%20characteristics" title=" sentence characteristics"> sentence characteristics</a>, <a href="https://publications.waset.org/abstracts/search?q=style" title=" style"> style</a> </p> <a href="https://publications.waset.org/abstracts/73431/a-corpus-based-study-on-the-styles-of-three-translators" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/73431.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">268</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">609</span> A Bayesian Approach for Analyzing Academic Article Structure</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Jia-Lien%20Hsu">Jia-Lien Hsu</a>, <a href="https://publications.waset.org/abstracts/search?q=Chiung-Wen%20Chang"> Chiung-Wen Chang</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Research articles may follow a simple and succinct structure of organizational patterns, called move. For example, considering extended abstracts, we observe that an extended abstract usually consists of five moves, including Background, Aim, Method, Results, and Conclusion. As another example, when publishing articles in PubMed, authors are encouraged to provide a structured abstract, which is an abstract with distinct and labeled sections (e.g., Introduction, Methods, Results, Discussions) for rapid comprehension. 
This paper introduces a method for computational analysis of move structures (i.e., Background-Purpose-Method-Result-Conclusion) in abstracts and introductions of research documents, replacing a time-consuming and labor-intensive manual analysis process. In our approach, each sentence in a given abstract or introduction is automatically analyzed and labeled with a specific move (i.e., B-P-M-R-C in this paper) to reveal its rhetorical status. As a result, the automatic analytical tool for move structures is expected to help non-native speakers and novice writers become aware of appropriate move structures and internalize relevant knowledge to improve their writing. In this paper, we propose a Bayesian approach to determine move tags for research articles. The approach consists of two phases: a training phase and a testing phase. In the training phase, we build a Bayesian model based on a couple of given initial patterns and the corpus, a subset of CiteSeerX. In the beginning, the prior probability of the Bayesian model relies solely on the initial patterns. Subsequently, we process the documents in the corpus one by one: extract features, determine tags, and update the Bayesian model iteratively. In the testing phase, we compare our results with tags manually assigned by experts. In our experiments, the proposed approach reaches an accuracy of 56%. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=academic%20English%20writing" title="academic English writing">academic English writing</a>, <a href="https://publications.waset.org/abstracts/search?q=assisted%20writing" title=" assisted writing"> assisted writing</a>, <a href="https://publications.waset.org/abstracts/search?q=move%20tag%20analysis" title=" move tag analysis"> move tag analysis</a>, <a href="https://publications.waset.org/abstracts/search?q=Bayesian%20approach" title=" Bayesian approach"> Bayesian approach</a> </p> <a href="https://publications.waset.org/abstracts/42221/a-bayesian-approach-for-analyzing-academic-article-structure" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/42221.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">330</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">608</span> English for Academic and Specific Purposes: A Corpus-Informed Approach to Designing Vocabulary Teaching Materials</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Said%20Ahmed%20Zohairy">Said Ahmed Zohairy</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Significant shifts in the theory and practice of teaching vocabulary affect teachers’ decisions about learning materials’ design. Relevant literature supports teaching specialised, authentic, and multi-word lexical items rather than focusing on single-word vocabulary lists. Corpora, collections of texts stored in a database, presents a reliable source of teaching and learning materials. 
Although corpus-informed studies have provided guidance for teachers to identify useful language chunks and phraseological units, there is a scarcity of literature discussing the use of corpora in teaching English for academic and specific purposes (EASP). The aim of this study is to improve teaching practices and provide a description of the pedagogical choices and procedures of an EASP tutor in an attempt to offer guidance for novice corpus users. It draws on the researcher’s experience of utilising corpus linguistic tools to design vocabulary learning activities without focusing on students’ learning outcomes. Hence, it adopts a self-study research methodology based on five methodological components suggested by other self-study researchers. The findings indicate that designing specialised and corpus-informed vocabulary learning activities could be challenging for teachers, as they require technical knowledge of how to navigate corpora and utilise corpus analysis tools. Findings also include a description of the researcher’s approach to building and analysing a specialised corpus for the benefit of novice corpus users, who should then be able to start their own journey of designing corpus-based activities. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=corpora" title="corpora">corpora</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus%20linguistics" title=" corpus linguistics"> corpus linguistics</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus-informed" title=" corpus-informed"> corpus-informed</a>, <a href="https://publications.waset.org/abstracts/search?q=English%20for%20academic%20and%20specific%20purposes" title=" English for academic and specific purposes"> English for academic and specific purposes</a>, <a href="https://publications.waset.org/abstracts/search?q=agribusiness" title=" agribusiness"> agribusiness</a>, <a href="https://publications.waset.org/abstracts/search?q=vocabulary" title=" vocabulary"> vocabulary</a>, <a href="https://publications.waset.org/abstracts/search?q=phraseological%20units" title=" phraseological units"> phraseological units</a>, <a href="https://publications.waset.org/abstracts/search?q=materials%20design" title=" materials design"> materials design</a> </p> <a href="https://publications.waset.org/abstracts/190242/english-for-academic-and-specific-purposes-a-corpus-informed-approach-to-designing-vocabulary-teaching-materials" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/190242.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">24</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">607</span> A Corpus-Assisted Discourse Analysis of Adjectival Collocation of the Word 'Education' in the American Context</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Ngan%20Nguyen">Ngan Nguyen</a> </p> <p 
class="card-text"><strong>Abstract:</strong></p> The study analyses adjectives collocating with the word ‘education’ in the American English data of the Corpus of Global Web-based English, combining corpus linguistic and discourse analytical methods to examine not only language patterns but also the social and political ideologies around the topic. Four main conclusions are drawn: (1) a large number of adjectival collocates of ‘education’ were identified and classified into four categories representing different aspects of education: level, quality, forms and types; (2) in combination with the first three categories, ‘education’ denotes the act and process of teaching and learning, whereas with the last category it denotes a particular kind of teaching or training; (3) higher education is the topic of greatest concern to the American public; (4) the five most significant ideologies found in the corpus are: higher education is associated with financial affairs, higher education is an industry, government monetary policy on higher education, people require greater access to higher education, and people value higher education. The study contributes to corpus-based research on word meaning and to the field of discourse analysis. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=adjectival%20collocation" title="adjectival collocation">adjectival collocation</a>, <a href="https://publications.waset.org/abstracts/search?q=American%20context" title=" American context"> American context</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus%20linguistics" title=" corpus linguistics"> corpus linguistics</a>, <a href="https://publications.waset.org/abstracts/search?q=discourse%20analysis" title=" discourse analysis"> discourse analysis</a>, <a href="https://publications.waset.org/abstracts/search?q=education" title=" education"> education</a> </p> <a href="https://publications.waset.org/abstracts/56903/a-corpus-assisted-discourse-analysis-of-adjectival-collocation-of-the-word-education-in-the-american-context" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/56903.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">346</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">606</span> Saudi Twitter Corpus for Sentiment Analysis</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Adel%20Assiri">Adel Assiri</a>, <a href="https://publications.waset.org/abstracts/search?q=Ahmed%20Emam"> Ahmed Emam</a>, <a href="https://publications.waset.org/abstracts/search?q=Hmood%20Al-Dossari"> Hmood Al-Dossari</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Sentiment analysis (SA) has received growing attention in Arabic language research. However, few studies have yet to directly apply SA to Arabic due to lack of a publicly available dataset for this language. 
This paper partially bridges this gap by focusing on one Arabic dialect, the Saudi dialect. It presents an annotated data set of 4700 for Saudi dialect sentiment analysis (K = 0.807). Our next step is to extend this corpus and create a large-scale lexicon for the Saudi dialect from it. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=Arabic" title="Arabic">Arabic</a>, <a href="https://publications.waset.org/abstracts/search?q=sentiment%20analysis" title=" sentiment analysis"> sentiment analysis</a>, <a href="https://publications.waset.org/abstracts/search?q=Twitter" title=" Twitter"> Twitter</a>, <a href="https://publications.waset.org/abstracts/search?q=annotation" title=" annotation"> annotation</a> </p> <a href="https://publications.waset.org/abstracts/44819/saudi-twitter-corpus-for-sentiment-analysis" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/44819.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">630</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">605</span> Corpus Linguistic Methods in a Theoretical Study of Quran Verb Tense and Aspect in Translations from Arabic to English</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Jawharah%20Alasmari">Jawharah Alasmari</a> </p> <p class="card-text"><strong>Abstract:</strong></p> In the inflectional morphology of the verb, tense and aspect indicate the time of an action (past, present or future) and whether it is completed or not. 
The usage and meaning of tense and aspect differ between Arabic and English, so there is no simple one-to-one mapping from an inflected Arabic verb form to an English translation; an appropriate translation depends on a range of features, including the immediate and wider context of use. The Quranic Arabic Corpus includes seven alternative, expertly crafted English translations of each Arabic verse, which provides a test dataset for studying appropriate Arabic-to-English translations of verb tense and aspect. We applied corpus linguistics methods in a theoretical study of exemplary verbs to elicit candidate verbal contexts that influence the choice of English inflection for each verse. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=Corpus%20linguistics%20methods" title="Corpus linguistics methods">Corpus linguistics methods</a>, <a href="https://publications.waset.org/abstracts/search?q=Arabic%20verb" title=" Arabic verb"> Arabic verb</a>, <a href="https://publications.waset.org/abstracts/search?q=tense%20and%20aspect" title=" tense and aspect"> tense and aspect</a>, <a href="https://publications.waset.org/abstracts/search?q=English%20translations" title=" English translations"> English translations</a> </p> <a href="https://publications.waset.org/abstracts/69201/corpus-linguistic-methods-in-a-theoretical-study-of-quran-verb-tense-and-aspect-in-translations-from-arabic-to-english" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/69201.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">392</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">604</span> Combining Corpus Linguistics and Critical Discourse Analysis to Study Power Relations in Hindi Newspapers</h5> <div 
class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Vandana%20Mishra">Vandana Mishra</a>, <a href="https://publications.waset.org/abstracts/search?q=Niladri%20Sekhar%20Dash"> Niladri Sekhar Dash</a>, <a href="https://publications.waset.org/abstracts/search?q=Jayshree%20Charkraborty"> Jayshree Charkraborty</a> </p> <p class="card-text"><strong>Abstract:</strong></p> This paper focuses on the application of corpus linguistics techniques to the critical discourse analysis (CDA) of Hindi newspapers. While corpus linguistics is the study of language as expressed in corpora (samples) of 'real world' text, CDA is an interdisciplinary approach to the study of discourse that views language as a form of social practice. CDA has mainly been pursued from a qualitative perspective. However, recent studies have begun combining corpus linguistics with CDA to analyse large volumes of text in order to study existing power relations in society. The corpus under study is also sizable (1 million words of Hindi newspaper text), and its analysis requires an alternative analytical procedure. We have therefore combined the quantitative approach, i.e. the use of corpus techniques, with CDA’s traditional qualitative analysis. In this context, we have focused on keyword analysis, sorting the concordance lines of the selected keywords, and calculating the collocates of the keywords. We have used WordSmith Tools for all these analyses. The analysis starts by identifying the keywords in the political news corpus when compared with the main news corpus. The keywords are extracted from the corpus based on their keyness, calculated through statistical tests such as the chi-squared test and the log-likelihood test on the frequent words of the corpus. Some of the top-occurring keywords are मोदी (Modi), भाजपा (BJP), कांग्रेस (Congress), सरकार (Government) and पार्टी (Political party). 
This is followed by concordance analysis of these keywords, which generates thousands of lines, from which we select a few to examine according to our objective. We have also calculated the collocates of the keywords based on their Mutual Information (MI) score. Both concordance and collocation analysis help to identify lexical patterns in the political texts. Finally, all these quantitative results derived from the corpus techniques will be interpreted subjectively in accordance with CDA theory to examine the ways in which political news discourse produces social and political inequality, power abuse or domination. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=critical%20discourse%20analysis" title="critical discourse analysis">critical discourse analysis</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus%20linguistics" title=" corpus linguistics"> corpus linguistics</a>, <a href="https://publications.waset.org/abstracts/search?q=Hindi%20newspapers" title=" Hindi newspapers"> Hindi newspapers</a>, <a href="https://publications.waset.org/abstracts/search?q=power%20relations" title=" power relations"> power relations</a> </p> <a href="https://publications.waset.org/abstracts/88699/combining-corpus-linguistics-and-critical-discourse-analysis-to-study-power-relations-in-hindi-newspapers" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/88699.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">224</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">603</span> A Corpus-Based Discourse Analysis of the Disappearance of MH370 in Malaysia and United Kingdom Newspapers: A Pilot Study</h5> <div class="card-body"> <p 
class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Theng%20Theng%20Ong">Theng Theng Ong</a> </p> <p class="card-text"><strong>Abstract:</strong></p> This pilot study adopts a corpus-based discourse analysis to explore the construction of Malaysia airline tragedy MH370 in the selected Malaysian and United Kingdom (UK) newspapers. Fairclough’s three-dimensional model is adopted in the study to support the corpus-based analysis. The analysis aims to determine the ways in which Malaysian Airline tragedy MH370 is linguistically defined and constructed in terms of keywords and collocation. The study also seeks to identify the types of discourse that are presented in the news articles. In addition, the differences or similarities in terms of keywords, topics or issues covered by the selected Malaysian and UK news media are examined. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=corpus" title="corpus">corpus</a>, <a href="https://publications.waset.org/abstracts/search?q=CDA" title=" CDA"> CDA</a>, <a href="https://publications.waset.org/abstracts/search?q=newspapers" title=" newspapers"> newspapers</a>, <a href="https://publications.waset.org/abstracts/search?q=airline%20tragedies" title=" airline tragedies"> airline tragedies</a> </p> <a href="https://publications.waset.org/abstracts/48752/a-corpus-based-discourse-analysis-of-the-disappearance-of-mh370-in-malaysia-and-united-kingdom-newspapers-a-pilot-study" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/48752.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">300</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">602</span> The Sinful Pig: Social 
Construction of Hogs through Corpus Analysis in Czech</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Zden%C4%9Bk%20Joukl">Zdeněk Joukl</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The word for pig in Czech (prase) seems to be one of the most negatively connotated words denoting animals. This paper represents an analysis of the largest Czech corpora, including a diachronic corpus. Besides corpus-analytical tools, sentiment analysis methods and tools such as LIWC and word clouds are used to better capture the usage of the words for pigs in Czech. The most frequent collocations across domains are identified and extracted with context to be used for sentiment analysis, which reveals an almost exclusive negative sentiment or culinary context. The animal is burdened with a disproportionately high number of meanings representing negatively viewed human characteristics or behaviors (dirtiness, fatness, sweating, inebriation, aggressive driving, greediness or chauvinism are among the most frequent ones). The diachronic view helps us understand how this extreme bias came to existence both through institutional construction and human-animal relations. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=corpus%20analysis" title="corpus analysis">corpus analysis</a>, <a href="https://publications.waset.org/abstracts/search?q=pig" title=" pig"> pig</a>, <a href="https://publications.waset.org/abstracts/search?q=sentiment" title=" sentiment"> sentiment</a>, <a href="https://publications.waset.org/abstracts/search?q=social%20construction" title=" social construction"> social construction</a> </p> <a href="https://publications.waset.org/abstracts/195461/the-sinful-pig-social-construction-of-hogs-through-corpus-analysis-in-czech" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/195461.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">4</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">601</span> Passive Voice in SLA: Armenian Learners’ Case Study</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Emma%20Nemishalyan">Emma Nemishalyan</a> </p> <p class="card-text"><strong>Abstract:</strong></p> It is believed that learners’ mother tongue (L1 hereafter) has a huge impact on their second language acquisition (L2 hereafter). This hypothesis has been exposed to both positive and negative criticism. Based on research results of a wide range of learners’ corpora (Chinese, Japanese, Spanish among others) the hypothesis has either been proved or disproved. However, no such study has been conducted on the Armenian learners. The aim of this paper is to understand the implication of the hypothesis on the Armenian learners’ corpus in terms of the use of the passive voice. 
To this end, Contrastive Interlanguage Analysis (hereafter CIA) has been applied to a native speakers’ corpus, the Louvain Corpus of Native English Essays (LOCNESS), and an Armenian learners’ corpus that I compiled in compliance with International Corpus of Learner English (ICLE) guidelines. CIA compares the interlanguage (the language produced by learners) with the language produced by native speakers. This method makes it possible not only to highlight the mistakes that learners make, but also to reveal underuse and overuse. The passive voice was chosen because Armenian and English are typologically very different, belonging to different branches. Moreover, the passive voice is considered one of the most difficult grammar topics for learners of English to acquire. Based on this difference, we hypothesized that Armenian learners would either overuse or underuse some types of the passive voice. Using the LancsBox software, we first identified the frequency of passive voice usage in LOCNESS and the Armenian learners’ corpus to determine whether the learners follow the same usage pattern as native speakers. Secondly, we identified the types of passive voice used by the Armenian learners, seeking the reasons in their mother tongue. The results showed that Armenian learners underused the passive voice in comparison with native speakers. Furthermore, the hypothesis that learners’ L1 has an impact on their L2 acquisition and production was confirmed. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=corpus%20linguistics" title="corpus linguistics">corpus linguistics</a>, <a href="https://publications.waset.org/abstracts/search?q=applied%20linguistics" title=" applied linguistics"> applied linguistics</a>, <a href="https://publications.waset.org/abstracts/search?q=second%20language%20acquisition" title=" second language acquisition"> second language acquisition</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus%20compilation" title=" corpus compilation"> corpus compilation</a> </p> <a href="https://publications.waset.org/abstracts/165348/passive-voice-in-sla-armenian-learners-case-study" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/165348.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">109</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">600</span> The Repetition of New Words and Information in Mandarin-Speaking Children: A Corpus-Based Study</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Jian-Jun%20Gao">Jian-Jun Gao</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Repetition is used for a variety of functions in conversation. When young children first learn to speak, they often repeat words from the adult’s recent utterance with the learning and social function. The objective of this study was to ascertain whether the repetitions are equivalent in indicating attention to new words and the initial repeat of information in conversation. 
Based on the observation of naturally occurring language use in Taiwan Corpus of Child Mandarin (TCCM), the results in this study provided empirical support to the previous findings that children are more likely to repeat new words they are offered than to repeat new information. When children get older, there would be a drop in the repetition of both new words and new information. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=acquisition" title="acquisition">acquisition</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus" title=" corpus"> corpus</a>, <a href="https://publications.waset.org/abstracts/search?q=mandarin" title=" mandarin"> mandarin</a>, <a href="https://publications.waset.org/abstracts/search?q=new%20words" title=" new words"> new words</a>, <a href="https://publications.waset.org/abstracts/search?q=new%20information" title=" new information"> new information</a>, <a href="https://publications.waset.org/abstracts/search?q=repetition" title=" repetition"> repetition</a> </p> <a href="https://publications.waset.org/abstracts/106580/the-repetition-of-new-words-and-information-in-mandarin-speaking-children-a-corpus-based-study" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/106580.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">149</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">599</span> Chinese Students’ Use of Corpus Tools in an English for Academic Purposes Writing Course: Influence on Learning Behaviour, Performance Outcomes and Perceptions</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Jingwen%20Ou">Jingwen Ou</a> </p> 
<p class="card-text"><strong>Abstract:</strong></p> Writing for academic purposes in a second or foreign language poses a significant challenge for non-native speakers, particularly at the tertiary level, where English academic writing for L2 students is often hindered by difficulties in academic discourse, including vocabulary, academic register, and organization. The past two decades have witnessed the rising popularity of the data-driven learning (DDL) approach in English for academic purposes (EAP) writing instruction. In light of this trend, this study aims to enhance the integration of DDL into EAP writing classrooms by investigating the perceptions of Chinese college students regarding the use of corpus tools for improving EAP writing. Additionally, the research explores their corpus consultation behaviours during training to provide insights into corpus-assisted EAP instruction for DDL practitioners. The research investigates Chinese university students’ use of corpus tools with three main foci: 1) the influence of corpus tools on learning behaviours, 2) the influence of corpus tools on students’ academic writing performance outcomes, and 3) students’ perceptions of such tools and potential changes in those perceptions. Three corpus tools, CQPWeb, Sketch Engine, and LancsBox X, are selected for investigation because empirical research on patterns of learners’ engagement with a combination of multiple corpora is scarce. The research adopts a pre-test / post-test design to evaluate students’ academic writing performance before and after the intervention. Twenty participants will be divided into two groups: an intervention group and a non-intervention group. Three corpus training workshops will be delivered at the beginning, middle, and end of a semester. 
An online survey and three separate focus group interviews are designed to investigate students’ perceptions of the use of corpus tools for improving academic writing skills, particularly the rhetorical functions in different essay sections. Insights from students’ consultation sessions indicated difficulties with DDL practice, including insufficiency of time to complete all tasks, struggle with technical set-up, unfamiliarity with the DDL approach and difficulty with some advanced corpus functions. Findings from the main study aim to provide pedagogical insights and training resources for EAP practitioners and learners. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=corpus%20linguistics" title="corpus linguistics">corpus linguistics</a>, <a href="https://publications.waset.org/abstracts/search?q=data-driven%20learning" title=" data-driven learning"> data-driven learning</a>, <a href="https://publications.waset.org/abstracts/search?q=English%20for%20academic%20purposes" title=" English for academic purposes"> English for academic purposes</a>, <a href="https://publications.waset.org/abstracts/search?q=tertiary%20education%20in%20China" title=" tertiary education in China"> tertiary education in China</a> </p> <a href="https://publications.waset.org/abstracts/180899/chinese-students-use-of-corpus-tools-in-an-english-for-academic-purposes-writing-course-influence-on-learning-behaviour-performance-outcomes-and-perceptions" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/180899.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">60</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">598</span> Corpus-Based Model of Key Concepts Selection for the Master 
English Language Course "Government Relations"</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Elena%20Pozdnyakova">Elena Pozdnyakova</a> </p> <p class="card-text"><strong>Abstract:</strong></p> “Government Relations” (GR) is a field of knowledge presently taught at the majority of universities around the globe. English, as the default language, can become the language of teaching since the issues discussed are both global and national in character. However, in this field the key concepts and their word representations in English do not often coincide with those in other languages. International master’s degree students abroad, as well as students taught the course in English at their national universities, face difficulties in correctly conceptualizing GR terminology in the British and American academic traditions. The study was carried out while the GR English language course was being developed (pilot research: 2013-2015) at Moscow State Institute of Foreign Relations (University), Russian Federation. Within this period, English language instructors designed and elaborated the three-semester GR course. Methodologically, the course design was based on the elaboration model, with a special focus on the conceptual elaboration sequence and the theoretical elaboration sequence. The course designers faced difficulties in concept selection and in the theoretical elaboration sequence. To improve the results and eliminate the problems with concept selection, a new, corpus-based approach was worked out. The computer-based tool WordSmith 6.0 was used to build a model of key concept selection. The corpus of GR English texts (the study corpus) consisted of 1 million words. The approach was based on measuring effect size, i.e. the percent difference between the frequency of a word in the study corpus and its frequency in the reference corpus. 
The results obtained proved significant improvement in the process of concept selection. The corpus-based model also facilitated theoretical elaboration of teaching materials. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=corpus-based%20study" title="corpus-based study">corpus-based study</a>, <a href="https://publications.waset.org/abstracts/search?q=English%20as%20the%20default%20language" title=" English as the default language"> English as the default language</a>, <a href="https://publications.waset.org/abstracts/search?q=key%20concepts" title=" key concepts"> key concepts</a>, <a href="https://publications.waset.org/abstracts/search?q=measuring%20effect%20size" title=" measuring effect size"> measuring effect size</a>, <a href="https://publications.waset.org/abstracts/search?q=model%20of%20key%20concept%20selection" title=" model of key concept selection "> model of key concept selection </a> </p> <a href="https://publications.waset.org/abstracts/42802/corpus-based-model-of-key-concepts-selection-for-the-master-english-language-course-government-relations" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/42802.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">306</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">597</span> OPEN-EmoRec-II-A Multimodal Corpus of Human-Computer Interaction</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Stefanie%20Rukavina">Stefanie Rukavina</a>, <a href="https://publications.waset.org/abstracts/search?q=Sascha%20Gruss"> Sascha Gruss</a>, <a href="https://publications.waset.org/abstracts/search?q=Steffen%20Walter"> Steffen 
Walter</a>, <a href="https://publications.waset.org/abstracts/search?q=Holger%20Hoffmann"> Holger Hoffmann</a>, <a href="https://publications.waset.org/abstracts/search?q=Harald%20C.%20Traue"> Harald C. Traue</a> </p> <p class="card-text"><strong>Abstract:</strong></p> OPEN-EmoRec-II is an open multimodal corpus with experimentally induced emotions. In the first half of the experiment, emotions were induced with standardized picture material, and in the second half during a human-computer interaction (HCI) realized with a Wizard-of-Oz design. The induced emotions are based on the dimensional theory of emotions (valence, arousal and dominance). With these emotional sequences, recorded as multimodal data (mimic reactions, speech, audio and physiological reactions) in a naturalistic HCI environment, one can improve classification methods on a multimodal level. This database is the result of an HCI experiment in which 30 subjects in total agreed to the publication of their data, including the video material, for research purposes. The open corpus now available contains sensor signals from video, audio and physiology (SCL, respiration, BVP, EMG Corrugator supercilii, EMG Zygomaticus major), as well as mimic annotations. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=open%20multimodal%20emotion%20corpus" title="open multimodal emotion corpus">open multimodal emotion corpus</a>, <a href="https://publications.waset.org/abstracts/search?q=annotated%20labels" title=" annotated labels"> annotated labels</a>, <a href="https://publications.waset.org/abstracts/search?q=intelligent%20interaction" title=" intelligent interaction"> intelligent interaction</a> </p> <a href="https://publications.waset.org/abstracts/29365/open-emorec-ii-a-multimodal-corpus-of-human-computer-interaction" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/29365.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">416</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">596</span> Compilation and Statistical Analysis of an Arabic-English Legal Corpus in Sketch Engine</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=C.%20Brierley">C. Brierley</a>, <a href="https://publications.waset.org/abstracts/search?q=H.%20El-Farahaty"> H. El-Farahaty</a>, <a href="https://publications.waset.org/abstracts/search?q=A.%20Farhan"> A. Farhan</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The Leeds Parallel Corpus of Arabic-English Constitutions is a parallel corpus for the Arabic legal domain. Analysis of legal language via Corpus Linguistics techniques is an important development. 
In legal proceedings, a corpus-based approach to disambiguating meaning is set to replace the dictionary as an interpretative tool, and legal scholarship in the States is now attuned to the potential for Text Analytics over vast quantities of text-based legal material, following the business and medical industries. This trend is reflected in Europe: the interdisciplinary research group in Computer Assisted Legal Linguistics mines big data collections of legal and non-legal texts to analyse: legal interpretations; legal discourse; the comprehensibility of legal texts; conflict resolution; and linguistic human rights. This paper focuses on ‘dignity’ as an important aspect of the overarching concept of human rights in current constitutions across the Arab world. We have compiled a parallel, Arabic-English raw text corpus (169,861 Arabic words and 205,893 English words) from reputable websites such as the World Intellectual Property Organisation and CONSTITUTE, and uploaded and queried our corpus in Sketch Engine. Our most challenging task was sentence-level alignment of Arabic-English data. This entailed manual intervention to ensure correspondence on a one-to-many basis since Arabic sentences differ from English in length and punctuation. We have searched for morphological variants of ‘dignity’ (كرامة, karāma) in the Arabic data and inspected their English translation equivalents. The term occurs most frequently in the Sudanese constitution (10 instances), and not at all in the constitution of Palestine. Its most frequent collocate, determined via the logDice statistic in Sketch Engine, is ‘human’ as in ‘human dignity’. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=Arabic%20constitution" title="Arabic constitution">Arabic constitution</a>, <a href="https://publications.waset.org/abstracts/search?q=corpus-based%20legal%20linguistics" title=" corpus-based legal linguistics"> corpus-based legal linguistics</a>, <a href="https://publications.waset.org/abstracts/search?q=human%20rights" title=" human rights"> human rights</a>, <a href="https://publications.waset.org/abstracts/search?q=parallel%20Arabic-English%20legal%20corpora" title=" parallel Arabic-English legal corpora"> parallel Arabic-English legal corpora</a> </p> <a href="https://publications.waset.org/abstracts/83004/compilation-and-statistical-analysis-of-an-arabic-english-legal-corpus-in-sketch-engine" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/83004.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">183</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">595</span> Particle Swarm Optimization Based Method for Minimum Initial Marking in Labeled Petri Nets</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Hichem%20Kmimech">Hichem Kmimech</a>, <a href="https://publications.waset.org/abstracts/search?q=Achref%20Jabeur%20Telmoudi"> Achref Jabeur Telmoudi</a>, <a href="https://publications.waset.org/abstracts/search?q=Lotfi%20Nabli"> Lotfi Nabli</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The estimation of the initial marking minimum (MIM) is a crucial problem in labeled Petri nets. 
In the case of multiple choices, the search for the initial marking leads to a problem of optimization of the minimum allocation of resources with two constraints. The first concerns the firing sequence that could be legal on the initial marking with respect to the firing vector. The second deals with the total number of tokens that can be minimal. In this article, the MIM problem is solved by the meta-heuristic particle swarm optimization (PSO). The proposed approach presents the advantages of PSO to satisfy the two previous constraints and find all possible combinations of minimum initial marking with the best computing time. This method, more efficient than conventional ones, has an excellent impact on the resolution of the MIM problem. We prove through a set of definitions, lemmas, and examples, the effectiveness of our approach. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=marking" title="marking">marking</a>, <a href="https://publications.waset.org/abstracts/search?q=production%20system" title=" production system"> production system</a>, <a href="https://publications.waset.org/abstracts/search?q=labeled%20Petri%20nets" title=" labeled Petri nets"> labeled Petri nets</a>, <a href="https://publications.waset.org/abstracts/search?q=particle%20swarm%20optimization" title=" particle swarm optimization"> particle swarm optimization</a> </p> <a href="https://publications.waset.org/abstracts/98499/particle-swarm-optimization-based-method-for-minimum-initial-marking-in-labeled-petri-nets" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/98499.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">179</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge 
badge-info">594</span> Towards Law Data Labelling Using Topic Modelling</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Daniel%20Pinheiro%20Da%20Silva%20Junior">Daniel Pinheiro Da Silva Junior</a>, <a href="https://publications.waset.org/abstracts/search?q=Aline%20Paes"> Aline Paes</a>, <a href="https://publications.waset.org/abstracts/search?q=Daniel%20De%20Oliveira"> Daniel De Oliveira</a>, <a href="https://publications.waset.org/abstracts/search?q=Christiano%20Lacerda%20Ghuerren"> Christiano Lacerda Ghuerren</a>, <a href="https://publications.waset.org/abstracts/search?q=Marcio%20Duran"> Marcio Duran</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The Courts of Accounts are institutions responsible for overseeing Public Administration expenses and pointing out irregularities in them. They have a high demand for processes to be analyzed, whose decisions must be grounded on severity laws. Despite the large number of processes, several cases report similar subjects. Thus, previous decisions on already analyzed processes can be a precedent for current processes that refer to similar topics. Identifying similar topics is an open, yet essential task for identifying similarities between several processes. Since the actual number of topics is considerably large, it is tedious and error-prone to identify topics using a purely manual approach. This paper presents a tool based on Machine Learning and Natural Language Processing to assist in building a labeled dataset. The tool relies on Topic Modelling with Latent Dirichlet Allocation to find the topics underlying a document, followed by the Jensen-Shannon distance metric to generate a probability of similarity between document pairs. 
Furthermore, in a case study with a corpus of decisions of the Rio de Janeiro State Court of Accounts, it was noted that data pre-processing plays an essential role in modeling relevant topics. Also, the combination of topic modeling and a distance metric calculated over documents represented by the generated topics proved useful in helping to construct a labeled base of similar and non-similar document pairs. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=courts%20of%20accounts" title="courts of accounts">courts of accounts</a>, <a href="https://publications.waset.org/abstracts/search?q=data%20labelling" title=" data labelling"> data labelling</a>, <a href="https://publications.waset.org/abstracts/search?q=document%20similarity" title=" document similarity"> document similarity</a>, <a href="https://publications.waset.org/abstracts/search?q=topic%20modeling" title=" topic modeling"> topic modeling</a> </p> <a href="https://publications.waset.org/abstracts/121281/towards-law-data-labelling-using-topic-modelling" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/121281.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">179</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">593</span> Corpus-Assisted Study of Gender Related Tiger Metaphors in the Chinese Context</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Na%20Xiao">Na Xiao</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Animal metaphors have many different connotations, ranging from loving emotions to derogatory epithets, but gender expressions using animal metaphors are often imbalanced. 
Generally, animal metaphors related to females tend to be negative. Little is known about the reasons for the negative expressions of female animal metaphors in Chinese contexts, and these have not yet been quantified. The Modern Chinese Corpus at the Center for Chinese Linguistics at Peking University (CCL Corpus) provided the data for this research, which aims to identify the variables that influence gender differences in animal metaphors mapped onto humans in Chinese by observing the percentage of "tiger" metaphors, based on conceptual metaphor theory. A quantitative research method was used in this study to statistically examine the gender attitude percentage of the "tiger" metaphor using corpus data. This study has shown that the tiger metaphors associated with humans in the Chinese context tend to be negative. Importantly, this study has also shown that the high proportion of tiger metaphorical idioms is what causes the high proportion of negative tiger metaphors that are related to women. This finding can be used as crucial information for future studies on other gender-related animal metaphorical idioms and can offer additional insights for understanding trends in other animal metaphors. 
<p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=Chinese" title="Chinese">Chinese</a>, <a href="https://publications.waset.org/abstracts/search?q=CCL%20corpus" title=" CCL corpus"> CCL corpus</a>, <a href="https://publications.waset.org/abstracts/search?q=gender%20differences" title=" gender differences"> gender differences</a>, <a href="https://publications.waset.org/abstracts/search?q=metaphorical%20idioms" title=" metaphorical idioms"> metaphorical idioms</a>, <a href="https://publications.waset.org/abstracts/search?q=tigers" title=" tigers"> tigers</a> </p> <a href="https://publications.waset.org/abstracts/152992/corpus-assisted-study-of-gender-related-tiger-metaphors-in-the-chinese-context" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/152992.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">109</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">592</span> A Meta Regression Analysis to Detect Price Premium Threshold for Eco-Labeled Seafood </h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Cristina%20Giosu%C3%A8">Cristina Giosuè</a>, <a href="https://publications.waset.org/abstracts/search?q=Federica%20Biondo"> Federica Biondo</a>, <a href="https://publications.waset.org/abstracts/search?q=Sergio%20Vitale"> Sergio Vitale</a> </p> <p class="card-text"><strong>Abstract:</strong></p> In the last years, the consumers' awareness for environmental concerns has been increasing, and seafood eco-labels are considered as a possible instrument to improve both seafood markets and sustainable fishing management. 
In this direction, the aim of this study was to carry out a meta-analysis of consumers’ willingness to pay (WTP) for eco-labeled wild seafood by means of a meta-regression. Only papers published in ISI journals were searched for on the “Web of Knowledge” and “SciVerse Scopus” platforms, using combinations of the following keywords: seafood, ecolabel, eco-label, willingness, WTP and premium. The dataset was built considering: paper’s and survey’s codes, year of publication, first author’s nationality, species’ taxa and family, sample size, survey’s continent and country, data collection (where and how), gender and age of consumers, brand and ΔWTP. The analysis clearly showed interest in eco-labeled seafood, particularly in developed countries. In general, consumers declared a greater willingness to pay than that actually applied to eco-labeled products, with differences related to taxa and brand. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=eco%20label" title="eco label">eco label</a>, <a href="https://publications.waset.org/abstracts/search?q=meta%20regression" title=" meta regression"> meta regression</a>, <a href="https://publications.waset.org/abstracts/search?q=seafood" title=" seafood"> seafood</a>, <a href="https://publications.waset.org/abstracts/search?q=willingness%20to%20pay" title=" willingness to pay"> willingness to pay</a> </p> <a href="https://publications.waset.org/abstracts/122921/a-meta-regression-analysis-to-detect-price-premium-threshold-for-eco-labeled-seafood" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/122921.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">122</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge 
badge-info">591</span> Redundancy in Malay Morphology: School Grammar versus Corpus Grammar </h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Zaharani%20Ahmad">Zaharani Ahmad</a>, <a href="https://publications.waset.org/abstracts/search?q=Nor%20Hashimah%20Jalaluddin"> Nor Hashimah Jalaluddin</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The aim of this paper is to examine and identify the issue of linguistic redundancy in two competing grammars of Malay, namely the school grammar and the corpus grammar. The former is a normative grammar which is formally and prescriptively taught in the classroom, whereas the latter is a descriptive grammar that is informally acquired and mastered by the students as native speakers of the language outside the classroom. Corpus grammar is depicted based on its actual use in naturally occurring texts, as attested in the corpus. It is observed that the grammar taught in schools is incompatible with the grammar used in the corpus. For instance, a noun phrase containing a nominal reduplicated form which denotes plurality (i.e. murid-murid ‘students’ which is derived from murid ‘student’) and a modifier categorized as a quantifier (i.e. semua ‘all’, seluruh ‘entire’, and kebanyakan ‘most’) is not acceptable in the school grammar because the formation (i.e. semua murid-murid ‘all the students’, kebanyakan pelajar-pelajar ‘most of the students’) is claimed to be redundant, and redundancy is prohibited in the grammar. Redundancy is generally construed as the property of speech and language by which more information is provided than is precisely required for the message to be understood, so that, if some information is omitted, the remaining information will still be sufficient for the message to be comprehended. Thus, the correct construction to be used is strictly the reduplicated form (i.e. 
murid-murid ‘students’) or the quantifier plus the root (i.e. semua murid ‘all the students’) with the intention that the grammatical meaning of plural is not repeated. Nevertheless, the so-called redundant form (i.e. kebanyakan pelajar-pelajar ‘most of the students’) is frequently used in the corpus grammar. This study shows that a number of redundant forms occur in the morphology of the language, particularly in affixation, reduplication and the combination of both. Apparently, the so-called redundancy has grammatical and socio-cultural functions in communication, namely to give emphasis and to stress the importance of the information delivered by the speakers or writers. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=corpus%20grammar" title="corpus grammar">corpus grammar</a>, <a href="https://publications.waset.org/abstracts/search?q=morphology" title=" morphology"> morphology</a>, <a href="https://publications.waset.org/abstracts/search?q=redundancy" title=" redundancy"> redundancy</a>, <a href="https://publications.waset.org/abstracts/search?q=school%20grammar" title=" school grammar"> school grammar</a> </p> <a href="https://publications.waset.org/abstracts/42192/redundancy-in-malay-morphology-school-grammar-versus-corpus-grammar" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/42192.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">342</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">590</span> The Automatisation of Dictionary-Based Annotation in a Parallel Corpus of Old English</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a 
href="https://publications.waset.org/abstracts/search?q=Ana%20Elvira%20Ojanguren%20Lopez">Ana Elvira Ojanguren Lopez</a>, <a href="https://publications.waset.org/abstracts/search?q=Javier%20Martin%20Arista"> Javier Martin Arista</a> </p> <p class="card-text"><strong>Abstract:</strong></p> The aims of this paper are to present the automatisation procedure adopted in the implementation of a parallel corpus of Old English, as well as, to assess the progress of automatisation with respect to tagging, annotation, and lemmatisation. The corpus consists of an aligned parallel text with word-for-word comparison Old English-English that provides the Old English segment with inflectional form tagging (gloss, lemma, category, and inflection) and lemma annotation (spelling, meaning, inflectional class, paradigm, word-formation and secondary sources). This parallel corpus is intended to fill a gap in the field of Old English, in which no parallel and/or lemmatised corpora are available, while the average amount of corpus annotation is low. With this background, this presentation has two main parts. The first part, which focuses on tagging and annotation, selects the layouts and fields of lexical databases that are relevant for these tasks. Most information used for the annotation of the corpus can be retrieved from the lexical and morphological database Nerthus and the database of secondary sources Freya. These are the sources of linguistic and metalinguistic information that will be used for the annotation of the lemmas of the corpus, including morphological and semantic aspects as well as the references to the secondary sources that deal with the lemmas in question. Although substantially adapted and re-interpreted, the lemmatised part of these databases draws on the standard dictionaries of Old English, including The Student's Dictionary of Anglo-Saxon, An Anglo-Saxon Dictionary, and A Concise Anglo-Saxon Dictionary. The second part of this paper deals with lemmatisation. 
It presents the lemmatiser Norna, which has been implemented on Filemaker software. It is based on a concordance and an index to the Dictionary of Old English Corpus, which comprises around three thousand texts and three million words. In its present state, the lemmatiser Norna can assign lemma to around 80% of textual forms on an automatic basis, by searching the index and the concordance for prefixes, stems and inflectional endings. The conclusions of this presentation insist on the limits of the automatisation of dictionary-based annotation in a parallel corpus. While the tagging and annotation are largely automatic even at the present stage, the automatisation of alignment is pending for future research. Lemmatisation and morphological tagging are expected to be fully automatic in the near future, once the database of secondary sources Freya and the lemmatiser Norna have been completed. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=corpus%20linguistics" title="corpus linguistics">corpus linguistics</a>, <a href="https://publications.waset.org/abstracts/search?q=historical%20linguistics" title=" historical linguistics"> historical linguistics</a>, <a href="https://publications.waset.org/abstracts/search?q=old%20English" title=" old English"> old English</a>, <a href="https://publications.waset.org/abstracts/search?q=parallel%20corpus" title=" parallel corpus"> parallel corpus</a> </p> <a href="https://publications.waset.org/abstracts/88538/the-automatisation-of-dictionary-based-annotation-in-a-parallel-corpus-of-old-english" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/88538.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">212</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" 
style="font-size:.9rem"><span class="badge badge-info">589</span> Quality Control of 99mTc-Labeled Radiopharmaceuticals Using the Chromatography Strips</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Yasuyuki%20Takahashi">Yasuyuki Takahashi</a>, <a href="https://publications.waset.org/abstracts/search?q=Akemi%20Yoshida"> Akemi Yoshida</a>, <a href="https://publications.waset.org/abstracts/search?q=Hirotaka%20Shimada"> Hirotaka Shimada</a> </p> <p class="card-text"><strong>Abstract:</strong></p> 99mTc-2-methoxy-isobutyl-isonitrile (MIBI) and 99mTc-mercaptoacetylglycylglycylglycine (MAG3) are heated to 368-372 K and labeled with 99mTc-pertechnetate. Quality control (QC) of 99mTc-labeled radiopharmaceuticals is performed at hospitals using liquid chromatography, which is difficult to perform in general hospitals. We used chromatography strips to simplify QC and investigated the effects of the test procedures on quality control. This study used 99mTc-MAG3. The solvent was chloroform + acetone + tetrahydrofuran, and the gamma counter was an ARC-380CL. The conditions varied were heating temperature (293, 313, 333, 353 and 372 K), resting time after labeling (15 min at 293 K and 372 K, and 1 hour at 293 K), and expiration year (2011, 2012, 2013, 2014 and 2015). Measurement time using the gamma counter was one minute. A nuclear medical clinician decided the quality of the preparation in judging the usability of the retest agent. Two people each conducted the test procedure twice in order to compare reproducibility. The percentage of radiochemical purity (% RCP) was approximately 50% under insufficient heat treatment and improved as the temperature and heating time increased. Moreover, the % RCP improved with time even under low temperatures. Furthermore, there was no deterioration with time after the expiration date. 
The objective of these tests was to determine soluble 99mTc impurities, including 99mTc-pertechnetate and the hydrolyzed-reduced 99mTc. Therefore, we assumed that insufficient heating and heating to operational errors in the labeling. It is concluded that quality control is a necessary procedure in nuclear medicine to ensure safe scanning. It is suggested that labeling is necessary to identify specifications. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=quality%20control" title="quality control">quality control</a>, <a href="https://publications.waset.org/abstracts/search?q=tc-99m%20labeled%20radio-pharmaceutical" title=" tc-99m labeled radio-pharmaceutical"> tc-99m labeled radio-pharmaceutical</a>, <a href="https://publications.waset.org/abstracts/search?q=chromatography%20strip" title=" chromatography strip"> chromatography strip</a>, <a href="https://publications.waset.org/abstracts/search?q=nuclear%20medicine" title=" nuclear medicine"> nuclear medicine</a> </p> <a href="https://publications.waset.org/abstracts/51516/quality-control-of-99mtc-labeled-radiopharmaceuticals-using-the-chromatography-strips" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/51516.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">322</span> </span> </div> </div> <div class="card paper-listing mb-3 mt-3"> <h5 class="card-header" style="font-size:.9rem"><span class="badge badge-info">588</span> Tagging a corpus of Media Interviews with Diplomats: Challenges and Solutions</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Roberta%20Facchinetti">Roberta Facchinetti</a>, <a href="https://publications.waset.org/abstracts/search?q=Sara%20Corrizzato"> Sara Corrizzato</a>, <a 
href="https://publications.waset.org/abstracts/search?q=Silvia%20Cavalieri"> Silvia Cavalieri</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Increasing interconnection between data digitalization and linguistic investigation has given rise to unprecedented potentialities and challenges for corpus linguists, who need to master IT tools for data analysis and text processing, as well as to develop techniques for efficient and reliable annotation in specific mark-up languages that encode documents in a format that is both human and machine-readable. In the present paper, the challenges emerging from the compilation of a linguistic corpus will be taken into consideration, focusing on the English language in particular. To do so, the case study of the InterDiplo corpus will be illustrated. The corpus, currently under development at the University of Verona (Italy), represents a novelty in terms both of the data included and of the tag set used for its annotation. The corpus covers media interviews and debates with diplomats and international operators conversing in English with journalists who do not share the same lingua-cultural background as their interviewees. To date, this appears to be the first tagged corpus of international institutional spoken discourse and will be an important database not only for linguists interested in corpus analysis but also for experts operating in international relations. In the present paper, special attention will be dedicated to the structural mark-up, parts of speech annotation, and tagging of discursive traits, that are the innovational parts of the project being the result of a thorough study to find the best solution to suit the analytical needs of the data. Several aspects will be addressed, with special attention to the tagging of the speakers’ identity, the communicative events, and anthropophagic. 
Prominence will be given to the annotation of question/answer exchanges to investigate the interlocutors’ choices and how such choices impact communication. Indeed, the automated identification of questions, in relation to the expected answers, is functional to understand how interviewers elicit information as well as how interviewees provide their answers to fulfill their respective communicative aims. A detailed description of the aforementioned elements will be given using the InterDiplo-Covid19 pilot corpus. The data yielded by our preliminary analysis of the data will highlight the viable solutions found in the construction of the corpus in terms of XML conversion, metadata definition, tagging system, and discursive-pragmatic annotation to be included via Oxygen. <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=spoken%20corpus" title="spoken corpus">spoken corpus</a>, <a href="https://publications.waset.org/abstracts/search?q=diplomats%E2%80%99%20interviews" title=" diplomats’ interviews"> diplomats’ interviews</a>, <a href="https://publications.waset.org/abstracts/search?q=tagging%20system" title=" tagging system"> tagging system</a>, <a href="https://publications.waset.org/abstracts/search?q=discursive-pragmatic%20annotation" title=" discursive-pragmatic annotation"> discursive-pragmatic annotation</a>, <a href="https://publications.waset.org/abstracts/search?q=english%20linguistics" title=" english linguistics"> english linguistics</a> </p> <a href="https://publications.waset.org/abstracts/143495/tagging-a-corpus-of-media-interviews-with-diplomats-challenges-and-solutions" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/143495.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">185</span> </span> </div> </div> <ul class="pagination"> <li 
class="page-item disabled"><span class="page-link">‹</span></li> <li class="page-item active"><span class="page-link">1</span></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=2">2</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=3">3</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=4">4</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=5">5</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=6">6</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=7">7</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=8">8</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=9">9</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=10">10</a></li> <li class="page-item disabled"><span class="page-link">...</span></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=20">20</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=21">21</a></li> <li class="page-item"><a class="page-link" href="https://publications.waset.org/abstracts/search?q=labeled%20corpus&page=2" rel="next">›</a></li> </ul> </div> </main> <footer> <div id="infolinks" class="pt-3 pb-2"> <div class="container"> <div style="background-color:#f5f5f5;" class="p-3"> <div 
class="row"> <div class="col-md-2"> <ul class="list-unstyled"> About <li><a href="https://waset.org/page/support">About Us</a></li> <li><a href="https://waset.org/page/support#legal-information">Legal</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/WASET-16th-foundational-anniversary.pdf">WASET celebrates its 16th foundational anniversary</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Account <li><a href="https://waset.org/profile">My Account</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Explore <li><a href="https://waset.org/disciplines">Disciplines</a></li> <li><a href="https://waset.org/conferences">Conferences</a></li> <li><a href="https://waset.org/conference-programs">Conference Program</a></li> <li><a href="https://waset.org/committees">Committees</a></li> <li><a href="https://publications.waset.org">Publications</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Research <li><a href="https://publications.waset.org/abstracts">Abstracts</a></li> <li><a href="https://publications.waset.org">Periodicals</a></li> <li><a href="https://publications.waset.org/archive">Archive</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Open Science <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Open-Science-Philosophy.pdf">Open Science Philosophy</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Open-Science-Award.pdf">Open Science Award</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Open-Society-Open-Science-and-Open-Innovation.pdf">Open Innovation</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Postdoctoral-Fellowship-Award.pdf">Postdoctoral Fellowship Award</a></li> <li><a target="_blank" rel="nofollow" 
href="https://publications.waset.org/static/files/Scholarly-Research-Review.pdf">Scholarly Research Review</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Support <li><a href="https://waset.org/page/support">Support</a></li> <li><a href="https://waset.org/profile/messages/create">Contact Us</a></li> <li><a href="https://waset.org/profile/messages/create">Report Abuse</a></li> </ul> </div> </div> </div> </div> </div> <div class="container text-center"> <hr style="margin-top:0;margin-bottom:.3rem;"> <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank" class="text-muted small">Creative Commons Attribution 4.0 International License</a> <div id="copy" class="mt-2">© 2024 World Academy of Science, Engineering and Technology</div> </div> </footer> <a href="javascript:" id="return-to-top"><i class="fas fa-arrow-up"></i></a> <div class="modal" id="modal-template"> <div class="modal-dialog"> <div class="modal-content"> <div class="row m-0 mt-1"> <div class="col-md-12"> <button type="button" class="close" data-dismiss="modal" aria-label="Close"><span aria-hidden="true">×</span></button> </div> </div> <div class="modal-body"></div> </div> </div> </div> <script src="https://cdn.waset.org/static/plugins/jquery-3.3.1.min.js"></script> <script src="https://cdn.waset.org/static/plugins/bootstrap-4.2.1/js/bootstrap.bundle.min.js"></script> <script src="https://cdn.waset.org/static/js/site.js?v=150220211556"></script> <script> jQuery(document).ready(function() { /*jQuery.get("https://publications.waset.org/xhr/user-menu", function (response) { jQuery('#mainNavMenu').append(response); });*/ jQuery.get({ url: "https://publications.waset.org/xhr/user-menu", cache: false }).then(function(response){ jQuery('#mainNavMenu').append(response); }); }); </script> </body> </html>