CINXE.COM
A Comparative Analysis of Machine Learning Models in News Categorization | U.Porto Journal of Engineering
<!DOCTYPE html> <html lang="en" xml:lang="en"> <head> <meta charset="utf-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title> A Comparative Analysis of Machine Learning Models in News Categorization | U.Porto Journal of Engineering </title> <link rel="icon" href="https://journalengineering.fe.up.pt/public/journals/3/favicon_en_US.png"> <meta name="generator" content="Open Journal Systems 3.4.0.3"> <link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" /> <meta name="DC.Creator.PersonalName" content="Mohammad Hossein Zolfagharnasab"/> <meta name="DC.Creator.PersonalName" content="Siavash Damari"/> <meta name="DC.Date.created" scheme="ISO8601" content="2024-07-10"/> <meta name="DC.Date.dateSubmitted" scheme="ISO8601" content="2023-11-22"/> <meta name="DC.Date.issued" scheme="ISO8601" content="2024-07-10"/> <meta name="DC.Date.modified" scheme="ISO8601" content="2024-07-10"/> <meta name="DC.Description" xml:lang="en" content="The constant stream of news nowadays highlights the necessity for meticulous assessment to ensure that the information accurately reaches its intended audience with the least amount of delay least delay. Despite the flexibility and efficiency of Deep Learning (DL) models, their intricate training and substantial resource demands pose significant challenges for their deployment in real-time applications. In this regard, this study evaluates the performance of resource-efficient Machine Learning (ML) techniques – Multinomial Naive Bayes (MNB), Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR) – in categorizing news. Based on the results, all the evaluated models attain a commendable level of accuracy in news categorization. Notably, the SVM excels, achieving an accuracy rate of 98% and a mean squared error of 0.28. This performance exemplifies the robust effectiveness of classical ML models in the categorization of news, particularly when enhanced by a suitably tailored preprocessing pipeline."/> <meta name="DC.Format" scheme="IMT" content="application/pdf"/> <meta name="DC.Identifier" content="2183-6493_0010-003_002464"/> <meta name="DC.Identifier.pageNumber" content="23-38"/> <meta name="DC.Identifier.DOI" content="10.24840/2183-6493_0010-003_002464"/> <meta name="DC.Identifier.URI" content="https://journalengineering.fe.up.pt/index.php/upjeng/article/view/2183-6493_0010-003_002464"/> <meta name="DC.Language" scheme="ISO639-1" content="en"/> <meta name="DC.Rights" content="Copyright (c) 2024 Mohammad Hossein Zolfagharnasab, Siavash Damari"/> <meta name="DC.Rights" content="https://creativecommons.org/licenses/by/4.0"/> <meta name="DC.Source" content="U.Porto Journal of Engineering"/> <meta name="DC.Source.ISSN" content="2183-6493"/> <meta name="DC.Source.Issue" content="3"/> <meta name="DC.Source.Volume" content="10"/> <meta name="DC.Source.URI" content="https://journalengineering.fe.up.pt/index.php/upjeng"/> <meta name="DC.Subject" xml:lang="en" content="Real-time Content Classification"/> <meta name="DC.Subject" xml:lang="en" content="News Categorization"/> <meta name="DC.Subject" xml:lang="en" content=" Natural Language Processing"/> <meta name="DC.Subject" xml:lang="en" content="Machine Learning"/> <meta name="DC.Title" content="A Comparative Analysis of Machine Learning Models in News Categorization"/> <meta name="DC.Type" content="Text.Serial.Journal"/> <meta name="DC.Type.articleType" content="PEC202324_01"/> <meta name="gs_meta_revision" content="1.1"/> <meta name="citation_journal_title" content="U.Porto Journal of Engineering"/> <meta name="citation_journal_abbrev" content="UPjeng"/> <meta name="citation_issn" content="2183-6493"/> <meta name="citation_author" content="Mohammad Hossein Zolfagharnasab"/> <meta name="citation_author_institution" content="University of Porto, Faculty of Engineering. "/> <meta name="citation_author" content="Siavash Damari"/> <meta name="citation_author_institution" content="University of Allameh Tabataba'i, Department of Statistics, Mathematics, and Computer Science."/> <meta name="citation_title" content="A Comparative Analysis of Machine Learning Models in News Categorization"/> <meta name="citation_language" content="en"/> <meta name="citation_date" content="2024/07/10"/> <meta name="citation_volume" content="10"/> <meta name="citation_issue" content="3"/> <meta name="citation_firstpage" content="23"/> <meta name="citation_lastpage" content="38"/> <meta name="citation_doi" content="10.24840/2183-6493_0010-003_002464"/> <meta name="citation_abstract_html_url" content="https://journalengineering.fe.up.pt/index.php/upjeng/article/view/2183-6493_0010-003_002464"/> <meta name="citation_abstract" xml:lang="en" content="The constant stream of news nowadays highlights the necessity for meticulous assessment to ensure that the information accurately reaches its intended audience with the least amount of delay least delay. Despite the flexibility and efficiency of Deep Learning (DL) models, their intricate training and substantial resource demands pose significant challenges for their deployment in real-time applications. In this regard, this study evaluates the performance of resource-efficient Machine Learning (ML) techniques – Multinomial Naive Bayes (MNB), Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR) – in categorizing news. Based on the results, all the evaluated models attain a commendable level of accuracy in news categorization. Notably, the SVM excels, achieving an accuracy rate of 98% and a mean squared error of 0.28. This performance exemplifies the robust effectiveness of classical ML models in the categorization of news, particularly when enhanced by a suitably tailored preprocessing pipeline."/> <meta name="citation_keywords" xml:lang="en" content="Real-time Content Classification"/> <meta name="citation_keywords" xml:lang="en" content="News Categorization"/> <meta name="citation_keywords" xml:lang="en" content=" Natural Language Processing"/> <meta name="citation_keywords" xml:lang="en" content="Machine Learning"/> <meta name="citation_pdf_url" content="https://journalengineering.fe.up.pt/index.php/upjeng/article/download/2183-6493_0010-003_002464/2183-6493_0010-003_002464"/> <link rel="alternate" type="application/atom+xml" href="https://journalengineering.fe.up.pt/index.php/upjeng/gateway/plugin/APP%5Cplugins%5Cgeneric%5CwebFeed%5CWebFeedGatewayPlugin/atom"> <link rel="alternate" type="application/rdf+xml" href="https://journalengineering.fe.up.pt/index.php/upjeng/gateway/plugin/APP%5Cplugins%5Cgeneric%5CwebFeed%5CWebFeedGatewayPlugin/rss"> <link rel="alternate" type="application/rss+xml" href="https://journalengineering.fe.up.pt/index.php/upjeng/gateway/plugin/APP%5Cplugins%5Cgeneric%5CwebFeed%5CWebFeedGatewayPlugin/rss2"> <link rel="stylesheet" href="https://journalengineering.fe.up.pt/index.php/upjeng/$$$call$$$/page/page/css?name=bootstrap" type="text/css" /> </head> <body class="pkp_page_article pkp_op_view has_site_logo"> <div class="pkp_structure_page"> <nav id="accessibility-nav" class="sr-only" role="navigation" aria-label="Quick jump to page content"> <ul> <li><a href="#main-navigation">Main Navigation</a></li> <li><a href="#main-content">Main Content</a></li> <li><a href="#sidebar">Sidebar</a></li> </ul> </nav> <header class="navbar navbar-default" id="headerNavigationContainer" role="banner"> <div class="container-fluid"> <div class="row"> <nav aria-label="User Navigation"> <ul id="navigationUser" class="nav nav-pills tab-list pull-right"> <li class=" menu-item-1"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/user/register"> Register </a> </li> <li class=" menu-item-2"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/login"> Login </a> </li> </ul> </nav> </div><!-- .row --> </div><!-- .container-fluid --> <div class="container-fluid"> <div class="navbar-header"> <button type="button" class="navbar-toggle collapsed" data-toggle="collapse" data-target="#nav-menu" aria-expanded="false" aria-controls="nav-menu"> <span class="sr-only">Toggle navigation</span> <span class="icon-bar"></span> <span class="icon-bar"></span> <span class="icon-bar"></span> </button> <div class="site-name"> <a href=" https://journalengineering.fe.up.pt/index.php/upjeng/index " class="navbar-brand navbar-brand-logo"> <img src="https://journalengineering.fe.up.pt/public/journals/3/pageHeaderLogoImage_en_US.png" alt="U. Porto Journal of Engineering"> </a> </div> </div> <nav id="nav-menu" class="navbar-collapse collapse" aria-label="Site Navigation"> <ul id="main-navigation" class="nav navbar-nav"> <li class=" menu-item-8"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/issue/current"> Current </a> </li> <li class=" menu-item-9"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/issue/archive"> Archives </a> </li> <li class=" menu-item-11"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/about"> About </a> </li> <li class=" menu-item-14"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/about/editorialTeam"> Editorial Team </a> </li> <li class=" menu-item-13"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/about/submissions"> Submissions </a> </li> <li class=" menu-item-53"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/openaccess"> Open Access Policy </a> </li> <li class=" menu-item-54"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/peerreview"> Peer Review </a> </li> <li class=" menu-item-55"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/ethics"> Publication Ethics </a> </li> <li class=" menu-item-15"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/about/contact"> Contact </a> </li> </ul> <div class="pull-md-right"> <form class="navbar-form navbar-left" role="search" method="post" action="https://journalengineering.fe.up.pt/index.php/upjeng/search/search"> <div class="form-group"> <input class="form-control" name="query" value="" type="search" aria-label="Search Query" placeholder=""> </div> <button type="submit" class="btn btn-default">Search</button> </form> </div> </nav> </div><!-- .pkp_head_wrapper --> </header><!-- .pkp_structure_head --> <div class="pkp_structure_content container"> <main class="pkp_structure_main col-xs-12 col-sm-10 col-md-12" role="main"> <div class="page page_article"> <nav class="cmp_breadcrumbs" role="navigation" aria-label="You are here:"> <ol class="breadcrumb"> <li> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/index"> Home </a> </li> <li> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/issue/archive"> Archives </a> </li> <li> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/issue/view/2183-6493_0010-003"> Vol. 10 No. 3 (2024) </a> </li> <li class="active"> PEC202324_01 </li> </ol> </nav> <article class="article-details"> <header> <h1 class="page-header"> A Comparative Analysis of Machine Learning Models in News Categorization </h1> </header> <div class="row"> <section class="article-sidebar col-md-4"> <h2 class="sr-only">Article Sidebar</h2> <div class="cover-image"> <img class="img-responsive" src="https://journalengineering.fe.up.pt/public/journals/3/submission_2464_2638_coverImage_en.jpg" alt="" > </div> <div class="download"> <a class="galley-link btn btn-primary pdf" role="button" href="https://journalengineering.fe.up.pt/index.php/upjeng/article/view/2183-6493_0010-003_002464/2183-6493_0010-003_002464"> PDF </a> </div> <div class="list-group"> <div class="list-group-item date-published"> <strong>Published:</strong> Jul 10, 2024 </div> <div class="list-group-item doi"> <strong>DOI:</strong> <a href="https://doi.org/10.24840/2183-6493_0010-003_002464"> https://doi.org/10.24840/2183-6493_0010-003_002464 </a> </div> <div class="list-group-item issue"> <strong> Issue: </strong> <span class=""> <a class="title" href="https://journalengineering.fe.up.pt/index.php/upjeng/issue/view/2183-6493_0010-003"> Vol. 10 No. 3 (2024) </a> </span> </div> <div class="list-group-item keywords"> <strong> Keywords:</strong> <div class=""> <span class="value"> Real-time Content Classification, News Categorization, Natural Language Processing, Machine Learning </span> </div> </div> </div> </section><!-- .article-sidebar --> <div class="col-md-8"> <section class="article-main"> <h2 class="sr-only">Main Article Content</h2> <div class="authors"> <div class="author"> <strong>Mohammad Hossein Zolfagharnasab</strong> <div class="article-author-affilitation"> University of Porto, Faculty of Engineering. </div> <div class="orcid"> <a href="https://orcid.org/0000-0001-6124-7507" target="_blank"> https://orcid.org/0000-0001-6124-7507 </a> </div> </div> <div class="author"> <strong>Siavash Damari</strong> <div class="article-author-affilitation"> University of Allameh Tabataba'i, Department of Statistics, Mathematics, and Computer Science. </div> <div class="orcid"> <a href="https://orcid.org/0009-0002-8486-9549" target="_blank"> https://orcid.org/0009-0002-8486-9549 </a> </div> </div> </div> <div class="article-summary" id="summary"> <h2>Abstract</h2> <div class="article-abstract"> <p>The constant stream of news nowadays highlights the necessity for meticulous assessment to ensure that the information accurately reaches its intended audience with the least amount of delay least delay. Despite the flexibility and efficiency of Deep Learning (DL) models, their intricate training and substantial resource demands pose significant challenges for their deployment in real-time applications. In this regard, this study evaluates the performance of resource-efficient Machine Learning (ML) techniques – <a></a>Multinomial Naive Bayes (MNB), Random Forest (RF), Support Vector Machine (SVM), and Logistic Regression (LR) – in categorizing news. Based on the results, all the evaluated models attain a commendable level of accuracy in news categorization. Notably, the SVM excels, achieving an accuracy rate of 98% and a mean squared error of 0.28. This performance exemplifies the robust effectiveness of classical ML models in the categorization of news, particularly when enhanced by a suitably tailored preprocessing pipeline.</p> </div> </div> <section class="item downloads_chart"> <h2 class="label"> Downloads </h2> <div class="value"> <canvas class="usageStatsGraph" data-object-type="Submission" data-object-id="2464"></canvas> <div class="usageStatsUnavailable" data-object-type="Submission" data-object-id="2464"> Download data is not yet available. </div> </div> </section> </section><!-- .article-main --> <section class="article-more-details"> <h2 class="sr-only">Article Details</h2> <!--<div class="panel panel-default issue"> <div class="panel-heading"> Issue </div> <div class="panel-body"> <a class="title" href="https://journalengineering.fe.up.pt/index.php/upjeng/issue/view/2183-6493_0010-003"> Vol. 10 No. 3 (2024) </a> </div> </div>--> <!-- <div class="panel panel-default section"> <div class="panel-heading"> Section </div> <div class="panel-body"> PEC202324_01 </div> </div> --> <div class="panel panel-default copyright"> <div class="panel-body"> <a rel="license" href="https://creativecommons.org/licenses/by/4.0/"><img alt="Creative Commons License" src="//i.creativecommons.org/l/by/4.0/88x31.png" /></a><p>This work is licensed under a <a rel="license" href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</a>.</p> <p>Authors who publish with this journal agree to the following terms:</p> <ol type="a"> <li>Authors retain copyright and grant the journal right of first publication with the work simultaneously licensed under a <a href="http://creativecommons.org/licenses/by/4.0/" target="_blank">Creative Commons Attribution License</a> that allows others to share the work with an acknowledgement of the work's authorship and initial publication in this journal.</li> <li>Authors grant the journal the rights to provide the article in all forms and media so the article can be used on the latest technology even after publication and ensure its long-term preservation.</li> <li>Authors are able to enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal's published version of the work (e.g., post it to an institutional repository or publish it in a book), with an acknowledgement of its initial publication in this journal.</li> <li>Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their website) prior to and during the submission process, as it can lead to productive exchanges, as well as earlier and greater citation of published work (See <a href="http://opcit.eprints.org/oacitation-biblio.html" target="_blank">The Effect of Open Access</a>).</li> </ol> </div> </div> <div class="panel panel-default author-bios"> <div class="panel-heading"> Author Biographies </div> <div class="panel-body"> <div class="media biography"> <div class="media-body"> <h3 class="media-heading biography-author"> Mohammad Hossein Zolfagharnasab, <span class="affiliation">University of Porto, Faculty of Engineering. </span> </h3> <p>PhD student, Department of Electrical and Computer Engineering,</p> <p>Faculty of Engineering, University of Porto,</p> <p>Rua Dr. Roberto Frias, 4200-465 PORTO, Portugal</p> <p> </p> </div> </div> <div class="media biography"> <div class="media-body"> <h3 class="media-heading biography-author"> Siavash Damari, <span class="affiliation">University of Allameh Tabataba'i, Department of Statistics, Mathematics, and Computer Science.</span> </h3> <p>Master student,</p> <p>Department of Statistics, Mathematics, and Computer Science,</p> <p>University of Allameh Tabataba'i,</p> <p>Western Azadi Stadium Blvd, Tehran, Iran</p> </div> </div> </div> </div> </section><!-- .article-details --> </div><!-- .col-md-8 --> </div><!-- .row --> </article> </div><!-- .page --> </main> </div><!-- pkp_structure_content --> <footer class="footer" role="contentinfo"> <div class="container"> <div class="row"> <div class="col-md-10"> <p><small><a href="https://www.fe.up.pt" target="_blank" rel="noopener">Faculdade de Engenharia da Universidade do Porto</a><br /><a href="https://biblioteca.fe.up.pt" target="_blank" rel="noopener">Serviços de Documentação e Informação: Biblioteca</a><br />Rua Dr. Roberto Frias<br />4200-465 PORTO<br /><a title="Biblioteca da FEUP" href="mailto:biblioteca@fe.up.pt">biblioteca@fe.up.pt</a> |<a href="tel:+351220413805"> +351 220413805</a></small></p> <p>ISSN <a title="ROAD information" href="https://portal.issn.org/resource/ISSN/2183-6493" target="_blank" rel="noopener">2183-6493</a><br />DOI <a href="https://doi.org/10.24840/2183-6493">10.24840/2183-6493</a></p> </div> <div class="col-md-2" role="complementary"> <a href="https://journalengineering.fe.up.pt/index.php/upjeng/about/aboutThisPublishingSystem"> <img class="img-responsive" alt="More information about the publishing system, Platform and Workflow by OJS/PKP." src="https://journalengineering.fe.up.pt/templates/images/ojs_brand.png"> </a> </div> </div> <!-- .row --> </div><!-- .container --> </footer> </div><!-- pkp_structure_page --> <script src="https://journalengineering.fe.up.pt/lib/pkp/lib/vendor/components/jquery/jquery.min.js?v=3.4.0.3" type="text/javascript"></script><script src="https://journalengineering.fe.up.pt/lib/pkp/lib/vendor/components/jqueryui/jquery-ui.min.js?v=3.4.0.3" type="text/javascript"></script><script src="https://journalengineering.fe.up.pt/lib/pkp/js/lib/jquery/plugins/jquery.tag-it.js?v=3.4.0.3" type="text/javascript"></script><script src="https://journalengineering.fe.up.pt/plugins/themes/bootstrap3/bootstrap/js/bootstrap.min.js?v=3.4.0.3" type="text/javascript"></script><script src="https://journalengineering.fe.up.pt/plugins/themes/bootstrap3/bootstrap/js/cookie.js?v=3.4.0.3" type="text/javascript"></script><script type="text/javascript">var pkpUsageStats = pkpUsageStats || {};pkpUsageStats.data = pkpUsageStats.data || {};pkpUsageStats.data.Submission = pkpUsageStats.data.Submission || {};pkpUsageStats.data.Submission[2464] = {"data":{"2024":{"7":"54","8":"38","9":"35","10":"21","11":"3"}},"label":"All Downloads","color":"79,181,217","total":151};</script><script src="https://journalengineering.fe.up.pt/lib/pkp/js/lib/Chart.min.js?v=3.4.0.3" type="text/javascript"></script><script type="text/javascript">var pkpUsageStats = pkpUsageStats || {};pkpUsageStats.locale = pkpUsageStats.locale || {};pkpUsageStats.locale.months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"];pkpUsageStats.config = pkpUsageStats.config || {};pkpUsageStats.config.chartType = "line";</script><script src="https://journalengineering.fe.up.pt/lib/pkp/js/usage-stats-chart.js?v=3.4.0.3" type="text/javascript"></script><script type="text/javascript"> (function (w, d, s, l, i) { w[l] = w[l] || []; var f = d.getElementsByTagName(s)[0], j = d.createElement(s), dl = l != 'dataLayer' ? '&l=' + l : ''; j.async = true; j.src = 'https://www.googletagmanager.com/gtag/js?id=' + i + dl; f.parentNode.insertBefore(j, f); function gtag(){dataLayer.push(arguments)}; gtag('js', new Date()); gtag('config', i); }) (window, document, 'script', 'dataLayer', 'UA-26693817-4'); </script> <!-- cookies alert --> <div id="cookie_directive_container" class="container" style="display: none"> <nav class="navbar navbar-default navbar-fixed-bottom"> <div class="container"> <div class="navbar-inner navbar-content-center" id="cookie_accept"> <a href="#" class="btn btn-default pull-right">Close</a> <p class="text-muted credit"> By using our website you are consenting to our use of cookies in accordance with our <a href="/index.php/index/privacy" target="_blank">cookies policy</a>. </p> </div> </div> </nav> </div> </body> </html>