CINXE.COM
A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction
<!DOCTYPE html> <html lang="en" dir="ltr"> <head> <!-- Google tag (gtag.js) --> <script async src="https://www.googletagmanager.com/gtag/js?id=G-P63WKM1TM1"></script> <script> window.dataLayer = window.dataLayer || []; function gtag(){dataLayer.push(arguments);} gtag('js', new Date()); gtag('config', 'G-P63WKM1TM1'); </script> <!-- Yandex.Metrika counter --> <script type="text/javascript" > (function(m,e,t,r,i,k,a){m[i]=m[i]||function(){(m[i].a=m[i].a||[]).push(arguments)}; m[i].l=1*new Date(); for (var j = 0; j < document.scripts.length; j++) {if (document.scripts[j].src === r) { return; }} k=e.createElement(t),a=e.getElementsByTagName(t)[0],k.async=1,k.src=r,a.parentNode.insertBefore(k,a)}) (window, document, "script", "https://mc.yandex.ru/metrika/tag.js", "ym"); ym(55165297, "init", { clickmap:false, trackLinks:true, accurateTrackBounce:true, webvisor:false }); </script> <noscript><div><img src="https://mc.yandex.ru/watch/55165297" style="position:absolute; left:-9999px;" alt="" /></div></noscript> <!-- /Yandex.Metrika counter --> <!-- Matomo --> <!-- End Matomo Code --> <title>A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction</title> <meta name="description" content="A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction"> <meta name="keywords" content="table extraction, optical character recognition, image processing, text extraction, morphological transformation"> <meta name="viewport" content="width=device-width, initial-scale=1, minimum-scale=1, maximum-scale=1, user-scalable=no"> <meta charset="utf-8"> <link href="https://cdn.waset.org/favicon.ico" type="image/x-icon" rel="shortcut icon"> <link href="https://cdn.waset.org/static/plugins/bootstrap-4.2.1/css/bootstrap.min.css" rel="stylesheet"> <link href="https://cdn.waset.org/static/plugins/fontawesome/css/all.min.css" rel="stylesheet"> <link href="https://cdn.waset.org/static/css/site.css?v=150220211555" rel="stylesheet"> </head> <body> <header> <div class="container"> <nav class="navbar navbar-expand-lg navbar-light"> <a class="navbar-brand" href="https://waset.org"> <img src="https://cdn.waset.org/static/images/wasetc.png" alt="Open Science Research Excellence" title="Open Science Research Excellence" /> </a> <button class="d-block d-lg-none navbar-toggler ml-auto" type="button" data-toggle="collapse" data-target="#navbarMenu" aria-controls="navbarMenu" aria-expanded="false" aria-label="Toggle navigation"> <span class="navbar-toggler-icon"></span> </button> <div class="w-100"> <div class="d-none d-lg-flex flex-row-reverse"> <form method="get" action="https://waset.org/search" class="form-inline my-2 my-lg-0"> <input class="form-control mr-sm-2" type="search" placeholder="Search Conferences" value="" name="q" aria-label="Search"> <button class="btn btn-light my-2 my-sm-0" type="submit"><i class="fas fa-search"></i></button> </form> </div> <div class="collapse navbar-collapse mt-1" id="navbarMenu"> <ul class="navbar-nav ml-auto align-items-center" id="mainNavMenu"> <li class="nav-item"> <a class="nav-link" href="https://waset.org/conferences" title="Conferences in 2024/2025/2026">Conferences</a> </li> <li class="nav-item"> <a class="nav-link" href="https://waset.org/disciplines" title="Disciplines">Disciplines</a> </li> <li class="nav-item"> <a class="nav-link" href="https://waset.org/committees" rel="nofollow">Committees</a> </li> <li class="nav-item dropdown"> <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownPublications" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false"> Publications </a> <div class="dropdown-menu" aria-labelledby="navbarDropdownPublications"> <a class="dropdown-item" href="https://publications.waset.org/abstracts">Abstracts</a> <a class="dropdown-item" href="https://publications.waset.org">Periodicals</a> <a class="dropdown-item" href="https://publications.waset.org/archive">Archive</a> </div> </li> <li class="nav-item"> <a class="nav-link" href="https://waset.org/page/support" title="Support">Support</a> </li> </ul> </div> </div> </nav> </div> </header> <main> <div class="container mt-4"> <div class="row"> <div class="col-md-9 mx-auto"> <form method="get" action="https://publications.waset.org/abstracts/search"> <div id="custom-search-input"> <div class="input-group"> <i class="fas fa-search"></i> <input type="text" class="search-query" name="q" placeholder="Author, Title, Abstract, Keywords" value=""> <input type="submit" class="btn_search" value="Search"> </div> </div> </form> </div> </div> <div class="row mt-3"> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Commenced</strong> in January 2007</div> </div> </div> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Frequency:</strong> Monthly</div> </div> </div> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Edition:</strong> International</div> </div> </div> <div class="col-sm-3"> <div class="card"> <div class="card-body"><strong>Paper Count:</strong> 87354</div> </div> </div> </div> <div class="card publication-listing mt-3 mb-3"> <h5 class="card-header" style="font-size:.9rem">A Conglomerate of Multiple Optical Character Recognition Table Detection and Extraction</h5> <div class="card-body"> <p class="card-text"><strong>Authors:</strong> <a href="https://publications.waset.org/abstracts/search?q=Smita%20Pallavi">Smita Pallavi</a>, <a href="https://publications.waset.org/abstracts/search?q=Raj%20Ratn%20Pranesh"> Raj Ratn Pranesh</a>, <a href="https://publications.waset.org/abstracts/search?q=Sumit%20Kumar"> Sumit Kumar</a> </p> <p class="card-text"><strong>Abstract:</strong></p> Information representation as tables is compact and concise method that eases searching, indexing, and storage requirements. Extracting and cloning tables from parsable documents is easier and widely used; however, industry still faces challenges in detecting and extracting tables from OCR (Optical Character Recognition) documents or images. This paper proposes an algorithm that detects and extracts multiple tables from OCR document. The algorithm uses a combination of image processing techniques, text recognition, and procedural coding to identify distinct tables in the same image and map the text to appropriate the corresponding cell in dataframe, which can be stored as comma-separated values, database, excel, and multiple other usable formats. <iframe src="https://publications.waset.org/abstracts/127575.pdf" style="width:100%; height:400px;" frameborder="0"></iframe> <p class="card-text"><strong>Keywords:</strong> <a href="https://publications.waset.org/abstracts/search?q=table%20extraction" title="table extraction">table extraction</a>, <a href="https://publications.waset.org/abstracts/search?q=optical%20character%20recognition" title=" optical character recognition"> optical character recognition</a>, <a href="https://publications.waset.org/abstracts/search?q=image%20processing" title=" image processing"> image processing</a>, <a href="https://publications.waset.org/abstracts/search?q=text%20extraction" title=" text extraction"> text extraction</a>, <a href="https://publications.waset.org/abstracts/search?q=morphological%20transformation" title=" morphological transformation"> morphological transformation</a> </p> <a href="https://publications.waset.org/abstracts/127575/a-conglomerate-of-multiple-optical-character-recognition-table-detection-and-extraction" class="btn btn-primary btn-sm">Procedia</a> <a href="https://publications.waset.org/abstracts/127575.pdf" target="_blank" class="btn btn-primary btn-sm">PDF</a> <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">143</span> </span> </div> </div> </div> </main> <footer> <div id="infolinks" class="pt-3 pb-2"> <div class="container"> <div style="background-color:#f5f5f5;" class="p-3"> <div class="row"> <div class="col-md-2"> <ul class="list-unstyled"> About <li><a href="https://waset.org/page/support">About Us</a></li> <li><a href="https://waset.org/page/support#legal-information">Legal</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/WASET-16th-foundational-anniversary.pdf">WASET celebrates its 16th foundational anniversary</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Account <li><a href="https://waset.org/profile">My Account</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Explore <li><a href="https://waset.org/disciplines">Disciplines</a></li> <li><a href="https://waset.org/conferences">Conferences</a></li> <li><a href="https://waset.org/conference-programs">Conference Program</a></li> <li><a href="https://waset.org/committees">Committees</a></li> <li><a href="https://publications.waset.org">Publications</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Research <li><a href="https://publications.waset.org/abstracts">Abstracts</a></li> <li><a href="https://publications.waset.org">Periodicals</a></li> <li><a href="https://publications.waset.org/archive">Archive</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Open Science <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Open-Science-Philosophy.pdf">Open Science Philosophy</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Open-Science-Award.pdf">Open Science Award</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Open-Society-Open-Science-and-Open-Innovation.pdf">Open Innovation</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Postdoctoral-Fellowship-Award.pdf">Postdoctoral Fellowship Award</a></li> <li><a target="_blank" rel="nofollow" href="https://publications.waset.org/static/files/Scholarly-Research-Review.pdf">Scholarly Research Review</a></li> </ul> </div> <div class="col-md-2"> <ul class="list-unstyled"> Support <li><a href="https://waset.org/page/support">Support</a></li> <li><a href="https://waset.org/profile/messages/create">Contact Us</a></li> <li><a href="https://waset.org/profile/messages/create">Report Abuse</a></li> </ul> </div> </div> </div> </div> </div> <div class="container text-center"> <hr style="margin-top:0;margin-bottom:.3rem;"> <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank" class="text-muted small">Creative Commons Attribution 4.0 International License</a> <div id="copy" class="mt-2">© 2024 World Academy of Science, Engineering and Technology</div> </div> </footer> <a href="javascript:" id="return-to-top"><i class="fas fa-arrow-up"></i></a> <div class="modal" id="modal-template"> <div class="modal-dialog"> <div class="modal-content"> <div class="row m-0 mt-1"> <div class="col-md-12"> <button type="button" class="close" data-dismiss="modal" aria-label="Close"><span aria-hidden="true">×</span></button> </div> </div> <div class="modal-body"></div> </div> </div> </div> <script src="https://cdn.waset.org/static/plugins/jquery-3.3.1.min.js"></script> <script src="https://cdn.waset.org/static/plugins/bootstrap-4.2.1/js/bootstrap.bundle.min.js"></script> <script src="https://cdn.waset.org/static/js/site.js?v=150220211556"></script> <script> jQuery(document).ready(function() { /*jQuery.get("https://publications.waset.org/xhr/user-menu", function (response) { jQuery('#mainNavMenu').append(response); });*/ jQuery.get({ url: "https://publications.waset.org/xhr/user-menu", cache: false }).then(function(response){ jQuery('#mainNavMenu').append(response); }); }); </script> </body> </html>