<!-- Extraction artifacts (commented out: any non-whitespace text before the
     doctype causes browsers to ignore it and render in quirks mode):
CINXE.COM
Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores
-->
<!DOCTYPE html>
<html lang="en" dir="ltr">
<head>
  <!-- charset must come first in <head> (within the first 1024 bytes) -->
  <meta charset="utf-8">
  <!-- zoom-blocking maximum-scale/user-scalable removed (WCAG 1.4.4 failure) -->
  <meta name="viewport" content="width=device-width, initial-scale=1">

  <!-- Google tag (gtag.js) -->
  <script async src="https://www.googletagmanager.com/gtag/js?id=G-P63WKM1TM1"></script>
  <script>
    window.dataLayer = window.dataLayer || [];
    function gtag(){dataLayer.push(arguments);}
    gtag('js', new Date());
    gtag('config', 'G-P63WKM1TM1');
  </script>

  <!-- Yandex.Metrika counter -->
  <script>
    (function(m,e,t,r,i,k,a){m[i]=m[i]||function(){(m[i].a=m[i].a||[]).push(arguments)};
    m[i].l=1*new Date();
    for (var j = 0; j < document.scripts.length; j++) {if (document.scripts[j].src === r) { return; }}
    k=e.createElement(t),a=e.getElementsByTagName(t)[0],k.async=1,k.src=r,a.parentNode.insertBefore(k,a)})
    (window, document, "script", "https://mc.yandex.ru/metrika/tag.js", "ym");
    ym(55165297, "init", { clickmap:false, trackLinks:true, accurateTrackBounce:true, webvisor:false });
  </script>
  <noscript><div><img src="https://mc.yandex.ru/watch/55165297" style="position:absolute; left:-9999px;" alt=""></div></noscript>
  <!-- /Yandex.Metrika counter -->

  <!-- Matomo -->
  <!-- End Matomo Code -->

  <title>Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores</title>
  <meta name="description" content="Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores">
  <meta name="keywords" content="Retail stores, Faster-RCNN, object localization, ResNet-18, triplet loss, data augmentation, product recognition.">

  <!-- Google Scholar / Highwire citation metadata -->
  <meta name="citation_title" content="Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores">
  <meta name="citation_author" content="Ankit Sinha">
  <meta name="citation_author" content="Soham Banerjee">
  <meta name="citation_author" content="Pratik Chattopadhyay">
  <meta name="citation_publication_date" content="2023/06/14">
  <meta name="citation_journal_title" content="International Journal of Computer and Information Engineering">
  <meta name="citation_volume" content="17">
  <meta name="citation_issue" content="6">
  <meta name="citation_firstpage" content="374">
  <meta name="citation_lastpage" content="381">
  <meta name="citation_pdf_url" content="https://publications.waset.org/10013136/pdf">

  <link href="https://cdn.waset.org/favicon.ico" type="image/x-icon" rel="shortcut icon">
  <link href="https://cdn.waset.org/static/plugins/bootstrap-4.2.1/css/bootstrap.min.css" rel="stylesheet">
  <link href="https://cdn.waset.org/static/plugins/fontawesome/css/all.min.css" rel="stylesheet">
  <link href="https://cdn.waset.org/static/css/site.css?v=150220211555" rel="stylesheet">
</head>
<body>
  <header>
    <div class="container">
      <nav class="navbar navbar-expand-lg navbar-light">
        <a class="navbar-brand" href="https://waset.org">
          <img src="https://cdn.waset.org/static/images/wasetc.png" alt="Open Science Research Excellence" title="Open Science Research Excellence">
        </a>
        <button class="d-block d-lg-none navbar-toggler ml-auto" type="button" data-toggle="collapse" data-target="#navbarMenu" aria-controls="navbarMenu" aria-expanded="false" aria-label="Toggle navigation">
          <span class="navbar-toggler-icon"></span>
        </button>
        <div class="w-100">
          <div class="d-none d-lg-flex flex-row-reverse">
            <form method="get" action="https://waset.org/search" class="form-inline my-2 my-lg-0">
              <input class="form-control mr-sm-2" type="search" placeholder="Search Conferences" value="" name="q" aria-label="Search">
              <button class="btn btn-light my-2 my-sm-0" type="submit"><i class="fas fa-search" aria-hidden="true"></i><span class="sr-only">Search</span></button>
            </form>
          </div>
          <div class="collapse navbar-collapse mt-1" id="navbarMenu">
            <ul class="navbar-nav ml-auto align-items-center" id="mainNavMenu">
              <li class="nav-item">
                <a class="nav-link" href="https://waset.org/conferences" title="Conferences in 2024/2025/2026">Conferences</a>
              </li>
              <li class="nav-item">
                <a class="nav-link" href="https://waset.org/disciplines" title="Disciplines">Disciplines</a>
              </li>
              <li class="nav-item">
                <a class="nav-link" href="https://waset.org/committees" rel="nofollow">Committees</a>
              </li>
              <li class="nav-item dropdown">
                <a class="nav-link dropdown-toggle" href="#" id="navbarDropdownPublications" role="button" data-toggle="dropdown" aria-haspopup="true" aria-expanded="false">
                  Publications
                </a>
                <div class="dropdown-menu" aria-labelledby="navbarDropdownPublications">
                  <a class="dropdown-item" href="https://publications.waset.org/abstracts">Abstracts</a>
                  <a class="dropdown-item" href="https://publications.waset.org">Periodicals</a>
                  <a class="dropdown-item" href="https://publications.waset.org/archive">Archive</a>
                </div>
              </li>
              <li class="nav-item">
                <a class="nav-link" href="https://waset.org/page/support" title="Support">Support</a>
              </li>
            </ul>
          </div>
        </div>
      </nav>
    </div>
  </header>
  <main>
    <div class="container mt-4">
      <div class="row">
        <div class="col-md-9 mx-auto">
          <form method="get" action="https://publications.waset.org/search">
            <div id="custom-search-input">
              <div class="input-group">
                <i class="fas fa-search" aria-hidden="true"></i>
                <!-- placeholder is not a label; aria-label supplies the accessible name -->
                <input type="text" class="search-query" name="q" placeholder="Author, Title, Abstract, Keywords" value="" aria-label="Search publications by author, title, abstract or keywords">
                <input type="submit" class="btn_search" value="Search">
              </div>
            </div>
          </form>
        </div>
      </div>
      <div class="row mt-3">
        <div class="col-sm-3">
          <div class="card">
            <div class="card-body"><strong>Commenced</strong> in January 2007</div>
          </div>
        </div>
        <div class="col-sm-3">
          <div class="card">
            <div class="card-body"><strong>Frequency:</strong> Monthly</div>
          </div>
        </div>
        <div class="col-sm-3">
          <div class="card">
            <div class="card-body"><strong>Edition:</strong> International</div>
          </div>
        </div>
        <div class="col-sm-3">
          <div class="card">
            <div class="card-body"><strong>Paper Count:</strong> 33093</div>
          </div>
        </div>
      </div>
      <div class="card publication-listing mt-3 mb-3">
        <h5 class="card-header" style="font-size:.9rem">Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores</h5>
        <div class="card-body">
          <p class="card-text"><strong>Authors:</strong>
            <a href="https://publications.waset.org/search?q=Ankit%20Sinha">Ankit Sinha</a>,
            <a href="https://publications.waset.org/search?q=Soham%20Banerjee"> Soham Banerjee</a>,
            <a href="https://publications.waset.org/search?q=Pratik%20Chattopadhyay"> Pratik Chattopadhyay</a>
          </p>
          <p class="card-text"><strong>Abstract:</strong></p>
          <p>Automated product recognition in retail stores is an important real-world application in the domain of Computer Vision and Pattern Recognition. In this paper, we consider the problem of automatically identifying the classes of the products placed on racks in retail stores from an image of the rack and information about the query/product images. We improve upon the existing approaches in terms of effectiveness and memory requirement by developing a two-stage object detection and recognition pipeline comprising of a Faster-RCNN-based object localizer that detects the object regions in the rack image and a ResNet-18-based image encoder that classifies the detected regions into the appropriate classes. Each of the models is fine-tuned using appropriate data sets for better prediction and data augmentation is performed on each query image to prepare an extensive gallery set for fine-tuning the ResNet-18-based product recognition model. This encoder is trained using a triplet loss function following the strategy of online-hard-negative-mining for improved prediction. The proposed models are lightweight and can be connected in an end-to-end manner during deployment to automatically identify each product object placed in a rack image. Extensive experiments using Grozi-32k and GP-180 data sets verify the effectiveness of the proposed model.</p>
          <!-- title required on iframes; deprecated frameborder replaced by CSS border -->
          <iframe src="https://publications.waset.org/10013136.pdf" style="width:100%; height:400px; border:0;" title="Full-text PDF of the paper"></iframe>
          <p class="card-text"><strong>Keywords:</strong>
            <a href="https://publications.waset.org/search?q=Retail%20stores" title="Retail stores">Retail stores</a>,
            <a href="https://publications.waset.org/search?q=Faster-RCNN" title=" Faster-RCNN"> Faster-RCNN</a>,
            <a href="https://publications.waset.org/search?q=object%20localization" title=" object localization"> object localization</a>,
            <a href="https://publications.waset.org/search?q=ResNet-18" title=" ResNet-18"> ResNet-18</a>,
            <a href="https://publications.waset.org/search?q=triplet%20loss" title=" triplet loss"> triplet loss</a>,
            <a href="https://publications.waset.org/search?q=data%20augmentation" title=" data augmentation"> data augmentation</a>,
            <a href="https://publications.waset.org/search?q=product%20recognition." title=" product recognition."> product recognition.</a>
          </p>
          <a href="https://publications.waset.org/10013136/effective-stacking-of-deep-neural-models-for-automated-object-recognition-in-retail-stores" class="btn btn-primary btn-sm">Procedia</a>
          <a href="https://publications.waset.org/10013136/apa" target="_blank" rel="nofollow noopener" class="btn btn-primary btn-sm">APA</a>
          <a href="https://publications.waset.org/10013136/bibtex" target="_blank" rel="nofollow noopener" class="btn btn-primary btn-sm">BibTeX</a>
          <a href="https://publications.waset.org/10013136/chicago" target="_blank" rel="nofollow noopener" class="btn btn-primary btn-sm">Chicago</a>
          <a href="https://publications.waset.org/10013136/endnote" target="_blank" rel="nofollow noopener" class="btn btn-primary btn-sm">EndNote</a>
          <a href="https://publications.waset.org/10013136/harvard" target="_blank" rel="nofollow noopener" class="btn btn-primary btn-sm">Harvard</a>
          <a href="https://publications.waset.org/10013136/json" target="_blank" rel="nofollow noopener" class="btn btn-primary btn-sm">JSON</a>
          <a href="https://publications.waset.org/10013136/mla" target="_blank" rel="nofollow noopener" class="btn btn-primary btn-sm">MLA</a>
          <a href="https://publications.waset.org/10013136/ris" target="_blank" rel="nofollow noopener" class="btn btn-primary btn-sm">RIS</a>
          <a href="https://publications.waset.org/10013136/xml" target="_blank" rel="nofollow noopener" class="btn btn-primary btn-sm">XML</a>
          <a href="https://publications.waset.org/10013136/iso690" target="_blank" rel="nofollow noopener" class="btn btn-primary btn-sm">ISO 690</a>
          <a href="https://publications.waset.org/10013136.pdf" target="_blank" rel="noopener" class="btn btn-primary btn-sm">PDF</a>
          <span class="bg-info text-light px-1 py-1 float-right rounded"> Downloads <span class="badge badge-light">583</span> </span>
          <p class="card-text"><strong>References:</strong></p>
          <!-- semantic list replaces <br>-separated text; list-unstyled avoids
               double numbering since the visible "[n]" markers are in the text -->
          <ol class="list-unstyled">
            <li>[1] Yuchen Wei, Son N. Tran, Shuxiang Xu, Byeong Ho Kang, and Matthew Springer. Deep learning for retail product recognition: Challenges and techniques. Computational Intelligence and Neuroscience, 2020, Article ID: 8875910, 2020.</li>
            <li>[2] Bikash Santra and Dipti Prasad Mukherjee. A comprehensive survey on computer vision based approaches for automatic identification of products in retail store. Image and Vision Computing, 86:45–63, 2019.</li>
            <li>[3] Alessio Tonioni, Eugenio Serra, and Luigi di Stefano. A deep learning pipeline for product recognition on store shelves. In Proceedings of the International Conference on Image Processing, Applications and Systems, pages 25–31, 2018.</li>
            <li>[4] Bikash Santra, Avishek Shaw, and Dipti Prasad Mukherjee. An end-to-end annotation-free machine vision system for detection of products on the rack. Machine Vision and Applications, 32(3):1–13, 2021.</li>
            <li>[5] Bikash Santra, Avishek Shaw, and Dipti Prasad Mukherjee. Graph-based non-maximal suppression for detecting products on the rack. Pattern Recognition Letters, 140:73–80, 2020.</li>
            <li>[6] Bikash Santra, Avishek Shaw, and Dipti Prasad Mukherjee. Part-based annotation-free fine-grained classification of images of retail products. Pattern Recognition, 121:108257, 2022.</li>
            <li>[7] Marian George and Christian Floerkemeier. Recognizing products: A per-exemplar multi-label image classification approach. In Proceedings of the European Conference on Computer Vision, pages 440–455, 2014.</li>
            <li>[8] Jinjun Wang, Jianchao Yang, Kai Yu, Fengjun Lv, Thomas S. Huang, and Yihong Gong. Locality-constrained linear coding for image classification. Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 3360–3367, 2010.</li>
            <li>[9] Wenyon Wang, Yongcheng Cui, Guangshun Li, Chuntao Jiang, and Song Deng. A self-attention-based destruction and construction learning fine-grained image classification method for retail product recognition. Neural Computing and Applications, 32(18):1–10, 2020.</li>
            <li>[10] Anton Osokin, Denis Sumin, and Vasily Lomakin. Os2d: One-stage one-shot object detection by matching anchor features. In Proceedings of the European Conference on Computer Vision, pages 635–652, 2020.</li>
            <li>[11] Anurag Saran, Ehtesham Hassan, and Avinash Kumar Maurya. Robust visual analysis for planogram compliance problem. In Proceedings of the IAPR International Conference on Machine Vision Applications, pages 576–579. IEEE, 2015.</li>
            <li>[12] Archan Ray, Nishant Kumar, Avishek Shaw, and Dipti Prasad Mukherjee. U-pc: Unsupervised planogram compliance. In Proceedings of the European Conference on Computer Vision, pages 586–600, 2018.</li>
            <li>[13] Herbert Bay, Tinne Tuytelaars, and Luc Van Gool. Surf: Speeded up robust features. In Proceedings of the European Conference on Computer Vision, 2006.</li>
            <li>[14] Song Liu, W. Li, Stephen J. Davis, Christian Ritz, and Hongda Tian. Planogram compliance checking based on detection of recurring patterns. IEEE MultiMedia, 23(2):54–63, 2016.</li>
            <li>[15] Alessio Tonioni and Luigi di Stefano. Product recognition in store shelves as a sub-graph isomorphism problem. In Proceedings of the International Conference on Image Analysis and Processing, pages 682–693, 2017.</li>
            <li>[16] Eran Goldman and Jacob Goldberger. Large-scale classification of structured objects using a crf with deep class embedding. arXiv preprint arXiv:1705.07420, 2017.</li>
            <li>[17] Ipek Baz, Erdem Yörük, and Müjdat Çetin. Context-aware hybrid classification system for fine-grained retail product recognition. Proceedings of the Image, Video, and Multidimensional Signal Processing Workshop, pages 1–5, 2016.</li>
            <li>[18] Wei dong Geng, Feilin Han, Jiangke Lin, Liuyi Zhu, Jieming Bai, Suzhen Wang, Lin He, Qiang Xiao, and Zhangjiong Lai. Fine-grained grocery product recognition by one-shot learning. Proceedings of the ACM International Conference on Multimedia, pages 1706–1714, 2018.</li>
            <li>[19] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, abs/1409.1556, 2014.</li>
            <li>[20] Stefan Leutenegger, Margarita Chli, and Roland Y. Siegwart. Brisk: Binary robust invariant scalable keypoints. Proceedings of the International Conference on Computer Vision, pages 2548–2555, 2011.</li>
            <li>[21] Joseph Redmon and Ali Farhadi. Yolo9000: Better, faster, stronger. arXiv preprint arXiv:1612.08242, 2016.</li>
            <li>[22] Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the International Conference on Computer Vision and Pattern Recognition, pages 815–823, 2015.</li>
            <li>[23] Giorgos Tolias, Ronan Sicre, and Hervé Jégou. Particular object retrieval with integral max-pooling of cnn activations. arXiv preprint arXiv:1511.05879, 2015.</li>
            <li>[24] Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, volume 28, 2015.</li>
            <li>[25] Joseph Redmon, Santosh Kumar Divvala, Ross B. Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. Proceedings of the International Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.</li>
            <li>[26] Joseph Redmon and Ali Farhadi. Yolov3: An incremental improvement. arXiv preprint arXiv:1804.02767, 2018.</li>
            <li>[27] W. Liu, Dragomir Anguelov, D. Erhan, Christian Szegedy, Scott E. Reed, Cheng-Yang Fu, and Alexander C. Berg. Ssd: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision, pages 21–37, 2016.</li>
            <li>[28] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Identity mappings in deep residual networks. In Proceedings of the European Conference on Computer Vision, pages 630–645. Springer, 2016.</li>
            <li>[29] Tsung-Yi Lin, Piotr Dollár, Ross B. Girshick, Kaiming He, Bharath Hariharan, and Serge J. Belongie. Feature pyramid networks for object detection. Proceedings of the International Conference on Computer Vision and Pattern Recognition, pages 2117–2125, 2017.</li>
            <li>[30] Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael S. Bernstein, Alexander C. Berg, and Li Fei-Fei. Imagenet large scale visual recognition challenge. International Journal of Computer Vision, 115(3):211–252, 2015.</li>
            <li>[31] Tsung-Yi Lin, Michael Maire, Serge J. Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In Proceedings of the European Conference on Computer Vision, 2014.</li>
            <li>[32] Léon Bottou. Large-scale machine learning with stochastic gradient descent. In Proceedings of the International Conference on Computational Statistics, pages 177–186, 2010.</li>
            <li>[33] Diederik P. Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.</li>
            <li>[34] Erdem Yörük, Kaan Taha Oner, and Ceyhun Burak Akgül. An efficient hough transform for multi-instance object recognition and pose estimation. Proceedings of the International Conference on Pattern Recognition, pages 1352–1357, 2016.</li>
          </ol>
        </div>
      </div>
    </div>
  </main>
  <footer>
    <div id="infolinks" class="pt-3 pb-2">
      <div class="container">
        <div style="background-color:#f5f5f5;" class="p-3">
          <div class="row">
            <div class="col-md-2">
              <!-- group labels wrapped in <li>: bare text directly inside <ul> is invalid -->
              <ul class="list-unstyled">
                <li>About</li>
                <li><a href="https://waset.org/page/support">About Us</a></li>
                <li><a href="https://waset.org/page/support#legal-information">Legal</a></li>
                <li><a target="_blank" rel="nofollow noopener" href="https://publications.waset.org/static/files/WASET-16th-foundational-anniversary.pdf">WASET celebrates its 16th foundational anniversary</a></li>
              </ul>
            </div>
            <div class="col-md-2">
              <ul class="list-unstyled">
                <li>Account</li>
                <li><a href="https://waset.org/profile">My Account</a></li>
              </ul>
            </div>
            <div class="col-md-2">
              <ul class="list-unstyled">
                <li>Explore</li>
                <li><a href="https://waset.org/disciplines">Disciplines</a></li>
                <li><a href="https://waset.org/conferences">Conferences</a></li>
                <li><a href="https://waset.org/conference-programs">Conference Program</a></li>
                <li><a href="https://waset.org/committees">Committees</a></li>
                <li><a href="https://publications.waset.org">Publications</a></li>
              </ul>
            </div>
            <div class="col-md-2">
              <ul class="list-unstyled">
                <li>Research</li>
                <li><a href="https://publications.waset.org/abstracts">Abstracts</a></li>
                <li><a href="https://publications.waset.org">Periodicals</a></li>
                <li><a href="https://publications.waset.org/archive">Archive</a></li>
              </ul>
            </div>
            <div class="col-md-2">
              <ul class="list-unstyled">
                <li>Open Science</li>
                <li><a target="_blank" rel="nofollow noopener" href="https://publications.waset.org/static/files/Open-Science-Philosophy.pdf">Open Science Philosophy</a></li>
                <li><a target="_blank" rel="nofollow noopener" href="https://publications.waset.org/static/files/Open-Science-Award.pdf">Open Science Award</a></li>
                <li><a target="_blank" rel="nofollow noopener" href="https://publications.waset.org/static/files/Open-Society-Open-Science-and-Open-Innovation.pdf">Open Innovation</a></li>
                <li><a target="_blank" rel="nofollow noopener" href="https://publications.waset.org/static/files/Postdoctoral-Fellowship-Award.pdf">Postdoctoral Fellowship Award</a></li>
                <li><a target="_blank" rel="nofollow noopener" href="https://publications.waset.org/static/files/Scholarly-Research-Review.pdf">Scholarly Research Review</a></li>
              </ul>
            </div>
            <div class="col-md-2">
              <ul class="list-unstyled">
                <li>Support</li>
                <li><a href="https://waset.org/page/support">Support</a></li>
                <li><a href="https://waset.org/profile/messages/create">Contact Us</a></li>
                <li><a href="https://waset.org/profile/messages/create">Report Abuse</a></li>
              </ul>
            </div>
          </div>
        </div>
      </div>
    </div>
    <div class="container text-center">
      <hr style="margin-top:0;margin-bottom:.3rem;">
      <a href="https://creativecommons.org/licenses/by/4.0/" target="_blank" rel="noopener" class="text-muted small">Creative Commons Attribution 4.0 International License</a>
      <div id="copy" class="mt-2">© 2024 World Academy of Science, Engineering and Technology</div>
    </div>
  </footer>
  <!-- NOTE(review): an in-page action should be a <button>, not <a href="javascript:">;
       kept as an anchor because site.css/site.js may target a#return-to-top — confirm before changing -->
  <a href="javascript:" id="return-to-top" aria-label="Return to top"><i class="fas fa-arrow-up" aria-hidden="true"></i></a>
  <div class="modal" id="modal-template">
    <div class="modal-dialog">
      <div class="modal-content">
        <div class="row m-0 mt-1">
          <div class="col-md-12">
            <button type="button" class="close" data-dismiss="modal" aria-label="Close"><span aria-hidden="true">×</span></button>
          </div>
        </div>
        <div class="modal-body"></div>
      </div>
    </div>
  </div>
  <script src="https://cdn.waset.org/static/plugins/jquery-3.3.1.min.js"></script>
  <script src="https://cdn.waset.org/static/plugins/bootstrap-4.2.1/js/bootstrap.bundle.min.js"></script>
  <script src="https://cdn.waset.org/static/js/site.js?v=150220211556"></script>
  <script>
    jQuery(document).ready(function() {
      // Inject the logged-in user menu; cache disabled so auth state is always fresh.
      jQuery.get({
        url: "https://publications.waset.org/xhr/user-menu",
        cache: false
      }).then(function(response){
        jQuery('#mainNavMenu').append(response);
      });
    });
  </script>
</body>
</html>