{"title":"Urdu Nastaleeq Optical Character Recognition","authors":"Zaheer Ahmad, Jehanzeb Khan Orakzai, Inam Shamsher, Awais Adnan","volume":8,"journal":"International Journal of Computer and Information Engineering","pagesStart":2380,"pagesEnd":2384,"ISSN":"1307-6892","URL":"https:\/\/publications.waset.org\/pdf\/1702","abstract":"This paper discusses the characteristics of Urdu script and the Nastaleeq style, and presents a simple but novel and robust technique for recognizing printed Urdu script without a lexicon. Urdu, which belongs to the Arabic script family, is cursive and complex by nature. The main complexity of compound\/connected Urdu text lies not in its connections but in the forms\/shapes a character takes when it is placed at the beginning, in the middle, or at the end of a word. The character recognition technique presented here uses this inherent complexity of Urdu script to solve the problem. A word is scanned and analyzed for its level of complexity; the point where the level of complexity changes is marked for a character, which is then segmented and fed to neural networks. A prototype of the system has been tested on Urdu text and currently achieves 93.4% accuracy on average.","references":"[1] U. Pal and Anirban Sarkar, \"Recognition of Printed Urdu Script\", Proceedings of the Seventh International Conference on Document Analysis and Recognition (ICDAR 2003).\r\n[2] Raymond G. Gordon, \"Ethnologue: Languages of the World, Fifteenth Edition\", SIL International, 2005.\r\n[3] Khalid Saeed, \"New Approaches for Cursive Languages Recognition: Machine and Hand Written Script and Texts\".\r\n[4] K. Saeed, \"Three-Agent System for Cursive Script Recognition\", Proc. CVPRIP'2000 Computer Vision, Pattern Recognition and Image Processing, 5th Joint Conf. on Information Sciences, JCIS-200, Vol. 2, pp. 244-247, Feb 27-March 3, N. Jersey, 2000.\r\n[5] K. Saeed, R. Niedzielski, \"Experiments on Thinning of Cursive-Style Alphabets\", Inter. Conf. on Information Technologies ITESB-99, June 24-25, Minsk, 1999.\r\n[6] Inam Shamsheer, Zaheer Ahmad, Jehanzeb Khan Orakzai, Awais Adnan, \"OCR For Printed Urdu Script Using Feed Forward Neural Network\", MLPR 2007: International Conference on Machine Learning and Pattern Recognition, 2007.","publisher":"World Academy of Science, Engineering and Technology","index":"Open Science Index 8, 2007"}