{"title":"A Content Vector Model for Text Classification","authors":"Eric Jiang","volume":13,"journal":"International Journal of Computer and Information Engineering","pagesStart":222,"pagesEnd":227,"ISSN":"1307-6892","URL":"https:\/\/publications.waset.org\/pdf\/11975","abstract":"As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies text by its content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification, and experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and na\u00efve Bayes algorithms.","references":"[1] I. Androutsopoulos, G. Paliouras, and E. Michelakis (2004). \"Learning to filter unsolicited commercial e-mail\". Technical Report 2004\/2, NCSR Demokritos.\n[2] N. Cristianini and J. Shawe-Taylor (2000). An Introduction to Support Vector Machines and Other Kernel-based Learning Methods. Cambridge University Press.\n[3] S. Deerwester, S. Dumais, G. Furnas, T. Landauer and R. Harshman (1990). \"Indexing by Latent Semantic Analysis\". Journal of the American Society for Information Science, 41, 391-409.\n[4] K. Gee (2003). \"Using Latent Semantic Indexing to Filter Spam\". Proceedings of the 2003 ACM Symposium on Applied Computing, 460-464.\n[5] G. Golub and C. Van Loan (1996). Matrix Computations. Johns Hopkins, Baltimore, 3rd edition.\n[6] E. Jiang and M. Berry (2000). \"Solving Total Least-Squares Problems in Information Retrieval\". Linear Algebra and its Applications, 316, 137-156.\n[7] T. Mitchell (1997). Machine Learning. McGraw-Hill.\n[8] J. Quinlan (1993). C4.5: Programs for Machine Learning. Morgan Kaufmann.\n[9] J. Rocchio (1971). \"Relevance feedback in information retrieval\". The SMART Retrieval System: Experiments in Automatic Document Processing (G. Salton ed.). Prentice-Hall, 313-323.\n[10] R. Schapire and Y. Singer (2000). \"BoosTexter: a boosting-based system for text categorization\". Machine Learning, 39, 2\/3, 135-168.\n[11] F. Sebastiani (2002). \"Machine learning in automated text categorization\". ACM Computing Surveys, 34, 1, 1-47.\n[12] H. Schutze, D.A. Hull and J.O. Pedersen (1995). \"A Comparison of Classifiers and Document Representations for the Routing Problem\". Proceedings of SIGIR, 1995, 229-237.\n[13] Y. Yang and J. Pedersen (1997). \"A comparative study on feature selection in text categorization\". Proceedings of the 14th International Conference on Machine Learning, 412-420.","publisher":"World Academy of Science, Engineering and Technology","index":"Open Science Index 13, 2008"}