{"title":"ISC\u2013Intelligent Subspace Clustering, A Density Based Clustering Approach for High Dimensional Dataset","authors":"Sunita Jahirabadkar, Parag Kulkarni","volume":31,"journal":"International Journal of Computer and Information Engineering","pagesStart":1682,"pagesEnd":1687,"ISSN":"1307-6892","URL":"https:\/\/publications.waset.org\/pdf\/13091","abstract":"<p>Many real-world data sets have a very high dimensional feature space. Most clustering techniques use the distance or similarity between objects as the measure for building clusters. In high dimensional spaces, however, distances between points become relatively uniform, so density based approaches may give better results. Subspace clustering algorithms automatically identify the lower dimensional subspaces of the higher dimensional feature space in which clusters exist. In this paper, we propose a new clustering algorithm, ISC &ndash; Intelligent Subspace Clustering, which tries to overcome three major limitations of the existing state-of-the-art techniques. First, ISC determines input parameters such as the \u03b5-distance at each level of subspace clustering, which helps in finding meaningful clusters. Second, because uniform parameters are not suitable for different kinds of databases, ISC determines meaningful clustering parameters dynamically and adaptively, using a hierarchical filtering approach. Third, and most important, ISC supports incremental learning and the dynamic inclusion and exclusion of subspaces, which leads to better cluster formation.<\/p>\r\n","references":"[1] Michael Steinbach, Levent Ert\u00f6z, and Vipin Kumar, \"The Challenges of Clustering High Dimensional Data\", (online). Available: http:\/\/wwwusers.cs.umn.edu\/~kumar\/papers\/high_dim_clustering_19.pdf\r\n[2] R. Sibson, \"SLINK: An optimally efficient algorithm for the single-link cluster method\", The Computer Journal, 16(1):30-34, 1973.\r\n[3] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, \"A density-based algorithm for discovering clusters in large spatial databases with noise\", In Proceedings of the 2nd ACM International Conference on Knowledge Discovery and Data Mining (KDD), Portland, OR, 1996.\r\n[4] J. Han and M. Kamber, \"Data Mining: Concepts and Techniques\", Morgan Kaufmann, 2001.\r\n[5] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, \"Automatic subspace clustering of high dimensional data for data mining applications\", In Proceedings of the SIGMOD Conference, Seattle, WA, 1998.\r\n[6] C. H. Cheng, A. W.-C. Fu, and Y. Zhang, \"Entropy-based subspace clustering for mining numerical data\", In Proceedings of the 5th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), San Diego, CA, pages 84-93, 1999.\r\n[7] S. Goil, H. Nagesh, and A. Choudhary, \"MAFIA: Efficient and scalable subspace clustering for very large data sets\", Technical Report CPDCTR-9906-010, Northwestern University, 1999.\r\n[8] K. Kailing, H.P. Kriegel, and P. Kroger, \"Density-connected subspace clustering for high-dimensional data\", In Proceedings of the 4th SIAM International Conference on Data Mining (SDM), Orlando, FL, 2004.\r\n[9] H.P. Kriegel, P. Kroger, M. Renz, and S. Wurst, \"A generic framework for efficient subspace clustering of high-dimensional data\", In Proceedings of the 5th International Conference on Data Mining (ICDM), Houston, TX, 2005.\r\n[10] C. M. Procopiuc, M. Jones, P. K. Agarwal, and T. M. Murali, \"A Monte Carlo algorithm for fast projective clustering\", In Proceedings of the SIGMOD Conference, Madison, WI, 2002.\r\n[11] C. Bohm, K. Kailing, H.P. Kriegel, and P. Kroger, \"Density connected clustering with local subspace preferences\", In Proceedings of the 4th International Conference on Data Mining (ICDM), Brighton, U.K., 2004.\r\n[12] C. Baumgartner, C. Plant, K. Kailing, H.-P. Kriegel, and P. Kroger, \"Subspace Selection for Clustering High-Dimensional Data\", In Proceedings of the 4th IEEE International Conference on Data Mining (ICDM 04), pp. 11-18, Brighton, UK, 2004.\r\n[13] Daxin Jiang, Chun Tang, and Aidong Zhang, \"Cluster Analysis for Gene Expression Data: A Survey\", IEEE Transactions on Knowledge and Data Engineering, November 2004, pp. 1370-1386.\r\n[14] Elke Achtert, Christian Bohm, Hans-Peter Kriegel, Peer Kroger, Ina Muller-Gorman, and Arthur Zimek, \"Finding Hierarchies of Subspace Clusters\", In Proceedings of the 10th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Berlin, Germany, 2006.","publisher":"World Academy of Science, Engineering and Technology","index":"Open Science Index 31, 2009"}