A Similarity Measure for Clustering and its Applications

<?xml version="1.0" encoding="UTF-8"?> <article key="pdf/9316" mdate="2008-05-20 00:00:00"> <author>Guadalupe J. Torres and Ram B. Basnet and Andrew H. Sung and Srinivas Mukkamala and Bernardete M. Ribeiro</author> <title>A Similarity Measure for Clustering and its Applications</title> <pages>1712 - 1718</pages> <year>2008</year> <volume>2</volume> <number>5</number> <journal>International Journal of Computer and Information Engineering</journal> <ee>https://publications.waset.org/pdf/9316</ee> <url>https://publications.waset.org/vol/17</url> <publisher>World Academy of Science, Engineering and Technology</publisher> <abstract>This paper introduces a measure of similarity between two clusterings of the same dataset produced by two different algorithms, or even by the same algorithm (K-means, for instance, usually produces different results with different initializations when clustering the same dataset). We then apply the measure to calculate the similarity between pairs of clusterings, with special interest in comparing various machine clusterings with human clustering of datasets. The similarity measure can thus be used to identify the best clustering algorithm (that is, the one most similar to human clustering) for a specific problem at hand. Experimental results on the text categorization problem for a Portuguese corpus (wherein a translation-into-English approach is used) are presented, as well as results on the well-known benchmark IRIS dataset. The significance and other potential applications of the proposed measure are discussed.</abstract> <index>Open Science Index 17, 2008</index> </article>