TY - JFULL
AU - Renáta Iváncsy and Sándor Juhász
PY - 2007/11/
TI - Analysis of Web User Identification Methods
T2 - International Journal of Computer and Information Engineering
SP - 3048
EP - 3056
VL - 1
SN - 1307-6892
UR - https://publications.waset.org/pdf/386
PU - World Academy of Science, Engineering and Technology
NX - Open Science Index 10, 2007
N2 - Web usage mining has become a popular research area, as a huge amount of data is available online. These data can be used for several purposes, such as web personalization, web structure enhancement, and web navigation prediction. However, the raw log files are not directly usable; they have to be preprocessed to transform them into a format suitable for different data mining tasks. One of the key issues in the preprocessing phase is identifying web users. Identifying users based on web log files is not a straightforward problem, so various methods have been developed. Several difficulties have to be overcome, such as client-side caching and changing or shared IP addresses. This paper presents three different methods for identifying web users. Two of them are the most commonly used methods in web log mining systems, whereas the third one is our novel approach, which uses a complex cookie-based method to identify web users. Furthermore, we also take steps towards identifying the individuals behind the impersonal web users. To demonstrate the efficiency of the new method, we developed an implementation called the Web Activity Tracking (WAT) system, which aims at a more precise distinction of web users based on log data. We present statistical analyses produced by WAT on real data about the behavior of Hungarian web users, and a comprehensive analysis and comparison of the three methods.
ER -