Clickstream Project

About

Online services increasingly depend on user participation. For online social networks and crowdsourcing services alike, understanding user behavior is important yet challenging. In this project, we build an unsupervised system that captures dominant user behaviors from clickstream data (traces of users’ click events) and visualizes the detected behaviors intuitively. The system identifies "clusters" of similar users by partitioning a similarity graph, in which nodes are users and edges are weighted by the similarity of their clickstreams. The partitioning process uses iterative feature pruning to capture the natural hierarchy within user clusters and to produce intuitive features for visualizing and understanding the captured behaviors.
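To make the similarity graph concrete, here is a minimal Python sketch that turns per-user pattern counts (the same form as the input file described under Code) into weighted edges. It assumes cosine similarity between pattern-count vectors as a stand-in; the metric used in the paper may differ, and cosine_similarity, build_similarity_graph, and the threshold parameter are hypothetical names.

```python
import math
from itertools import combinations

# A user profile maps an action pattern (e.g. "A", "G") to its occurrence count.
Profile = dict[str, int]


def cosine_similarity(a: Profile, b: Profile) -> float:
    """Cosine similarity of two sparse pattern-count vectors (0.0 if either is empty).

    Assumption: cosine similarity stands in for the project's own metric.
    """
    dot = sum(count * b.get(pattern, 0) for pattern, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def build_similarity_graph(
    users: dict[str, Profile], threshold: float = 0.0
) -> dict[tuple[str, str], float]:
    """Nodes are users; keep an edge (u, v) only if its similarity exceeds `threshold`."""
    edges: dict[tuple[str, str], float] = {}
    for u, v in combinations(sorted(users), 2):
        weight = cosine_similarity(users[u], users[v])
        if weight > threshold:
            edges[(u, v)] = weight
    return edges
```

For example, with users = {"u1": {"A": 1, "G": 10}, "u2": {"A": 2, "G": 9}, "u3": {"B": 5}}, only the edge ("u1", "u2") survives, with a weight of about 0.99, because u3 shares no patterns with the other two users.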

This demo presents the clustering results on large-scale clickstream traces from Whisper, an anonymous social network. Our system effectively identifies previously unknown behaviors, such as dormant users and hostile chatters. In addition, we have successfully applied clickstream-based behavior models to detect new attacks in real-world online social networks, including Renren and LinkedIn.

Publications

Gang Wang, Xinyi Zhang, Shiliang Tang, Haitao Zheng, Ben Y. Zhao. Unsupervised Clickstream Clustering for User Behavior Analysis. Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI), San Jose, CA, May 2016.
Gang Wang, Tristan Konolige, Christo Wilson, Xiao Wang, Haitao Zheng, Ben Y. Zhao. You are How You Click: Clickstream Analysis for Sybil Detection. Proceedings of the 22nd USENIX Security Symposium (USENIX Security), Washington, DC, August 2013.

Code

The project source code is available for download as a zip file. It contains a set of scripts that perform recursive hierarchical clustering on clickstream data and generate clusters of user behaviors.
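The recursive idea can be illustrated with a deliberately simple Python sketch: split a group of users in two, then recurse into each half until the halves stop being clearly more cohesive than the whole. This is not the project's algorithm. The released scripts partition a similarity graph and apply iterative feature pruning; here a seed-based bisection stands in for graph partitioning, feature pruning is omitted, and split_in_two, recursive_cluster, leaves, min_size, min_gap, and max_depth are hypothetical names and parameters.

```python
from typing import Callable, Union

# Similarity between two user ids, e.g. cosine similarity of their pattern counts.
Similarity = Callable[[str, str], float]
# A tree is a leaf cluster (list of user ids) or a (left, right) pair of subtrees.
Tree = Union[list, tuple]


def _mean(values: list[float]) -> float:
    return sum(values) / len(values) if values else 0.0


def _pairs_within(group: list[str], sim: Similarity) -> list[float]:
    return [sim(u, v) for i, u in enumerate(group) for v in group[i + 1:]]


def split_in_two(nodes: list[str], sim: Similarity) -> tuple[list[str], list[str]]:
    """Seed-based bisection: the least-similar pair of nodes seeds two groups."""
    seed_a, seed_b = min(
        ((u, v) for i, u in enumerate(nodes) for v in nodes[i + 1:]),
        key=lambda pair: sim(pair[0], pair[1]),
    )
    left, right = [seed_a], [seed_b]
    for node in nodes:
        if node in (seed_a, seed_b):
            continue
        # Each remaining node joins the seed it is more similar to (ties go left).
        if sim(node, seed_a) >= sim(node, seed_b):
            left.append(node)
        else:
            right.append(node)
    return left, right


def recursive_cluster(
    nodes: list[str],
    sim: Similarity,
    min_size: int = 2,
    min_gap: float = 0.1,
    depth: int = 0,
    max_depth: int = 5,
) -> Tree:
    """Recursively bisect `nodes`, stopping when a split is too small or too weak."""
    if min_size < 1:
        raise ValueError("min_size must be at least 1")
    if len(nodes) < 2 * min_size or depth >= max_depth:
        return nodes
    left, right = split_in_two(nodes, sim)
    if len(left) < min_size or len(right) < min_size:
        return nodes
    # Hypothetical stopping rule: accept the split only if within-group similarity
    # exceeds cross-group similarity by at least `min_gap`.
    within = _mean(_pairs_within(left, sim) + _pairs_within(right, sim))
    across = _mean([sim(u, v) for u in left for v in right])
    if within - across < min_gap:
        return nodes
    return (
        recursive_cluster(left, sim, min_size, min_gap, depth + 1, max_depth),
        recursive_cluster(right, sim, min_size, min_gap, depth + 1, max_depth),
    )


def leaves(tree: Tree) -> list[list[str]]:
    """Flatten a cluster tree into its leaf clusters, left to right."""
    if isinstance(tree, tuple):
        return leaves(tree[0]) + leaves(tree[1])
    return [tree]
```

With two well-separated groups of three users, the sketch returns the two groups as leaf clusters; with uniform similarity it declines to split at all. The returned tree keeps the hierarchy, while leaves gives the final flat clustering.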

For details about the input/output format and system configuration, please refer to the documentation. The algorithm itself is described in detail in our CHI 2016 paper listed above.

A quick example follows:

$> python recursiveHierarchicalClustering.py input.txt output/

input.txt: the input file describing user clickstreams. Each line represents one user and her clickstream patterns, in the form
user_id \t A(1)G(10)
where \t is a tab character, A and G are action patterns, and 1 and 10 are the number of times each respective pattern appears in that user's clickstream.
output/: the directory for temporary files and the final clustering results. The clustering results are written to output/result.json.
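One way to produce and read lines in this format is sketched below in Python. It assumes that each click event is encoded as a single-letter action code, that an action pattern is a contiguous run of k events (a k-gram), that the two fields are separated by a single tab with no surrounding spaces, and that pattern names contain no parentheses. The project's own pattern definition may differ, and count_patterns, format_line, and parse_line are hypothetical helpers, not part of the released scripts.

```python
import re
from collections import Counter

# Assumption: pattern names never contain "(" or ")"; counts are decimal integers.
PATTERN_RE = re.compile(r"([^()]+)\((\d+)\)")


def count_patterns(events: list[str], k: int) -> dict[str, int]:
    """Count every contiguous k-gram of single-letter click events (hypothetical pattern definition)."""
    if k < 1:
        raise ValueError("k must be at least 1")
    grams = ("".join(events[i:i + k]) for i in range(len(events) - k + 1))
    return dict(Counter(grams))


def format_line(user_id: str, counts: dict[str, int]) -> str:
    """Render one input line, e.g. 'u42<TAB>A(1)G(10)', with patterns sorted by name."""
    return user_id + "\t" + "".join(f"{p}({c})" for p, c in sorted(counts.items()))


def parse_line(line: str) -> tuple[str, dict[str, int]]:
    """Parse 'user_id<TAB>A(1)G(10)' into ('user_id', {'A': 1, 'G': 10})."""
    user_id, sep, patterns = line.rstrip("\r\n").partition("\t")
    if not sep:
        raise ValueError(f"missing tab separator: {line!r}")
    counts: dict[str, int] = {}
    consumed = 0
    for match in PATTERN_RE.finditer(patterns):
        # Reject characters that are not part of a well-formed NAME(COUNT) pattern.
        if match.start() != consumed:
            raise ValueError(f"malformed patterns: {patterns!r}")
        name, count = match.group(1), int(match.group(2))
        counts[name] = counts.get(name, 0) + count
        consumed = match.end()
    if consumed != len(patterns):
        raise ValueError(f"malformed patterns: {patterns!r}")
    return user_id, counts
```

For example, format_line("u42", {"A": 1, "G": 10}) yields the sample line shown above (with a real tab), and parse_line turns it back into ("u42", {"A": 1, "G": 10}).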

Contact Us

We are a research team from the Department of Computer Science at the University of Chicago. If you have any questions, please don't hesitate to contact us.

Clickstream Clustering for User Behavior Analysis
SAND Lab @ University of Chicago