Online services are increasingly dependent on user participation. Whether in online social networks or crowdsourcing services, understanding user behavior is important yet challenging. In this project, we build an unsupervised system to capture dominant user behaviors from clickstream data (traces of users’ click events), and visualize the detected behaviors in an intuitive manner. Our system identifies "clusters" of similar users by partitioning a similarity graph (nodes are users; edges are weighted by clickstream similarity). The partitioning process leverages iterative feature pruning to capture the natural hierarchy within user clusters and produce intuitive features for visualizing and understanding captured user behaviors.
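As a rough illustration of the similarity graph (users as nodes, edges weighted by clickstream similarity), the sketch below assumes each user is summarized by a dictionary of action-pattern counts and uses cosine similarity as the edge weight. Both the data layout and the choice of cosine similarity are assumptions made for illustration; the similarity measure actually used by the system is not specified here, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, Tuple

Counts = Dict[str, int]  # assumed layout: action pattern -> occurrence count


def cosine_similarity(a: Counts, b: Counts) -> float:
    """Cosine similarity of two sparse count vectors (an assumed metric)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a user with no recorded patterns is similar to no one
    return dot / (norm_a * norm_b)


def build_similarity_graph(users: Dict[str, Counts]) -> Dict[Tuple[str, str], float]:
    """Return one weighted edge per unordered pair of users."""
    return {
        (u, v): cosine_similarity(users[u], users[v])
        for u, v in combinations(sorted(users), 2)
    }
```

Clustering would then partition this graph; the sketch stops at graph construction and does not attempt the iterative feature pruning step.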
This demo presents the clustering results on large-scale clickstream traces from an anonymous social network, Whisper. Our system effectively identifies previously unknown behaviors, e.g., dormant users and hostile chatters. In addition, we have successfully applied the clickstream-based behavior model to detect new attacks in real-world online social networks, including Renren and LinkedIn.
The project source code is available for download. This zip file contains a set of scripts that perform recursive hierarchical clustering on clickstream data, and generate clusters of user behaviors.
A quick example is shown as follows.
$> python recursiveHierarchicalCustering.py input.txt output/
Each line of input.txt has the following format:
user_id \t A(1)G(10)
Here A and G are action patterns, and 1 and 10 represent how many times the respective pattern appears in the user's clickstream.
output/result.json will be the output file for the clustering results.
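To make the input format concrete, the following minimal sketch parses one line of the form user_id \t A(1)G(10) into a user ID and a pattern-to-count dictionary. It assumes the separator is a literal tab character and that pattern names contain no parentheses or whitespace. The function name parse_line is hypothetical and is not part of the released scripts.

```python
import re
from typing import Dict, Tuple

# Matches one "pattern(count)" token, e.g. "A(1)" or "G(10)".
_TOKEN = re.compile(r"([^()\s]+)\((\d+)\)")


def parse_line(line: str) -> Tuple[str, Dict[str, int]]:
    """Split 'user_id<TAB>A(1)G(10)' into the user ID and its pattern counts."""
    user_id, _, patterns = line.rstrip("\n").partition("\t")
    counts = {name: int(n) for name, n in _TOKEN.findall(patterns)}
    return user_id, counts


if __name__ == "__main__":
    # Prints: ('user_42', {'A': 1, 'G': 10})
    print(parse_line("user_42\tA(1)G(10)"))
```

A helper like this can be used to sanity-check an input file before running the clustering script.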
We are a research team from the Department of Computer Science at the University of Chicago. If you have any questions, please don't hesitate to contact us.