Updates

11/15/13: v1.2c released, bug fix in coarse fitting for random walk model.

12/17/11: There does not appear to be any problem with the modified Forest Fire fitting code as described in the 12/02/11 update. The update was put up due to the fact that the code did not seem to terminate in the time expected - upon further investigation it seems the code simply takes a very long time to run due to it's recursive nature. In our case, it has been running for several weeks. Parameter fitting is still being run on our datasets and will be released when finished (an update will be made here) As the modified Forest Fire model was shown to not be very effective in replicating social graph structure in our WWW2010 paper, we recommend users to use alternative models such as modified Nearest Neighbor and/or dK-2.

12/02/11: There is a problem with the code associated with fitting for modified Forest Fire. We are investigating this, and will update here when it is fixed.

11/17/11: v1.2b released - fixed floating point round-off error for fine sampling on parameters (fitting code) for modified Random Walk. 0.9 - 0.99 would terminate at 0.98, now properly terminates at 0.99.

11/15/11: v1.2 released. demo.py script now included as a walkthrough on how to use the other scripts. No changes were made to the fitting code or to socialModels.py.

11/13/11: Modified Random Walk parameters released. Modified Random Walk fitting/generation code has been checked and is OK. Modified Forest Fire parameter fitting is up next.

11/10/11: v1.1 released. Fitting code for mod. Random Walk is back to sampling on both qe and qv intervals as opposed to the previous update which fixed qe.

11/07/11: v1.0 released for socialModels.py to improve running time of randomWalk_mod in socialModels.py, modified content in README, and changed fine fitting for Random Walk to only sample 'qe' for one value. Parameter fitting for modified Random Walk is still being run on our graphs and will be up soon.

About

The online social network scene has exploded in the past few years with networks such as Facebook and Twitter aggregating a large portion of hits per day. As such, many researchers have become interested in studying the dynamics and structure of these online social networks.

One of the main obstacles in researching social networks is the general lack of availability of datasets, due to increasing difficulty of crawling social networks. To circumvent this, several graph generation models have been proposed that attempt to mimic the structure of social graphs.

In this project, we explored the fidelity of these graph models, i.e. their ability to fit their parameters to a specific graph and generate synthetic graphs with nearly identical graph structures. Our work is described in our WWW 2010 paper [1], and found that two models, a modified version of the NearestNeighbor model, and the dK-2 series produce synthetic graphs with extremely similar properties to the original.

A key goal of our work is to contribute a way for researchers in the community to generate their own synthetic graphs that are "similar" in structure to the Facebook graphs we've been able to generate from our measurements. The instructions below include code to generate synthetic graphs, and model parameters that produce graphs that closely match our Facebook measurement datasets. Note that while our followup work in [2] produced a graph model with strong privacy guarantees (through a differentially private graph model), we do not yet have that code ready for distribution.

Publications:

Instructions

Fitting code
Make sure Python 2.7+, the NetworkX Python package, and the Orbis/dK code have been installed.

You may have to modify the sys path at the top of the scripts to point towards where you have the NetworkX package installed.

sys.path.append('/where/you/have/networkx/installed/')

The .tar.gz contains: fitModel[_noPickling].py, socialModels.py, twoKDistance.py, demo.py, README, and LICENSE

Run demo.py via:

$ python demo.py

to run the walkthrough script on how to use the other files.

To run the fitting script:

$ python fitModel[_noPickling].py (-ff/-rw/-nn) (graph name (ex. Twitter2011)) (input pickle path) (Orbis dkDist path) (result output path) [additional param. for -rw/-nn]

Additional parameters for mod. Random Walk and Nearest Neighbor

Examples:

$ python fitModel_noPickling.py -rw OSN_Name ../pickles/OSN.pickle ~/allMyCode/Orbis_Source/dkDist ./results/FineRW/ -coarse

$ python fitModel_noPickling.py -rw OSN_Name ../pickles/OSN.pickle ~/allMyCode/Orbis_Source/dkDist ./results/FineRW/ -fine 0.2 0.3 0.5 0.7

$ python fitModel_noPickling.py -nn OSN_Name ../pickles/OSN.pickle ~/allMyCode/Orbis_Source/dkDist ./results/FineNN/ -fine 3


socialModels.py
Make sure both Python 2.7+ and the NetworkX Python package have been installed.

Function Description Parameters
networkx.barabasi_albert_graph(n, m) Generates a graph using the Barabasi-Albert model
  • n nodes in the graph (integer)
  • m number of edges between each new node and existing nodes (integer)
forestFire_mod(n, p) Generates a graph based on Forest Fire model modified for undirected edges
  • n nodes in the graph (integer)
  • p burn rate (floating point)
randomWalk_mod(n, qe, qv) Generates a graph based on a Random Walk model modified for undirected edges
  • n nodes in the graph (integer)
  • qe probability of continuing the walk after each step (floating point)
  • qv probability of attaching to visited node (floating point)
nearestNeighbor_mod(n, u, k) Nearest Neighbor model modified to generate graphs with power-law exponent between 1.5 and 1.75 to match a social graph structure
  • n nodes in the graph (integer)
  • u probability that determines if a new node is added or if a pair of 2 hop neighbors is connected (floating point)
  • each time a new node is added, k pairs of random nodes in the graph are connected to adjust the power-law exponent (integer)

Sample usage:

>>> import networkx as nx
>>> import socialModels as sm
>>> G = sm.nearestNeighbor_mod(97134, 0.53, 1)

Fitted Model Parameters

The below provides users with model parameters that, when fed into our model generator code, produces graphs that closely match our Facebook data. To see model parameters for a given model, choose it from the clickable field. For anonymity reasons, we do not identify which model parameters match which Facebook graphs. However, all parameters shown are from our 2009 Facebook graphs.

Note: Our study has shown that the models which best replicate the original social graph structure are the modified Nearest Neighbor model and the dK-2 distribution.

Download

Current version: fittingCode.zip v1.2c, updated 11/15/13.

Note: Fitting code for the Kronecker model can be found in the C++ SNAP library here.

We are also in the process of integrating our modified Forest Fire, Random Walk, and Nearest Neighbor code into the NetworkX package. We will make a note here when this integration is completed.

* Note: