Updates
11/15/13: v1.2c released, bug fix in coarse fitting for random walk model.
12/17/11: There does not appear to be any problem with the modified Forest Fire fitting code as described in the 12/02/11 update. The update was put up due to the fact that the code did not seem to terminate in the time expected - upon further investigation it seems the code simply takes a very long time to run due to it's recursive nature. In our case, it has been running for several weeks. Parameter fitting is still being run on our datasets and will be released when finished (an update will be made here) As the modified Forest Fire model was shown to not be very effective in replicating social graph structure in our WWW2010 paper, we recommend users to use alternative models such as modified Nearest Neighbor and/or dK-2.
12/02/11: There is a problem with the code associated with fitting for modified Forest Fire. We are investigating this, and will update here when it is fixed.
11/17/11: v1.2b released - fixed floating point round-off error for fine sampling on parameters (fitting code) for modified Random Walk. 0.9 - 0.99 would terminate at 0.98, now properly terminates at 0.99.
11/15/11: v1.2 released. demo.py script now included as a walkthrough on how to use the other scripts. No changes were made to the fitting code or to socialModels.py.
11/13/11: Modified Random Walk parameters released. Modified Random Walk fitting/generation code has been checked and is OK. Modified Forest Fire parameter fitting is up next.
11/10/11: v1.1 released. Fitting code for mod. Random Walk is back to sampling on both qe and qv intervals as opposed to the previous update which fixed qe.
11/07/11: v1.0 released for socialModels.py to improve running time of randomWalk_mod in socialModels.py, modified content in README, and changed fine fitting for Random Walk to only sample 'qe' for one value. Parameter fitting for modified Random Walk is still being run on our graphs and will be up soon.
About
The online social network scene has exploded in the past few years with networks such as Facebook and Twitter aggregating a large portion of hits per day. As such, many researchers have become interested in studying the dynamics and structure of these online social networks.
One of the main obstacles in researching social networks is the general lack of availability of datasets, due to increasing difficulty of crawling social networks. To circumvent this, several graph generation models have been proposed that attempt to mimic the structure of social graphs.
In this project, we explored the fidelity of these graph models, i.e. their ability to fit their parameters to a specific graph and generate synthetic graphs with nearly identical graph structures. Our work is described in our WWW 2010 paper [1], and found that two models, a modified version of the NearestNeighbor model, and the dK-2 series produce synthetic graphs with extremely similar properties to the original.
A key goal of our work is to contribute a way for researchers in the community to generate their own synthetic graphs that are "similar" in structure to the Facebook graphs we've been able to generate from our measurements. The instructions below include code to generate synthetic graphs, and model parameters that produce graphs that closely match our Facebook measurement datasets. Note that while our followup work in [2] produced a graph model with strong privacy guarantees (through a differentially private graph model), we do not yet have that code ready for distribution.
Publications:
- [1] Alessandra Sala, Lili Cao, Christo Wilson, Robert Zablit, Haitao Zheng, and Ben Y. Zhao. Measurement-calibrated Graph Models for Social Network Experiments. Proceedings of The 19th World Wide Web Conference (WWW 2010).
- [2] Alessandra Sala, Xiaohan Zhao, Christo Wilson, Haitao Zheng, and Ben Y. Zhao. Sharing Graphs using Differentially Private Graph Models. Proceedings of The 11th ACM SIGCOMM Internet Measurement Conference (IMC 2011).
Instructions
Fitting code
Make sure Python 2.7+, the NetworkX Python package, and the Orbis/dK code have been installed.
You may have to modify the sys path at the top of the scripts to point towards where you have the NetworkX package installed.
sys.path.append('/where/you/have/networkx/installed/')
The .tar.gz contains: fitModel[_noPickling].py, socialModels.py, twoKDistance.py, demo.py, README, and LICENSE
Run demo.py via:
$ python demo.py
to run the walkthrough script on how to use the other files.
To run the fitting script:
$ python fitModel[_noPickling].py (-ff/-rw/-nn) (graph name (ex. Twitter2011)) (input pickle path) (Orbis dkDist path) (result output path) [additional param. for -rw/-nn]
Additional parameters for mod. Random Walk and Nearest Neighbor
- -coarse/-fine to indicate coarse or fine sampling
- If '-fine' sampling is indicated, add additional 'qe_start qe_end qv_start qv_end' parameters for Random Walk, or 'k' parameter for Nearest Neighbor.
Examples:
$ python fitModel_noPickling.py -rw OSN_Name ../pickles/OSN.pickle ~/allMyCode/Orbis_Source/dkDist ./results/FineRW/ -coarse
$ python fitModel_noPickling.py -rw OSN_Name ../pickles/OSN.pickle ~/allMyCode/Orbis_Source/dkDist ./results/FineRW/ -fine 0.2 0.3 0.5 0.7
$ python fitModel_noPickling.py -nn OSN_Name ../pickles/OSN.pickle ~/allMyCode/Orbis_Source/dkDist ./results/FineNN/ -fine 3
socialModels.py
Make sure both Python 2.7+ and the NetworkX Python package have been installed.
Function | Description | Parameters |
---|---|---|
networkx.barabasi_albert_graph(n, m) | Generates a graph using the Barabasi-Albert model |
|
forestFire_mod(n, p) | Generates a graph based on Forest Fire model modified for undirected edges |
|
randomWalk_mod(n, qe, qv) | Generates a graph based on a Random Walk model modified for undirected edges |
|
nearestNeighbor_mod(n, u, k) | Nearest Neighbor model modified to generate graphs with power-law exponent between 1.5 and 1.75 to match a social graph structure |
|
Sample usage:
>>> import networkx as nx
>>> import socialModels as sm
>>> G = sm.nearestNeighbor_mod(97134, 0.53, 1)
Fitted Model Parameters
The below provides users with model parameters that, when fed
into our model generator code, produces graphs that closely match our
Facebook data. To see model parameters for a given model, choose it
from the clickable field. For anonymity reasons, we do not identify
which model parameters match which Facebook graphs. However, all parameters
shown are from our 2009 Facebook graphs.
Note: Our study has shown that
the models which best replicate the original social graph structure
are the modified Nearest Neighbor model and the dK-2 distribution.
Download
Current version:
fittingCode.zip v1.2c, updated 11/15/13.
Note: Fitting code for the Kronecker model can be found in the C++ SNAP library here.
We are also in the process of integrating our modified Forest Fire, Random Walk, and Nearest
Neighbor code into the NetworkX package. We will make a note here when this integration is
completed.
* Note:
- Code for the Barabasi-Albert model can be found in the NetworkX package under networkx.generators.random_graphs.barabasi_albert_graph.
- Code for the dK model (up to dK-2) can be found here. dK-3 and upwards generators currently do not exist.
- Code for the Kronecker model can be found in the C++ SNAP library found here.