Predicted max degree sampling: Sampling in directed networks to maximize node coverage through crawling
Ricky Laishram Syracuse University, Syracuse, NY
Katchaguy Areekijseree Syracuse University, Syracuse, NY
Sucheta Soundarajan Syracuse University, Syracuse, NY
Abstract

Sampling through crawling is an important research topic in social network analysis. However there is very little existing work on sampling through crawling in directed net- works. In this paper we present a new method of sampling a directed network, with the objective of maximizing the node coverage. Our proposed method, Predicted Max Degree (PMD) Sampling, works by predicting which k open nodes are most likely to have the highest number of unobserved neighbors in a particular iteration. These nodes are queried, and the whole process repeats until all the available budget has been used up. We compared PMD against three baseline algorithms with three networks, and saw large improvements vs. baseline sampling algorithms: With a budget of 2000, PMD found 15%, 87.4% and 170.2% more nodes than the closest baseline algorithm in the wiki-Votes, soc-Slashdot and web-Google networks respectively.