A distributed crawler for social networks based on the MapReduce model


A.V. Yakushev, L.J. Dijkstra, S.A. Mityagin

This paper describes a system, based on the MapReduce model, for gathering data (crawling) from social networks. The system runs on a cluster of computers managed by Apache Hadoop. It supports a multiuser mode in which each client can gather data on different topics and apply different crawling policies. The system was used to crawl the LiveJournal social network; analysis of the resulting user friendship network showed that it is scale-free.
