Measuring preferential attachment in growing networks is an important topic in network science, since the experimental confirmation of assumptions about the generative processes that give rise to the heavy-tailed degree distributions characteristic of real-world networks depends on it. Multiple methods have been devised for measuring preferential attachment in time-resolved networks. However, many real-world network datasets are available only as single snapshots. We propose a novel nonparametric method, called PAFit-oneshot, for estimating the preferential attachment function for a growing network from one snapshot. The PAFit-oneshot method corrects for a bias that arises when estimating preferential attachment values only for degrees observed in the single snapshot. This bias, which had previously gone unnoticed, has a connection with a recently developed conditional inference approach called post-selection inference. Extensive experiments show that our method recovers the true preferential attachment function in simulated as well as real-world networks. Our work opens up a new path for network scientists to measure preferential attachment in a large number of one-snapshot networks that have been out of reach until now. As a demonstration, we nonparametrically estimated the preferential attachment function in three such networks and found that all are sub-linear. The PAFit-oneshot method is implemented in the R package PAFit.
Many real-world systems are profitably described as complex networks that grow over time. Preferential attachment and node fitness are two simple growth mechanisms that not only explain certain structural properties commonly observed in real-world systems, but are also tied to a number of applications in modeling and inference. While there are statistical packages for estimating various parametric forms of the preferential attachment function, there is no existing package for non-parametric estimation, which would allow finer inspection of the famous ‘rich-get-richer’ phenomenon and provide clues to explain non-standard structural properties observed in real-world networks. This paper introduces the R package PAFit, which implements statistical methods for estimating the preferential attachment function and node fitness non-parametrically, as well as a number of functions for generating complex networks from these two mechanisms. The main computational part of the package is implemented in C++ with OpenMP to ensure scalability to large-scale networks. In this paper, we first introduce the main functionalities of PAFit through simulated examples, and then use the package to analyze a collaboration network between scientists in the field of complex networks. The results indicate the joint existence of ‘rich-get-richer’ and ‘fit-get-richer’ phenomena in the collaboration network. The estimated attachment function is almost linear, which means that the probability that an author develops a new collaboration is approximately proportional to their current number of collaborators. Furthermore, the estimated fitnesses reveal many familiar names in the complex network field among the fittest scientists.
We propose a statistical method for estimating the non-parametric transitivity and preferential attachment functions simultaneously in a growing network, in contrast to conventional methods that either estimate each function in isolation or assume a certain functional form for them. Our model is demonstrated to exhibit a good fit to two real-world co-authorship networks and can illuminate several intriguing details of the preferential attachment and transitivity phenomena that would be unavailable under traditional methods. Moreover, we introduce a method for quantifying the contribution of each of these phenomena to the growth process of a network, based on the probabilistic dynamic process induced by the model formula. By applying this method, we found that transitivity dominated preferential attachment in both co-authorship networks. This suggests the importance of indirect relations in scientific creative processes. The proposed method is implemented in the R package FoFaF.
We propose a method for the non-parametric joint estimation of preferential attachment and transitivity in complex networks, as opposed to conventional methods that either estimate one mechanism in isolation or jointly estimate both while assuming some functional forms. We apply our method to three scientific co-authorship networks: between scholars in the complex network field, between physicists in high-energy physics, and between authors in the Strategic Management Journal. The non-parametric method revealed complex trends of preferential attachment and transitivity that would be unavailable under conventional parametric approaches. In all networks, having one common collaborator with another scientist increases the chance of collaborating with that scientist at least fivefold. Finally, by quantifying the contribution of each mechanism, we found that while transitivity dominates preferential attachment in the high-energy physics network, preferential attachment is the main driving force behind the evolution of the remaining two networks.
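The interplay of the two mechanisms described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the estimation method of the paper: it assumes that the unnormalized chance of a new edge between nodes i and j is the product of a preferential attachment kernel evaluated at each node's degree and a transitivity kernel evaluated at their number of common neighbors. The names `common_neighbors`, `pair_weight`, `pa_kernel`, and `trans_kernel` are hypothetical.

```python
def common_neighbors(adj, i, j):
    """Number of neighbors shared by nodes i and j (adj maps node -> set of neighbors)."""
    return len(adj[i] & adj[j])


def pair_weight(adj, i, j, pa_kernel, trans_kernel):
    """Unnormalized weight for a new edge (i, j).

    Assumed form (an illustration, not taken from the paper):
    pa_kernel(degree of i) * pa_kernel(degree of j) * trans_kernel(common neighbors).
    """
    b = common_neighbors(adj, i, j)
    return pa_kernel(len(adj[i])) * pa_kernel(len(adj[j])) * trans_kernel(b)


# Toy undirected network: node 4 is isolated.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: set()}

# Hypothetical kernels: no degree effect, and a fivefold boost
# once a pair shares at least one neighbor.
flat_pa = lambda k: 1.0
fivefold = lambda b: 5.0 if b >= 1 else 1.0

w_shared = pair_weight(adj, 0, 3, flat_pa, fivefold)   # nodes 0 and 3 share neighbor 2
w_none = pair_weight(adj, 0, 4, flat_pa, fivefold)     # nodes 0 and 4 share no neighbor
```

With the degree effect switched off, the ratio `w_shared / w_none` isolates the transitivity effect, which is the kind of quantity the fivefold statement in the abstract refers to.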
Understanding how a scientist develops new scientific collaborations or how their papers receive new citations is a major challenge in scientometrics. The proposed approach simultaneously examines the growth processes of the co-authorship and citation networks by analyzing the evolution of the rich-get-richer and fit-get-richer phenomena. In particular, the preferential attachment function and author fitnesses, which govern the two phenomena, are estimated non-parametrically in each network. The approach is applied to the co-authorship and citation networks of the flagship journal of the strategic management scientific community, namely the Strategic Management Journal. The results suggest that both phenomena have consistently governed both temporal networks. The average of the attachment exponents is 0.30 in the co-authorship network and 0.29 in the citation network. This suggests that the rich-get-richer phenomenon has been weak in both networks. The right tails of the distributions of author fitness in both networks are heavy, which implies that the intrinsic scientific quality of each author has been playing a crucial role in getting new citations and new co-authorships. Since the total competitiveness in each temporal network is found to be rising with time, it is getting harder to receive a new citation or to develop a new collaboration. An analysis of the average competency shows that, while veterans tend to be more competent at developing new collaborations, newcomers are likely better at acquiring new citations. Furthermore, the author fitness in both networks has been consistent with the history of the strategic management scientific community. This suggests that coupling node fitnesses across different networks might be a promising new direction for analyzing multiple networks simultaneously.
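An attachment exponent such as those reported above summarizes an attachment function of the form A_k = k^alpha by its slope on a log-log scale. The following Python sketch shows one simple way to obtain such a slope, by ordinary least squares on (log k, log A_k). It is an illustration only: the procedure used in the paper may differ (for example, by weighting the estimated values), and the function name `attachment_exponent` is hypothetical.

```python
import math


def attachment_exponent(degrees, kernel_values):
    """Least-squares slope of log(A_k) against log(k).

    degrees: positive degrees k; kernel_values: matching positive values A_k.
    An unweighted fit is assumed here purely for illustration.
    """
    xs = [math.log(k) for k in degrees]
    ys = [math.log(a) for a in kernel_values]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# Synthetic, noise-free attachment function with a weak rich-get-richer effect.
ks = list(range(1, 50))
alpha_hat = attachment_exponent(ks, [k ** 0.3 for k in ks])
```

An exponent near 1 corresponds to linear preferential attachment, while a value well below 1, as in the two networks above, indicates a sub-linear and therefore weak rich-get-richer effect.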
Complex network growth across diverse fields of science is hypothesized to be driven mainly by a combination of preferential attachment and node fitness processes. For measuring the respective influences of these processes, previous approaches make strong and untested assumptions about the functional forms of the preferential attachment function, the fitness function, or both. We introduce a Bayesian statistical method called PAFit that estimates preferential attachment and node fitness without imposing such functional constraints; it works by maximizing a log-likelihood function with suitably added regularization terms. We use PAFit to investigate the interplay between preferential attachment and node fitness processes in a Facebook wall-post network. While we uncover evidence for both preferential attachment and node fitness, thus validating the hypothesis that these processes together drive complex network evolution, we also find that node fitness plays the bigger role in determining the degree of a node. This is the first validation of its kind on real-world network data. Surprisingly, however, the rate of preferential attachment is found to deviate from the conventional log-linear form when node fitness is taken into account. The proposed method is implemented in the R package PAFit.
Preferential attachment is a stochastic process that has been proposed to explain certain topological features characteristic of complex networks from diverse domains. The systematic investigation of preferential attachment is an important area of research in network science, not only for the theoretical matter of verifying whether this hypothesized process is operative in real-world networks, but also for the practical insights that follow from knowledge of its functional form. Here we describe a maximum-likelihood-based estimation method for the measurement of preferential attachment in temporal complex networks. We call the method PAFit, and implement it in an R package of the same name. PAFit constitutes an advance over previous methods primarily because we based it on a nonparametric statistical framework that enables attachment kernel estimation free of any assumptions about its functional form. We show that this results in PAFit outperforming the popular methods of Jeong and Newman in Monte Carlo simulations. What is more, we found that the application of PAFit to a publicly available Flickr social network dataset yielded clear evidence for a deviation of the attachment kernel from the popularly assumed log-linear form. Independent of our main work, we provide a correction to a consequential error in Newman’s original method which had evidently gone unnoticed since its publication over a decade ago.
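To make the notion of an attachment kernel concrete, the Python sketch below simulates a growing network in which each new node attaches to an existing node chosen with probability proportional to A_k = k^alpha, the log-linear form mentioned above. It is a toy generator written for illustration and is not part of PAFit; the function names and the one-edge-per-new-node growth rule are assumptions.

```python
import random


def attachment_probabilities(degrees, alpha=1.0):
    """Probability that each existing node receives the next edge,
    assuming the log-linear kernel A_k = max(k, 1) ** alpha."""
    weights = [max(k, 1) ** alpha for k in degrees]
    total = sum(weights)
    return [w / total for w in weights]


def grow_network(n_nodes, alpha=1.0, seed=0):
    """Grow a network one node at a time; each new node brings one edge.

    Returns the list of node degrees. Assumes n_nodes >= 2.
    """
    rng = random.Random(seed)
    degrees = [1, 1]  # two initial nodes joined by a single edge
    for _ in range(n_nodes - 2):
        probs = attachment_probabilities(degrees, alpha)
        target = rng.choices(range(len(degrees)), weights=probs, k=1)[0]
        degrees[target] += 1   # the chosen node gains an edge
        degrees.append(1)      # the new node enters with degree 1
    return degrees


degrees = grow_network(200, alpha=1.0, seed=42)
```

Setting `alpha=0` removes the rich-get-richer effect (every node is equally likely to be chosen), while `alpha=1` gives linear preferential attachment. A nonparametric method estimates each A_k separately instead of fixing this functional form in advance.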
We introduce a statistically sound method called PAFit for the joint estimation of preferential attachment and node fitness in temporal complex networks. Together these mechanisms play a crucial role in shaping network topology by governing the way in which nodes acquire new edges over time. PAFit is an advance over previous methods insofar as it does not make any assumptions about the functional form of the preferential attachment function. We found that the application of PAFit to a publicly available Flickr social network dataset turned up clear evidence for a deviation of the preferential attachment function from the popularly assumed log-linear form. What is more, we were surprised to find that hubs are not always the nodes with the highest node fitnesses. PAFit is implemented in an R package of the same name.
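The way the two mechanisms jointly govern edge acquisition can be illustrated with a short Python sketch. It assumes the commonly used multiplicative form in which the chance that node i receives the next edge is proportional to A(k_i) times its fitness eta_i; the function name and the toy numbers are hypothetical and serve only as an illustration.

```python
def joint_attachment_probabilities(degrees, fitnesses, kernel):
    """Probability that each node receives the next edge,
    assuming weight_i = kernel(degree_i) * fitness_i."""
    weights = [kernel(k) * eta for k, eta in zip(degrees, fitnesses)]
    total = sum(weights)
    return [w / total for w in weights]


# A hub with average fitness versus a low-degree node with high fitness,
# under a linear kernel A_k = k (an assumed choice for this example).
probs = joint_attachment_probabilities(
    degrees=[10, 2],
    fitnesses=[1.0, 8.0],
    kernel=lambda k: float(k),
)
```

Here the low-degree node is the more likely recipient of the next edge despite having far fewer edges, which shows how a node can be highly fit without (yet) being a hub.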