Statistical Inference of Exogenous and Endogenous Information Propagation in Social Networks

Piškorec, Matija (2019) Statistical Inference of Exogenous and Endogenous Information Propagation in Social Networks. Doctoral thesis, Rudjer Boskovic Institute.

In the last decade we witnessed a rapid rise of online social media services. Although they were created in the early 2000s, their rise began in earnest after 2010, when their presence started to fundamentally alter the traditional media landscape. Today, their influence on the way our society consumes, curates and disseminates information is indisputable. With their wider adoption came the first criticism, as well as a need to solve emerging legislative, ethical and societal issues. One line of research is to explain and quantify the sources of influence in online social services and to investigate to what extent these new social landscapes are vulnerable to manipulation by third parties. This manipulation is often performed by using users' digital traces - a record of their activities on the online social service. These digital footprints have the potential to characterize users in more detail than they themselves would otherwise be willing to share. For example, users' personality traits can be inferred indirectly from the content with which they interact through online services, and even the writing style of the content they publish can be used to infer their demographic characteristics. This opens opportunities for micro-targeting of users for various dubious purposes, for example by increasing their propensity to spread misinformation. Research described in this thesis shows that much can be learned about user engagement by using very little data - in our case only friendship connections between users and a single activation cascade. A single activation cascade means we only have one registration event per user. This data alone is sufficient to estimate, under certain assumptions, whether the activation of each user was predominantly influenced by the peers to whom they are connected (endogenous influence), or by exogenous factors which are external to the friendship network itself. 
Both endogenous and exogenous factors, for example mass media, are known to have a significant impact on the activity of users of online social media. The methodology developed in this thesis requires postulating an explicit endogenous influence model which governs interactions between pairs of users, while exogenous influence is assumed to act equally on all users in the network. Several suitable endogenous influence models are proposed for use with this methodology. The first is the Susceptible-Infected model, commonly used in epidemiological modeling. The second features a decay factor for the endogenous influence, which is a realistic assumption for social systems. The third features a logistic threshold for activation. Exogenous influence is modelled as an independent probability of activation which is, at any given time, equal for all non-activated users, although it may change in time. An inference method is developed in which maximum likelihood estimation is used to estimate the relative magnitudes of endogenous and exogenous influence on users. These estimates can then be used to characterize the influence of individual users. A computational scalability analysis is performed on simulated data to demonstrate that the inference method is able to scale to large social networks. Empirical data on over 20 thousand Facebook users is used for evaluation of the proposed inference method. The data was collected using three unique Facebook political survey applications which provided Facebook friendship relations between users and a single activation cascade - a single registration event per user. Referral links, which identify a user's origin, are used as a proxy for the user's activation type. Users whose referral links originated from Facebook are considered endogenously activated, while those whose referral links originated from an external website are considered exogenously activated. 
The inference method is used to estimate the most probable source of influence for each user individually, as well as to assess the overall influence of different media channels (peer communication, Facebook advertisements, or external news media) on the users' activation cascade. Ethical, methodological and technical issues regarding data collection in the context of online social media services are discussed. Guidelines on how to collect online social media data in an ethically principled way are provided, especially in the context of satisfying requirements for reproducible research. Estimating endogenous and exogenous influence in networks with a statistical methodology that is conceptually simple, yet powerful and efficient, is widely applicable to scientific domains where deciphering properties of spreading processes and external influences on complex networks is crucial for an explanation of new phenomena.
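
The abstract's combination of a Susceptible-Infected endogenous model with a uniform exogenous activation probability can be illustrated with a minimal Python sketch. This is not the thesis's implementation. It assumes discrete time steps, a constant exogenous probability `alpha` (the thesis allows it to change in time), a per-friend endogenous probability `beta`, and an activation probability of the form 1 - (1 - alpha)(1 - beta)^k for a user with k already active friends. The function names, the grid search and the `exogenous_share` heuristic are all hypothetical choices made for illustration.

```python
import math
from itertools import product


def log_likelihood(alpha, beta, friends, activation_time, horizon):
    """Log-likelihood of a single activation cascade (assumed model form).

    friends: dict mapping each user to a list of friends.
    activation_time: dict mapping activated users to their activation step
        (step 0 marks a seed user; users missing from the dict never activated).
    horizon: last observed time step.
    """
    ll = 0.0
    for user, nbrs in friends.items():
        t_act = activation_time.get(user)
        last = horizon if t_act is None else t_act
        for t in range(1, last + 1):
            # Number of friends who were active strictly before step t.
            k = sum(1 for v in nbrs
                    if activation_time.get(v) is not None and activation_time[v] < t)
            # Assumed SI form: escape both the exogenous and all k endogenous attempts.
            p = 1.0 - (1.0 - alpha) * (1.0 - beta) ** k
            p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
            ll += math.log(p) if t == t_act else math.log(1.0 - p)
    return ll


def fit_grid(friends, activation_time, horizon, steps=20):
    """Maximum likelihood estimate of (alpha, beta) by exhaustive grid search."""
    grid = [(i + 1) / (steps + 1) for i in range(steps)]
    return max(product(grid, grid),
               key=lambda ab: log_likelihood(ab[0], ab[1], friends,
                                             activation_time, horizon))


def exogenous_share(alpha, beta, k):
    """Heuristic (an assumption, not from the thesis): relative weight of the
    exogenous term for a user who had k active friends at activation."""
    endo = 1.0 - (1.0 - beta) ** k
    total = alpha + endo
    return alpha / total if total > 0 else 0.0


# Toy friendship network and cascade, invented purely for illustration.
friends = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
activation_time = {"a": 1, "b": 2, "c": 3}  # user "d" never activates
horizon = 5

alpha_hat, beta_hat = fit_grid(friends, activation_time, horizon)
print("alpha:", alpha_hat, "beta:", beta_hat)
print("exogenous share for c:", exogenous_share(alpha_hat, beta_hat, k=2))
```

A grid search keeps the sketch dependency-free. On a network of realistic size, a numerical optimizer over the same log-likelihood would be the more practical choice.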

Item Type: Thesis (Doctoral thesis)
Uncontrolled Keywords: online social networks; social influence estimation; statistical learning; maximum likelihood method; online social data collection
Subjects: TECHNICAL SCIENCES > Computing
Divisions: Division of Electronics
Depositing User: Matija Piškorec
Date Deposited: 28 Jun 2023 12:51
