Where do networks come from?

Both the peer effects and the structural approaches to network effects assume some degree of exogeneity in the existence and structure of network ties.

Exogeneity is both a theoretical claim and an empirical assumption. All reasonable theories are built on a set of axioms that treat some features of the world, or of the target system being analyzed, as primitive or exogenous. Many models in economics, for instance, assume that preferences are exogenous. From these preferences, we can then derive things like behavior, choice, and “roles,” as well as the structure of social relationships.

Screen Shot 2017-05-10 at 10.31.25 AM.png

Similarly, some sociological and anthropological traditions start with axioms that assume that “roles” are exogenous. These roles—e.g., the position an individual occupies in a social structure—govern behavior, preferences, and social relationships.

Screen Shot 2017-05-10 at 10.31.32 AM

Much of the network analysis we’ve been conducting or discussing thus far also has an exogeneity assumption built in. The primitives are social relationships and their structure. Everything else we observe, such as behavior, preferences, and roles, emerges from the pattern of exogenous network ties. In the lectures on structural holes, status, and peer effects, we argue that the pattern of social relationships causes differences in behavior, preferences, and roles, and not vice versa.

Screen Shot 2017-05-10 at 10.31.38 AM

The challenge of network formation

However, a challenge for the social-relationships-first perspective is that networks are unlikely to be fully “exogenous.” They form and evolve through processes that make some pairs of people more likely to connect than others.

Network scholars have spent considerable time on trying to understand how networks form and change. At a broad conceptual level, we can think about five factors that shape whether a tie between two individuals—e.g., ego and alter—forms.

Screen Shot 2017-05-10 at 11.08.33 AM.png

The logic behind most models of network formation is simple. On one side, there are “benefits,” whether actual or perceived, pecuniary or non-pecuniary/psychic, of connecting with someone. On the other side, there are “costs” that make it easier or harder to form a relationship with someone: searching for them, coordinating with them, or simply dealing with them may be more costly than with someone else. Relatedly, some individuals may face a lower cost of building a network than others, and some potential ties may simply be cheaper (relative to their benefit) than others.
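This cost/benefit logic can be sketched in a few lines of code. The sketch below (in Python rather than R, purely for illustration; the logistic functional form and all numbers are assumptions, not part of any particular model) treats tie formation as probabilistic in the net benefit of a tie:

```python
import math

def tie_probability(benefit, cost, scale=1.0):
    """Probability that Ego forms a tie, modeled as a logistic
    function of the net benefit (benefit - cost)."""
    return 1.0 / (1.0 + math.exp(-(benefit - cost) / scale))

# A high-benefit, low-cost contact is very likely to become a tie,
# while a low-benefit, high-cost contact is not.
p_easy = tie_probability(benefit=3.0, cost=1.0)
p_hard = tie_probability(benefit=1.0, cost=3.0)
print(round(p_easy, 3), round(p_hard, 3))
```

All five factors below can be read as things that shift either the benefit or the cost term in a rule like this one.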

Factor 1: Characteristics of Ego, the sender.

“Factor 1” encapsulates a range of characteristics that make it easier for certain types of people to connect with many others. These characteristics may make it easier for such people (relative to others) to form many connections, or may give them a greater benefit from doing so. Research in this stream has found a substantial range of individual-level characteristics that predict an increased or decreased propensity to have a certain type of network. These include:

  • Personality: Some work has found that differences in personality traits are correlated with network structure. For instance, individuals who have many ties are also likely to have extroverted personalities. Relatedly, those who are high in “self-monitoring” have a greater likelihood of being “brokers” or occupying “structural holes” in a social network.
  • Other factors that may also be related to larger networks include:
    • Strategic intent
    • Intelligence
    • Physical characteristics (e.g., beauty or height)
    • Age
  • Some factors may describe an individual at a certain point in time:
    • After the loss of a job
    • After being promoted to a new role
  • Other factors may be socially constructed, but describe the Ego in a given context:
    • Caste
    • Religion

One can reason about the various ways in which these characteristics of Ego either lower their costs of making ties or increase the benefit they get. Can you come up with other individual-level factors that might matter?

Factor 2: Characteristics of Alter, the receiver.

A related set of arguments can be made about the characteristics of an alter or alters. For instance, one could theorize about the following characteristics of alter(s) that may make them more likely to receive connections from others.

  • Personality
  • Intelligence
  • Skill
  • Wealth
  • Social standing
  • Formal role in the organization

As with the egocentric perspective, one can use a “cost” and “benefit” logic to reason about why some Alter may have more advice seekers (e.g., they are smart) or more friends (e.g., they are helpful). In purely altercentric models, we ignore the characteristics of Ego.

Factor 3: The interaction of Ego/Alter characteristics (e.g., homophily)

The third factor relates to the “Ego-Alter” interaction. In such models, something about the characteristics of Ego and Alter together predicts an increased or decreased propensity to form network ties. The most common theme in these models is homophily: the tendency for individuals who are similar to each other to have a higher propensity to connect. Research has found that individuals who are similar on the following characteristics are more likely to connect with each other, relative to the alternatives:

  • Race and ethnicity
  • Gender
  • Age
  • Formal organizational position
  • Occupation
  • Religion

There are many theories about why such a preference exists. On one hand, social contexts (e.g., communities, neighborhoods, etc.) are often organized by these characteristics, which makes it much easier to connect with people who are similar to you. There is also an element of choice. Individuals who are similar to you are likely to have similar experiences, share similar values, and like and dislike similar things. As a consequence, the cost of interacting with similar people is likely to be lower than the cost of interacting with people who are different.
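One way to make the homophily claim concrete is to compare the observed share of same-group ties against the share we would expect under purely random mixing. Here is a minimal sketch in Python; the tiny network and the group labels are made up for illustration:

```python
from itertools import combinations

# Hypothetical undirected friendship ties and a group label per person.
group = {"A": "x", "B": "x", "C": "x", "D": "y", "E": "y"}
ties = [("A", "B"), ("A", "C"), ("B", "C"), ("D", "E"), ("A", "D")]

# Observed share of ties that connect people in the same group.
same = sum(1 for i, j in ties if group[i] == group[j])
observed = same / len(ties)

# Baseline: the share of all possible pairs that are same-group,
# i.e., what uniformly random mixing would give in expectation.
pairs = list(combinations(group, 2))
baseline = sum(1 for i, j in pairs if group[i] == group[j]) / len(pairs)

print(observed, baseline)
```

If the observed share exceeds the baseline, ties are homophilous on that attribute; a real analysis would add a significance check, for instance by permuting the group labels.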

However, the type of relationship may matter here. In mating networks, you are more likely to see heterophily than homophily. The same might be true of mentoring relationships, where individuals are more likely to be mentored by those at a different level of seniority than their own.

What other factors at this level might increase or decrease the cost of interaction or raise its benefits?

Factor 4: Social and Physical Context

The fourth factor can broadly be thought of as the social or physical context within which individuals are forming social networks. A simple example is office or neighborhood layout. A large body of research has found that physical distance has a substantial effect on whether two individuals form ties. Scientists who are located near one another, for instance, are more likely to collaborate, and their research trajectories also become more similar.

Research has found an exponential relationship between physical distance and the propensity to connect, an effect known as propinquity. Individuals who are physically proximate are substantially more likely to interact, with steep declines in the rate of interaction as distance increases.
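The propinquity pattern is easy to see numerically. The sketch below (Python; the base rate and decay constant are invented for illustration, not estimates from any study) shows how an exponential decay produces steep drops in interaction at even modest distances:

```python
import math

def interaction_rate(distance_m, base_rate=1.0, decay=0.05):
    """Expected interaction rate as an exponential decay in distance.
    base_rate and decay are illustrative values, not estimates."""
    return base_rate * math.exp(-decay * distance_m)

# Rates fall off steeply: nearby desks vs. down the hall vs. another floor.
for d in (5, 20, 60):
    print(d, round(interaction_rate(d), 3))
```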

In addition to propinquity, other aspects of the social context are also likely to affect tie formation. These include the organization of roles, task interdependencies, and cultural or organizational norms regarding competition or collaboration. Incentives also matter in determining the shape of the network. The challenge with many of these effects is that they are often “absorbed” into the intercept of the model. That is, they can only be detected when looking across contexts, not within a single context.

Factor 5: Endogenous Network Processes

 

Finally, the structure of one part of the network may affect the structure of another. Consider a simple example: reciprocity. If I consider you a friend, there are social-psychological as well as sociological processes that increase the likelihood that you will consider me a friend. This is akin to tit-for-tat: if you give me a gift, I will give you one in return. Networks exhibit this property with substantial regularity (but not always!). In this context, the emergence of the reciprocal tie is endogenous to the network. That is, it emerges from within the network structure, not outside of it.
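Reciprocity is easy to measure once the network is in adjacency-matrix form. A minimal sketch in Python (the four-person matrix is made up): the reciprocity rate is the share of directed ties from i to j for which the return tie from j to i also exists.

```python
# Directed "considers a friend" ties; a 1 in row i, column j means i -> j.
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

n = len(adj)
ties = sum(adj[i][j] for i in range(n) for j in range(n))
# A tie i -> j is reciprocated when j -> i also exists.
reciprocated = sum(
    adj[i][j] for i in range(n) for j in range(n) if adj[j][i]
)
reciprocity = reciprocated / ties
print(reciprocity)
```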

Similarly, researchers have detected other endogenous network processes. These include transitivity: a friend of a friend is often a friend. Heiderian balance theory, for example, argues that individuals desire balance in their relationships. Being friends with your friend’s enemy is unsustainable according to balance theory (why?). Because it is unsustainable, that structure will endogenously change into something else: either the enemies become friends or the network splits.
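Transitivity can be quantified the same way: count the two-paths i-j-k and ask what fraction of them are “closed” by an i-k tie. A small Python sketch on a made-up undirected network:

```python
from itertools import permutations

# Toy undirected friendship network (made up for illustration),
# stored as frozensets so (i, j) and (j, i) are the same edge.
edges = {frozenset(e) for e in [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]}
nodes = {"A", "B", "C", "D"}

def connected(i, j):
    return frozenset((i, j)) in edges

# Count ordered two-paths i-j-k and how many are closed by an i-k tie.
two_paths = closed = 0
for i, j, k in permutations(nodes, 3):
    if connected(i, j) and connected(j, k):
        two_paths += 1
        closed += connected(i, k)

print(closed / two_paths)  # 1.0 would mean every friend-of-a-friend is a friend
```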

Other forces include preferential attachment: new entrants into a network are more likely to connect to individuals in proportion to their degree centrality. This process gives some networks a power-law degree distribution, rather than the binomial/normal distribution that would be expected if the network were formed through a purely random process.
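The rich-get-richer process can be simulated in a few lines. The sketch below (Python; purely illustrative) grows a network in which each newcomer attaches to one existing node chosen with probability proportional to that node’s current degree; a few hubs accumulate many ties while the typical node keeps one.

```python
import random

def preferential_attachment(n, seed=42):
    """Grow an n-node network: each new node links to one existing
    node chosen with probability proportional to current degree."""
    rng = random.Random(seed)
    degree = [1, 1]   # start from two connected nodes
    targets = [0, 1]  # node ids repeated once per unit of degree
    for new in range(2, n):
        old = rng.choice(targets)  # degree-proportional choice
        degree[old] += 1
        degree.append(1)
        targets.extend([old, new])
    return degree

deg = preferential_attachment(2000)
# Median degree stays tiny while the maximum grows large.
print(sorted(deg)[len(deg) // 2], max(deg))
```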

 

[Figure: a power law distribution]

[Figure: a normal distribution]

Empirical considerations

Though the theoretical ideas behind network formation are quite straightforward, disentangling the differential impact of these effects remains quite challenging. In a subsequent post, we will discuss the various approaches to estimating these models.

 

 


Seeing the networks in your company

Thus far we have assumed that we already had network data. But data like the “Professionals” data were gathered using a survey in a real organization. In this post I will walk you through the process of creating a simple network survey in SurveyMonkey (a web-based survey application) and analyzing the responses using R. Let’s begin by going to www.surveymonkey.com. Here is the landing page (as of May 5, 2017). You will need to purchase a basic subscription to download the data (I purchased an educator subscription for $18).

Screen Shot 2017-05-05 at 8.31.33 AM.png

I’ve signed up for a free account (for now). After I complete all my signup information, here is the screen I get, asking me to start by creating a survey.

Screen Shot 2017-05-05 at 8.35.15 AM

I will call my survey “Simple Network Survey.” I enter this into the text box and then press “+ Add Questions.” Pressing this takes me to a new screen.

 

Screen Shot 2017-05-05 at 8.37.27 AM

In order to create the appropriate network data (where we know who considers whom a friend, advice giver, etc.), we will need to begin by asking people who they are. I prefer to do this first using a dropdown menu where an individual can select just one option. The question I ask is: “What is your name? Please select from the dropdown menu.” Make sure that the question type is “Dropdown.”

Screen Shot 2017-05-05 at 8.39.27 AM.png

Once I have this, I would like to enter the names of the people who will be taking the survey. My list (of fake people) includes: Alice, Bob, Chris, Dina, Elena, Frank, and Greg. I add these using the “Add Answers in Bulk” option:

Screen Shot 2017-05-05 at 8.42.24 AM.png

Once I click Save, I move to the Options tab and check off “Require an Answer to This Question.” Next I click DONE.

I now create a new page (+ New Page). This is where I will place the network survey.

Screen Shot 2017-05-05 at 8.44.24 AM.png

For the purposes of this example, I will only ask two questions about people’s networks. What questions shall we ask?

Perhaps one of the things that is hardest to teach about network analysis is determining the right types of questions to ask people. The questions should reveal something about people and their social networks that we could not have assessed had we not asked them.

We can think about the kinds of questions in terms of a 2×2. On one dimension, we have questions about relationships that provide people with resources (Instrumental) versus questions about more personal/social relationships (Expressive). On the other dimension, we have questions that are either “Enduring/Qualitative” or “Event-based.” Some examples of each combination appear below.

  • Instrumental / Enduring or qualitative: Advice, Task, Information
  • Instrumental / Event-based: Asked for advice in the past week
  • Expressive / Enduring or qualitative: Friendship, Trust, Social support
  • Expressive / Event-based: Informally went to lunch, Talked about important personal matters

Here are some examples:

Questions about who you know:

Below is a list of names of your colleagues at [firm name]. Some of them you may (1) know well, others you (2) may be acquainted with, and still others (3) you may not know at all. Please check the box next to the names of those individuals who are in categories (1) or (2).

Advice (Work-related)

Sometimes it is useful to get help or advice from your colleagues on performing some aspect of doing your work well. Please check the box next to the names of those individuals who you would approach for help or advice on such work related issues.

Advice (Work related) Reciprocal

There also may be people who come to you seeking help or advice about doing their own work well. Please check the box next to the names of those individuals who might typically come to you for help or advice on work related issues.

Advice (Career and Success)

Sometimes it is useful to seek advice from colleagues at work about more than just how to do your work well. For example, you may be interested in “how things work” around here, or how to optimize your chances for a successful career here. If you needed help along these lines, who would you go to for help or advice regarding these issues?  Please check the box next to the names of those individuals who you would approach for help or advice on these non-technical related issues.

Advice (Career and Success) Reciprocal

There also may be people who come to you seeking help or advice about such non-task related issues. Please check the box next to the names of those individuals who might typically come to you for help or advice along these dimensions.

Friendship

Sometimes during the course of interactions at the workplace, friendships form. We are interested in whether you have people at [firm name] who you consider to be friends of yours. Please check the box next to the names of the individuals who you think of as friends here at [firm name].

Event based questions:

Lunch

Below you will find a list of people who work at [firm name]. Please check the names of the individuals with whom you have met with for lunch at least once during the past 30 days.

Event based advice

Below you will find a list of people who work at [firm name]. Please check the names of the individuals from whom you’ve sought out advice about work related matters at least once during the past 30 days.

The problem of recall: people are highly inaccurate when you ask them to recall specific interaction events. They are much more accurate when you ask them to recall enduring and qualitatively meaningful relationships. Events are highly informative when you know what happens during them, but otherwise they are harder to generalize from.

Now that we have some examples of questions, let’s add one to the survey. I typically recommend having two questions, one expressive (e.g., friendship) and one instrumental (e.g., advice). They usually provide different information.

Let’s, for the sake of example, add an advice network question to Page 2. We will create a “Multiple Choice” question where the answers are the names of the people in the organization (e.g., Alice, etc.). The question we ask is:

Sometimes it is useful to get help or advice from your colleagues on performing some aspect of doing your work well. Please check the box next to the names of those individuals who you would approach for help or advice on such work related issues.

We will also add a short note telling people not to select their own name and to check as few or as many names as appropriate. Below the options, also check “Allow more than one answer to this question (use checkboxes).”

Screen Shot 2017-05-05 at 9.00.21 AM

Let us now save this question by clicking save.

I will now add one more question; this will be our “dependent variable,” which measures the extent to which co-workers have a positive or negative impact.

Screen Shot 2017-05-05 at 9.55.18 AM.png

After all the questions are in, click “Next” at the top and let’s begin collecting responses.

Screenshot 2017-05-05 10.39.16.png

We will use the “Get Web Link” option. The web link for the survey I made is:

https://www.surveymonkey.com/r/QZ5KG3S

Let’s quickly fill out the survey. I will also fill in responses for everyone on the roster.

Screenshot 2017-05-05 10.42.01.png

After all the responses are in for all the people in the organization (e.g., Alice…), we can download the data. I have downloaded the Excel file. It comes as a zip file containing a CSV file with the data. These are respectively attached here and here.

The raw CSV file that is exported from Survey Monkey looks like this:

Screenshot 2017-05-05 19.26.35.png

Let’s clean this up so that we get a 7×7 matrix. Note that there is an ordered list of names on the left (Alice…Greg on the rows) and a similarly ordered list of names at the top (columns). The rows are the respondents (senders) and the columns are the people with whom they do and do not have a relationship. With the names, the matrix looks like this:

Screenshot 2017-05-05 19.30.34.png

Without the names, it looks like:

Screenshot 2017-05-05 19.37.20.png

Try to match it up to the survey response in our original file. The matrix is now saved as surveyexample.csv.

The following code imports the data (the cleaned up version above) and plots the network:

# This file provides some simple code to get you started on your Network Analysis Journey

library(data.table)
library(curl)
library(sna)

# (Q0) "who do you know or know of at [the firm]"

# Load the "Survey Monkey" network data from Dropbox.
survey <- fread('https://www.dropbox.com/s/nd13m6szn8d8lto/surveyexample.csv?dl=1')

# Convert the data.table object into matrix format so it can be
# analyzed using the sna package.
survey = as.matrix(survey)

# This creates the vector of node names.
names = c("Alice", "Bob", "Chris", "Dina", "Elena", "Frank", "Greg")

# Rename all the rows
rownames(survey) = names

# Rename all the columns
colnames(survey) = names

# Plot the survey network
gplot(survey, label = names)

Here is the resulting network.

Screenshot 2017-05-05 20.44.55.png

 

We can calculate each person’s centrality and correlate network positions with the final question we asked. We first need to convert the response into a numeric value and then import it into R.

# This file provides some simple code to get you started on your Network Analysis Journey

library(data.table)
library(curl)
library(sna)

# (Q0) "who do you know or know of at [the firm]"

#Load the “Survey Monkey” network data from Dropbox.
survey <- fread('https://www.dropbox.com/s/nd13m6szn8d8lto/surveyexample.csv?dl=1')

#Convert the data.table objects into matrix format so they can be
#analyzed using the sna package.
survey = as.matrix(survey)

# This creates the vector of node names.
names = c("Alice", "Bob", "Chris", "Dina", "Elena", "Frank", "Greg")

# Rename all the rows
rownames(survey) = names

# Rename all the columns
colnames(survey) = names

# Plot the survey network
gplot(survey, label = names)

#Load the “Survey Monkey” network data from Dropbox.
surveyoutcome <- fread('https://www.dropbox.com/s/we2dvevfejte8ov/surveyoutcome.csv?dl=1')

#Convert the data.table objects into matrix format so they can be
#analyzed using the sna package.
surveyoutcome = as.matrix(surveyoutcome)

# rename rownames and create a variable which is the integer
# version of the numeric response
colnames(surveyoutcome) = c("name", "response", "respval")
respval = as.integer(surveyoutcome[,3])

# Calculate outdegree for the survey response
survey.outdegree = degree(survey, cmode = "outdegree")

# Estimate a model regressing the respval on the outdegree
m.0 = lm(respval ~ survey.outdegree)
summary(m.0)

Here is the regression outcome:

Screenshot 2017-05-05 21.01.51.png

 

The above walk-through should give you a way to collect network data, and then analyze it using R.

Before I conclude, I want to discuss the various survey approaches used by network analysts.

Types of Network Surveys

Roster-based surveys: Roster-based methods are perhaps the most common approach; this is what we just completed above. With roster surveys, you provide the respondent with a list of names of people or organizations. You then ask them to indicate (by checking the boxes next to the names) which of these people they have a certain relationship with. The nice thing about roster-based surveys is that they tend to be quite accurate, because people don’t have to recall names out of the blue. Further, the roster allows you to get longer network lists than if people had to recall names from memory. The downside is that if the organization has too many people (say, in the thousands), it would be too burdensome to make respondents go through a list of 1,000 or, even worse, 2,000 people.

List-based surveys: The other type of survey is a list survey. Here you ask the question and then request that your respondents list the names of the people in the organization with whom they have this relationship. What might be some concerns with a survey method like this?

Ego-network surveys: This is a slightly modified version of the list-based survey. Here you ask people to list up to five people (or k people) with whom they have a certain relationship. Then you ask them to indicate whether the people listed also have relationships of a certain type with each other.

Position-generator surveys: This is perhaps the least structural of the network surveys. Here you provide a list of the “positions” that people can potentially occupy; in an organization, you would list the different functional areas, levels of seniority, etc. You then ask people whether they have no relationship with someone in each position, an acquaintance in that position, a friend in that position, and so on. This is a very indirect measure of networks, but it provides a broad understanding of the “range” of a person’s network.

In addition to these classical approaches to collecting network data, organizations have more modern methods available to figure out potential sources of interaction between their employees. These include:

Email: IT administrators know every email you send and what it contains. This is true in the vast majority of organizations. Scary, yes. True, yes. But this is information that everyone knows exists, and some organizations are using it to understand informal interaction and to make better decisions.

Mailing list/Groups activity: Another source of information about networks and interaction are the mailing lists that people are a part of.

RFID: Most of our ID cards have RFID these days; we use these cards to enter and exit buildings. RFID sensors can also be placed in strategic locations to capture face-to-face interactions between people. Conference organizers are also using RFID tags to understand interaction among attendees.

Online data sources:

LinkedIn: LinkedIn has a massive economic graph. Its data include where people got their degrees, where they worked, who they worked with, etc.

Facebook: This is the largest social network in the world. Period.

About firms: The websites of venture capital firms tell you who their partners are, where they attended college, and when they graduated. They also tell you that some firms may be investing in similar projects.

 

More: In a future post, I will walk through how to create “network” data using text in documents. The “ties” here are measures of similarity between the text descriptions of entities.

 

Peer effects, knowledge transfer and social influence

The structural approach to social networks is inherently beautiful as a representational approach. I am always in awe of the fact that we can learn so much about how human beings act or their outcomes based merely on the pattern of their social ties. The idea is both simple and profound.

The structural approach is built on assumptions regarding information transfer across a simpler unit of analysis: the dyad. In the world of dyads, new complications arise and different theories must be developed and tested.

Let us take the Professionals data we have been analyzing as an example. Here is the advice network among these professionals.

Screen Shot 2017-05-04 at 10.45.24 AM.png

In the prior analyses, we have focused on analyzing the structure of each node’s connections. For example, each node has a specific number of incoming connections, its indegree:

Screen Shot 2017-05-04 at 10.47.03 AM.png

The beauty of the structural approach to social networks is that we can learn a lot about the outcomes of individuals and organizations by merely looking at the pattern of their relationships. Recall our prior analysis. There is information in indegree. We were able to explain 6.5% of the variation in our measure of whether a person has the “knowledge to succeed” just by looking at the count of their incoming connections! While indegree may capture or reflect other processes and might not be causal, it is nevertheless information rich.

However, an Ego’s alters (e.g., the people that a focal node is connected to) are not all the same, as we sometimes implicitly assume in our models. As a note, I don’t believe researchers actually think that all the people we are connected to are the same. Indeed, betweenness, closeness, and eigenvector centrality all assume, by their very construction, that not all connections are the same. However, the heterogeneity in alter characteristics is implicit rather than explicit, because we never specify in our theories or models exactly how these individuals vary.

The peer effects framework, on the other hand, often ignores variation in structure but emphasizes variation in the characteristics of connections.

Below, I walk through some examples of this approach.

A simple model of peer effects

The “peer effects” framework is called as such because it grew out of a line of research in the economics of education where scholars were attempting to understand the impact of classroom peers on academic outcomes. Hence, peer effects.

Let us start with a simple setup. Assume there are 100 students in a classroom. The teacher has decided that everyone in the class will have a study partner, so she asks the students to pair up into groups of two. There are now 50 pairs, each with two people. The teacher wonders whether having a smart peer (i.e., Alter) increases the performance of a focal student (i.e., Ego). Visually, she is interested in understanding this influence process:

Screen Shot 2017-05-04 at 1.20.36 PM.png

At the end of the class, all of the students take a standardized exam. The exam is scored on a 100-point scale, and students can score anywhere from 0 to 100. The teacher takes these scores and runs the following regression with 100 observations, one for each student. She is also careful with standard errors, so she clusters them at the level of the dyad:

score_{i} = \beta_{0} + \beta_{1} score_{j} + \epsilon_{i}

After running the regression, she finds a large and statistically significant coefficient for \beta_{1}. How should she interpret it?

A naive causal interpretation is: for every unit increase in score_{j}, there is a corresponding \beta_{1} increase in score_{i}. Or: having a study partner with a certain score produces a corresponding increase or decrease in the performance of the focal student. This interpretation is called naive for a reason: it is probably (though not definitely) wrong.
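One way to see why the naive reading fails is to simulate a world with zero peer influence but assortative pairing. In the sketch below (Python; all parameter values are invented for illustration), each student’s score depends only on her own ability, yet because partners match on ability, regressing score_i on score_j still yields a clearly positive slope:

```python
import random

rng = random.Random(0)

# 50 pairs; partners share a "pair quality" (assortative matching),
# and each score depends ONLY on the student's own ability.
xs, ys = [], []
for _ in range(50):
    pair_quality = rng.gauss(0, 1)
    abil_i = pair_quality + rng.gauss(0, 0.5)
    abil_j = pair_quality + rng.gauss(0, 0.5)
    ys.append(70 + 10 * abil_i + rng.gauss(0, 2))  # score_i
    xs.append(70 + 10 * abil_j + rng.gauss(0, 2))  # score_j

# OLS slope of score_i on score_j, computed by hand.
mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
beta1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
print(round(beta1, 2))  # positive despite a true peer effect of zero
```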

But before we dive into why it is probably wrong, it is useful to reiterate that this “peer effects” representation is quite general. For example, each of the following outcomes might be determined in part by the influence of peers (however defined):

 

  • Finance: Putting money away into a retirement savings account, adopting a microfinance product, etc.
  • Health behaviors: Obesity, Happiness, use of HIV/AIDS test, etc.
  • Academic performance: Getting good grades, choosing a major.
  • Entrepreneurship: Becoming an entrepreneur; deciding against becoming an entrepreneur.
  • Careers: Quitting; moving to a new company.
  • Adoption of products: Prescribing a drug, buying a car.
  • Adoption of behaviors: Smoking, drinking, sexual events.
  • Adoption of ideas: Learning from patents.
  • Organizational behavior:  Adoption of corporate practices and policies.

The basic idea is simple: We observe some level or change in the behavior or characteristics of an alter (or alters) and we see whether these are correlated to the behaviors or outcomes of Ego.

 

This apparently simple process is much more nuanced and complicated than it appears. There are dozens of “mechanisms” that can lead to the correlation we (or the teacher) might observe. Here are a few reasons why we might observe a correlation, either positive or negative. Consider the case of product adoption.

 

 

  1. Direct transfer of specific information: Alter tells me about a product, but nothing more.
  2. Persuasion effects: Alter tells me about the product, and forcefully persuades me to adopt it.
  3. Direct transfer of general information: Alter tells me about a website that reviews products, and on this page the product that I adopt is listed first.
  4. Role-modeling/imitation: I see Alter doing something, and I copy it.
  5. Install base effects: I see many Alters adopting a product (e.g., buying an iPad), so I adopt the iPad.
  6. Threshold effects: I only buy an iPad if at least 10 people I know own one; once the 10th person adopts, I decide to adopt.
  7. Snob effects: I see Alter(s) doing something, so I avoid doing it myself.
  8. Simultaneity: Alter helps me out and I help her out, and together we perform better than either one would alone because, by talking through a problem, we figure it out together.
  9. Reverse causality: Alter does not affect Ego; rather, Ego affects Alter.
  10. Contextual effects: We are both in the same neighborhood, and because we are exposed to the same billboard, we see the same advertisement for a product, and thus we both adopt it.
  11. Induced environmental effects: Having a high-achieving peer results in a teacher who teaches at a higher level; the student learns more not because of greater transfer of information from her peer, but because teaching quality improves.
  12. Selection bias: I become friends with people who already own iPads.
  13. Homophily effects: I become friends with people who like technology, and because they like technology, they also own iPads.

Can you think of more mechanisms?

 

Which mechanism is actually at play in a specific context?

This question is a hard one. Because we have several potential mechanisms to work with, how do we rule some of them out? Some mechanisms are easier to rule out than others, but most are actually quite difficult to conclusively confirm or deny.

To deal with this issue (which is VERY common during the review process) I have come up with a two-part classification. The first set of mechanisms are what I call “pseudo-mechanisms.” Pseudo-mechanisms are alternative explanations of the correlation that have nothing to do with social influence of the type we care about: influence flowing from the peer to the focal individual. Charles Manski, in a famous paper, defined these as the reflection problem and the selection problem.

Reflection problem: The reflection problem asks you to imagine a mirror. You see two objects moving, and if it is unclear to you that you are looking at a mirror, then you cannot tell which one is the actual person moving and which one is the mirror image. More formally, imagine that we have two sets of variables, x and y: let x be the measurement of the focal individual’s peers’ characteristics at time t, and let y be the measurement of the focal individual’s characteristics at time t. Because of the simultaneous measurement, we are unable to tell whether a change in x has caused a change in y, or vice versa. And this indeterminacy exists for each observation.

Furthermore, we are unable to tell whether each of these actors was exposed to some environmental shock (advertising, etc.) at the same time, which would make their adoption correlated. The only way that we can ensure that the reflection problem is not an issue is by measuring the traits and characteristics of the xs prior to measuring those of y.

However, doing this does not resolve the issue of causality. It is a necessary, but insufficient, condition.

Another important, and much more difficult, condition now has to be met in order for the effect to earn the title “causal.” This is the selection problem. The conditions that solve the selection problem are twofold:

  1. Either you know all the reasons why two people were paired together (i.e., why person y is friends with, shares a room with, or enters college alongside person x).
  2. OR the two individuals are randomly assigned, thus breaking the correlation between the characteristics of x and y.

Assume for a moment that we have ruled out reflection and selection effects by (1) using a lagged measure of peer consumption or action, and (2) randomly pairing ego and alter. Even then, we have only ruled out a handful of possible “mechanisms” producing the peer effects. We can rule out the “pseudo-mechanisms” #8–#13 (except for #11), but that still leaves us with 8 possible mechanisms.
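Why random assignment matters can be seen in a small simulation. In the sketch below (a hypothetical illustration; all numbers are invented), adoption is driven purely by each person’s own latent taste for technology, with no influence at all. Homophilous pairing still produces a strong ego–peer correlation, while random pairing makes it vanish.

```python
import random

random.seed(42)

def pearson(xs, ys):
    """Pearson correlation, computed from scratch."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

n_pairs = 2000
# Latent "taste for technology"; adoption is taste plus idiosyncratic noise,
# so by construction there is NO influence between paired individuals.
tastes = [random.gauss(0, 1) for _ in range(2 * n_pairs)]

def adopt(taste):
    return taste + random.gauss(0, 1)

# Homophilous pairing: befriend someone with a similar taste
# (adjacent people after sorting by taste).
tastes.sort()
pairings = {
    "homophilous": [(tastes[2 * i], tastes[2 * i + 1]) for i in range(n_pairs)],
}
shuffled = tastes[:]
random.shuffle(shuffled)  # random pairing breaks the taste correlation
pairings["random"] = [(shuffled[2 * i], shuffled[2 * i + 1]) for i in range(n_pairs)]

results = {}
for label, pairs in pairings.items():
    ego = [adopt(a) for a, _ in pairs]
    peer = [adopt(b) for _, b in pairs]
    results[label] = pearson(ego, peer)
    print(label, round(results[label], 2))
```

The homophilous condition shows a sizable “peer effect” even though no influence ever occurred; the randomized condition correctly shows none.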

Imagine a doctor telling you that “Yes, we’ve ruled out the fact that you are faking your symptoms, but there are 8 or more possible viruses that could be causing your infection!”

So, we need to now try and distinguish between these.

This is hard, even harder than resolving the reflection and selection problems. The reflection and selection problems are interesting in that they are hard problems to solve, but we know how to solve them. Not to make too many medical analogies, but this is like separating conjoined twins. Hard, but someone can do it and has done it.

So how do we distinguish between different mechanisms, say #1 – #7?

This will depend a lot on context, and a lot on the data that you have available.

Let us examine a very simple situation where we have two students. Let us call the first student “Ego” and let us call the second student “Alter.” Assume for a moment that we have completely alleviated the problems of reflection and selection.

 

[Figure: Ego and Alter]

Let us say that there are really two contender mechanisms. (This is probably not true; but, for a moment, assume that it is.)

Mechanism 1: A student learns general study habits from their peer (alter), and this is why their performance increases.

Mechanism 2: A student interacts a lot with their peer (alter), they study together, and the peer helps the student learn the material.

How would we go about designing a test that would distinguish between these two mechanisms?

  1. For instance, if what the student is getting from her peer is increased motivation, that should have a positive effect on various subjects.
  2. On the other hand, if the student is learning something rather specific (like how to do an integral), then the effects should be subject specific.

Assume you do this test, and you find out that there are effects across subjects, what can you say about the mechanisms? Can you say anything?

How to conduct the estimation in R

Standard peer effects estimations are quite straightforward. This is especially true when you have randomization in the pairing of focal individuals to peers and longitudinal data so you can lag the characteristics of the peer.

score_{i,t+1} = \beta_{0} + \beta_{1} score_{j,t} + \epsilon_{i,t+1}

Here is a synthetic peer effects dataset in which 2000 individuals have been randomly paired: peer_effects.csv.

Let us examine the extent to which there are peer effects.

The model we want to estimate is:

postself_{i,t+1} = \beta_{0} + \beta_{1} prepeer_{j,t} + \epsilon_{i,t+1}

Estimating this equation in R with this data results in:

[R output: regression of the focal individual’s post-treatment score on the peer’s pre-treatment score]

If the randomization is proper, this coefficient should be stable when we control for the focal individual’s own pretreatment score.

[R output: the same regression, controlling for the focal individual’s pretreatment score]

Another worry we have is whether this effect of the peer (captured by the pre-treatment characteristics) is homogeneous or heterogeneous. That is, does it depend on the characteristics of the focal individual or does it apply to everyone? To test this, we include a main effect of the characteristics of the focal individual (self_char) and an interaction term (pre_peer * self_char).

[R output: regression with a self_char main effect and a pre_peer * self_char interaction]

Here, we see that the peer effect depends on a characteristic of the focal individual. If the focal individual has this characteristic (e.g., willingness to listen), the peer effect is larger.
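As a sketch of the same estimation (the lecture uses R; here it is Python, and the data are simulated rather than read from peer_effects.csv, with variable names pre_peer, post_self, and self_char mirroring the lecture):

```python
import random

random.seed(7)

def ols_slope(x, y):
    """Simple-regression slope: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

n = 4000
pre_peer = [random.gauss(0, 1) for _ in range(n)]      # peer's pre-treatment score
self_char = [random.randint(0, 1) for _ in range(n)]   # focal trait, e.g. willingness to listen
# True model: a peer effect of 0.2 that rises to 0.5 when self_char == 1.
post_self = [1.0 + (0.2 + 0.3 * c) * p + random.gauss(0, 0.5)
             for p, c in zip(pre_peer, self_char)]

# Estimate the peer-effect slope separately for each value of self_char.
slope0 = ols_slope([p for p, c in zip(pre_peer, self_char) if c == 0],
                   [y for y, c in zip(post_self, self_char) if c == 0])
slope1 = ols_slope([p for p, c in zip(pre_peer, self_char) if c == 1],
                   [y for y, c in zip(post_self, self_char) if c == 1])
print(round(slope0, 2), round(slope1, 2))
```

Estimating the slope within each subgroup is equivalent to the pre_peer * self_char interaction in the regression: the gap between the two subgroup slopes is the interaction coefficient.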

This is only a simple demonstration of the complexity of peer effects; there are likely to be many interactional factors that turn peer effects “on” or “off” or modulate them in some important way. One could imagine the following contingencies, where peer effects depend on characteristics of:

  • the focal individual
  • the environment
  • the alter/peer
  • personalities of both

 

Entrepreneurial networks

Who is this? Keep this face in mind, at least for a bit.

James Dewey Watson

 

 

Leading in the whitespace

A major breakthrough in our understanding of the social nature of competition came through a series of papers and then a foundational book by Professor Ronald Burt of the University of Chicago, “Structural Holes: The Social Structure of Competition.” While others had made similar arguments before (see Bavelas 1948, and for a fantastic review see Centrality in Social Networks: Conceptual Clarification by Linton Freeman) Burt grounded this idea in theory and provided a very clear framework for other scholars to rethink competition and strategy through this structural lens.

His very powerful argument was to think about “structural holes” as “opportunities.”

That is, bridges across these holes in social structure are sources of value for everyone involved: the person who bridges, as well as those being bridged.

The research that followed resulted in a paradigmatic shift in our understanding of how competition within organizations and in markets functions. The early work made a clean and forceful point: the causal agent is not the “strength or weakness” of a tie, but the fact that bridges create value. Focus on the bridge.

This structural argument was supported by two mechanisms of action. These can be described as the control and information benefits of structural holes.  Consider the three archetypical networks depicted below (I’ve adapted this representation from Krackhardt 1999).

[Figure: three archetypical networks, from few structural holes (left) to many (right)]

On the left, the focal individual “YOU” is in a structure with very few structural holes. That is, all of his connections are connected to each other. On the far right is the high-structural-holes condition. In this case, none of the focal individual’s connections are connected to each other. The intermediate network, which we will discuss later, is theorized to have its own special properties.

The Control Benefits of Structural Holes

Let us examine the control benefits first. In the first representation, who has control?

Consider the situation in the figure on the left. What happens if you cheat one person in the network? They talk to each other. Your reputation suffers. You lose some of your control. So, who is in control? Not you, but the group. The role that closed networks play in creating trust through control is not uncommon. For instance, small businessmen/women in America and other countries often tend to do business with their co-ethnics.

While preventing cheating is a good thing, a closed structure can also be highly constraining. Small, close-knit groups have strong group norms that can force members to conform in unproductive or harmful ways. Innovation, for example, often requires people to take risks, both social and economic, and closed groups might stymie such risk taking.

At the other end of the spectrum, the focal person’s connections are not connected to each other. This lack of connection implies that they cannot communicate, and as a result, information or gossip cannot travel between these disconnected parties as quickly. The focal individual in this case has more control, because they have the freedom to act without others coordinating against them.

If you are in the third structure, there are two specific control benefits that you have:

  • The first strategy for exploiting your control benefits is one where you, the broker, leverage your position to play off two individuals (perhaps buyers, or even sellers) who want the same thing from you. For instance, you can, in subtle ways, make them either lower their demands or increase their willingness to pay.
  • The second control-based strategy is to be a broker between two people (or companies) who have conflicting demands. The broker, in order to get one party to change its demands, can leverage the demands of the other. Furthermore, since these two parties do not interact with each other, the broker has the ability (because of this increased control) to shape the information that one party gets about the other.

These are obviously dangerous strategies – and ones that require a significant amount of finesse and skill.

The Information Benefits of Structural Holes

All is not lost if you can’t pull off the control strategy. Spanning structural holes also provides information benefits. The literature broadly posits three types of information benefits:

  • Access benefits: Access benefits consist of two components. First, because the broker spans structural holes, she connects two groups that do not have a high degree of overlap in their knowledge. Thus, the broker has access to information that is not accessible to those inside the separate, spanned social groups. Second, because the broker receives more diverse information through her diverse connections, when she receives valuable information she knows who can use it.
  • Timing benefits: Information can be transmitted over multiple channels. Consider job postings. Before a job is posted officially, people in the department where the job will be located already know about it. Talking to someone in that department will give you knowledge of the job before everyone else. This subtle difference in timing can mean the difference between getting and not getting a job. Because the broker gets information through informal channels, she often has access to information before others. Timing matters in many contexts, including venture deals, hiring, knowing a house is on the market, etc.
  • Referrals: Trust matters. Period. People avoid hiring people, buying products, or investing in companies that they have limited information about. Those who span structural holes have contacts in different social worlds, each with its own opportunities. Contacts in these social circles can refer you into their own networks, thereby increasing your trustworthiness.

The Structural Holes in DNA

Ok, now that we have the theory down, I want to share a real-life example that exemplifies the beauty of the theory of structural holes.

This is James Watson, one of the co-discoverers of the structure of DNA. This discovery is described by many as one of the most (if not the most) important single scientific discoveries of the 20th century. In his gripping account of this discovery, The Double Helix, he recounts how he and Francis Crick discovered the structure of DNA.

James Dewey Watson

Here are some quotes about the quest for the structure of DNA from the Nobel Prize website:

In the late 1940’s, the members of the scientific community were aware that DNA was most likely the molecule of life, even though many were skeptical since it was so “simple.”

…Nobody had the slightest idea of what the molecule might look like.

In order to solve the elusive structure of DNA, a couple of distinct pieces of information needed to be put together…

As in the solving of other complex problems, the work of many people was needed to establish the full picture.

[Figure]

Francis Crick, a brilliant scientist, was already at Cambridge when James Watson arrived. Watson describes Crick:

“Before my arrival in Cambridge, Francis only occasionally thought about deoxyribonucleic acid (DNA) and its role in heredity.  This was not because he thought it uninteresting. Quite the contrary.

Francis, nonetheless, was not then prepared to jump into the DNA world…[S]uch a decision would create an awkward personal situation.  At this time molecular work on DNA in England was, for all practical purposes, the personal property of Maurice Wilkins, a bachelor who worked in London at King’s College…It would have looked very bad if Francis had jumped in on a problem that Maurice had worked over for several years. The matter was even worse because the two, almost equal in age, knew each other and, before Francis remarried, had frequently met for lunch or dinner to talk about science.

The combination of England’s coziness – all the important people, if not related by marriage, seemed to know one another – plus the English sense of fair play would not allow Francis to move in on Maurice’s problem.”

Watson, on the other hand, was an outsider. He describes a few episodes that were critical to his discovery of the structure of DNA.

[Figure]

Break #1:

At a conference in the spring of 1951 in Naples, Watson heard Maurice Wilkins’ talk on the molecular structure of DNA.

“I proceeded to forget Maurice, but not his DNA photograph.”

Break #2:

A manuscript on DNA (as a triple helix) had been written, a copy of which would soon be sent to Peter Pauling, the son of Linus Pauling, a Nobel Prize winner and a scientist who was working on the structure of DNA himself.

Break #3:

Knowledge of Chargaff’s rules, acquired through his doctoral training in Indiana.

Watson had unique access, through his network, to the photos produced by Rosalind Franklin in Wilkins’s lab, the unpublished manuscript prepared by Linus Pauling, and exposure to Erwin Chargaff’s rules about the ratio of bases in DNA. Because of his position, he was able to put these pieces together faster than anyone else.

All three processes helped Watson:

  • Access to novel information.
  • Timing, getting access to information before it was published.
  • Referrals, through his famous, Nobel Prize-winning advisor, who enabled him to hop from one great lab in Europe to another and gain access to conferences that he would not otherwise have been able to attend.

Luck? No. Social Networks.

Growing your network strategically

Structural holes theory also implies a series of tradeoffs between the size of one’s network and the benefits that the network produces. A large network is not necessarily a good thing. This is because maintaining a network connection implies some cost and results in some benefit.

  • Decreasing returns to network size: If we measure benefits in units of novel information, one could imagine that adding a new tie entails some cost (time, resources, emotional energy, etc.) but does not yield access to much new information (e.g., you hear about the same job opportunities from the new connection that you heard about from an existing friend or acquaintance). So, at least in terms of information, there are decreasing returns to network size: you pay the additional cost of the new connection, but it provides less information per unit cost than a prior connection.
  • Constant returns to network size: A more palatable case is constant returns. Here, doubling your network size doubles the amount of information you have access to. Every new connection provides information in proportion to what the prior connections provided.
  • Increasing returns to network size: The most ideal situation is one where doubling the size of your network more than doubles the information you get. Is this even possible, given that adding a new connection that provides more information than before might also be substantially more costly?

In any case, you clearly want to be at a point before your costs of maintaining a network significantly outweigh any benefits that you get.

Structural holes theory provides some useful guidance on not going too far down the route of decreasing returns to size. A good heuristic for understanding this tradeoff is a calculation developed by Professor Ronald Burt called efficiency. Efficiency can be calculated in the following way:

Efficiency = Effective Size / Actual Size

Expanding this function out, we can define:

Actual size = The number of connections that you have.

Effective size = Actual size − the sum, across your connections, of each connection’s redundancy (the share of your other ties that it overlaps with).
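As a sketch, here is one common simplification of effective size for an undirected ego network (effective size = n − 2t/n, where n is the number of alters and t the number of ties among them, a formulation due to Borgatti); the names and numbers below are purely illustrative.

```python
def effective_size(alters, alter_ties):
    """Simplified effective size for an undirected ego network:
    n - 2t/n, where n = number of alters, t = ties among alters."""
    n = len(alters)
    t = sum(1 for a, b in alter_ties if a in alters and b in alters)
    return n - 2 * t / n

def efficiency(alters, alter_ties):
    """Burt's efficiency: effective size divided by actual size."""
    return effective_size(alters, alter_ties) / len(alters)

# A fully closed ego network: 4 alters, all tied to one another (t = 6).
clique = {"a", "b", "c", "d"}
clique_ties = [("a", "b"), ("a", "c"), ("a", "d"),
               ("b", "c"), ("b", "d"), ("c", "d")]
print(effective_size(clique, clique_ties))  # 1.0: the contacts are fully redundant
print(efficiency(clique, clique_ties))      # 0.25

# An open, hole-rich network: the same 4 alters with no ties among them.
print(effective_size(clique, []))           # 4.0: every contact is non-redundant
print(efficiency(clique, []))               # 1.0
```

The two extremes match the theory: a fully closed ego network of any size is informationally worth about one contact, while a fully open one is worth its full size.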

Bandwidth and Diversity

The model above has been tremendously useful and very predictive. In recent years, some scholars have also highlighted another interesting tradeoff between stronger non-bridging ties and weaker bridging ties: the bandwidth/diversity tradeoff.

On one hand, higher-bandwidth ties result in greater informational volume. On the other hand, weaker bridging ties result in greater variance in information.

Recent work suggests this relationship depends fundamentally on the nature of the environment in which people are building their social networks. There are two factors that can reduce the value of bridging ties and privilege high-bandwidth ties:

  1. If the network has a homogeneous knowledge base, where most people talk about the same things, then having more high-bandwidth ties may be more important.
  2. If the “refresh rate” is high, where people’s contacts and interactions churn very quickly, or where the environment is turbulent and the information is extremely complex (meaning that an idea contains multiple topics or subjects), then high-bandwidth ties are better at sustaining the high-variance information you need.

However, what studies have found is that “strong” bridging ties, which have both bandwidth and diversity, are the best, but they are indeed rare.

Extending the Core Insights from Structural Holes Theory

As one can imagine, structural holes theory was extremely powerful, and scholars have been working to extend and refine its predictions to account for structures that don’t fit neatly into the standard dichotomy or that have dynamic elements.

Consider dynamics: Given how difficult it is to maintain bridging positions, it is likely that bridges are fragile. Research suggests that bridging ties follow what is called a kinked decay function. Initially, bridges have a low likelihood of breaking, followed shortly by a sharp rise in decay; if the bridge survives this spike in decay rates, it is likely to persist for a long time.

Two processes often lead to decay:

  • Disintermediation: Disconnected parties learn to exchange on their own.
  • Competition from rival brokers: Rivals enter the fray and, by offering either greater benefits or lower costs, whittle away at the original broker’s benefits from occupying the hole. Indeed, the hole no longer exists.

Why bridges decay:

  • High performers have lower rates of bridge decay than low performers.
  • If other relations are decaying, bridges are also likely to decay.
  • Experience with bridging improves the chances that new bridges survive.
  • “Hole decay” may be limited when:
    • Deep barriers limit interaction across the hole.
    • The benefits to the bridged parties are high enough and switching costs are high.
    • The bridged individuals don’t question the role of the broker, or it is not salient to them.

Beyond Information and Control

There are also cases where brokering is disadvantageous. The underlying mechanisms leading to the disadvantages of brokering have to do with identity and expectations.

  • In addition to information, networks also convey who one is (identity) and expectations about how one should behave. Many of us have been caught between two groups that expect different things from us. This happens at work, at home, and even in our social and personal lives with friends. The more disconnected our connections are, the more likely it is that they have different expectations about how we should behave. Podolny and Baron (1997) show that when a person is a broker in a network that conveys “identity,” they are less likely to benefit from their brokerage position than when the network primarily provides “information.”
  • Similarly, Krackhardt, in his Simmelian tie theory, makes a related argument: brokering between two strongly connected groups creates pressure to conform to conflicting norms, which can create internal role conflict and stress, and thus reduce performance.

Outcomes as Mean versus Variance

The theories that we have focused on thus far attempt to predict mean or expected outcomes. That is, what is the average difference in wages/promotion rates/bonuses/ideas for those with and without structural holes? The graph below shows a mean shift: the blue distribution (e.g., the structural-holes condition) has a higher mean outcome.

[Figure: two outcome distributions with different means]

However, this analysis can be pushed further by asking: is there a shift in the variance of potential outcomes? Does a specific structure reduce or increase the possible variation in outcomes? Note that the blue distribution below is “tighter” than the black distribution. The black distribution has a greater likelihood of worse, but also better, outcomes.

Which would you prefer below?

[Figure: two outcome distributions with different variances]

James Lincoln of UC Berkeley did pioneering studies on business networks in Japan and found that companies that were members of a keiretsu, while having lower mean outcomes, also had lower variation; as a consequence, they were less likely to do extremely poorly, but also less likely to do extremely well.

With respect to brokerage, we can also think about floors and ceilings. Networks that are high in closure reduce variation in performance, both high and low.

High performance is capped because the high performers subsidize the lower performers, and the low performers don’t do as poorly because the high performers help them out.

The network structures that tend to most facilitate the low-variance strategy are closed networks, as one can imagine.

The classic examples of this are ethnic networks, where the wealthier members help out the less fortunate ones.

Network Positions and Advantage: Status

 

One of the most important things we do on a day-to-day basis is make predictions about the value of individuals, companies, or really any entity. Making such predictions is challenging because we have limited information about the qualities of the entity we are evaluating. For instance:

  •  A hiring manager at a firm is trying to make a prediction about whether a certain applicant will be a high performer.
  • A PhD admissions committee makes predictions about whether an applicant to their program will turn into a star researcher.
  • A venture capitalist makes predictions about whether a startup or founding team will create a breakthrough product that will become a billion dollar company.
  • A search engine is making a prediction about whether a certain webpage contains useful information for its users.
  • A consumer makes predictions about the quality of a product before he/she buys it.

Predictions of this type are commonplace and often rather difficult to make. The difficulty exists because only a limited set of characteristics is observable to the decision maker, whereas much else is unobservable. A hiring manager, for instance, may observe a resume and a list of references. Based on these, she attempts to make inferences about many things: how hard-working the applicant is, their base of knowledge, their ability to get along with other members of her team, and so on. Thus, the hiring manager attempts to use “observables” to infer something about the unobservables.

The goal, therefore, is to map observables (the things that you can easily measure and observe about someone or some organization) to unobservables. What are some examples of unobservables, or things that are difficult to observe?

  • Creativity
  • Whether a person you hire will “fit” with an organization’s culture
  • Whether a company you invest in will turn a profit
  • Trustworthiness

The inability to effectively communicate information about these hard-to-quantify traits becomes a problem for the evaluator and, in many cases, for the person being evaluated, particularly if they are of high quality but others can’t tell that this is the case.

That is, how does one separate the signal from the noise?

One solution proposed to this problem is signaling theory. People send signals, and these signals contain information that allows “buyers” to ascertain whether the seller (say, a job market candidate) is of high quality. But anyone can send signals, and sometimes the signals are noisy or uninformative. If the signals are no good, then they don’t solve the asymmetric information problem.

Michael Spence argued that some signals are harder to acquire than others, and this difficulty in acquiring the signal is related to some dimension of underlying quality.

For instance, a hiring manager might be looking to hire someone with great machine learning talent. Anyone can put “machine learning” on a resume, so merely doing so isn’t likely to be a very good signal of having that skill. However, it is probably much easier to win a Kaggle competition if you have good machine learning skills than if you do not. As a result, those with more machine learning skill are more likely to be represented among Kaggle winners. Thus, winning a Kaggle competition is likely to be a decent signal of ML skill, and since it is easily observable, it is a practical one.

 

 

Can you think of other signals that contain a lot of information and are difficult to fake?

Joel Podolny, in a series of articles, proposed that social relations also help signal quality. This is a profound idea, and I will walk through it further, eventually connecting it to another application of eigenvector centrality: the original Google PageRank algorithm.

For example, social cues such as endorsements, recommendations, funding decisions, or hiring decisions convey, or signal, information.

[Figure: James and Betty, each with two connections]

Consider James and Betty. Both have two connections of their own, and both of their connections think highly of them and recommend them. In an abstract sense, Betty and James are rated by their raters, i.e., their two connections. But a new problem arises: who has more reliable raters?

This is what we can call the “rating the raters” problem. While at the first degree out (the direct connections of these two individuals) they are indistinguishable, there is substantial variation in their second- and third-degree ties. Although James and Betty have similarly sized networks, Betty’s connections have far more connections of their own.

[Figure: the second- and third-degree ties of James and Betty]

While it is relatively easy to figure out the difference in size between Betty’s and James’s second-degree networks, the problem gets more complicated the further out we move. Real networks don’t stop at the 2nd or 3rd degree; they extend out to the 4th, 5th, 6th, and beyond. The second problem is that real networks aren’t usually trees. Networks loop back on themselves over and over again, which makes the “rating the raters” problem hard. So we cannot just re-weight each rating by the ratings received by the rater.

There is a concept, called eigenvector centrality, that does exactly what we thought was hard: it rates the raters, the raters’ raters, the raters’ raters’ raters, and so on. This measure gives us a nice summary statistic telling us how much “status” a node in the network has. It is hard to fake, because you can perhaps fake your own network ties, but not the ties of your connections’ connections. The nodes below, for instance, are resized by eigenvector centrality.
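Eigenvector centrality can be computed with a short power iteration: repeatedly replace each node’s score with the sum of its neighbors’ scores, then renormalize. The sketch below runs this on a hypothetical graph in the spirit of the James-and-Betty example (the node names are invented here):

```python
def eigenvector_centrality(edges, iterations=100):
    """Power iteration for eigenvector centrality on an undirected graph."""
    nbrs = {}
    for u, v in edges:
        nbrs.setdefault(u, set()).add(v)
        nbrs.setdefault(v, set()).add(u)
    score = {node: 1.0 for node in nbrs}
    for _ in range(iterations):
        # Each node's new score is the sum of its neighbors' current scores.
        new = {node: sum(score[m] for m in nbrs[node]) for node in nbrs}
        norm = max(new.values())
        score = {node: s / norm for node, s in new.items()}
    return score

# Betty and James each have two direct ties, but Betty's contacts
# have well-connected contacts of their own.
edges = [("Betty", "u1"), ("Betty", "u2"),
         ("u1", "a"), ("u1", "b"), ("u2", "c"), ("u2", "d"),
         ("James", "v1"), ("James", "v2")]
scores = eigenvector_centrality(edges)
print(scores["Betty"] > scores["James"])  # True: Betty's raters are better rated
```

Even though Betty and James have the same degree, the iteration rewards Betty for having raters who are themselves well rated.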

[Figure: a network with nodes resized by eigenvector centrality]

The problem of determining the “value” or credibility of an object based on its connections, and its connections’ connections, is a general one. Google’s original algorithm, PageRank, is sociometric status. The basic intuition of PageRank was that if a site gets a lot of incoming links, and the sites linking to it get many incoming links as well, and so on, then there must be some value to it. The insight arises from viewing the Web as a network and using its structure to determine whether a page is useful or not.
[Figure: PageRank illustrated on a network of web pages]
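PageRank applies the same rating-the-raters logic to directed links, adding a damping factor so that rank does not drain out of dead ends. Below is a minimal textbook-style sketch (the toy link graph is invented for illustration; this is not Google’s production system):

```python
def pagerank(links, damping=0.85, iterations=100):
    """Iterative PageRank over a dict mapping each page to the pages it links to."""
    nodes = set(links) | {v for targets in links.values() for v in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new = {node: (1 - damping) / n for node in nodes}
        for page in nodes:
            targets = links.get(page, [])
            if targets:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for t in nodes:
                    new[t] += damping * rank[page] / n
        rank = new
    return rank

# "C" is endorsed by pages that are themselves endorsed; "D" only links out.
links = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["C"]}
rank = pagerank(links)
print(rank["C"] > rank["D"])  # True
```

Note that the total rank stays fixed at 1.0: each round redistributes existing rank along links rather than creating it, which is what makes high rank hard to fake.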

Ego and Altercentric Perspectives

Now that we have the basic concept of sociometric status down, we can turn to the “big idea” in sociology, which came from Joel Podolny. He suggested that we had focused primarily on seeing networks as “pipes” through which information, resources, support, and other “stuff” flows. However, networks are also useful to individuals in resolving problems of uncertainty, because certain types of network structures also signal trust, reputation, and identity. Network structures are prisms that reveal information as well.

The extent to which networks operate as pipes or as prisms depends on the level of uncertainty faced by market participants. Podolny developed a highly useful framework for characterizing which structure may matter when. There are two types of uncertainty: egocentric and altercentric.

 

[Figure 1]

Fig. 1.—Illustrative markets arrayed by altercentric and egocentric uncertainty

Egocentric uncertainty

A market or market segment can rate highly on one type of uncertainty without rating highly on the other.

 

Consider the four markets represented in the figure above. From Podolny (2001):

Vaccines: Begin with the market for a particular vaccine, such as polio or smallpox, in the upper left-hand quadrant. The most salient source of uncertainty in this market is that which underlies the development of the vaccine. Once the vaccine is developed and given regulatory approval, there is little uncertainty on the part of consumers as to whether they will benefit from the innovation. Accordingly, the market for a vaccine rates high on egocentric uncertainty, but low on altercentric uncertainty.

 

Roofers: Alternatively, consider the market in the lower right-hand corner, a regional market for roofers. “Roofing technology” is relatively well understood, and while roofers may face some uncertainty as to who needs a roof in any particular year, they can be confident that every homeowner will need repair work or a replacement every 20 years or so. By sending out fliers or advertising in the yellow pages, they can be assured of reaching a constituency with a demand for their service. However, because an individual consumer only infrequently enters the market, the consumer is generally unaware of quality-based distinctions among roofers. The consumer may be able to alleviate some of this uncertainty through consultation with others who have recently had roof repairs; however, the need for such consultation is an illustration of the basic point. Only through such search and consultation can the consumer’s relatively high level of uncertainty be reduced. Accordingly, this is a market that is comparatively low in terms of egocentric uncertainty, but relatively high in terms of altercentric uncertainty.

 

What are some other examples of markets that are low on one type of uncertainty and high on another? What about markets that are high on both?

 

 

How does one deal with altercentric uncertainty?

Let us loop back to our earlier discussion of sociometric status. Why is sociometric status a useful signal to help resolve altercentric uncertainty?

  • Sociometric Status: A position in a social network – defined by the ties that you have to others – where you receive deference from others who are themselves highly respected or deferred to.

 

 

When does status go awry?

However, there are many instances where status does not serve as a perfect signal of quality, and this can lead to misperceptions of status and thus misperceptions of quality. When status is a perfect signal of quality, there is said to be tight coupling between status and quality. However, as I mentioned, this is often not the case.

 

Matthew Effect / Self-fulfilling prophecy: The classic example of this is the phenomenon of the 41st chair. The French Academy has only 40 chairs, and there is perhaps no substantive difference in quality between candidate #40 and candidate #41. Yet the 40th person becomes the holder of a chair and the 41st does not. The 40th person then gets more rewards, recognition, and resources, which in turn allows them to do better work than those who lack such resources. In sociological parlance, the phenomenon of the 41st chair is called “decoupling”: the relationship between quality and status breaks down, because the 40th person gains far more status than the 41st.

 

Buy low, sell high: This decoupling creates an arbitrage opportunity for managers, because most people rely on status signals that are imperfect. There are two possible strategies to exploit this gap:

  1. Figure out a more readily observable representation of social signals that maps onto quality more tightly, and sell that information.
  2. Figure out a way to measure sociometric status in a situation where it is not currently used, and then use it as a better basis for valuation.

Beyond the basics

The study of sociometric (and other forms of) status is an extremely rich area of research in organizational sociology and economic sociology. I have merely scratched the surface of this topic.

Some excellent articles and reviews in this stream include:

Stuart, Toby E., Ha Hoang, and Ralph C. Hybels. “Interorganizational Endorsements and the Performance of Entrepreneurial Ventures.” Administrative Science Quarterly 44.2 (1999): 315-349.

Sauder, Michael, Freda Lynn, and Joel M. Podolny. “Status: Insights from organizational sociology.” Annual Review of Sociology 38 (2012): 267-283.

Lynn, Freda B., Joel M. Podolny, and Lin Tao. “A Sociological (De) Construction of the Relationship between Status and Quality.” American Journal of Sociology 115.3 (2009): 755-804.

Chen, Ya-Ru, et al. “Introduction to the special issue: Bringing status to the table—attaining, maintaining, and experiencing status in organizations and markets.” (2012): 299-307.

Phillips, Damon J., and Ezra W. Zuckerman. “Middle-Status Conformity: Theoretical Restatement and Empirical Demonstration in Two Markets.” American Journal of Sociology 107.2 (2001): 379-429.

Network Positions and Advantage: Structural Holes

Who is this? Keep this face in mind, at least for a bit.

James Dewey Watson

In the prior lecture we discussed the simple micro-macro-micro process described in Granovetter (1973), “The Strength of Weak Ties.” Recall what we discussed: the forbidden triad is forbidden because it is generally unstable in equilibrium, because it is unbalanced.

Picture1.png

The forbidden triad is particularly unstable for strong ties, in which strength increases as some function of:

  • The amount of time that two people spend together
  • The emotional intensity of the interaction
  • The intimacy between the two parties (i.e., mutual confiding)
  • The reciprocal services which the two parties engage in.

The way to sustain the “bridge structure” implied by the forbidden triad is to weaken one of these conditions. The weak tie that results can allow “bridges” or “brokerage” to persist across the distinct and differentiated strong-tie clusters that divide the social world.
Picture1

One key assumption that we make is that different information is being discussed in these different groups. For instance, these groups could be scientific research communities, regional economic clusters, different departments in the same business school, and so on. We start with the assumption that people in these different groups are doing different things, may have different cultures, and are members of different disciplines. Information within a cluster–e.g., information that persons 1 and 2 in Group A possess–is much more likely to be redundant than information across clusters. Consequently, information in groups A and B is said to be non-redundant. That is, a person from group A, by talking to someone in group B, is more likely to learn something new than if she talked to someone else from group A.

The “big idea” from the Strength of Weak Ties hypothesis is that there are “holes” in the social structure and that weak ties are the conduits that can transmit information across these holes. Thus, more weak ties mean that people have access to more, and newer, information.

Picture1

The Holes in Social Structure

The crystal-clear mechanisms implied by the weak tie hypothesis can be credited to the imagination of an author who saw something that others missed. The empirical facts of the original paper were consistent with the hypothesis, but the measurement did not capture spanning the holes in the structure per se. The theoretical argument was that weak ties, because of how they arise, should correspond to this structural configuration.

Another major breakthrough came through a series of papers and then a foundational book by Professor Ronald Burt of the University of Chicago, “Structural Holes: The Social Structure of Competition.” While others had made similar arguments before (see Bavelas 1948, and for a fantastic review see Centrality in Social Networks: Conceptual Clarification by Linton Freeman) Burt grounded this idea in theory and provided a very clear framework for other scholars to rethink competition and strategy through this structural lens.

His very powerful argument was that we should think about “structural holes” as “opportunities.”

That is, bridges across these holes in social structure are sources of value for everyone involved: the person who bridges, as well as those being bridged.

The research that followed resulted in a paradigmatic shift in our understanding of how competition within organizations and in markets functions. The early work made a clean and forceful point: the causal agent is not the “strength or weakness” of a tie, but the fact that bridges create value. Focus on the bridge.

This structural argument was supported by two mechanisms of action. These can be described as the control and information benefits of structural holes.  Consider the three archetypical networks depicted below (I’ve adapted this representation from Krackhardt 1999).

Picture1.png

On the left, the focal individual “YOU” is in a structure with very few structural holes. That is, all of his connections are connected to each other. On the far right is the high structural holes condition: none of the focal individual’s connections are connected to each other. The intermediate network, which we will discuss later, is theorized to have its own special properties.

The Control Benefits of Structural Holes

Let us examine the control benefits first. In the first representation, who has control?

Consider the situation in the figure on the left. What happens if you cheat one person in the network? They talk to each other. Your reputation suffers. You lose some of your control. So, who is in control? Not you, but the group. Closed networks commonly create trust through this kind of control. For instance, small-business owners in America and other countries often do business with their co-ethnics.

While preventing cheating is a good thing, a closed structure can also be highly constraining. Small, close-knit groups have strong norms that can force members to conform in unproductive or harmful ways. Innovation, for example, often requires people to take risks, both social and economic, and closed groups might stymie such risk taking.

At the other end of the spectrum, the focal person’s connections are not connected to each other. This lack of connection implies that they cannot communicate, and as a result, information or gossip cannot travel between these disconnected parties as quickly. The focal individual in this case has more control, because they have the freedom to act without others coordinating against them.

If you are in the third structure, there are two specific control benefits that you have:

  • The first strategy is to be the broker who leverages her position to play off two individuals (perhaps buyers, or even sellers) who want the same thing from you. For instance, you can, in subtle ways, make them lower their demands or increase their willingness to pay.
  • The second strategy is to be a broker between two people (or companies) who have conflicting demands. The broker, in order to get one party to change its demands, can leverage the demands of the other. Furthermore, since the two parties do not interact with each other, the broker has the ability (because of this increased control) to shape the information that each party gets about the other.

These are obviously dangerous strategies – and ones that require a significant amount of finesse and skill.

The Information Benefits of Structural Holes

All is not lost if you can’t pull off the control strategy. Spanning structural holes also provides information benefits. The literature broadly posits three types of information benefits:

  • Access benefits: Access benefits consist of two components. First, because the broker spans structural holes, she connects two groups that do not have a high degree of overlap in their knowledge; thus, she has access to information that is not available to those inside the separate, spanned groups. Second, because her connections are diverse, the information she receives is diverse, so when she receives a valuable piece of information she knows who can use it.
  • Timing benefits: Information can be transmitted over multiple channels. Consider job postings. Before a job is posted officially, people in the department where the job will be located know about it. Talking to someone in that department gives you knowledge of the job before everyone else. This subtle difference in timing can mean the difference between getting and not getting a job. Because the broker gets information through informal channels, she often has access to information before others. Timing matters in many contexts, including venture deals, hiring, knowing a house is on the market, and so on.
  • Referrals: Trust matters. Period. People avoid hiring people, buying products, or investing in companies that they have limited information about. Those who span structural holes have contacts in different social worlds, with different opportunities. These contacts can refer you into their own networks, thereby increasing your trustworthiness.

The Structural Holes in DNA

Ok, now that we have the theory down, I want to share an example from real life that exemplifies the beauty of the theory of structural holes.

This is James Watson, one of the co-discoverers of the structure of DNA. This discovery is described by many as one of the most (if not the most) important single scientific discoveries of the 20th century. In his gripping account of this discovery, The Double Helix, he recounts how he and Francis Crick discovered the structure of DNA.

James Dewey Watson

17th October 1962: American biochemist Dr. James Dewey Watson seated in his lab at Harvard University, Massachusetts. He shared the 1962 Nobel Prize in medicine for the discovery of the molecular structure of DNA. (Photo by Hulton Archive/Getty Images)

Here are some quotes about the quest for the structure of DNA from the Nobel Prize website:

In the late 1940’s, the members of the scientific community were aware that DNA was most likely the molecule of life, even though many were skeptical since it was so “simple.”

…Nobody had the slightest idea of what the molecule might look like.

In order to solve the elusive structure of DNA, a couple of distinct pieces of information needed to be put together…

As in the solving of other complex problems, the work of many people was needed to establish the full picture.

Picture1.png

Francis Crick, a brilliant scientist, was already at Cambridge before James Watson arrived. Watson describes Crick:

“Before my arrival in Cambridge, Francis only occasionally thought about deoxyribonucleic acid (DNA) and its role in heredity.  This was not because he thought it uninteresting. Quite the contrary.

Francis, nonetheless, was not then prepared to jump into the DNA world…[S]uch a decision would create an awkward personal situation.  At this time molecular work on DNA in England was, for all practical purposes, the personal property of Maurice Wilkins, a bachelor who worked in London at Kings College…It would have looked very bad if Francis had jumped in on a problem that Maurice had worked over for several years. The matter was even worse because the two, almost equal in age, knew each other and, before Francis remarried, had frequently met for lunch or dinner to talk about science.

The combination of England’s coziness – all the important people, if not related by marriage, seemed to know one another – plus the English sense of fair play would not allow Francis to move in on Maurice’s problem.”

Watson, on the other hand, was an outsider. He describes a few episodes that were critical to his discovery of the structure of DNA.

Screen Shot 2017-05-02 at 11.09.10 AM.png

Break #1:

At a conference in the spring of 1951 in Naples, Watson heard Maurice Wilkins’ talk on the molecular structure of DNA.

“I proceeded to forget Maurice, but not his DNA photograph.”

Break #2:

A manuscript on DNA (as a triple helix) had been written by Linus Pauling, Nobel Prize winner and a scientist who was working on the structure of DNA himself; a copy would soon be sent to his son, Peter Pauling.

Break #3:

Knowledge of Chargaff’s rules through his doctoral training in Indiana.

Watson had unique access, through his network, to the photos produced by Rosalind Franklin in the Wilkins lab, the unpublished manuscript prepared by Linus Pauling, and exposure to Erwin Chargaff’s rules about the ratio of bases in DNA. Because of his position, he was able to put these pieces together faster than anyone else.

All three processes helped Watson:

  • Access to novel information.
  • Timing, getting access to information before it was published.
  • Referrals: through his famous, Nobel Prize-winning advisor, he was able to hop from one great lab in Europe to another, and to attend conferences that would otherwise have been closed to him.

Luck? No. Social Networks.

Growing your network strategically

Structural holes theory also implies a series of tradeoffs between the size of one’s network and the benefits that the network produces. A large network is not necessarily a good thing. This is because maintaining a network connection implies some cost and results in some benefit.

  • Decreasing returns to network size: If we measure benefits in units of novel information, one could imagine that adding a new tie entails some cost (time, resources, emotional energy, etc.) but does not result in access to much more new information (e.g., you hear about the same job opportunities from the new connection that you heard about from an existing friend or acquaintance). So, at least in terms of information, there are decreasing returns to network size: you pay the additional cost of the new connection, but it provides less information per unit cost than a prior connection.
  • Constant returns to network size: A more palatable case is constant returns. Here doubling your network size, doubles the amount of information you have access to. Every new network connection provides information in proportion to what the prior network connections provided.
  • Increasing returns to network size: The most ideal situation is one where doubling the size of your network more than doubles the information you get. Is this even possible, since adding a new network connection that provides more information than before might also be substantially more costly?
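The decreasing-returns case can be seen in a toy simulation, assuming (purely for illustration) that each contact exposes you to a random sample of items from a shared information pool, so overlap grows as the network grows:

```python
import random

def novel_info(n_ties, pool=100, per_tie=20, seed=0):
    """Count the distinct pieces of information reached through n_ties contacts,
    when each contact knows per_tie items drawn from a shared pool."""
    rng = random.Random(seed)
    seen = set()
    for _ in range(n_ties):
        seen.update(rng.sample(range(pool), per_tie))
    return len(seen)

for n in (1, 2, 4, 8):
    print(n, novel_info(n))  # marginal novelty shrinks as ties overlap more
```

With eight ties you reach far fewer than 8 × 20 distinct items: each additional contact increasingly repeats what you already know.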

In any case, you clearly want to be at a point before your costs of maintaining a network significantly outweigh any benefits that you get.

Structural holes theory provides some useful guidance on not going too far down the route of decreasing returns to size. A good heuristic for understanding this tradeoff is a calculation developed by Professor Ronald Burt called efficiency. Efficiency can be calculated in the following way:

Efficiency = Effective Size / Actual Size

Expanding this function out, we can define:

Actual size = The number of connections that you have.

Effective size = Actual Size – Sum of percent of overlapping ties for each of your connections.
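As a sketch, effective size can be computed from a binary, symmetric adjacency matrix using Borgatti's simplification of Burt's measure (n minus twice the number of alter-alter ties divided by n); the two tiny example networks here are hypothetical:

```python
import numpy as np

def effective_size(adj, ego):
    """Effective size of ego's network for a binary, symmetric adjacency
    matrix: n - 2t/n, where n is ego's number of contacts and t is the
    number of ties among those contacts."""
    alters = np.flatnonzero(adj[ego])
    n = len(alters)
    if n == 0:
        return 0.0
    t = adj[np.ix_(alters, alters)].sum() / 2  # alter-alter ties, counted once
    return n - 2 * t / n

# Closed triangle: ego 0's two contacts know each other.
closed = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
# Open broker position: ego 0's two contacts are unconnected.
broker = np.array([[0, 1, 1], [1, 0, 0], [1, 0, 0]])

print(effective_size(closed, 0) / 2)  # efficiency 0.5: one contact is redundant
print(effective_size(broker, 0) / 2)  # efficiency 1.0: no redundancy
```

Dividing effective size by actual size (2 in both examples) gives the efficiency ratio defined above.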

 

 

Bandwidth and Diversity

The model above has been tremendously useful and very predictive. In recent years, some scholars have also highlighted another interesting tradeoff between stronger non-bridging ties and weaker bridging ties: the bandwidth/diversity tradeoff.

On one hand, higher-bandwidth ties result in greater informational volume. On the other hand, weaker bridging ties result in greater variance in information.

Recent work suggests this relationship depends fundamentally on the nature of the environment in which people are building their social networks. There are two factors that can reduce the value of bridging ties and privilege high-bandwidth ties:

  1. If the network contains a homogeneous set of knowledge, where most people talk about the same things, then having more high-bandwidth ties may be more important.
  2. If the “refresh rate” is high, where people’s contacts and interactions churn very fast, or where the environment is turbulent and the information is extremely complex (meaning that an idea contains multiple topics or subjects), then high-bandwidth ties are better at sustaining the high-variance information you need.

However, what studies have found is that “strong” bridging ties, which have both bandwidth and diversity, are the best, but they are indeed rarer.

Extending the Core Insights from Structural Hole Theory

As one can imagine, structural holes theory was extremely powerful and scholars have been working to extend and refine the predictions of the theory further to account for structures that don’t neatly fit into the standard dichotomy or have dynamic elements.

Consider dynamics: given how difficult it is to maintain bridging positions, it is likely that bridges are fragile. Research suggests that bridging ties follow what is called a kinked decay function. Initially, bridges have a low likelihood of breaking, followed shortly by a sharp rise in decay; if the bridge survives this spike in decay rates, it is likely to persist for a long time.

Two processes often lead to decay:

  • Disintermediation: Disconnected parties learn to exchange on their own.
  • Competition from rival brokers: Rivals enter the fray and by offering either greater benefits or lower cost, whittle away at the original bridge’s benefits from occupying the hole. Indeed, the hole no longer exists.

Why bridges decay:

  • Low performers’ bridges decay faster; high performers have lower rates of bridge decay
  • If other relations are decaying, bridges are also likely to decay
  • Experience bridging improves the chances that new bridges survive
  • “Hole decay” may be limited when:
    • Deep barriers limit interaction across the hole.
    • The benefits to the bridged parties are high enough and switching costs are high.
    • The bridged individuals don’t question the role of the broker, or it is not salient to them.

Beyond Information and Control

There are also cases where brokering is disadvantageous. The underlying mechanisms behind the disadvantages of brokering have to do with identity and expectations.

  • In addition to information, networks also convey expectations about who one is (identity) and how one should behave (expectations). Many of us have been caught between two groups that expect different things from us. This happens at work, at home, and even in our social and personal lives with friends. The more disconnected our connections are, the more likely it is that they have different expectations about how we should behave. Podolny and Baron (1997) show that when a person is a broker in a network that conveys “identity,” they are less likely to benefit from their brokerage position than when the network primarily provides “information.”
  • Similarly, Krackhardt, in his Simmelian tie theory, makes a related argument: brokering between two strongly connected groups creates pressure to conform to different norms, which can create internal role conflict and stress, and thus reduce performance.

Outcomes as Mean versus Variance

The theories that we have focused on thus far attempt to predict mean or expected outcomes. That is, what is the average difference in wages/promotion rates/bonuses/ideas for those with or without structural holes? The graph below shows that there is a mean shift: the blue distribution (e.g., the structural holes condition) has a higher mean outcome.

Picture1

However, this analysis can be pushed further by asking: is there a shift in the variance of potential outcomes? Does a specific structure reduce or increase the possible variation in outcomes? Note that the blue distribution below is “tighter” than the black distribution. The black distribution has a greater likelihood of worse, but also of better, outcomes than the blue one.

Which would you prefer below?

Picture1

James Lincoln of UC Berkeley did pioneering studies on business networks in Japan and found that companies that were members of a keiretsu, while having lower mean outcomes, also had lower variation; as a consequence, they were less likely to do extremely poorly, but also less likely to do extremely well.

With respect to brokerage, we can also think about floors and ceilings. Networks that are high in closure reduce variation in performance, both high and low.

High performance is curtailed because the high performers subsidize the lower performers, and the low performers don’t do as poorly because the high performers help them out.
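This floor-and-ceiling effect can be sketched with a toy model in which members of a closed group pool a (hypothetical) share of their outcomes and split the pool equally:

```python
import statistics

def redistribute(outcomes, share=0.5):
    """Each member contributes `share` of their outcome to a common pool,
    which is then split equally: high performers subsidize low performers."""
    pool = sum(o * share for o in outcomes)
    return [o * (1 - share) + pool / len(outcomes) for o in outcomes]

open_network = [2.0, 4.0, 6.0, 8.0, 20.0]   # outcomes with no pooling
closed_network = redistribute(open_network)

print(statistics.mean(open_network), statistics.mean(closed_network))    # same mean
print(statistics.stdev(open_network), statistics.stdev(closed_network))  # variance shrinks
```

Pooling leaves the group mean unchanged but compresses the spread: the floor rises and the ceiling falls, exactly the low-variance pattern of closed networks.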

The network structures that tend to most facilitate the low-variance strategy are closed networks, as one can imagine.

The classic examples of this are ethnic networks, where the wealthier people help out the less fortunate ones.

Network Analysis in R: Getting Started

In some respects, the history of network analysis cannot be separated from the tools used to conduct network analysis. Software has been important to the enterprise of network analysis since the very beginning of the field. Scholars have written and made available programs to allow others to collect data and conduct analysis themselves. For instance, you can find a description of a program called CONCOR, which finds roles in an informal social network, in White et al. (1976). Other technologies such as UCINET, KrackPlot, and a host of other social network analysis packages allowed network approaches to spread rapidly through the field. My hypothesis is that without these technologies and their ease of use (UCINET, I think, was a game changer for the field), network analysis might still be in the backwaters.

Today, there are lots of options for the researcher who wants to do network analysis. I myself use two primary tools that fit well into my workflow (e.g., I use an Apple Mac and I do a lot of non-network analysis as well): the R statistical programming language, together with the sna package developed by Professor Carter Butts of UC Irvine, and Stata. While some of my posts (and the accompanying analysis) will use Stata, I will focus primarily on the use of R for network analysis.

Getting started with R for Social Network Analysis

Let us begin by downloading and installing the R programming language. Begin by navigating to the R-Project. I will do the walkthrough for the Mac version of R.

Screen Shot 2017-04-28 at 9.09.05 AM.png

After navigating there, click on the CRAN link under Download. The closest server to me is probably at UC Berkeley, but pick whichever one is closest to you.

Screen Shot 2017-04-28 at 9.10.56 AM.png

Next, download and install the version of R for your operating system. I will click “Download R for (Mac) OS X” and then click on the most recent version (which, at the writing of this post, is R-3.4.0). Download and install; I won’t walk you through this.

Screen Shot 2017-04-28 at 9.14.27 AM.png

Now that R is installed, let’s open it up and get some basic network analysis going. Once the R console is open, click on File (in the top menu) and then click on New Document. This should open a blank script file. Type a comment (a line that begins with #). I’ve typed:

# This file provides some simple code to get you started on your Network Analysis Journey

Save the file (I’ve called it RSNApractice.R). Clicking on the file name will give you access to the complete file.

Screen Shot 2017-04-28 at 9.16.38 AM.png

Now that we have that sorted out, let us begin by installing some important packages. You can type this code directly into the console.

install.packages("data.table")
install.packages("curl")
install.packages("sna")

The data.table package allows us to import data from the web; curl is required by data.table for downloading files, and sna provides the network analysis routines. Once these packages are installed, let’s get them loaded.

library(data.table)
library(curl)
library(sna)

Now that these are installed, let me tell you a little about the data that we are going to analyze. The data come from a professional services consulting firm on the east coast of the United States, collected some time in the early 2000s. There are 247 people at the firm, and each of them responded to a network survey with six questions. Here are the questions:

#(Q0) “who do you know or know of at [the firm]”,

#(Q1) “who you would approach for help or advice on work related issues”,

#(Q2) “who might typically come to you for help or advice on work related issues”,

#(Q3) who you go to “about more than just how to do your work well. For example, you may be interested in ‘how things work’ around here, or how to optimize your chances for a successful career here”,

#(Q4) “who might typically come to you for help or advice along these [non-task related] dimensions” and finally

#(Q5) “who you think of as friends here at [firm].”

I’ve uploaded their responses to a dropbox folder in the form of matrices. The rows of the matrix indicate “senders” or “Ego” and the columns represent “receivers” or “Alters.”

We can load the data using the following code:

#Load the “Professionals” network data from Dropbox.

q0 <- fread('https://www.dropbox.com/s/xsk5t5nhsmp8614/q0_res.csv?dl=1')
q1 <- fread('https://www.dropbox.com/s/aplyb7h947993ca/q1_res.csv?dl=1')
q2 <- fread('https://www.dropbox.com/s/qrwr6j5mjz57kbr/q2_res.csv?dl=1')
q3 <- fread('https://www.dropbox.com/s/wlw8w34cjlxvs3y/q3_res.csv?dl=1')
q4 <- fread('https://www.dropbox.com/s/o82cg1mcjx0u09u/q4_res.csv?dl=1')
q5 <- fread('https://www.dropbox.com/s/x86r63ewbh2ol6p/q5_res.csv?dl=1')

#Convert the data.table objects into matrix format so they can be
#analyzed using the sna package.

q0 = as.matrix(q0)
q1 = as.matrix(q1)
q2 = as.matrix(q2)
q3 = as.matrix(q3)
q4 = as.matrix(q4)
q5 = as.matrix(q5)

# Create a vector of numbers from 1-247 and convert them to a string.
# We will use these to rename our rows and columns.

names = paste(seq(1:247))

# Rename all the rows

rownames(q0) = names
rownames(q1) = names
rownames(q2) = names
rownames(q3) = names
rownames(q4) = names
rownames(q5) = names

# Rename all the columns

colnames(q0) = names
colnames(q1) = names
colnames(q2) = names
colnames(q3) = names
colnames(q4) = names
colnames(q5) = names

This code should load all of the network data into the R console.

Now, let’s import some attributes.

# Import the attributes and outcomes file, and convert it into a data frame.

attr <- fread('...')  # the Dropbox URL for this file was lost from the original post
attr <- as.data.frame(attr)

Now that these are all loaded, let’s see how the data look. Type the following to look at the first ten rows and columns of q0.

# Let's look at the first ten rows/columns of q0

q0[1:10,1:10]

Screen Shot 2017-04-28 at 10.45.52 AM.png

How do we interpret this? Person 1 doesn’t appear to know persons 2-10. However, person 2 says they know persons 5, 7, and 10.

Let’s plot this as a graph.

# Plot the first 10 people in the q0 matrix.

gplot(q0[1:10,1:10])

Screen Shot 2017-04-28 at 10.47.47 AM.png

Let us now plot the full q0 network. This is the “knowing” network of this firm of 247 people.

# Plot the full “knowing” network

gplot(q0)

Screen Shot 2017-04-28 at 10.51.02 AM.png

Quite dense. A lot of people know a lot of other people at the firm. Try to do this analysis for q1 to q5. What are the differences/similarities?

Let’s do some simple centrality calculations (more on centrality in the Representing Networks post).

# Calculate two simple centrality calculations on the q0 network.
# Indegree is the number of people who say they know a focal person (in arrows on a node)
# Outdegree is the number of people who a focal person says they know (out arrows from a node)

q0.indegree = degree(q0, cmode = "indegree")
q0.outdegree = degree(q0, cmode = "outdegree")

The centrality measures are now saved in the objects q0.indegree and q0.outdegree. Let’s plot histograms of these two measures.

# Plot histograms of q0.indegree and q0.outdegree

hist(q0.indegree)
hist(q0.outdegree)

These look very nicely distributed, almost Poisson. Let’s calculate some summary statistics on these measures.

# Summary statistics on the indegree/outdegree measures

summary(q0.indegree)
summary(q0.outdegree)

Screen Shot 2017-04-28 at 11.06.25 AM.png
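Incidentally, one quick way to probe the “almost Poisson” impression is to compare the mean and variance of the degree distribution, since a Poisson distribution has mean equal to variance. A minimal sketch with simulated data and a purely hypothetical rate of 20 (run the same check on q0.indegree itself):

```r
# For Poisson data, mean and variance should be approximately equal.
set.seed(42)
sim = rpois(247, lambda = 20)               # simulated degrees for 247 people
c(mean = mean(sim), variance = var(sim))    # the two numbers should be close
```

If the variance is far larger than the mean, the distribution is overdispersed and the Poisson impression does not hold.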

Now, let's do one final thing before we conclude this post (you can keep exploring on your own; I will delve deeper into centrality measures and the like in a different post). I have also given you an outcomes file with three outcomes.

Here are the outcome variables:

relationships: whether the respondent feels their relationships at the firm are fulfilling
success: whether the respondent feels that they have the knowledge to succeed at the firm
appreciate: whether they feel appreciated

Here is a description of the attribute variables:

tenure: tenure at this firm
title: whether the employee is an analyst, lateral hire, or partner
location: what office they work in
gender: male or female
ethnicity: 91% are white
age: age of employee
elite: whether the employee graduated from an elite university
feeder: whether the employee graduated from a “feeder” university
work1-work24: types of work the employee does

Let's conduct one final analysis: is there a correlation between how many people an employee knows and whether they feel they have the knowledge to succeed?

# Examine if there is a correlation between how many people someone knows and whether they feel like they have the knowledge to succeed.

m.0 = lm(attr$success ~ q0.indegree)
summary(m.0)

Screen Shot 2017-04-28 at 11.29.52 AM.png

Looks like there is at least a bivariate correlation. Let's plot it.

# Plot the regression and the data points.

plot(q0.indegree,attr$success)
abline(m.0)

Screen Shot 2017-04-28 at 11.31.12 AM.png

Now that you have most of the data, you can explore on your own. The full code is available in RSNApractice.R.

The Foundations of Network Analysis

The course “Topics in Social Network Analysis: Structure and Dynamics” is targeted towards doctoral students in management, organizational behavior and strategy. This blog post summarizes the first lecture, “The Foundations of Network Analysis.”

The goal of the first lecture is to introduce you to the “why” behind network theory and a bit of the “what.” Overall, the mission of the course is to help you become a sophisticated consumer of networks research, and hopefully a sophisticated producer of it as well.

By the end of the course, you should be able to:

  • Develop network-theoretic explanations for the behavior of people, teams and organizations. Network theoretic explanations use “relationships” (we’ll talk more about this in the future) and “patterns of relationships” as explanatory devices rather than traits or characteristics.
  • Learn how to set up high-quality research designs for your network theories.
  • Conduct statistical tests of your theories that can help you refute alternative explanations.

So, let's begin with a simple question: What is network analysis?

There are a lot of definitions, but here is one I like:

Network theory is a scientific perspective that reasons about the behavior of a target system or elements of that system, using the pattern of relationships between elements of that system. 

Let's begin with a super simple example. Stanford GSB has approximately 400 students. Let us assume, for a moment, that all 400 students end up getting jobs with certain wages w(i). Some students earn a lot of money (a lot!) and some students might make less than what they made before they came into the MBA program. An analyst might wonder: What causes this variation in MBA salaries?

An astute PhD student might theorize a function that maps some vector of characteristics of each MBA student c(i) to their wage w(i), such that:

w(i) = f(c(i))

Elements in the vector c might include:

  1. The undergraduate institution of the student (before their Stanford MBA)
  2. Their grades at Stanford
  3. GMAT score
  4. Prior wage before business school
  5. Personality
  6. Gender
  7. Specialization
  8. …so on.

The above function assumes that wages depend on these individual characteristics and perhaps the reaction of employers to these characteristics. But the dependency is between these traits and wages. 
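The trait-based model can be sketched as an ordinary regression of wages on characteristics. Everything below is simulated and purely illustrative; the variable names and coefficients are assumptions, not estimates from any real data:

```r
# Hypothetical trait-based wage model: w(i) = f(c(i))
set.seed(1)
n = 400                                       # roughly the size of one GSB class
c_i = data.frame(gmat   = rnorm(n, mean = 700, sd = 30),
                 grades = rnorm(n, mean = 3.5, sd = 0.3))
# Simulated wages (in $000s), generated from the traits plus noise
w_i = 50 + 0.1 * c_i$gmat + 10 * c_i$grades + rnorm(n, sd = 5)
f = lm(w_i ~ gmat + grades, data = c_i)       # estimate f() from traits alone
coef(f)
```

Note that nothing in this model refers to any other student: each observation is treated as independent, which is exactly the assumption network analysis relaxes.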

Screen Shot 2017-04-27 at 10.09.27 AM.png

In the above graph, the circles (nodes) represent the MBA students (I’ve depicted 20). Large nodes represent individuals who may be high on the characteristics we described above, and vice versa. Thus, our reasoning focuses on how the nodes vary based on some characteristic.

Network Analysis’ Value Add

Network analysts take a different perspective. They propose a different type of dependency: that people’s outcomes depend on the types of people they have relationships with and/or the pattern of those relationships.

Screen Shot 2017-04-27 at 10.12.49 AM.png

This concept, that individual outcomes depend on a person’s relationships to others, is not new at all. The idea is as old as human history. What network analysis contributed, however, was a useful and tractable representation of this dependence among people and a way to empirically test the effects of such dependencies.

To summarize, the “new knowledge” that network analysis contributed was:

  • To strongly argue that these dependencies among individuals matter.
  • That these dependencies could be represented by a network (consisting of nodes, the elements of the system; and edges, the dependencies between the elements)
  • That analysis of these dependencies (e.g., summaries of or descriptions of patterns of) could help us predict the performance of elements of the system better than the individual-trait based approach alone.
  • That specific social mechanisms (basically stories) link certain patterns to certain outcomes through some well-specified chain of logic.

These are not simple problems. Theoretical and empirical issues related to this set of basic problems have challenged us for nearly a century now. As you can imagine, incorporating social relationships into the analysis of human and organizational behavior requires a new way of thinking about human action and new methods to empirically validate our theories.

Before we get to the core problems of network analysis, it is perhaps useful to sketch a bit of its history and development.

Network analysis has its “origins” in many disciplines

If you are really interested in the history of social network analysis, check out Linton Freeman’s book “The Development of Social Network Analysis.”

Psychology

Jacob Moreno: Invented sociometry. The network diagram we use today is a direct descendant of Moreno’s sociogram, a set of points connected by lines. He used sociograms to identify leaders and isolates and to uncover patterns of asymmetry and reciprocity. He discovered what we now know as the “star” network.

Kurt Lewin: Studied group behavior. His basic argument was that individual action in groups was constrained by the concrete relationships that existed between members of the groups. He is often credited as one of the founding fathers of social psychology, and as the person who coined the term “group dynamics.”

Fritz Heider: Studied social perception and attitudes and developed what he called “balance theory” – we all know the basic mechanics of balance theory:

  • “A friend of a friend is a” …
  • “A friend of an enemy is a” …
  • “An enemy of an enemy is a” …

Balance theory was converted into mathematical form by Dorwin Cartwright (a psychologist) and Frank Harary (a mathematician)  — Harary is often credited with being one of the founders of modern graph theory.

As you can see all three approaches either directly used graphical or mathematical notation, or later were turned into a mathematical form.

Anthropology

Another parallel set of developments came in social anthropology – they conceptualized “social structure” as concrete relations between individuals in a society. SF Nadel, especially, theorized about the relationship between networks and “roles” in his treatise “A Theory of Social Structure.” A quote about A Theory of Social Structure from Britannica:

In his posthumous Theory of Social Structure (1958), sometimes regarded as one of the 20th century’s foremost theoretical works in the social sciences, Nadel examined social roles, which he considered to be crucial in the analysis of social structure.

The famous “Hawthorne experiments,” conducted in Chicago in the 1920s, found that one of the best predictors of productivity was the “informal organization” of the plant: the pattern of personal relationships that people had with each other.

Sociology

The big revolution in social network analysis happened in the 1960s and 70s. The primary protagonists of this revolution were located at Harvard, led by Harrison White, and at the University of California, Irvine, led by Linton Freeman. Much of the basic language, tools, and theories we use today in network analysis were developed in this period.

In the 1980s and 1990s, a group of scholars in management and organizational behavior entered the fray, and thus began the organizational social network revolution. These individuals include scholars who received their PhDs in business schools or sociology departments but had some contact with the network theorists in sociology, as well as sociologists who were hired by business schools. They include Ronald Burt at the University of Chicago, Daniel Brass at Penn State and later the University of Kentucky, David Krackhardt at Cornell and later Carnegie Mellon, Brian Uzzi at Northwestern, and Joel Podolny, who was at Stanford.

Modern Network Analysis is Multi and sometimes Inter-Disciplinary

Today, network analysis is a multi-disciplinary, and sometimes inter-disciplinary, enterprise. A lot of work has been done by scholars in a variety of disciplines. Many of the important theoretical ideas about which types of networks should matter and why were developed by sociologists (Ron Burt, for instance) and then further developed and extended by others in sociology (Fernandez and Gould) as well as by scholars in management (Gulati, McEvily, etc.).

Concurrently, statisticians including Stanley Wasserman and Tom Snijders developed methodologies for modeling the formation and dynamics of social networks, such as p* models and stochastic actor-oriented models.

Economists, starting with Charles Manski, developed and theorized about methods that allow for causal inference about network effects. Venkatesh Bala and Sanjeev Goyal (two economists from Cambridge, UK) developed and formalized a game-theoretic model of network formation. Matthew Jackson of Stanford has pushed the development of formal models of network formation and “network games” forward along a variety of dimensions.

  • Most of the best network research today draws on many of these traditions. Research in organizational behavior that examines network effects must draw on the work of Charles Manski for guidance about the empirical validity of the network effects they estimate.
  • A large body of management research focusing, both explicitly and implicitly, on network ideas draws on the ideas of sociologists, both in business schools and in sociology departments.
  • Research in economics has drawn heavily on sociology, with or without citation; the most interesting intersections of this research are happening in the economics of education and labor, development economics, and finance.
  • Today, you will find “network” research in almost all the top journals in management, economics, sociology, statistics, and computer science. What is more, you will also find specialty journals focused just on network analysis (e.g., Social Networks and Network Science).

Network Reasoning: Micro to Macro, Back to Micro

An important feature of network analysis is that it gives us a way to think about both the micro (the behavior of the elements of a system, e.g., people) and the macro (society, organizations, etc.) simultaneously.

One of the most beautiful demonstrations of this is presented in the following graph:

Screen Shot 2017-04-27 at 10.43.33 AM

This graph comes from Mark Granovetter’s “The Strength of Weak Ties.” Why is this triad forbidden? That is, why is this structure unlikely to occur?

To answer this question, we will need some balance theory. Let us assign a positive valence to the present strong ties (i.e., AC and AB) and a negative sign to the absent tie (i.e., BC). To get the sign of this graph, we multiply the signs of the individual dyads in the triad (AC, AB, BC).

  • The forbidden triad: (+)(+)(-) = (-)

Balance theory considers the sign of this graph to be negative, which means that it is unstable. For instance, if A and C are friends, as are A and B, there are likely to be greater opportunities for B and C to interact and, as a result, form a tie to each other. This closes the triad, resulting in a structure with sign (+)(+)(+) = (+). On the other hand, if C and B cannot get along, there will be conflict between A and C or between A and B, and one of A’s ties will break, resulting in a triad with a single tie: (+)(-)(-) = (+).
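The sign arithmetic above is simple enough to compute directly. A minimal sketch, using +1 for a present (positive) tie and -1 for an absent (negative) one:

```r
# Sign of a triad: the product of the signs of its three dyads (AB, AC, BC).
triad_sign = function(ab, ac, bc) ab * ac * bc

triad_sign(+1, +1, -1)  # forbidden triad: -1, unbalanced and unstable
triad_sign(+1, +1, +1)  # closed triad:    +1, balanced
triad_sign(+1, -1, -1)  # single tie:      +1, balanced
```

A negative product flags an unbalanced triad, which balance theory predicts will resolve into one of the two positive configurations.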

OK, so what?

Well, let's take the perspective of A. In the forbidden triad, A is a “bridge”: she is the only connection between C and B, and as a result she has access to information from two sources that may not overlap. However, the forbiddenness of this structure means that it is unstable with strong ties. A’s position reverts to either Equilibrium 1, where A is no longer a bridge because she no longer has a connection to B (or C), or Equilibrium 2, where A is no longer a bridge because C and B are connected to each other and no longer have to go through A to share information. Either way, A’s role as a bridge is diminished.

Screen Shot 2017-04-27 at 11.01.36 AM.png

These are very micro arguments. They are based on the psychological processes of individuals and their interpersonal dynamics. How do these micro processes translate into network processes at a larger scale (e.g., an organization, community, or society)?

Let us start with some assumptions:

One assumption we start with is that information is distributed unevenly across groups, and that different groups or cliques have different pieces of information.

This is not an unrealistic assumption. If you compare Berkeley to Stanford, people in the two places are likely talking about different ideas. Most people in each group do not have a complete understanding of what ideas the other group is interested in or talking about. This is probably equally (or even more) true across companies, countries, regions, etc.

However, strong-tie bridges across these groups—according to our micro reasoning above—do not exist because of the two equilibria we described above.

Granovetter’s deep insight was that this problem of missing bridges can be solved if the bridges are weak ties rather than strong ties. Weak ties allow individuals to access information across disconnected clusters, whereas strong ties, because they are embedded in cliques (i.e., exist within a cluster), tend to provide only redundant information.

Screen Shot 2017-04-27 at 2.23.32 PM.png

What Granovetter (1973) showed was:

  • Weak ties (that is, acquaintances) are more useful for job seekers than are close, strong ties (friends and family).
  • Weak ties provide access to novel information, not present within a cluster.
  • The relationship between tie strength and finding a job has less to do with the strength of the tie per se and more to do with the macro-structure of the larger network (e.g., the connections between clusters).

The beauty (I think that is the most appropriate word) of this theory is that it elegantly links a psychological process (balance theory) to the macro structure of the network (a society- or organization-wide network), and then back to the individual outcome. This type of reasoning allows us to represent the functioning of an important system in a way that would be difficult with more atomized theories of human action.

Thus, our ultimate goal is to develop theories that link a person’s social network to some larger structure, then back again to individual human action.

Where can network representations be useful for analysis?

Novice students of network analysis often begin with the perspective that a network is a real thing, and that as a thing it can become the object of analysis. However, this is not true. Networks are representations—and imperfect ones at that—of a very complicated target system. Because networks are representations and not real things, an analyst can represent many different target systems using a network representation.

The most basic network representations consist of two parts: nodes and edges. Below, you will see a network called the “Kite Network.” For now, let's ignore the structure of the network and its properties and focus on two elements: nodes (the circles), which represent the entities we are studying in the target system, and edges (sometimes called links), which represent the relationships between the nodes/entities.

Screen Shot 2017-04-27 at 2.46.22 PM.png

The edges in the network above are undirected, meaning that they have no direction. For instance, co-authorship is a relationship that is naturally undirected.

Screen Shot 2017-04-27 at 2.49.37 PM.png

Above, I’ve taken the same kite network and made the edges directed. This means that a direction of flow (of information, etc.) between the nodes is specified in the network. For instance, imagine that the relationship represented in the graph is “seeks advice from.” We could read the network to indicate that A seeks advice from B, but not the other way around. On the other hand, B and E seek advice from each other.
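In matrix form, direction shows up as asymmetry. A small sketch of the advice relationships just described (the three-person matrix is illustrative, not the full kite network):

```r
# Directed ties: m[i, j] = 1 means "i seeks advice from j".
people = c("A", "B", "E")
m = matrix(0, 3, 3, dimnames = list(people, people))
m["A", "B"] = 1                    # A seeks advice from B...
m["B", "E"] = 1; m["E", "B"] = 1   # ...while B and E nominate each other
m["A", "B"] == m["B", "A"]         # FALSE: the A-B tie is one-way
m["B", "E"] == m["E", "B"]         # TRUE:  the B-E tie is reciprocated
```

An undirected network is simply the special case where the matrix is symmetric.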

Now that we have these basics down, we can use these two basic elements to represent many different systems:

  • The studying relationships among students (e.g., who studies with whom)
  • The friendships among workers in a firm
  • The alliances among firms
  • The relationships among different units/teams within a corporation

The above examples are very pertinent to OB/Strategy. However, networks can be used to represent other systems as well:

  • The interactions between genes
  • The similarity among jobs in an organization
  • Shared funders among startups
  • The co-presence of two ingredients in a recipe

While a network representation is useful for all of these very diverse situations, the underlying theory describing the functioning of these various systems is rather different. The kind of reasoning we use for each of these domains will differ for at least three reasons:

  1. The actions and outcomes of the actors in each domain are likely to be different. Students do different things from firms, and they both do different things from genes, jobs and ingredients.
  2. The mechanisms (e.g., the step-by-step processes) that link actions to outcomes are likely to be different across the contexts.
  3. Finally, the links between actors are qualitatively different across the domains: different types of information flow between nodes through these links, different amounts of information can flow, and different meanings are ascribed to the links.

The flexibility of the network representation allows for a critique that network theory is a free-for-all where anything goes because the actors, mechanisms, and links can be anything in any context.

While there is an element of this critique that is valid, I will argue in this class that the network representation is tremendously powerful and that there is a decent amount of consistency in network reasoning across many different contexts and target systems. That is, we can apply many of the same types of reasoning, with modifications of course, to explain actions and behavior across a variety of contexts. Further, learning and insight from one domain can be applied to learn about another.

Krackhardt’s Levels of Analysis

Networks are rich in their expressiveness of social reality. As a consequence, the analyst sometimes has to ignore many facets of the structure/content of a network to focus analytical attention on one facet. A useful typology for network analysis, developed by David Krackhardt, is the “Levels of Analysis.” In his typology, networks have (at least) four levels of analysis: Level 0 to Level 3.

The distinction across levels is important to make for several reasons, including the fact that:

  1. The theories are different
  2. The statistical techniques are different
  3. The data requirements are (potentially) different

Level 1: The node level of analysis

Consider the following graph:

Screen Shot 2017-04-27 at 3.26.16 PM.png

And this matrix, which was used to generate this graph.

Screen Shot 2017-04-27 at 3.26.25 PM.png

This is the “raw” data of the network. It can be analyzed in many different ways. One of the most common approaches in network analysis is to focus on node-level analysis, or Level 1 (so called because if there are N nodes, the number of observations is on the order of N^1).

Screen Shot 2017-04-27 at 3.27.11 PM.png

So far, we have been focusing on nodes—these are the actors whose behavior we are trying to explain. More specifically, we are trying to explain the behavior of “Ego” (from the Latin for I) based on the nature or pattern of his or her connections to “alters” (from the Latin for others). Thus, our goal is primarily to take two kinds of measurements:

  1. Measurements about some action or outcome of Ego (our dependent variables)
  2. Measurements about the features of Ego’s connections to the alters in the network (our explanatory variables)

Thus, depending on the theory, we figure out how to quantify the connections that ego has to his or her alters, and see whether there exists a correlation between this and Ego’s outcomes.

  • Ego1       Outcome               NetworkMeasure
  • Ego2       Outcome               NetworkMeasure
  • .
  • .
  • .
  • .

There are generally two approaches to the Level 1 analysis. I would like to call one “structural analysis” and the other “peer effects” or “peer influence.” We will cover both in the class.

  • Structural Analysis/Analysis of Network Position defines the NetworkMeasure based on a summary of the pattern of edges in the network with respect to the focal node (e.g., the node whose outcome we are interested in).
  • Peer effects analysis often ignores the structure and focuses on understanding how the characteristics of a focal node’s connections (e.g., their prior performance) affect that node’s outcomes. For instance, this could be done by taking the average of the alters’ characteristics (e.g., their SAT scores).

As you can see, the data look pretty much like a traditional regression analysis at the individual level. We call this Level 1 analysis because there are on the order of N^1 observations, one for each node in the network.

The nice thing about both of these types of analyses is that the statistical methods we use are ones that you should be quite familiar with as a doctoral student. While there are empirical issues in interpreting the coefficients from these models, the setup is pretty standard. Most network analysis takes one of these two forms.

Level 2: The dyad level of analysis

Another class of problems requires us to focus on understanding the processes that led the network to take the structure that it has taken. In the static case, the micro-question is: Why is one connection present, while another one not present? In the dynamic case, the question might be reframed as: why do some ties persist, while others dissolve?

Screen Shot 2017-04-27 at 3.28.43 PM.png

This type of analysis is called Level 2 because in a network consisting of N actors, there are N(N-1), or on the order of N^2, observations in the data.

The focus of Level 2 analyses is understanding why a tie, interaction, or relationship exists between an ego and an alter.

For instance, the questions we can ask, include:

  • Why two workers decide to become friends.
  • Why two companies decide to pursue a research collaboration.
  • Why two scientists decide to co-author a paper.

The kinds of information, and often the types of theories, we need here are richer than is often necessary at Level 1.

Can you tell me what kind of information we might need to make a prediction about whether two scientists decide to collaborate?

  • Characteristics about Ego
  • Characteristics about Alter
  • The interaction of the characteristics of Ego and Alter (e.g. whether they are in the same discipline, the distance from one office to the next, etc.)
  • The ties that exist indirectly between Ego and Alter.

Further, the methods we use here are much more complex than the ones used for Level 1 analysis, primarily because of point #4: there are dependencies in the network that shape the presence or absence of a tie for a given pair of individuals. Consider the forbidden triad. It illustrates clearly that A’s decision to form a tie with C is not independent of C’s relationship to B, nor of A’s relationship with B. Ignoring these dependencies could bias our understanding of why a tie between A and B forms or does not form.

As a consequence, people have developed specific statistical approaches for testing theories at Level 2: Multiple Regression Quadratic Assignment Procedure (MR-QAP), Exponential Random Graph Models (ERGMs), and the older p1 models.

Here the analysis is conducted so that the data structure looks like:

  • Actor1   Actor2
  • Actor1   Actor3
  • Actor1   Actor4
  • Actor2   Actor1
  • Actor2   Actor3
  • Actor2   Actor4

The dependent variable is whether a tie exists between two actors (or whether some kind of interaction, e.g., knowledge transfer, occurs). The explanatory variables in these models are the characteristics of ego and alter, their shared characteristics, and the other structures in which the dyad is embedded.
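For a concrete sense of this data structure, here is a sketch that unrolls a toy 3-actor adjacency matrix into the N(N-1) dyadic rows, with the tie indicator as the dependent variable:

```r
# Toy directed adjacency matrix for three actors
m = matrix(c(0, 1, 0,
             0, 0, 1,
             1, 0, 0), nrow = 3, byrow = TRUE,
           dimnames = list(paste0("Actor", 1:3), paste0("Actor", 1:3)))

# One row per ordered (ego, alter) pair, excluding self-pairs
dyads = expand.grid(ego = rownames(m), alter = colnames(m),
                    stringsAsFactors = FALSE)
dyads = dyads[dyads$ego != dyads$alter, ]
dyads$tie = m[cbind(dyads$ego, dyads$alter)]  # the dependent variable
nrow(dyads)                                   # N(N-1) = 6 observations
```

Ego, alter, and shared characteristics would then be merged onto this data frame as explanatory columns.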

Level 0: The whole network. 

Another level of analysis is Level 0, where the entire network results in only one observation (N^0 = 1).

Screen Shot 2017-04-27 at 3.30.16 PM.png

The goal of Level 0 analysis is drastically different from the goal of the first two levels. Here, the analyst is trying to understand how the entire social network and its configuration affect the outcomes of the system as a whole. This is an interesting and exciting level of analysis, and very few studies have been conducted at this level.

First, we have to have data on enough networks to do a statistical analysis, which is hard in itself. The best research of this type has been done by people studying teams (e.g., Ray Reagans, Ezra Zuckerman, and Bill McEvily). In many respects, the small groups research has also looked at this level of analysis, going back to some very early work by Bavelas.

In Level 0 analysis, we look at the entire network and what it represents (e.g., an entire organization) and relate it to the organization’s outcomes.

Thus, we need theories and measures that can summarize the macro structure of the network and link it to organizational performance.

For instance, a class of problems may include:

  • How does the internal network structure of a start-up firm affect its ability to come up with innovative ideas? We would need:
    • A set of startups, say 75 or more.
    • Some measure of each startup’s innovative output.

An example analysis might be to measure each startup’s internal network structure, and then conduct a regression linking the outcome to some measure of that structure (e.g., density: the proportion of possible ties among people that are present).
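Density itself is straightforward to compute from an adjacency matrix: present ties divided by the N(N-1) possible directed ties. A minimal base-R sketch on a toy matrix (the sna package provides an equivalent function, gden()):

```r
# Density of a directed network: present ties / possible ties
density_of = function(m) {
  diag(m) = 0                          # ignore self-ties
  sum(m) / (nrow(m) * (nrow(m) - 1))
}

m = matrix(c(0, 1, 1,
             1, 0, 0,
             0, 1, 0), nrow = 3, byrow = TRUE)
density_of(m)   # 4 ties out of 6 possible
```

Computed once per network, this yields the single Level 0 observation per startup that the regression above would use.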

A well-known study at this level is Reagans, Zuckerman, and McEvily, who found that project teams within an organization that have high density are more effective (they finish their projects faster) than project teams with low density.

Level 3: Cognitive Social Structures

Finally, another area of research within social network analysis recognizes that networks are indeed representations, and imperfect ones at that. This is called Level 3 analysis because we have on the order of N × N × N, or N^3, observations for use in our analyses.

Consider three graphs, and an organization chart from Krackhardt (1992).

The top-left graph is the “actual” advice network at the firm. By actual, I mean that these are the relationships people say they have with others. The top-right is the actual organizational chart. Note that the organization chart and the advice network are imperfectly related to each other.

However, once we go to the bottom panel, we see how important cognition and representation are in the network story. On the bottom-left, we see Chris’s representation, which is not perfect, but it is not as bad as Ev’s (Ev is a manager). In terms of human action, people might behave in concert with the network on the top-left (the “actual” network), but they might also behave in concert with their own perceptions.

This cognitive angle is critical in network analysis. Cognition links actual structure (if it really exists) to action and then to outcomes. Think of the faux pas: it often arises from a misperception of who is connected to whom.

Conducting Level 3 analysis requires collecting data about perceptions and theorizing about how perceptions matter, both independently and in interaction with the “true” structure.

With Level 3 analyses, we have N people who each have perceptions about the N × (N-1) possible relationships, resulting in on the order of N^3 observations. In practice, however, most of the modeling is done at the node level, though this is an active area of research and much can be developed here.

Summary

You should now have a pretty general overview of the kinds of problems we will be covering during the course. By the end of the course, you should be able to conduct and extend these types of analyses across a wide range of domains and levels.

Topics in Social Network Analysis, PhD Syllabus

This course is designed for PhD students in management, organizational behavior, and strategy who are interested in applying network ideas in their research. The course will provide an introduction to applied network theory and empirical methods. Over the 6 sessions of the course, students will learn to:

  • Recognize the basic building blocks of most network theories and how they have been applied in various empirical contexts.
  • Collect network data, visualize it, and calculate basic network statistics.
  • Formulate and test hypotheses drawing on network mechanisms.
  • Understand the broad uses of network analysis in the study of organizations and strategy.

Course Requirements

  • Attendance and Participation (30% of grade)
  • Theoretical Integration Paper (30% of grade)
  • Research Proposal (40% of grade)

April 28, The Foundations of Social Network Analysis

New knowledge is anything that allows you to predict some outcome more accurately than before. The enterprise of network analysis is one example of a focused search for new knowledge. Network scholars seek to find patterns in human relationships that explain important outcomes—health, economic, and political—that are ignored, non-obvious, or run counter to conventional wisdom. The readings for this class helped set the stage for the network revolution in the social sciences. They articulate, very clearly, what our prior assumptions were about how the world worked, and systematically show us that we should think differently.

Check out the post on how to get started with network analysis in R. 

Readings:


May 5, Network Position and Performance

Part 1: Structural Holes; Part 2: Status

The most frequent use of network analysis has been to examine the relationship between network “position” and the performance of people and organizations. This line of research has produced exciting and important ideas, including those of structural holes, status, and closure. Network ideas have also helped scholars reformulate ideas about power, leadership and identity. The readings from this class will introduce you to some of the central ideas about network positions and their relationship to performance outcomes such as innovation or promotion.

Readings:


May 12, Peer Effects

Theories of network positions are built upon individual-level assumptions regarding informational content and knowledge transfer. Yet, until recently, rigorous empirical evidence for information transfer and learning at the dyadic level has been scarce. In this class we will dig deeper into the growing literature on peer effects and examine when we can expect to observe knowledge transfer, and how to evaluate the quality of evidence.

Readings:


May 19, Network Formation

Are there general patterns in how networks are shaped? What forces lead these patterns to emerge, and what are the implications for social processes that we care about (e.g., the generation of innovations)? In this class we will cover some core ideas behind the formation of social networks, including homophily, triadic closure, reciprocity, and, at the macro scale, small worlds and clustering.

Readings:


May 26, Network Cognition, Activation and Team Structures

There is a lot more to networks than classical formulations of network effects as “positions” or as “peer effects.” Scholars have creatively shown that how people perceive networks affects their performance, and that the overall structure of a team’s internal and external networks affects team outcomes.

Readings:


June 2, The Future of Network Analysis

Your final project presentations go here.


Syllabus header information

Stanford University
Graduate School of Business

OB622, Spring (Second half) 2017

Professor: Sharique Hasan, Associate Prof. of Organizational Behavior, Stanford GSB
Office: W239 (KMC, Stanford, CA)
Email: [firstname]@stanford.edu

Times:  Friday from 1:30 to 4:20 PM
Room: GSB Bass 301