# 44 tools and resources for social scientists

Over the years I’ve gotten great tips from colleagues and students about the tools that have helped them become more productive researchers.  Below is a list of the 44 tools and resources that have changed how I do research.

## Get teched up

Excellent technical skills are the bedrock of a successful research career. Today, publishing requires both an understanding of theory and the ability to tease out meaningful insights from complex data sets. Even before you start your Ph.D., tool up. Here are five resources to get you started:

• R: perhaps the only statistical programming language you really need to know. It is free and comprehensive: you can do visualization, machine learning, traditional econometrics, or write your own custom algorithms. Even if you don’t use R that often, it is one language that all social scientists must learn.
• R Studio: will make it easier to use R, and for some use cases, R Studio is substantially faster. For academics, R Studio is free, so it’s worth a shot.
• Machine Learning A to Z: Hands-on Python and R In Data Science: OK, so you don’t know R and can’t figure out where to start. Start here. This is a comprehensive online course to get you up-to-date with some of the major functionality in R and Python (the other language of data science).
• Stata SE or MP: If you are an economist, or generally estimate Y = B0 + B1X type equations with worries about clustering standard errors or endogeneity, R may sometimes feel like overkill. Stata is my go-to software for most of my analysis. I learned Stata by working with a collaborator. With great Stata code, you can go from your raw data to publication quality tables with a press of the “do” button.
• The complete web developer bootcamp: So you want to do online field experiments, but you can’t build a website? Fret no longer. I recently completed this Udemy course called “The Complete Web Developer Bootcamp” and was up and running with an excellent quality web application in just two weeks. If I can learn it, you can too. In this class, you’ll learn about some cool environments like c9, mLab, and Heroku that can get you started on a slick web application with little setup time.

## Communicating more effectively

Academics have two products. They write and they present. By honing these two skills, you can become a star. Polish your writing and presentation skills with these resources. I’m always on the lookout for great resources to help me improve my writing.

• The Art of Styling Sentences is the book I recommend to all my Ph.D. students. Like most skills, you can improve your writing dramatically by following a few simple rules. Check out this book if you want to have prettier sentences.
• Ninja writing:  The Four Levels of Writing Mastery: Mark Twain once said that the difference between the right word and nearly the right word is the same as the difference between lightning and the lightning bug. Shani Raja’s Ninja Writing and Writing With Flair have given me some handy tools for editing my academic writing.
• Writing With Flair: How To Become An Exceptional Writer: This is a superb class, and worth every penny.
• Hire a copy editor on Upwork: When I got my first “conditional accept” at a top journal, the editors asked me to get my paper professionally copy-edited. I was offended. Today, I almost always send my paper to a copy editor before I submit it to a journal, and always before I send the final version in for publication. I’ve used several copy editors throughout the years, but if you are looking to experiment with finding a copy editor, try Upwork.
• How to make a great presentation and TED’s secret to great public speaking: When I was a graduate student at Carnegie Mellon, I was fortunate enough to take a class on public speaking taught by the late Pamela Lewis. Her insights on how to create a powerpoint presentation, how to present ideas, and how to make your ideas stick have been invaluable. Today, you can find excellent resources on public speaking online. The founders of the TED conference, where slick presentations abound, have great resources to help you improve your presentation skills.

## Develop a writing workflow

Building a process for clear and well-produced writing is paramount for success. Here are a few tools and resources that can help you improve the quality of your written work.

• Latex: Robert Hall, the Stanford economist, said in an article about becoming a professional economist: “Pay close attention to the appearance and dissemination of your work. I hold the following controversial view that my economist wife thinks betrays a lack of spiritual development: There is a separating equilibrium between researchers who put out nicely typeset papers in Latex and those who struggle with the infirmities of Microsoft Word.” Learn Latex; your readers will appreciate it.
• Overleaf: Once you learn Latex, start using Overleaf. It is like Google Docs for Latex and helps you write beautiful Latex manuscripts in a collaborative environment. Version 2 is even nicer, with the ability to add comments.
• Grammarly: I am a fast typer and sometimes I forget to include words in my writing. Grammarly is a bit pricey, but I use it all the time so it has been worth it for me.
• Ulysses: I started using Ulysses a few years ago to organize my writing. Ulysses is the app I use when I want to start on a project or work on a revision. It’s a great tool for breaking up a long project into manageable chunks.
• Grammark.org: Another automated (but free) grammar tool. It is worth a try and it is especially useful for writers who struggle with wordiness.
• Endnote: Some people still use endnote to organize their bibliographies. I don’t.
• Mendeley: Mendeley is a free bibliography tool. The best part of Mendeley is its integration with Overleaf.
• GoogleScholar BibTex export: I write in Latex, and Google Scholar has a BibTeX output feature that lets you cut-and-paste a BibTeX entry from Google Scholar. Beware though: sometimes things are missing or journal titles are awkwardly capitalized. But if you develop a good process, you only need to fix the errors once.

## Streamline processes

Improve the processes around your work. Get the technology that reduces duplication, idle time, and inefficiency.

• Dropbox: 8 years ago, I used to email my in-progress manuscripts to myself at the end of the day so I could work with them on my home computer. Today, Dropbox has been the singular technology that has reduced the amount of digital shipping waste that I create. I usually keep one copy of a file—whether it is my data, notes, or code. I can work on these assets from anywhere. How amazing.
• Google Docs: Writing a collaborative proposal? Responding to a reviewer letter? GoogleDocs has helped speed up these collaborative tasks.
• Sublime: Need a simple text editor with lots of power? Sublime is amazing and I find myself using it every day for the little bits of text work that I have to do.
• Master your email and calendar: A few years ago I realized that I sometimes checked 4 email addresses a day: an old Yahoo address, my work email, my new Gmail account, and a Gmail address for subscriptions or junk mail. Now, I’ve got one email address. I probably got 30 minutes of time back.

## Learn to delegate

The best professors know how to break up their work into modular chunks and delegate it to others. This frees up time to do more important things. Sure, it might be instructive the first few times to clean your data, to do a preliminary search of the literature, or to develop a website for your research project. But it’s worth learning how to delegate these tasks so you can put your energies to more creative uses.

• Hire someone: Learn to break up your work into modular pieces and delegate the stuff that is probably not worth your time. Delegating is perhaps the master skill of the productive academic. If you want to learn how to delegate better, work with someone who does this well. Start by hiring an undergraduate research assistant and get them to work on a small project. Remember to manage the work effectively; ask yourself these four questions.
• Hire someone on Upwork: You can hire people to do almost any task on Upwork. I’ve gotten people to copy edit my articles, build a citation database on a topic I wanted to learn about, and scrape data from a website. Start with small projects and build your delegation skills on a platform like this.

## Get data and develop a process around it

Every great recipe needs great ingredients. Data gathering is a first-order skill that every social scientist should master.

• ICPSR: An easy way to get data is to download data that someone has already spent the time, effort and money gathering and cleaning. ICPSR is a great starting point on your data gathering journey.
• Compustat: If you study publicly traded firms, you should learn how to use Compustat.
• Qualtrics: Learn how to run a survey on Qualtrics. You can launch a survey in just a few minutes and start collecting responses using a platform such as MechanicalTurk.
• MechanicalTurk: You need a quick and cheap subject pool? Try MechanicalTurk. There are some good tutorials online.
• SurveyMonkey
• Google Customer Survey: Google also has a survey service that you can use to ask questions from a nationally representative sample of Americans.
• Learn to scrape a website: Learn how to scrape
• Talk to people: The best data is often not easily available online. Talk to people in your field or in the real world at companies. You might find a gem that turns into a great research paper.

## Create a comfortable workspace

A laptop is all you really need to be a great social scientist. But a good workplace can definitely improve your productivity.

• MacBook Pro: I stopped using PCs about a decade ago and my go-to computer is a MacBook Pro. The MacBook Air is a good entry point for a social scientist looking for a computer that can handle most of the software you need to do statistical analysis and academic writing.
• RAM is a barrier: If you can afford it, go for a computer with a great CPU and lots of RAM. But it is worthwhile buying more RAM for your computer as your data sets increase in size.
• Good monitor: Monitors are cheap. You can find large and high-quality monitors anywhere really.
• Get another monitor.
• Keyboard: Get a comfortable keyboard. I use the Microsoft Sculpt Ergonomic Wireless keyboard at work.

## Keep learning

Here are two hacks to help you get up-to-date on the academic literature and also find interesting ideas in the popular literature.

• Audible: If you go to the gym, have a commute, or want to learn something new at the end of the day, start listening to audio books. A few years ago I got a subscription to Audible. I’ve been able to learn a ton of new things about topics I didn’t know much about on my drive home. The selection of audiobooks available today is remarkable and you will surely find books that relate to your existing research area or a new area you would like to explore.

## Life

Finally, there is more to life than research. Here are some resources that may be useful for young social scientists beginning their career.

• Personal finance: Be good with money. Stanford CS has a great class on personal finance called cs007. There are a lot of great personal finance blogs out there. My favorite is the Financial Samurai. I like it because it has lots of facts and figures and helps me benchmark where I should be at my career stage.
• Experiment: Try little things and see where they take you.
• Meditate: I recently listened to a great (and funny) audiobook on Buddhist meditation. It got me to try out meditation, and now I frequently use the Headspace app.

# Where do networks come from?

The key assumption underlying both the peer effects and structural approaches to network effects is some degree of exogeneity in the existence and structure of network ties.

Exogeneity is both a theoretical claim and an empirical assumption. All reasonable theories are built on a set of axioms that assume some primitive or exogenous features of the world or of the target system being analyzed. Many models in economics, for instance, assume that preferences are exogenous. From these preferences, we are then able to derive things like behavior, choice, “roles” as well as the structure of social relationships.

Similarly, some sociological and anthropological traditions start with axioms that assume that “roles” are exogenous. These roles (e.g., the position an individual occupies in a social structure) govern behavior, preferences, as well as social relationships.

Much of the network analysis we’ve been conducting or discussing thus far also has an exogeneity assumption built in. The primitives are social relationships and their structure. All other things we observe such as behavior, preferences and roles emerge from the pattern of exogenous network ties. In the lectures on structural holes, status and peer effects, we argue that the pattern of social relationships causes differences in behavior, preferences, as well as roles, and not vice versa.

### The challenge of network formation

However, a challenge for the social relationships first perspective is that networks are unlikely to be fully “exogenous.” They form and evolve through certain processes that make some people more likely to connect to each other, and make some people less likely to do so.

Network scholars have spent considerable time on trying to understand how networks form and change. At a broad conceptual level, we can think about five factors that shape whether a tie between two individuals—e.g., ego and alter—forms.

The logic behind most models of network formation is simple. On one side, there are “benefits” to connecting with someone, whether actual or perceived, pecuniary or non-pecuniary/psychic. On the other side, there are “costs” that make it easier or harder to form a relationship with someone, because searching for them, coordinating with them, or dealing with them is more costly than with someone else. Relatedly, some individuals may have a lower cost of building a network than others, and/or it may be lower cost (relative to benefit) to connect with certain people.
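One stylized way to write this logic down (the notation is shorthand introduced here for exposition, not taken from a specific model): let $b_{ij}$ be the benefit, actual or perceived, that ego $i$ expects from a tie to alter $j$, and let $c_{ij}$ be the cost of finding, coordinating with, and dealing with $j$. A tie forms when the net benefit is positive:

$tie_{ij} = 1 \iff b_{ij} - c_{ij} > 0$

Read this way, each of the five factors can be thought of as something that shifts $b_{ij}$, $c_{ij}$, or both.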

### Factor 1: Characteristics of Ego, the sender.

Characteristics encapsulated in “Factor 1” include a range of factors that make it easier for certain types of people (e.g., those who have certain characteristics themselves) to connect with many others. These characteristics may either make it easier for these people (relative to others) to make many connections or provide them greater benefit from doing so. Research in this stream has found a substantial range of characteristics that vary at the individual level and that also predict an increased or decreased propensity to have a certain type of network. These include:

• Personality: Some work has found that differences in personality traits are correlated with network structure. For instance, individuals who have many ties are also likely to have extroverted personalities. Relatedly, those who are high in “self monitoring” also have a greater likelihood of being “brokers” or occupying “structural holes” in a social network.
• Other factors that may also be related to larger networks include:
  • Strategic intent
  • Intelligence
  • Physical characteristics (e.g., beauty or height)
  • Age
• Some factors may describe an individual at a certain point in time:
  • After the loss of a job
  • After being promoted to a new role
• Other factors may be socially constructed, but describe the Ego in a given context:
  • Caste
  • Religion

One can reason about the various ways in which these characteristics of Ego either lower their costs of making ties or increase the benefit they get. Can you come up with other individual-level factors that might matter?

### Factor 2: Characteristics of Alter, the receiver.

A related set of arguments can be made about the characteristics of an alter or alters. For instance, one could theorize about the following characteristics of alter(s) that may make them more likely to receive connections from others.

• Personality
• Intelligence
• Skill
• Wealth
• Social standing
• Formal role in the organization

Like the Ego-centric perspective, one could logically use a “cost” and “benefit” perspective for reasoning about why some Alter may have more advice seekers (e.g., they are smart) or more friends (e.g., they are helpful). In purely altercentric models, we ignore the characteristics of Ego.

### Factor 3: The interaction of Ego/Alter characteristics (e.g., homophily)

The 3rd Factor is one related to the “Ego-Alter” interaction. In such models, there is something about the characteristics of Ego and Alter together that predict an increased or decreased propensity to have network ties. The most common theme in these models is homophily or the tendency for individuals who are similar to each other to have a higher propensity to connect. Research has found that individuals who are similar in the following characteristics are more likely to connect with each other, relative to the alternatives:

• Race and ethnicity
• Gender
• Age
• Formal organizational position
• Occupation
• Religion

There are many theories about why such a preference exists. On one hand, social contexts (e.g., communities, neighborhoods, etc.) are often organized by these characteristics. This makes it much easier to connect with people who are similar to you. There is also an element of choice. Individuals who are similar to you are likely to have similar experiences, share similar values, and like and dislike similar things. As a consequence, the cost of interacting with similar people is likely to be lower than the cost of interacting with people who are different.

However, the type of relation may matter here. In mating networks you are more likely to see heterophily than homophily. This might also be true of mentoring relationships, where individuals are more likely to be mentored by those at a different level of seniority.

What other factors at this level might increase or decrease the cost of interaction or raise its benefits?

### Factor 4: Social and Physical Context

The fourth factor can broadly be thought of as the social or physical context within which individuals are forming social networks. A simple example is office or neighborhood layout. A substantial amount of research has found that physical distance has a substantial effect on whether two individuals form ties. Scientists who are nearby, for instance, are more likely to collaborate and their research trajectories also become rather similar.

Research has found that there is an exponential relationship between physical distance and the propensity to connect. This effect is called propinquity. Individuals who are physically proximate are substantially more likely to interact, followed by steep declines in the rates of interaction as distance increases.
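As a stylized illustration (the functional form and the parameters $p_{0}$ and $\lambda$ are assumptions for exposition, not estimates from a particular study), an exponential decline can be written as:

$P(tie_{ij}) = p_{0} e^{-\lambda d_{ij}}$

Here $d_{ij}$ is the physical distance between $i$ and $j$, $p_{0}$ is the probability of a tie at zero distance, and $\lambda > 0$ governs how steeply interaction falls off. Under this form, each additional unit of distance multiplies the probability of a tie by the same factor, $e^{-\lambda}$.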

In addition to propinquity, other aspects of the social context are also likely to affect the extent of tie formation. These factors could be the reorganization of roles, task inter-dependencies, as well as cultural or organizational norms regarding competition or collaboration. Incentives are also important in determining what the shape of the network might be. The challenge with many of these effects is that they are often “absorbed” into the intercept of the model. That is, they can only be detected when looking across contexts, not within a context.

### Factor 5: Endogenous Network Processes

Finally, the structure of one part of the network may affect the structure of another. Consider a simple example: reciprocity. If I consider you a friend, there is a social-psychological as well as a sociological process that increases the likelihood that you also consider me a friend. This is akin to tit-for-tat: if you give me a gift, I will give you one in return. Networks exhibit this property with substantial regularity (but not always!). In this context, the emergence of a network tie, the reciprocal one, is endogenous to the network. That is, it emerges from within the network structure and not outside of it.
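A rough way to check how much reciprocity a network exhibits is to count the share of directed ties that are returned. The sketch below is in Python, with a made-up list of friendship nominations; the function name and the data are hypothetical, and this is only one of several ways reciprocity is measured.

```python
def reciprocity(edges):
    """Share of directed ties (i -> j) that are matched by a tie (j -> i)."""
    ties = {(i, j) for i, j in edges if i != j}  # drop self-loops and duplicates
    if not ties:
        return 0.0
    mutual = sum(1 for (i, j) in ties if (j, i) in ties)
    return mutual / len(ties)


# Hypothetical nominations: ("A", "B") means "A names B as a friend".
nominations = [("A", "B"), ("B", "A"), ("A", "C"),
               ("C", "D"), ("D", "C"), ("B", "D")]
print(reciprocity(nominations))
```

In this toy example, four of the six nominations are returned (A and B name each other, as do C and D), so two thirds of ties are reciprocated.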

Similarly, there are other endogenous network processes that others have detected in networks. These include factors such as transitivity. For instance, a friend of a friend is often a friend. Heiderian balance theory, for example, argues that individuals desire balance in their relationships. The situation of being friends with your friend’s enemy is unsustainable according to balance theory (why?). Because it is, that structure will endogenously change into something else: either the enemies become friends or the network splits.

Other forces include preferential attachment. New entrants into a network are more likely to connect to individuals in proportion to their degree centrality. This process gives some networks a power law degree distribution, rather than the binomial/normal distribution that would be expected if the network were formed through a purely random process.

Power law distribution

Normal Distribution
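To get a feel for the difference between these two distributions, here is a minimal simulation sketch in Python. The setup is an assumption made for illustration: each new entrant forms exactly one tie, the network has 2000 nodes, and the random benchmark simply draws both endpoints of each tie uniformly at random (so repeated ties are possible). It is a toy version of preferential attachment, not a fitted model.

```python
import random
from collections import Counter


def preferential_attachment(n, rng):
    """Grow a network: each new node links to one existing node chosen
    with probability proportional to that node's current degree."""
    endpoints = [0, 1]  # start with a single tie between nodes 0 and 1
    for new in range(2, n):
        target = rng.choice(endpoints)  # a node appears once per tie it has
        endpoints += [new, target]
    return Counter(endpoints)  # node -> degree


def random_ties(n, rng):
    """Same number of ties (n - 1), but both endpoints drawn uniformly at random."""
    degree = Counter()
    for _ in range(n - 1):
        i, j = rng.sample(range(n), 2)
        degree[i] += 1
        degree[j] += 1
    return degree


rng = random.Random(42)
n = 2000
pa = preferential_attachment(n, rng)
rnd = random_ties(n, rng)
print("max degree, preferential attachment:", max(pa.values()))
print("max degree, random ties:", max(rnd.values()))
```

Both networks have the same number of ties, so the average degree is identical. What differs is the tail: under preferential attachment a few early, well-connected nodes accumulate far more ties than anyone does in the random benchmark.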

### Empirical considerations

Though the theoretical ideas behind network formation are quite straightforward, disentangling the differential impact of these effects remains quite challenging. In a subsequent post, we will discuss the various approaches to estimating these models.

# Peer effects, knowledge transfer and social influence

The structural approach to social networks is inherently beautiful as a representational approach. I am always in awe of the fact that we can learn so much about how human beings act or their outcomes based merely on the pattern of their social ties. The idea is both simple and profound.

The structural approach is built on assumptions regarding information transfer across a simpler unit of analysis: the dyad. In the world of dyads, new complications arise and different theories must be developed and tested.

Let us take the Professionals data we have been analyzing as an example. Here is the advice network among these professionals.

In the prior analyses, we have focused on analyzing the structure of each node’s connections. For example, each node has a specific number of incoming connections, its indegree:

The beauty of the structural approach to social networks is that we can learn a lot about the outcomes of individuals and organizations by merely looking at the pattern of their relationships. Recall our prior analysis. There is information in indegree. We were able to explain 6.5% of the variation in our measure of whether a person has the “knowledge to succeed” just by looking at the count of their incoming connections! While indegree may capture or reflect other processes and might not be causal, it is nevertheless information rich.
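For readers who want to see what the indegree count involves, here is a small Python sketch on a made-up advice network (this is not the Professionals data). It assumes a tie (i, j) means that i goes to j for advice, so the receiver’s indegree counts how many people seek them out.

```python
from collections import Counter


def degree_counts(edges):
    """Return (indegree, outdegree) Counters for a directed edge list."""
    indegree, outdegree = Counter(), Counter()
    for sender, receiver in edges:
        outdegree[sender] += 1   # sender asks someone for advice
        indegree[receiver] += 1  # receiver is sought out for advice
    return indegree, outdegree


# Hypothetical ties: ("ann", "bob") means ann goes to bob for advice.
advice = [("ann", "bob"), ("cat", "bob"), ("dan", "bob"),
          ("bob", "cat"), ("dan", "cat")]
indeg, outdeg = degree_counts(advice)
print(indeg.most_common())
```

The resulting indegree can then be used as a regressor, as in the analysis described above.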

However, an Ego’s alters (e.g., the people that a focal node is connected to) are not all the same—as we sometimes implicitly assume in our models. As a note, I don’t believe that researchers actually believe that all the people we are connected to are the same. Indeed, betweenness, closeness, eigenvector centrality, all assume that not all connections are the same by their very construction. However, the heterogeneity in alter characteristics is implicit rather than explicit because we never specify in our theories or models, exactly how these individuals vary.

The peer effects framework, on the other hand, often ignores variation in structure, but emphasizes variation in the characteristics of connections.

Below, I walk through some examples of this approach.

### A simple model of peer effects

The “peer effects” framework gets its name from a line of research in the economics of education, where scholars were attempting to understand the impact of classroom peers on academic outcomes. Hence, peer effects.

Let us start with a simple setup. Let us assume there are 100 students in a classroom. The teacher has decided that everyone in the class will have a study partner, so she asks the students to pair up into groups of two. There are now 50 pairs, each with two people. The teacher wonders whether having a smart peer (i.e., alter) increases the performance of a focal student (i.e., Ego). Visually, she is interested in understanding this influence process:

At the end of the class, all of the students take a standardized exam. This exam is scored on a 100 point scale, and students can get anywhere from a score of 0 to 100. The teacher takes this score and runs the following regression with 100 observations, 1 for each student. She’s also good with standard errors, so she clusters standard errors at the level of the dyad:

$score_{i} = \beta_{0} + \beta_{1} score_{j} + \epsilon$

After running the regression, she finds a large and statistically significant coefficient for $\beta_{1}$. How should she interpret it?

A naive causal interpretation is: for every unit increase in $score_{j}$ there is a corresponding $\beta_{1}$ increase in $score_{i}$. Or, by having a study partner with a certain score, there is a corresponding increase/decrease in the performance of the focal student. This interpretation is naive for a reason: it is probably (though not definitely) wrong.

But before we dive into why it is probably wrong, it is useful to reiterate that this “peer effects” representation is quite general. For example, the following outcomes might be determined in part by the influence of peers (however defined):

• Finance: Putting money away into a retirement savings account, adopting a microfinance product, etc.
• Health behaviors: Obesity, Happiness, use of HIV/AIDS test, etc.
• Entrepreneurship: Becoming an entrepreneur; deciding against becoming an entrepreneur.
• Careers: Quitting; moving to a new company.
• Adoption of behaviors: Smoking, drinking, sexual events.
• Adoption of ideas: Learning from patents.
• Organizational behavior:  Adoption of corporate practices and policies.

The basic idea is simple: We observe some level or change in the behavior or characteristics of an alter (or alters) and we see whether these are correlated to the behaviors or outcomes of Ego.

This apparently simple process is much more nuanced and complicated than it appears. There are dozens of “mechanisms” that can lead to the correlation we might observe (or that the teacher observes). Here are some examples of reasons why we might observe a correlation, either positive or negative. Consider the case of product adoption.

Can you think of more mechanisms?

### Which mechanism is actually at play in a specific context?

This question is a hard one. Because we have several potential mechanisms that we must work with, how do we rule out some of them? Some mechanisms are easier to rule out than others, but most are actually quite difficult to conclusively confirm or deny.

To deal with this issue (which is VERY common during the review process), I have come up with a two-part classification. The first set of mechanisms are what I call “pseudo-mechanisms.” Pseudo-mechanisms are alternative explanations of the correlation that have nothing to do with social influence of the type we care about: influence flowing from the peer to the focal individual. Charles Manski, in a famous paper, defined these as the reflection problem and the selection problem.

Reflection problem: The reflection problem asks you to imagine a mirror. You see two objects moving. If it is unclear to you that you are looking at a mirror, then you can’t tell which one is the actual person who is moving and which one is the mirror image. More formally, imagine that we have two sets of variables, x and y; let x be the measurement of the characteristics of individual $i$’s peers at time t and let y be the measurement of focal individual $i$’s characteristics at time t. Because the two are measured simultaneously, we are unable to tell whether the change in x’s characteristics has caused a change in y’s characteristics, or vice versa. This indeterminacy exists for each observation.

Furthermore, we are unable to tell whether each of these actors was exposed to some environmental shock (advertising, etc.) at the same time, which makes their adoption correlated. The only way that we can ensure that the reflection problem is not an issue is by measuring the traits and characteristics of the xs prior to measuring those of y.

However, doing this does not resolve the issue of causality. Thus, it is a necessary but insufficient condition.

Another important, and much more difficult, condition now has to be met in order for the effect to have the title “Causal.” This is the selection problem. Either of the following two conditions solves the selection problem:

1. Either you know all the reasons why two people were paired together (i.e., why person y is friends with, shares a room with, or enters college with x),
2. OR the two individuals are randomly assigned, thus breaking the correlation between the characteristics of x and y.
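The second condition can be illustrated with a small simulation. The Python sketch below is a toy example under stated assumptions: scores depend only on a student’s own ability plus noise, so the true peer effect is zero by construction. All names and parameter values are made up for illustration. It compares the naive regression slope when similar students pair up with the slope when pairs are assigned at random.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def partner_slope(scores, order):
    """Pair consecutive students in `order`, then regress each
    student's score on the partner's score (two rows per pair)."""
    ego, alter = [], []
    for k in range(0, len(order) - 1, 2):
        i, j = order[k], order[k + 1]
        ego += [scores[i], scores[j]]
        alter += [scores[j], scores[i]]
    return slope(alter, ego)


rng = random.Random(7)
n = 2000
ability = [rng.gauss(70, 10) for _ in range(n)]
# Scores depend only on own ability plus noise: the true peer effect is zero.
scores = [a + rng.gauss(0, 5) for a in ability]

# Selection: students of similar ability end up as partners.
self_selected = sorted(range(n), key=lambda i: ability[i])
# Randomization: the teacher assigns partners at random.
randomized = list(range(n))
rng.shuffle(randomized)

b_selected = partner_slope(scores, self_selected)
b_random = partner_slope(scores, randomized)
print("slope with self-selected pairs:", round(b_selected, 2))
print("slope with random pairs:", round(b_random, 2))
```

With self-selected pairs the slope is large and positive even though nobody influences anybody; with random pairs it is close to zero. That is the sense in which randomization breaks the correlation between the characteristics of x and y.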

Assume for a moment that we have ruled out reflection and selection effects by (1) using a lagged measure of peer consumption or action, and (2) randomly pairing ego and alter. Even then, we have only ruled out a handful of possible “mechanisms” producing the peer effects. We can rule out the “pseudo-mechanisms” #8 – #13 (except for #11), but that leaves us with 8 possible mechanisms.

Imagine a doctor telling you that “Yes, we’ve ruled out the fact that you are faking your symptoms, but there are 8 or more possible viruses that could be causing your infection!”

So, we need to now try and distinguish between these.

This is hard, even harder than resolving the reflection and selection problems. The reflection and selection problems are interesting in that they are hard problems to solve, but we know how to solve them. Not to make too many medical analogies, but this is like separating conjoined twins. Hard, but someone can do it and has done it.

So how do we distinguish between different mechanisms, say #1 – #7?

This will depend a lot on context, and a lot on the data that you have available.

Let us examine a very simple situation where we have two students. Let us call the first student “Ego” and let us call the second student “Alter.” Assume for a moment that we have completely alleviated the problems of reflection and selection.

Let us say that really there are two contender mechanisms.  (This is probably not true; but, for a moment assume that it is true.)

Mechanism 1: A student learns general study habits from his/her peer (alter), and this is why his/her performance increases.

Mechanism 2: A student interacts a lot with his/her peer (alter) and they study together, and the peer helps the student learn the material.

How would we go about designing a test that would distinguish between these two mechanisms?

1. For instance, if what the student is getting from her peer is something general (better study habits or increased motivation), that should have a positive effect across various subjects.
2. On the other hand, if the student is learning something rather specific (like how to do an integral), then the effects should be subject specific.

Assume you do this test, and you find out that there are effects across subjects, what can you say about the mechanisms? Can you say anything?

### How to conduct the estimation in R

Standard peer effects estimations are quite straightforward. This is especially true when you have randomization in the pairing of focal individuals to peers and longitudinal data so you can lag the characteristics of the peer.

$score_{i,t+1} = \beta_{0} + \beta_{1} score_{j,t} + \epsilon$

Here is a synthetic peer effects dataset in which 2000 individuals have been randomly paired: peer_effects.csv.

Let us examine the extent to which there are peer effects.

The model we want to estimate is:

$postself_{i,t+1} = \beta_{0} + \beta_{1} prepeer_{j,t} + \epsilon$

Estimating this equation in R with this data results in:

If the randomization is proper, this coefficient should remain stable when we control for the focal individual’s own pretreatment score.
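As a rough sketch of what this estimation involves, here is a version in Python rather than R, using only the standard library. It does not read peer_effects.csv. Instead it simulates 2000 randomly paired individuals with an assumed true peer effect of 0.3. The column names `pre_self`, `pre_peer`, and `post_self` are my assumptions, not necessarily the headers in the file, and the `ols` helper is a hypothetical stand-in for what `lm()` would do in R.

```python
import random

def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

random.seed(42)
n = 2000
TRUE_PEER_EFFECT = 0.3  # assumed value for the simulation

pre_self = [random.gauss(0, 1) for _ in range(n)]
# Random pairing: the peer's score is drawn independently of the focal person's.
pre_peer = [random.gauss(0, 1) for _ in range(n)]
post_self = [0.5 * s + TRUE_PEER_EFFECT * p + random.gauss(0, 0.5)
             for s, p in zip(pre_self, pre_peer)]

# Model 1: post_self ~ 1 + pre_peer
b_simple = ols([[1.0, p] for p in pre_peer], post_self)
# Model 2: post_self ~ 1 + pre_peer + pre_self
b_control = ols([[1.0, p, s] for p, s in zip(pre_peer, pre_self)], post_self)

print("peer coefficient, no control:  ", round(b_simple[1], 3))
print("peer coefficient, with control:", round(b_control[1], 3))
```

Because the simulated pairing is random, `pre_peer` is uncorrelated with `pre_self` in expectation, so the two peer coefficients should sit close to each other and close to the assumed 0.3. A large gap between them in real data would be a hint that the pairing was not as random as intended.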

Another worry we have is whether this effect of the peer (captured by the pre-treatment characteristics) is homogeneous or heterogeneous. That is, does it depend on the characteristics of the focal individual or does it apply to everyone? To test this, we include a main effect of the characteristics of the focal individual (self_char) and an interaction term (pre_peer * self_char).

Here, we see that the peer effect depends on the characteristic of the focal individual. If the focal individual has this characteristic (e.g., willingness to listen), the peer effect is larger.
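To make the interaction concrete, here is a minimal Python sketch on simulated data (again, not the peer_effects.csv file). It assumes `self_char` is a binary characteristic and that the true peer effect is 0.1 without it and 0.5 with it; these numbers are invented for illustration. With a binary moderator, the interaction coefficient in a model with `pre_peer`, `self_char`, and `pre_peer * self_char` equals the difference between the peer slopes estimated separately in the two groups, which is what the sketch computes.

```python
import random

def slope(x, y):
    """OLS slope of y on x (with an intercept): cov(x, y) / var(x)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(7)
n = 2000
rows = []
for _ in range(n):
    pre_peer = random.gauss(0, 1)
    self_char = random.random() < 0.5   # e.g., willingness to listen
    # Assumed data-generating process: the peer effect is 0.1 without the
    # characteristic and 0.1 + 0.4 = 0.5 with it.
    effect = 0.1 + (0.4 if self_char else 0.0)
    post_self = 0.2 * self_char + effect * pre_peer + random.gauss(0, 0.5)
    rows.append((pre_peer, self_char, post_self))

def group_slope(flag):
    """Peer slope among focal individuals whose self_char equals flag."""
    x = [p for p, c, _ in rows if c == flag]
    y = [s for _, c, s in rows if c == flag]
    return slope(x, y)

slope_without = group_slope(False)
slope_with = group_slope(True)
interaction = slope_with - slope_without  # estimate of the interaction term

print("peer effect without characteristic:", round(slope_without, 3))
print("peer effect with characteristic:   ", round(slope_with, 3))
print("interaction (difference):          ", round(interaction, 3))
```

I split the sample here only to keep the code short. With a continuous moderator you would need the full interaction regression instead.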

This is only a simple demonstration of the complexity of peer effects. There are likely to be many interactional factors that turn peer effects “on” or “off” or modulate them in some important way. One could imagine the following contingencies, where peer effects depend on characteristics of:

• the focal individual
• the environment
• the alter/peer
• personalities of both

# Entrepreneurial networks

Who is this? Keep this face in mind, at least for a bit.

A major breakthrough in our understanding of the social nature of competition came through a series of papers and then a foundational book by Professor Ronald Burt of the University of Chicago, “Structural Holes: The Social Structure of Competition.” While others had made similar arguments before (see Bavelas 1948, and for a fantastic review see “Centrality in Social Networks: Conceptual Clarification” by Linton Freeman), Burt grounded this idea in theory and provided a very clear framework for other scholars to rethink competition and strategy through this structural lens.

His very powerful argument was that we should think about “structural holes” as “opportunities.”

That is, bridges across these holes in social structure are sources of value for everyone involved: the person who bridges, as well as those being bridged.

The research that followed resulted in a paradigmatic shift in our understanding of how competition within organizations and in markets functions. The early work made a clean and forceful point: the causal agent is not the “strength or weakness” of a tie, but the fact that bridges create value. Focus on the bridge.

This structural argument was supported by two mechanisms of action. These can be described as the control and information benefits of structural holes.  Consider the three archetypical networks depicted below (I’ve adapted this representation from Krackhardt 1999).

On the left, the focal individual “YOU” is in a structure with very few structural holes. That is, all of his or her connections are connected to each other. On the far right is the high structural holes condition. In this case, none of the focal individual’s connections are connected to each other. The intermediate network, which we will discuss later, is theorized to have its own special properties.

### The Control Benefits of Structural Holes

Let us examine the control benefits first. In the first representation, who has control?

Consider the situation in the figure on the left. What happens if you cheat one person in the network? They talk to each other. Your reputation suffers. You lose some of your control. So, who is in control? Not you, but the group. The role that closed networks play in creating trust through control is not uncommon. For instance, small businessmen/women in America and other countries often tend to do business with their co-ethnics.

While preventing cheating is a good thing, a closed structure could also be highly constraining. Small and close-knit groups have strong group norms that can force members to conform in unproductive or harmful ways. Innovation, for example, often requires people to take risks, both social and economic, and closed groups might stymie such risk taking.

At the other end of the spectrum, the focal person’s connections are not connected to each other. This lack of connection implies that they cannot communicate, and as a result, information or gossip cannot travel between these disconnected parties as quickly. The focal individual in this case has more control, because they have the freedom to act without others coordinating against them.

If you are in the third structure, there are two specific control benefits that you have:

• The first strategy to exploit your control benefits is to act as the broker who leverages the position to play off two individuals (perhaps buyers or even sellers) who want the same thing from you. For instance, you can, in subtle ways, make them either lower their demands or increase their willingness to pay.
• The second strategy based on control is to be a broker between two people (or companies) who have conflicting demands. To get one party to change its demands, the broker can leverage the demands of the other. Furthermore, since these two parties do not interact with each other, the broker has the ability (because of this increased control) to shape the information that one party gets about the other.

These are obviously dangerous strategies – and ones that require a significant amount of finesse and skill.

### The Information Benefits of Structural Holes

All is not lost if you can’t pull off the control strategy. Spanning structural holes also provides information benefits. The literature broadly posits three types of information benefits:

• Access benefits: Access benefits consist of two components. First, because the broker spans structural holes, she connects two groups that do not have a high degree of overlap in their knowledge. Thus, the broker has access to information that is not accessible to those in the separate and spanned social groups. Second, because your diverse connections bring you more diverse information, when you receive valuable information you know who can use it.
• Timing benefits: Information can be transmitted over multiple channels. Consider job postings. Before a job is posted in an official manner, people in the department where the job will be located already know about it. Talking to someone in that department will give you knowledge about the job before everyone else. This subtle difference in timing can mean the difference between getting and not getting a job. Because the broker gets information through informal channels, she often has access to information before others. Timing matters in many contexts, including venture deals, hiring, knowing a house is on the market, etc.
• Referrals: Trust matters. Period. People avoid hiring people, buying products, or investing in companies that they have limited information about. Those who span structural holes have contacts in different social worlds with their different opportunities. Contacts with people in these social circles can refer you to their own network, thereby increasing your trustworthiness.

### The Structural Holes in DNA

OK, now that we have the theory down, I want to share an example from real life that exemplifies the beauty of the theory of structural holes.

This is James Watson, one of the co-discoverers of the structure of DNA. This discovery is described by many as one of the most (if not the most) important single scientific discoveries of the 20th century. In his gripping account of this discovery, The Double Helix, he recounts how he and Francis Crick discovered the structure of DNA.

Here are some quotes about the quest for the structure of DNA from the Nobel Prize website:

In the late 1940’s, the members of the scientific community were aware that DNA was most likely the molecule of life, even though many were skeptical since it was so “simple.”

…Nobody had the slightest idea of what the molecule might look like.

In order to solve the elusive structure of DNA, a couple of distinct pieces of information needed to be put together…

As in the solving of other complex problems, the work of many people was needed to establish the full picture.

Francis Crick, a brilliant scientist, was already at Cambridge before James Watson arrived. Watson describes Crick:

“Before my arrival in Cambridge, Francis only occasionally thought about deoxyribonucleic acid (DNA) and its role in heredity.  This was not because he thought it uninteresting. Quite the contrary.

Francis, nonetheless, was not then prepared to jump into the DNA world…[S]uch a decision would create an awkward personal situation. At this time molecular work on DNA in England was, for all practical purposes, the personal property of Maurice Wilkins, a bachelor who worked in London at King’s College…It would have looked very bad if Francis had jumped in on a problem that Maurice had worked over for several years. The matter was even worse because the two, almost equal in age, knew each other and, before Francis remarried, had frequently met for lunch or dinner to talk about science.

The combination of England’s coziness – all the important people, if not related by marriage, seemed to know one another – plus the English sense of fair play would not allow Francis to move in on Maurice’s problem.”

Watson, on the other hand, was an outsider. He describes a few episodes that were critical to his discovery of the structure of DNA.

Break #1:

At a conference in the spring of 1951 in Naples, Watson heard Maurice Wilkins’ talk on the molecular structure of DNA.

“I proceeded to forget Maurice, but not his DNA photograph.”

Break #2:

Linus Pauling, a Nobel Prize winner who was himself working on the structure of DNA, had written a manuscript on DNA (as a triple helix), and a copy would soon be sent to his son, Peter Pauling.

Break #3:

Knowledge of Chargaff’s rules through his doctoral training in Indiana.

Watson had unique access, through his network, to the photos produced by Rosalind Franklin in Wilkins’ lab, the unpublished manuscript prepared by Linus Pauling, and exposure to Erwin Chargaff’s rules about the ratio of bases in DNA. Because of his position, he was able to put these pieces together faster than anyone else.

All three processes helped Watson:

• Referrals: Through his famous and Nobel Prize-winning advisor, he was able to hop from one great lab in Europe to another and get access to conferences that he would not have been able to attend otherwise.

Luck? No. Social Networks.

Structural holes theory also implies a series of tradeoffs between the size of one’s network and the benefits that the network produces. A large network is not necessarily a good thing. This is because maintaining a network connection implies some cost and results in some benefit.

• Decreasing returns to network size: If we measure benefits in units of novel information, one could imagine that adding a new tie might entail some cost (time, resources, emotional energy, etc.) but not result in access to much more new information (e.g., you hear about the same job opportunities from the new connection that you heard about from your existing friend or acquaintance). So at least in terms of information, there are decreasing returns to network size: you pay the additional cost of the new connection, but it provides less information per unit cost than a prior connection.
• Constant returns to network size: A more palatable case is constant returns. Here, doubling your network size doubles the amount of information you have access to. Every new network connection provides information in proportion to what the prior network connections provided.
• Increasing returns to network size: The ideal situation is one where doubling the size of your network more than doubles the information you get. Is this even possible, since adding a new network connection that provides more information than before might also be substantially more costly?

In any case, you clearly want to be at a point before your costs of maintaining a network significantly outweigh any benefits that you get.
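The decreasing-returns case above can be illustrated with a toy simulation. This is a sketch under invented assumptions: there are 100 distinct pieces of information in the world, and each new contact knows a random 20 of them. The code tracks how many novel pieces each additional contact contributes.

```python
import random

random.seed(1)
POOL = range(100)        # hypothetical: 100 distinct pieces of information
ITEMS_PER_CONTACT = 20   # hypothetical: each contact knows 20 of them

known = set()
gains = []
for contact in range(10):
    heard = set(random.sample(POOL, ITEMS_PER_CONTACT))
    gains.append(len(heard - known))   # novel items from this contact
    known |= heard

print("novel items per added contact:", gains)
print("total distinct items known:   ", len(known))
```

The first contact always delivers 20 new items, but later contacts increasingly repeat what you already know, while each tie presumably costs about the same to maintain. Contacts drawn from different pools (that is, from across a structural hole) would not overlap this way.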

Structural holes theory provides some useful guidance on not going too far down the route of decreasing returns to size. A good heuristic for understanding this tradeoff is a calculation developed by Professor Ronald Burt called efficiency. Efficiency can be calculated in the following way:

Efficiency = Effective Size / Actual Size

Expanding this function out, we can define:

Actual size = The number of connections that you have.

Effective size = Actual Size – Sum of percent of overlapping ties for each of your connections.
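Here is a minimal sketch of this calculation in Python. It assumes binary, undirected ties and reads the “percent of overlapping ties” for each connection as the share of your contacts that the connection is itself tied to. That is one common simplification (for n contacts with t ties among them, effective size works out to `n - 2t/n`), and Burt's full measure for weighted ties is more involved. The function names and the four-contact example are hypothetical.

```python
from itertools import combinations

def effective_size(alters, ties):
    """Effective size of an ego network with binary, undirected ties.

    alters: the ego's direct connections.
    ties: pairs of alters who are also connected to each other.
    """
    n = len(alters)
    if n == 0:
        return 0.0
    overlap = {a: 0 for a in alters}
    for a, b in ties:
        overlap[a] += 1
        overlap[b] += 1
    # Each alter's redundancy is the share of the ego's contacts it is tied to.
    return n - sum(count / n for count in overlap.values())

def efficiency(alters, ties):
    """Effective size divided by actual size (assumes at least one alter)."""
    return effective_size(alters, ties) / len(alters)

alters = ["A", "B", "C", "D"]
closed = list(combinations(alters, 2))   # every alter knows every other
mixed = [("A", "B")]                     # one overlapping pair
open_ = []                               # no alters know each other

for name, ties in [("closed", closed), ("mixed", mixed), ("open", open_)]:
    print(name, effective_size(alters, ties), efficiency(alters, ties))
```

With four contacts, the fully closed network has an effective size of 1.0 (efficiency 0.25), the network with one overlapping pair has 3.5 (efficiency 0.875), and the fully open network has 4.0 (efficiency 1.0).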

### Bandwidth and Diversity

The model above has been tremendously useful and very predictive. In recent years, some scholars have also highlighted another interesting tradeoff between stronger non-bridging ties and weaker bridging ties: the bandwidth/diversity tradeoff.

On one hand, higher-bandwidth ties result in greater informational volume. On the other hand, weaker bridging ties result in greater variance in information.

Recent work suggests this relationship depends fundamentally on the nature of the environment in which people are building their social networks. There are two factors that can reduce the value of bridging ties and privilege high-bandwidth ties:

1. The network has a homogeneous set of knowledge, where most people talk about the same things. Then having more high-bandwidth ties may be more important.
2. The “refresh rate” is high, where people’s contacts and interactions churn very fast, or the environment is turbulent and the information is extremely complex (meaning that an idea contains multiple topics or subjects). Then high-bandwidth ties are better at sustaining the high-variance information you need.

However, what studies have found is that “strong” bridging ties that have both bandwidth and diversity are the best, but they are indeed rare.

### Extending the Core Insights from Structural Hole Theory

As one can imagine, structural holes theory was extremely powerful and scholars have been working to extend and refine the predictions of the theory further to account for structures that don’t neatly fit into the standard dichotomy or have dynamic elements.

Consider dynamics: Given how difficult it is to maintain bridging positions, it is likely that bridges are fragile. Research suggests that bridging ties follow what is called a kinked decay function. Initially bridges have a low likelihood of breaking, followed shortly by a sharp rise in decay. If the bridge survives this spike in decay rates, it is likely to persist for a long time.

Two processes often lead to decay:

• Disintermediation: Disconnected parties learn to exchange on their own.
• Competition from rival brokers: Rivals enter the fray and by offering either greater benefits or lower cost, whittle away at the original bridge’s benefits from occupying the hole. Indeed, the hole no longer exists.

Why bridges decay:

• Low performance: high performers have lower rates of decay for bridges.
• If other relations are decaying, bridges are also likely to decay.
• Experience bridging improves the chances that new bridges survive.
• “Hole decay” may be limited when:
  • Deep barriers limit interaction across the hole.
  • The benefits to the bridged parties are high enough and switching costs are high.
  • The bridged individuals don’t question the role of the broker, or it is not salient to them.

### Beyond Information and Control

There are also cases where brokering is disadvantageous. The underlying mechanisms leading to the disadvantages of brokering have to do with identity and expectations.

• In addition to information, networks also convey expectations about who one is (identity) and how one should behave (expectations). Many of us have been caught between two groups that expect different things from us. This happens at work, at home, and even in our social and personal lives with friends. The more disconnected our connections are, the more likely it is that they have different expectations about how we should behave. Podolny and Baron (1997) show that when a person is a broker in a network that conveys “identity” they are less likely to benefit from their brokerage position than when the network primarily provides “information.”
• Similarly, Krackhardt in his Simmelian tie theory makes a related argument that brokering between two strongly connected groups creates pressure to conform to different norms, which can create internal role conflict and stress, and thus reduce performance.

### Outcomes as Mean versus Variance

The theories that we have focused on thus far attempt to predict mean or expected outcomes. That is, what is the average difference in wages/promotion rates/bonuses/ideas for those with or without structural holes? The graph below shows that there is a mean shift. The blue distribution (e.g., structural holes condition) has a higher mean outcome.

However, this analysis can be pushed further by asking: is there a shift in the variance of potential outcomes? Does a specific structure reduce or increase the possible variation in outcomes? Note that the blue distribution below is “tighter” than the black distribution. The black distribution has a greater likelihood of worse, but also better, outcomes than the blue one.
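To put rough numbers on this comparison, here is a small Python sketch with two normal distributions. The means, standard deviations, and the thresholds for a “very bad” and a “very good” outcome are all invented for illustration; they are not taken from any study.

```python
from statistics import NormalDist

# Hypothetical outcome distributions (arbitrary units).
blue = NormalDist(mu=1.0, sigma=0.5)    # higher mean, tighter
black = NormalDist(mu=0.0, sigma=2.0)   # lower mean, wider

def tail_probs(dist, low=-1.0, high=3.0):
    """Probability of a very bad (< low) and a very good (> high) outcome."""
    return dist.cdf(low), 1.0 - dist.cdf(high)

for name, dist in [("blue", blue), ("black", black)]:
    p_bad, p_good = tail_probs(dist)
    print(f"{name}: mean={dist.mean}, P(bad)={p_bad:.4f}, P(good)={p_good:.4f}")
```

Under these assumptions the blue distribution wins on average, yet the black one has far more mass in both tails. If what you care about is the chance of an extreme success, the lower-mean, higher-variance structure can still be the better bet.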

Which would you prefer below?

James Lincoln of UC Berkeley did pioneering studies on business networks in Japan and found that companies that were members of a keiretsu, while having lower mean outcomes, also had lower variation and as a consequence were less likely both to do extremely poorly and to do extremely well.

With respect to brokerage, we can also think about floors and ceilings. Networks that are high in closure reduce variation in performance, both high and low.

High performance is dampened because the high performers subsidize the lower performers, and the low performers don’t do as poorly because the high performers help them out.

The network structures that tend to most facilitate the low-variance strategy are closed networks, as one can imagine.

The classic examples of this are ethnic networks, where wealthier members help out the less fortunate ones.

# Network Positions and Advantage: Status

One of the most important things we do on a day-to-day basis is make predictions about the value of individuals or companies, or really, any entity.  Making such predictions is challenging because we have limited information about the qualities of the entity we are attempting to make predictions about. For instance:

•  A hiring manager at a firm is trying to make a prediction about whether a certain applicant will be a high performer.
• A PhD admissions committee makes predictions about whether an applicant to their program will turn into a star researcher.
• A venture capitalist makes predictions about whether a startup or founding team will create a breakthrough product that will become a billion dollar company.
• A search engine is making a prediction about whether a certain webpage contains useful information for its users.
• A consumer makes predictions about the quality of a product before he/she buys it.

Predictions of this type are commonplace and often rather difficult to make. This difficulty exists for two reasons. First, only a limited set of characteristics are observable to the decision maker, whereas much else is unobservable. A hiring manager, for instance, may observe a resume and a list of references. Based on this resume and reference list, she attempts to make an inference about many things: how hard working the applicant is, their base of knowledge, their ability to get along with other members of her team, and so on. Thus, the hiring manager attempts to use “observables” to infer something about the unobservables.

The goal therefore is to map observables (the things that you can easily measure and observe about someone or some organization) to unobservables. What are some examples of unobservables, or things that are difficult to observe?

• Creativity
• Whether a person you hire will “fit” with an organization’s culture
• Whether a company you invest in will turn a profit
• Trustworthiness

The inability to effectively communicate information about these hard to quantify traits from one person to another becomes a problem for both the evaluator and in many cases for the person being evaluated, particularly if they are high quality, but others can’t tell this is the case.

That is, how does one separate the signal from the noise?

One solution proposed to this problem is signaling theory. People send signals and these signals contain information that allow “buyers” to ascertain whether the seller (a job market candidate) is of high quality or not. But anyone can send signals, and sometimes the signals are noisy or uninformative. If the signals are no good, then they don’t solve the asymmetric information problem.

Michael Spence argued that some signals are harder to acquire than others, and this difficulty in acquiring the signal is related to some dimension of underlying quality.

For instance, a hiring manager might be looking to hire someone with great machine learning talent. Anyone can put “machine learning” on his/her resume, so merely doing so isn’t likely to be a very good signal of having that skill. However, it is probably easier to win a Kaggle competition if you have good machine learning skills than if you do not. As a result, those with more machine learning skill are more likely to be represented among Kaggle winners than those without that skill. Thus, winning in Kaggle is likely to be a decent signal of ML skill. Further, since winning in Kaggle is easily observable, it is perhaps a decent signal for what we care about.
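A quick back-of-the-envelope calculation shows why the costly signal is more informative. The sketch below applies Bayes' rule with made-up numbers: the share of skilled applicants and the probabilities of sending each signal are hypothetical and only meant to show the logic.

```python
def posterior_skilled(prior, p_signal_if_skilled, p_signal_if_unskilled):
    """P(skilled | signal observed) by Bayes' rule."""
    joint_skilled = prior * p_signal_if_skilled
    joint_unskilled = (1 - prior) * p_signal_if_unskilled
    return joint_skilled / (joint_skilled + joint_unskilled)

PRIOR = 0.20  # hypothetical share of applicants with real ML skill

# Cheap signal: almost anyone can list "machine learning" on a resume.
resume = posterior_skilled(PRIOR, 0.95, 0.80)
# Costly signal: winning a competition is far likelier for the skilled.
kaggle = posterior_skilled(PRIOR, 0.10, 0.005)

print(f"P(skilled | resume claim) = {resume:.2f}")
print(f"P(skilled | Kaggle win)   = {kaggle:.2f}")
```

With these numbers the resume claim barely moves the hiring manager's belief (from 0.20 to about 0.23), while the Kaggle win moves it to about 0.83. What matters is not how common the signal is but how much more likely the skilled are to send it than the unskilled.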

Can you think of other signals that contain a lot of information and are difficult to fake?

Joel Podolny in a series of articles proposed that social relations also help signal quality. This is a profound idea, and I will walk through it further. But let us fast forward to another application of Eigenvector centrality: the original Google PageRank algorithm.

For example, social cues such as endorsements, recommendations, funding decisions, or hiring decisions convey (signal) information.

Consider James and Betty. Both have two connections of their own. And both of their connections think highly of them and recommend them. In an abstract sense, Betty and James are rated by their raters (i.e., their two connections). But a new problem arises: who has more reliable raters?

This is what we can consider the “rating the raters” problem. While in the first degree out (the direct connections of these two individuals) they are indistinguishable, there is substantial variation in their second and third degree ties. Although James and Betty have similarly sized networks, Betty’s network connections have far more connections of their own.

While it is relatively easy to figure out the difference between the size of Betty and James’ second degree network, the problem gets more complicated the further we move out. Real networks don’t usually stop at the 2nd or 3rd degree; connections extend to the 4th, 5th, 6th, and so on. The second problem is that real networks aren’t usually trees. Networks loop back on themselves over and over again, which makes the “rating the rater” problem hard. So we cannot just re-weight the rating by the ratings received by the rater.

There is a concept, called eigenvector centrality, that does exactly what we thought was hard: it rates the raters, the rater’s raters, the rater’s rater’s raters, and so on. This measure gives us a nice summary statistic telling us how much “status” a node in the network has. It is hard to fake because you can perhaps fake your own network ties, but not the ties of your connections’ connections. The nodes below, for instance, are resized by eigenvector centrality.
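Here is a minimal Python sketch of how eigenvector centrality can be computed by repeatedly passing scores along ties (power iteration). The network is a made-up version of the James and Betty example: both have exactly two direct connections, but Betty's connections have more connections of their own. The extra node “Sam” is my addition so that the two sides form one connected network.

```python
from math import sqrt

def eigenvector_centrality(edges, iterations=200):
    """Power iteration on an undirected network given as (a, b) pairs."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    score = {node: 1.0 for node in neighbors}
    for _ in range(iterations):
        # A node's new score is its own score plus its neighbors' scores.
        # Keeping the node's own score (a shift by the identity) avoids the
        # oscillation plain power iteration shows on tree-like networks.
        new = {v: score[v] + sum(score[u] for u in neighbors[v])
               for v in neighbors}
        norm = sqrt(sum(x * x for x in new.values()))
        score = {v: x / norm for v, x in new.items()}
    return score

edges = [
    ("James", "J1"), ("James", "J2"),
    ("Betty", "B1"), ("Betty", "B2"),
    ("B1", "X1"), ("B1", "X2"), ("B1", "X3"),
    ("B2", "Y1"), ("B2", "Y2"), ("B2", "Y3"),
    ("J1", "Sam"), ("B1", "Sam"),   # Sam links the two sides
]
status = eigenvector_centrality(edges)

for person in ("James", "Betty"):
    print(person, round(status[person], 3))
```

James and Betty have the same number of direct ties, yet Betty ends up with the higher score because her raters are themselves better connected.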

The problem of determining the “value” or credibility of an object based on its connections and its connections’ connections is a general one. Google’s original algorithm, PageRank, is essentially a measure of sociometric status. The basic intuition of PageRank was that if a site gets a lot of incoming links, and the sites linking to it also do, and so on, then there must be some value to it. The insight arises by viewing the Web as a network, and using its structure to determine whether a page is useful or not.
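The same intuition can be sketched for directed links. The code below is a simplified Python version of the PageRank idea, not Google's production algorithm. The damping factor of 0.85 and the even redistribution of rank from pages with no outgoing links are standard textbook choices rather than anything stated above, and the tiny web of pages is invented.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Simplified PageRank. links maps each page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Rank held by pages with no outgoing links is spread over all pages.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1 - damping) / n + damping * dangling / n for p in pages}
        for page, targets in links.items():
            for target in targets:
                new[target] += damping * rank[page] / len(targets)
        rank = new
    return rank

# Hypothetical web: "site" and "other" each receive exactly one link,
# but "site" is linked from "hub", which itself has three incoming links.
links = {
    "a": ["hub"], "b": ["hub"], "c": ["hub"],
    "hub": ["site"],
    "d": ["other"],
    "site": [], "other": [],
}
rank = pagerank(links)

for page in ("site", "other"):
    print(page, round(rank[page], 3))
```

Both pages have a single incoming link, but “site” ranks higher because the page linking to it is itself well linked. This is the “rating the raters” logic applied to web pages.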

### Ego and Altercentric Perspectives

Now that we have the basic concept of sociometric status down, we can turn to the “big idea” in sociology that came from Joel Podolny. He suggested that we had focused primarily on seeing networks as “pipes” through which information, resources, support, and other “stuff” flows. However, networks are also useful for individuals in resolving problems of uncertainty because certain types of network structures also signal trust, reputation, and identity: network structures are prisms that reveal information as well.

The extent to which networks operate as pipes or as prisms depends on the level of uncertainty faced by market participants. He developed a highly useful framework for characterizing which structure may matter when. There are two types of uncertainty, egocentric and altercentric.

Fig. 1.—Illustrative markets arrayed by altercentric and egocentric uncertainty

### Egocentric  uncertainty

A market or market segment can rate highly on one type of uncertainty without rating highly on the other.

Consider the four markets represented in the figure above. From Podolny (2001):

Vaccines: Begin with the market for a particular vaccine, such as polio or smallpox, in the upper left-hand quadrant. The most salient source of uncertainty in this market is that which underlies the development of the vaccine. Once the vaccine is developed and is given regulatory approval, there is little uncertainty on the part of consumers as to whether they will benefit from the innovation. Accordingly, a market for a vaccine is a market that rates high on egocentric uncertainty, but low on altercentric uncertainty.

Roofers: Alternatively, consider the market in the lower right-hand corner, a regional market for roofers. “Roofing technology” is relatively well understood, and while roofers may face some uncertainty as to who needs a roof in any particular year, they can be confident that every homeowner will need repair work or a replacement every 20 years or so. By sending out fliers or advertising in the yellow pages, they can be assured of reaching a constituency with a demand for their service. However, because an individual consumer only infrequently enters the market, the consumer is generally unaware of quality-based distinctions among roofers. The consumer may be able to alleviate some of this uncertainty through consultation with others who have recently had roof repairs; however, the need for such consultation is an illustration of the basic point. Only through such search and consultation can the consumer’s relatively high level of uncertainty be reduced. Accordingly, this is a market that is comparatively low in terms of egocentric uncertainty, but relatively high in terms of altercentric uncertainty.

What are some other examples of markets that are low on one type of uncertainty and high on another? What about markets that are high on both?

### How does one deal with altercentric uncertainty?

Let us loop back to our earlier discussion of sociometric status. Why is sociometric status a useful signal to help resolve altercentric uncertainty?

• Sociometric Status: A position in a social network – defined by the ties that you have to others – where you receive deference from others who are themselves highly respected or deferred to.

### When does status go awry?

However, there are many instances where status does not serve as a perfect signal of quality, and this can lead to misperceptions of status and thus misperceptions of quality. When status is a perfect signal of quality it is said that there is tight coupling between status and quality. However, as I mentioned, this is often not the case.

Matthew Effect / Self-fulfilling prophecy: The classic example of this is the phenomenon of the 41st chair. This is the example of the “French Academy,” where there are only 40 chairs and there is perhaps no substantive difference between #40 and #41, but the 40th person becomes a holder of a chair and the 41st person does not. This results in the 40th person getting more rewards, recognition, etc., which in turn allows them to do better work because they now have significantly more resources than people who do not hold a chair. In sociological parlance, the phenomenon of the 41st chair is called “decoupling.” Here, the linear relationship between quality and status breaks down: the 40th person gains far more status than the 41st.

Buy low, sell high: This decoupling is an arbitrage situation for managers – because most people use status signals that are imperfect. There are two possible strategies to exploit this gap:

1. Figure out a more readily observable representation of social signals that maps onto quality more tightly and sell that information.
2. Figure out a way to measure sociometric status in a situation where it is not currently used. Then use this as a better way of valuation.

### Beyond the basics

The study of sociometric (and other) status is an extremely rich area of research in organizational sociology and economic sociology. I have merely scratched the surface of this topic.

Some excellent articles and reviews in this stream include:

Stuart, Toby E., Ha Hoang, and Ralph C. Hybels. “Interorganizational endorsements and the performance of entrepreneurial ventures.” Administrative Science Quarterly 44.2 (1999): 315-349.

Sauder, Michael, Freda Lynn, and Joel M. Podolny. “Status: Insights from organizational sociology.” Annual Review of Sociology 38 (2012): 267-283.

Lynn, Freda B., Joel M. Podolny, and Lin Tao. “A Sociological (De) Construction of the Relationship between Status and Quality.” American Journal of Sociology 115.3 (2009): 755-804.

Chen, Ya-Ru, et al. “Introduction to the special issue: Bringing status to the table—attaining, maintaining, and experiencing status in organizations and markets.” (2012): 299-307.

Phillips, Damon J., and Ezra W. Zuckerman. “Middle-Status Conformity: Theoretical Restatement and Empirical Demonstration in Two Markets.” American Journal of Sociology 107.2 (2001): 379-429.

# Network Positions and Advantage: Structural Holes

Who is this? Keep this face in mind, at least for a bit.

In the prior lecture we discussed the simple micro-macro-micro process described in Granovetter (1973), the “Strength of Weak Ties.” Recall what we discussed: The forbidden triad is forbidden because in equilibrium it is generally unstable, because it is unbalanced.

The structure of the forbidden triad is particularly unstable for strong ties, in which strength increases as some function of:

• The amount of time that two people spend together
• The emotional intensity of the interaction
• The intimacy between the two parties (i.e., mutual confiding)
• The reciprocal services which the two parties engage in.

The way to sustain the “bridge structure” implied by the forbidden triad is to weaken one of these conditions.  The weak tie that is a result, can allow for the persistence of “bridges” or “brokerage” across distinct and differentiated strong tie clusters across groups that divide the social world.

One key assumption that we make is that there is different information that is being discussed across these different groups. For instance, these different groups could be scientific research communities, regional economic clusters, different departments in the same business school, and so on. We start with the assumption that people in these different groups are doing different things, they may have different cultures, and are members of different disciplines. Information within a cluster (e.g., information that persons 1 and 2, who are in Group A, possess) is much more likely to be redundant than information across clusters. Consequently, information in group A and group B is said to be non-redundant. That is, a person from group A, by talking to someone in group B, is more likely to learn something new than if she talked to someone else from group A.

The “big idea” from the Strength of Weak Ties hypothesis is that there are “holes” in the social structure and that weak ties are the conduits that can transmit information across these holes. Thus, more weak ties mean that people have access to more and newer information.

### The Holes in Social Structure

The crystal-clear mechanism implied by the weak tie hypothesis is a credit to the author’s imagination: he saw something that others had missed. The empirical facts in the original paper were consistent with the hypothesis, but the measurement did not capture the spanning of holes in the structure per se. Rather, the theoretical argument was that weak ties, because of why they exist, should correspond to this structural configuration.

Another major breakthrough came through a series of papers and then a foundational book by Professor Ronald Burt of the University of Chicago, “Structural Holes: The Social Structure of Competition.” While others had made similar arguments before (see Bavelas 1948, and for a fantastic review see “Centrality in Social Networks: Conceptual Clarification” by Linton Freeman), Burt grounded this idea in theory and provided a very clear framework for other scholars to rethink competition and strategy through this structural lens.

His very powerful argument was that we should think about “structural holes” as “opportunities.”

That is, bridges across these holes in social structure are sources of value for everyone involved: the person who bridges, as well as those being bridged.

The research that followed resulted in a paradigmatic shift in our understanding of how competition within organizations and in markets functions. The early work made a clean and forceful point: the causal agent is not the “strength or weakness” of a tie, but the fact that bridges create value. Focus on the bridge.

This structural argument was supported by two mechanisms of action. These can be described as the control and information benefits of structural holes.  Consider the three archetypical networks depicted below (I’ve adapted this representation from Krackhardt 1999).

On the left, the focal individual “YOU” is in a structure with very few structural holes. That is, all of their connections are connected to each other. On the far right is the high structural holes condition. In this case, none of the focal individual’s connections are connected to each other. The intermediate network, which we will discuss later, is theorized to have its own special properties.

### The Control Benefits of Structural Holes

Let us examine the control benefits first. In the first representation, who has control?

Consider the situation in the figure on the left. What happens if you cheat one person in the network? They talk to each other. Your reputation suffers. You lose some of your control. So, who is in control? Not you, but the group. The role that closed networks play in creating trust through control is not uncommon. For instance, small businessmen/women in America and other countries often tend to do business with their co-ethnics.

While preventing cheating is a good thing, a closed structure can also be highly constraining. Small, close-knit groups have strong group norms that can force members to conform in unproductive or harmful ways. Innovation, for example, often requires people to take risks, both social and economic, and closed groups might stymie such risk taking.

At the other end of the spectrum, the focal person’s connections are not connected to each other. This lack of connection implies that they cannot communicate, and as a result, information or gossip cannot travel between these disconnected parties as quickly. The focal individual in this case has more control, because they have the freedom to act without others coordinating against them.

If you are in the third structure, there are two specific control benefits that you have:

• The first strategy to exploit your control benefits is to act as the broker who plays two individuals (perhaps buyers, or even sellers) who want the same thing from you off against each other. For instance, you can, in subtle ways, make them either lower their demands or increase their willingness to pay.
• The second strategy based on control is to be a broker between two people (or companies) who have conflicting demands. To get one party to change their demands, the broker can leverage the demands of the other. Furthermore, since these two parties do not interact with each other, the broker has the ability (because of this increased control) to shape the information that one party gets about the other.

These are obviously dangerous strategies – and ones that require a significant amount of finesse and skill.

### The Information Benefits of Structural Holes

All is not lost if you can’t pull off the control strategy. Spanning structural holes also provides information benefits. The literature broadly posits three types of information benefits:

• Access benefits: Access benefits consist of two components. First, because the broker spans structural holes, she connects two groups that do not have a high degree of overlap in their knowledge. Thus, the broker has access to information that is not accessible to those in the separate, spanned social groups. Second, because her diverse connections bring her more diverse information, she knows who can use a valuable piece of information when she receives it.
• Timing benefits: Information can be transmitted over multiple channels. Consider job postings. Before a job is posted officially, people in the department where the job is located already know about it. Talking to someone in that department gives you knowledge about the job before everyone else has it. This subtle difference in timing can mean the difference between getting and not getting a job. Because the broker gets information through informal channels, she often has access to information before others. Timing matters in many contexts, including venture deals, hiring, knowing a house is on the market, etc.
• Referrals: Trust matters. Period. People avoid hiring people, buying products, or investing in companies that they have limited information about. Those who span structural holes have contacts in different social worlds with their different opportunities. Contacts with people in these social circles can refer you to their own network, thereby increasing your trustworthiness.

### The Structural Holes in DNA

OK, now that we have the theory down, I want to share an example from real life that exemplifies the beauty of the theory of structural holes.

This is James Watson, one of the co-discoverers of the structure of DNA. This discovery is described by many as one of the most important (if not the most important) scientific discoveries of the 20th century. In The Double Helix, his gripping account of the discovery, he recounts how he and Francis Crick worked out the structure of DNA.

17th October 1962: American biochemist Dr. James Dewey Watson seated in his lab at Harvard University, Massachusetts. He shared the 1962 Nobel Prize in medicine for the discovery of the molecular structure of DNA. (Photo by Hulton Archive/Getty Images)

Here are some quotes about the quest for the structure of DNA from the Nobel Prize website:

In the late 1940’s, the members of the scientific community were aware that DNA was most likely the molecule of life, even though many were skeptical since it was so “simple.”

…Nobody had the slightest idea of what the molecule might look like.

In order to solve the elusive structure of DNA, a couple of distinct pieces of information needed to be put together…

As in the solving of other complex problems, the work of many people was needed to establish the full picture.

Francis Crick, a brilliant scientist, was already at Cambridge before James Watson arrived. Watson describes Crick:

“Before my arrival in Cambridge, Francis only occasionally thought about deoxyribonucleic acid (DNA) and its role in heredity.  This was not because he thought it uninteresting. Quite the contrary.

Francis, nonetheless, was not then prepared to jump into the DNA world…[S]uch a decision would create an awkward personal situation.  At this time molecular work on DNA in England was, for all practical purposes, the personal property of Maurice Wilkins, a bachelor who worked in London at King’s College…It would have looked very bad if Francis had jumped in on a problem that Maurice had worked over for several years. The matter was even worse because the two, almost equal in age, knew each other and, before Francis remarried, had frequently met for lunch or dinner to talk about science.

The combination of England’s coziness – all the important people, if not related by marriage, seemed to know one another – plus the English sense of fair play would not allow Francis to move in on Maurice’s problem.”

Watson, on the other hand, was an outsider. He describes a few episodes that were critical to his discovery of the structure of DNA.

Break #1:

At a conference in the spring of 1951 in Naples, Watson heard Maurice Wilkins’ talk on the molecular structure of DNA.

“I proceeded to forget Maurice, but not his DNA photograph.”

Break #2:

Linus Pauling, a Nobel Prize winner who was himself working on the structure of DNA, had written a manuscript describing DNA as a triple helix. A copy would soon be sent to his son, Peter Pauling.

Break #3:

Knowledge about Chargaff’s rules through his doctoral training in Indiana.

Watson had unique access, through his network, to the photos produced by Rosalind Franklin in the Wilkins lab, the unpublished manuscript prepared by Linus Pauling, and exposure to Erwin Chargaff’s rules about the ratio of bases in DNA.  Because of his position, he was able to put these pieces together faster than anyone else.

All three processes helped Watson:

• Referrals: Through his famous, Nobel Prize-winning advisor, he was able to hop from one great lab in Europe to another and gain access to conferences that he would not otherwise have been able to attend.

Luck? No. Social Networks.

Structural holes theory also implies a series of tradeoffs between the size of one’s network and the benefits that the network produces. A large network is not necessarily a good thing. This is because maintaining a network connection implies some cost and results in some benefit.

• Decreasing returns to network size: If we measure benefits in units of novel information, one could imagine that adding a new tie might entail some cost (time, resources, emotional energy, etc.) but not result in access to much new information (e.g., you hear about the same job opportunities from the new connection that you heard about from your existing friends or acquaintances). So, at least in terms of information, there are decreasing returns to network size: you pay the additional cost of the new connection, but it provides less information per unit cost than a prior connection.
• Constant returns to network size: A more palatable case is constant returns. Here doubling your network size, doubles the amount of information you have access to. Every new network connection provides information in proportion to what the prior network connections provided.
• Increasing returns to network size: The most ideal situation is one where doubling the size of your network more than doubles the information you get. Is this even possible, since adding a new network connection that provides more information than before might also be substantially more costly?

In any case, you clearly want to be at a point before your costs of maintaining a network significantly outweigh any benefits that you get.

Structural holes theory provides some useful guidance on not going too far down the route of decreasing returns to size. A good heuristic for understanding this tradeoff is a calculation developed by Professor Ronald Burt called efficiency. Efficiency can be calculated in the following way:

Efficiency = Effective Size / Actual Size

Expanding this function out, we can define:

Actual size = The number of connections that you have.

Effective size = Actual Size – Sum of percent of overlapping ties for each of your connections.
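As a rough illustration of the calculation above, here is a minimal Python sketch. It assumes one particular reading of “percent of overlapping ties”: for each of your connections, the share of your connections that this person is also tied to. This is a common simplification for unweighted ties, not necessarily the exact procedure used in any given study. The function names and the example contacts are hypothetical.

```python
def effective_size(contacts, ties_among_contacts):
    """Actual size minus the summed share of overlapping ties.

    contacts: the people the focal individual is connected to.
    ties_among_contacts: pairs of contacts who are also tied to each other.
    """
    contact_set = set(contacts)
    n = len(contact_set)
    if n == 0:
        return 0.0

    # Keep each undirected tie once, ignoring anyone who is not a contact.
    ties = {
        frozenset((a, b))
        for a, b in ties_among_contacts
        if a in contact_set and b in contact_set and a != b
    }

    # For each contact, count how many of the other contacts they reach.
    overlap = {c: 0 for c in contact_set}
    for tie in ties:
        for person in tie:
            overlap[person] += 1

    redundancy = sum(count / n for count in overlap.values())
    return n - redundancy


def efficiency(contacts, ties_among_contacts):
    """Effective size divided by actual size."""
    actual = len(set(contacts))
    if actual == 0:
        return 0.0
    return effective_size(contacts, ties_among_contacts) / actual


contacts = ["A", "B", "C", "D"]
open_net = []  # no contact knows any other contact
one_tie = [("A", "B")]
closed_net = [("A", "B"), ("A", "C"), ("A", "D"),
              ("B", "C"), ("B", "D"), ("C", "D")]

for label, ties in [("open", open_net), ("one tie", one_tie), ("closed", closed_net)]:
    print(label, effective_size(contacts, ties), efficiency(contacts, ties))
```

With four contacts, the fully open network has an efficiency of 1.0, while the fully closed network has an effective size of 1 and an efficiency of 0.25: four relationships to maintain, but roughly one contact’s worth of non-redundant information.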

### Bandwidth and Diversity

The model above has been tremendously useful and very predictive. In recent years, some scholars have also highlighted another interesting tradeoff between stronger non-bridging ties and weaker bridging ties: the bandwidth/diversity tradeoff.

On one hand, higher-bandwidth ties result in greater informational volume. On the other hand, weaker bridging ties result in greater variance in information.

Recent work suggests this relationship depends fundamentally on the nature of the environment in which people are building their social networks. There are two factors that can reduce the value of bridging ties and privilege high-bandwidth ties:

1. If the network has a homogeneous set of knowledge, where most people talk about the same things, then having more high-bandwidth ties may be more important.
2. If the “refresh rate” is high (people’s contacts and interactions churn very fast), or if the environment is turbulent and the information is extremely complex (meaning that an idea contains multiple topics or subjects), then high-bandwidth ties are better at sustaining the high-variance information you need.

However, what studies have found is that “strong” bridging ties that have both bandwidth and diversity are the best, but they are indeed rare.

### Extending the Core Insights from Structural Hole Theory

As one can imagine, structural holes theory was extremely powerful and scholars have been working to extend and refine the predictions of the theory further to account for structures that don’t neatly fit into the standard dichotomy or have dynamic elements.

Consider dynamics: given how difficult it is to maintain bridging positions, bridges are likely to be fragile. Research suggests that bridging ties follow what is called a kinked decay function. Initially, bridges have a low likelihood of breaking; this is followed shortly by a sharp rise in decay. If the bridge survives this spike in decay rates, it is likely to persist for a long time.

Two processes often lead to decay:

• Disintermediation: Disconnected parties learn to exchange on their own.
• Competition from rival brokers: Rivals enter the fray and, by offering either greater benefits or lower costs, whittle away at the original bridge’s benefits from occupying the hole. Indeed, the hole no longer exists.

Why bridges decay:

• Low performance: high performers have lower rates of decay for their bridges.
• If other relations are decaying, bridges are also likely to decay.
• Experience with bridging improves the chances that new bridges survive.

“Hole decay” may be limited when:

• Deep barriers limit interaction across the hole.
• The benefits to the bridged parties are high enough and switching costs are high.
• The bridged individuals don’t question the role of the broker, or it is not salient to them.

### Beyond Information and Control

There are also cases where brokering is disadvantageous. The underlying mechanisms behind these disadvantages have to do with identity and expectations.

• In addition to information, networks also convey expectations about who one is (identity) and how one should behave (expectations). Many of us have been caught between two groups that expect different things from us. This happens at work, at home, and even in our social and personal lives with friends. The more disconnected our connections are, the more likely it is that they have different expectations about how we should behave. Podolny and Baron (1997) show that when a person is a broker in a network that conveys “identity,” they are less likely to benefit from their brokerage position than when the network primarily provides “information.”
• Similarly, Krackhardt in his Simmelian tie theory makes a related argument that brokering between two strongly connected groups creates pressure to conform to different norms which can create internal role conflict, stress, and thus reduce performance.

### Outcomes as Mean versus Variance

The theories that we have focused on thus far attempt to predict mean or expected outcomes. That is, what is the average difference in wages, promotion rates, bonuses, or ideas between those with and without structural holes? The graph below shows such a mean shift: the blue distribution (e.g., the structural holes condition) has a higher mean outcome.

However, this analysis can be pushed further by asking: is there a shift in the variance of potential outcomes? Does a specific structure reduce or increase the possible variation in outcomes? Note that the blue distribution below is “tighter” than the black distribution. The black distribution has a greater likelihood of worse, but also of better, outcomes than the blue one.

Which would you prefer below?

James Lincoln of UC Berkeley did pioneering studies on business networks in Japan and found that companies that were members of the Keiretsu, while having lower mean outcomes, also had lower variation; as a consequence, they were less likely to do extremely poorly but also less likely to do extremely well.

With respect to brokerage, we can also think about floors and ceilings. Networks that are high in closure reduce variation in performance, both high and low.

High performance is dampened because the high performers subsidize the low performers, and the low performers don’t do as poorly because the high performers help them out.

The network structures that tend to most facilitate the low-variance strategy are closed networks, as one can imagine.

The classic examples of this are ethnic networks, where the wealthier members help out the less fortunate ones.
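The floor-and-ceiling logic can be sketched with a toy calculation in Python. The numbers and the `pool_outcomes` function below are hypothetical. The sketch assumes, purely for illustration, that members of a closed network pool a fixed share of their outcomes and redistribute it equally, which leaves the group mean unchanged. It is not a model of the Keiretsu findings, where mean outcomes were also lower.

```python
from statistics import mean, pstdev


def pool_outcomes(outcomes, share):
    """Each member keeps (1 - share) of their own outcome and receives
    `share` of the group average, mimicking mutual subsidy in a closed group."""
    group_mean = mean(outcomes)
    return [(1 - share) * x + share * group_mean for x in outcomes]


# Hypothetical standalone outcomes for five members of a group.
standalone = [2, 4, 6, 8, 10]
closed = pool_outcomes(standalone, share=0.5)

print("standalone:", standalone, "mean", mean(standalone), "sd", pstdev(standalone))
print("closed:    ", closed, "mean", mean(closed), "sd", pstdev(closed))
```

Under this assumption the best performer ends up lower and the worst performer ends up higher than they would on their own, so the spread of outcomes shrinks while the average stays the same.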

# Rankings of Social Science Research Productivity of Indian Universities, Colleges and Institutes

In 2011, I created a ranking of Indian universities based on productivity in the social sciences. Here is the original ranking. Look out for the new ranking forthcoming in the next few months.

Below you will find rankings of Indian universities and institutes based on productivity in social science research. The universities and institutes are ranked in four categories: (1) sociology, demography and family studies, (2) economics, (3) psychology, and (4) business and management. The rankings presented here are based on a limited set of variables, namely the number of peer-reviewed journal articles produced by an institution and the number of citations these articles received.

The data used for the rankings are derived from Thomson’s ISI Web of Knowledge. The raw data included any article published in one of 3015 social science journals indexed by ISI (including Indian journals as well as international journals) by an author affiliated with a university or institute located in India between 2000 and 2010. These data were subset to include only those institutions that had more than 20 publications in any social science category during this ten-year time frame; the final sample consists of 61 universities and institutes. Finally, the institutions were ranked according to their research productivity (citations and publications) in the four categories mentioned above.

Omissions and Caveats

There is no doubt that these rankings are limited in important ways. Most significantly, the measures I use do not incorporate research output in the form of books, book chapters, journals not indexed by ISI, peer-reviewed conferences, as well as other academic writing such as case studies. Excluding these output venues reduces the value of the rankings in some respects, but also makes comparing universities straightforward and systematic. Moreover, peer-reviewed journal publications and citation counts are universally accepted measures of academic productivity and are clearly valuable and informative.

The rankings also do not incorporate other information—such as academic placements of doctoral students, quality of instruction, facilities, and peer quality—that might be useful for prospective graduate students.  It is advised that you learn more about the universities and institutes you are considering before making any choices.

Reason for the rankings

The rankings were created for my own use; I wanted to find out which Indian universities were producing the most (and, if possible, the most interesting) social science research.   The rankings are therefore my attempt at making sense of the social science ecosystem in India and not an attempt at producing a definitive and thorough ranking.

Measures:

Int. Collab (International Collaboration):  The proportion of articles co-authored with international collaborators (i.e., co-authors of the focal Indian author who are located outside of India).  For example, in the Sociology, Demography and Family Studies category, nine percent of all articles published by Jawaharlal Nehru University were co-authored with scholars affiliated with institutions located outside of India, whereas sixty-seven percent of all articles from the Indian Institute of Management – Bangalore were co-authored with international collaborators. This measure is not incorporated into the ranking. It is provided only for informational purposes.

Avg. Cite (Average citations):  This measure is the sum of the citations received by the articles produced by an institution divided by the total number of articles. This measure attempts to quantify the impact of the research published by a given institution.

NOTE: This measure does not take into account the amount of research produced by an institution; thus, two universities may have similar scores on Avg. Cite, but vastly different levels of productivity (e.g. one produced only 5 articles and another produced 100). The Cite Adj. Pubs measure attempts to incorporate both the number of articles produced and the impact of the articles.

Pubs (Number of Publications): The total number of publications in the journals belonging to the category (e.g. Economics) during the period 2000 to 2010 authored or co-authored by someone affiliated with the institution.  For instance, scholars from the University of Hyderabad published forty-nine articles in the journals indexed by ISI’s Social Science Index in the Sociology, Demography and Family Studies categories between 2000 and 2010.

Cite Adj. Pubs (Citation Adjusted Publications): This measure is a relatively straightforward combination of the Avg. Cite and Pubs measures.  Every article i is given a score equal to a(i) = 1 + ln(1 + citations(i)). I then add up the article scores a(i) for each institution, resulting in an institution-specific Cite Adj. Pubs score. If none of the articles published by a university has received a citation, the Cite Adj. Pubs score is equal to the number of publications. As the number of citations increases, this score increases at a decreasing rate.
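For readers who want to reproduce the measures on their own data, here is a minimal Python sketch of the four definitions above. The record layout (a `citations` count and an `international` flag per article) and the function name `institution_measures` are assumptions made for illustration; the example articles are made up and do not come from the ISI data.

```python
import math


def institution_measures(articles):
    """Compute Pubs, Avg. Cite, Int. Collab and Cite Adj. Pubs for one institution.

    articles: list of dicts, each with
      'citations'     - number of citations the article received
      'international' - True if any co-author is located outside India
    """
    pubs = len(articles)
    if pubs == 0:
        return {"pubs": 0, "avg_cite": 0.0, "int_collab": 0.0, "cite_adj_pubs": 0.0}

    total_citations = sum(a["citations"] for a in articles)
    international = sum(1 for a in articles if a["international"])

    # Each article scores a(i) = 1 + ln(1 + citations(i)); scores are summed.
    cite_adj = sum(1 + math.log(1 + a["citations"]) for a in articles)

    return {
        "pubs": pubs,
        "avg_cite": total_citations / pubs,
        "int_collab": international / pubs,
        "cite_adj_pubs": cite_adj,
    }


# Hypothetical institutions.
uncited = [{"citations": 0, "international": False} for _ in range(5)]
mixed = [
    {"citations": 0, "international": False},
    {"citations": 1, "international": True},
    {"citations": 9, "international": False},
]

uncited_scores = institution_measures(uncited)
mixed_scores = institution_measures(mixed)
print(uncited_scores)
print(mixed_scores)
```

The first example shows the property noted above: with no citations, Cite Adj. Pubs equals the number of publications. The second shows the logarithm at work: the article with nine citations contributes 1 + ln(10), about 3.3, rather than ten times the score of an uncited article.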

• The Rankings
• Where do Indian Scholars Publish? I have also provided a list of journals in which India-based scholars have published more than five articles in the past ten years.
• Collaboration Network: The collaboration network among Indian institutions (an arrow means that scholars from the two institutions have been co-authors on at least one journal publication).

THE RANKINGS

Sociology, Demography and Family Studies

| Rank | Name | Int. Collab | Avg. Cite | Pubs | Cite Adj. Pubs |
| --- | --- | --- | --- | --- | --- |
| 1 | Delhi Univ – All Others | 0.09 | 0.43 | 69 | 80.50 |
| 2 | Jawaharlal Nehru Univ | 0.04 | 1.08 | 51 | 71.15 |
| 3 | Inst Econ Growth – Delhi University | 0.02 | 0.39 | 49 | 56.78 |
| 4 | Univ Hyderabad | 0.03 | 1.87 | 38 | 42.28 |
| 5 | Delhi Sch Econ – Delhi University | 0.09 | 4.05 | 22 | 37.32 |
| 6 | Int Inst Populat Sci – Mumbai | 0.19 | 4.00 | 16 | 34.81 |
| 7 | Ctr Dev Studies – Trivandrum | 0.05 | 1.48 | 21 | 31.27 |
| 8 | Populat Council | 0.56 | 5.11 | 9 | 22.23 |
| 9 | Punjab Univ-Chandigarh | 0.00 | 0.16 | 19 | 20.79 |
| 10 | Tata Inst Social Sci – Mumbai | 0.13 | 0.33 | 15 | 17.48 |
| 11 | Ctr Studies Social Sci – Calcutta | 0.17 | 1.00 | 12 | 14.56 |
| 12 | Indian Stat Inst – Calcutta | 0.10 | 0.70 | 10 | 14.16 |
| 13 | Madras Inst Dev Studies | 0.20 | 1.20 | 10 | 13.69 |
| 14 | Indian Inst Technol – Bombay | 0.08 | 0.00 | 13 | 13.00 |
| 15 | Karnatak Univ | 0.80 | 6.00 | 5 | 12.17 |
| 16 | Inst Social & Econ Change – Bangalore | 0.25 | 1.00 | 8 | 12.09 |
| 17 | Indian Stat Inst – Delhi | 0.50 | 9.25 | 4 | 11.85 |
| 18 | Ctr Womens Dev Studies – West Midnapore | 0.00 | 0.56 | 9 | 11.77 |
| 19 | Indian Inst Technol – Delhi | 0.67 | 3.33 | 6 | 11.70 |
| 20 | Univ Pune | 0.00 | 0.00 | 11 | 11.00 |
| 21 | Indian Inst Technol – Kanpur | 0.00 | 1.17 | 6 | 9.47 |
| 22 | Indian Inst Technol – Kharagpur | 0.00 | 5.25 | 4 | 9.46 |
| 23 | Indian Inst Management – Bangalore | 0.67 | 18.33 | 3 | 9.15 |
| 24 | Univ Allahabad | 0.75 | 2.50 | 4 | 8.28 |
| 25 | Goa Univ | 0.00 | 0.00 | 8 | 8.00 |
| 26 | Banaras Hindu Univ | 0.17 | 0.67 | 6 | 7.61 |
| 27 | Indian Sch Business – Hyderabad | 1.00 | 6.33 | 3 | 7.50 |
| 28 | Univ Calcutta | 0.00 | 1.20 | 5 | 6.95 |
| 29 | Indian Inst Management – Calcutta | 0.00 | 11.00 | 3 | 6.53 |
| 30 | Reserve Bank India | 0.33 | 3.67 | 3 | 6.40 |
| 31 | Natl Inst Adv Studies – Bangalore | 0.00 | 0.00 | 6 | 6.00 |
| 32 | Indian Inst Technol – Guwahati | 0.00 | 0.00 | 6 | 6.00 |
| 33 | Univ Mumbai | 0.00 | 0.20 | 5 | 5.69 |
| 34 | Natl Council Appl Econ Res – New Delhi | 0.67 | 2.67 | 3 | 5.20 |
| 35 | Jadavpur Univ | 0.00 | 1.33 | 3 | 5.08 |
| 36 | Jamia Millia Islamia | 0.00 | 0.00 | 5 | 5.00 |

Economics Research

| Rank | Name | Int. Collab | Avg. Cite | Pubs | Cite Adj. Pubs |
| --- | --- | --- | --- | --- | --- |
| 1 | Indian Stat Inst – Delhi | 0.544 | 8.000 | 57 | 130.85 |
| 2 | Indian Stat Inst – Calcutta | 0.277 | 6.851 | 47 | 98.41 |
| 3 | Jawaharlal Nehru Univ | 0.263 | 2.053 | 57 | 97.75 |
| 4 | Delhi Sch Econ – Delhi University | 0.382 | 3.559 | 34 | 65.44 |
| 5 | Indira Gandhi Inst Dev Res – Mumbai | 0.412 | 3.235 | 34 | 60.71 |
| 6 | Ctr Studies Social Sci – Calcutta | 0.704 | 4.667 | 27 | 58.13 |
| 7 | Delhi Univ – All Others | 0.429 | 1.657 | 35 | 52.43 |
| 8 | Inst Econ Growth – Delhi University | 0.320 | 4.480 | 25 | 50.46 |
| 9 | Indian Inst Technol – Delhi | 0.333 | 15.400 | 15 | 42.48 |
| 10 | Univ Calcutta | 0.250 | 2.700 | 20 | 39.01 |
| 11 | Reserve Bank India | 0.111 | 2.389 | 18 | 31.04 |
| 12 | Indian Inst Management – Bangalore | 0.471 | 1.235 | 17 | 26.17 |
| 13 | Jadavpur Univ | 0.385 | 1.692 | 13 | 22.40 |
| 14 | Indian Inst Technol – Kanpur | 0.500 | 10.375 | 8 | 21.96 |
| 15 | Indian Inst Management – Ahmedabad | 0.500 | 12.600 | 10 | 20.75 |
| 16 | Madras Sch Econ | 0.556 | 3.778 | 9 | 19.89 |
| 17 | Indian Inst Technol – Kharagpur | 0.571 | 8.143 | 7 | 18.18 |
| 18 | Inst Social & Econ Change – Bangalore | 0.273 | 3.545 | 11 | 17.67 |
| 19 | Punjab Univ-Chandigarh | 0.273 | 2.909 | 11 | 16.85 |
| 20 | Ctr Dev Studies – Trivandrum | 0.222 | 3.111 | 9 | 16.23 |
| 21 | Natl Council Appl Econ Res – New Delhi | 0.667 | 1.444 | 9 | 15.76 |
| 22 | Indian Sch Business – Hyderabad | 1.000 | 5.500 | 6 | 14.93 |
| 23 | Natl Inst Sci Technol & Dev Studies – New Delhi | 0.000 | 2.714 | 7 | 14.20 |
| 24 | Tata Inst Social Sci – Mumbai | 0.143 | 3.429 | 7 | 13.98 |
| 25 | Univ Hyderabad | 0.667 | 6.000 | 6 | 13.61 |
| 26 | Indian Inst Management – Calcutta | 1.000 | 5.000 | 5 | 12.78 |
| 27 | Indian Inst Sci – Bangalore | 0.800 | 5.800 | 5 | 12.77 |
| 28 | Madras Inst Dev Studies | 0.222 | 0.778 | 9 | 11.64 |
| 29 | Visva Bharati Univ – Santiniketan | 0.200 | 2.000 | 5 | 9.56 |
| 30 | Indian Inst Management – Lucknow | 0.250 | 4.500 | 4 | 9.19 |
| 31 | Utkal Univ | 0.333 | 7.333 | 3 | 8.97 |
| 32 | Univ Mumbai | 0.125 | 0.125 | 8 | 8.69 |
| 33 | Indian Inst Technol – Madras | 0.250 | 4.750 | 4 | 8.64 |
| 34 | Indian Stat Inst – Bangalore | 0.500 | 2.250 | 4 | 7.87 |
| 35 | Indian Inst Informat Technol – Various | 0.200 | 0.800 | 5 | 7.20 |
| 36 | Vidyasagar Univ – Midnapore | 0.000 | 4.333 | 3 | 6.78 |
| 37 | Natl Inst Technol – Various | 0.667 | 5.333 | 3 | 6.47 |
| 38 | Populat Council | 0.500 | 16.500 | 2 | 6.19 |
| 39 | Univ Allahabad | 0.333 | 2.000 | 3 | 5.48 |

Psychology

| Rank | Name | Int. Collab | Avg. Cite | Pubs | Cite Adj. Pubs |
| --- | --- | --- | --- | --- | --- |
| 1 | Delhi Univ – All Others | 0.29 | 3.60 | 45 | 74.30 |
| 2 | Univ Allahabad | 0.26 | 0.97 | 35 | 49.88 |
| 3 | Indian Inst Technol – Kharagpur | 0.29 | 3.48 | 21 | 40.12 |
| 4 | Banaras Hindu Univ | 0.47 | 3.35 | 17 | 32.05 |
| 5 | Indian Inst Technol – Delhi | 0.21 | 11.36 | 14 | 30.48 |
| 6 | Indian Inst Management – Ahmedabad | 0.36 | 8.36 | 14 | 28.70 |
| 7 | Indian Inst Technol – Kanpur | 0.12 | 3.47 | 17 | 27.36 |
| 8 | Indian Sch Business – Hyderabad | 0.83 | 22.00 | 6 | 22.91 |
| 9 | Univ Calcutta | 0.00 | 1.80 | 10 | 16.58 |
| 10 | Indian Stat Inst – Calcutta | 0.43 | 11.29 | 7 | 16.40 |
| 11 | Aligarh Muslim Univ | 0.63 | 5.13 | 8 | 16.12 |
| 12 | Indian Inst Management – Calcutta | 0.50 | 28.50 | 4 | 13.77 |
| 13 | Jawaharlal Nehru Univ | 0.09 | 0.91 | 11 | 13.40 |
| 14 | Punjab Univ-Chandigarh | 0.08 | 0.25 | 12 | 13.39 |
| 15 | Indian Inst Management – Bangalore | 0.67 | 5.50 | 6 | 13.38 |
| 16 | Populat Council | 0.80 | 6.20 | 5 | 13.31 |
| 17 | Indian Inst Technol – Roorkee | 0.00 | 16.80 | 5 | 12.79 |
| 18 | Cent Inst Psychiat | 0.20 | 3.00 | 5 | 11.58 |
| 19 | Univ Pune | 0.00 | 0.10 | 10 | 10.69 |
| 20 | Univ Mysore | 0.57 | 1.14 | 7 | 10.04 |
| 21 | Indian Inst Technol – Bombay | 0.22 | 0.11 | 9 | 9.69 |
| 22 | Indian Inst Sci – Bangalore | 0.25 | 4.75 | 4 | 9.12 |
| 23 | Indian Inst Technol – Madras | 0.60 | 2.80 | 5 | 8.66 |
| 24 | Ctr Dev Studies – Trivandrum | 0.25 | 3.25 | 4 | 8.38 |
| 25 | Tata Inst Fundamental Res – Mumbai | 0.75 | 3.00 | 4 | 8.28 |
| 26 | Jadavpur Univ | 0.40 | 1.20 | 5 | 8.00 |
| 27 | Karnatak Univ | 1.00 | 12.00 | 3 | 7.91 |
| 28 | Int Inst Populat Sci – Mumbai | 0.50 | 2.00 | 4 | 7.69 |
| 29 | Utkal Univ | 0.00 | 0.00 | 7 | 7.00 |
| 30 | Univ Hyderabad | 0.20 | 0.60 | 5 | 6.79 |
| 31 | Jamia Millia Islamia | 0.00 | 0.40 | 5 | 6.10 |
| 32 | Natl Inst Sci Technol & Dev Studies – New Delhi | 0.33 | 3.00 | 3 | 5.30 |
| 33 | Indian Stat Inst – Delhi | 0.50 | 11.00 | 2 | 5.14 |
| 34 | Indian Inst Informat Technol – Various | 0.00 | 0.00 | 5 | 5.00 |

Business and Management

| Rank | Name | Int. Collab | Avg. Cite | Pubs | Cite Adj. Pubs |
| --- | --- | --- | --- | --- | --- |
| 1 | Indian Inst Management – Bangalore | 0.53 | 6.82 | 45 | 98.78 |
| 2 | Indian Inst Technol – Delhi | 0.25 | 16.21 | 24 | 72.60 |
| 3 | Indian Inst Management – Calcutta | 0.46 | 5.75 | 28 | 67.70 |
| 4 | Indian Sch Business – Hyderabad | 0.88 | 4.63 | 24 | 52.98 |
| 5 | Management Dev Inst – Gurgaon | 0.33 | 3.00 | 21 | 38.58 |
| 6 | Indian Inst Management – Ahmedabad | 0.37 | 2.53 | 19 | 32.33 |
| 7 | Delhi Univ – All Others | 0.06 | 2.00 | 16 | 27.61 |
| 8 | Indian Inst Technol – Madras | 0.25 | 6.42 | 12 | 26.64 |
| 9 | Indira Gandhi Inst Dev Res – Mumbai | 0.20 | 8.60 | 10 | 25.67 |
| 10 | Indian Inst Sci – Bangalore | 0.50 | 4.36 | 14 | 24.30 |
| 11 | Jawaharlal Nehru Univ | 0.33 | 1.40 | 15 | 23.95 |
| 12 | Indian Inst Technol – Kanpur | 0.50 | 11.00 | 8 | 23.78 |
| 13 | Indian Inst Technol – Bombay | 0.00 | 2.08 | 12 | 22.21 |
| 14 | Indian Inst Technol – Kharagpur | 0.20 | 6.60 | 10 | 20.89 |
| 15 | Natl Inst Sci Technol & Dev Studies – New Delhi | 0.43 | 5.43 | 7 | 17.18 |
| 16 | Indian Inst Management – Lucknow | 0.10 | 1.30 | 10 | 16.58 |
| 17 | Inst Econ Growth – Delhi University | 0.29 | 5.86 | 7 | 16.44 |
| 18 | Indian Stat Inst – Delhi | 0.33 | 3.67 | 6 | 13.86 |
| 19 | Indian Inst Technol – Roorkee | 0.00 | 8.20 | 5 | 12.97 |
| 20 | Govt India | 0.50 | 3.00 | 6 | 11.19 |
| 21 | Ctr Studies Social Sci – Calcutta | 0.33 | 0.83 | 6 | 9.18 |
| 22 | Jadavpur Univ | 0.40 | 2.40 | 5 | 9.09 |
| 23 | Indian Stat Inst – Calcutta | 0.14 | 0.43 | 7 | 8.79 |
| 24 | Punjab Univ-Chandigarh | 0.25 | 4.25 | 4 | 7.87 |
| 25 | Natl Council Appl Econ Res – New Delhi | 0.50 | 1.50 | 4 | 7.58 |
| 26 | Natl Inst Technol – Various | 0.00 | 0.33 | 6 | 7.10 |
| 27 | Indian Stat Inst – Bangalore | 0.00 | 2.25 | 4 | 6.89 |
| 28 | Univ Calcutta | 0.25 | 1.00 | 4 | 6.48 |
| 29 | Cent Inst Psychiat | 0.00 | 2.33 | 3 | 6.18 |
| 30 | Banaras Hindu Univ | 0.00 | 2.33 | 3 | 6.00 |
| 31 | Tata Inst Fundamental Res – Mumbai | 0.67 | 1.67 | 3 | 5.89 |
| 32 | Delhi Sch Econ – Delhi University | 0.33 | 4.67 | 3 | 5.71 |
| 33 | Tata Inst Social Sci – Mumbai | 0.50 | 18.50 | 2 | 5.64 |

THE JOURNALS

Sociology, Demography and Family Studies

| Journal | No. of Pubs |
| --- | --- |
| Contributions To Indian Sociology | 413 |
| Journal Of Biosocial Science | 33 |
| Culture Health & Sexuality | 23 |
| Social Indicators Research | 22 |
| International Sociology | 17 |
| Journal Of Comparative Family Studies | 11 |
| Journal Of Family Planning And Reproduc | 9 |
| Studies In Family Planning | 9 |
| Population Studies-A Journal Of Demogra | 8 |
| Journal Of Medical Ethics | 7 |
| Men And Masculinities | 7 |
| Agriculture And Human Values | 6 |
| Human Ecology | 6 |

Economics

| Journal | No. of Pubs |
| --- | --- |
| Ecological Economics | 50 |
| Value In Health | 47 |
| World Development | 43 |
| Journal Of Development Studies | 40 |
| Futures | 35 |
| Applied Economics Letters | 33 |
| Journal Of Development Economics | 27 |
| Journal Of Policy Modeling | 27 |
| Economics Letters | 21 |
| Applied Economics | 20 |
| Agricultural Economics | 19 |
| Kyklos | 19 |
| Economic Modelling | 18 |
| Environmental & Resource Economics | 13 |
| Singapore Economic Review | 13 |
| Economic Theory | 12 |
| International Review Of Economics & Fin | 12 |
| Journal Of Economic Behavior & Organiza | 12 |
| Journal Of Economic Policy Reform | 12 |
| Social Choice And Welfare | 12 |
| Review Of Development Economics | 11 |
| American Journal Of Agricultural Econom | 10 |
| Cambridge Journal Of Economics | 10 |
| Developing Economies | 10 |
| Economic Development And Cultural Chang | 10 |
| Energy Economics | 10 |
| Japanese Economic Review | 10 |
| Journal Of The Asia Pacific Economy | 10 |
| Journal Of World Trade | 10 |
| Hitotsubashi Journal Of Economics | 9 |
| Journal Of International Trade & Econom | 9 |
| Food Policy | 8 |
| Games And Economic Behavior | 8 |
| International Journal Of Industrial Org | 8 |
| International Labour Review | 8 |
| Journal Of Economic Theory | 8 |
| Journal Of Economics | 8 |
| Manchester School | 8 |
| Pacific Economic Review | 8 |
| Emerging Markets Finance And Trade | 7 |
| Japan And The World Economy | 7 |
| Oxford Economic Papers-New Series | 7 |
| European Economic Review | 6 |
| Feminist Economics | 6 |
| Journal Of Agrarian Change | 6 |
| Journal Of Economic Dynamics & Control | 6 |

Psychology

| Journal | No. of Pubs |
|---|---|
| International Journal Of Psychology | 248 |
| Psycho-Oncology | 34 |
| Aids Care-Psychological And Socio-Medic | 32 |
| Perceptual And Motor Skills | 16 |
| Physiology & Behavior | 13 |
| Applied Psychophysiology And Biofeedbac | 10 |
| Brain And Cognition | 10 |
| Journal Of Cross-Cultural Psychology | 10 |
| Child Care Health And Development | 9 |
| Ergonomics | 9 |
| Human Resource Management | 9 |
| Perception | 9 |
| Applied Ergonomics | 8 |
| Culture & Psychology | 7 |
| Psychological Reports | 7 |
| Asian Journal Of Social Psychology | 6 |
| Environment And Behavior | 6 |
| International Journal Of Behavioral Med | 6 |
| Journal Of Social Psychology | 6 |
| Studia Psychologica | 6 |

| Journal | Number of Pubs. |
|---|---|
| Total Quality Management & Business Exc | 42 |
| Journal Of The Operational Research Soc | 34 |
| Asian Case Research Journal | 25 |
| International Journal Of Technology Man | 23 |
| Omega-International Journal Of Manageme | 23 |
| Journal Of Business Ethics | 17 |
| Management Decision | 17 |
| Supply Chain Management-An Internationa | 17 |
| Technological Forecasting And Social Ch | 17 |
| Harvard Business Review | 16 |
| Interfaces | 13 |
| International Journal Of Human Resource | 12 |
| International Review Of Economics & Fin | 12 |
| Journal Of Knowledge Management | 12 |
| Technovation | 11 |
| Research Policy | 10 |
| Human Resource Management | 9 |
| African Journal Of Business Management | 8 |
| International Journal Of Operations & P | 8 |
| International Labour Review | 8 |
| Systems Research And Behavioral Science | 8 |
| Disaster Prevention And Management | 7 |
| Emerging Markets Finance And Trade | 7 |
| Journal Of International Business Studi | 7 |
| Asian Business & Management | 6 |
| Corporate Social Responsibility And Env | 6 |
| Information Technology & Management | 6 |
| International Transactions In Operation | 6 |
| Journal Of Futures Markets | 6 |
| Marketing Science | 6 |

# Network Analysis in R: Getting Started

In some respects, the history of network analysis cannot be separated from the tools used to conduct it. Software has been important to the enterprise of network analysis since the very beginning of the field: scholars have written and shared programs that allow others to collect data and conduct analyses themselves. For instance, White et al. (1976) describe a program called CONCOR that finds roles in an informal social network. Other great technologies, such as UCINET, KrackPlot, and a host of other social network analysis programs, allowed network approaches to spread rapidly through the field. My hypothesis is that without these technologies and their ease of use (UCINET, I think, was a game changer for the field), network analysis might still be in the backwaters.

Today, there are lots of options for the researcher who wants to do network analysis. I use two primary tools that fit well into my workflow (I use an Apple Mac, and I do a lot of non-network analysis as well): the R statistical programming language with the sna package, developed by Professor Carter Butts of UC Irvine, and Stata. While some of my posts (and the accompanying analysis) will use Stata, I will focus primarily on the use of R for network analysis.

### Getting started with R for Social Network Analysis

Let us begin by downloading and installing the R programming language. Begin by navigating to the R-Project. I will do the walkthrough for the Mac version of R.

After navigating there, click on the CRAN link under download. The closest server to me is probably at UC Berkeley, but pick whichever one is closest to you.

Now that R is installed, let’s open it up and get some basic network analysis going. Once the R console is open, click on File (in the top menu) and then click on New Document. This should open a blank script file. Type a comment (a line that begins with #). I’ve typed:

```r
# This file provides some simple code to get you started on your Network Analysis Journey
```

Save the file (I’ve called it RSNApractice.R). Clicking on the file name will give you access to the complete file.

Now that we have that sorted out, let us begin by installing some important packages. You can type this code directly into the console.

```r
install.packages("data.table")
install.packages("curl")
install.packages("sna")
```

The data.table package allows us to import data from the web; the curl package is needed by data.table to read files from a URL. Once these packages are installed, let’s get them loaded.

```r
library(data.table)
library(curl)
library(sna)
```

Now that these are loaded, let me tell you a little about the data that we are going to analyze. These data come from a professional services consulting firm on the east coast of the United States and were collected some time in the early 2000s. There are 247 people at the firm, and each of them responded to a network survey in which they answered 6 questions. Here are the questions:

#(Q0) “who do you know or know of at [the firm]”,

#(Q1) “who you would approach for help or advice on work related issues”,

#(Q2) “who might typically come to you for help or advice on work related issues”,

#(Q3) who you go to “about more than just how to do your work well. For example, you may be interested in ‘how things work’ around here, or how to optimize your chances for a successful career here”,

#(Q4) “who might typically come to you for help or advice along these [non-task related] dimensions” and finally

#(Q5) “who you think of as friends here at [firm].”

I’ve uploaded their responses to a Dropbox folder in the form of matrices. The rows of each matrix indicate “senders” or “Ego” and the columns represent “receivers” or “Alters.”

We can load the data using the following code:

```r
# Load the "Professionals" network data from Dropbox.

# Convert the data.table objects into matrix format so they can be
# analyzed using the sna package.

q0 = as.matrix(q0)
q1 = as.matrix(q1)
q2 = as.matrix(q2)
q3 = as.matrix(q3)
q4 = as.matrix(q4)
q5 = as.matrix(q5)

# Create a vector of numbers from 1-247 and convert them to strings.
# We will use these to rename our rows and columns.

names = as.character(1:247)

# Rename all the rows

rownames(q0) = names
rownames(q1) = names
rownames(q2) = names
rownames(q3) = names
rownames(q4) = names
rownames(q5) = names

# Rename all the columns

colnames(q0) = names
colnames(q1) = names
colnames(q2) = names
colnames(q3) = names
colnames(q4) = names
colnames(q5) = names
```

This code should load all of the network data into the R console.
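To make the import, convert, and rename steps concrete, here is a minimal sketch on a toy example. It assumes a made-up 4-person matrix saved to a temporary CSV file in place of the Dropbox files, and it uses base R’s `read.csv` rather than data.table so that it runs without any packages. The names `toy`, `toy_file`, `q_toy` and `ids` are illustrative only.

```r
# Toy stand-in for one survey matrix: 4 people, 1 = "ego named alter".
toy = matrix(c(0, 1, 0, 1,
               1, 0, 0, 0,
               0, 1, 0, 1,
               0, 0, 1, 0), nrow = 4, byrow = TRUE)

# Write it to a temporary CSV so there is something to import.
toy_file = tempfile(fileext = ".csv")
write.table(toy, toy_file, sep = ",", row.names = FALSE, col.names = FALSE)

# Import the file, convert it to a matrix, and label rows and columns
# with person IDs stored as strings.
q_toy = as.matrix(read.csv(toy_file, header = FALSE))
ids = as.character(1:nrow(q_toy))
rownames(q_toy) = ids
colnames(q_toy) = ids

q_toy
```

With string labels in place, a cell can be looked up by person ID, for example `q_toy["1", "2"]`.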

Now, let’s import some attributes.

# Imports the attributes file and outcomes file, and converts it into a data frame.

attr attr

Now that these are all loaded, let’s see how the data look. Type the following to look at the first ten rows and columns of q0.

```r
# Let's look at the first ten rows/columns of q0

q0[1:10,1:10]
```

How do we interpret this? Person 1 doesn’t appear to know persons 2-10. However, person 2 says they know persons 5, 7 and 10.
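If the row/column convention is hard to keep straight, a small sketch may help. It uses a hypothetical 4-person matrix (not the firm’s data): reading across a row gives the people Ego says they know, and reading down a column gives the people who say they know that person.

```r
# Hypothetical 4-person "knowing" matrix: rows are senders (Ego),
# columns are receivers (Alter); 1 = "row person says they know column person".
ids = as.character(1:4)
m = matrix(c(0, 0, 0, 0,
             1, 0, 1, 0,
             0, 1, 0, 1,
             1, 0, 0, 0), nrow = 4, byrow = TRUE,
           dimnames = list(ids, ids))

# Who does person 2 say they know? Read across row 2.
known_by_2 = names(which(m["2", ] == 1))

# Who says they know person 1? Read down column 1.
know_1 = names(which(m[, "1"] == 1))

known_by_2
know_1
```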

Let’s plot this as a graph.

```r
# Plot the first 10 people in the q0 matrix.

gplot(q0[1:10,1:10])
```

Let us now plot the full q0 network. This is the “knowing” network of this firm of 247.

```r
# Plot the full "knowing" network

gplot(q0)
```

Quite dense. A lot of people know a lot of other people at the firm. Try to do this analysis for q1 to q5. What are the differences/similarities?

Let’s do some simple centrality calculations (more on centrality in the Representing Networks post).

```r
# Calculate two simple centrality measures on the q0 network.
# Indegree is the number of people who say they know a focal person (in arrows on a node)
# Outdegree is the number of people who a focal person says they know (out arrows from a node)

q0.indegree = degree(q0, cmode = "indegree")
q0.outdegree = degree(q0, cmode = "outdegree")
```

The centrality measures are now saved in the objects q0.indegree and q0.outdegree. Let’s plot histograms of these two measures.

```r
# Plot histograms of q0.indegree and q0.outdegree

hist(q0.indegree)
hist(q0.outdegree)
```

These look very nicely distributed, almost Poisson. Let’s calculate some summary statistics on these measures.

```r
# Summary statistics on the indegree/outdegree measures

summary(q0.indegree)
summary(q0.outdegree)
```
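To see what indegree and outdegree are actually counting, here is a small base R sketch on a hypothetical 4-person matrix. For a binary matrix with rows as senders and self-nominations ignored, column sums and row sums should correspond to the indegree and outdegree described above; this is a by-hand illustration, not a replacement for the sna function.

```r
# Hypothetical 4-person directed network (rows = senders, columns = receivers).
m = matrix(c(0, 1, 1, 0,
             1, 0, 1, 0,
             0, 0, 0, 1,
             1, 0, 1, 0), nrow = 4, byrow = TRUE)
diag(m) = 0                 # ignore self-nominations

indegree  = colSums(m)      # how many people name each person
outdegree = rowSums(m)      # how many people each person names

indegree
outdegree
```

Every tie leaves one node and arrives at another, so the two vectors always have the same total even though their distributions can look quite different.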

Now, let’s do one final thing before we conclude this post (you can keep analyzing; I will delve deeper into centrality measures and the like in a different post). I have also given you an outcomes file with three outcomes.

Here are the outcome variables:

relationships: whether the respondent feels their relationships at the firm are fulfilling
success: whether the respondent feels that they have the knowledge to succeed at the firm
appreciate: whether they feel appreciated

Here is a description of the attribute variables:

tenure: tenure at this firm
title: whether the employee is an analyst, lateral hire, or partner
location: what office they work in
gender: male or female
ethnicity: 91% are white
age: age of employee
elite: whether the employee graduated from an elite university
feeder: whether the employee graduated from a “feeder” university
work1-work24: types of work the employee does

Let’s conduct one final analysis: let’s see if there is a correlation between how many people an employee knows and whether they feel like they have the knowledge to succeed.

```r
# Examine if there is a correlation between how many people someone knows and whether they feel like they have the knowledge to succeed.
# Assumption: m.0 is a linear regression of success on q0.indegree,
# which is what the abline(m.0) call in the plot requires.

m.0 <- lm(attr$success ~ q0.indegree)
summary(m.0)
```

Looks like there is at least a bivariate correlation. Let’s plot it.

```r
# Plot the regression and the data points.

plot(q0.indegree, attr$success)
abline(m.0)
```
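If you want to try the regression step without the firm’s data, here is a minimal sketch on simulated data. The variables `x` and `y` are made-up stand-ins for an indegree-like count and a survey-style outcome, and the true slope of 0.3 is an arbitrary choice for illustration.

```r
set.seed(1)                              # make the simulated data reproducible

# Simulated stand-ins: an indegree-like count and a survey-style outcome.
x = rpois(50, lambda = 10)
y = 2 + 0.3 * x + rnorm(50, sd = 0.5)

fit = lm(y ~ x)                          # bivariate linear regression
slope = coef(fit)[["x"]]                 # estimated change in y per extra tie

cor(x, y)                                # bivariate correlation
slope
```

Calling `plot(x, y)` followed by `abline(fit)` draws the points and the fitted line, just as in the code above.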

Now that you have most of the data, you can explore yourself. Here is the full code @ RSNApractice.R

# The Foundations of Network Analysis

The course “Topics in Social Network Analysis: Structure and Dynamics” is targeted towards doctoral students in management, organizational behavior and strategy. This blog post summarizes the first lecture, “The Foundations of Network Analysis.”

The goal of the first lecture is to introduce you to the “why” behind network theory and a bit of the “what.” Overall, the mission of the course is to help you become a sophisticated consumer of networks research, and hopefully a sophisticated producer of it as well.

By the end of the course, you should be able to:

• Develop network-theoretic explanations for the behavior of people, teams and organizations. Network theoretic explanations use “relationships” (we’ll talk more about this in the future) and “patterns of relationships” as explanatory devices rather than traits or characteristics.
• Learn how to set up high-quality research designs for your network theories.

So, let’s begin with a simple question: What is network analysis?

There are a lot of definitions, but here is one I like:

Network theory is a scientific perspective that reasons about the behavior of a target system or elements of that system, using the pattern of relationships between elements of that system.

Let’s begin with a super simple example. Stanford GSB has approximately 400 students. Let us assume, for a moment, that all 400 students end up getting jobs with certain wages w(i). Some students earn a lot of money (a lot!) and some students might make less than what they made before they came into the MBA program. An analyst might wonder: What causes this variation in MBA salaries?

An astute PhD student might theorize a function that maps some vector of characteristics of each MBA student c(i) to their wage w(i), such that:

w(i) = f(c(i))

Elements in the vector c might include:

1. The undergraduate institution of the student (before their Stanford MBA)
2. GMAT score
3. Prior wage before business school
4. Personality
5. Gender
6. Specialization
7. …so on.

The above function assumes that wages depend on these individual characteristics and perhaps the reaction of employers to these characteristics. But the dependency is between these traits and wages.

In the above graph, the circles (nodes) represent the MBA students (I’ve depicted 20). Large nodes represent individuals who may be high on the characteristics we described above, and vice versa. Thus, our reasoning focuses on how the nodes vary based on some characteristic.

Network analysts take a different perspective. They propose a different type of dependency: that people’s outcomes depend on the types of people they have relationships with and/or the pattern of those relationships.

This concept, that individual outcomes depend on a person’s relationships to others, is not at all new. This idea is as old as human history. However, what network analysis contributed was a useful and tractable representation of this dependence among people and a way to empirically test the effects of such dependencies.

To summarize, the “new knowledge” that network analysis contributed was:

• To strongly argue that these dependencies among individuals matter.
• That these dependencies could be represented by a network (consisting of nodes, the elements of the system; and edges, the dependencies between the elements)
• That analysis of these dependencies (e.g., summaries of or descriptions of patterns of) could help us predict the performance of elements of the system better than the individual-trait based approach alone.
• That specific social mechanisms (basically stories) link certain patterns to certain outcomes through some well-specified chain of logic.

These are not simple problems. Theoretical and empirical issues related to this set of basic problems have challenged us for nearly a century now. As you can imagine, incorporating social relationships into the analysis of human and organizational behavior requires a new way of thinking about human action and new methods to empirically validate our theories.

Before we get to the core problems of network analysis, it is perhaps useful to sketch a bit of its history and development.

### Network analysis has its “origins” in many disciplines

If you are really interested in the history of social network analysis, check out Linton Freeman’s book “The Development of Social Network Analysis.”

Psychology

Jacob Moreno: Invented sociometry. The network diagram that we see today is a direct consequence of Moreno’s work: he invented the sociogram, a set of points connected by lines. He used sociograms to identify leaders and isolates and to uncover patterns of asymmetry and reciprocity. He discovered what we now know as the “star” network.

Kurt Lewin: Studied group behavior. His basic argument was that individual action in groups was constrained by the concrete relationships that existed between members of the groups. He is often credited as being one of the founding fathers of social psychology, and the person who coined the term “group dynamics.”

Fritz Heider: Studied social perception and attitudes and developed what he called “balance theory” – we all know the basic mechanics of balance theory:

• “A friend of a friend is a” …
• “A friend of an enemy is a” …
• “An enemy of an enemy is a” …

Balance theory was converted into mathematical form by Dorwin Cartwright (a psychologist) and Frank Harary (a mathematician)  — Harary is often credited with being one of the founders of modern graph theory.

As you can see all three approaches either directly used graphical or mathematical notation, or later were turned into a mathematical form.

Anthropology

Another parallel set of developments came in social anthropology – they conceptualized “social structure” as concrete relations between individuals in a society. SF Nadel, especially, theorized about the relationship between networks and “roles” in his treatise “A Theory of Social Structure.” A quote about A Theory of Social Structure from Britannica:

In his posthumous Theory of Social Structure (1958), sometimes regarded as one of the 20th century’s foremost theoretical works in the social sciences, Nadel examined social roles, which he considered to be crucial in the analysis of social structure.

The famous “Hawthorne experiment,” conducted in Chicago in the 1920’s, found that one of the best predictors of productivity was the “informal organization” of the plant: the pattern of personal relationships that people had with each other.

Sociology

The big revolution in social network analysis happened in the 1960’s and 70’s. Its primary protagonists were located at Harvard, led by Harrison White, and at the University of California, Irvine, led by Linton Freeman. Much of the basic language, the tools, and the theories we use today in network analysis were developed in this period.

In the 1980’s and 1990’s a group of scholars in management and organizational behavior entered the fray, and thus began the organizational social network revolution. These individuals include scholars who received their PhDs in business schools or sociology departments, but had some contact with the network theorists in sociology or sociologists who were hired by business schools. The names include people like Ronald Burt at the U of Chicago, Daniel Brass at Penn State and later at U Kentucky, David Krackhardt at Cornell and later Carnegie Mellon, Brian Uzzi at Northwestern, and Joel Podolny, who was at Stanford.

Modern Network Analysis is Multi and sometimes Inter-Disciplinary

Today, network analysis is a multi-disciplinary, and sometimes inter-disciplinary, enterprise. A lot of work has been done by scholars in a variety of disciplines. Many of the important theoretical ideas about what types of networks should matter, and why, were developed by sociologists (Ron Burt, for instance) and then further developed and extended by others in sociology (Fernandez and Gould) as well as scholars in management (Gulati, McEvily, etc.).

Concurrently, a large number of statisticians, including Stanley Wasserman and Tom Snijders, developed methodologies for modeling the formation and dynamics of social networks. They developed models such as the p* models, Stochastic Actor-Oriented Models, and much more.

The economists, starting with Charles Manski, developed and theorized about methods that would allow for causal inference about network effects. Venkatesh Bala and Sanjeev Goyal (two economists from Cambridge, UK) developed and formalized a game theoretic model of network formation. Matthew Jackson of Stanford has pushed the development of formal models of network formation and “network games” forward along a variety of dimensions.

• Most of the best network research today draws on many of these traditions. Research in organizational behavior that examines network effects must draw on the work of Charles Manski for guidance about the empirical validity of the network effects they estimate.
• A large body of management research, whether it focuses explicitly or implicitly on network ideas, draws on the ideas of sociologists, both in business schools and in sociology departments.
• Research in economics has drawn heavily on sociology, with or without citation. The most interesting intersection of this research is happening in the economics of education and labor, development economics, and finance.
• Today, you will find “network” research in almost all the top journals in management, economics, sociology, statistics, and computer science. What is more, you will also find specialty journals focused just on network analysis (e.g., Social Networks and Network Science).

### Network Reasoning –  Micro to Macro, back to Micro

An important feature of network analysis is that it gives us a way to think about both the micro (the behavior of the elements of a system, e.g., people) and the macro (society, organizations, etc.) simultaneously.

One of the most beautiful demonstrations of this is presented in the following graph:

This graph comes from Mark Granovetter’s Strength of Weak Ties. Why is this triad forbidden? That is, why is this structure unlikely to occur?

To answer this question, we will need some balance theory. Let us assign a positive valence to the present strong ties (i.e., AC and AB) and a negative sign to the absent tie (i.e., BC). To get the sign of this graph, let us just multiply the signs of the individual dyads in the triad (AC, AB, BC).

• The forbidden triad: (+)(+)(-) = (-)

Balance theory considers the sign of this graph to be negative, which means that it is unstable. For instance, if A and C are friends, as are A and B, there are likely to be more opportunities for B and C to interact and, as a result, form a tie with each other. This closes the triad and results in a closed triad with the following structure: (+)(+)(+) = (+). On the other hand, if C and B cannot get along, then there will be conflict either between A and C or between A and B, and one of A’s ties will break, resulting in a triad with a single tie: (+)(-)(-) = (+).
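The sign arithmetic can be written out as a tiny R function. This is just a sketch of the multiplication rule described above; the function name `triad_sign` and the +1/-1 coding are my own illustrative choices.

```r
# Sign of a triad = product of the signs of its three dyads
# (+1 for a present strong tie, -1 for an absent or negative tie).
triad_sign = function(ab, ac, bc) ab * ac * bc

forbidden  = triad_sign(ab = 1, ac =  1, bc = -1)  # A-B and A-C strong, B-C absent
closed     = triad_sign(ab = 1, ac =  1, bc =  1)  # B and C form a tie
single_tie = triad_sign(ab = 1, ac = -1, bc = -1)  # one of A's ties breaks

c(forbidden = forbidden, closed = closed, single_tie = single_tie)
```

Only the forbidden triad comes out negative; both of the structures it can settle into are positive, which is what makes them stable in the balance-theory sense.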

OK, so what?

Well, let’s take the perspective of A. In the forbidden triad, A is a “bridge”: she is the only connection between C and B and, as a result, has access to information from two sources that might not overlap. However, the forbiddenness of this structure means that it is unstable with strong ties. The position of A reverts to either Equilibrium 1, where A is no longer a bridge because she doesn’t have a connection to B (or C); or Equilibrium 2, where A is no longer a bridge because C and B have a connection to each other and no longer have to go through A to share information. Thus, A’s role as a passthrough bridge is diminished.

These are very micro arguments. They are based on the psychological processes of individuals and their interpersonal dynamics. How do these micro processes translate into network processes at a larger scale (e.g., an organization, community, or society)?

One assumption we start with is that information is distributed unevenly across groups, and that different groups or cliques have different pieces of information.

This is not an unrealistic assumption. If you compare Berkeley to Stanford, people in the two places are likely talking about different ideas. Most people in each group do not have a complete understanding of what ideas the other group is interested in or talking about. This is probably just as true, or even more so, across companies, countries, different regional geographies, etc.

However, strong-tie bridges across these groups—according to our micro reasoning above—do not exist because of the two equilibria we described above.

Granovetter’s deep insight was that this problem of bridges not existing can be solved if the bridges are weak ties rather than strong ties. Weak ties allow individuals to access information across disconnected clusters, whereas strong ties, because they are embedded in cliques (i.e., they exist within a cluster), only provide redundant information.

What Granovetter (1973) showed was:

• Weak ties (that is, acquaintances) are more useful for job seekers than are their close and strong ties (friends and family).
• Weak ties provide access to novel information, not present within a cluster.
• The relationship between tie strength and finding a job has less to do with the strength of the tie per se and more to do with the macro-structure of the larger network (e.g., the connections between clusters).

The beauty, I think that is the most appropriate word, of this theory is that it elegantly links a psychological process (balance theory) to the macro structure of the network (society or organization wide network), and then back to the individual outcome.  This type of reasoning allows us to represent the functioning of an important system, in a way that will be difficult to do with more atomized theories of human action.

Thus, our ultimate goal is to develop theories that link a person’s social network to some larger structure, then back again to individual human action.

### Where can network representations be useful for analysis?

Most novice students of network analysis often begin with the perspective that a network is a real thing and as a thing it can become the object of analysis. However, this is not true. Networks are representations—and imperfect ones at that—of a very complicated target system. Because networks are indeed representations and not real things, an analyst can represent many different target systems using a network representation.

The most basic network representations consist of two parts: nodes and edges. Below, you will see a network called the “Kite Network.” For now, let’s ignore the structure of the network and its properties, and focus on two elements. Networks consist of nodes (the circles), which represent the entities we are studying in the target system, and edges (sometimes called links), which represent the relationships between the nodes/entities.

The edges in the network above are undirected, meaning that they have no direction. For instance, co-authorship is a relationship that is naturally undirected.

Above, I’ve taken the same kite network and made the edges directed. This means that there is a direction of flow (of information, etc.) between the nodes that is specified in the network. For instance, if the relationship represented in the graph is “seeks advice from,” we could read the network to indicate that A seeks advice from B, but not the other way around. On the other hand, B and E seek advice from each other.
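In matrix form, the difference between the two kinds of edges is easy to see. The sketch below uses a hypothetical three-node fragment (A, B and E only, not the full kite network): an undirected network has a symmetric adjacency matrix, while a directed one generally does not.

```r
ids = c("A", "B", "E")

# Undirected (e.g., co-authorship): the matrix is symmetric.
undirected = matrix(c(0, 1, 0,
                      1, 0, 1,
                      0, 1, 0), nrow = 3, byrow = TRUE,
                    dimnames = list(ids, ids))

# Directed ("seeks advice from"): the row person seeks advice from the column person.
# A seeks advice from B but not the reverse; B and E seek advice from each other.
directed = matrix(c(0, 1, 0,
                    0, 0, 1,
                    0, 1, 0), nrow = 3, byrow = TRUE,
                  dimnames = list(ids, ids))

isSymmetric(undirected)   # direction carries no information
isSymmetric(directed)     # A -> B has no matching B -> A
```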

Now that we have these basics down, we can use these two basic elements to represent many different systems:

• The studying behaviors among students
• The friendships among workers in a firm
• The alliances among firms
• The relationships among different units/teams within a corporation

The above examples are very pertinent to OB/Strategy. However, networks can be used to represent other systems as well:

• The interactions between genes
• The similarity among jobs in an organization
• Shared funders among startups
• The co-presence of two ingredients in a recipe

While a network representation is useful for all these very diverse situations, the underlying theory describing the functioning of these various systems is rather different. The kind of reasoning we use for each of these domains will differ for at least three reasons:

1. The actions and outcomes of the actors in each domain are likely to be different. Students do different things from firms, and they both do different things from genes, jobs and ingredients.
2. The mechanisms (e.g., the step-by-step processes) that link actions to outcomes are likely to be different across the contexts.
3. Finally, the links between actors are qualitatively different across the domains: different types of information flow between nodes through these links, different amounts of information can flow, and different meanings are ascribed to the links.

The flexibility of the network representation allows for a critique that network theory is a free-for-all where anything goes because the actors, mechanisms, and links can be anything in any context.

While there is an element of this critique that is valid, I will argue in this class that the network representation is tremendously powerful and there is a decent amount of consistency in network reasoning across many different contexts and target systems. That is, we can apply many of the same types of reasoning, with modifications of course, to explain actions and behavior across a variety of contexts. Further, learning and insight from one domain can be applied to learn about another.

### Krackhardt’s Levels of Analysis

Networks are rich in their expressiveness of social reality. As a consequence the analyst sometimes has to ignore many other facets of the structure/content of a network to focus analytical attention on one facet. A useful typology for network analysis, developed by David Krackhardt, is called the “Levels of Analysis.”  In his typology networks have (at least) four levels of analysis: Level 0 to Level 3.

The distinction across levels is important to make for several reasons, including the fact that:

1. The theories are different
2. The statistical techniques are different
3. The data requirements are (potentially) different.

### Level 1: The node level of analysis

Consider the following graph:

And this matrix, which was used to generate this graph.

This is the “raw” data of the network. These data can be analyzed in many different ways. One of the most common approaches in network analysis is to focus on node-level analysis, or Level 1 (it is called Level 1 because if there are n nodes, the number of observations one has is on the order of n^1).

So far, we have been focusing on nodes—these are the actors whose behavior we are trying to explain.  More specifically, we are trying to explain the behavior of “Ego” (from the Latin I) based on the nature of or pattern of his or her connections to “alters” (from the Latin others).  Thus, in this case, our goal is to primarily take two kinds of measurements:

1. Measurements about some action or outcome of Ego (our dependent variables)
2. Measurements about the features of Ego’s connections to the alters in the network (our explanatory variables)

Thus, depending on the theory, we figure out how to quantify the connections that ego has to his or her alters, and see whether there exists a correlation between this and Ego’s outcomes.

• Ego1       Outcome               NetworkMeasure
• Ego2       Outcome               NetworkMeasure
• .
• .
• .
• .

There are generally two approaches to the Level 1 analysis. I would like to call one “structural analysis” and the other “peer effects” or “peer influence.” We will cover both in the class.

• Structural Analysis/Analysis of Network Position defines the NetworkMeasure based on a summary of the pattern of edges in the network with respect to the focal node (e.g., the node whose outcome we are interested in).
• Peer effects analysis often ignores the structure and focuses on understanding how the characteristics of a focal node’s connections (e.g., the prior performance of a node’s connections) affect that node’s outcomes. For instance, this could be done by taking the average of the alters’ SAT scores or some other metric.

As you can see, the data look pretty much like those of a traditional regression analysis at the individual level. We call this a Level 1 analysis because there are on the order of N^1 observations: one for each node in the network.
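Here is a minimal sketch of what a Level 1 data set might look like in R, with one structural measure and one peer-effects measure. The network, the `score` attribute, and the `outcome` values are all made up for illustration; indegree is just one of many possible structural measures.

```r
# Hypothetical 4-person advice network (rows = Ego, columns = Alter).
m = matrix(c(0, 1, 1, 0,
             1, 0, 0, 0,
             0, 1, 0, 1,
             0, 0, 1, 0), nrow = 4, byrow = TRUE)

score   = c(600, 700, 650, 720)   # made-up attribute of each person
outcome = c(3, 5, 4, 4)           # made-up outcome of each person

# Structural measure: a summary of Ego's position (here, indegree).
indegree = colSums(m)

# Peer-effects measure: the average attribute of the alters Ego names.
peer_score = as.vector(m %*% score) / rowSums(m)

# One row per node: N observations, ready for an ordinary regression.
node_data = data.frame(ego = 1:4, outcome, indegree, peer_score)
node_data
```

In this toy network everyone names at least one alter; with real data you would need to decide how to handle isolates, since the peer average is undefined when `rowSums(m)` is zero.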

The nice thing about both of these types of analyses, is that the statistical methods we use are ones that you should be quite familiar with as a doctoral student. While there are empirical issues in interpreting the coefficients from these models, the setup is pretty standard.  Most network analysis takes one of these two forms.

### Level 2: The dyad level of analysis

Another class of problems requires us to focus on understanding the processes that led the network to take the structure that it has taken. In the static case, the micro-question is: Why is one connection present, while another one not present? In the dynamic case, the question might be reframed as: why do some ties persist, while others dissolve?

This type of analysis is called Level 2 because in a network consisting of N actors, there are N(N-1), or roughly N^2, observations in the data.

The focus of Level 2 analyses is understanding why a tie or interaction, or relationship, exists between an ego and an alter.

For instance, the questions we can ask, include:

• Why two workers decide to become friends.
• Why two companies decide to pursue a research collaboration.
• Why two scientists decide to co-author a paper.

The kind of information, and often the types of theories, we need here are richer than is often necessary at Level 1.

Can you tell me what kind of information we might need to make a prediction about whether two scientists decide to collaborate?

• The interaction of the characteristics of Ego and Alter (e.g. whether they are in the same discipline, the distance from one office to the next, etc.)
• The ties that exist indirectly between Ego and Alter.

Further, the methods we use here are much more complex than the ones used for Level 1 analysis, primarily because of point #4. There are dependencies in the network that interfere with the presence or absence of a tie for a given pair of individuals. Consider the forbidden triad. It illustrates clearly that A’s decision to form a tie with C is independent of neither C’s relationship to B nor A’s relationship with B. Ignoring these dependencies could potentially bias our understanding of why a tie between A and B forms or does not form.

As a consequence, people have developed specific statistical approaches for testing theories at Level 2: the Multiple Regression Quadratic Assignment Procedure (MRQAP), Exponential Random Graph Models (ERGMs), and the older p1 models.
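To give a feel for how these methods handle dependence, here is a rough sketch of the permutation idea behind QAP. It implements only the simple two-matrix QAP correlation test, not the full multiple-regression version, and it is written in plain Python for illustration. The five actors, their ties, and their departments are invented, and the function names are hypothetical.

```python
import random

def offdiag_pairs(y, x):
    """Collect (y_ij, x_ij) for all ordered pairs with i != j."""
    n = len(y)
    return [(y[i][j], x[i][j]) for i in range(n) for j in range(n) if i != j]

def correlation(pairs):
    """Pearson correlation of a list of (a, b) pairs."""
    n = len(pairs)
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    var_a = sum((a - mean_a) ** 2 for a, _ in pairs)
    var_b = sum((b - mean_b) ** 2 for _, b in pairs)
    return cov / (var_a * var_b) ** 0.5

def qap_test(y, x, n_perm=2000, seed=0):
    """Relabel the actors of y (rows and columns together) many times and
    compare the observed correlation with the resulting distribution."""
    rng = random.Random(seed)
    n = len(y)
    observed = correlation(offdiag_pairs(y, x))
    at_least_as_large = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        # Same permutation on rows and columns keeps the network's structure intact
        y_perm = [[y[order[i]][order[j]] for j in range(n)] for i in range(n)]
        permuted = correlation(offdiag_pairs(y_perm, x))
        if abs(permuted) >= abs(observed) - 1e-12:
            at_least_as_large += 1
    return observed, at_least_as_large / n_perm

# Hypothetical data: collaboration ties and department membership for 5 actors
n = 5
ties = {(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)}
department = ["A", "A", "A", "B", "B"]
y = [[int((i, j) in ties or (j, i) in ties) for j in range(n)] for i in range(n)]
x = [[int(i != j and department[i] == department[j]) for j in range(n)] for i in range(n)]

observed, p_value = qap_test(y, x)
print(f"observed correlation = {observed:.3f}, permutation p-value = {p_value:.3f}")
```

Because whole actors are relabeled rather than individual dyads shuffled, the permuted networks keep the same dependence structure as the observed one, which is the point of the procedure. In practice you would use an established implementation rather than this sketch.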

Here the analysis is conducted so that the data structure looks like:

• Actor1   Actor2
• Actor1    Actor3
• Actor1   Actor4
• Actor2   Actor1
• Actor2   Actor3
• Actor2   Actor4

The dependent variable is whether a tie exists between two actors (or whether some kind of interaction occurs, e.g., knowledge transfer). The explanatory variables in these models are the characteristics of ego, the characteristics of alter, their shared characteristics, and the other structures in which they are embedded.
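As an illustration of this data structure, the sketch below builds one row per ordered ego-alter pair in Python. The four actors, their disciplines, and the co-authorship ties are made up, and `same_discipline` and `shared_partners` are just two hypothetical examples of dyad-level covariates.

```python
from itertools import permutations

# Hypothetical scientists with a discipline attribute
discipline = {"Actor1": "econ", "Actor2": "econ", "Actor3": "soc", "Actor4": "soc"}

# Hypothetical undirected co-authorship ties
coauthor = {
    frozenset(pair)
    for pair in [("Actor1", "Actor2"), ("Actor2", "Actor3"), ("Actor3", "Actor4")]
}

def neighbors(actor):
    """Everyone who shares a co-authorship tie with the given actor."""
    return {other for tie in coauthor if actor in tie for other in tie if other != actor}

# One row per ordered pair (ego, alter): N(N-1) rows in total
dyads = []
for ego, alter in permutations(sorted(discipline), 2):
    dyads.append({
        "ego": ego,
        "alter": alter,
        "tie": int(frozenset((ego, alter)) in coauthor),               # dependent variable
        "same_discipline": int(discipline[ego] == discipline[alter]),  # shared characteristic
        "shared_partners": len(neighbors(ego) & neighbors(alter)),     # indirect ties
    })

for row in dyads[:3]:
    print(row)
print(len(dyads), "dyads for", len(discipline), "actors")
```

The rows are not independent observations, which is why this table is normally analyzed with the dyad-level methods named above rather than a plain logistic regression.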

### Level 0: The whole network

Another level of analysis is Level 0. Here the entire network yields only one observation: N^0 = 1.

The goal of Level 0 analysis is drastically different from the goal of the first two levels of analysis. Here, the analyst is trying to understand how the entire social network and its configuration affect the outcomes of the system as a whole. This is an interesting and exciting level of analysis, and very few studies have been conducted at this level.

One reason is data: because each network is a single observation, we need network data on enough networks to run a statistical analysis across them. That is hard in itself. The best research of this type has been done by people studying teams (e.g., Ray Reagans, Ezra Zuckerman and Bill McEvily). In many respects, small groups research has also looked at this level of analysis, going back to some very early work by Bavelas.

In Level 0 analysis, what we are trying to do is look at the entire network and what it represents (e.g., an entire organization) and relate it to the organization’s outcomes.

Thus, we need theories and measures that can summarize the macro structure of the network and link it to organizational performance.

For instance, a class of problems may include:

• How does the internal network structure of a startup affect its ability to come up with innovative ideas? We would need:
• A set of startups, say 75 or more.
• Some measure of each startup’s innovative output.

An example analysis might be to measure each startup’s internal network structure, and then conduct a regression analysis linking the outcome to some measure of that structure (e.g., density, the proportion of pairs of people who have ties to each other).
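A minimal sketch of this kind of analysis, in plain Python, might look like the following. The four startups, their ties, and their innovation scores are invented purely to show the shape of the data; a real study would need many more networks, as noted above, plus control variables. The helpers `density` and `simple_ols` are hypothetical names.

```python
def density(n_people, ties):
    """Share of possible undirected pairs that are actually tied."""
    possible = n_people * (n_people - 1) / 2
    return len(ties) / possible

def simple_ols(x, y):
    """Intercept and slope of y = a + b*x by ordinary least squares."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (
        sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
        / sum((xi - mean_x) ** 2 for xi in x)
    )
    return mean_y - slope * mean_x, slope

# Hypothetical startups: (headcount, internal undirected ties, innovation output)
startups = [
    (4, {(0, 1), (1, 2)}, 1),
    (4, {(0, 1), (1, 2), (2, 3), (0, 3)}, 3),
    (5, {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4), (1, 3)}, 4),
    (3, {(0, 1), (1, 2), (0, 2)}, 6),
]

# Each startup's whole network collapses to a single row: one observation per network
densities = [density(n, ties) for n, ties, _ in startups]
outputs = [output for _, _, output in startups]

intercept, slope = simple_ols(densities, outputs)
print("densities:", [round(d, 3) for d in densities])
print(f"innovation = {intercept:.2f} + {slope:.2f} * density")
```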

A well-known study at this level is by Reagans, Zuckerman and McEvily, who found that project teams within an organization that have high density are more effective (they finish their projects faster) than project teams that have low density.

### Level 3: Cognitive Social Structures

Finally, another area of research within social network analysis recognizes that networks are representations, and imperfect ones at that. This is called Level 3 analysis because we have on the order of N*N*N, or N^3, observations for use in our analyses.

Consider three graphs, and an organization chart from Krackhardt (1992).

The top-left graph is the “actual” advice network at the firm. By actual, I mean that these are the relationships people say they have with others. The top-right is the actual organizational chart. Note that the organization chart and the advice networks are imperfectly related to each other.

However, once we go to the bottom panel, we see how important cognition and representation are in the network story. On the bottom-left, we see Chris’ representation, which is not perfect, but it is not as bad as Ev’s (Ev is a manager). In terms of human action, people might behave in concert with the network on the top-left (the “actual” network), but they might also behave in concert with their own perceptions.

This cognitive angle is critical in network analysis. Cognition links actual structure (if it really exists) to action and then to outcomes. Think of the faux pas.

Conducting Level 3 analysis requires collecting data about perceptions and theorizing about how perceptions matter independently of, and interactively with, the “true” structure.

With Level 3 analyses we have N people who have perceptions about N(N-1) relationships, resulting in potentially on the order of N^3 observations. In practice, however, most of the modeling is done at the node level, though this is an active area of research and much can be developed here.
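One way to picture this data structure is as an N x N x N array, where entry `css[k][i][j]` records whether perceiver k believes that i goes to j for advice. The Python sketch below uses three invented respondents. It assumes, following the description above, that the “actual” network is built from each person’s own reported ties, and it computes a simple node-level accuracy score that is illustrative rather than a standard measure.

```python
# css[k][i][j] = 1 if perceiver k believes that person i goes to person j for advice
# (hypothetical responses from 3 people)
css = [
    [[0, 1, 0], [0, 0, 1], [1, 0, 0]],  # perceiver 0's picture of the network
    [[0, 1, 0], [0, 0, 1], [0, 0, 0]],  # perceiver 1's picture
    [[0, 0, 1], [1, 0, 0], [1, 0, 0]],  # perceiver 2's picture
]
n = len(css)

# Number of perceived relationships: N perceivers times N(N-1) ordered pairs
n_obs = n * n * (n - 1)

# Assumption: the "actual" network takes row i from person i's own report
actual = [css[i][i] for i in range(n)]

def accuracy(k):
    """Share of ordered pairs (i, j), i != j, where perceiver k matches the actual network."""
    matches = sum(
        css[k][i][j] == actual[i][j]
        for i in range(n)
        for j in range(n)
        if i != j
    )
    return matches / (n * (n - 1))

# Collapsing to one score per perceiver gives a node-level variable
scores = [accuracy(k) for k in range(n)]
print("observations:", n_obs)
print("accuracy by perceiver:", [round(s, 3) for s in scores])
```

The accuracy score turns the three-dimensional data into one number per person, which is the kind of node-level reduction that most of the modeling relies on.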

### Summary

You should now have a general overview of the kinds of problems we will be covering during the course. By the end of the course, you should be able to conduct and extend these types of analysis for a wide range of domains and levels.

# Topics in Social Network Analysis, PhD Syllabus

This course is designed for PhD students in management, organizational behavior and strategy who are interested in applying network ideas in their research. The course will provide an introduction to applied network theory and empirical methods. Over the 6 sessions of the course, students will learn:

• The basic building blocks of most network theories and how they have been applied in various empirical contexts.
• How to collect network data, visualize it and calculate basic network statistics.
• How to formulate and test hypotheses drawing on network mechanisms.
• The broad uses of network analysis in the study of organizations and strategy.

### Course Requirements

• Attendance and Participation (30% of grade)
• Theoretical Integration Paper (30% of grade)
• Research Proposal (40% of grade)

### April 28, The Foundations of Social Network Analysis

New knowledge is anything that allows you to predict some outcome more accurately than before. The enterprise of network analysis is one example of a focused search for new knowledge. Network scholars seek to find patterns in human relationships that explain important outcomes (health, economic, and political) and that are ignored, non-obvious, or run counter to conventional wisdom. The readings for this class helped set the stage for the network revolution in the social sciences. They articulate, very clearly, what our prior assumptions were about how the world worked, and systematically show us that we should think differently.

### May 5, Network Position and Performance

Part 1: Structural Holes; Part 2: Status

The most frequent use of network analysis has been to examine the relationship between network “position” and the performance of people and organizations. This line of research has produced exciting and important ideas, including those of structural holes, status, and closure. Network ideas have also helped scholars reformulate ideas about power, leadership and identity. The readings for this class will introduce you to some of the central ideas about network positions and their relationship to performance outcomes such as innovation or promotion.

### May 12, Peer Effects

Theories of network positions are built upon individual-level assumptions regarding informational content and knowledge transfer. Yet, until recently, rigorous empirical evidence for information transfer and learning at the dyadic level has been scarce. In this class we will dig deeper into the growing literature on peer effects and examine when we can expect to observe knowledge transfer, and how to evaluate the quality of evidence.

### May 19, Network Formation

Are there general patterns in how networks are shaped? What forces lead these patterns to emerge, and what are the implications for social processes that we care about (e.g., the generation of innovations)? In this class we will cover some core ideas behind the formation of social networks, including homophily, triadic closure, reciprocity, and, at the macro scale, small worlds and clusters.

### May 26, Network Cognition, Activation and Team Structures

There is a lot more to networks than classical formulations of network effects as “positions” or as “peer effects.” Scholars have creatively shown that how people perceive networks also affects their performance, and that the overall structure of a team’s internal and external networks affects team outcomes.

### June 2, The Future of Network Analysis

Your final project presentations go here.