Topics in Strategy (PhD Syllabus)

 

TOPICS IN STRATEGY

BA 972.01 Spring 2019

This course provides Ph.D. students in strategic management and related disciplines an introduction to research on core areas of strategy. The goal of the course is twofold: First, students will get a broad overview of the core topics and theories in the field of strategic management. Second, students will learn how to find, understand, appreciate and communicate research ideas and findings. The course covers the following topics: the sources of firm performance, value creation and capture, managing and organizing, organizational learning, technological change, sociology of the firm, entrepreneurship and emerging markets.

Here is a link to the full syllabus: Duke Strategy PhD Seminar – Syllabus 2019

46 skills startups want and what they pay

What skills should I learn to get a job at a startup? 

What are my skills worth?

Those two questions come up all the time, and frankly, the internet doesn’t have easily accessible information to answer them. Search for “startup skills” on Google and you’ll get vanilla advice about the top skills being “facing failure” or “hustle.” Sure. But you need to know a few things too.

To fill this gap we analyzed thousands of job postings and ranked the top skills that startups are looking for and what they pay for them.  If you’re thinking of joining a startup or even starting one, this list and the linked resources will get you up to speed on the startup labor market.

The average salary shown below is just that, the average. Depending on how good your skills are, which company you work for, and when you get hired, the pay can be higher or lower. We’ve also included pay for the top 10% of the Bay Area jobs for these skills to get a sense of how much top talent gets paid. Remember that many jobs also come with equity compensation and other perks, and these are not included in the salary figures.
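For readers who want to run a similar tally on their own data, here is a minimal sketch of how postings could be summarized by skill. It is not the pipeline behind this list: the sample postings, the `summarize` function, and the choice to read "top 10% pay" as the 90th percentile of advertised salaries are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean, quantiles

# Hypothetical postings: (skills listed in the ad, advertised salary in USD).
postings = [
    ({"Javascript", "Node.js"}, 80_000),
    ({"Javascript", "React.js"}, 95_000),
    ({"Python", "SQL"}, 90_000),
    ({"Python", "Javascript"}, 110_000),
    ({"Javascript"}, 70_000),
]


def summarize(postings):
    """Rank skills by posting count and report average and 90th-percentile pay."""
    pay_by_skill = defaultdict(list)
    for skills, salary in postings:
        for skill in skills:
            pay_by_skill[skill].append(salary)

    rows = []
    for skill, pays in pay_by_skill.items():
        # quantiles() needs at least two data points; fall back to the single value.
        # Assumption: "top 10% pay" is read here as the 90th percentile.
        if len(pays) > 1:
            p90 = quantiles(pays, n=10, method="inclusive")[-1]
        else:
            p90 = pays[0]
        rows.append(
            {"skill": skill, "postings": len(pays), "average": mean(pays), "p90": p90}
        )
    # Most frequently requested skills come first.
    return sorted(rows, key=lambda row: row["postings"], reverse=True)


for row in summarize(postings):
    print(row)
```

A single posting counts toward every skill it lists, which is why the same salary can show up under several skills in a ranking like this one.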

Check out the sister article: 41 skills Indian startups want and what they pay

#1 – Javascript

  • Average pay: $81,929
  • Average pay in the Bay Area: $113,690
  • Top 10% pay in the Bay Area: $170,000
  • Other must-have skills: Node.js, HTML5, CSS3, React.js, Angular.js, Ruby on Rails, PHP, SQL, JQuery.
  • Common job titles: Software Engineer, Full Stack Developer
  • Where to learn this skill: Lots of options; this course provides a great intro.

#2 – Python

  • Average pay: $90,094
  • Average pay in the Bay Area: $118,344
  • Top 10% pay in the Bay Area: $180,000
  • Other must-have skills: Python, SQL, MySQL, Machine learning, AWS, Linux, Django. 
  • Common job titles: Software Engineer, Data Scientist, Data Engineer
  • Where to learn this skill:  Again, plenty of options. Try this comprehensive course for R and Python.

#3 – Sales & Marketing

  • Average pay: $70,247
  • Average pay in the Bay Area: $94,779
  • Top 10% pay in the Bay Area: $175,000
  • Other must-have skills: Product Marketing, Email Marketing, Saas, Sales Strategy, Leadership, Salesforce, Lead generation.
  • Common job titles:  Account Executive, Sales Development Representative, Business Development Manager. 
  • Where to learn this skill: Check out the digital marketing specialization on Coursera.

#4 – Social media marketing

  • Average pay: $61,094
  • Average pay in the Bay Area: $85,532
  • Top 10% pay in the Bay Area: $150,000
  • Other must-have skills: Digital marketing, SEO/SEM, git, Email Marketing, Content Creation, Google Analytics, Writing, Facebook Advertising.
  • Common job titles:  Marketing manager, Director of Marketing, Head of Marketing.
  • Where to learn this skill: a Northwestern course on social media marketing.

#5 – Business development

  • Average pay: $68,691
  • Average pay in the Bay Area: $91,345
  • Top 10% pay in the Bay Area: $160,000
  • Other must-have skills: Sales, Sales Strategy, Business strategy
  • Common job titles:  Business development manager, Business development representative, account executive, sales development representative.
  • Where to learn this skill: Big list of business development courses on edX.

#6 – Java

  • Average pay: $89,332
  • Average pay in the Bay Area: $121,392
  • Top 10% pay in the Bay Area: $180,000
  • Other must-have skills: Android, Scala
  • Common job titles:  Software engineer, Android developer, Backend Engineer
  • Where to learn this skill: Check out the Learn Java tutorial.

#7 – Node.js

  • Average pay: $85,388
  • Average pay in the Bay Area: $112,701
  • Top 10% pay in the Bay Area: $170,000
  • Other must-have skills: Javascript, HTML5, CSS3, React.js, Angular.js, MongoDB, AWS.
  • Common job titles:  Software engineer, Full Stack Engineer, Full Stack Developer
  • Where to learn this skill: Lots of options; this course provides a great intro to the full MEAN stack.

#8 – React.js

  • Average pay: $86,006
  • Average pay in the Bay Area: $112,782
  • Top 10% pay in the Bay Area: $170,000
  • Other must-have skills: Javascript, Node.js, HTML5, Angular.js
  • Common job titles:  Software engineer, full stack developer
  • Where to learn this skill: A highly rated React.js course on Udemy.

#9 – Ruby on Rails

  • Average pay: $90,351
  • Average pay in the Bay Area: $118,568
  • Top 10% pay in the Bay Area: $180,000
  • Other must-have skills: Javascript, SQL, MySQL, Postgresql
  • Common job titles:  Software engineer, full stack developer, DevOps engineer
  • Where to learn this skill: Free Ruby on Rails tutorial.

#10 – HTML5/CSS/Front-end development

  • Average pay: $72,767
  • Average pay in the Bay Area: $105,332
  • Top 10% pay in the Bay Area: $160,000
  • Other must-have skills: Javascript, Node.js, React.js, Angular.js, PHP, SQL, Postgresql, Jquery
  • Common job titles:  Software Engineer, Full Stack Developer, Frontend Developer
  • Where to learn this skill: Lots of options; this course provides a great intro to the MEAN stack.

#11 – UI/UX Design

#12 – iOS Development

  • Average pay: $82,072
  • Average pay in the Bay Area: $109,894
  • Top 10% pay in the Bay Area: $175,000
  • Other must-have skills: Android, Swift, Objective C, C
  • Common job titles:  iOS Developer, iOS Engineer, Software Engineer
  • Where to learn this skill: Free Udacity course on iOS Development.

#13 – Communication skills

  • Average pay: $58,123
  • Average pay in the Bay Area: $84,165
  • Top 10% pay in the Bay Area: $150,000
  • Other must-have skills: …
  • Common job titles:  Account manager, customer success manager, executive assistant.
  • Where to learn this skill: Here is a WikiHow page on how to communicate effectively.

#14 – SQL/MySQL/PostgreSQL

  • Average pay: $84,792
  • Average pay in the Bay Area: $112,298
  • Top 10% pay in the Bay Area: $170,000
  • Other must-have skills: Python, HTML, AWS, NoSQL, Ruby on Rails, PHP
  • Common job titles:  Software Engineer, Data Engineer
  • Where to learn this skill: Fifty best ways to learn MySQL.

#15 – Angular.js

  • Average pay: $75,785
  • Average pay in the Bay Area: $113,690
  • Top 10% pay in the Bay Area: $170,000
  • Other must-have skills: Javascript, Node.js, HTML5, React.js
  • Common job titles:  Software engineer, full stack developer
  • Where to learn this skill: Angular.js tutorial.

#16 – Customer service

  • Average pay: $50,090
  • Average pay in the Bay Area: $65,087
  • Top 10% pay in the Bay Area: $120,000
  • Other must-have skills: Relationship management
  • Common job titles:  Customer  success manager, account manager
  • Where to learn this skill: Free online customer service courses.

#17 – Machine learning/artificial intelligence

  • Average pay: $87,919
  • Average pay in the Bay Area: $118,602
  • Top 10% pay in the Bay Area: $180,000
  • Other must-have skills: Python, Data Science, R
  • Common job titles:  Data Scientist, Machine Learning Engineer
  • Where to learn this skill: A beginner’s guide to AI/ML.

#18 – Project Management

  • Average pay: $73,522
  • Average pay in the Bay Area: $96,430
  • Top 10% pay in the Bay Area: $157,000
  • Other must-have skills: Leadership, Agile
  • Common job titles:  Product manager, Project manager
  • Where to learn this skill: Free edX project management course.

#19 – Android

  • Average pay: $78,894
  • Average pay in the Bay Area: $108,966
  • Top 10% pay in the Bay Area: $175,000
  • Other must-have skills: Java, iOS Development, Mobile app design
  • Common job titles:  Android developer, Android engineer.
  • Where to learn this skill: Google’s Android Development Course.

#20 – PHP

  • Average pay: $68,792
  • Average pay in the Bay Area: $105,915
  • Top 10% pay in the Bay Area: $160,000
  • Other must-have skills: Javascript, HTML, CSS,  SQL, JQuery
  • Common job titles:  Software engineer, full stack developer
  • Where to learn this skill: Learn PHP Online

#21 – Data analysis

  • Average pay: $75,597
  • Average pay in the Bay Area: $98,775
  • Top 10% pay in the Bay Area: $160,000
  • Other must-have skills: R, Python
  • Common job titles:  Data scientist, data analyst
  • Where to learn this skill: Free online data analysis curriculum.

#22 – Photoshop

  • Average pay: $55,438
  • Average pay in the Bay Area: $81,892
  • Top 10% pay in the Bay Area: $120,000
  • Other must-have skills: Graphic design, Adobe Illustrator, Sketch.
  • Common job titles:  Graphic designer, UI/UX Designer, Visual designer
  • Where to learn this skill: Udemy Photoshop course.

#23 – C++

  • Average pay: $89,442
  • Average pay in the Bay Area: $117,050
  • Top 10% pay in the Bay Area: $180,000
  • Other must-have skills: C
  • Common job titles:  Software Engineer, Senior Software Engineer
  • Where to learn this skill: Google for Education C++ Course.

#24 – Growth hacking

  • Average pay: $63,824
  • Average pay in the Bay Area: $91,906
  • Top 10% pay in the Bay Area: $150,000
  • Other must-have skills: Social media marketing, SEO/SEM, Facebook advertising, Email marketing, Digital Marketing.
  • Common job titles: Growth Hacker, Head of Growth, Marketing Manager, Head of Marketing
  • Where to learn this skill:  Coursera has a specialization in social media marketing.

#25 – Product management

  • Average pay: $92,888
  • Average pay in the Bay Area: $120,642
  • Top 10% pay in the Bay Area: $180,000
  • Other must-have skills: Leadership, product development.
  • Common job titles: Product manager
  • Where to learn this skill: Lots of courses on product management. Carnegie Mellon also has a degree.

#26 – Graphic Design

  • Average pay: $59,059
  • Average pay in the Bay Area: $87,595
  • Top 10% pay in the Bay Area: $140,000
  • Other must-have skills: Photoshop, Illustrator
  • Common job titles: Graphic designer, Visual designer, UI/UX Designer
  • Where to learn this skill: Format magazine has a list of free graphic design resources.

#27 – MongoDB

  • Average pay: $83,948
  • Average pay in the Bay Area: $115,899
  • Top 10% pay in the Bay Area: $175,000
  • Other must-have skills: Node.js
  • Common job titles:  Software Engineer, DevOps Engineer, Full Stack Engineer
  • Where to learn this skill: MongoDB University

#28 – Linux

  • Average pay: $85,181
  • Average pay in the Bay Area: $115,914
  • Top 10% pay in the Bay Area: $180,000
  • Other must-have skills: Linux, Python, AWS, Docker
  • Common job titles: DevOps Engineer, Software Engineer, Site Reliability Engineer.
  • Where to learn this skill: Free Linux Courses

#29 – SEO/SEM

  • Average pay: $67,713
  • Average pay in the Bay Area: $93,938
  • Top 10% pay in the Bay Area: $158,000
  • Other must-have skills: Social media marketing, digital marketing, content marketing, social media, Facebook advertising.
  • Common job titles: Marketing manager, Digital Marketing Manager
  • Where to learn this skill: Lots of courses on SEO/SEM at Coursera and Udemy

#30 – Swift

  • Average pay: $83,932
  • Average pay in the Bay Area: $111,461
  • Top 10% pay in the Bay Area: $175,000
  • Other must-have skills: iOS development, Objective C, C
  • Common job titles: iOS Developer, iOS Engineer
  • Where to learn this skill: Download the Swift ebook.

#31 – AWS

  • Average pay: $94,167
  • Average pay in the Bay Area: $122,986
  • Top 10% pay in the Bay Area: $180,000
  • Other must-have skills: Python, Node.js, SQL, Linux, Docker
  • Common job titles: DevOps Engineer, Software Engineer
  • Where to learn this skill: Tutorial.

#32 – Business operations

  • Average pay: $62,297
  • Average pay in the Bay Area: $83,575
  • Top 10% pay in the Bay Area: $140,000
  • Other must-have skills:
  • Common job titles: Operations Manager, Director of Operations, Head of Operations
  • Where to learn this skill: Here’s a Coursera course in Operations Management.

#33 – JQuery

  • Average pay: $72,764
  • Average pay in the Bay Area: $104,335
  • Top 10% pay in the Bay Area: $160,000
  • Other must-have skills: Javascript, HTML, CSS, PHP
  • Common job titles:  Software Engineer, Front End Developer
  • Where to learn this skill: JQuery Learning Center

#34 – Git

  • Average pay: $69,944
  • Average pay in the Bay Area: $99,252
  • Top 10% pay in the Bay Area: $155,000
  • Other must-have skills: Social media marketing, Facebook advertising
  • Common job titles: Marketing manager, digital marketing manager
  • Where to learn this skill: Learn git.

#35 – Django

  • Average pay: $82,039
  • Average pay in the Bay Area: $114,143
  • Top 10% pay in the Bay Area: $170,000
  • Other must-have skills: Python
  • Common job titles: Software engineer, backend engineer, full stack developer
  • Where to learn this skill: Getting started with Django

#36 – Salesforce

  • Average pay: $71,613
  • Average pay in the Bay Area: $91,664
  • Top 10% pay in the Bay Area: $160,000
  • Other must-have skills: Sales and Marketing
  • Common job titles: Sales Development Representative, Account Executive
  • Where to learn this skill: Salesforce tutorial

#37 – Adobe Illustrator

  • Average pay: $57,461
  • Average pay in the Bay Area: $88,482
  • Top 10% pay in the Bay Area: $135,000
  • Other must-have skills: Photoshop, Graphic Design, Sketch
  • Common job titles: Graphic designer, Product Designer, UI/UX Designer
  • Where to learn this skill: Tutorials from Adobe

#38 – Business strategy

  • Average pay: $76,368
  • Average pay in the Bay Area: $100,677
  • Top 10% pay in the Bay Area: $180,000
  • Other must-have skills: Business development
  • Common job titles: Business development manager
  • Where to learn this skill: Business strategy courses on Coursera.

#39 – Sketch

  • Average pay: $74,488
  • Average pay in the Bay Area: $95,487
  • Top 10% pay in the Bay Area: $140,000
  • Other must-have skills: UI/UX Design, Photoshop, Illustrator
  • Common job titles: Product designer, UI/UX Designer, Designer
  • Where to learn this skill: Check out this intro to Sketch post on Medium.

#40 – RESTful services

  • Average pay: $80,852
  • Average pay in the Bay Area: $112,327
  • Top 10% pay in the Bay Area: $175,000
  • Other must-have skills:…
  • Common job titles: Software engineer, Backend Developer, Full Stack Developer
  • Where to learn this skill: Build a RESTful service.

#41 – C

  • Average pay: $82,871
  • Average pay in the Bay Area: $111,707
  • Top 10% pay in the Bay Area: $175,000
  • Other must-have skills: iOS Development, C++, Swift, Objective C
  • Common job titles: iOS Developer, Software Engineer
  • Where to learn this skill: Wikibooks about C programming.

#42 – Docker

  • Average pay: $90,988
  • Average pay in the Bay Area: $125,133
  • Top 10% pay in the Bay Area: $160,000
  • Other must-have skills: Linux, AWS, DynamoDB
  • Common job titles: DevOps Engineer, Software Engineer
  • Where to learn this skill: Docker tutorial for beginners.

#43 – Agile

  • Average pay: $91,686
  • Average pay in the Bay Area: $118,306
  • Top 10% pay in the Bay Area: $170,000
  • Other must-have skills: Project management
  • Common job titles: Product and Project Manager
  • Where to learn this skill: The agile coach

#44 – R

  • Average pay: $87,128
  • Average pay in the Bay Area: $110,854
  • Top 10% pay in the Bay Area: $180,000
  • Other must-have skills: Python, Machine Learning
  • Common job titles: Data Scientist, Data Analyst
  • Where to learn this skill: List of free R tutorials; also try this comprehensive course for R and Python.

#45 – Scala

  • Average pay: $107,869
  • Average pay in the Bay Area: $133,935
  • Top 10% pay in the Bay Area: $200,000
  • Other must-have skills: Java
  • Common job titles: Software engineer, Data engineer
  • Where to learn this skill: Scala Exercises

#46 – WordPress

  • Average pay: $55,676
  • Average pay in the Bay Area: $79,868
  • Top 10% pay in the Bay Area: $140,000
  • Other must-have skills: Content writing
  • Common job titles: Marketing director, Digital Marketing Manager
  • Where to learn this skill: learn.wordpress.com

41 skills Indian startups want and what they pay

 

This is the India version of 46 skills startups want and what they pay. Like the US edition, we’ve analyzed thousands of jobs advertised by Indian startups to bring you an informed look at the skills you need to get hired. We also looked specifically at Bengaluru salaries, and at the top 10% of those, to get a sense of the top tier of the labor market.

What skills should I learn to get a job at a startup? 

What are my skills worth?

Those two questions come up all the time, and frankly, the internet doesn’t have easily accessible information to answer them. Search for “startup skills” on Google and you’ll get vanilla advice about the top skills being “facing failure” or “hustle.” Sure. But you need to know a few things too.

To fill this gap we analyzed thousands of job postings and ranked the top skills that startups in India are looking for and what they pay for them.  If you’re thinking of joining a startup or even starting one, this list and the linked resources will get you up to speed on the startup labor market.

The average salaries shown below are just that, the average. Depending on how good your skills are, which company you work for, and when you get hired, the pay can be higher or lower. We’ve also included pay for the top 10% of the Bengaluru jobs for these skills to get a sense of how much top talent gets paid. Remember that many jobs also come with equity compensation and other perks, and these are not included in the salary figures.

#1 – Javascript

  • Average pay: ₹6,07,934 INR
  • Average pay in Bengaluru: ₹8,33,552 INR
  • Top 10% pay in Bengaluru: ₹20,00,000 INR
  • Other must-have skills: Node.js, HTML5, CSS3, React.js, Angular.js, Ruby on Rails, PHP, SQL, JQuery.
  • Common job titles: Software Engineer, Full Stack Developer
  • Where to learn this skill: Lots of options; this course provides a great intro.

#2 – Python

  • Average pay: ₹8,18,416 INR
  • Average pay in Bengaluru: ₹11,66,243 INR
  • Top 10% pay in Bengaluru: ₹30,00,000 INR
  • Other must-have skills: Python, R, Django, Machine learning, Data Science
  • Common job titles: Data Scientist, Python developer
  • Where to learn this skill:  Again, plenty of options. Try this comprehensive course for R and Python.

#3 – Java

  • Average pay: ₹7,81,899 INR
  • Average pay in Bengaluru: ₹11,97,291 INR
  • Top 10% pay in Bengaluru: ₹30,00,000 INR
  • Other must-have skills: Android
  • Common job titles:  Software engineer, Android developer, Java developer
  • Where to learn this skill: Check out the Learn Java tutorial.

#4 – HTML5/CSS/Front-end development

  • Average pay: ₹5,38,974 INR
  • Average pay in Bengaluru: ₹7,60,051 INR
  • Top 10% pay in Bengaluru: ₹20,00,000 INR
  • Other must-have skills: Javascript, Node.js, React.js, Angular.js, PHP, SQL, Postgresql, Jquery, WordPress
  • Common job titles: Full Stack Developer, Web Developer, Frontend Developer
  • Where to learn this skill: Lots of options; this course provides a great intro to the MEAN stack.

#5 – Angular.js

  • Average pay:  ₹6,48,103 INR
  • Average pay in Bengaluru: ₹8,10,801 INR
  • Top 10% pay in Bengaluru: ₹20,00,000 INR
  • Other must-have skills: Javascript, Node.js, HTML5, React.js
  • Common job titles:  Full stack developer, Software Engineer, Frontend Developer, Web Developer
  • Where to learn this skill: Angular.js tutorial.

#6 – Android

  • Average pay: ₹6,94,384 INR
  • Average pay in Bengaluru: ₹8,35,801 INR
  • Top 10% pay in Bengaluru: ₹20,00,000 INR
  • Other must-have skills: Java, iOS Development, Mobile app design, Rest APIs, Mobile Application Development
  • Common job titles:  Android developer, Android engineer.
  • Where to learn this skill: Google’s Android Development Course.

#7 – PHP

  • Average pay: ₹4,90,688 INR
  • Average pay in Bengaluru: ₹8,01,587 INR
  • Top 10% pay in Bengaluru: ₹18,00,000 INR
  • Other must-have skills: Javascript, HTML, CSS,  SQL, JQuery
  • Common job titles:  PHP Developer, Full Stack Developer, Web Developer, Software Engineer
  • Where to learn this skill: Learn PHP Online

#8 – Photoshop

  • Average pay: ₹3,83,070 INR
  • Average pay in Bengaluru: ₹6,62,598 INR
  • Top 10% pay in Bengaluru: ₹14,00,000 INR
  • Other must-have skills: Graphic design, Adobe Illustrator, Sketch.
  • Common job titles:  Graphic designer, UI/UX Designer, Visual designer
  • Where to learn this skill: Udemy photoshop course.

#9 – Business development

  • Average pay: ₹4,55,274 INR
  • Average pay in Bengaluru: ₹10,38,997 INR
  • Top 10% pay in Bengaluru: ₹15,00,000 INR
  • Other must-have skills: Sales, Sales Strategy, Business strategy
  • Common job titles:  Business development manager, Business development executive, Business Development Associate
  • Where to learn this skill: Big list of business development courses on edX.

#10 – Sales & Marketing

  • Average pay: ₹4,63,466 INR
  • Average pay in Bengaluru: ₹8,18,852 INR
  • Top 10% pay in Bengaluru: ₹15,00,000 INR
  • Other must-have skills: Product Marketing, Email Marketing, Saas, Sales Strategy, Leadership, Salesforce, Lead generation.
  • Common job titles:  Account Executive, Sales Development Representative, Business Development Manager. 
  • Where to learn this skill: Check out the digital marketing specialization on Coursera.

#11 – Communication skills

  • Average pay:  ₹3,39,885 INR
  • Average pay in Bengaluru: ₹4,15,325 INR
  • Top 10% pay in Bengaluru: ₹9,00,000 INR
  • Other must-have skills: Sales, Marketing
  • Common job titles:  Business development manager, business development executive, sales executive.
  • Where to learn this skill: Here is a WikiHow page on how to communicate effectively.

#12 – Node.js

  • Average pay: ₹7,09,486 INR
  • Average pay in Bengaluru: ₹8,92,134 INR
  • Top 10% pay in Bengaluru: ₹22,00,000 INR
  • Other must-have skills: Javascript, React.js, Angular.js, MongoDB
  • Common job titles:  Software engineer, Full Stack Engineer, Backend Developer, MEAN Stack Developer
  • Where to learn this skill: Lots of options; this course provides a great intro to the full MEAN stack.

#13 – Social media marketing

  • Average pay: ₹4,92,255 INR
  • Average pay in Bengaluru: ₹7,81,886 INR
  • Top 10% pay in Bengaluru: ₹15,00,000 INR
  • Other must-have skills: Digital marketing, SEO/SEM, git, Email Marketing, Content Creation, Google Analytics, Writing, Facebook Advertising.
  • Common job titles:  Digital marketing manager, Content Writer, Marketing Manager, Social Media Manager.
  • Where to learn this skill: a Northwestern course on social media marketing.

#14 – SQL/MySQL/PostgreSQL

  • Average pay: ₹6,20,343 INR
  • Average pay in Bengaluru: ₹8,19,849 INR
  • Top 10% pay in Bengaluru: ₹20,00,000 INR
  • Other must-have skills: Python, HTML, AWS, NoSQL, Ruby on Rails, PHP
  • Common job titles:  Software Engineer, Data Engineer
  • Where to learn this skill: Fifty best ways to learn mySQL.

#15 – JQuery

  • Average pay: ₹5,35,782 INR
  • Average pay in Bengaluru: ₹7,42,365 INR
  • Top 10% pay in Bengaluru: ₹20,00,000 INR
  • Other must-have skills: Javascript, HTML, CSS, PHP
  • Common job titles:  Full Stack Developer, Web Developer, Software Engineer, Front End Developer
  • Where to learn this skill: JQuery Learning Center

#16 – Adobe Illustrator

  • Average pay: ₹3,95,190 INR
  • Average pay in Bengaluru: ₹7,12,912 INR
  • Top 10% pay in Bengaluru: ₹14,00,000 INR
  • Other must-have skills: Photoshop, Graphic Design, Sketch
  • Common job titles: Graphic designer, Product Designer, UI/UX Designer
  • Where to learn this skill: Tutorials from Adobe

#17 – React.js

  • Average pay: ₹8,03,557 INR
  • Average pay in Bengaluru: ₹9,94,444 INR
  • Top 10% pay in Bengaluru: ₹24,00,000 INR
  • Other must-have skills: Javascript, Node.js, HTML5, Angular.js
  • Common job titles:  Full Stack Developer, Frontend Developer, Software Engineer.
  • Where to learn this skill: A highly rated React.js course on Udemy.

#18 – MongoDB

  • Average pay: ₹7,92,561 INR
  • Average pay in Bengaluru: ₹9,57,228 INR
  • Top 10% pay in Bengaluru: ₹24,00,000 INR
  • Other must-have skills: Node.js, Angular.js, MySQL, AWS
  • Common job titles: Software Engineer, Backend Engineer, Full Stack Engineer
  • Where to learn this skill: MongoDB University

#19 – iOS Development

  • Average pay: ₹7,20,205 INR
  • Average pay in Bengaluru: ₹17,85,608 INR
  • Top 10% pay in Bengaluru: ₹20,00,000 INR
  • Other must-have skills: Android, Swift, Objective C, C
  • Common job titles:  iOS Developer, iOS Engineer, Software Engineer
  • Where to learn this skill: Free Udacity course on iOS Development.

#20 – UI/UX Design

#21 – Django

  • Average pay: ₹7,08,395 INR
  • Average pay in Bengaluru: ₹9,90,549 INR
  • Top 10% pay in Bengaluru: ₹30,00,000 INR
  • Other must-have skills: Python
  • Common job titles: Software engineer, backend engineer, full stack developer, python developer
  • Where to learn this skill: Getting started with Django

#22 – Machine learning/artificial intelligence

  • Average pay: ₹10,00,724 INR
  • Average pay in Bengaluru: ₹12,90,819 INR
  • Top 10% pay in Bengaluru: ₹30,00,000 INR
  • Other must-have skills: Python, Data Science, R
  • Common job titles:  Data Scientist, Machine Learning Engineer
  • Where to learn this skill: A beginner’s guide to AI/ML.

#23 – SEO/SEM

  • Average pay: ₹6,02,923 INR
  • Average pay in Bengaluru: ₹13,26,952 INR
  • Top 10% pay in Bengaluru: ₹15,00,000 INR
  • Other must-have skills: Social media marketing, digital marketing, content marketing, social media, Facebook advertising.
  • Common job titles: Marketing manager, Digital Marketing Manager
  • Where to learn this skill: Lots of courses on SEO/SEM at Coursera and Udemy

#24 – Graphic Design

  • Average pay: ₹4,13,981 INR
  • Average pay in Bengaluru: ₹8,01,472 INR
  • Top 10% pay in Bengaluru: ₹12,00,000 INR
  • Other must-have skills: Photoshop, Illustrator
  • Common job titles: Graphic designer, Visual designer, UI/UX Designer
  • Where to learn this skill: Format magazine has a list of free graphic design resources.

#25 – RESTful services

  • Average pay: ₹6,53,306 INR
  • Average pay in Bengaluru: ₹12,12,926 INR
  • Top 10% pay in Bengaluru: ₹20,00,000 INR
  • Other must-have skills:…
  • Common job titles: Software engineer, Backend Developer, Full Stack Developer, Android Developer.
  • Where to learn this skill: Build a RESTful service.

#26 – Git

  • Average pay: ₹6,00,780 INR
  • Average pay in Bengaluru: ₹12,23,537 INR
  • Top 10% pay in Bengaluru: ₹17,00,000 INR
  • Other must-have skills: Social media marketing, Facebook advertising, SEO/SEM
  • Common job titles: Marketing manager, digital marketing manager
  • Where to learn this skill: Learn git.

#27 – AWS

  • Average pay: ₹8,90,736 INR
  • Average pay in Bengaluru: ₹10,43,135 INR
  • Top 10% pay in Bengaluru: ₹25,00,000 INR
  • Other must-have skills: Python, Node.js, SQL, Linux, Docker
  • Common job titles: DevOps Engineer, Software Engineer
  • Where to learn this skill: Tutorial.

#28 – Ruby on Rails

  • Average pay: ₹9,04,340 INR
  • Average pay in Bengaluru: ₹11,72,269 INR
  • Top 10% pay in Bengaluru: ₹30,00,000 INR
  • Other must-have skills: Javascript, SQL, MySQL, Postgresql
  • Common job titles:  Full Stack Developer, Software Engineer, Ruby on Rails Developer.
  • Where to learn this skill: Free Ruby on Rails tutorial.

#29 – C++

  • Average pay: ₹6,03,105 INR
  • Average pay in Bengaluru: ₹10,28,021 INR
  • Top 10% pay in Bengaluru: ₹30,00,000 INR
  • Other must-have skills: C
  • Common job titles:  Software Engineer, Senior Software Engineer
  • Where to learn this skill: Google for Education C++ Course.

#30 – Swift

  • Average pay: ₹7,17,202 INR
  • Average pay in Bengaluru: ₹9,00,520 INR
  • Top 10% pay in Bengaluru: ₹25,00,000 INR
  • Other must-have skills: iOS development, Objective C, C
  • Common job titles: iOS Developer, iOS Engineer
  • Where to learn this skill: Download the Swift ebook.

#31 – Customer service

  • Average pay: ₹2,96,569 INR
  • Average pay in Bengaluru: ₹3,58,271 INR
  • Top 10% pay in Bengaluru: ₹10,00,000 INR
  • Other must-have skills: Relationship management
  • Common job titles:  Customer support executive, customer success manager
  • Where to learn this skill: Free online customer service courses.

#32 – WordPress

  • Average pay: ₹3,01,301 INR
  • Average pay in Bengaluru: ₹5,06,385 INR
  • Top 10% pay in Bengaluru: ₹13,00,000 INR
  • Other must-have skills: Web developer, PHP Developer, WordPress Developer
  • Common job titles: Marketing director, Digital Marketing Manager
  • Where to learn this skill: learn.wordpress.com

#33 – Linux

  • Average pay: ₹7,21,078 INR
  • Average pay in Bengaluru: ₹9,27,770 INR
  • Top 10% pay in Bengaluru: ₹30,00,000 INR
  • Other must-have skills: Linux, Python, AWS, Docker
  • Common job titles: DevOps Engineer, Software Engineer, Site Reliability Engineer.
  • Where to learn this skill: Free Linux Courses

#34 – Data analysis

  • Average pay: ₹7,21,932 INR
  • Average pay in Bengaluru: ₹7,49,230 INR
  • Top 10% pay in Bengaluru: ₹20,00,000 INR
  • Other must-have skills: Excel, R, Python
  • Common job titles:  Data scientist, product manager, business analyst, data analyst
  • Where to learn this skill: Free online data analysis curriculum.

#35 – Product management

  • Average pay: ₹12,02,527 INR
  • Average pay in Bengaluru: ₹14,19,094 INR
  • Top 10% pay in Bengaluru: ₹30,00,000 INR
  • Other must-have skills: Leadership, product development.
  • Common job titles: Product manager
  • Where to learn this skill: Lots of courses on product management. Carnegie Mellon also has a degree.

#36 – R

  • Average pay: ₹9,93,660 INR
  • Average pay in Bengaluru: ₹18,27,404 INR
  • Top 10% pay in Bengaluru: ₹30,00,000 INR
  • Other must-have skills: Python, Machine Learning
  • Common job titles: Data Scientist
  • Where to learn this skill: List of free R tutorials; also try this comprehensive course for R and Python.

#37 – Project Management

  • Average pay: ₹7,71,636 INR
  • Average pay in Bengaluru: ₹9,34,728 INR
  • Top 10% pay in Bengaluru: ₹27,00,000 INR
  • Other must-have skills: Leadership, Agile
  • Common job titles:  Product manager, Project manager
  • Where to learn this skill: Free edX project management course.

#37 – Business operations

  • Average pay: ₹7,32,297 INR
  • Average pay in Bengaluru: ₹8,99,961 INR
  • Top 10% pay in Bengaluru: ₹15,00,000 INR
  • Other must-have skills:
  • Common job titles: Operations Manager, Business Development Executive, Business Development Associate.
  • Where to learn this skill: Here’s a Coursera course in Operations Management.

#38 – Business strategy

  • Average pay: ₹4,84,705 INR
  • Average pay in Bengaluru: ₹6,41,516 INR
  • Top 10% pay in Bengaluru: ₹16,00,000 INR
  • Other must-have skills: Business development
  • Common job titles: Business development manager
  • Where to learn this skill: Business strategy courses on Coursera.

#39 – Growth hacking

  • Average pay: ₹6,33,359 INR
  • Average pay in Bengaluru: ₹5,97,012 INR
  • Top 10% pay in Bengaluru: ₹15,00,000 INR
  • Other must-have skills: Social media marketing, SEO/SEM, Facebook advertising, Email marketing, Digital Marketing.
  • Common job titles: Growth Hacker, Head of Growth, Marketing Manager, Head of Marketing
  • Where to learn this skill:  Coursera has a specialization in social media marketing.

#40 – Sketch

  • Average pay: ₹5,71,719 INR
  • Average pay in Bengaluru: ₹6,00,821 INR
  • Top 10% pay in Bengaluru: ₹15,00,000 INR
  • Other must-have skills: UI/UX Design, Photoshop, Illustrator
  • Common job titles: Product designer, UI/UX Designer, Designer
  • Where to learn this skill: Check out this intro to Sketch post on Medium.

#41 – C

  • Average pay: ₹5,96,184.5 INR
  • Average pay in Bengaluru: ₹8,98,375 INR
  • Top 10% pay in Bengaluru: ₹25,00,000 INR
  • Other must-have skills: iOS Development, C++, Swift, Objective-C
  • Common job titles: iOS Developer, Software Engineer
  • Where to learn this skill: Wikibooks about C programming.

 

44 tools and resources for social scientists

Over the years I’ve gotten great tips from colleagues and students about the tools that have helped them become more productive researchers.  Below is a list of the 44 tools and resources that have changed how I do research.

Get teched up

Excellent technical skills are the bedrock of a successful research career. Today, publishing requires both the understanding of theory and the ability to tease out meaningful insights from complex data sets. Even before you start your Ph.D., tool up. Here are five resources to get you started:

  • R: is perhaps the only statistical programming language you really need to know. It is free and comprehensive: you can do visualization, machine learning, traditional econometrics, or write your own custom algorithms. Even if you don’t use R that often, it is one language that all social scientists must learn.
  • RStudio: will make it easier to use R, and for some use cases, RStudio is substantially faster. For academics, RStudio is free, so it’s worth a shot.
  • Machine Learning A to Z: Hands-on Python and R In Data Science: OK, so you don’t know R and can’t figure out where to start. Start here. This is a comprehensive online course to get you up-to-date with some of the major functionality in R and Python (the other language of data science).
  • Stata SE or MP: If you are an economist, or generally estimate Y = B0 + B1X type equations with worries about clustering standard errors or endogeneity, R may sometimes feel like overkill. Stata is my go-to software for most of my analysis. I learned Stata by working with a collaborator. With great Stata code, you can go from your raw data to publication quality tables with a press of the “do” button.
  • The complete web developer bootcamp: So you want to do online field experiments, but you can’t build a website? Fret no longer. I recently completed this Udemy course called “The Complete Web Developer Bootcamp” and was up and running with an excellent quality web application in just two weeks. If I can learn it, you can too. In this class, you’ll learn about some cool environments like c9, mLab, and Heroku that can get you started on a slick web application with little setup time.

Communicating more effectively

Academics have two products. They write and they present. By honing these two skills, you can become a star. Polish your writing and presentation skills with these resources. I’m always on the lookout for great resources to help me improve my writing.

  • The Art of Styling Sentences is the book I recommend to all my Ph.D. students. Like most skills, you can improve your writing dramatically by following a few simple rules. Check out this book if you want to have prettier sentences.
  • Ninja writing:  The Four Levels of Writing Mastery: Mark Twain once said that the difference between the right word and nearly the right word is the same as the difference between lightning and the lightning bug. Shani Raja’s Ninja Writing and Writing With Flair have given me some handy tools for editing my academic writing.
  • Writing With Flair: How To Become An Exceptional Writer: This is a superb class, and worth every penny.
  • Hire a copy editor on Upwork: When I got my first “conditional accept” at a top journal, the editors asked me to get my paper professionally copy-edited. I was offended. Today, I almost always send my paper to a copy editor before I submit it to a journal, and always before I send the final version in for publication. I’ve used several copy editors throughout the years, but if you are looking to experiment with finding a copy editor, try Upwork.
  • How to make a great presentation and TED’s secret to great public speaking: When I was a graduate student at Carnegie Mellon, I was fortunate enough to take a class on public speaking taught by the late Pamela Lewis. Her insights on how to create a PowerPoint presentation, how to present ideas, and how to make your ideas stick have been invaluable. Today, you can find excellent resources on public speaking online. The founders of the TED conference, where slick presentations abound, have great resources to help you improve your presentation skills.

Develop a writing workflow

Building a process for clear and well-produced writing is paramount for success. Here are a few tools and resources that can help you improve the quality of your written work.

  • Latex: Robert Hall, the Stanford economist, said in an article about becoming a professional economist: “Pay close attention to the appearance and dissemination of your work. I hold the following controversial view that my economist wife thinks betrays a lack of spiritual development: There is a separating equilibrium between researchers who put out nicely typeset papers in Latex and those who struggle with the infirmities of Microsoft Word.”  Learn Latex; your readers will appreciate it.
  • Overleaf: Once you learn Latex, start using Overleaf. It is like Google Docs for Latex and helps you write beautiful Latex manuscripts in a collaborative environment. Version 2 is even nicer with the ability to add comments.
  • Grammarly: I am a fast typer and sometimes I forget to include words in my writing. Grammarly is a bit pricey, but I use it all the time so it has been worth it for me.
  • Ulysses: I started using Ulysses a few years ago to organize my writing. Ulysses is the app I use when I want to start on a project or work on a revision. It’s a great tool for breaking up a long project into manageable chunks.
  • esttab: I can’t believe I used to create regression tables manually. If you are still creating tables by cutting and pasting your regression output into MS Word tables, please join us in the 21st century. esttab will help you make beautiful Latex tables, and the best part is that you can link these directly to your Latex files using the \input command, so every time you update your regression your manuscript updates automatically too!
  • Grammark.org: Another automated (but free) grammar tool. It is worth a try and it is especially useful for writers who struggle with wordiness.
  • EndNote: Some people still use EndNote to organize their bibliographies. I don’t.
  • Mendeley: Mendeley is a free bibliography tool. The best part of Mendeley is its integration with Overleaf.
  • Google Scholar BibTeX export: I write in Latex, and Google Scholar has a BibTeX output feature that lets you cut-and-paste a BibTeX entry from Google Scholar. Beware though, sometimes things are missing or journal titles are awkwardly capitalized. But if you develop a good process, you only need to fix the errors once.

Streamline processes

Improve the processes around your work. Get the technology that reduces duplication, idle time, and inefficiency.

  • Dropbox: 8 years ago, I used to email my in-progress manuscripts to myself at the end of the day so I could work with them on my home computer. Today, Dropbox has been the singular technology that has reduced the amount of digital shipping waste that I create. I usually keep one copy of a file—whether it is my data, notes, or code. I can work on these assets from anywhere. How amazing.
  • Google Docs: Writing a collaborative proposal? Responding to a reviewer letter? Google Docs has helped speed up these collaborative tasks.
  • Sublime: Need a simple text editor with lots of power? Sublime is amazing and I find myself using it every day for the little bits of text work that I have to do.
  • Master your email and calendar: A few years ago I realized that I sometimes checked 4 email addresses a day: an old Yahoo address, my work email, my new Gmail account, and a Gmail address for subscriptions or junk mail. Now, I’ve got one email address. I probably got 30 minutes of time back.

Learn to delegate

The best professors know how to break up their work into modular chunks and delegate it to others. This frees up time to do more important things. Sure, it might be instructive the first few times to clean your data, to do a preliminary search of the literature, or develop a website for your research project. But it’s worth learning how to delegate these tasks so you can put your energies to more creative uses.

  • Hire someone: Learn to break up your work into modular pieces and delegate the stuff that is probably not worth your time. Delegating is perhaps the master skill of the productive academic. If you want to learn how to delegate better, work with someone who does this well. Start by hiring an undergraduate research assistant and get them to work on a small project. Remember to manage the work effectively: ask yourself these four questions.
  • Hire someone on Upwork: You can hire people to do almost any task on Upwork. I’ve gotten people to copy edit my articles, build a citation database on a topic I wanted to learn about, and scrape data from a website. Start with small projects and build your delegation skills on a platform like this.

Get data and develop a process around it

Every great recipe needs great ingredients. Data gathering is a first-order skill that every social scientist should master.

  • ICPSR: An easy way to get data is to download data that someone else has already spent the time, effort and money gathering and cleaning. ICPSR is a great starting point on your data gathering journey.
  • Compustat: If you study publicly traded firms, you should learn how to use Compustat.
  • Qualtrics: Learn how to run a survey on Qualtrics. You can launch a survey in just a few minutes and start collecting responses using a platform such as MechanicalTurk.
  • MechanicalTurk: You need a quick and cheap subject pool? Try MechanicalTurk. There are some good tutorials online. 
  • SurveyMonkey
  • Google Customer Survey: Google also has a survey service that you can use to ask questions from a nationally representative sample of Americans.
  • Learn to scrape a website: Learn how to scrape
  • Talk to people: The best data is often not easily available online. Talk to people in your field or in the real world at companies. You might find a gem that turns into a great research paper.

Create a comfortable workspace

A laptop is all you really need to be a great social scientist. But a good workplace can definitely improve your productivity.

  • MacBook Pro: I stopped using PCs about a decade ago and my go-to computer is a MacBook Pro. The MacBook Air is a good entry point for a social scientist looking for a computer that can handle most of the software you need to do statistical analysis and academic writing.
  • RAM is a barrier: If you can afford it, go for a computer with a great CPU and lots of RAM. But it is worthwhile buying more RAM for your computer as your data sets increase in size.
  • Good monitor: Monitors are cheap. You can find large and high-quality monitors anywhere really.
  • Get another monitor.
  • Keyboard: Get a comfortable keyboard. I use the Microsoft Sculpt Ergonomic Wireless keyboard at work.

Keep learning

Here are two hacks to help you get up-to-date on the academic literature and also find interesting ideas in the popular literature.

  • How to read an academic paper: You should be spending more time writing than reading. But having a good process for reading academic papers is key. Check out these tips from Science magazine about reading academic papers.  You could also have a computer read the papers to you by using software like NaturalReaders.
  • Audible: If you go to the gym, have a commute, or want to learn something new at the end of the day, start listening to audio books. A few years ago I got a subscription to Audible. I’ve been able to learn a ton of new things about topics I didn’t know much about on my drive home. The selection of audiobooks available today is remarkable and you will surely find books that relate to your existing research area or a new area you would like to explore.

Life

Finally, there is more to life than research. Here are some resources that may be useful for young social scientists beginning their career.

  • Personal finance: Be good with money. Stanford CS has a great class on personal finance called cs007. There are a lot of great personal finance blogs out there. My favorite is the Financial Samurai. I like it because it has lots of facts and figures and helps me benchmark where I should be at my career stage.
  • Experiment: Try little things and see where they take you.
  • Meditate: I recently listened to a great (and funny) audiobook on Buddhist meditation. It got me to try out meditation, and now I frequently use the Headspace app.

4 questions every founder should ask themselves

A few years ago we asked 100 startup founders 4 questions about how they manage their companies:

How often do you…

“…develop shared goals in your team?”
“…measure employee performance using 360 reviews, interviews, or one-on-ones?”
“…provide your employees with direct feedback about their performance?”
“…set clear expectations around project outcomes and project scope?”

Founders could respond “never,” “yearly,” “monthly,” “weekly,” or “daily.”

The average founder answered monthly, but plenty of founders said they “never” developed shared goals or gave employees feedback.

Some founders, however, were active managers: they “checked in” with their employees at least a few times a month.

Our data showed that active managers were more likely to run larger and more successful startups themselves, irrespective of their education or experience.

But this was just a correlation. So we asked ourselves, could getting advice about people management from active managers lead startups to perform better?

We conducted a randomized controlled trial where founders gave advice to one of their peers about how to grow their companies. For two years, we tracked their performance.

We found some surprising results.

Founders who received advice from active managers (e.g., those who instituted regular meetings, set goals consistently, and provided frequent feedback) had vastly better outcomes over the next two years: they were 28% larger and 10% less likely to fail than those who got advice from passive founders.


Advice from active managers was especially useful for founders who lacked formal management training (e.g., an MBA) or were not part of an accelerator already.

Bottom line: If you are looking to grow your startup, get advice from people who actively lead their teams.

If you are interested in the nitty gritty, check out our full article here:

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2964249

Where do networks come from?

The key assumption underlying both the peer effects and structural approaches to network effects is some degree of exogeneity in the existence and structure of network ties.

Exogeneity is both a theoretical claim and an empirical assumption. All reasonable theories are built on a set of axioms that assume some primitive or exogenous features of the world or of the target system being analyzed.  Many models in economics, for instance, assume that preferences are exogenous. From these preferences, we are then able to derive things like behavior, choice, “roles” as well as the structure of social relationships.


Similarly, some sociological and anthropological traditions start with axioms that assume that “roles” are exogenous. These roles (e.g., the position an individual occupies in a social structure) govern behavior, preferences, as well as social relationships.


Much of the network analysis we’ve been conducting or discussing thus far also has an exogeneity assumption built in. The primitives are social relationships and their structure. All other things we observe, such as behavior, preferences and roles, emerge from the pattern of exogenous network ties. In the lectures on structural holes, status and peer effects, we argue that the pattern of social relationships causes differences in behavior, preferences, as well as roles, and not vice versa.


The challenge of network formation

However, a challenge for the social-relationships-first perspective is that networks are unlikely to be fully “exogenous.” They form and evolve through certain processes that make some people more likely to connect to each other, and make some people less likely to do so.

Network scholars have spent considerable time trying to understand how networks form and change. At a broad conceptual level, we can think about five factors that shape whether a tie between two individuals (e.g., ego and alter) forms.


The logic behind most models of network formation is simple. At one end, there are “benefits” to connecting with someone, whether actual or perceived, pecuniary or non-pecuniary (psychic). At the other end, there are “costs” that make it easier or harder to form a relationship with someone, because searching for them, coordinating with them, or dealing with them is more or less costly than with someone else. Relatedly, some individuals may have a lower cost of building a network than others, and/or it may be lower cost (relative to benefit) to connect with particular people.

Factor 1: Characteristics of Ego, the sender.

Characteristics encapsulated in “Factor 1” include a range of factors that make it easier for certain types of people (e.g., those who have certain characteristics themselves) to connect with many others. These characteristics may include things that either make it easier for these people (relative to others) to make many connections or perhaps provide them greater benefit from doing so. Research in this stream has found a substantial range of characteristics that vary at the individual level and that also predict an increased or decreased propensity to have a certain type of network. These things include:

  • Personality: Some work has found that differences in personality traits are correlated with network structure. For instance, individuals who have many ties are also likely to have extroverted personalities. Relatedly, those who are high in “self monitoring” also have a greater likelihood of being “brokers” or occupying “structural holes” in a social network.
  • Other factors that may also be related to larger networks include:
    • Strategic intent
    • Intelligence
    • Physical characteristics (e.g., beauty or height)
    • Age
  • Some factors may describe an individual at a certain point in time:
    • After the loss of a job
    • After being promoted to a new role
  • Other factors may be socially constructed, but describe Ego in a given context:
    • Caste
    • Religion

One can reason about the various ways in which these characteristics of Ego either lower their costs of making ties or increase the benefit they get. Can you come up with other individual-level factors that might matter?

Factor 2: Characteristics of Alter, the receiver.

A related set of arguments can be made about the characteristics of an alter or alters. For instance, one could theorize about the following characteristics of alter(s) that may make them more likely to receive connections from others.

  • Personality
  • Intelligence
  • Skill
  • Wealth
  • Social standing
  • Formal role in the organization

Like the Ego-centric perspective, one could logically use a “cost” and “benefit” perspective for reasoning about why some Alter may have more advice seekers (e.g., they are smart) or more friends (e.g., they are helpful). In purely altercentric models, we ignore the characteristics of Ego.

Factor 3: The interaction of Ego/Alter characteristics (e.g., homophily)

The third factor relates to the “Ego-Alter” interaction. In such models, there is something about the characteristics of Ego and Alter together that predicts an increased or decreased propensity to have network ties. The most common theme in these models is homophily, or the tendency for individuals who are similar to each other to have a higher propensity to connect. Research has found that individuals who are similar in the following characteristics are more likely to connect with each other, relative to the alternatives:

  • Race and ethnicity
  • Gender
  • Age
  • Formal organizational position
  • Occupation
  • Religion

There are many theories about why such a preference exists. On one hand, social contexts (e.g., communities, neighborhoods, etc.) are often organized by these characteristics. This makes it much easier to connect with people who are similar to you. There is also an element of choice. Individuals who are similar to you are likely to have similar experiences, share similar values, and like and dislike similar things. As a consequence, the cost of interacting with similar people is likely to be lower than the cost of interacting with people who are different.

However, the type of relation may matter here. In mating networks you are more likely to see heterophily than homophily. This might also be true of mentoring relationships, where individuals are more likely to be mentored by those at a different level of seniority.

What other factors at this level might increase or decrease the cost of interaction or raise its benefits?

Factor 4: Social and Physical Context

The fourth factor can broadly be thought of as the social or physical context within which individuals are forming social networks. A simple example is office or neighborhood layout. A substantial amount of research has found that physical distance has a substantial effect on whether two individuals form ties. Scientists who are nearby, for instance, are more likely to collaborate and their research trajectories also become rather similar.

Research has found that there is an exponential relationship between physical distance and the propensity to connect. This effect is called propinquity. Individuals who are physically proximate are substantially more likely to interact, followed by steep declines in the rates of interaction as distance increases.
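One way to picture this decline is a simple exponential decay curve. The sketch below is only an illustration: the function name, the base rate of 0.5, and the decay length of 10 meters are assumptions chosen for the example, not estimates from any study.

```python
import math


def tie_probability(distance, base_rate=0.5, decay_length=10.0):
    """Illustrative probability of a tie between two people `distance` apart.

    base_rate is the (assumed) probability at zero distance; decay_length is
    the (assumed) distance over which the probability falls by a factor of e.
    """
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return base_rate * math.exp(-distance / decay_length)


# The probability drops steeply as people sit farther apart.
for meters in (0, 10, 20, 40):
    print(meters, round(tie_probability(meters), 3))
```

Under these made-up parameters, each additional 10 meters cuts the chance of a tie by the same multiplicative factor, which is what produces the steep early drop.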

In addition to propinquity, other aspects of the social context are also likely to affect the extent of tie formation. These factors could be the reorganization of roles, task inter-dependencies, as well as cultural or organizational norms regarding competition or collaboration. Incentives are also important in determining what the shape of the network might be. The challenge with many of these effects is that they are often “absorbed” into the intercept of the model. That is, they can be detected only when looking across contexts, not within a single context.

Factor 5: Endogenous Network Processes

 

Finally, the structure of one part of the network may affect the structure of another. Consider a simple example: reciprocity. If I consider you a friend, there is a social-psychological as well as a sociological process that also increases the likelihood that you consider me a friend. This is akin to tit-for-tat. If you give me a gift, I will give you one in return. Networks exhibit this property with substantial regularity (but not always!). In this context, the emergence of a network tie, the reciprocal one, is endogenous to the network. That is, it emerges from within the network structure and not outside of it.
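To make reciprocity concrete, here is a minimal sketch that counts what share of directed ties are returned. The function name and the three-person friendship matrix are hypothetical, and this is just one common way of summarizing reciprocity, not the only one.

```python
def reciprocity(adj):
    """Share of directed ties i -> j that are matched by a tie j -> i.

    adj is a square list of lists where adj[i][j] == 1 means person i
    names person j. The diagonal (self-nominations) is ignored.
    """
    n = len(adj)
    ties = 0
    mutual = 0
    for i in range(n):
        for j in range(n):
            if i != j and adj[i][j]:
                ties += 1
                if adj[j][i]:
                    mutual += 1
    return mutual / ties if ties else 0.0


# Hypothetical example: persons 0 and 1 name each other; 0 also names 2,
# but 2 does not name 0 back.
friendship = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 0, 0],
]
print(reciprocity(friendship))
```

Here two of the three directed ties are part of a mutual pair, so the share is two thirds.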

Similarly, there are other endogenous network processes that others have detected in networks. These include factors such as transitivity. For instance, a friend of a friend is often a friend. Heiderian balance theory, for example, argues that individuals desire balance in their relationships. The situation of being friends with your friend’s enemy is unsustainable according to balance theory (why?). Because it is unsustainable, that structure will endogenously change into something else: either the enemies become friends or the network splits.
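A common way to formalize balance in a triad is to code each relationship as +1 (friends) or -1 (enemies) and call the triad balanced when the product of the three signs is positive. The sketch below assumes that sign-product rule; the function and argument names are made up for illustration.

```python
def is_balanced(sign_ab, sign_bc, sign_ac):
    """Return True if a triad is balanced under the sign-product rule.

    Each argument is +1 for a positive tie (friends) or -1 for a
    negative tie (enemies) between the two named people.
    """
    return sign_ab * sign_bc * sign_ac > 0


# I am friends with B and with C, but B and C are enemies: unbalanced.
print(is_balanced(sign_ab=1, sign_bc=-1, sign_ac=1))
# B and C become friends: balanced.
print(is_balanced(sign_ab=1, sign_bc=1, sign_ac=1))
# I drop C and side with B against C: also balanced.
print(is_balanced(sign_ab=1, sign_bc=-1, sign_ac=-1))
```

The last two cases correspond to the two resolutions mentioned above: the enemies reconcile, or the network splits into camps.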

Other forces include preferential attachment. New entrants into a network are more likely to connect to individuals who already have many ties, with the likelihood proportional to each individual’s degree centrality. This process gives some networks a power law degree distribution, rather than the binomial/normal distribution that would be expected if the network were formed through a purely random process.
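A toy simulation can show how this process concentrates ties on a few people. The sketch below is a simplified illustration under stated assumptions: each new entrant forms exactly one tie, the network starts with two connected people, and the function name, network size, and random seed are arbitrary choices.

```python
import random
from collections import Counter


def preferential_attachment(n_nodes, seed=0):
    """Grow a network where each newcomer links to one existing node,
    chosen with probability proportional to that node's degree."""
    rng = random.Random(seed)
    degree = {0: 1, 1: 1}      # start with a single tie between 0 and 1
    # Every node appears in this list once per tie it has, so a uniform
    # draw from the list is a draw proportional to degree.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        degree[new] = 1
        degree[target] += 1
        endpoints.extend([new, target])
    return degree


degree = preferential_attachment(2000)
counts = Counter(degree.values())
print("largest degree:", max(degree.values()))
print("nodes with exactly one tie:", counts[1])
```

Running it typically shows many people with a single tie and a handful with very many, the skewed pattern described above.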

 

[Figure: power law distribution]

[Figure: normal distribution]

 

 

Empirical considerations

Though the theoretical ideas behind network formation are quite straightforward, disentangling the differential impact of these effects remains quite challenging. In a subsequent post, we will discuss the various approaches to estimating these models.

 

 

Seeing the networks in your company

Thus far we have assumed that we had network data. But data like the “Professionals” data was gathered using a survey in a real organization. In this post I will walk you through the process of creating a simple network survey in SurveyMonkey (a web-based survey application) and analyzing the responses from the survey using R. Let’s begin by going to www.surveymonkey.com.  Here is the landing page (as of May 5, 2017). You will need to purchase a basic subscription to download the data (I purchased an educator subscription for $18).

Screen Shot 2017-05-05 at 8.31.33 AM.png

I’ve signed up for a free account (for now). After I complete all my signup information, here is the screen that I get, asking me to start by creating a survey.

Screen Shot 2017-05-05 at 8.35.15 AM

I will call my survey, “Simple Network Survey.” I enter this into the text box, and then press + Add Questions. Pressing this takes me to a new screen.

 

Screen Shot 2017-05-05 at 8.37.27 AM

In order to create the appropriate network data (where we know who considers whom a friend, advice giver, etc.), we will need to begin by asking people who they are. I prefer to do this first using a dropdown menu where an individual can select just one option. The question I ask is: What is your name? Please select from the dropdown menu.  Make sure that the question type is “Dropdown.”

Screen Shot 2017-05-05 at 8.39.27 AM.png

Once I have this, I would like to enter the names of the people who will be taking the survey. My list (of fake people) includes: Alice, Bob, Chris, Dina, Elena, Frank, and Greg. I add these using the “Add Answers in Bulk” option:

Screen Shot 2017-05-05 at 8.42.24 AM.png

Once I click save, I move to the Options tab, and I check off “Require an Answer to This Question.” Next I click DONE. 

I now create a new page (+ New Page). This is where I will place the network survey.

Screen Shot 2017-05-05 at 8.44.24 AM.png

For the purposes of this example, I will only ask two questions about people’s networks. What questions shall we ask?

Perhaps one of the hardest things to teach about network analysis is determining the right types of questions to ask people. The questions should reveal something about people and their social networks that we might not have been able to assess if we hadn’t asked them those questions.

We can think about kinds of questions in terms of a 2×2. On one dimension, we have questions about networks that provide people with resources (Instrumental) at one end and questions about more personal/social relationships (Expressive) at the other.  On the other dimension, we have questions that are either “Enduring or qualitative” or “Event based.” The table below summarizes some examples.

             | Enduring/Qualitative              | Event Based
Instrumental | Advice; Task; Information         | Asked for advice in the past week
Expressive   | Friendship; Trust; Social support | Informally go to lunch; Talked about important personal matters

Here are some examples:

Questions about who you know:

Below is a list of names of your colleagues at [firm name]. Some of them you may (1) know well, others you (2) may be acquainted with, and still others (3) you may not know at all. Please check the box next to the names of those individuals who are in categories (1) or (2).

Advice (Work-related)

Sometimes it is useful to get help or advice from your colleagues on performing some aspect of doing your work well. Please check the box next to the names of those individuals who you would approach for help or advice on such work related issues.

Advice (Work related) Reciprocal

There also may be people who come to you seeking help or advice about doing their own work well. Please check the box next to the names of those individuals who might typically come to you for help or advice on work related issues.

Advice (Career and Success)

Sometimes it is useful to seek advice from colleagues at work about more than just how to do your work well. For example, you may be interested in “how things work” around here, or how to optimize your chances for a successful career here. If you needed help along these lines, who would you go to for help or advice regarding these issues?  Please check the box next to the names of those individuals who you would approach for help or advice on these non-technical related issues.

Advice (Career and Success) Reciprocal

There also may be people who come to you seeking help or advice about such non-task related issues. Please check the box next to the names of those individuals who might typically come to you for help or advice along these dimensions.

Friendship

Sometimes during the course of interactions at the workplace, friendships form. We are interested in whether you have people at [firm name] who you consider to be friends of yours. Please check the box next to the names of the individuals who you think of as friends here at [firm name].

Event based questions:

Lunch

Below you will find a list of people who work at [firm name]. Please check the names of the individuals with whom you have met for lunch at least once during the past 30 days.

Event based advice

Below you will find a list of people who work at [firm name]. Please check the names of the individuals from whom you’ve sought out advice about work related matters at least once during the past 30 days.

The problem of recall: People are highly inaccurate when you ask them to recall specific interaction events. They are much more accurate when you ask them to recall enduring and qualitatively meaningful relationships.  Events are highly informative when you know what happens during that event, but otherwise they are harder to generalize from.

Now that we have some examples of questions, let’s add one to the survey. I typically recommend having 2 questions, one expressive (e.g., friendship) and one instrumental (e.g., advice). They usually provide different information.

Let’s, for the sake of example, add an advice network question to Page 2. We will create a “Multiple Choice” question where the answers are the names of the people in the organization (e.g., Alice, etc.). The question we ask is:

Sometimes it is useful to get help or advice from your colleagues on performing some aspect of doing your work well. Please check the box next to the names of those individuals who you would approach for help or advice on such work related issues.

We will also add a short note telling people not to select their own name and to check as few or as many names as appropriate. Below the options, also check “Allow more than one answer to this question (use checkboxes).”

Screen Shot 2017-05-05 at 9.00.21 AM

Let us now save this question by clicking save.

I will now add one more question. This can be our “dependent variable,” which measures the extent to which co-workers have a positive or negative impact.

Screen Shot 2017-05-05 at 9.55.18 AM.png

After all the questions are in, click “Next” at the top and let’s begin collecting responses.

Screenshot 2017-05-05 10.39.16.png

We will use the “Get Web Link” option. The web link for the survey I made is:

https://www.surveymonkey.com/r/QZ5KG3S

Let’s quickly fill out the survey. I will also fill in responses for everyone in the roster.

Screenshot 2017-05-05 10.42.01.png

After all the responses are in for all the people in the organization (e.g., Alice…) we can download the data. I have downloaded the excel file. It comes as a zip file and a resulting csv file with the data. These are respectively attached here and here.

The raw CSV file that is exported from Survey Monkey looks like this:

Screenshot 2017-05-05 19.26.35.png

Let’s clean this up so that we get a 7×7 matrix. Note that there is an ordered list of names on the left (Alice…Greg on the rows) and a similarly ordered list of names at the top (columns). The rows are the respondents (senders) and the columns are the people with whom they do and do not have a relationship. With the names, the matrix looks like:

Screenshot 2017-05-05 19.30.34.png

Without the names, it looks like:

Screenshot 2017-05-05 19.37.20.png

Try to match it up to the survey response in our original file. The matrix is now saved as surveyexample.csv.
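If you would rather do this clean-up step in R than by hand, the sketch below shows one way it might look. The responses are made up for illustration (they are not the data in surveyexample.csv), and the list-of-checked-names layout is an assumption about how you have stored each respondent's answers.

```r
# Roster of names used in the survey
roster <- c("Alice", "Bob", "Chris", "Dina", "Elena", "Frank", "Greg")

# Hypothetical responses: the names each respondent checked (NOT the real survey data)
responses <- list(
  Alice = c("Bob", "Chris"),
  Bob   = c("Alice"),
  Chris = c("Alice", "Dina", "Greg"),
  Dina  = c("Chris"),
  Elena = c("Frank"),
  Frank = c("Elena", "Greg"),
  Greg  = c("Chris")
)

# Start from an empty 7 x 7 matrix: rows are senders, columns are receivers
adj <- matrix(0, nrow = length(roster), ncol = length(roster),
              dimnames = list(roster, roster))

# Put a 1 in the cell for every name that a respondent checked
for (sender in names(responses)) {
  adj[sender, responses[[sender]]] <- 1
}

adj
```

Each row of the result is one respondent's set of nominations, which is the same layout as the cleaned-up matrix described above.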

The following code imports the data (the cleaned up version above) and plots the network:

# This file provides some simple code to get you started on your Network Analysis Journey
library(data.table)
library(curl)
library(sna)

#(Q0) "who do you know or know of at [the firm]"

# Load the "Survey Monkey" network data from Dropbox.
survey <- fread('https://www.dropbox.com/s/nd13m6szn8d8lto/surveyexample.csv?dl=1')

# Convert the data.table object into matrix format so it can be
# analyzed using the sna package.
survey = as.matrix(survey)

# Create the vector of names used for the rows and columns
names = c("Alice", "Bob", "Chris", "Dina", "Elena", "Frank", "Greg")

# Rename all the rows
rownames(survey) = names

# Rename all the columns
colnames(survey) = names

# Plot the survey network
gplot(survey, label = names)

Here is the resulting network.

Screenshot 2017-05-05 20.44.55.png

 

We can calculate each person’s centrality and also correlate the network positions with the final question we asked. We need to first convert it into a numeric and then import it into R.

# This file provides some simple code to get you started on your Network Analysis Journey

library(data.table)
library(curl)
library(sna)

#(Q0) "who do you know or know of at [the firm]"

# Load the "Survey Monkey" network data from Dropbox.
survey <- fread('https://www.dropbox.com/s/nd13m6szn8d8lto/surveyexample.csv?dl=1')

# Convert the data.table object into matrix format so it can be
# analyzed using the sna package.
survey = as.matrix(survey)

# Create the vector of names used for the rows and columns
names = c("Alice", "Bob", "Chris", "Dina", "Elena", "Frank", "Greg")

# Rename all the rows
rownames(survey) = names

# Rename all the columns
colnames(survey) = names

# Plot the survey network
gplot(survey, label = names)

# Load the survey outcome data (the final question) from Dropbox.
surveyoutcome <- fread('https://www.dropbox.com/s/we2dvevfejte8ov/surveyoutcome.csv?dl=1')

# Convert the data.table object into matrix format.
surveyoutcome = as.matrix(surveyoutcome)

# Rename the columns and create a variable which is the integer
# version of the numeric response
colnames(surveyoutcome) = c("name", "response", "respval")
respval = as.integer(surveyoutcome[,3])

# Calculate outdegree for the survey response
survey.outdegree = degree(survey, cmode = "outdegree")

# Estimate a model regressing respval on outdegree
m.0 = lm(respval ~ survey.outdegree)
summary(m.0)

Here is the regression outcome:

Screenshot 2017-05-05 21.01.51.png

 

The above walk-through should give you a way to collect network data, and then analyze it using R.

Before I conclude, I want to discuss the various survey approaches used by network analysts.

Types of Network Surveys

Roster based surveys: Roster based methods are perhaps the most common approach. This is what we just completed above. With roster surveys, you provide the respondent with a list of names of people or organizations. Then you ask them to indicate (by checking off the boxes next to the names) which of these people they have a certain relationship with. The nice thing about roster based surveys is that they tend to be quite accurate because people don’t have to recall the names out of the blue. Further, the roster allows you to get longer network lists than if people had to recall names from memory. The downside is that if the organization has too many people (say, in the thousands), it is too burdensome to make respondents go through a list of 1,000 or, even worse, 2,000 names.

List based surveys: The other type of survey is a list survey. Here you ask the question and then request that your respondents list the names of people in the organization that they have this relationship with. What might be some concerns with a survey method like this? 

Ego-network surveys:  This is a slightly modified version of the list-based survey. Here you ask the people to list up to five people (or k people) that they have a certain relationship with. Then you ask them to indicate whether the people listed also have a relationship of a certain type with each other. 
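To make the ego-network format concrete, here is a minimal sketch of how one respondent's answers could be stored and summarized. The alter names and ties are hypothetical, and density among alters is just one summary you might compute; it is not something the survey itself prescribes.

```r
# Hypothetical ego-network response: ego listed four alters
alters <- c("Ana", "Ben", "Cal", "Dee")

# Ego's report of which alters have a relationship with each other (undirected pairs)
alter_ties <- rbind(c("Ana", "Ben"),
                    c("Ana", "Cal"),
                    c("Ben", "Cal"))

# Build the alter-by-alter matrix and mirror it, since the ties are undirected
k <- length(alters)
m <- matrix(0, k, k, dimnames = list(alters, alters))
m[alter_ties] <- 1
m[alter_ties[, 2:1]] <- 1

# Density among alters: observed ties divided by possible ties
possible <- choose(k, 2)
observed <- sum(m[upper.tri(m)])
density  <- observed / possible
density
```

A density near 1 means ego's contacts mostly know each other; a density near 0 means ego connects people who are otherwise unconnected.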

Position generator surveys: This is perhaps the least structural of the network surveys. Here what you do is the following: You provide a list of the “positions” that people can potentially occupy, so in an organization you list the different functional areas, levels of seniority, etc. Then you ask people whether they have no relationship with someone in such a position, an acquaintance in that position, a friend in that position, etc. This is a very indirect measure of networks, but it provides a broad understanding of the “range” of a person’s network.

In addition to these classical approaches to collecting network data, organizations have more modern methods available to figure out potential sources of interaction between their employees. These include:

Email:  IT administrators know every email you send to everyone else and what it contains. This is true in most cases in the vast majority of organizations. Scary, yes. True, yes. But this is information that everyone knows exists and some organizations are using it to understand informal interaction and trying to make better decisions with this information.

Mailing list/Groups activity: Another source of information about networks and interaction is the mailing lists that people are a part of.

RFID:  Most of our ID cards have RFID these days – we use these cards to enter/exit buildings. RFID sensors can also be placed in strategic locations to understand face-to-face interactions between people. Conference organizers are also using RFID tags to understand interaction among attendees.

Online data sources:

LinkedIn —  LinkedIn has a massive economic graph. Their data include where people got their degrees, where they worked, who they worked with, etc.

Facebook: This is the largest social network in the world. Period.

About firms:  The websites of venture capital firms tell you who their partners are, where they attended college, and when they graduated.  They also tell you that some may be investing in similar projects.

 

More: In a future post, I will walk through how to create “network” data using text in documents. The “ties” here are measures of similarity between the text descriptions of entities.

 

Peer effects, knowledge transfer and social influence

The structural approach to social networks is inherently beautiful as a representational approach. I am always in awe of the fact that we can learn so much about how human beings act or their outcomes based merely on the pattern of their social ties. The idea is both simple and profound.

The structural approach is built on assumptions regarding information transfer across a simpler unit of analysis: the dyad. In the world of dyads, new complications arise and different theories must be developed and tested.

Let us take the Professionals data we have been analyzing as an example. Here is the advice network among these professionals.

Screen Shot 2017-05-04 at 10.45.24 AM.png

In the prior analyses, we have focused on analyzing the structure of each node’s connections.  For example, each node has a specific number of incoming connections, its indegree:

Screen Shot 2017-05-04 at 10.47.03 AM.png

The beauty of the structural approach to social networks is that we can learn a lot about the outcomes of individuals and organizations by merely looking at the pattern of their relationships. Recall our prior analysis. There is information in indegree. We were able to explain 6.5% of the variation in our measure of whether a person has the “knowledge to succeed” just by looking at the count of their incoming connections! While indegree may capture or reflect other processes and might not be causal, it is nevertheless information rich.

However, an Ego’s alters (e.g., the people that a focal node is connected to) are not all the same—as we sometimes implicitly assume in our models. As a note, I don’t believe that researchers actually believe that all the people we are connected to are the same. Indeed, betweenness, closeness, eigenvector centrality, all assume that not all connections are the same by their very construction. However, the heterogeneity in alter characteristics is implicit rather than explicit because we never specify in our theories or models, exactly how these individuals vary.

The peer effects framework, on the other hand, often ignores variation in structure, but emphasizes variation in the characteristics of connections.

Below, I walk through some examples of this approach.

A simple model of peer effects

The “peer effects” framework is called as such because it is based on a line of research in the economics of education where scholars were attempting to understand the impact of classroom peers on academic outcomes. Hence, peer effects.

Let us start with a simple setup. Let us assume there are 100 students in a classroom. The teacher has decided that everyone in the class will have a study partner, so she asks the students to pair up. There are now 50 pairs, each with two people. The teacher wonders whether having a smart peer (i.e., Alter) increases the performance of a focal student (i.e., Ego). Visually, she is interested in understanding this influence process:

Screen Shot 2017-05-04 at 1.20.36 PM.png

At the end of the class, all of the students take a standardized exam. This exam is scored on a 100 point scale, and students can get anywhere from a score of 0 to 100. The teacher takes this score and runs the following regression with 100 observations, 1 for each student. She’s also good with standard errors, so she clusters standard errors at the level of the dyad:

score_{i} = \beta_{0} + \beta_{1} score_{j} + \epsilon 

After running the regression, she finds a large and statistically significant coefficient for \beta_{1}. How should she interpret it?

A naive causal interpretation is: for every unit increase in score_{j} there is a corresponding \beta_{1} increase in score_{i}. Or, by having a study partner with a certain score, there is a corresponding increase/decrease in the performance of the focal student. This interpretation is naive for a reason: it is probably (though not definitely) wrong.

But before we dive into why it is probably wrong, it is useful to reiterate that this “peer effects” representation is quite general. For example, the following outcomes might be determined in part by the influence of peers (however defined).

 

  • Finance: Putting money away into a retirement savings account, adopting a microfinance product, etc.
  • Health behaviors: Obesity, Happiness, use of HIV/AIDS test, etc.
  • Academic performance: Getting good grades, choosing a major.
  • Entrepreneurship: Becoming an entrepreneur; deciding against becoming an entrepreneur.
  • Careers: Quitting; moving to a new company.
  • Adoption of products: Prescribing a drug, buying a car.
  • Adoption of behaviors: Smoking, drinking, sexual events.
  • Adoption of ideas: Learning from patents.
  • Organizational behavior:  Adoption of corporate practices and policies.

The basic idea is simple: We observe some level or change in the behavior or characteristics of an alter (or alters) and we see whether these are correlated to the behaviors or outcomes of Ego.

 

This apparently simple process is much more nuanced and complicated than it appears. There are dozens of “mechanisms” that can lead to the correlation we might observe (or that the teacher observes). Here are a few reasons why we might observe a correlation, either positive or negative. Consider the case of product adoption.

 

 

# | Name | Definition
1 | Direct transfer of specific information | Alter tells me about a product, but nothing more.
2 | Persuasion effects | Alter tells me about the product, and forcefully persuades me to adopt it.
3 | Direct transfer of general information | Alter tells me about a website that reviews products, and on this page a list is produced where the product that I adopt is listed first.
4 | Role-modeling / Imitation | I see Alter doing something, I copy it.
5 | Install base effects | I see many Alters adopting a product (e.g., buying an iPad), so I adopt the iPad.
6 | Threshold effects | I only buy an iPad if at least 10 people I know own it; once the 10th person adopts, I decide to adopt.
7 | Snob effects | I see an Alter (or Alters) doing something, I avoid doing it myself.
8 | Simultaneous | Alter helps me out and I help her out, and together we perform better than either one would alone, because we, by talking through a problem for example, figure it out together.
9 | Reverse causality | The Alter does not affect Ego; rather, the Ego affects the Alter.
10 | Contextual effects | We are both in the same neighborhood, and because we get exposed to the same billboard, we see the same advertisement for a product, and thus we adopt it.
11 | Induced environmental effects | Having a high achieving peer results in a teacher who teaches at a higher level, thus the student learns more not because of greater transfer of information from her peer, but because teaching quality improves.
12 | Selection bias | I become friends with people who already own iPads. I become friends with people who like technology, and because they like technology, they also own iPads.
13 | Homophily effects | I like iPads and because I do, I become friends with other people who like iPads.
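Mechanism 6 (threshold effects) can be written down as a simple decision rule, which makes it easier to see how it differs from a smooth install base effect. The sketch below is only an illustration: the function name `adopts`, the twelve alters, and the threshold of 10 are hypothetical choices taken from the iPad example.

```r
# Hypothetical threshold rule: Ego adopts once at least `threshold` alters have adopted
adopts <- function(alter_adopted, threshold) {
  sum(alter_adopted) >= threshold
}

# Twelve alters adopt one at a time; record Ego's decision after each adoption
alter_adopted <- rep(FALSE, 12)
ego_adopted   <- logical(12)
for (k in seq_along(alter_adopted)) {
  alter_adopted[k] <- TRUE
  ego_adopted[k]   <- adopts(alter_adopted, threshold = 10)
}

# Number of adopting alters at which Ego first adopts
first_adoption <- which(ego_adopted)[1]
first_adoption
```

Under this rule Ego's behavior does not change at all for the first nine adoptions and then flips, whereas an install base effect would show adoption becoming gradually more likely.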

Can you think of more mechanisms?

 

Which mechanism is actually at play in a specific context?

This question is a hard one. Because we have several potential mechanisms that we must work with, how do we rule out some of them? Some mechanisms are easier to rule out than others, but most are actually quite difficult to conclusively confirm or deny.

To deal with this issue (which is VERY common during the review process) I have come up with a two-part classification. The first set of mechanisms are what I call “pseudo-mechanisms.” Pseudo-mechanisms are alternative explanations of the correlation that have nothing to do with social influence of the type we care about: influence flowing from the peer to the focal individual. Charles Manski, in a famous paper, has defined these as the reflection problem and the selection problem. 

Reflection problem: The reflection problem asks you to imagine a mirror. You see two objects moving. If it is unclear to you that you are looking at a mirror, then you can’t tell which one is the actual person who is moving and which one is the mirror image. More formally, imagine that we have two sets of variables, x and y; let x be the measurement of the characteristics of individual i’s peers at time t and let y be the measurement of focal individual i’s characteristics at time t. Because the two are measured simultaneously, we are unable to tell whether a change in x has caused a change in y, or vice versa. This indeterminacy exists for each observation.

Furthermore, we are unable to tell whether each of these actors was exposed to some environmental shock at the same time (advertising, etc.), which would make their adoption correlated. The only way that we can ensure that the reflection problem is not an issue is by measuring the traits and characteristics of the xs prior to measuring those of y.

However, doing this does not resolve the issue of causality. Thus, it is a necessary but insufficient condition.

Another important, and much more difficult condition now has to be met in order for the effect to have the title “Causal.”  This is the selection problem. The set of conditions that solves the selection problem are twofold:

  1. Either you know all the reasons why two people were paired together (i.e., why person y is friends with, shares a room with, or enters college with x).
  2. OR the two individuals are randomly assigned, thus breaking the correlation between the characteristics of x and y.
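The second condition can be illustrated with a small simulation. In the sketch below there is no peer influence at all: each person simply has a pre-existing score. The data are synthetic, the sorted pairing is a deliberately extreme stand-in for self-selection, and the helper `pair_up` is a hypothetical name.

```r
set.seed(42)

# Synthetic pre-existing scores; nobody influences anybody in this simulation
n <- 2000
ability <- rnorm(n, mean = 70, sd = 10)

# Hypothetical helper: pair consecutive people (1 with 2, 3 with 4, ...)
pair_up <- function(x) {
  data.frame(ego  = x[seq(1, length(x), by = 2)],
             peer = x[seq(2, length(x), by = 2)])
}

selected <- pair_up(sort(ability))    # extreme self-selection: similar people pair up
random   <- pair_up(sample(ability))  # random assignment of partners

# Regress ego's score on the peer's score under each pairing rule
b_selected <- coef(lm(ego ~ peer, data = selected))[["peer"]]
b_random   <- coef(lm(ego ~ peer, data = random))[["peer"]]
c(selected = b_selected, random = b_random)
```

With self-selected pairs the coefficient on the peer's score is large even though no influence exists; with random pairs it is close to zero, which is what "breaking the correlation" means in practice.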

Assume for a moment that we have ruled out reflection and selection effects because (1) we use a lagged measure of peer consumption or action, and (2) the ego and alter are randomly paired. Even then, we have only ruled out a handful of the possible “mechanisms” producing the peer effects. We can rule out the “pseudo-mechanisms” #8 – #13 (except for #11), but that leaves us with 8 possible mechanisms.

Imagine a doctor telling you that “Yes, we’ve ruled out the fact that you are faking your symptoms, but there are 8 or more possible viruses that could be causing your infection!”

So, we need to now try and distinguish between these.

This is hard, even harder than resolving the reflection and selection problems.  The reflection and selection problems are interesting in that they are hard problems to solve, but we know how to solve them. Not to make too many medical analogies, but this is like separating conjoined twins. Hard, but someone can do it and has done it.

So how do we distinguish between different mechanisms, say #1 – #7?

This will depend a lot on context, and a lot on the data that you have available.

Let us examine a very simple situation where we have two students. Let us call the first student “Ego” and let us call the second student “Alter.” Assume for a moment that we have completely alleviated the problems of reflection and selection.

 

Screen Shot 2017-05-04 at 2.31.58 PM.png

Let us say that really there are two contender mechanisms.  (This is probably not true; but, for a moment assume that it is true.)

Mechanism 1: A student learns general study habits from his/her peer (alter) and this is why his/her performance increases.

Mechanism 2: A student interacts a lot with his/her peer (alter) and they study together, and the peer helps the student learn the material.

How would we go about designing a test that would distinguish between these two mechanisms?

  1. For instance, if what the student is getting from her peer is increased motivation, that should have a positive effect on various subjects.
  2. On the other hand, if the student is learning something rather specific (like how to do an integral), then the effects should be subject specific.

Assume you do this test, and you find out that there are effects across subjects, what can you say about the mechanisms? Can you say anything?

How to conduct the estimation in R

Standard peer effects estimations are quite straightforward. This is especially true when you have randomization in the pairing of focal individuals to peers and longitudinal data so you can lag the characteristics of the peer.

score_{i,t+1} = \beta_{0} + \beta_{1} score_{j,t} + \epsilon 

Here is a synthetic peer effects dataset in which 2000 individuals have been randomly paired: peer_effects.csv.

Let us examine the extent to which there are peer effects.

The model we want to estimate is:

postself_{i,t+1} = \beta_{0} + \beta_{1} prepeer_{j,t} + \epsilon 

Estimating this equation in R with this data results in:

Screen Shot 2017-05-04 at 3.28.39 PM.png

If the randomization is proper, this coefficient should be stable if we control for the focal individual’s own pretreatment score.

Screen Shot 2017-05-04 at 3.30.22 PM.png

Another worry we have is whether this effect of the peer (captured by the pre-treatment characteristics) is homogeneous or heterogeneous. That is, does it depend on the characteristics of the focal individual or does it apply to everyone? To test this, we include a main effect of the characteristics of the focal individual (self_char) and an interaction term (pre_peer * self_char).

Screen Shot 2017-05-04 at 3.33.01 PM.png

Here, we see that the peer effect depends on the characteristic of the focal individual. If the focal individual has this characteristic (e.g., willingness to listen), the peer effect is larger.

This is only a simple demonstration of the complexity of peer effects. There are likely to be many interactional factors that turn peer effects “on” or “off” or modulate them in some important way. One could imagine the following contingencies, where peer effects depend on characteristics of:
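For readers who want to reproduce the three specifications, here is a sketch of how they might be written with `lm`. It does not use peer_effects.csv: the data are simulated on the spot, the column names (`post_self`, `pre_self`, `pre_peer`, `self_char`) are my assumptions about a reasonable layout, and the effect sizes are built in purely for illustration.

```r
set.seed(2017)

# Simulated stand-in for a randomly paired peer effects dataset (NOT peer_effects.csv)
n <- 2000
pre_self  <- rnorm(n, mean = 50, sd = 10)  # focal individual's pre-treatment score
pre_peer  <- rnorm(n, mean = 50, sd = 10)  # randomly assigned peer's pre-treatment score
self_char <- rbinom(n, 1, 0.5)             # hypothetical trait, e.g. willingness to listen

# Built-in truth: peer effect of 0.1 without the trait and 0.3 with it
post_self <- 10 + 0.7 * pre_self + (0.1 + 0.2 * self_char) * pre_peer + rnorm(n, sd = 5)
d <- data.frame(post_self, pre_self, pre_peer, self_char)

# Model 1: the basic peer effect
m1 <- lm(post_self ~ pre_peer, data = d)

# Model 2: add the focal individual's own pre-treatment score
m2 <- lm(post_self ~ pre_peer + pre_self, data = d)

# Model 3: main effects plus the pre_peer x self_char interaction
m3 <- lm(post_self ~ pre_peer * self_char + pre_self, data = d)

coef(m1)[["pre_peer"]]
coef(m2)[["pre_peer"]]
coef(m3)[["pre_peer:self_char"]]
```

Because the peer is assigned at random, the `pre_peer` coefficient barely moves between the first two models, and a positive interaction term in the third indicates that the peer effect is larger for individuals who have the trait.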

  • the focal individual
  • the environment
  • the alter/peer
  • personalities of both

 

Ideas on entrepreneurship, innovation and social networks

Here you will find some of my research, summaries of recent trends and topics in business research, and educational materials I’ve used or developed for my MBA and PhD classes. I focus on social network analysis, innovation and entrepreneurship. Here are some relevant posts:

Social Network Analysis

Class Syllabi

R/Methodological Tutorials

Conceptual lectures 

Analyzing Networks in R: Centrality and Graphing

One important procedure in network analysis is determining the centrality of a node within a social network. In this post, I will show you how to do four things:

  1. Calculate four centrality measures
    • Closeness centrality
    • Betweenness centrality
    • Degree centrality (indegree and outdegree)
    • Eigenvector centrality
  2. Symmetrize social networks
  3. Plot social networks using the gplot function in R.
  4. Correlate centrality measures to outcomes or dependent variables.

The Krackhardt Kite Network

Below is a stylized network, called the “Kite Network,” developed by Professor David Krackhardt of Carnegie Mellon University.

Screen Shot 2017-04-25 at 2.24.39 PM.png

The kite network has nodes that are more powerful than others. Which node is the most powerful in the kite network?

Screen Shot 2017-04-25 at 2.27.11 PM.png

One possible answer is node D. The reason is that it has the largest number of connections. Indeed, D is powerful. It has a type of centrality in the network called Popularity centrality or Degree centrality. If you want to get many people on board with an organizational change, or organize a party, D is your node. You can calculate degree centrality by merely counting the number of connections that a node has.
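Counting connections is easy to do in base R once the network is stored as a matrix. The sketch below types in the kite's ties by hand; the edge list is my transcription of the standard Krackhardt kite and is an assumption, so check it against the figure or kite.csv before relying on it.

```r
# Edge list for the kite (assumed transcription of the standard Krackhardt kite)
edges <- rbind(
  c("A", "B"), c("A", "C"), c("A", "D"), c("A", "F"),
  c("B", "D"), c("B", "E"), c("B", "G"),
  c("C", "D"), c("C", "F"),
  c("D", "E"), c("D", "F"), c("D", "G"),
  c("E", "G"), c("F", "G"), c("F", "H"), c("G", "H"),
  c("H", "I"), c("I", "J")
)

# Build a symmetric 10 x 10 adjacency matrix
nodes <- LETTERS[1:10]
kite <- matrix(0, 10, 10, dimnames = list(nodes, nodes))
kite[edges] <- 1
kite[edges[, 2:1]] <- 1

# Degree centrality: the number of ties each node has
degree_centrality <- rowSums(kite)
sort(degree_centrality, decreasing = TRUE)
```

With this edge list D comes out on top with six ties, followed by F and G with five each.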

Screen Shot 2017-04-25 at 2.30.09 PM.png

Another answer is either F or G. The centrality of these nodes is a bit harder to see. They stand out on what is called Farness centrality (the inverse of closeness). If you count up the number of “hops” on the network it takes to get from one node (say, A) to each of the other nodes (B through J) and take the average, you get that node’s farness. F and G have the lowest farness (or highest closeness), which means it takes less time for information (or disease) to get from F and G to everyone else. Research has shown that Farness/Closeness is correlated with how fast ideas, knowledge, and information spread out from a starting point.
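The hop-counting procedure can be sketched in base R with a breadth-first search. As before, the hand-typed edge list is an assumed transcription of the standard kite, and `shortest_hops` is a hypothetical helper written for this example; the sna package has its own closeness function if you prefer not to roll your own.

```r
# Kite adjacency matrix (assumed transcription of the standard Krackhardt kite)
edges <- rbind(
  c("A", "B"), c("A", "C"), c("A", "D"), c("A", "F"),
  c("B", "D"), c("B", "E"), c("B", "G"),
  c("C", "D"), c("C", "F"),
  c("D", "E"), c("D", "F"), c("D", "G"),
  c("E", "G"), c("F", "G"), c("F", "H"), c("G", "H"),
  c("H", "I"), c("I", "J")
)
nodes <- LETTERS[1:10]
kite <- matrix(0, 10, 10, dimnames = list(nodes, nodes))
kite[edges] <- 1
kite[edges[, 2:1]] <- 1

# Hypothetical helper: number of hops from one source node to every node
shortest_hops <- function(adj, source) {
  dist <- rep(NA_integer_, nrow(adj))
  dist[source] <- 0L
  frontier <- source
  while (length(frontier) > 0) {
    # Nodes adjacent to the current frontier that have not been reached yet
    reached_next <- which(colSums(adj[frontier, , drop = FALSE]) > 0 & is.na(dist))
    dist[reached_next] <- dist[frontier[1]] + 1L
    frontier <- reached_next
  }
  dist
}

# Hop counts between all pairs, then the average over the other nine nodes
hops <- t(sapply(seq_len(nrow(kite)), function(s) shortest_hops(kite, s)))
dimnames(hops) <- dimnames(kite)
farness   <- rowSums(hops) / (nrow(kite) - 1)
closeness <- 1 / farness
sort(farness)
```

Sorting by farness puts F and G first, which matches the reading of the figure above.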

Screen Shot 2017-04-25 at 2.42.06 PM

Finally, H has what we call Betweenness centrality. Betweenness measures the extent to which information must travel over a certain node in order to get somewhere else in the network. In other words, nodes high in betweenness are bridges that connect otherwise disconnected parts of the network.  There is an extremely large body of research showing that individuals who are high in betweenness have access to diverse information in their organizations and are often the source of creative ideas, have greater bargaining power, and experience superior career outcomes.
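Betweenness is harder to compute by eye, so a sketch of the calculation may help. For every pair of nodes it counts the share of shortest paths that pass through a given third node. The edge list is again an assumed transcription of the standard kite, and `count_paths` is a hypothetical helper; the sna package offers a betweenness function that you would normally use instead.

```r
# Kite adjacency matrix (assumed transcription of the standard Krackhardt kite)
edges <- rbind(
  c("A", "B"), c("A", "C"), c("A", "D"), c("A", "F"),
  c("B", "D"), c("B", "E"), c("B", "G"),
  c("C", "D"), c("C", "F"),
  c("D", "E"), c("D", "F"), c("D", "G"),
  c("E", "G"), c("F", "G"), c("F", "H"), c("G", "H"),
  c("H", "I"), c("I", "J")
)
nodes <- LETTERS[1:10]
kite <- matrix(0, 10, 10, dimnames = list(nodes, nodes))
kite[edges] <- 1
kite[edges[, 2:1]] <- 1

# Hypothetical helper: hop distance and number of shortest paths from source s
count_paths <- function(adj, s) {
  n <- nrow(adj)
  dist <- rep(Inf, n)
  sigma <- rep(0, n)
  dist[s] <- 0
  sigma[s] <- 1
  frontier <- s
  while (length(frontier) > 0) {
    nxt <- integer(0)
    for (v in frontier) {
      for (w in which(adj[v, ] == 1)) {
        if (is.infinite(dist[w])) {          # first time w is reached
          dist[w] <- dist[v] + 1
          nxt <- c(nxt, w)
        }
        if (dist[w] == dist[v] + 1) {        # v lies on a shortest path to w
          sigma[w] <- sigma[w] + sigma[v]
        }
      }
    }
    frontier <- nxt
  }
  list(dist = dist, sigma = sigma)
}

n <- nrow(kite)
res <- lapply(seq_len(n), function(s) count_paths(kite, s))
D <- t(sapply(res, `[[`, "dist"))    # D[s, v]: hops from s to v
S <- t(sapply(res, `[[`, "sigma"))   # S[s, v]: number of shortest s-v paths

# For each node v, sum over unordered pairs the share of shortest paths through v
betweenness <- sapply(seq_len(n), function(v) {
  total <- 0
  for (from in seq_len(n)) {
    for (to in seq_len(n)) {
      if (from < to && from != v && to != v &&
          D[from, v] + D[v, to] == D[from, to]) {
        total <- total + S[from, v] * S[v, to] / S[from, to]
      }
    }
  }
  total
})
names(betweenness) <- rownames(kite)
round(sort(betweenness, decreasing = TRUE), 2)
```

H ranks first because every shortest path between the tail of the kite (I and J) and the rest of the network has to pass through it.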

Representing Networks

The Kite Network provides a very simple introduction to the idea of centrality. The starting point for thinking about network analysis is invariably a graph like the one above. Graphs are fundamental to network analysis; we can understand a lot from just a graph. Some people, for instance, when they’ve seen enough graphs, can tell how a network formed as well as what actions individuals can engage in, and so on.

 

The problem with graphs, however, is that as they grow larger and denser, they reveal a lot less information through pure visualization.

For example, let’s compare the three graphs below:

 

 

With the small graph (with 10 nodes and 10% of the edges existing), it is rather easy to spin a story about who has power and who is marginal. The second graph (on the upper right) has only 50 nodes and 10% of the ties exist. Things are beginning to get messy. Once we move to 100 nodes and 10% ties, it is basically a hairball and little insight can be provided by just looking at it.
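A quick calculation shows why the picture degrades so fast. The sketch below assumes the three graphs are undirected and that "10% of the ties exist" means 10% of all possible pairs; both are my reading of the description rather than something stated explicitly.

```r
# Number of nodes in the three example graphs
n_nodes <- c(10, 50, 100)
density <- 0.10

# Possible undirected ties among n nodes: n * (n - 1) / 2
possible_ties <- choose(n_nodes, 2)

# Expected number of ties actually drawn at 10% density
expected_ties <- density * possible_ties

data.frame(n_nodes, possible_ties, expected_ties)
```

Going from 10 to 100 nodes multiplies the number of nodes by ten but the number of drawn ties by roughly a hundred, from about 4.5 to 495, which is why the largest graph turns into a hairball.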

Due to the limited use of standard visualization techniques for networks, scholars have developed a wider and more flexible set of representations for networks and ways to reason about them.

The Starting Point for all Network Representations: Nodes and Edges

Recall that networks are made up of nodes and edges. These two elements are also the basic units of representation for the other methods we will use.

An important feature of all the representation strategies we will discuss is that they all represent almost exactly the same information as the graph above. Further, we can, with ease, move from one representation to another in a few steps.

Matrices 

Let us begin with trying now to represent the Kite Network that we drew above as a matrix.  How do we go about doing this? I have created a csv file with the kite network that you can download here: kite.csv.  You can use the code from R-SNA-Kite.R to import the Kite network into R, and plot it.

# This provides some basic analysis of the kite network
library(data.table)
library(curl)
library(sna)

# Load the kite network
kite <- fread('https://www.dropbox.com/s/c7f6q7nn2w34o1c/kite.csv?dl=1')

# Change the format to a matrix
kite = as.matrix(kite)

# Create a vector from A to J which will become the row and column names
names = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J")

# Rename all the rows
rownames(kite) = names

# Rename all the columns
colnames(kite) = names

# Display the kite network matrix
kite

# Plot the kite network
gplot(kite, label = rownames(kite))

 

 

 Lists

 We can also represent networks as lists instead of matrices.  Lists are exceptionally useful since there is “junk” information stored in matrices. This junk information primarily wastes space and adds clutter to the representation. The “0” values are junk in the sense that, although it is important to know that a tie is missing, we do not need to explicitly state it.

 Edge Lists

The edge list representation merely lists all the dyads which consist of the “1”’s in the matrix. We can easily do this for the Kite network by listing the edges:

A->B

A->C

A->D

Node Lists

 Node lists are similar to edge lists in that they are lists, but they are organized around the node and the connections that the node has to other nodes.

A               B C D F

B               A D E G

The beauty of all three representations (matrices, edge lists, node lists) is that they can represent exactly the same binary networks. There are slight differences that arise which we will discuss a bit later.
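Moving between the three representations takes only a few lines of base R. The four-node network below is a made-up example (it is not the kite), chosen to keep the output small.

```r
# A small hypothetical undirected network stored as a matrix
nodes <- c("A", "B", "C", "D")
m <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0),
            nrow = 4, byrow = TRUE, dimnames = list(nodes, nodes))

# Matrix -> edge list: one row per "1" cell
idx <- which(m == 1, arr.ind = TRUE)
edge_list <- data.frame(from = rownames(m)[idx[, "row"]],
                        to   = colnames(m)[idx[, "col"]],
                        stringsAsFactors = FALSE)

# Matrix -> node list: each node followed by the nodes it is tied to
node_list <- lapply(nodes, function(v) colnames(m)[m[v, ] == 1])
names(node_list) <- nodes

# Edge list -> matrix: fill in a 1 for every listed dyad
m_back <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
m_back[cbind(edge_list$from, edge_list$to)] <- 1

identical(m, m_back)
```

Note that for an undirected network this edge list contains each tie twice (A to B and B to A), which is one of the slight differences between representations.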

Directionality and Value in networks

 Undirected Networks

We have been working with undirected networks. That is, networks that lack direction in their edges.  There are some phenomena and interactions that inherently lack directionality.  The assumption of undirected ties has at least three implications:

  1. One implication is that you have less network data to represent.
  2. You do not know the direction of flow; you assume, implicitly or explicitly, that the flow of information is equivalent regardless of direction across the network.
  3. Your graph does not include arrows.

What are some examples of “naturally” undirected relationships?

  • Shared-memberships
  • Co-authorships
  • Marriage

Directed Networks

Although we have not been using them in our reasoning, directed networks are an important representational tool in many contexts. In directed networks we assume a direction to the flow of “stuff” in the network. This direction of flow is represented graphically by the use of arrows at the end of the edges in the network.

Directionality increases data. Having directions to edges essentially doubles the amount of information we need to store about each edge.
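The relationship between directed and undirected data can be made concrete by symmetrizing a small directed matrix. The three-person network and the names in it are hypothetical; the "weak" and "strong" labels follow common usage (the sna package's symmetrize function offers rules with these names), but the base R version here is only a sketch.

```r
# Hypothetical directed network: rows send ties, columns receive them
nodes <- c("Ann", "Raj", "Lee")
directed <- matrix(c(0, 1, 0,
                     0, 0, 1,
                     1, 1, 0),
                   nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

# Weak rule: keep an undirected tie if either person nominates the other
sym_weak <- pmax(directed, t(directed))

# Strong rule: keep an undirected tie only if both nominate each other
sym_strong <- pmin(directed, t(directed))

# Cells to store: ordered pairs (directed) versus unordered pairs (undirected)
n <- length(nodes)
ordered_pairs   <- n * (n - 1)
unordered_pairs <- n * (n - 1) / 2
c(ordered_pairs = ordered_pairs, unordered_pairs = unordered_pairs)
```

The two rules can give quite different networks: here the weak rule connects all three people, while the strong rule keeps only the mutual tie between Raj and Lee.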

Values in Networks

Another relaxation in our representation of networks is to add values to edges. Edges represent much more than just 0’s and 1’s. Networks can be valued – so that a dyad can have a value like 1..2..3..4 or .23 etc. What might be some examples of “valued” networks?

Although valued networks are more reflective of real social relationships than dichotomized networks, they are less commonly used. Part of the reason is that valued networks are harder to work with mathematically. Thus, people tend not to use them as much as their dichotomized siblings.
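In practice, a valued network is often turned into a binary one by choosing a cutoff. The sketch below uses invented values (think of them as emails sent per week) and an arbitrary cutoff of 3; neither comes from the data used in this post.

```r
# Hypothetical valued network, e.g. number of emails sent per week
nodes <- c("Ann", "Raj", "Lee")
valued <- matrix(c(0, 5, 1,
                   2, 0, 0,
                   4, 3, 0),
                 nrow = 3, byrow = TRUE, dimnames = list(nodes, nodes))

# Arbitrary cutoff: a tie "exists" if the value is at least 3
cutoff <- 3
binary <- (valued >= cutoff) * 1
binary
```

The choice of cutoff is a judgment call, and it is worth checking whether your results change when you move it.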

Centrality

Now that we have the basics of representation down, let us try to extract some insight from the network. Let’s do network analysis.

The most common and often most useful way to analyze a social network is to look at the centrality of the nodes in the network. Centrality is a way to assess the relative importance of a node in a graph or a social network. Several different measures of centrality exist. Each measure has different properties and theoretical interpretations.

Measures of centrality can be classified into two types: (a) local and (b) global.

Local measures of centrality focus on a focal node (the node that is currently the focus of attention) and the immediate features of the network surrounding that node. Local measures of centrality such as degree are often easy to calculate, but their limitation is that they do not capture important features of the whole network.

Global measures, on the other hand, take into account the larger network and incorporate features that are not limited to the focal actor’s immediate network.  Global measures such as closeness, Eigenvector centrality, or betweenness centrality are often much more difficult to calculate (especially by hand) but provide very rich information about the position of an actor in a social network. Global measures often take into account the network ties of all other entities in the larger network as well.

 Local Measures of Centrality

The simplest measure of centrality in a social network is degree. There are two types of degree centrality – indegree and outdegree.

  • Indegree is the count of the total number of incoming connections to a node. In the language of friendship, indegree can be thought of as “popularity” centrality. The node is popular because many other nodes nominate it as a node with whom they have a certain kind of relationship.
  • Outdegree is the total number of outgoing connections from a node. Outdegree can be thought of as the level of gregariousness of a node. Nodes with high outdegrees have many outgoing connections. In directed graphs indegree and outdegree can be distinguished, but in an undirected graph (no arrows) we can simply measure degree centrality.

 

Indegree and Outdegree

Outdegree_{i} = \sum_{j} N_{ij} 

In the equation above, we can think of N_{ij} as the value of the cell with the row index i and column index j  in a network matrix N .

        Bob  James  Jill  Jane
Bob      0     1     1     0
James    0     0     1     1
Jill     0     1     0     1
Jane     1     1     0     0

In the network represented by the matrix above, Bob has an outdegree of 2, and so do James, Jill, and Jane. Indegree, however, sums down a column of the matrix rather than across a row:

Indegree_{i} = \sum_{j} N_{ji} 

We find that Bob has an indegree of 1, James 3, and Jill and Jane each have an indegree of 2.
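The two sums can be checked in base R with rowSums() and colSums(). This is only a sketch: the matrix is typed in by hand and assumes that Jill nominates both James and Jane, which is what the degrees quoted in the text imply.

```r
# Network matrix N: a 1 in row i, column j means that i nominates j
# (assumption: Jill nominates both James and Jane)
people <- c("Bob", "James", "Jill", "Jane")
N <- matrix(c(0, 1, 1, 0,
              0, 0, 1, 1,
              0, 1, 0, 1,
              1, 1, 0, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(people, people))

outdegree <- rowSums(N)  # sum over j of N[i, j]: ties sent by i
indegree  <- colSums(N)  # sum over j of N[j, i]: ties received by i

print(rbind(outdegree, indegree))
```

Everyone sends two ties, so outdegree cannot separate the four actors, while indegree singles out James as the most frequently nominated.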

Degree centrality is often a useful first cut at estimating the overall position of an entity in a social network. Although degree centrality is usually correlated with other more global measures of centrality, the correlation is not perfect and the information captured by the other centrality measures is sometimes as useful if not more useful than the humble degree centrality.

Global measures of centrality

Although indegree and outdegree are useful, they are closer to “intuition” measures: they rely on local, heuristic information about the actor rather than on the actor’s true position in the larger social network.

To really capture an actor’s position in a social network we will need to learn how to calculate more global measures. Scholars have developed a variety of global measures of centrality, but three are most commonly used. Interestingly, they also have many technological applications and, as one can imagine, they are difficult to calculate by hand.

 Closeness centrality

The first measure we will cover is called closeness centrality. There are other names for it as well; sometimes it is called access centrality. Simply put, closeness centrality captures how near the focal node is, on average, to all other nodes in the social network: it is the reciprocal of the average distance from the focal node to every other node. The mathematical representation of closeness is as follows:

Closeness_{i} = \left( \frac{\sum_{j \neq i} D_{ij}}{n-1} \right)^{-1}

 

The formula can be easily interpreted. We are trying to calculate the closeness of node i to all other nodes in the network; thus, Closeness_{i}. The numerator is the sum of the distances D_{ij} between node i and every other node j (excluding i itself). That sum is then divided by n - 1, the total number of nodes in the network minus 1 (to exclude node i from the count). This gives farness, which is the average distance from node i to all other nodes in the network. Taking the reciprocal gets us closeness.

Let us try and calculate closeness centrality using the Kite network. Focusing on node D, let us begin by calculating the distance between node D and all other nodes. It will take node D only one step to reach nodes A, B, C, E, G, and F. Two steps are required to reach node H. Three steps are required to reach node I and four steps are required to reach node J. Farness can be calculated using the following arithmetic:

 \frac{1+1+1+1+1+1+2+3+4}{9} = 1.67 

The farness centrality for node D is approximately 1.67. This means that, on average, node D is less than two steps away from information in the network. Farness can easily be converted into closeness by taking the reciprocal (or some other scaling); for node D, closeness is 9/15 = 0.6. Try to calculate the closeness centrality for all other nodes in the network. Is the node that had the highest degree the one with the highest closeness?
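The hand calculation for node D can be reproduced with a breadth-first search in base R. This is a sketch under one assumption: the edge list below is the standard ten-node Krackhardt kite with nodes labeled A to J, which matches the distances described for node D but is typed in here rather than taken from the lecture data. The helper name bfs_distances is made up for this example.

```r
# Assumed edge list: the standard Krackhardt kite, nodes labeled A to J
edges <- matrix(c(
  "A","B", "A","C", "A","D", "A","F",
  "B","D", "B","E", "B","G",
  "C","D", "C","F",
  "D","E", "D","F", "D","G",
  "E","G", "F","G",
  "F","H", "G","H",
  "H","I", "I","J"
), ncol = 2, byrow = TRUE)

nodes <- LETTERS[1:10]
adj <- matrix(0, 10, 10, dimnames = list(nodes, nodes))
adj[edges] <- 1          # tie from the first column to the second
adj[edges[, 2:1]] <- 1   # and back, because the kite is undirected

# Breadth-first search: number of steps from `source` to every node
bfs_distances <- function(adj, source) {
  dist <- setNames(rep(Inf, nrow(adj)), rownames(adj))
  dist[source] <- 0
  frontier <- source
  while (length(frontier) > 0) {
    next_frontier <- character(0)
    for (v in frontier) {
      neighbours <- names(which(adj[v, ] == 1))
      unseen <- neighbours[is.infinite(dist[neighbours])]
      dist[unseen] <- dist[v] + 1
      next_frontier <- c(next_frontier, unseen)
    }
    frontier <- next_frontier
  }
  dist
}

d <- bfs_distances(adj, "D")
farness_D   <- sum(d[names(d) != "D"]) / (length(d) - 1)  # average distance
closeness_D <- 1 / farness_D                              # reciprocal of farness

print(d)
print(c(farness = farness_D, closeness = closeness_D))
```

Changing the "D" in the call to bfs_distances() gives the distances needed for any other node.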

The entities in a network that are high in closeness centrality are often the most appropriate choices for spreading information through the network.

Betweenness centrality

We now move to betweenness centrality. Betweenness is perhaps one of the most powerful measures of centrality and is tightly related to the idea of structural holes. Betweenness can be calculated as:

Betweenness_{i} = \sum_{\forall j,k} \frac{s_{j,k}(i)}{s_{jk}} 

The idea behind betweenness is simple. Betweenness measures the extent to which a node acts as a bridge between other nodes in the network. It is computed by looking at all pairs of nodes in the network and examining how frequently i, the focal node, exists on the shortest paths between nodes j and k.

  • The term s_{j,k}(i) in the equation  is the number of shortest paths originating at j and ending at k that must go through i.
  • The term s_{jk} is the total number of shortest paths going from j to k.
  • Thus \frac{s_{j,k}(i)}{s_{jk}} is the proportion of shortest paths between j and k that must go through i.
  • If we sum this term over all pairs of nodes excluding i in the network we have betweenness centrality.

Betweenness centrality calculations are quite difficult, and most of the time a computer is required to do them. However, we are in luck. Recent research indicates that local betweenness centrality, that is, betweenness calculated only on the network consisting of a focal node’s contacts and the connections between them, is highly correlated with the larger betweenness measure.

Let us try to calculate betweenness on a very simple graph consisting of three nodes: A, B, and C. In calculating the betweenness of B we look at the number of shortest paths between A and C and between C and A.

A—B—C

Since this is an undirected graph we can consider AC and CA to be the same. As we can see, there is only one shortest path between A and C. Thus, the denominator is 1. That one shortest path must go through B. Therefore, B’s betweenness is Betweenness(B) = 1/1 = 1. Similarly, in computing A’s betweenness we evaluate the number of shortest paths between B and C. We find that there is 1 shortest path and that it does not go through A, since B and C are directly connected. Thus, A’s betweenness centrality is Betweenness(A) = 0/1 = 0.
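The same three-node example can be worked through in base R. The sketch below is a brute-force illustration, not the algorithm a network package would use: it counts shortest paths with powers of the adjacency matrix and treats each unordered pair once, as in the hand calculation. The function names shortest_paths and betweenness_of are made up for this example.

```r
nodes <- c("A", "B", "C")
adj <- matrix(0, 3, 3, dimnames = list(nodes, nodes))
adj["A", "B"] <- adj["B", "A"] <- 1
adj["B", "C"] <- adj["C", "B"] <- 1

# Length and number of shortest paths between every pair of nodes.
# Entry [j, k] of the step-th power of adj counts walks of that length;
# the first power with a nonzero entry gives the shortest paths.
shortest_paths <- function(adj) {
  n <- nrow(adj)
  dist  <- matrix(Inf, n, n, dimnames = dimnames(adj))
  count <- matrix(0, n, n, dimnames = dimnames(adj))
  diag(dist) <- 0
  diag(count) <- 1
  power <- diag(n)
  for (step in seq_len(n - 1)) {
    power <- power %*% adj
    newly <- is.infinite(dist) & power > 0
    dist[newly]  <- step
    count[newly] <- power[newly]
  }
  list(dist = dist, count = count)
}

# Sum over pairs (j, k) of the share of shortest j-k paths that pass through i
betweenness_of <- function(adj, i) {
  sp <- shortest_paths(adj)
  pairs <- combn(setdiff(rownames(adj), i), 2)
  total <- 0
  for (p in seq_len(ncol(pairs))) {
    j <- pairs[1, p]
    k <- pairs[2, p]
    on_shortest_path <- is.finite(sp$dist[j, k]) &&
      sp$dist[j, i] + sp$dist[i, k] == sp$dist[j, k]
    if (on_shortest_path) {
      total <- total + sp$count[j, i] * sp$count[i, k] / sp$count[j, k]
    }
  }
  total
}

print(sapply(nodes, function(i) betweenness_of(adj, i)))
```

Because the function only needs an adjacency matrix with row and column names, it can be pointed at any small undirected network with at least three nodes.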

If you like, try to calculate betweenness centrality scores for the kite network. Who has the highest betweenness? Is it the same node with the highest degree or closeness?

Eigenvector centrality

The final measure of centrality is Eigenvector centrality. Think of Eigenvector (EV) centrality as degree centrality on Red Bull. The basic intuition behind EV centrality is that it is not sufficient to have a large network: your network contacts should also have a large network, and their network contacts should also have a large network, and so should their network contacts, etc.

EV centrality is thus a recursive measure of centrality, based not only on your degree but also on the degree of your contacts, their contacts, and so on. Under degree centrality, two people with a degree of 6 would have equivalent centrality even if one of them was connected to six people who were not connected to anyone else and the other was connected to six people who were themselves connected to many other people. EV centrality distinguishes the two and scores the second person higher.

It is generally not possible to calculate Eigenvector centrality by hand – except on the most trivial networks.

However, most network analysis packages have routines to calculate Eigenvector centrality quite efficiently.
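To see the recursion at work, here is a small power-iteration sketch in base R. Everything in it is an assumption made for illustration: the seven-node network is invented so that P and Q have the same degree but differently connected contacts, and the loop is a textbook power iteration, not the routine any particular package uses. Adding the identity matrix before iterating leaves the eigenvectors unchanged and simply keeps the iteration from oscillating.

```r
# Hypothetical network: P and Q both have degree 2, but P's contacts sit in a dense core
nodes <- c("P", "Q", "H1", "H2", "X", "L1", "L2")
edges <- matrix(c(
  "P","H1", "P","H2", "H1","H2", "H1","X", "H2","X",
  "Q","L1", "Q","L2", "L1","X"
), ncol = 2, byrow = TRUE)

adj <- matrix(0, length(nodes), length(nodes), dimnames = list(nodes, nodes))
adj[edges] <- 1
adj[edges[, 2:1]] <- 1   # undirected

# Power iteration: each node's score becomes its own score plus the sum of its
# contacts' scores, then the vector is rescaled to length 1
shifted <- adj + diag(nrow(adj))
score <- rep(1, nrow(adj))
for (iteration in 1:200) {
  score <- as.vector(shifted %*% score)
  score <- score / sqrt(sum(score^2))
}
names(score) <- nodes

degree <- rowSums(adj)
print(round(rbind(degree, score), 3))
```

P and Q are tied on degree, yet P ends up with the higher score because its contacts are themselves well connected.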

Calculating Centrality, Symmetrizing Matrices and Plotting Networks

Now that we have a basic grasp of measures of centrality, let us use the professionals data we worked with in the prior lecture to calculate centrality for the “advice network.” The analysis file can be found here at RSNAcentrality.R. You must first run that file up to the centrality calculations so that the data are loaded.

# The sna package provides symmetrize(), degree(), betweenness(), evcent() and closeness()
library(sna)

# Create a "weak" and "strong" symmetrized version of the advice network (q1)
q1.weak = symmetrize(q1, rule = "weak") # a tie exists between ij and ji if ij == 1 OR ji == 1
q1.strong = symmetrize(q1, rule = "strong") # a tie exists between ij and ji if ij == 1 AND ji == 1

# Calculate degree centrality for q1
q1.indegree = degree(q1, cmode = "indegree")
q1.outdegree = degree(q1, cmode = "outdegree")

# Calculate betweenness centrality
q1.betweenness = betweenness(q1)

# Calculate eigenvector centrality (we need an undirected network for this, so let's use weak)
q1.evcent.weak = evcent(q1.weak)

# Calculate closeness centrality, again with the weak symmetrized network
q1.closeness.weak = closeness(q1.weak)

# Plot histograms of each of the centrality measures
par(mfrow = c(3,2))
hist(q1.indegree)
hist(q1.outdegree)
hist(q1.betweenness)
hist(q1.evcent.weak)
hist(q1.closeness.weak)
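The weak and strong rules used above are easy to see on a toy matrix. The sketch below uses base R only and a made-up three-person network; it illustrates the OR and AND logic described in the comments and is not meant to show how symmetrize() is implemented internally.

```r
# Hypothetical directed network: Ann and Ben nominate each other, Ann also nominates Cat
people <- c("Ann", "Ben", "Cat")
N <- matrix(c(0, 1, 1,
              1, 0, 0,
              0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(people, people))

weak   <- pmax(N, t(N))  # tie if ij == 1 OR ji == 1
strong <- pmin(N, t(N))  # tie if ij == 1 AND ji == 1

print(weak)
print(strong)
```

The one-way Ann to Cat nomination survives under the weak rule and disappears under the strong rule, while the reciprocated Ann and Ben tie is kept by both.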

 

[Figure: histograms of the five centrality measures]

 

Let us take a look at the scatter plots comparing these measures.

# What is the correlation between these centrality measures? Let's look at scatter plots.
pairs(~q1.indegree+q1.outdegree+q1.betweenness+q1.evcent.weak+q1.closeness.weak)

[Figure: scatter plot matrix of the centrality measures]

Finally, let’s test a simple hypothesis: the “closer” you are to others in a social network, the more likely you are to feel that you have the knowledge to succeed.

# Examine whether closeness centrality in the advice network is correlated with
# whether respondents feel like they have the knowledge to succeed.
m.0 <- lm(attr$success ~ q1.closeness.weak)
summary(m.0)

# Plot the regression and the data points.
plot(q1.closeness.weak, attr$success)
abline(m.0)

[Figure: success plotted against closeness centrality, with the fitted regression line]

The first-order correlation holds. Is this a real effect? How can we tell?