3.1. The Basic Concepts of Social Network Analysis
3.1.1. Basic Terminology
Social networks are often studied through graphs, or visual representations of the relationships in a network. These graphs stem from graph theory, a branch of mathematics concerned with graph-based problems. Believe it or not, we are all beneficiaries of this discipline. Have you ever used Google Maps to go from point A to point B? This is graph theory at play. Behind the scenes is a complex set of relationships that allows Google to recommend certain paths over others to ensure that you have the fastest (or least expensive) route.
This chapter will not cover all the complexities of social network analysis (SNA); rather, it gives enough of the basic terminology and concepts that all readers can learn how to leverage Python to perform SNA. The chief goal of this chapter is to introduce the process by which we can perform SNA on structured data.
In a graph, there is a collection of nodes. These are the dots in the graph, each representing one piece of data. In a social network graph, each node would be a person or some other kind of entity that a researcher wishes to map, such as a business or an agency.
In order to understand how different nodes relate to one another, we represent the relationships between them with edges. In a graph, these edges look like lines.
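As a minimal sketch of nodes and edges in code, the snippet below builds a tiny graph with NetworkX (assuming it is installed; installation is covered later in this chapter). The people and relationships are hypothetical and exist only for illustration.

```python
import networkx as nx

# Hypothetical people and relationships, invented purely for illustration.
G = nx.Graph()

# Each person becomes a node (a dot in the drawn graph).
G.add_nodes_from(["Alice", "Bob", "Carol"])

# Each relationship becomes an edge (a line between two dots).
G.add_edge("Alice", "Bob")
G.add_edge("Bob", "Carol")

print("Nodes:", G.number_of_nodes())
print("Edges:", G.number_of_edges())
```

Because `nx.Graph` is undirected, the edge between Alice and Bob can be looked up in either order.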
Because graphs are often drawn with mathematics, it is important to know the force of a graph. Force, as the term is used here, is the direction of movement between two nodes; in graph theory, an edge that carries a direction is called a directed edge. If we are mapping how characters move to different places, every person and place would receive a node in the graph. We would then apply force from the person doing the movement towards the place. This positions the nodes in a particular way in the graph.
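One way to record the direction of movement described above is NetworkX's `DiGraph` class, in which every edge points from a source node to a target node. This is only a sketch: the traveler and place names are hypothetical, and the chapter's own examples may model direction differently.

```python
import networkx as nx

# A directed graph: every edge has a source and a target.
D = nx.DiGraph()

# Hypothetical movements: the person is the source, the place is the target.
D.add_edge("Anna", "Paris")
D.add_edge("Anna", "Rome")
D.add_edge("Ben", "Paris")

# The edge exists from person to place, but not the other way around.
print(D.has_edge("Anna", "Paris"))
print(D.has_edge("Paris", "Anna"))

# in_degree counts the edges arriving at a node.
print(D.in_degree("Paris"))
```

Here `in_degree` counts how many travelers arrive at a place, which an undirected graph could not distinguish from departures.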
In a network graph, we can often map multi-modal networks, or networks where different types of relationships are overlapping. Often, we can do this in a graph by representing each type of relationship as a separate edge color.
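The idea of one edge color per relationship type can be sketched by storing the type as an edge attribute and mapping it to a color. The attribute name `relation`, the relationship types, and the color choices below are all assumptions made for this example. `MultiGraph` is used because it allows more than one edge between the same pair of nodes.

```python
import networkx as nx

# MultiGraph permits several edges between the same two nodes,
# so overlapping relationship types can coexist.
G = nx.MultiGraph()

# "relation" is a hypothetical attribute name chosen for this sketch.
G.add_edge("Alice", "Bob", relation="family")
G.add_edge("Alice", "Bob", relation="business")
G.add_edge("Bob", "Carol", relation="business")

# Assumed mapping from relationship type to a display color.
COLOR_BY_RELATION = {"family": "blue", "business": "orange"}

# One color per edge, in the same order that G.edges() yields them.
edge_colors = [
    COLOR_BY_RELATION[data["relation"]]
    for _, _, data in G.edges(data=True)
]
print(edge_colors)
```

A list like `edge_colors` could later be handed to a plotting routine so that each relationship type is drawn in its own color.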
Graphs are a useful way to explore complex relationships because, in a single image, we can glean information that would otherwise be missed. We can examine our data quantitatively, meaning we can see how frequently certain nodes appear in our data and, more importantly, how frequently a given node relates to other nodes in the graph. We could, of course, extract this information for a single node relatively easily with Pandas, but a graph lets us see many different relationships between many different pieces of data all at once. For these reasons, it is useful to be familiar with SNA in general and with how to map nodal relationships in Python in particular.
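The frequency with which a node relates to other nodes can be read directly from a graph as its degree, the number of edges touching it. The following is a small sketch over a hypothetical edge list, not data from this chapter.

```python
import networkx as nx

# Hypothetical edge list: each pair is one recorded relationship.
relationships = [
    ("Alice", "Bob"),
    ("Alice", "Carol"),
    ("Alice", "Dave"),
    ("Bob", "Carol"),
]

G = nx.Graph()
G.add_edges_from(relationships)

# Degree = how many edges touch each node.
degrees = dict(G.degree())

# The node that relates to the most other nodes.
most_connected = max(degrees, key=degrees.get)

print(degrees)
print(most_connected)
```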
3.1.2. SNA Libraries in Python
Python has several libraries for performing SNA. In this chapter, we will look closely at two: NetworkX (and Matplotlib to visualize the graph) and PyVis. Each has its strengths and weaknesses.
NetworkX is a powerful library that allows users to hold complex graph-based data, such as nodes and edges, in a single class. It also allows us to perform basic mathematical calculations to discover things like the centrality of a node in a graph, a concept we will learn about in the next section. NetworkX is designed to work alongside Matplotlib for plotting those graphs. As we shall see, there are some limitations to this approach. PyVis is another graph visualization library. While you can create graphs in PyVis entirely independently of NetworkX, some workflows may benefit from creating the graph data in NetworkX and then passing that data to PyVis for visualization. You can install all of these libraries with pip.
To install Matplotlib, use the following command in your terminal:
pip install matplotlib
For NetworkX, you can use this command:
pip install networkx
And for PyVis, you can use this command:
pip install pyvis
Note that the names of these libraries are written entirely in lowercase when we install them via pip.
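After running the commands above, one quick way to confirm that the packages are visible to your Python environment is to ask the standard library whether each module can be found. The helper name `check_installed` is hypothetical; this is a convenience sketch rather than a required step.

```python
from importlib.util import find_spec


def check_installed(names):
    """Return a dict mapping each module name to True if Python can locate it."""
    return {name: find_spec(name) is not None for name in names}


# Module names match the lowercase names used with pip.
status = check_installed(["matplotlib", "networkx", "pyvis"])

for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

If any library is reported as missing, rerun the corresponding pip command in the same environment that runs your Python code.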