2.2. Introduction to Data Structures

2.2.1. Data Structures

In the last section, we met strings, integers, floats, and booleans. Each of these is a type of data. Strings allowed us to work with text, while integers and floats allowed us to work with numbers. In this chapter, we will begin working with data structures. Data structures are ways of storing multiple pieces of data, often of different kinds, in a systematic way. In Python, these are created as objects that can be stored in memory and called later in a script. They are divided into two categories: mutable and immutable. We encountered these terms in the last chapter, but we will explore what they mean in more depth below.

Throughout this section, we will learn about some of the key types of data structures, how they are different, and how they can be used. We will only cover these in a cursory manner. Throughout this textbook, we will use these data structures as we write code and perform data cleaning and data analysis tasks. To keep things simple for now, we will focus on four types of data structures: lists, tuples, sets and dictionaries. There are other types of data structures in Python, but these are the core four that you will use most frequently.

2.2.2. Lists

The first data structure we will work with is known as a list. Lists are precisely what they sound like: a list of data. As we will see below, there are multiple ways of storing information in a list-like manner in Python, such as with tuples and sets, but the way we create lists and the way we interact with them are distinct.

Lists and tuples are nearly identical, with one major exception: lists are mutable. This means that you can create a list object and then alter it in memory. This allows you to do very powerful things to lists that you cannot do to tuples. Lists are going to be one of the key data structures you use in all digital humanities projects. The reason? We often need to adjust data while working with it.

As with the data types in the last section, we can create a list object in memory by writing a variable name followed by an equal sign. To tell Python that the specific type of object we are creating is a list, we use an opening and a closing square bracket. Each item in the list will be separated by a comma. Lists can store any type of data. To see this in action, let’s create our first list.

first_list = [1, 1.0, "one"]
print(first_list)
[1, 1.0, 'one']
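
As a small sketch of what we just created, the cell below rebuilds the same list and inspects it with the built-in type() and len() functions. The name nested_list is made up for this illustration; it simply shows that a list can hold other kinds of objects, including another list.

```python
# The same list as above: an integer, a float, and a string side by side.
first_list = [1, 1.0, "one"]

# type() confirms that the square brackets produced a list object.
print(type(first_list))  # <class 'list'>

# len() counts how many items the list holds.
print(len(first_list))  # 3

# A list can store any type of data, even a boolean or another list.
# nested_list is a hypothetical name used only for this example.
nested_list = [first_list, True]
print(nested_list)  # [[1, 1.0, 'one'], True]
```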

2.2.2.1. Indexing a List

In Python, we will frequently need to access a piece of data within a list or some other data structure. This is known as indexing. We index a list with an opening and a closing square bracket, within which we place the position of the data that we want to access. It is important to note that Python is a zero-indexed language. This means that we always begin with the number 0 and then count upward, so the item that sits in the first position in our list is at index 0.

Let’s grab the item at index 0 in first_list.

print(first_list[0])
1

In the cell below, try to grab the string “one” from first_list.

print(first_list)
[1, 1.0, 'one']

Notice that when we indexed first_list at 0, we successfully printed the number 1. Oftentimes, though, it is important to index multiple items in a list. If we want to do this, we use [ ] again. Within the brackets we will have a start position and an end position, separated by a :. The end position is not included in the result: it is the index one past the last item we want to grab. In code, it would look something like this:

index_item[start:end]

Let’s say we wanted to grab the first 2 items from the list. Because the end position is not included, we would want to do something like this.

print(first_list[0:2])
[1, 1.0]
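
Because the end position is easy to get wrong, here is a short sketch on a slightly longer list. The list letters is made up for this illustration and is not used elsewhere in the chapter.

```python
# A hypothetical five-item list, used only for this example.
letters = ["a", "b", "c", "d", "e"]

# The slice 0:3 covers indices 0, 1, and 2. Index 3 is NOT included.
first_three = letters[0:3]
print(first_three)  # ['a', 'b', 'c']

# The slice 1:4 covers indices 1, 2, and 3.
middle = letters[1:4]
print(middle)  # ['b', 'c', 'd']

# A handy check: the number of items returned is end minus start.
print(len(middle))  # 3
```

A slice gives us a new list, so letters itself is left unchanged.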

We can also work backwards with indexing. We can, for example, use a -1 to grab the final item in the list.

print(first_list[-1])
one

We can also use range indexing to grab the final two items. In Python, if you index a list with no end point, it will grab everything up to the end of that list. We can see this in the example below.

print(first_list[-2:])
[1.0, 'one']

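We can likewise do the same in reverse by leaving off the start point, which grabs everything from the beginning of the list up to, but not including, the end index. With an end index of 1, that is just the item at index 0.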

print(first_list[:1])
[1]
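
To pull these indexing rules together, the sketch below applies negative indices and open-ended ranges to a made-up list called letters (a hypothetical name, chosen only for this illustration).

```python
# A hypothetical five-item list, used only for this example.
letters = ["a", "b", "c", "d", "e"]

# Negative indices count backwards from the end of the list.
print(letters[-1])  # e
print(letters[-2])  # d

# No end point: grab everything from index -3 to the end.
last_three = letters[-3:]
print(last_three)  # ['c', 'd', 'e']

# No start point: grab everything before index 2.
first_two = letters[:2]
print(first_two)  # ['a', 'b']

# The two halves split at the same index rebuild the original list.
print(letters[:2] + letters[2:] == letters)  # True
```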

In the Trinket application below, try to create a list and then index it in different places.

from IPython.display import IFrame
IFrame('https://trinket.io/embed/python3/3fe4c8f3f4', 700, 500)

2.2.3. Tuples

Tuples are lists of data that cannot be changed. As we saw above, lists are the exact same thing as tuples, except that lists can be changed. We can distinguish tuples from lists by the way in which they are formed. While lists use square brackets, tuples use parentheses. We create a tuple like the example below. Our tuple object is first_tuple, and the tuple consists of three items: an integer 1, a float 1.0, and a string “one”. Lists and tuples can contain all three of these types of data. As with lists, we separate the items in a tuple with commas.

first_tuple = (1, 1.0, "one")
print(first_tuple)
(1, 1.0, 'one')
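
Indexing a tuple works the same way as indexing a list. The brief sketch below reuses first_tuple from above; the name first_two is introduced only for this illustration. Note that a range of a tuple gives back another tuple, not a list.

```python
# The same tuple as above.
first_tuple = (1, 1.0, "one")

# Indexing uses square brackets, even though tuples are made with parentheses.
print(first_tuple[0])   # 1
print(first_tuple[-1])  # one

# A range of a tuple is itself a tuple.
first_two = first_tuple[0:2]
print(first_two)        # (1, 1.0)
print(type(first_two))  # <class 'tuple'>
```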

In the Trinket application above, try to create a tuple and index it.

2.2.4. Mutability vs Immutability

As noted above, tuples are immutable, which means they cannot be changed. Let’s see precisely what this means in practice. Say we wanted to add to a list. We can do this with the .append() method. This takes one argument, a piece of information placed between the parentheses. You will learn about arguments later when we discuss functions and methods in greater depth. For now, understand that the information passed between the parentheses tells the method or function what it needs in order to do its job. In this case, .append() allows us to append, or add, something to the end of a list. The argument that we pass, “one”, tells the method what we want to append.

first_list.append("one")
print(first_list)
[1, 1.0, 'one', 'one']

Notice that we do not have an error. This is because our list is mutable, or changeable. This means that we can add to it, delete items from it, and perform other operations that change how it is stored in memory. Tuples, on the other hand, are immutable, or unchangeable. Let’s try to perform the same method on the tuple and see what happens.

first_tuple.append("one")
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [27], in <cell line: 1>()
----> 1 first_tuple.append("one")

AttributeError: 'tuple' object has no attribute 'append'

Notice that we get an AttributeError. This means that a tuple does not have an append method. The method does not exist for tuples because they are immutable, or unchangeable. The only way to alter what the name first_tuple refers to is to replace the tuple entirely in memory.
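
Replacing a tuple in memory might look something like the sketch below. It is one possible approach, not the only one: we build a brand-new tuple with + and assign it back to the same name. The try and except keywords, which we have not covered yet, are used here only so the cell keeps running after the expected error.

```python
# The same tuple as above.
first_tuple = (1, 1.0, "one")

# Changing an item in place fails, just as .append() did.
try:
    first_tuple[0] = 2
except TypeError:
    print("Tuples cannot be changed in place.")

# Instead, we build a NEW tuple and reuse the old name for it.
# ("one",) is a one-item tuple; the trailing comma is required.
first_tuple = first_tuple + ("one",)
print(first_tuple)  # (1, 1.0, 'one', 'one')
```

The original three-item tuple was never altered. The name first_tuple now simply points to a different, four-item tuple.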

In the Trinket application above, experiment with append by creating a list and appending data to it.
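
Appending is only one way to change a list. As a brief sketch of the other operations mentioned above, the cell below replaces and deletes items in a made-up list called shopping (a hypothetical name and hypothetical values, used only for this example).

```python
# A hypothetical list, used only for this example.
shopping = ["bread", "milk"]

# Add an item to the end of the list.
shopping.append("eggs")
print(shopping)  # ['bread', 'milk', 'eggs']

# Replace the item at index 0 by assigning to that index.
shopping[0] = "rice"
print(shopping)  # ['rice', 'milk', 'eggs']

# Delete an item by its value with .remove().
shopping.remove("milk")
print(shopping)  # ['rice', 'eggs']

# Delete an item by its position with del.
del shopping[-1]
print(shopping)  # ['rice']
```

Every one of these operations changes the same list object in memory, which is exactly what a tuple does not allow.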

2.2.5. Sets (Bonus Data Structure)

There is one other data structure similar to lists and tuples, and I include it here as a bonus data structure. This is the set. A set is similar to a list. It is mutable, meaning we can update it, but unlike a list, it cannot contain duplicates and it does not keep its items in a fixed order. This is useful in niche circumstances, such as when you need to remove all duplicates from a list. I include it here just so that you are aware that other types of data structures do exist. Notice in the example below that the duplicate “one” is stored only once and that the printed order differs from the order in which we wrote the items.

first_set = {1, 1.1, "one", "one"}
print(first_set)
{1, 'one', 1.1}
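
Removing duplicates from a list, the use case mentioned above, might look something like this. The list name_entries and its values are made up for illustration. Because a set has no fixed order, the sketch uses the built-in sorted() function to turn the set back into a list with a predictable order.

```python
# A hypothetical list with repeated entries, used only for this example.
name_entries = ["William", "Mary", "William", "John", "Mary"]

# set() keeps one copy of each distinct item.
unique_names = set(name_entries)
print(len(unique_names))  # 3

# sorted() returns a list, here in alphabetical order.
unique_list = sorted(unique_names)
print(unique_list)  # ['John', 'Mary', 'William']

# Sets are mutable: .add() inserts an item if it is not already there.
unique_names.add("Anne")
unique_names.add("Mary")  # already present, so nothing changes
print(len(unique_names))  # 4
```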

2.2.6. Dictionaries

Like tuples and lists, dictionaries are a data structure in Python. Like lists, dictionaries are mutable, meaning they can be changed in memory. Unlike tuples and lists, dictionaries are not simple sequences of data. Instead, each entry has two components: a key and a value. These two components are separated by a colon, and the entries are separated by commas. All of this is contained within curly braces. In the example below, we have a dictionary, names, with the keys “first_name” and “last_name” and the corresponding values “William” and “Mattingly”.

names = {
    "first_name": "William",
    "last_name": "Mattingly"
}
print(names)
{'first_name': 'William', 'last_name': 'Mattingly'}

In digital humanities projects, dictionaries are particularly useful for structuring complex data that you may have in Excel, with each key being an Excel column and each value being the corresponding cell in a row. The dictionary name could be the name of the individual to whom the row corresponds. As with lists and tuples, you can embed other data structures within a Python dictionary.
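
As a rough sketch of the spreadsheet analogy, the dictionary below stands in for a single row. The name row, the column names, and all of the values are invented for this illustration. One of the values is itself a list, which shows a data structure embedded within a dictionary.

```python
# One hypothetical spreadsheet row: each key plays the part of a column name.
row = {
    "first_name": "Jane",
    "last_name": "Doe",
    "birth_year": 1900,
    "languages": ["English", "French"],  # a list embedded in the dictionary
}

# .keys() returns the "column names" in the order we wrote them.
print(list(row.keys()))  # ['first_name', 'last_name', 'birth_year', 'languages']

# len() counts the key and value pairs, not the individual words.
print(len(row))  # 4
```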

While we could realistically store our data in a list such as the one below (name_list), we would need to be consistent and always place the first name in index 0 and last name in index 1. This introduces potential issues later in a project. Imagine if one programmer left a project. Without good documentation, there is nothing inherent in this list that equates index 0 to first name and index 1 to last name. It is entirely up to the reader of the data to make sense of this data structure.

Remember, in programming it is always best to be explicit and produce readable code that others can understand. The dictionary allows us to create keys that indicate with greater specificity the type of data with which we are working. We know from the names dictionary above that “William” is a first name and “Mattingly” is a last name without having to think about the index at which each string resides. We can do this because the keys of the dictionary are explicit.

name_list = ["William", "Mattingly"]
print(name_list)
['William', 'Mattingly']

2.2.6.1. Indexing Dictionaries

In Python, we will frequently need to index a dictionary. Dictionaries, remember, are a bit different from lists and tuples. Rather than being a sequence of items in a list, a dictionary is a collection of keys and corresponding values. To index a dictionary, therefore, we need to work a bit differently. Rather than indexing at a specific point, we index dictionaries at a specific key.

To understand this, it is best to see it in practice, so let’s go ahead and try to grab the first name in our dictionary names.

print(names["first_name"])
William
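
Here is a slightly fuller sketch of working with keys. It reuses the names dictionary from above. The .get() method and the in keyword are standard Python, but the “middle_name” and “nickname” keys and the value “Will” are made up for this example.

```python
# The same dictionary as above.
names = {"first_name": "William", "last_name": "Mattingly"}

# Index by key, exactly as in the example above.
print(names["last_name"])  # Mattingly

# Using [ ] with a key that does not exist raises a KeyError.
# The .get() method returns None instead of raising an error.
print(names.get("middle_name"))  # None

# The in keyword checks whether a key is present.
print("first_name" in names)  # True

# Dictionaries are mutable: assigning to a new key adds it.
names["nickname"] = "Will"  # hypothetical value, for illustration only
print(names)  # {'first_name': 'William', 'last_name': 'Mattingly', 'nickname': 'Will'}
```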

In the Trinket application below, create a dictionary and practice indexing it at different keys.

IFrame('https://trinket.io/embed/python3/3fe4c8f3f4', 700, 500)

2.2.7. Quiz

quiz = '''
# What kind of data structure is made with [ ]?
mc
* List
-c t
-f

* Tuple
-c f
-f

* Dictionary
-c f
-f

# What kind of data structure is made with ()?
mc
* List
-c f
-f

* Tuple
-c t
-f

* Dictionary
-c f
-f

# What kind of data structure is made with {}?
mc
* List
-c f
-f

* Tuple
-c f
-f

* Dictionary
-c t
-f

# What are the two parts of a dictionary?
mc
* Key
-c t
-f

* Value
-c t
-f

* Index
-c f
-f

# What index does a Python list start at?
mc
* 0
-c t
-f

* 1
-c f
-f
'''
from jupyterquiz import display_quiz
import md2json
import json
myquiz = md2json.convert(quiz)
myquiz = json.loads(myquiz)
display_quiz(myquiz)