{ "cells": [ { "cell_type": "markdown", "id": "aboriginal-vampire", "metadata": {}, "source": [ "# Filter and Querying" ] }, { "cell_type": "markdown", "id": "corresponding-account", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "id": "classified-treasury", "metadata": {}, "source": [ "In this chapter, we will meet two advanced ways of filtering or searching (querying) our data. These are aptly named Filter() and Query() functions. These two functions allow us to do some fairly advanced things within a narrow scope. By narrow scope, I mean the questions that we want to pose. Whenever you are manipulating or probing data, it is always best to think about the task as simply asking a question. In essence, this is precisely what you are doing. You are asking the database a question. In order to ask the question correctly, as is the case with any language, you need to know the correct syntax and when that particular question is the right one to ask. In this chapter, we explore how to frame specific questions with Filter() and Query().\n", "\n", "Each function is used in particular circumstances. Filter() is useful for getting a large data down to a smaller size, based on the questions you want to ask. Query(), on the other hand, is useful for phrasing questions that use comparison operators (less than, equal to, greater than, etc.). Let's explore each in turn, but first, let's import our Titanic dataset." ] }, { "cell_type": "code", "execution_count": 1, "id": "educational-present", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88688702Montvila, Rev. Juozasmale27.00021153613.0000NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ
\n", "

891 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", ".. ... ... ... \n", "886 887 0 2 \n", "887 888 1 1 \n", "888 889 0 3 \n", "889 890 1 1 \n", "890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", ".. ... ... ... ... \n", "886 Montvila, Rev. Juozas male 27.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S \n", ".. ... ... ... ... ... \n", "886 0 211536 13.0000 NaN S \n", "887 0 112053 30.0000 B42 S \n", "888 2 W./C. 6607 23.4500 NaN S \n", "889 0 111369 30.0000 C148 C \n", "890 0 370376 7.7500 NaN Q \n", "\n", "[891 rows x 12 columns]" ] }, "execution_count": 1, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv(\"../data/titanic.csv\")\n", "df" ] }, { "cell_type": "markdown", "id": "infinite-playlist", "metadata": {}, "source": [ "## The Filter Function" ] }, { "cell_type": "markdown", "id": "passing-registration", "metadata": {}, "source": [ "The filter function is a great way to grab only the relevant columns. The syntax of filter is a bit easier to use. It takes a single argument, a list of strings. These strings correspond to the columns.\n", "\n", "**When to use Filter()**
\n", "Use filter() when you want to get a quick sense of your dataset or, as we shall see, create a new dataframe based on the columns you want. It is particularly useful if your dataset has many columns. You can also use it to reorder your columns in a more desired way.\n", "\n", "Let's say that we are interested in just studying the names of the passengers of the Titanic. It does not make sense to work with the entire DataFrame. We can, therefore, use filter to just grab the names column, like so." ] }, { "cell_type": "code", "execution_count": 2, "id": "regulation-power", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
Name
0Braund, Mr. Owen Harris
1Cumings, Mrs. John Bradley (Florence Briggs Th...
2Heikkinen, Miss. Laina
3Futrelle, Mrs. Jacques Heath (Lily May Peel)
4Allen, Mr. William Henry
......
886Montvila, Rev. Juozas
887Graham, Miss. Margaret Edith
888Johnston, Miss. Catherine Helen \"Carrie\"
889Behr, Mr. Karl Howell
890Dooley, Mr. Patrick
\n", "

891 rows × 1 columns

\n", "
" ], "text/plain": [ " Name\n", "0 Braund, Mr. Owen Harris\n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th...\n", "2 Heikkinen, Miss. Laina\n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel)\n", "4 Allen, Mr. William Henry\n", ".. ...\n", "886 Montvila, Rev. Juozas\n", "887 Graham, Miss. Margaret Edith\n", "888 Johnston, Miss. Catherine Helen \"Carrie\"\n", "889 Behr, Mr. Karl Howell\n", "890 Dooley, Mr. Patrick\n", "\n", "[891 rows x 1 columns]" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.filter([\"Name\"])" ] }, { "cell_type": "markdown", "id": "postal-economy", "metadata": {}, "source": [ "This is great, but what if I also want to see the ages of these passengers. No problem. I can add an additional column to the list." ] }, { "cell_type": "code", "execution_count": 3, "id": "successful-cinema", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
NameAge
0Braund, Mr. Owen Harris22.0
1Cumings, Mrs. John Bradley (Florence Briggs Th...38.0
2Heikkinen, Miss. Laina26.0
3Futrelle, Mrs. Jacques Heath (Lily May Peel)35.0
4Allen, Mr. William Henry35.0
.........
886Montvila, Rev. Juozas27.0
887Graham, Miss. Margaret Edith19.0
888Johnston, Miss. Catherine Helen \"Carrie\"NaN
889Behr, Mr. Karl Howell26.0
890Dooley, Mr. Patrick32.0
\n", "

891 rows × 2 columns

\n", "
" ], "text/plain": [ " Name Age\n", "0 Braund, Mr. Owen Harris 22.0\n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... 38.0\n", "2 Heikkinen, Miss. Laina 26.0\n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) 35.0\n", "4 Allen, Mr. William Henry 35.0\n", ".. ... ...\n", "886 Montvila, Rev. Juozas 27.0\n", "887 Graham, Miss. Margaret Edith 19.0\n", "888 Johnston, Miss. Catherine Helen \"Carrie\" NaN\n", "889 Behr, Mr. Karl Howell 26.0\n", "890 Dooley, Mr. Patrick 32.0\n", "\n", "[891 rows x 2 columns]" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.filter([\"Name\", \"Age\"])" ] }, { "cell_type": "markdown", "id": "living-anaheim", "metadata": {}, "source": [ "What if I want age to come before name? I can rearrange the order." ] }, { "cell_type": "code", "execution_count": 4, "id": "pending-inspector", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgeName
022.0Braund, Mr. Owen Harris
138.0Cumings, Mrs. John Bradley (Florence Briggs Th...
226.0Heikkinen, Miss. Laina
335.0Futrelle, Mrs. Jacques Heath (Lily May Peel)
435.0Allen, Mr. William Henry
.........
88627.0Montvila, Rev. Juozas
88719.0Graham, Miss. Margaret Edith
888NaNJohnston, Miss. Catherine Helen \"Carrie\"
88926.0Behr, Mr. Karl Howell
89032.0Dooley, Mr. Patrick
\n", "

891 rows × 2 columns

\n", "
" ], "text/plain": [ " Age Name\n", "0 22.0 Braund, Mr. Owen Harris\n", "1 38.0 Cumings, Mrs. John Bradley (Florence Briggs Th...\n", "2 26.0 Heikkinen, Miss. Laina\n", "3 35.0 Futrelle, Mrs. Jacques Heath (Lily May Peel)\n", "4 35.0 Allen, Mr. William Henry\n", ".. ... ...\n", "886 27.0 Montvila, Rev. Juozas\n", "887 19.0 Graham, Miss. Margaret Edith\n", "888 NaN Johnston, Miss. Catherine Helen \"Carrie\"\n", "889 26.0 Behr, Mr. Karl Howell\n", "890 32.0 Dooley, Mr. Patrick\n", "\n", "[891 rows x 2 columns]" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.filter([\"Age\", \"Name\"])" ] }, { "cell_type": "markdown", "id": "tough-calvin", "metadata": {}, "source": [ "Note that we are not bound to the order of the DataFrame. This is particularly useful if we want to make a new DataFrame." ] }, { "cell_type": "code", "execution_count": 5, "id": "nonprofit-latter", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgeName
022.0Braund, Mr. Owen Harris
138.0Cumings, Mrs. John Bradley (Florence Briggs Th...
226.0Heikkinen, Miss. Laina
335.0Futrelle, Mrs. Jacques Heath (Lily May Peel)
435.0Allen, Mr. William Henry
.........
88627.0Montvila, Rev. Juozas
88719.0Graham, Miss. Margaret Edith
888NaNJohnston, Miss. Catherine Helen \"Carrie\"
88926.0Behr, Mr. Karl Howell
89032.0Dooley, Mr. Patrick
\n", "

891 rows × 2 columns

\n", "
" ], "text/plain": [ " Age Name\n", "0 22.0 Braund, Mr. Owen Harris\n", "1 38.0 Cumings, Mrs. John Bradley (Florence Briggs Th...\n", "2 26.0 Heikkinen, Miss. Laina\n", "3 35.0 Futrelle, Mrs. Jacques Heath (Lily May Peel)\n", "4 35.0 Allen, Mr. William Henry\n", ".. ... ...\n", "886 27.0 Montvila, Rev. Juozas\n", "887 19.0 Graham, Miss. Margaret Edith\n", "888 NaN Johnston, Miss. Catherine Helen \"Carrie\"\n", "889 26.0 Behr, Mr. Karl Howell\n", "890 32.0 Dooley, Mr. Patrick\n", "\n", "[891 rows x 2 columns]" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_df = df.filter([\"Age\", \"Name\"])\n", "new_df" ] }, { "cell_type": "markdown", "id": "express-platinum", "metadata": {}, "source": [ "Now, we can only examine the data that we actually need. This will make our code faster and require examining fewer data for each row. Filter's big limitation is in the fact that it cannot filter the data further. We cannot, for example, add an extra argument to filter() that would only return the Names with \"Miss.\", but we can tack on additional arguments to filter, such as those that we saw in the previous chapter on searching strings.\n", "\n", "Let's try that now." ] }, { "cell_type": "code", "execution_count": 6, "id": "divided-influence", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0 False\n", "1 False\n", "2 True\n", "3 False\n", "4 False\n", " ... \n", "886 False\n", "887 True\n", "888 True\n", "889 False\n", "890 False\n", "Name: Name, Length: 891, dtype: bool" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.filter([\"Age\", \"Name\"]).Name.str.contains(\"Miss\\.\")" ] }, { "cell_type": "markdown", "id": "wired-compiler", "metadata": {}, "source": [ "Note that we now have a list of True False statements. These tell us if the word \"Miss.\" is in the column Name. If it is there, we see a True. If it is not, we see a False. Let's say we want to know how many passengers have the title \"Miss.\", we stack .value_counts() into the chain. Note the plural of counts." ] }, { "cell_type": "code", "execution_count": 7, "id": "continental-expense", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "False 709\n", "True 182\n", "Name: Name, dtype: int64" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.filter([\"Age\", \"Name\"]).Name.str.contains(\"Miss\\.\").value_counts()" ] }, { "cell_type": "markdown", "id": "white-evans", "metadata": {}, "source": [ "Now, let's say we were interested in ONLY seeing the rows that contain \"Miss.\" in them. We need to structure that filtering as a list. But note that if we wrap the whole thing in a list, we don't filter out just the Age and Name columns. Instead, we get the entire DataFrame." ] }, { "cell_type": "code", "execution_count": 8, "id": "constitutional-fiction", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
101113Sandstrom, Miss. Marguerite Rutfemale4.011PP 954916.7000G6S
111211Bonnell, Miss. Elizabethfemale58.00011378326.5500C103S
141503Vestrom, Miss. Hulda Amanda Adolfinafemale14.0003504067.8542NaNS
222313McGowan, Miss. Anna \"Annie\"female15.0003309238.0292NaNQ
.......................................
86686712Duran y More, Miss. Asuncionfemale27.010SC/PARIS 214913.8583NaNC
87587613Najib, Miss. Adele Kiamie \"Jane\"female15.00026677.2250NaNC
88288303Dahlberg, Miss. Gerda Ulrikafemale22.000755210.5167NaNS
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
\n", "

182 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name \\\n", "2 3 1 3 Heikkinen, Miss. Laina \n", "10 11 1 3 Sandstrom, Miss. Marguerite Rut \n", "11 12 1 1 Bonnell, Miss. Elizabeth \n", "14 15 0 3 Vestrom, Miss. Hulda Amanda Adolfina \n", "22 23 1 3 McGowan, Miss. Anna \"Annie\" \n", ".. ... ... ... ... \n", "866 867 1 2 Duran y More, Miss. Asuncion \n", "875 876 1 3 Najib, Miss. Adele Kiamie \"Jane\" \n", "882 883 0 3 Dahlberg, Miss. Gerda Ulrika \n", "887 888 1 1 Graham, Miss. Margaret Edith \n", "888 889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" \n", "\n", " Sex Age SibSp Parch Ticket Fare Cabin Embarked \n", "2 female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S \n", "10 female 4.0 1 1 PP 9549 16.7000 G6 S \n", "11 female 58.0 0 0 113783 26.5500 C103 S \n", "14 female 14.0 0 0 350406 7.8542 NaN S \n", "22 female 15.0 0 0 330923 8.0292 NaN Q \n", ".. ... ... ... ... ... ... ... ... \n", "866 female 27.0 1 0 SC/PARIS 2149 13.8583 NaN C \n", "875 female 15.0 0 0 2667 7.2250 NaN C \n", "882 female 22.0 0 0 7552 10.5167 NaN S \n", "887 female 19.0 0 0 112053 30.0000 B42 S \n", "888 female NaN 1 2 W./C. 6607 23.4500 NaN S \n", "\n", "[182 rows x 12 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.filter([\"Age\", \"Name\"]).Name.str.contains(\"Miss\\.\") == True]" ] }, { "cell_type": "markdown", "id": "seasonal-search", "metadata": {}, "source": [ "This is because of where our filter occurs in the chain of commands. Note that filter occurs within the brackets where we are setting up our parameters. This means that we are filtering under the conditions of how the list is created, but that once a row is processed, the DataFrame is unfiltered. If we want our filter to work, we need to place it after the conditions have been sorted. Notice that the filter is outside of our brackets now." ] }, { "cell_type": "code", "execution_count": 9, "id": "gorgeous-doctrine", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
AgeName
226.0Heikkinen, Miss. Laina
104.0Sandstrom, Miss. Marguerite Rut
1158.0Bonnell, Miss. Elizabeth
1414.0Vestrom, Miss. Hulda Amanda Adolfina
2215.0McGowan, Miss. Anna \"Annie\"
.........
86627.0Duran y More, Miss. Asuncion
87515.0Najib, Miss. Adele Kiamie \"Jane\"
88222.0Dahlberg, Miss. Gerda Ulrika
88719.0Graham, Miss. Margaret Edith
888NaNJohnston, Miss. Catherine Helen \"Carrie\"
\n", "

182 rows × 2 columns

\n", "
" ], "text/plain": [ " Age Name\n", "2 26.0 Heikkinen, Miss. Laina\n", "10 4.0 Sandstrom, Miss. Marguerite Rut\n", "11 58.0 Bonnell, Miss. Elizabeth\n", "14 14.0 Vestrom, Miss. Hulda Amanda Adolfina\n", "22 15.0 McGowan, Miss. Anna \"Annie\"\n", ".. ... ...\n", "866 27.0 Duran y More, Miss. Asuncion\n", "875 15.0 Najib, Miss. Adele Kiamie \"Jane\"\n", "882 22.0 Dahlberg, Miss. Gerda Ulrika\n", "887 19.0 Graham, Miss. Margaret Edith\n", "888 NaN Johnston, Miss. Catherine Helen \"Carrie\"\n", "\n", "[182 rows x 2 columns]" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df[df.Name.str.contains(\"Miss\\.\") == True].filter([\"Age\", \"Name\"])" ] }, { "cell_type": "markdown", "id": "institutional-algorithm", "metadata": {}, "source": [ "## The Query Function" ] }, { "cell_type": "markdown", "id": "mysterious-scout", "metadata": {}, "source": [ "The Pandas Query() method is a fantastic way to filter and query data. Unlike other Pandas methods, it uses a string argument that functions rather similar to SQL syntax.\n", "\n", "**When to use Query**
\n", "You should only use Query() when your question (query) can be posed as greater than, less than, equal to, or not equal to (or some combination of these). Let me demonstrate. If we wanted to filter out all the rows where the Pclass was equal to 3, we could use the following string." ] }, { "cell_type": "code", "execution_count": 16, "id": "potential-istanbul", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
5603Moran, Mr. JamesmaleNaN003308778.4583NaNQ
7803Palsson, Master. Gosta Leonardmale2.03134990921.0750NaNS
.......................................
88288303Dahlberg, Miss. Gerda Ulrikafemale22.000755210.5167NaNS
88488503Sutehall, Mr. Henry Jrmale25.000SOTON/OQ 3920767.0500NaNS
88588603Rice, Mrs. William (Margaret Norton)female39.00538265229.1250NaNQ
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ
\n", "

491 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name \\\n", "0 1 0 3 Braund, Mr. Owen Harris \n", "2 3 1 3 Heikkinen, Miss. Laina \n", "4 5 0 3 Allen, Mr. William Henry \n", "5 6 0 3 Moran, Mr. James \n", "7 8 0 3 Palsson, Master. Gosta Leonard \n", ".. ... ... ... ... \n", "882 883 0 3 Dahlberg, Miss. Gerda Ulrika \n", "884 885 0 3 Sutehall, Mr. Henry Jr \n", "885 886 0 3 Rice, Mrs. William (Margaret Norton) \n", "888 889 0 3 Johnston, Miss. Catherine Helen \"Carrie\" \n", "890 891 0 3 Dooley, Mr. Patrick \n", "\n", " Sex Age SibSp Parch Ticket Fare Cabin Embarked \n", "0 male 22.0 1 0 A/5 21171 7.2500 NaN S \n", "2 female 26.0 0 0 STON/O2. 3101282 7.9250 NaN S \n", "4 male 35.0 0 0 373450 8.0500 NaN S \n", "5 male NaN 0 0 330877 8.4583 NaN Q \n", "7 male 2.0 3 1 349909 21.0750 NaN S \n", ".. ... ... ... ... ... ... ... ... \n", "882 female 22.0 0 0 7552 10.5167 NaN S \n", "884 male 25.0 0 0 SOTON/OQ 392076 7.0500 NaN S \n", "885 female 39.0 0 5 382652 29.1250 NaN Q \n", "888 female NaN 1 2 W./C. 6607 23.4500 NaN S \n", "890 male 32.0 0 0 370376 7.7500 NaN Q \n", "\n", "[491 rows x 12 columns]" ] }, "execution_count": 16, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query(\"Pclass == 3\")" ] }, { "cell_type": "markdown", "id": "35c1fa09-26d9-4103-a98c-4d35ab5a3058", "metadata": {}, "source": [ "Query can also look for if a column contains any item in a list. We can do this by setting Pclass `== [list]`" ] }, { "cell_type": "code", "execution_count": 17, "id": "c15c56d1-e426-43cc-b5b8-be334b1cd550", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
0103Braund, Mr. Owen Harrismale22.010A/5 211717.2500NaNS
1211Cumings, Mrs. John Bradley (Florence Briggs Th...female38.010PC 1759971.2833C85C
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
3411Futrelle, Mrs. Jacques Heath (Lily May Peel)female35.01011380353.1000C123S
4503Allen, Mr. William Henrymale35.0003734508.0500NaNS
.......................................
88588603Rice, Mrs. William (Margaret Norton)female39.00538265229.1250NaNQ
88788811Graham, Miss. Margaret Edithfemale19.00011205330.0000B42S
88888903Johnston, Miss. Catherine Helen \"Carrie\"femaleNaN12W./C. 660723.4500NaNS
88989011Behr, Mr. Karl Howellmale26.00011136930.0000C148C
89089103Dooley, Mr. Patrickmale32.0003703767.7500NaNQ
\n", "

707 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "0 1 0 3 \n", "1 2 1 1 \n", "2 3 1 3 \n", "3 4 1 1 \n", "4 5 0 3 \n", ".. ... ... ... \n", "885 886 0 3 \n", "887 888 1 1 \n", "888 889 0 3 \n", "889 890 1 1 \n", "890 891 0 3 \n", "\n", " Name Sex Age SibSp \\\n", "0 Braund, Mr. Owen Harris male 22.0 1 \n", "1 Cumings, Mrs. John Bradley (Florence Briggs Th... female 38.0 1 \n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "3 Futrelle, Mrs. Jacques Heath (Lily May Peel) female 35.0 1 \n", "4 Allen, Mr. William Henry male 35.0 0 \n", ".. ... ... ... ... \n", "885 Rice, Mrs. William (Margaret Norton) female 39.0 0 \n", "887 Graham, Miss. Margaret Edith female 19.0 0 \n", "888 Johnston, Miss. Catherine Helen \"Carrie\" female NaN 1 \n", "889 Behr, Mr. Karl Howell male 26.0 0 \n", "890 Dooley, Mr. Patrick male 32.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "0 0 A/5 21171 7.2500 NaN S \n", "1 0 PC 17599 71.2833 C85 C \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "3 0 113803 53.1000 C123 S \n", "4 0 373450 8.0500 NaN S \n", ".. ... ... ... ... ... \n", "885 5 382652 29.1250 NaN Q \n", "887 0 112053 30.0000 B42 S \n", "888 2 W./C. 6607 23.4500 NaN S \n", "889 0 111369 30.0000 C148 C \n", "890 0 370376 7.7500 NaN Q \n", "\n", "[707 rows x 12 columns]" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query(\"Pclass == [1,3]\")" ] }, { "cell_type": "markdown", "id": "seven-spine", "metadata": {}, "source": [ "We can even stack questions together within this string. Let's say, I am interested in all who were in Pclass 3 and survived. I could write the following string argument." ] }, { "cell_type": "code", "execution_count": 18, "id": "sought-console", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
2313Heikkinen, Miss. Lainafemale26.000STON/O2. 31012827.9250NaNS
8913Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg)female27.00234774211.1333NaNS
101113Sandstrom, Miss. Marguerite Rutfemale4.011PP 954916.7000G6S
192013Masselmani, Mrs. FatimafemaleNaN0026497.2250NaNC
222313McGowan, Miss. Anna \"Annie\"female15.0003309238.0292NaNQ
.......................................
83883913Chip, Mr. Changmale32.000160156.4958NaNS
85585613Aks, Mrs. Sam (Leah Rosen)female18.0013920919.3500NaNS
85885913Baclini, Mrs. Solomon (Latifa Qurban)female24.003266619.2583NaNC
86987013Johnson, Master. Harold Theodormale4.01134774211.1333NaNS
87587613Najib, Miss. Adele Kiamie \"Jane\"female15.00026677.2250NaNC
\n", "

119 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "2 3 1 3 \n", "8 9 1 3 \n", "10 11 1 3 \n", "19 20 1 3 \n", "22 23 1 3 \n", ".. ... ... ... \n", "838 839 1 3 \n", "855 856 1 3 \n", "858 859 1 3 \n", "869 870 1 3 \n", "875 876 1 3 \n", "\n", " Name Sex Age SibSp \\\n", "2 Heikkinen, Miss. Laina female 26.0 0 \n", "8 Johnson, Mrs. Oscar W (Elisabeth Vilhelmina Berg) female 27.0 0 \n", "10 Sandstrom, Miss. Marguerite Rut female 4.0 1 \n", "19 Masselmani, Mrs. Fatima female NaN 0 \n", "22 McGowan, Miss. Anna \"Annie\" female 15.0 0 \n", ".. ... ... ... ... \n", "838 Chip, Mr. Chang male 32.0 0 \n", "855 Aks, Mrs. Sam (Leah Rosen) female 18.0 0 \n", "858 Baclini, Mrs. Solomon (Latifa Qurban) female 24.0 0 \n", "869 Johnson, Master. Harold Theodor male 4.0 1 \n", "875 Najib, Miss. Adele Kiamie \"Jane\" female 15.0 0 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "2 0 STON/O2. 3101282 7.9250 NaN S \n", "8 2 347742 11.1333 NaN S \n", "10 1 PP 9549 16.7000 G6 S \n", "19 0 2649 7.2250 NaN C \n", "22 0 330923 8.0292 NaN Q \n", ".. ... ... ... ... ... \n", "838 0 1601 56.4958 NaN S \n", "855 1 392091 9.3500 NaN S \n", "858 3 2666 19.2583 NaN C \n", "869 1 347742 11.1333 NaN S \n", "875 0 2667 7.2250 NaN C \n", "\n", "[119 rows x 12 columns]" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query(\"Pclass == 3 & Survived == 1\")" ] }, { "cell_type": "markdown", "id": "right-turkey", "metadata": {}, "source": [ "Let's make the question even more complex. We want to now find the number of these individuals who were over the age of 40." ] }, { "cell_type": "code", "execution_count": 19, "id": "standard-suffering", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
33833913Dahl, Mr. Karl Edwartmale45.00075988.0500NaNS
41441513Sundman, Mr. Johan Julianmale44.000STON/O 2. 31012697.9250NaNS
48348413Turkula, Mrs. (Hedwig)female63.00041349.5875NaNS
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name Sex Age \\\n", "338 339 1 3 Dahl, Mr. Karl Edwart male 45.0 \n", "414 415 1 3 Sundman, Mr. Johan Julian male 44.0 \n", "483 484 1 3 Turkula, Mrs. (Hedwig) female 63.0 \n", "\n", " SibSp Parch Ticket Fare Cabin Embarked \n", "338 0 0 7598 8.0500 NaN S \n", "414 0 0 STON/O 2. 3101269 7.9250 NaN S \n", "483 0 0 4134 9.5875 NaN S " ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query(\"Pclass == 3 & Survived == 1 & Age > 40\")" ] }, { "cell_type": "markdown", "id": "functional-convergence", "metadata": {}, "source": [ "I now have a list of 3 individuals who met all criteria. I can also use my Or operator (|), rather than &. Let's see if we can achieve what we want. (*Note this is an intentional mistake. Look below for why*)." ] }, { "cell_type": "code", "execution_count": 13, "id": "sunset-copying", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
7803Palsson, Master. Gosta Leonardmale2.003134990921.0750NaNS
101113Sandstrom, Miss. Marguerite Rutfemale4.0011PP 954916.7000G6S
161703Rice, Master. Eugenemale2.004138265229.1250NaNQ
242503Palsson, Miss. Torborg Danirafemale8.003134990921.0750NaNS
434412Laroche, Miss. Simonne Marie Anne Andreefemale3.0012SC/Paris 212341.5792NaNC
.......................................
82782812Mallet, Master. Andremale1.0002S.C./PARIS 207937.0042NaNC
83183212Richards, Master. George Sibleymale0.83112910618.7500NaNS
85085103Andersson, Master. Sigvard Harald Eliasmale4.004234708231.2750NaNS
85285303Boulos, Miss. Nourelainfemale9.0011267815.2458NaNC
86987013Johnson, Master. Harold Theodormale4.001134774211.1333NaNS
\n", "

65 rows × 12 columns

\n", "
" ], "text/plain": [ " PassengerId Survived Pclass Name \\\n", "7 8 0 3 Palsson, Master. Gosta Leonard \n", "10 11 1 3 Sandstrom, Miss. Marguerite Rut \n", "16 17 0 3 Rice, Master. Eugene \n", "24 25 0 3 Palsson, Miss. Torborg Danira \n", "43 44 1 2 Laroche, Miss. Simonne Marie Anne Andree \n", ".. ... ... ... ... \n", "827 828 1 2 Mallet, Master. Andre \n", "831 832 1 2 Richards, Master. George Sibley \n", "850 851 0 3 Andersson, Master. Sigvard Harald Elias \n", "852 853 0 3 Boulos, Miss. Nourelain \n", "869 870 1 3 Johnson, Master. Harold Theodor \n", "\n", " Sex Age SibSp Parch Ticket Fare Cabin Embarked \n", "7 male 2.00 3 1 349909 21.0750 NaN S \n", "10 female 4.00 1 1 PP 9549 16.7000 G6 S \n", "16 male 2.00 4 1 382652 29.1250 NaN Q \n", "24 female 8.00 3 1 349909 21.0750 NaN S \n", "43 female 3.00 1 2 SC/Paris 2123 41.5792 NaN C \n", ".. ... ... ... ... ... ... ... ... \n", "827 male 1.00 0 2 S.C./PARIS 2079 37.0042 NaN C \n", "831 male 0.83 1 1 29106 18.7500 NaN S \n", "850 male 4.00 4 2 347082 31.2750 NaN S \n", "852 female 9.00 1 1 2678 15.2458 NaN C \n", "869 male 4.00 1 1 347742 11.1333 NaN S \n", "\n", "[65 rows x 12 columns]" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query(\"Pclass == 3 & Survived == 1 & Age > 40 | Age < 10\")" ] }, { "cell_type": "markdown", "id": "hourly-separation", "metadata": {}, "source": [ "Woops! Something has gone seriously wrong here. We have all different kinds of Pclasses, not just 3s. We have people who survived and did not. And, most problematically, we have people only under the age of 10. What has gone wrong here!? The answer lies in a perhaps forgotten part of math from when we were children, the order of operations. If you recall from those lessons, the order of operations determines the way in which you process the problem. 4 + 7 x 2 is very different from (4+7) x 2. The former is 18 and the latter is 22 because the latter has parentheses which tell the reader to do that operation first. Because programming sits on top of mathematics (especially Boolean algebra), the syntax of mathematics is often embedded in programming.\n", "\n", "Let's use the order of operations correctly and rephrase our query. Note the parentheses now before Age > 40 and after Age < 10. Note also that the & is before the parentheses." ] }, { "cell_type": "code", "execution_count": 14, "id": "liked-canyon", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
PassengerIdSurvivedPclassNameSexAgeSibSpParchTicketFareCabinEmbarked
101113Sandstrom, Miss. Marguerite Rutfemale4.0011PP 954916.7000G6S
16516613Goldsmith, Master. Frank John William \"Frankie\"male9.000236329120.5250NaNS
17217313Johnson, Miss. Eleanor Ileenfemale1.001134774211.1333NaNS
18418513Kink-Heilmann, Miss. Luise Gretchenfemale4.000231515322.0250NaNS
23323413Asplund, Miss. Lillian Gertrudfemale5.004234707731.3875NaNS
26126213Asplund, Master. Edvin Rojj Felixmale3.004234707731.3875NaNS
33833913Dahl, Mr. Karl Edwartmale45.000075988.0500NaNS
34834913Coutts, Master. William Loch \"William\"male3.0011C.A. 3767115.9000NaNS
38138213Nakid, Miss. Maria (\"Mary\")female1.0002265315.7417NaNC
41441513Sundman, Mr. Johan Julianmale44.0000STON/O 2. 31012697.9250NaNS
44844913Baclini, Miss. Marie Catherinefemale5.0021266619.2583NaNC
46947013Baclini, Miss. Helene Barbarafemale0.7521266619.2583NaNC
47948013Hirvonen, Miss. Hildur Efemale2.0001310129812.2875NaNS
48348413Turkula, Mrs. (Hedwig)female63.000041349.5875NaNS
48949013Coutts, Master. Eden Leslie \"Neville\"male9.0011C.A. 3767115.9000NaNS
64464513Baclini, Miss. Eugeniefemale0.7521266619.2583NaNC
69169213Karun, Miss. Mancafemale4.000134925613.4167NaNC
75175213Moor, Master. Meiermale6.000139209612.4750E121S
77777813Emanuel, Miss. Virginia Ethelfemale5.000036451612.4750NaNS
78878913Dean, Master. Bertram Veremale1.0012C.A. 231520.5750NaNS
80380413Thomas, Master. Assad Alexandermale0.420126258.5167NaNC
86987013Johnson, Master. Harold Theodormale4.001134774211.1333NaNS
\n", "
" ], "text/plain": [ " PassengerId Survived Pclass \\\n", "10 11 1 3 \n", "165 166 1 3 \n", "172 173 1 3 \n", "184 185 1 3 \n", "233 234 1 3 \n", "261 262 1 3 \n", "338 339 1 3 \n", "348 349 1 3 \n", "381 382 1 3 \n", "414 415 1 3 \n", "448 449 1 3 \n", "469 470 1 3 \n", "479 480 1 3 \n", "483 484 1 3 \n", "489 490 1 3 \n", "644 645 1 3 \n", "691 692 1 3 \n", "751 752 1 3 \n", "777 778 1 3 \n", "788 789 1 3 \n", "803 804 1 3 \n", "869 870 1 3 \n", "\n", " Name Sex Age SibSp \\\n", "10 Sandstrom, Miss. Marguerite Rut female 4.00 1 \n", "165 Goldsmith, Master. Frank John William \"Frankie\" male 9.00 0 \n", "172 Johnson, Miss. Eleanor Ileen female 1.00 1 \n", "184 Kink-Heilmann, Miss. Luise Gretchen female 4.00 0 \n", "233 Asplund, Miss. Lillian Gertrud female 5.00 4 \n", "261 Asplund, Master. Edvin Rojj Felix male 3.00 4 \n", "338 Dahl, Mr. Karl Edwart male 45.00 0 \n", "348 Coutts, Master. William Loch \"William\" male 3.00 1 \n", "381 Nakid, Miss. Maria (\"Mary\") female 1.00 0 \n", "414 Sundman, Mr. Johan Julian male 44.00 0 \n", "448 Baclini, Miss. Marie Catherine female 5.00 2 \n", "469 Baclini, Miss. Helene Barbara female 0.75 2 \n", "479 Hirvonen, Miss. Hildur E female 2.00 0 \n", "483 Turkula, Mrs. (Hedwig) female 63.00 0 \n", "489 Coutts, Master. Eden Leslie \"Neville\" male 9.00 1 \n", "644 Baclini, Miss. Eugenie female 0.75 2 \n", "691 Karun, Miss. Manca female 4.00 0 \n", "751 Moor, Master. Meier male 6.00 0 \n", "777 Emanuel, Miss. Virginia Ethel female 5.00 0 \n", "788 Dean, Master. Bertram Vere male 1.00 1 \n", "803 Thomas, Master. Assad Alexander male 0.42 0 \n", "869 Johnson, Master. Harold Theodor male 4.00 1 \n", "\n", " Parch Ticket Fare Cabin Embarked \n", "10 1 PP 9549 16.7000 G6 S \n", "165 2 363291 20.5250 NaN S \n", "172 1 347742 11.1333 NaN S \n", "184 2 315153 22.0250 NaN S \n", "233 2 347077 31.3875 NaN S \n", "261 2 347077 31.3875 NaN S \n", "338 0 7598 8.0500 NaN S \n", "348 1 C.A. 37671 15.9000 NaN S \n", "381 2 2653 15.7417 NaN C \n", "414 0 STON/O 2. 3101269 7.9250 NaN S \n", "448 1 2666 19.2583 NaN C \n", "469 1 2666 19.2583 NaN C \n", "479 1 3101298 12.2875 NaN S \n", "483 0 4134 9.5875 NaN S \n", "489 1 C.A. 37671 15.9000 NaN S \n", "644 1 2666 19.2583 NaN C \n", "691 1 349256 13.4167 NaN C \n", "751 1 392096 12.4750 E121 S \n", "777 0 364516 12.4750 NaN S \n", "788 2 C.A. 2315 20.5750 NaN S \n", "803 1 2625 8.5167 NaN C \n", "869 1 347742 11.1333 NaN S " ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df.query(\"Pclass == 3 & Survived == 1 & (Age > 40 | Age < 10)\")" ] }, { "cell_type": "markdown", "id": "vanilla-north", "metadata": {}, "source": [ "This is why query is such a powerful function in Pandas. You can do a lot with a single string. There are other ways to achieve this same result, but if your question can be entirely phrased as a series of comparison operators (equal to, less than, etc.), then Query is likely the best option." ] }, { "cell_type": "code", "execution_count": null, "id": "d99a2aa1-1a99-4b7f-a728-8e23a8d8ec1f", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" } }, "nbformat": 4, "nbformat_minor": 5 }