{ "cells": [ { "cell_type": "markdown", "id": "dominant-celebration", "metadata": {}, "source": [ "# Events Analysis" ] }, { "cell_type": "markdown", "id": "075538b3-d372-4025-b0d6-808882086c0b", "metadata": {}, "source": [ "The only output file that details event data is the .tokens file. As a result, this file will be the focus of this chapter. Each section of this chapter will analyze the .tokens file in greater depth to identify and extract event data. At the end of the chapter, we will bring everything together with a single function that can recreate these results on any BookNLP output .tokens file." ] }, { "cell_type": "markdown", "id": "dependent-greeting", "metadata": {}, "source": [ "## Exploring the Tokens File" ] }, { "cell_type": "markdown", "id": "30d6d992-ae58-4a43-b28c-ad5c95121671", "metadata": {}, "source": [ "Let's first open up the .tokens file and take a look at it so we can remember precisely what this .tsv file looks like. If you remember from chapter 3, we can analyze these files more easily if we use pandas, a tabular data analysis library in Python." ] }, { "cell_type": "code", "execution_count": 8, "id": "cultural-medicine", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
paragraph_IDsentence_IDtoken_ID_within_sentencetoken_ID_within_documentwordlemmabyte_onsetbyte_offsetPOS_tagfine_POS_tagdependency_relationsyntactic_head_IDevent
00000Mr.Mr.03PROPNNNPnmod3O
10011andand47CCONJCCcc0O
20022Mrs.Mrs.812PROPNNNPcompound3O
30033DursleyDursley1320PROPNNNPnsubj12O
40044,,2021PUNCT,punct3O
..........................................
99251299561721099251DudleyDudley438929438935PROPNNNPpobj99250O
99252299561721199252thisthis438936438940DETDTdet99253O
99253299561721299253summersummer438941438947NOUNNNnpadvmod99245O
99254299561721399254........438947438951PUNCT.punct99243O
99255299561721499255\\t438951438952PUNCT''punct99243ONaN
\n", "

99256 rows × 13 columns

\n", "
" ], "text/plain": [ " paragraph_ID sentence_ID token_ID_within_sentence \\\n", "0 0 0 0 \n", "1 0 0 1 \n", "2 0 0 2 \n", "3 0 0 3 \n", "4 0 0 4 \n", "... ... ... ... \n", "99251 2995 6172 10 \n", "99252 2995 6172 11 \n", "99253 2995 6172 12 \n", "99254 2995 6172 13 \n", "99255 2995 6172 14 \n", "\n", " token_ID_within_document word lemma byte_onset byte_offset \\\n", "0 0 Mr. Mr. 0 3 \n", "1 1 and and 4 7 \n", "2 2 Mrs. Mrs. 8 12 \n", "3 3 Dursley Dursley 13 20 \n", "4 4 , , 20 21 \n", "... ... ... ... ... ... \n", "99251 99251 Dudley Dudley 438929 438935 \n", "99252 99252 this this 438936 438940 \n", "99253 99253 summer summer 438941 438947 \n", "99254 99254 .... .... 438947 438951 \n", "99255 99255 \\t 438951 438952 PUNCT \n", "\n", " POS_tag fine_POS_tag dependency_relation syntactic_head_ID event \n", "0 PROPN NNP nmod 3 O \n", "1 CCONJ CC cc 0 O \n", "2 PROPN NNP compound 3 O \n", "3 PROPN NNP nsubj 12 O \n", "4 PUNCT , punct 3 O \n", "... ... ... ... ... ... \n", "99251 PROPN NNP pobj 99250 O \n", "99252 DET DT det 99253 O \n", "99253 NOUN NN npadvmod 99245 O \n", "99254 PUNCT . punct 99243 O \n", "99255 '' punct 99243 O NaN \n", "\n", "[99256 rows x 13 columns]" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pandas as pd\n", "df = pd.read_csv(\"data/harry_potter/harry_potter.tokens\", delimiter=\"\\t\")\n", "df" ] }, { "cell_type": "markdown", "id": "ethical-tourist", "metadata": {}, "source": [ "We have approximately 99,000 rows and 13 columns of data. Throughout this chapter, we will focus on only four columns in particular:\n", "\n", "- sentence_ID\n", "- word\n", "- lemma\n", "- event\n", "\n", "As such, let's go ahead and remove all the extra data for now so that we can just view the columns we care about." 
] }, { "cell_type": "code", "execution_count": null, "id": "86db149b-10d1-4ae3-bf5f-b4acdd904077", "metadata": {}, "outputs": [], "source": [ "df = df[[\"sentence_ID\", \"word\", \"lemma\", \"event\"]]\n", "df" ] }, { "cell_type": "markdown", "id": "7c0324bf-5c65-4c77-94d8-5b9bb5c6d815", "metadata": {}, "source": [ "Excellent! Now we can analyze the event column a bit more easily." ] }, { "cell_type": "markdown", "id": "b2615c17-bdd0-450c-9e25-ee1a105d69e3", "metadata": {}, "source": [ "## Grabbing the Events" ] }, { "cell_type": "markdown", "id": "e1f476a5-4633-40a6-aa55-fc323f2e9052", "metadata": {}, "source": [ "One of the things we can see above is that some rows contain NaN in the event column. Ideally, we want to ignore these rows entirely. We can do this in pandas by combining the isnull() method with the ~ (negation) operator, keeping only the rows whose event value is not null." ] }, { "cell_type": "code", "execution_count": 57, "id": "secure-radiation", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sentence_IDwordlemmaevent
00Mr.Mr.O
10andandO
20Mrs.Mrs.O
30DursleyDursleyO
40,,O
...............
992506172withwithO
992516172DudleyDudleyO
992526172thisthisO
992536172summersummerO
992546172........O
\n", "

94498 rows × 4 columns

\n", "
" ], "text/plain": [ " sentence_ID word lemma event\n", "0 0 Mr. Mr. O\n", "1 0 and and O\n", "2 0 Mrs. Mrs. O\n", "3 0 Dursley Dursley O\n", "4 0 , , O\n", "... ... ... ... ...\n", "99250 6172 with with O\n", "99251 6172 Dudley Dudley O\n", "99252 6172 this this O\n", "99253 6172 summer summer O\n", "99254 6172 .... .... O\n", "\n", "[94498 rows x 4 columns]" ] }, "execution_count": 57, "metadata": {}, "output_type": "execute_result" } ], "source": [ "events = df[~df['event'].isnull()]\n", "events" ] }, { "cell_type": "markdown", "id": "dedfaeec-a371-449b-b830-4245d5b6f0c5", "metadata": {}, "source": [ "As we can see, this eliminated roughly 5,000 rows. Let's take a closer look at the event column and see what kind of data we can expect to find in it." ] }, { "cell_type": "code", "execution_count": 12, "id": "f2e3507b-01bc-4a08-8287-3a52d3feb37b", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'EVENT', 'O'}\n" ] } ], "source": [ "event_options = set(events.event.tolist())\n", "print(event_options)" ] }, { "cell_type": "markdown", "id": "palestinian-movement", "metadata": {}, "source": [ "By converting this column to a list and then to a set (which eliminates the duplicates), we can see that we have two types of data in the event column:\n", "\n", "- EVENT\n", "- O\n", "\n", "If a row has \"EVENT\" in the column, it means the corresponding word was identified by the BookNLP pipeline as an event-triggering word. Now that we know this, let's take a look at only the rows that have EVENT in the event column." ] }, { "cell_type": "code", "execution_count": 58, "id": "db43a6d1-20ab-4028-a8ed-5e3912713b26", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sentence_IDwordlemmaevent
2429shudderedshudderEVENT
30812wokewakeEVENT
34613hummedhumEVENT
34913pickedpickEVENT
36113gossipedgossipEVENT
...............
991526167hunghangEVENT
991856169saidsayEVENT
992096170saidsayEVENT
992156170surprisedsurprisedEVENT
992186170gringrinEVENT
\n", "

6029 rows × 4 columns

\n", "
" ], "text/plain": [ " sentence_ID word lemma event\n", "242 9 shuddered shudder EVENT\n", "308 12 woke wake EVENT\n", "346 13 hummed hum EVENT\n", "349 13 picked pick EVENT\n", "361 13 gossiped gossip EVENT\n", "... ... ... ... ...\n", "99152 6167 hung hang EVENT\n", "99185 6169 said say EVENT\n", "99209 6170 said say EVENT\n", "99215 6170 surprised surprised EVENT\n", "99218 6170 grin grin EVENT\n", "\n", "[6029 rows x 4 columns]" ] }, "execution_count": 58, "metadata": {}, "output_type": "execute_result" } ], "source": [ "real_events = events.loc[events[\"event\"] == \"EVENT\"]\n", "real_events" ] }, { "cell_type": "markdown", "id": "fe03c383-35f1-4488-b4e1-e588a3cfb8ed", "metadata": {}, "source": [ "We now have only 6,029 rows to analyze!" ] }, { "cell_type": "markdown", "id": "previous-physiology", "metadata": {}, "source": [ "## Analyzing Event Words and Lemmas" ] }, { "cell_type": "markdown", "id": "black-frame", "metadata": {}, "source": [ "Let's dig a little deeper and analyze the words and lemmas of these rows to see how many unique values of each we have."
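] }, { "cell_type": "markdown", "id": "aside-nunique-md", "metadata": {}, "source": [ "As a quick aside, pandas can also count distinct values directly with the nunique() method. This is only a sketch of an alternative; the next cells compute the same counts with Python sets, which also give us the values themselves, not just the counts." ] }, { "cell_type": "code", "execution_count": null, "id": "aside-nunique-code", "metadata": {}, "outputs": [], "source": [ "# nunique() counts the distinct values in a column without building a list first\n", "print(real_events.word.nunique())\n", "print(real_events.lemma.nunique())" ] }, { "cell_type": "markdown", "id": "aside-nunique-note", "metadata": {}, "source": [ "Both numbers should match the set-based counts below."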
] }, { "cell_type": "code", "execution_count": 59, "id": "animal-governor", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1501" ] }, "execution_count": 59, "metadata": {}, "output_type": "execute_result" } ], "source": [ "event_words = set(real_events.word.tolist())\n", "len(event_words)" ] }, { "cell_type": "code", "execution_count": 60, "id": "3bfb56e0-4290-4368-b1e6-126229788675", "metadata": {}, "outputs": [ { "data": { "text/plain": [ "1021" ] }, "execution_count": 60, "metadata": {}, "output_type": "execute_result" } ], "source": [ "event_lemmas = list(set(real_events.lemma.tolist()))\n", "event_lemmas.sort()\n", "len(event_lemmas)" ] }, { "cell_type": "code", "execution_count": 61, "id": "b578c827-81c6-48b8-be34-5487a0756e2c", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['BOOM', 'Bludger', 'Pompously', 'Scowling', 'Smelting', 'Whispers', 'aback', 'accept', 'ache', 'act']\n" ] } ], "source": [ "print(event_lemmas[:10])" ] }, { "cell_type": "markdown", "id": "bcf20a20-e739-4004-8314-5b09672e2b08", "metadata": {}, "source": [ "While we have 1501 unique words, we only have 1021 unique lemmas. If we were interested in seeing which event words and lemmas appear in Harry Potter, we could now do that. One thing I notice quickly, though, is that some lemmas are capitalized. Let's eliminate these remaining duplicates by lowercasing all lemmas."
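] }, { "cell_type": "markdown", "id": "aside-lowercase-md", "metadata": {}, "source": [ "A compact alternative, sketched below, is a set comprehension: lowercase every lemma and let the set drop the duplicates. Note that sorted() returns an alphabetical ordering, whereas the loop in the next cell preserves the original order of event_lemmas." ] }, { "cell_type": "code", "execution_count": null, "id": "aside-lowercase-code", "metadata": {}, "outputs": [], "source": [ "# Lowercase each lemma; the set drops duplicates, sorted() gives a stable order\n", "lowered_lemmas = sorted({lemma.lower() for lemma in event_lemmas})\n", "len(lowered_lemmas)" ] }, { "cell_type": "markdown", "id": "aside-lowercase-note", "metadata": {}, "source": [ "The length matches the loop-based result in the next cell: 1020."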
] }, { "cell_type": "code", "execution_count": 62, "id": "bb396dd1-abe5-4307-b6c0-cfdddc2a7b34", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "1020\n", "['boom', 'bludger', 'pompously', 'scowling', 'smelting', 'whispers', 'aback', 'accept', 'ache', 'act']\n" ] } ], "source": [ "final_lemmas = []\n", "for lemma in event_lemmas:\n", "    lemma = lemma.lower()\n", "    if lemma not in final_lemmas:\n", "        final_lemmas.append(lemma)\n", "\n", "print(len(final_lemmas))\n", "print(final_lemmas[:10])" ] }, { "cell_type": "markdown", "id": "513694a5-aad7-4c10-8e05-9d29079a554d", "metadata": {}, "source": [ "We eliminated only one duplicate." ] }, { "cell_type": "markdown", "id": "floral-antique", "metadata": {}, "source": [ "## Grabbing Event Sentences" ] }, { "cell_type": "markdown", "id": "hybrid-milwaukee", "metadata": {}, "source": [ "Now that we know how to grab individual event-triggering words, what about the sentences that contain events? To analyze this, we can use the sentence_ID column, which contains a unique number for each sentence." ] }, { "cell_type": "code", "execution_count": 63, "id": "plain-chester", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[9, 12, 13, 13, 13, 13, 13, 14, 15, 15]\n", "['shuddered', 'woke', 'hummed', 'picked', 'gossiped', 'wrestled', 'screaming', 'flutter', 'picked', 'pecked']\n" ] } ], "source": [ "sentences = real_events.sentence_ID.tolist()\n", "event_word_list = real_events.word.tolist()\n", "print(sentences[:10])\n", "print(event_word_list[:10])" ] }, { "cell_type": "markdown", "id": "critical-capital", "metadata": {}, "source": [ "We can see that some sentences appear multiple times. This is because they contain multiple event-triggering words.\n", "\n", "Let's take a look at our initial DataFrame once again." ] }, { "cell_type": "code", "execution_count": 64, "id": "fantastic-provincial", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sentence_IDwordlemmaevent
00Mr.Mr.O
10andandO
20Mrs.Mrs.O
30DursleyDursleyO
40,,O
...............
992516172DudleyDudleyO
992526172thisthisO
992536172summersummerO
992546172........O
992556172\\t438951NaN
\n", "

99256 rows × 4 columns

\n", "
" ], "text/plain": [ " sentence_ID word lemma event\n", "0 0 Mr. Mr. O\n", "1 0 and and O\n", "2 0 Mrs. Mrs. O\n", "3 0 Dursley Dursley O\n", "4 0 , , O\n", "... ... ... ... ...\n", "99251 6172 Dudley Dudley O\n", "99252 6172 this this O\n", "99253 6172 summer summer O\n", "99254 6172 .... .... O\n", "99255 6172 \\t 438951 NaN\n", "\n", "[99256 rows x 4 columns]" ] }, "execution_count": 64, "metadata": {}, "output_type": "execute_result" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "parallel-element", "metadata": {}, "source": [ "Let's say we are interested in grabbing the sentence that contains the first event. We can grab all rows that have a matching sentence_ID." ] }, { "cell_type": "code", "execution_count": 65, "id": "headed-dollar", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
sentence_IDwordlemmaevent
2409ThetheO
2419DursleysDursleysO
2429shudderedshudderEVENT
2439totoO
2449thinkthinkO
2459whatwhatO
2469thetheO
2479neighborsneighborO
2489wouldwouldO
2499saysayO
2509ififO
2519thetheO
2529PottersPottersO
2539arrivedarriveO
2549ininO
2559thetheO
2569streetstreetO
2579..O
\n", "
" ], "text/plain": [ " sentence_ID word lemma event\n", "240 9 The the O\n", "241 9 Dursleys Dursleys O\n", "242 9 shuddered shudder EVENT\n", "243 9 to to O\n", "244 9 think think O\n", "245 9 what what O\n", "246 9 the the O\n", "247 9 neighbors neighbor O\n", "248 9 would would O\n", "249 9 say say O\n", "250 9 if if O\n", "251 9 the the O\n", "252 9 Potters Potters O\n", "253 9 arrived arrive O\n", "254 9 in in O\n", "255 9 the the O\n", "256 9 street street O\n", "257 9 . . O" ] }, "execution_count": 65, "metadata": {}, "output_type": "execute_result" } ], "source": [ "sentence1 = sentences[0]\n", "result = df[df[\"sentence_ID\"] == sentence1]\n", "result" ] }, { "cell_type": "markdown", "id": "747260e2-3d9a-4762-a997-829681692573", "metadata": {}, "source": [ "With this data, we can then grab all the words and reconstruct the sentence." ] }, { "cell_type": "code", "execution_count": 46, "id": "convertible-arrow", "metadata": {}, "outputs": [], "source": [ "words = result.word.tolist()\n", "resentence = \" \".join(words)" ] }, { "cell_type": "code", "execution_count": 47, "id": "ongoing-determination", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The Dursleys shuddered to think what the neighbors would say if the Potters arrived in the street .\n" ] } ], "source": [ "print(resentence)" ] }, { "cell_type": "markdown", "id": "dcaeb98c-844b-4ce1-99cc-5f76be4adb1f", "metadata": {}, "source": [ "## Bringing Everything Together" ] }, { "cell_type": "markdown", "id": "c8de1569-cd7c-400b-9dbd-f3fde0eed7fd", "metadata": {}, "source": [ "Let's now bring together everything we just learned in this chapter and turn it into a function. This function receives the path to a .tokens file. It finds the relevant event rows and then reconstructs the sentence that corresponds to each event word. The output is a list of event-centric dictionaries. 
Each dictionary will have three keys:\n", "\n", "- event_word = the event-triggering word\n", "- event_lemma = the lemma of event_word\n", "- sentence = the sentence that contains the event-triggering word" ] }, { "cell_type": "code", "execution_count": 53, "id": "de4726b0-167c-48ad-ae37-4a6ed18f4649", "metadata": {}, "outputs": [], "source": [ "def grab_event_sentences(file):\n", "    df = pd.read_csv(file, delimiter=\"\\t\")\n", "    real_events = df.loc[df[\"event\"] == \"EVENT\"]\n", "    final_sentences = []\n", "    for sentence_id, word, lemma in zip(real_events.sentence_ID,\n", "                                        real_events.word,\n", "                                        real_events.lemma):\n", "        # Rebuild the full sentence from every token that shares this sentence_ID\n", "        result = df[df[\"sentence_ID\"] == sentence_id]\n", "        resentence = \" \".join(result.word.tolist())\n", "        final_sentences.append({\"event_word\": word,\n", "                                \"event_lemma\": lemma,\n", "                                \"sentence\": resentence})\n", "    return final_sentences\n", "\n", "event_data = grab_event_sentences(\"data/harry_potter/harry_potter.tokens\")" ] }, { "cell_type": "markdown", "id": "15095738-6b42-4671-bca0-ff91fb5fd095", "metadata": {}, "source": [ "Let's take a look at the output now." ] }, { "cell_type": "code", "execution_count": 54, "id": "a51d0ea6-2444-4c44-af10-4050fbbe1ff8", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'event_word': 'shuddered', 'event_lemma': 'shudder', 'sentence': 'The Dursleys shuddered to think what the neighbors would say if the Potters arrived in the street .'}\n" ] } ], "source": [ "print(event_data[0])" ] }, { "cell_type": "markdown", "id": "c824f1bf-7065-4309-a151-fddc5f8e48ae", "metadata": {}, "source": [ "## Creating a .events File" ] }, { "cell_type": "markdown", "id": "de2211e1-39d2-48e9-bd92-4d7ee22b17a0", "metadata": {}, "source": [ "This allows us to analyze the events identified by the BookNLP pipeline a bit more easily. 
Since BookNLP does not produce a .events output file, building this event-centric output ourselves is one way to simulate one. With this data, we can now create a new DataFrame." ] }, { "cell_type": "code", "execution_count": 66, "id": "d0ecc2d9-e4e9-44a2-9337-74e498a7a9d7", "metadata": {}, "outputs": [], "source": [ "new_df = pd.DataFrame(event_data)" ] }, { "cell_type": "code", "execution_count": 67, "id": "393e714b-6e09-4eb4-89fc-7c2ae7fc442f", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
event_wordevent_lemmasentence
0shudderedshudderThe Dursleys shuddered to think what the neigh...
1wokewakeWhen Mr. and Mrs. Dursley woke up on the dull ...
2hummedhumMr. Dursley hummed as he picked out his most b...
3pickedpickMr. Dursley hummed as he picked out his most b...
4gossipedgossipMr. Dursley hummed as he picked out his most b...
............
6024hunghangHarry hung back for a last word with Ron and H...
6025saidsay\\t \\t Hope you have -- er -- a good holiday , ...
6026saidsay\\t Oh , I will , \\t said Harry , and they were...
6027surprisedsurprised\\t Oh , I will , \\t said Harry , and they were...
6028gringrin\\t Oh , I will , \\t said Harry , and they were...
\n", "

6029 rows × 3 columns

\n", "
" ], "text/plain": [ " event_word event_lemma sentence\n", "0 shuddered shudder The Dursleys shuddered to think what the neigh...\n", "1 woke wake When Mr. and Mrs. Dursley woke up on the dull ...\n", "2 hummed hum Mr. Dursley hummed as he picked out his most b...\n", "3 picked pick Mr. Dursley hummed as he picked out his most b...\n", "4 gossiped gossip Mr. Dursley hummed as he picked out his most b...\n", "... ... ... ...\n", "6024 hung hang Harry hung back for a last word with Ron and H...\n", "6025 said say \\t \\t Hope you have -- er -- a good holiday , ...\n", "6026 said say \\t Oh , I will , \\t said Harry , and they were...\n", "6027 surprised surprised \\t Oh , I will , \\t said Harry , and they were...\n", "6028 grin grin \\t Oh , I will , \\t said Harry , and they were...\n", "\n", "[6029 rows x 3 columns]" ] }, "execution_count": 67, "metadata": {}, "output_type": "execute_result" } ], "source": [ "new_df" ] }, { "cell_type": "markdown", "id": "e8e7f25f-5b27-4145-9a33-db6202a81e2c", "metadata": {}, "source": [ "We can also write it out to the same subdirectory as the other BookNLP output files." ] }, { "cell_type": "code", "execution_count": 70, "id": "984b898a-0a19-4f24-a6e2-af0f4652bccc", "metadata": {}, "outputs": [], "source": [ "new_df.to_csv(\"data/harry_potter/harry_potter.events\", index=False)" ] }, { "cell_type": "markdown", "id": "d53e481f-aaa4-4f0e-aa5a-2d3ec07e8e7d", "metadata": {}, "source": [ "And now you have a .events file!" ] }, { "cell_type": "markdown", "id": "c586cbe3-d4f8-4b55-8bc1-074979c21d38", "metadata": {}, "source": [ "## Conclusion" ] }, { "cell_type": "markdown", "id": "8e1914f2-1ba2-4ec7-ac6c-52557eaae17f", "metadata": {}, "source": [ "You should now have a basic understanding of BookNLP and what it can do. While the results will not be perfect, they will give you a great starting point for identifying the salient characters, extracting quotes, and finding the major events within a large work of fiction."
] }, { "cell_type": "code", "execution_count": null, "id": "99000543-38e7-48c5-b15f-5573cbc2ced1", "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 5 }