{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Creating LDA in Python" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the last section, we learned about the key ideas and concepts behind LDA topic modeling. Let's now put those ideas into practice. In this section, we will use the Gensim library to create our topic model and the pyLDAvis library to visualize it. You can install both libraries with `pip` using the following commands:\n", "\n", "```\n", "pip install gensim\n", "```\n", "\n", "and\n", "\n", "```\n", "pip install pyldavis\n", "```\n", "\n", "We will also need NLTK, the Natural Language Toolkit, to get a list of stop words. It can be installed with `pip` as well:\n", "\n", "```\n", "pip install nltk\n", "```\n", "\n", "Once NLTK is installed, you will need to download its stop-word lists, which include English. Run the following in Python:\n", "\n", "```\n", "import nltk\n", "nltk.download('stopwords')\n", "```" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Importing the Required Libraries and Data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With the libraries installed, we can import everything we need." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import pandas as pd  # loading and manipulating tabular data\n", "from nltk.corpus import stopwords  # NLTK stop-word lists\n", "import string  # string constants such as string.punctuation\n", "import gensim.corpora as corpora  # Gensim dictionary and corpus utilities\n", "from gensim.models import LdaModel  # the LDA topic model\n", "import pyLDAvis.gensim_models  # pyLDAvis support for Gensim models\n", "pyLDAvis.enable_notebook()  # render pyLDAvis output inline in the notebook" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Among these imports is Gensim's `LdaModel` class, which we will use to build our LDA model in this notebook. Before we can train the model, however, we must first load and clean our data." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
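<div>\n", "<table border=\"1\" class=\"dataframe\">\n", "  <thead>\n", "    <tr style=\"text-align: right;\">\n", "      <th></th>\n", "      <th>Last</th>\n", "      <th>First</th>\n", "      <th>Description</th>\n", "    </tr>\n", "  </thead>\n", "  <tbody>\n",
"    <tr>\n", "      <th>0</th>\n", "      <td>AARON</td>\n", "      <td>Thabo Simon</td>\n", "      <td>An ANCYL member who was shot and severely inju...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>1</th>\n", "      <td>ABBOTT</td>\n", "      <td>Montaigne</td>\n", "      <td>A member of the SADF who was severely injured ...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>2</th>\n", "      <td>ABRAHAM</td>\n", "      <td>Nzaliseko Christopher</td>\n", "      <td>A COSAS supporter who was kicked and beaten wi...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>3</th>\n", "      <td>ABRAHAMS</td>\n", "      <td>Achmat Fardiel</td>\n", "      <td>Was shot and blinded in one eye by members of ...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>4</th>\n", "      <td>ABRAHAMS</td>\n", "      <td>Annalene Mildred</td>\n", "      <td>Was shot and injured by members of the SAP in ...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>...</th>\n", "      <td>...</td>\n", "      <td>...</td>\n", "      <td>...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>20829</th>\n", "      <td>XUZA</td>\n", "      <td>Mandla</td>\n", "      <td>Was severely injured when he was stoned by a f...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>20830</th>\n", "      <td>YAKA</td>\n", "      <td>Mbangomuni</td>\n", "      <td>An IFP supporter and acting induna who was sho...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>20831</th>\n", "      <td>YALI</td>\n", "      <td>Khayalethu</td>\n", "      <td>Was shot by members of the SAP in Lingelihle, ...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>20832</th>\n", "      <td>YALO</td>\n", "      <td>Bikiwe</td>\n", "      <td>An IFP supporter whose house and possessions w...</td>\n", "    </tr>\n",
"    <tr>\n", "      <th>20833</th>\n", "      <td>YALOLO-BOOYSEN</td>\n", "      <td>Geoffrey Yali</td>\n", "      <td>An ANC supporter and youth activist who was to...</td>\n", "    </tr>\n",
"  </tbody>\n", "</table>\n", "<p>20834 rows × 3 columns</p>\n", "</div>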