# Top2Vec in Python

Dr. W.J.B. Mattingly
Smithsonian Data Science Lab and United States Holocaust Memorial Museum
February 2021

## Importing the Required Libraries

import pandas as pd

df = pd.read_json("data/vol7.json")
df

|  | names | descriptions |
|---|---|---|
| 0 | AARON, Thabo Simon | An ANCYL member who was shot and severely inju... |
| 1 | ABBOTT, Montaigne | A member of the SADF who was severely injured ... |
| 2 | ABDUL WAHAB, Zakier | A member of QIBLA who disappeared in September... |
| 3 | ABRAHAM, Nzaliseko Christopher | A COSAS supporter who was kicked and beaten wi... |
| 4 | ABRAHAMS, Achmat Fardiel | Was shot and blinded in one eye by members of ... |
| ... | ... | ... |
| 21742 | ZWENI, Ernest | One of two South African Police (SAP) members ... |
| 21743 | ZWENI, Lebuti | An ANC supporter who was shot dead by a named ... |
| 21744 | ZWENI, Louis | Was shot dead in Tokoza, Transvaal, on 22 May ... |
| 21745 | ZWENI, Mpantesa William | His home was lost in an arson attack by Witdoe... |
| 21746 | ZWENI, Xolile Milton | A Transkei Defence Force (TDF) soldier who was... |

21747 rows × 2 columns

docs = df.descriptions.tolist()
docs[0]

"An ANCYL member who was shot and severely injured by SAP members at Lephoi, Bethulie, Orange Free State (OFS) on 17 April 1991. Police opened fire on a gathering at an ANC supporter's house following a dispute between two neighbours, one of whom was linked to the ANC and the other to the SAP and a councillor."
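Top2Vec takes a list of strings, so it can be worth checking the descriptions before training. The following is a minimal sketch under stated assumptions: `clean_docs` is a hypothetical helper (not part of pandas or Top2Vec), and the short `sample` list stands in for `df.descriptions.tolist()`. Whether this dataset actually contains missing or empty descriptions is not shown above.

```python
def clean_docs(raw_docs):
    """Return (kept_indices, kept_docs) for entries that are non-empty strings.

    Hypothetical helper: keeping the original positions makes it possible
    to map results back to the DataFrame rows later.
    """
    kept_indices = []
    kept_docs = []
    for i, doc in enumerate(raw_docs):
        # Skip None, NaN (a float), and whitespace-only strings.
        if isinstance(doc, str) and doc.strip():
            kept_indices.append(i)
            kept_docs.append(doc.strip())
    return kept_indices, kept_docs


# Stand-in for df.descriptions.tolist(); the None and blank entries are
# illustrative and not taken from the real data.
sample = [
    "An ANCYL member who was shot and severely inju...",
    None,
    "   ",
    "A member of the SADF who was severely injured ...",
]

kept_indices, kept_docs = clean_docs(sample)
print(kept_indices)
print(len(kept_docs))
```

If any entries are dropped, the document ids that Top2Vec reports will refer to positions in the cleaned list, so `kept_indices` is needed to get back to the original rows.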

from top2vec import Top2Vec

model = Top2Vec(docs)

2022-05-03 13:43:46,332 - top2vec - INFO - Pre-processing documents for training
2022-05-03 13:43:47,980 - top2vec - INFO - Creating joint document/word embedding
2022-05-03 13:44:24,486 - top2vec - INFO - Creating lower dimension embedding of documents
2022-05-03 13:44:32,712 - top2vec - INFO - Finding dense areas of documents
2022-05-03 13:44:35,024 - top2vec - INFO - Finding topics

topic_sizes, topic_nums = model.get_topic_sizes()
print(topic_sizes)

[705 336 231 230 215 208 206 204 203 199 199 199 188 186 178 175 171 166
164 164 157 154 151 147 143 138 133 132 131 130 126 124 124 122 122 119
119 119 118 118 115 114 113 112 112 111 110 109 109 107 106 103 103 102
101 100 100  97  95  95  95  94  93  92  92  91  91  90  90  89  89  88
88  87  87  87  87  86  85  83  83  83  83  82  82  82  81  81  81  80
80  80  79  78  78  78  77  77  77  76  76  76  76  76  75  75  75  75
74  74  73  73  72  72  71  70  70  68  68  68  68  68  66  65  65  65
65  65  64  64  64  63  62  62  62  61  61  61  61  61  60  60  59  59
59  58  58  58  58  57  57  57  56  56  56  56  55  55  55  55  55  55
55  54  54  54  54  54  54  53  53  53  52  52  52  52  51  51  51  51
50  50  50  49  49  49  49  49  49  49  49  49  49  49  48  48  48  48
48  48  48  48  47  47  47  47  46  46  45  45  45  45  44  44  44  44
44  44  44  44  44  44  44  43  43  43  42  41  40  40  40  39  39  39
39  39  39  39  38  38  37  37  37  37  37  36  36  36  35  34  34  34
33  33  33  33  33  33  32  32  31  31  31  31  30  30  29  29  29  29
29  28  28  27  27  27  27  27  27  26  25  25  24  24  23  22  21  20
16]

print(topic_nums)

[  0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35
36  37  38  39  40  41  42  43  44  45  46  47  48  49  50  51  52  53
54  55  56  57  58  59  60  61  62  63  64  65  66  67  68  69  70  71
72  73  74  75  76  77  78  79  80  81  82  83  84  85  86  87  88  89
90  91  92  93  94  95  96  97  98  99 100 101 102 103 104 105 106 107
108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125
126 127 128 129 130 131 132 133 134 135 136 137 138 139 140 141 142 143
144 145 146 147 148 149 150 151 152 153 154 155 156 157 158 159 160 161
162 163 164 165 166 167 168 169 170 171 172 173 174 175 176 177 178 179
180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197
198 199 200 201 202 203 204 205 206 207 208 209 210 211 212 213 214 215
216 217 218 219 220 221 222 223 224 225 226 227 228 229 230 231 232 233
234 235 236 237 238 239 240 241 242 243 244 245 246 247 248 249 250 251
252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269
270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287
288]
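The two arrays are parallel: each entry in `topic_sizes` is the number of documents assigned to the topic at the same position in `topic_nums`. As a rough sketch, the values can be paired up and expressed as a share of the corpus. The plain lists below stand in for the NumPy arrays returned by `get_topic_sizes()` and contain only the first five values printed above; `total_docs` is the row count of the DataFrame.

```python
# First five entries copied from the printed output above.
topic_sizes = [705, 336, 231, 230, 215]
topic_nums = [0, 1, 2, 3, 4]
total_docs = 21747  # rows in the DataFrame

# Pair each topic number with its size.
size_by_topic = dict(zip(topic_nums, topic_sizes))

# Share of the whole corpus covered by each of these topics.
share_by_topic = {num: size / total_docs for num, size in size_by_topic.items()}

for num, size in size_by_topic.items():
    print(f"Topic {num}: {size} documents ({share_by_topic[num]:.1%} of corpus)")
```

Because the topics are already sorted from largest to smallest, topic 0 is always the most populated one.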

# Retrieve the words and scores for the single largest topic.
topic_words, word_scores, topic_nums = model.get_topics(1)

for words, scores, num in zip(topic_words, word_scores, topic_nums):
    print(num)
    print(f"Words: {words}")

0
Words: ['sonkombo' 'ndwedwe' 'chimora' 'arson' 'ekuthuleni' 'whose' 'down'
'burnt' 'intense' 'possessions' 'house' 'durban' 'kwazulu' 'inanda' 'red'
'attacks' 'had' 'destroyed' 'continuing' 'stronghold' 'green' 'near'
'bhambayi' 'ongoing' 'eshowe' 'supporters' 'home' 'her' 'conflict'
'intensifying' 'area' 'enseleni' 'run' 'ifp' 'kwamashu' 'lost'
'magabheni' 'she' 'between' 'march' 'umlazi' 'nqutu' 'see' 'factions'
'political' 'richmond' 'lindelani' 'stolen' 'homestead' 'richards']
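The word list for topic 0 mixes place names ('sonkombo', 'ndwedwe') with function words ('whose', 'down', 'had'). One possible way to make such a list easier to read is to filter it afterwards. This is only a sketch: the `STOPWORDS` set is a small hand-picked assumption, not something Top2Vec provides, and `words` holds just the first twelve words from the output above.

```python
# First twelve words of topic 0, copied from the output above.
words = [
    "sonkombo", "ndwedwe", "chimora", "arson", "ekuthuleni", "whose",
    "down", "burnt", "intense", "possessions", "house", "durban",
]

# Hand-picked function words to hide when reading a topic (an assumption;
# adjust to taste or swap in a fuller stopword list).
STOPWORDS = {"whose", "down", "had", "her", "she", "see", "between", "near"}

# Keep the original order, which reflects similarity to the topic vector.
filtered = [w for w in words if w not in STOPWORDS]
print(filtered)
```

Filtering after the fact only changes what is displayed; it does not alter the trained model or the topics themselves.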

documents, document_scores, document_ids = model.search_documents_by_topic(topic_num=0, num_docs=10)
for doc, score, doc_id in zip(documents, document_scores, document_ids):
    print(f"Document: {doc_id}, Score: {score}")
    print("-----------")
    print(doc)
    print("-----------")
    print()

Document: 4729, Score: 0.9770890474319458
-----------
An ANC supporter whose house was burnt down in an arson attack in February 1994 at Sonkombo, Ndwedwe, KwaZulu, near Durban, in intense political conflict in the area. See Sonkombo arson attacks.
-----------

Document: 1025, Score: 0.9690396785736084
-----------
An IFP supporter whose home was burnt down at Sonkombo, Ndwedwe, KwaZulu, near Durban, on 16 March 1994 in intense conflict between IFP and ANC supporters. See Sonkombo arson attacks.
-----------

Document: 4508, Score: 0.9685753583908081
-----------
An ANC supporter whose house was burnt down by IFP supporters on 20 March 1994 at Sonkombo, Ndwedwe, KwaZulu, near Durban, in intense political conflict in the area. See Sonkombo arson attacks.
-----------

Document: 4644, Score: 0.9668073654174805
-----------
An ANC supporter whose house was burnt down by IFP supporters on 16 March 1994 at Sonkombo, Ndwedwe, KwaZulu, near Durban, in intense political conflict in the area. See Sonkombo arson attacks.
-----------

Document: 4600, Score: 0.9648160934448242
-----------
An ANC supporter whose house was burnt down by IFP supporters on 16 March 1994 at Sonkombo, Ndwedwe, KwaZulu, near Durban, in intense political conflict in the area. See Sonkombo arson attacks.
-----------

Document: 5624, Score: 0.9644435048103333
-----------
An ANC supporter who had his home burnt down by IFP supporters on 16 March 1994 at Sonkombo, Ndwedwe, KwaZulu, near Durban, in intense political conflict in the area. See Sonkombo arson attacks.
-----------

Document: 4707, Score: 0.9638468027114868
-----------
An ANC supporter whose house was burnt down by IFP supporters on 16 March 1994 at Sonkombo, Ndwedwe, KwaZulu, near Durban, in intense political conflict in the area. See Sonkombo arson attacks.
-----------

Document: 504, Score: 0.9632097482681274
-----------
Had her house burnt down by IFP supporters on 16 March 1994 at Sonkombo, Ndwedwe, KwaZulu, near Durban, in intense political conflict in the area. See Sonkombo arson attacks.
-----------

Document: 740, Score: 0.9629842042922974
-----------
His house was burnt down by IFP supporters on 20 March 1994 at Sonkombo, Ndwedwe, KwaZulu, near Durban, in intense political conflict in the area. See Sonkombo arson attacks.
-----------

Document: 2170, Score: 0.9621144533157349
-----------
An ANC supporter whose house was burnt down by IFP supporters on 20 March 1994 at Sonkombo, Ndwedwe, KwaZulu, near Durban. See Sonkombo arson attacks.
-----------
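When no custom ids are passed to `Top2Vec`, the document ids it returns are positions in the original `docs` list, which here match the DataFrame index. That makes it possible to look up the name attached to each description. The sketch below assumes that default behaviour; `names_for` is a hypothetical helper, the `names` list holds only the first five rows shown earlier, and the ids are illustrative rather than real search results.

```python
# First five names from the DataFrame shown earlier (stand-in for df.names).
names = [
    "AARON, Thabo Simon",
    "ABBOTT, Montaigne",
    "ABDUL WAHAB, Zakier",
    "ABRAHAM, Nzaliseko Christopher",
    "ABRAHAMS, Achmat Fardiel",
]

# Illustrative ids only; real ids come from search_documents_by_topic.
document_ids = [3, 0, 4]


def names_for(document_ids, names):
    """Pair each document id with the name stored at that position."""
    # int() also handles NumPy integer ids.
    return [(int(doc_id), names[int(doc_id)]) for doc_id in document_ids]


matched = names_for(document_ids, names)
for doc_id, name in matched:
    print(f"Document {doc_id}: {name}")
```

With the full DataFrame, the same lookup could be written as `df.names[doc_id]`, provided the rows were not filtered or reordered before training.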