Commit 4a6b3c6

Update 03_Word2Vec_Example.ipynb
Check token.lower() for membership in mystopwords; otherwise capitalized stopwords such as 'The' pass the filter and are kept.
1 parent 8f80241 commit 4a6b3c6

1 file changed: +1 -1 lines changed

Ch4/03_Word2Vec_Example.ipynb

Lines changed: 1 addition & 1 deletion
@@ -173,7 +173,7 @@
  " mystopwords = set(stopwords.words(\"english\"))\n",
  " def remove_stops_digits(tokens):\n",
  " #Nested function that lowercases, removes stopwords and digits from a list of tokens\n",
- " return [token.lower() for token in tokens if token not in mystopwords and not token.isdigit()\n",
+ " return [token.lower() for token in tokens if token.lower() not in mystopwords and not token.isdigit()\n",
  " and token not in punctuation]\n",
  " #This return statement below uses the above function to process twitter tokenizer output further. \n",
  " return [remove_stops_digits(word_tokenize(text)) for text in texts]\n",
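
The effect of the one-line change can be seen in a small standalone sketch. It is not the notebook's code: the stopword set below is a tiny hypothetical stand-in for set(stopwords.words("english")) (assumed here to contain only lowercase entries), the token list is made up, and the two function names with _old and _new suffixes exist only for this comparison.

```python
from string import punctuation

# Hypothetical stand-in for set(stopwords.words("english")); assumed all lowercase.
mystopwords = {"the", "on"}

def remove_stops_digits_old(tokens):
    # Behaviour before the commit: the raw token is compared with the stopword set,
    # so a capitalized stopword such as "The" is not recognized.
    return [token.lower() for token in tokens
            if token not in mystopwords and not token.isdigit()
            and token not in punctuation]

def remove_stops_digits_new(tokens):
    # Behaviour after the commit: the lowercased token is compared with the stopword set.
    return [token.lower() for token in tokens
            if token.lower() not in mystopwords and not token.isdigit()
            and token not in punctuation]

# Made-up input that mixes a capitalized stopword, a digit string and punctuation.
tokens = ["The", "cat", "sat", "on", "the", "mat", "42", "."]

print(remove_stops_digits_old(tokens))  # ['the', 'cat', 'sat', 'mat']
print(remove_stops_digits_new(tokens))  # ['cat', 'sat', 'mat']
```

With the old check, 'The' survives the filter and is then lowercased to 'the', so a stopword ends up in the output. The new check lowercases before the membership test, which removes it.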
