Fraud Detection in Python Chapter 4
Charlotte Werger
Data Scientist
DataCamp Fraud Detection in Python
Let's practice!
1. Tokenization
Go from this...
To this...
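import string
from nltk.corpus import stopwords

# Characters and words to exclude
exclude = set(string.punctuation)
stop = set(stopwords.words('english'))
# Remove stopwords and digits (text is assumed to be a list of tokens)
stop_free = " ".join([word for word in text
                      if (word not in stop) and (not word.isdigit())])
# Remove punctuation, character by character
punc_free = ''.join(char for char in stop_free
                    if char not in exclude)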
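The stop-word and punctuation filtering above can be sketched end to end on a single string. This is a minimal sketch, not the course's exact function: the helper name `clean_text`, the sample sentence, and the small hand-written `STOP_WORDS` set (standing in for NLTK's English stop-word list so the snippet runs without any downloads) are all assumptions.

```python
import string

# Assumption: a tiny stand-in for stopwords.words('english') from NLTK.
STOP_WORDS = {"i", "am", "the", "to", "a", "of", "and", "in", "on", "is"}
PUNCTUATION = set(string.punctuation)


def clean_text(text):
    """Lowercase, split on whitespace, drop stop words and digits,
    then strip punctuation characters and return the tokens."""
    tokens = text.lower().split()
    # Keep tokens that are neither stop words nor purely numeric.
    stop_free = " ".join(
        tok for tok in tokens if tok not in STOP_WORDS and not tok.isdigit()
    )
    # Remove punctuation one character at a time.
    punc_free = "".join(ch for ch in stop_free if ch not in PUNCTUATION)
    return punc_free.split()


# Hypothetical sample sentence, not taken from the Enron data.
sample = "I am going to the trading floor on 25 May, curious to hear more!"
print(clean_text(sample))
```

Splitting before filtering matters here: iterating over a raw string would compare single characters against the stop-word set rather than whole words.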
# Stem words
from nltk.stem.porter import PorterStemmer
porter = PorterStemmer()
# normalized: the cleaned text string from the preceding steps (not defined in this excerpt)
cleaned_text = " ".join(porter.stem(token) for token in normalized.split())
print(cleaned_text)
['philip','going','street','curious','hear','perspective','may','wish',
'offer','trading','floor','enron','stock','lower','joined','company',
'business','school','imagine','quite','happy','people','day','relate',
'somewhat','stock','around','fact','broke','day','ago','knowing',
'imagine','letting','event','get','much','taken','similar',
'problem','hope','everything','else','going','well','family','knee',
'surgery','yet','give','call','chance','later']
Let's practice!
Topic modelling
# Create corpus
corpus = [dictionary.doc2bow(text) for text in cleaned_emails]
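To make the `(token_id, count)` structure of such a corpus concrete, here is a plain-Python sketch of a bag-of-words conversion. It is a stand-in for illustration, not gensim's implementation: `build_vocab`, `to_bow`, and the two toy documents are hypothetical, and the integer ids assigned here need not match those a gensim `Dictionary` would assign.

```python
from collections import Counter


def build_vocab(docs):
    """Map each distinct token to an integer id, in order of first appearance."""
    vocab = {}
    for doc in docs:
        for tok in doc:
            vocab.setdefault(tok, len(vocab))
    return vocab


def to_bow(doc, vocab):
    """Return a sorted list of (token_id, count) pairs for tokens in the vocabulary."""
    counts = Counter(tok for tok in doc if tok in vocab)
    return sorted((vocab[tok], n) for tok, n in counts.items())


# Hypothetical, already-cleaned documents (each one a list of tokens).
toy_emails = [
    ["enron", "stock", "lower", "stock"],
    ["stock", "trading", "floor"],
]
vocab = build_vocab(toy_emails)
toy_corpus = [to_bow(doc, vocab) for doc in toy_emails]
print(toy_corpus)
```

Each document becomes a sparse list of counts, so word order is discarded and only word frequency per document is kept.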
# Print the three topics from the model with top words
topics = ldamodel.print_topics(num_words=4)
for topic in topics:
    print(topic)
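Each item printed by this loop is a `(topic_id, description)` pair, where the description is a string of weighted words such as `0.030*"enron" + ...`. A small sketch of turning such a string into `(word, weight)` pairs follows. The helper `parse_topic` and the example tuple are hypothetical, and the weights are made up for illustration rather than taken from a fitted model.

```python
import re

# Matches fragments of the form 0.030*"enron"
TERM_PATTERN = re.compile(r'([0-9.]+)\*"([^"]+)"')


def parse_topic(topic):
    """Split a (topic_id, description) pair into the id and a list of (word, weight)."""
    topic_id, description = topic
    terms = [
        (word, float(weight))
        for weight, word in TERM_PATTERN.findall(description)
    ]
    return topic_id, terms


# Hypothetical topic tuple with made-up weights, not real model output.
example_topic = (0, '0.030*"enron" + 0.020*"stock" + 0.010*"company" + 0.005*"trading"')
topic_id, terms = parse_topic(example_topic)
print(topic_id, terms)
```

Having the words and weights as Python values makes it easier to compare topics or to flag documents dominated by a suspicious topic.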
Let's practice!
pyLDAvis.display(lda_display)
Let's practice!
Fraud Detection in Python Recap