Module2.4 Text Processing
• Output
• ['Welcome', 'to', 'presidency', 'university', 'bangalore', 'karnataka', '.']
Example 2
• import nltk
• from nltk.tokenize import word_tokenize
• word_tokenize("won't")
• Output
• ['wo', "n't"]
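• Hint: run nltk.download('punkt') once before calling word_tokenize (recent NLTK releases use nltk.download('punkt_tab') instead)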
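For intuition about why "won't" comes out as ['wo', "n't"]: word_tokenize is built on Treebank-style rules that detach clitics such as "n't", "'s" and "'ll" from the preceding word. The sketch below is a simplified pure-Python approximation of that idea, not NLTK's code. The function name simple_treebank_tokenize and both regular expressions are hypothetical, and it ignores quotes, brackets and sentence splitting.

```python
import re

# Hypothetical, simplified Treebank-style rules (NOT NLTK's implementation).
# Rule 1: detach a clitic ('s, 'm, 'd, 'll, 're, 've, n't) from the
# character before it, e.g. "won't" -> "wo n't".
CLITICS = re.compile(r"(?i)([^' ])('s|'m|'d|'ll|'re|'ve|n't)\b")
# Rule 2: put spaces around common sentence punctuation.
PUNCT = re.compile(r"([.,!?;:])")


def simple_treebank_tokenize(text: str) -> list[str]:
    """Split contractions and punctuation, then split on whitespace."""
    text = CLITICS.sub(r"\1 \2", text)
    text = PUNCT.sub(r" \1 ", text)
    return text.split()


print(simple_treebank_tokenize("won't"))
# ['wo', "n't"]
print(simple_treebank_tokenize("I can't allow you to go home early."))
# ['I', 'ca', "n't", 'allow', 'you', 'to', 'go', 'home', 'early', '.']
```

Because the rule detaches "n't" from whatever precedes it, "can't" likewise becomes ['ca', "n't"], which is also what word_tokenize returns for that word.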
WordPunctTokenizer Class
An alternative word tokenizer that splits all punctuation into separate tokens.
• # Split the text into word tokens and separate punctuation tokens
• from nltk.tokenize import WordPunctTokenizer
• tokenizer = WordPunctTokenizer()
• tokenizer.tokenize("I can't allow you to go home early")
• Output
• ['I', 'can', "'", 't', 'allow', 'you', 'to', 'go', 'home', 'early']
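WordPunctTokenizer is a regular-expression tokenizer: NLTK documents it as using the pattern \w+|[^\w\s]+, i.e. a token is either a run of word characters or a run of characters that are neither word characters nor whitespace. The minimal sketch below reproduces that behaviour with Python's re module alone, assuming that pattern; the function name word_punct_tokenize is hypothetical.

```python
import re

# Assumed pattern: runs of word characters (\w+) OR runs of
# non-word, non-space characters ([^\w\s]+). Whitespace is discarded.
WORD_PUNCT = re.compile(r"\w+|[^\w\s]+")


def word_punct_tokenize(text: str) -> list[str]:
    """Return every word run and punctuation run, in order."""
    return WORD_PUNCT.findall(text)


print(word_punct_tokenize(" I can't allow you to go home early"))
# ['I', 'can', "'", 't', 'allow', 'you', 'to', 'go', 'home', 'early']
print(word_punct_tokenize("Hello!!! (really?)"))
# ['Hello', '!!!', '(', 'really', '?)']
```

Note that consecutive punctuation characters such as "!!!" or "?)" stay together as one token, since [^\w\s]+ matches the whole run. Unlike word_tokenize, this approach needs no downloaded model data.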