NLP Assignment Anand1
text5.collocations()
Q2: Define a variable my_sent to be a list of words. Convert my_sent into a string and then split it back
into a list of words.
>>> my_sent = ['Anand', 'Prakash']
>>> a = ' '.join(my_sent)
>>> a
'Anand Prakash'
>>> a.split(' ')
['Anand', 'Prakash']
>>> text9.index('sunset')
629
running = set(sent1)
sorted(list(running))
Q5: What is the difference between the following two lines:
>>> sorted(set([w.lower() for w in text1]))
>>> sorted([w.lower() for w in set(text1)])
>>> sorted(set([w.lower() for w in text1]))
Here every word is first converted to lowercase and the set is built afterwards, so the result contains
no repeated words.
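>>> sorted([w.lower() for w in set(text1)])
Here the set of words is built first, so uppercase and lowercase variants of the same word (such as
'The' and 'the') are kept as separate items. Lowercasing them afterwards can therefore leave repeated
words in the result.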
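For example, the second expression can return repeated items:
[' ', 'a', 'a', 'b', 'k', 'l', 'm', 'n', 's', 's']
whereas the first returns each item only once:
[' ', 'a', 'b', 'k', 'l', 'm', 'n', 's']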
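The difference is easier to see on a tiny input. The sketch below uses a made-up word list (the name words and its contents are assumptions standing in for text1) and evaluates both expressions:

```python
# Hypothetical stand-in for text1: a short list with mixed-case duplicates.
words = ['The', 'whale', 'the', 'Whale', 'sea']

# Lowercase first, then deduplicate: each lowercase form appears once.
lower_then_set = sorted(set([w.lower() for w in words]))

# Deduplicate first, then lowercase: 'The' and 'the' both survive the set,
# so lowercasing afterwards yields 'the' twice.
set_then_lower = sorted([w.lower() for w in set(words)])

print(lower_then_set)   # ['sea', 'the', 'whale']
print(set_then_lower)   # ['sea', 'the', 'the', 'whale', 'whale']
```

The second form is therefore never shorter than the first, and it is longer whenever the text contains case variants of the same word.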
Q6: Write the slice expression that extracts the last two words of text2
text2[-2:]
Q7: Find all the four-letter words in the Chat Corpus (text5). With the help of a frequency distribution
(FreqDist), show these words in decreasing order of frequency
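f = FreqDist(text5)
# Pair each four-letter word with its count, count first so sorting orders by frequency
reversed_pairs = [(count, word) for (word, count) in f.items() if len(word) == 4]
list(reversed(sorted(reversed_pairs)))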
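The same ranking can be sketched without loading the corpus. In NLTK 3, FreqDist is built on collections.Counter, so the example below uses Counter as a stand-in; the word list is invented for illustration and is not taken from text5:

```python
from collections import Counter

# Hypothetical chat-like tokens (assumption: not real text5 data).
words = ['join', 'part', 'join', 'hi', 'part', 'join', 'lol', 'this']

# Count only the four-letter words.
counts = Counter(w for w in words if len(w) == 4)

# most_common() returns (word, count) pairs in decreasing order of count.
ranked = counts.most_common()
print(ranked)   # [('join', 3), ('part', 2), ('this', 1)]
```

With NLTK loaded, FreqDist(w for w in text5 if len(w) == 4).most_common() should give the corresponding ranking for the Chat Corpus.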
Q8: Use a combination of for and if statements to loop over the words of the movie script for Monty
Python and the Holy Grail (text6) and print all the uppercase words
for i in text6:
    if i.isupper():
        print(i)
Q9: Write expressions for finding all words in text6 that meet the following conditions. a. Ending in ize b.
Containing the letter z c. Containing the sequence of letters pt d. All lowercase letters except for an
initial capital (i.e., titlecase)
a. Ending in ize:
Out[1]: []
b. Containing the letter z:
Out[1]:
['zhiv',
'zone',
'frozen',
'amazes',
'zoo',
'zoop',
'zoosh',
'AMAZING',
'ZOOT',
'Zoot',
'Fetchez']
c. Containing the sequence of letters pt:
Out[1]:
['Chapter',
'temptress',
'temptation',
'excepting',
'Thppt',
'Thppppt',
'Thpppt',
'ptoo',
'Thpppppt',
'aptly',
'empty']
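The expressions that produced these outputs are not shown. One plausible set is sketched below on a small invented word list (words is an assumption standing in for text6). Because the output for b includes 'AMAZING' and 'ZOOT', the sketch assumes the test for z ignored case; that is a guess, not something the answer states.

```python
# Hypothetical stand-in for text6 (assumption: a plain list of word strings).
words = ['Zoot', 'AMAZING', 'empty', 'realize', 'Chapter', 'the', 'KNIGHT']

# a. Words ending in "ize".
ends_in_ize = [w for w in words if w.endswith('ize')]

# b. Words containing the letter z (case-insensitive, an assumption).
contains_z = [w for w in words if 'z' in w.lower()]

# c. Words containing the letter sequence "pt".
contains_pt = [w for w in words if 'pt' in w]

# d. Titlecase words: an initial capital followed by lowercase letters.
titlecase = [w for w in words if w.istitle()]

print(ends_in_ize)   # ['realize']
print(contains_z)    # ['Zoot', 'AMAZING', 'realize']
print(contains_pt)   # ['empty', 'Chapter']
print(titlecase)     # ['Zoot', 'Chapter']
```

Replacing words with set(text6) would run the same tests over the distinct words of the script.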
Q10: Define sent to be the list of words ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']. Now write
code to perform the following tasks: a. Print all words beginning with sh. b. Print all words longer than
four characters
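No answer is recorded for this question. A minimal sketch of one way to do both tasks, using only the list given in the question (the variable names other than sent are illustrative):

```python
sent = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']

# a. Words beginning with "sh".
starts_with_sh = [w for w in sent if w.startswith('sh')]

# b. Words longer than four characters.
longer_than_four = [w for w in sent if len(w) > 4]

for w in starts_with_sh:
    print(w)            # she, shells, shore (one per line)

for w in longer_than_four:
    print(w)            # sells, shells, shore (one per line)
```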
Q11: What does the following Python code do? sum([len(w) for w in text1]) Can you use it to work out
the average word length of a text?
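No answer is recorded for this question. The expression adds up the length of every token, giving the total number of characters in the text; dividing that total by the number of tokens gives the average word length. A small sketch, with an invented four-token list standing in for text1:

```python
# Hypothetical stand-in for text1 (assumption: a list of tokens).
text = ['Call', 'me', 'Ishmael', '.']

# Total number of characters across all tokens: 4 + 2 + 7 + 1.
total_chars = sum([len(w) for w in text])

# Average token length: total characters divided by number of tokens.
average_length = total_chars / float(len(text))

print(total_chars)      # 14
print(average_length)   # 3.5
```

Note that punctuation tokens count as words here, which pulls the average down on real texts.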
Q12: Define a function called vocab_size(text) that has a single parameter for the text, and which
returns the vocabulary size of the text.
def vocab_size(text):
    return len(set(text))
Q13: Define a function percent(word, text) that calculates how often a given word occurs in a text and
expresses the result as a percentage.
def percent(word, text):
    total = len(text)
    occurs = text.count(word)
    return 100.0 * occurs / total
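A percent function along these lines can be checked by hand on a short list. The sketch below reuses the eight-word sentence from Q10 as sample data; that choice of input is an assumption made for illustration:

```python
def percent(word, text):
    """Return how often word occurs in text, as a percentage of all tokens."""
    total = len(text)
    occurs = text.count(word)
    # 100.0 forces floating-point division even under Python 2.
    return 100.0 * occurs / total

sample = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']

sea_pct = percent('sea', sample)       # 2 of 8 tokens
shore_pct = percent('shore', sample)   # 1 of 8 tokens

print(sea_pct)     # 25.0
print(shore_pct)   # 12.5
```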