NLP Assignment Anand1

This document contains sample code and questions for an assignment on natural language processing (NLP). It includes code snippets and explanations for tasks like finding collocations in text, converting lists to strings and splitting strings into lists of words, finding word indices, computing word vocabularies across sentences, extracting word slices from text, finding words by length or characteristics, looping through words and applying conditions, defining functions for vocabulary size and word percentage frequency. The assignment covers a range of basic NLP concepts and techniques.

Uploaded by

naman


Assignment 1-NLP

Anand Prakash Singh

Coe2-101503027

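Q1: Find the collocations in text5.

from nltk.book import *  # loads text1 ... text9 and sent1 ... sent9

# Print the collocations (frequently co-occurring word pairs) found in text5
text5.collocations()

# Separately: the distinct words in text5 that begin with 'b', in sorted order
sorted(set([i for i in text5 if i.startswith('b')]))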
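text5.collocations() hides how pairs are scored. As a rough illustration of the idea only, the sketch below ranks adjacent word pairs by pointwise mutual information (PMI) using the standard library. This is a swapped-in technique: NLTK's own collocations() uses a different association measure (a likelihood ratio) and also filters out stopwords and very short words, so its output will differ. The function name pmi_collocations, its min_count and top_n parameters, and the sample tokens are all hypothetical.

```python
import math
from collections import Counter


def pmi_collocations(tokens, min_count=2, top_n=5):
    """Return up to top_n adjacent word pairs ranked by PMI (hypothetical helper)."""
    words = [t.lower() for t in tokens]
    n = len(words)
    unigrams = Counter(words)
    bigrams = Counter(zip(words, words[1:]))
    scores = {}
    for (w1, w2), count in bigrams.items():
        # Pairs seen fewer than min_count times get unreliable, inflated scores
        if count < min_count:
            continue
        p_pair = count / (n - 1)   # probability of this adjacent pair
        p_w1 = unigrams[w1] / n    # probability of each word on its own
        p_w2 = unigrams[w2] / n
        # PMI = log2( P(w1 w2) / (P(w1) * P(w2)) )
        scores[(w1, w2)] = math.log2(p_pair / (p_w1 * p_w2))
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


tokens = "new york is a city and new york has a park and a city".split()
print(pmi_collocations(tokens))  # [('new', 'york'), ('a', 'city')]
```

PMI rewards pairs whose words rarely appear apart, which is why a frequency cut-off is needed: a pair of two rare words seen once would otherwise score highest.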

Q2: Define a variable my_sent to be a list of words. Convert my_sent into a string and then split it back into a list of words.

>>> my_sent = ['Anand', 'Prakash']
>>> a = ' '.join(my_sent)
>>> a
'Anand Prakash'
>>> a.split(' ')
['Anand', 'Prakash']
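The round trip above works because the words contain no spaces. A minimal sketch, with hypothetical helper names to_string and to_words, showing why split() with no argument is usually safer than split(' '): it splits on runs of whitespace and never produces empty strings.

```python
def to_string(words):
    """Join a list of words into one space-separated string."""
    return " ".join(words)


def to_words(text):
    """Split on runs of whitespace; unlike split(' '), no empty strings appear."""
    return text.split()


my_sent = ["Anand", "Prakash"]
s = to_string(my_sent)
print(s)                          # Anand Prakash
print(to_words(s) == my_sent)     # True

# With two spaces between the words, split(' ') keeps an empty string
print("Anand  Prakash".split(" "))  # ['Anand', '', 'Prakash']
print(to_words("Anand  Prakash"))   # ['Anand', 'Prakash']
```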

Q3: Find the index of the word sunset in text9.

>>> text9.index('sunset')
629
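NLTK's Text.index reports only the first occurrence of a word. The sketch below uses a plain Python list, whose index method has the same first-match behaviour and raises ValueError when the word is missing. The helpers find_all and safe_index and the sample tokens are hypothetical.

```python
def find_all(tokens, word):
    """Return every position of word in tokens (hypothetical helper)."""
    return [i for i, tok in enumerate(tokens) if tok == word]


def safe_index(tokens, word):
    """Like list.index, but return None instead of raising ValueError."""
    try:
        return tokens.index(word)
    except ValueError:
        return None


tokens = ["the", "sunset", "was", "red", ".", "sunset", "again"]
print(tokens.index("sunset"))       # 1: only the first occurrence
print(find_all(tokens, "sunset"))   # [1, 5]
print(safe_index(tokens, "dawn"))   # None
```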

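Q4: Compute the vocabulary of the sentences sent1 ... sent8.

# Start with the words of sent1, then add the words of sent2 ... sent8
running = set(sent1)
running.update(sent2, sent3, sent4, sent5, sent6, sent7, sent8)
# Lowercase so that e.g. 'The' and 'the' count as one vocabulary item
running = set([w.lower() for w in running])
sorted(running)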
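The same vocabulary can be built in one set comprehension that lowercases while it takes the union. A minimal sketch; combined_vocab is a hypothetical helper, and the two short sentences are made-up stand-ins for sent1 ... sent8, which need NLTK's book data.

```python
def combined_vocab(*sentences):
    """Sorted, lowercased vocabulary of any number of token lists (hypothetical helper)."""
    return sorted({w.lower() for sent in sentences for w in sent})


# Made-up stand-ins for sent1 ... sent8
s1 = ["The", "cat", "sat", "."]
s2 = ["the", "dog", "sat", "."]
print(combined_vocab(s1, s2))  # ['.', 'cat', 'dog', 'sat', 'the']
```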

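Q5: What is the difference between the following two lines?
>>> sorted(set([w.lower() for w in text1]))
>>> sorted([w.lower() for w in set(text1)])

>>> sorted(set([w.lower() for w in text1]))

Here every word is first converted to lowercase and only then is the set built, so case variants such as 'The' and 'the' collapse into one entry and the result contains no repetitions.

>>> sorted([w.lower() for w in set(text1)])

Here the set is built first, so it still contains both the uppercase and lowercase variants of a word. Lowercasing afterwards turns those variants into identical strings, so the result can contain repeated words. It is never shorter than the first result, and it is longer whenever text1 contains the same word in more than one case.

For example, applying both expressions to a string (which Python iterates character by character) shows the same effect at the character level:

>>> a = 'aaaaaa bbbbbbbbnnnnnnllllKKKKaAAAasSS mmmmmm'
>>> sorted([w.lower() for w in set(a)])
[' ', 'a', 'a', 'b', 'k', 'l', 'm', 'n', 's', 's']
>>> sorted(set([w.lower() for w in a]))
[' ', 'a', 'b', 'k', 'l', 'm', 'n', 's']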
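The character example carries over directly to a list of words. A small sketch on a made-up token list standing in for text1:

```python
words = ["The", "cat", "saw", "the", "Cat", "."]

# Lowercase first, then deduplicate: each word appears once
first = sorted(set([w.lower() for w in words]))
# Deduplicate first, then lowercase: case variants survive as repeats
second = sorted([w.lower() for w in set(words)])

print(first)   # ['.', 'cat', 'saw', 'the']
print(second)  # ['.', 'cat', 'cat', 'saw', 'the', 'the']
```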

Q6: Write the slice expression that extracts the last two words of text2.

In [1]: text2[-2:]

Out[1]: ['THE', 'END']
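A negative start index counts from the end of the list, and slicing clamps out-of-range bounds instead of raising an error, so the same expression is safe on very short token lists. A small illustration on a made-up list:

```python
story = ["Once", "upon", "a", "time", "THE", "END"]

print(story[-2:])    # ['THE', 'END']: from the second-to-last item to the end
print(story[:-2])    # ['Once', 'upon', 'a', 'time']: everything except the last two
print(["END"][-2:])  # ['END']: a one-item list gives one item, no IndexError
```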

Q7: Find all the four-letter words in the Chat Corpus (text5). With the help of a frequency distribution (FreqDist), show these words in decreasing order of frequency.

from nltk import FreqDist

# Keep every occurrence (not a set) so that the counts are meaningful
four_letter = [word for word in text5 if len(word) == 4]
f = FreqDist(four_letter)
# Sort (count, word) pairs from most to least frequent
reversed_pairs = [(v, k) for k, v in f.items()]
sorted(reversed_pairs, reverse=True)
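In NLTK 3, FreqDist is a subclass of collections.Counter, so FreqDist(...).most_common() returns (word, count) pairs in decreasing order of frequency without building reversed pairs. A stdlib-only sketch, with Counter standing in for FreqDist; four_letter_by_freq and the chat tokens are hypothetical.

```python
from collections import Counter


def four_letter_by_freq(tokens):
    """(word, count) pairs for four-letter tokens, most frequent first."""
    counts = Counter(w for w in tokens if len(w) == 4)
    return counts.most_common()


chat = ["lol", "this", "that", "this", "JOIN", "this", "that", "hi"]
print(four_letter_by_freq(chat))  # [('this', 3), ('that', 2), ('JOIN', 1)]
```

One difference from the reversed-pairs approach: when two words have the same count, most_common keeps them in first-seen order, while sorting (count, word) tuples in reverse also orders tied words in reverse alphabetical order.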

Q8: Use a combination of for and if statements to loop over the words of the movie script for Monty Python and the Holy Grail (text6) and print all the uppercase words.

all_uppers = set([w for w in text6 if w.isupper()])
for i in sorted(all_uppers):
    print(i)
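The question asks for explicit for and if statements; the answer above folds them into a comprehension. A minimal sketch with separate statements, keeping each uppercase word once in first-seen order. The helper uppercase_words and the sample tokens are hypothetical.

```python
def uppercase_words(tokens):
    """Distinct all-uppercase tokens in first-seen order (hypothetical helper)."""
    seen = set()
    result = []
    for w in tokens:
        # str.isupper() is False for tokens with no cased letters, such as ':'
        if w.isupper() and w not in seen:
            seen.add(w)
            result.append(w)
    return result


script = ["KING", "ARTHUR", ":", "Whoa", "there", "!", "KING", "I"]
for w in uppercase_words(script):
    print(w)
```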

Q9: Write expressions for finding all words in text6 that meet the following conditions:
a. Ending in ize
b. Containing the letter z
c. Containing the sequence of letters pt
d. All lowercase letters except for an initial capital (i.e., titlecase)

a. Ending in ize

In [1]: [w for w in text6 if len(w) > 4 and w.endswith('ize')]

Out[1]: []

b. Containing the letter z

In [1]: list(set([w for w in text6 if 'z' in w.lower()]))

Out[1]:
['zhiv', 'zone', 'frozen', 'amazes', 'zoo', 'zoop', 'zoosh', 'AMAZING', 'ZOOT', 'Zoot', 'Fetchez']

c. Containing the sequence of letters pt

In [1]: list(set([w for w in text6 if 'pt' in w.lower()]))

Out[1]:
['Chapter', 'temptress', 'temptation', 'excepting', 'Thppt', 'Thppppt', 'Thpppt', 'ptoo', 'Thpppppt', 'aptly', 'empty']

d. All lowercase letters except for an initial capital (i.e., titlecase)

list(set([w for w in text6 if w[0].isupper() and w[1:].islower()]))
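The four conditions can be checked together in one pass over the distinct tokens. A minimal sketch on made-up sample tokens; find_words is hypothetical. Unlike answer a above, it has no length check, so a four-letter word such as size would also match. A one-letter word such as I is excluded from titlecase because ''.islower() is False; Python's str.istitle() would instead accept I, so the two tests are not interchangeable.

```python
def find_words(tokens):
    """Apply the four Q9 conditions to the distinct tokens (hypothetical helper)."""
    vocab = set(tokens)
    return {
        "ize": sorted(w for w in vocab if w.endswith("ize")),
        "z": sorted(w for w in vocab if "z" in w.lower()),
        "pt": sorted(w for w in vocab if "pt" in w.lower()),
        # w[:1] avoids an IndexError if an empty string ever appears
        "titlecase": sorted(w for w in vocab if w[:1].isupper() and w[1:].islower()),
    }


sample = ["realize", "Zoot", "empty", "Chapter", "Thppt", "KING", "I", "frozen"]
for condition, words in find_words(sample).items():
    print(condition, words)
```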

Q10: Define sent to be the list of words ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']. Now write code to perform the following tasks: a. Print all words beginning with sh. b. Print all words longer than four characters.

a. Words beginning with sh

In [1]: sent = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']

In [2]: [w for w in sent if w.startswith('sh')]

Out[2]: ['she', 'shells', 'shore']
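The answer above covers only part a. A minimal sketch of both parts as small functions, so each result can be printed or reused; the function names are hypothetical.

```python
sent = ['she', 'sells', 'sea', 'shells', 'by', 'the', 'sea', 'shore']


def starts_with_sh(words):
    """a. Words beginning with 'sh'."""
    return [w for w in words if w.startswith("sh")]


def longer_than_four(words):
    """b. Words with more than four characters."""
    return [w for w in words if len(w) > 4]


print(starts_with_sh(sent))    # ['she', 'shells', 'shore']
print(longer_than_four(sent))  # ['sells', 'shells', 'shore']
```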

Q11: What does the following Python code do? sum([len(w) for w in text1]) Can you use it to work out the average word length of a text?

It returns the sum of the lengths of all tokens in text1, punctuation tokens included.

Yes. Dividing this total by the number of tokens gives the average word length:

avg_w_len = sum([len(w) for w in text1]) / len(text1)
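Because text1 contains punctuation tokens, the plain average is pulled down by one-character tokens such as '.'. A minimal sketch with a hypothetical alpha_only switch that keeps only alphabetic tokens, and a guard against an empty text; the sample tokens are made up.

```python
def average_word_length(tokens, alpha_only=False):
    """Mean token length; with alpha_only=True, non-alphabetic tokens are skipped."""
    if alpha_only:
        tokens = [t for t in tokens if t.isalpha()]
    if not tokens:
        return 0.0  # avoid ZeroDivisionError on an empty text
    return sum(len(t) for t in tokens) / len(tokens)


sample = ["Call", "me", "Ishmael", "."]
print(average_word_length(sample))                   # 3.5
print(average_word_length(sample, alpha_only=True))  # 13 / 3, about 4.33
```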

Q12: Define a function called vocab_size(text) that has a single parameter for the text, and which returns the vocabulary size of the text.

def vocab_size(text):
    # Lowercase first so that 'The' and 'the' count as one vocabulary item
    distinct = set([w.lower() for w in text])
    return len(distinct)
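Whether to fold case is a design choice that changes the answer. A minimal sketch that makes the choice explicit through a hypothetical ignore_case parameter (the question itself asks for a single parameter, so this is only an illustration), on made-up tokens:

```python
def vocab_size(text, ignore_case=True):
    """Number of distinct tokens; ignore_case is a hypothetical switch."""
    if ignore_case:
        return len({w.lower() for w in text})
    return len(set(text))


tokens = ["The", "cat", "saw", "the", "dog"]
print(vocab_size(tokens))                     # 4
print(vocab_size(tokens, ignore_case=False))  # 5
```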

Q13: Define a function percent(word, text) that calculates how often a given word occurs in a text and expresses the result as a percentage.

def percent(word, text):
    total = len(text)
    occurs = text.count(word)
    # '/' is true division in Python 3, so no float() conversion is needed
    return 100 * occurs / total
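Calling percent() for many words rescans the whole text each time, since list.count is a linear scan. A minimal sketch that computes the percentage of every distinct word in one pass with collections.Counter; percentages and the sample tokens are hypothetical.

```python
from collections import Counter


def percentages(text):
    """Map every distinct word to its share of text, in percent (hypothetical helper)."""
    total = len(text)
    if total == 0:
        return {}
    counts = Counter(text)  # counts all words in a single pass
    return {word: 100 * n / total for word, n in counts.items()}


tokens = ["a", "rose", "is", "a", "rose"]
print(percentages(tokens))  # {'a': 40.0, 'rose': 40.0, 'is': 20.0}
```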
