Programming Assignment 3: Logistic Regression Instructions

This programming assignment involves sentiment classification using logistic regression. Students are asked to implement logistic regression from scratch using NumPy and SciPy for part 1, and with scikit-learn for part 2. The dataset contains 50,000 movie reviews split into separate training and test sets. Students must preprocess the data, extract features, train a logistic regression classifier on the training set, and evaluate its performance on the test set. The assignment is due by November 15th, 2020.

Uploaded by

Muneeb Nawaz


Programming Assignment 3: Logistic Regression

Instructions:
• The aim of this assignment is to give you initial hands-on experience with a real-life
machine learning application.
• Use separate training and testing data as discussed in class.
• You may only use the Python programming language and Jupyter Notebook.
• There are two parts to this assignment. In Part 1, you may only use numpy, scipy,
pandas and matplotlib; NLTK, scikit-learn and any other machine learning toolkit are
not allowed. In Part 2, however, you must use scikit-learn.
• Carefully read the submission instructions, plagiarism policy and late days policy at
the end of the assignment.
• The deadline to submit this assignment is Sunday, 15th November 2020.

Problem:
The purpose of this assignment is to familiarize you with sentiment classification. By the
end of this assignment you will have your very own “Sentiment Analyzer”. You are given the
Large Movie Review Dataset, which contains separate labelled train and test sets. Your task
is to train a Logistic Regression classifier on the train set and report its accuracy on the
test set.

Dataset:
The core dataset contains 50,000 reviews split evenly into 25k train and 25k test sets. The
overall distribution of labels is balanced (25k pos and 25k neg). There are two top-level
directories [train/, test/] corresponding to the training and test sets. Each contains [pos/,
neg/] directories for the reviews with binary labels positive and negative. Within these
directories, reviews are stored in text files named following the convention [[id]_[rating].txt]
where [id] is a unique id and [rating] is the star rating for that review on a 1-10 scale. For
example, the file [test/pos/200_8.txt] is the text for a positive-labeled test set example with
unique id 200 and star rating 8/10 from IMDb.
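
The directory layout and file naming convention described above can be read with a short helper. This is only a sketch using the standard library: the function names parse_review_filename and load_split, and the root path you pass in, are assumptions made for illustration and are not part of the assignment statement.

```python
from pathlib import Path


def parse_review_filename(filename):
    """Split a name such as '200_8.txt' into (review_id, star_rating)."""
    stem = Path(filename).stem              # e.g. '200_8'
    review_id, rating = stem.split("_")
    return int(review_id), int(rating)


def load_split(root, split):
    """Return (texts, labels) for one split, either 'train' or 'test'.

    The label is taken from the directory name: pos -> 1, neg -> 0.
    """
    texts, labels = [], []
    for label_name, label in (("pos", 1), ("neg", 0)):
        folder = Path(root) / split / label_name
        for path in sorted(folder.glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

For example, parse_review_filename("test/pos/200_8.txt") gives the id 200 and the rating 8, matching the example in the dataset description.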

Preprocessing:
In the preprocessing step you are required to remove stop words, punctuation marks and
other unwanted characters from the reviews, and to convert the reviews to lower case. You
may find Python's string and re (regular expression) modules useful for this purpose. A stop
word list is provided with the assignment statement.
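
One possible shape for this step is sketched below. It assumes that "unwanted characters" covers HTML tags and digits, and it uses a tiny placeholder stop word set; the real list is the one provided with the assignment. The name preprocess is hypothetical.

```python
import re
import string

# Placeholder only: replace with the stop word list provided with the assignment.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "this", "it"}

# Translation table that maps every punctuation mark to a space.
PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text, stop_words=STOP_WORDS):
    """Lower-case a review, strip tags/punctuation/digits, drop stop words."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)     # HTML remnants such as <br />
    text = text.translate(PUNCT_TO_SPACE)    # punctuation marks
    text = re.sub(r"[^a-z\s]", " ", text)    # anything left that is not a letter
    return [tok for tok in text.split() if tok not in stop_words]
```

Returning a list of tokens rather than a string keeps the later word counting simple, but that is a design choice of this sketch, not a requirement.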

Feature Extraction:
In the feature extraction step you will represent each review by the 3 features 𝑥0, 𝑥1, 𝑥2
and 1 class label 𝑦, as shown in the table below:

Feature   Definition                        Comment
𝑥0        1                                 bias term
𝑥1        count(positive words) ∈ review    Positive lexicon is provided
𝑥2        count(negative words) ∈ review    Negative lexicon is provided
𝑦         1 if positive, 0 otherwise        Mentioned in directory name
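
The table can be turned into a feature vector as in the sketch below. The two small word sets are placeholders standing in for the provided lexicons, and extract_features is a hypothetical name; the input is assumed to be a list of already preprocessed tokens.

```python
# Placeholders only: use the positive and negative lexicons provided with the assignment.
POSITIVE_WORDS = {"good", "great", "excellent"}
NEGATIVE_WORDS = {"bad", "boring", "awful"}


def extract_features(tokens, positive_words, negative_words):
    """Return [x0, x1, x2] for one review given as a list of tokens."""
    x1 = sum(1 for tok in tokens if tok in positive_words)   # positive word count
    x2 = sum(1 for tok in tokens if tok in negative_words)   # negative word count
    return [1.0, float(x1), float(x2)]                       # x0 = 1 is the bias term
```

Each occurrence of a lexicon word is counted here, so a review containing "great" twice contributes 2 to 𝑥1.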

Part 1:
Implement Logistic Regression from scratch, keeping in view all the discussions from the
class lectures. Feel free to read Chapter 5 of the book Speech and Language Processing for
in-depth insight into the Logistic Regression classifier. Specifically, you will need to
implement the following:

• Sigmoid function
• Cross-entropy loss function
• Batch Gradient Descent
• Prediction function that predicts whether the label is 0 or 1 for test reviews using the
learned logistic regression model (use a decision threshold of 0.5)
• Evaluation function that calculates classification accuracy and the confusion matrix on
the test set (the expected accuracy on the test set is around 72%)
• Plots with the number of iterations/epochs on the x-axis and training/validation loss on
the y-axis
Use the procedural programming style and comment your code thoroughly (just like
programming assignment 2).
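
The pieces listed above fit together roughly as in the NumPy sketch below. It assumes the standard mean cross-entropy loss, whose gradient with respect to the weights is Xᵀ(σ(Xθ) − y) / m for m training examples. The function names, the weight vector name theta, and the default learning_rate and n_epochs values are all assumptions chosen for illustration; they are not tuned to reach the expected accuracy.

```python
import numpy as np


def sigmoid(z):
    """Logistic function 1 / (1 + exp(-z)), applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def cross_entropy_loss(y, y_hat, eps=1e-12):
    """Mean binary cross-entropy; eps keeps log() away from zero."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat)))


def batch_gradient_descent(X, y, learning_rate=0.01, n_epochs=1000):
    """Fit weights on the full training set; return (theta, loss per epoch)."""
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)
    losses = []
    for _ in range(n_epochs):
        y_hat = sigmoid(X @ theta)
        gradient = X.T @ (y_hat - y) / n_samples    # gradient of the mean loss
        theta -= learning_rate * gradient
        losses.append(cross_entropy_loss(y, sigmoid(X @ theta)))
    return theta, losses


def predict(X, theta, threshold=0.5):
    """Label 1 where the predicted probability reaches the threshold, else 0."""
    return (sigmoid(X @ theta) >= threshold).astype(int)


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label matches the true label."""
    return float(np.mean(y_true == y_pred))


def compute_confusion_matrix(y_true, y_pred):
    """2x2 matrix with rows = true label and columns = predicted label."""
    matrix = np.zeros((2, 2), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[int(t), int(p)] += 1
    return matrix
```

The list of losses returned by batch_gradient_descent has one entry per epoch, so it can be passed directly to matplotlib's plt.plot to draw the training loss curve; a validation curve would need the same loss computed on held-out data inside the loop.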

Part 2:
Use scikit-learn’s LogisticRegression implementation to train and test a logistic regression
classifier on the provided dataset. Use scikit-learn’s accuracy_score function to calculate
the accuracy and its confusion_matrix function to calculate the confusion matrix on the test
set.
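
A minimal sketch of Part 2 is shown below. It assumes the feature matrices hold only the two count features 𝑥1 and 𝑥2, because scikit-learn's LogisticRegression fits its own intercept by default; the wrapper name train_and_evaluate is hypothetical, and the model is left at its default settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix


def train_and_evaluate(X_train, y_train, X_test, y_test):
    """Fit on the training set, then return (accuracy, confusion matrix) on the test set."""
    # Default settings (L2-regularized, intercept fitted by the model);
    # these are assumptions of this sketch, not requirements of the assignment.
    model = LogisticRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    return accuracy_score(y_test, y_pred), confusion_matrix(y_test, y_pred)
```

Note that the default model applies regularization, so its learned weights will generally differ somewhat from an unregularized Part 1 implementation even on the same features.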

Submission Instructions:
Submit your code as both a notebook file (.ipynb) and a Python script (.py) on LMS. Both
files should be named with your roll number. If you don’t know how to save .ipynb as .py see this.
Failing to submit either of them will result in a reduction of marks.

Plagiarism Policy:
The code MUST be written independently. Any plagiarism or copying of work from others or
from the internet will be immediately referred to the DC. If you are confused about what
constitutes plagiarism, it is YOUR responsibility to consult with the instructor or the TA in a
timely manner. No “after the fact” negotiations will be possible. The only way to guarantee
that you do not lose marks is “DO NOT LOOK AT ANYONE ELSE'S CODE NOR DISCUSS IT
WITH THEM”.

Late Days Policy:

The deadline of the assignment is final. However, to accommodate last-minute issues, there
is a late submission policy: you can submit your assignment up to 3 days after the deadline,
with a 25% deduction for each late day.
