3vc16cs034-k R Tejaswini
An
Internship Report
“SENTIMENT CLASSIFICATION USING N-GRAM
IDF”
COMPUTER SCIENCE AND ENGINEERING
Submitted by
K R TEJASWINI
USN: 3VC16CS034
CERTIFICATE
Certified that the internship work entitled “SENTIMENT
CLASSIFICATION USING N-GRAM IDF” was carried out by K R
Tejaswini, bearing USN: 3VC16CS034, in partial fulfillment of the
requirements for the award of Bachelor of Engineering in Computer Science
and Engineering of the Visvesvaraya Technological University, Belgaum,
during the year 2019-2020. It is certified that all corrections/suggestions
indicated for internal assessment have been incorporated in the report
deposited in the library. The internship report has been approved as it
satisfies the academic requirements in respect of internship work prescribed
for the said degree.
…………………………..
…………………………
Signature Of Internship Co-Ordinator Signature of
HOD
Name of Examiners: Signature:
1)
2)
VEERASHAIVA VIDYAVARDHAKA SANGHA’S
RAO BAHADUR Y MAHABALESWARAPPA ENGINEERING
COLLEGE
CANTONMENT, BALLARI-583104, KARNATAKA
2019 – 2020
DECLARATION
I would like to express my regards and acknowledgement to all those who helped
in making this seminar possible.
I am grateful to the Principal Dr. K Veeresh for providing facilities and untiring
zeal, which constantly inspired me towards the attainment of everlasting
knowledge throughout the course.
Computer Science and Engineering department for the constant guidance for the
Finally, I would like to thank all the staff members of the Computer Science and
Engineering department for their guidance and support. I am also thankful to my
family and friends, who continue to give me their best support.
NAME: K R TEJASWINI
USN: 3VC16CS034
TABLE OF CONTENTS
Chapter No.   Sub Serial No.   Content
1                              About Organization
              1.1              Brief History
              1.2              Company Services
              1.3              Company Products
              1.4              Company Management Directories
              1.5              Company Domains
              1.6              Client and Present Projects
2                              About The Department
              2.1              Introduction
3                              Internship Domain
              3.1              Introduction
              3.2              Technology and Libraries Used
              3.3              Tools Used
4                              Task Performed
              4.1              Technical Activities
              4.2              Non Technical Activities
5                              Internship Outcome
              5.1              Project Title – Sentiment Classification
              5.2              Internship Summary
              5.3              Snapshots
              5.4              Conclusion
                               Bibliography
CHAPTER 1
ABOUT THE ORGANIZATION
After the successful completion of the project, the team started approaching clients
who were in need. The company acquired a couple of good clients and started serving
them. That is how the company started generating revenue. Since the team members were
experts in Java, Python, machine learning and Android, the company simultaneously
started to develop websites and a few of the latest apps needed by its clients.
We not only design websites but also carry out web design and development, adding
various multimedia features, and deliver a unique, low-cost outcome in a very short
time.
IT Service
With information technology as one of our service areas, we provide all the
necessary services related to it.
Server Maintenance
Knowing the importance of server maintenance for website hosting and online
multimedia sites, we run our own separate servers and maintain them ourselves.
Pharma
We provide a product for one of the leading pharmaceutical companies that allows
medicine to be purchased online.
Following the online food ordering trend, we are also developing one more new
product.
IT Training
Knowing the importance of skilled trainers in the industry, we offer industrial training
to interested persons, equipping them with the latest knowledge and skills and
making them confident enough to face the challenges of the IT world.
Online Web and Multimedia
We not only design websites but also carry out online web design and development,
adding various multimedia features, and deliver a unique, low-cost outcome in a very
short time.
Cloud Computing
We offer cloud computing and related services. Cloud computing is the practice of
using a network of remote servers hosted on the Internet to store, manage, and process
data, rather than a local server or a personal computer.
Matrimonial Site
Marriage is one of the main events in our lives. The existing apps in the market
offer slow service, so we are launching a new app.
Project Management
Project management plays a smart role in the development of any business, whether
small scale or large scale.
1.3 Company Products
UI Modernization
We use cutting-edge technologies to create the most powerful visual experience for
end users.
Security Systems
Our security systems are used in apartments, schools, colleges, etc. They provide
security in situations such as fire: if a fire occurs or smoke is generated, the system
detects it, sounds an alarm, and releases water to stop the fire. This security system
is based on IoT. These types of sensors are very common and are either wired directly
to an alarm control panel or found in wireless door or window contacts as
sub-components.
Food Ordering System
Nowadays there are many applications for ordering food. Ours is quite different from
the existing systems. The main aim of this app is to collect food that is about to be
wasted in restaurants, hostels, etc., and deliver it to orphanages. The app uses Java
for the backend, built with Spring Boot microservices, and Android and iOS apps for
communication. Food delivery riders do not usually get any insurance cover or sick
pay, since they are independent contractors. Deliveroo chose to give its riders
insurance in the United Kingdom.
HR Management
The company has a separate HR department for training and recruiting purposes. We
also offer HR management skills to those who need them.
Present Projects:
The company is presently working for Next Power Systems Pvt Ltd to fulfill their
requirements with respect to advancements in network towers. The project
concentrates on providing a 24/7 power supply, switching between AC supply and
battery, fault monitoring and RTC. The project is nearing completion. The
company is also working on embedded products and Android apps in the field of
e-commerce.
CHAPTER 2
ABOUT THE DEPARTMENT
2.1 Introduction
3.1 Introduction
Machine learning is a method of data analysis that automates analytical model building.
It differs from conventional programming: rather than the programmer giving explicit
commands for how to solve a problem, the system learns from data how to solve the
problem on its own. These technologies are widely used in applications including
spelling correction in web search engines, analysis of information from IoT devices,
real-time language translation and much more.
Machine learning algorithms are expected to automate a large number of jobs across the
world in the upcoming years. The algorithms can be broadly classified into supervised,
unsupervised and reinforcement learning, among other categories.
Fig 3.1 Classification of Machine Learning
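The idea of learning a rule from labelled examples, rather than hand-coding it, can be illustrated with a minimal supervised-learning sketch. This is only an illustration under stated assumptions: the nearest-centroid classifier, the function names `fit_centroids` and `predict`, and the toy feature values are hypothetical and are not the method used in the internship project.

```python
# Toy supervised learning: the decision rule is learned from labelled
# examples instead of being hand-written by the programmer.
from collections import defaultdict


def fit_centroids(samples, labels):
    """Return the mean feature vector (centroid) of each class."""
    grouped = defaultdict(list)
    for features, label in zip(samples, labels):
        grouped[label].append(features)
    return {
        label: tuple(sum(col) / len(rows) for col in zip(*rows))
        for label, rows in grouped.items()
    }


def predict(centroids, features):
    """Assign the class whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Hypothetical training data: (positive-word count, negative-word count)
train_x = [(3, 0), (4, 1), (0, 3), (1, 4)]
train_y = ["positive", "positive", "negative", "negative"]

centroids = fit_centroids(train_x, train_y)
print(predict(centroids, (5, 1)))
print(predict(centroids, (0, 2)))
```

Nothing in `predict` mentions what makes a text positive or negative; that knowledge comes entirely from the labelled training data, which is what distinguishes supervised learning from explicit programming.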
3.2.1 Python
3.2.2 NumPy
NumPy is a very popular Python library for large multi-dimensional array and matrix
processing, backed by a large collection of high-level mathematical functions. It is
very useful for fundamental scientific computations in machine learning, particularly
for its linear algebra, Fourier transform, and random number capabilities.
High-end libraries like TensorFlow use NumPy internally for manipulation of tensors.
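The three capabilities mentioned above can be sketched in a few lines. The matrices, signal frequency and random seed below are arbitrary values chosen for illustration and are not taken from the project.

```python
import numpy as np

# Linear algebra: solve the system a @ x = b
a = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])
x = np.linalg.solve(a, b)

# Fourier transform: find the dominant frequency bin of a sine wave
t = np.arange(32) / 32                  # 32 samples over one second
signal = np.sin(2 * np.pi * 4 * t)      # 4 cycles in the sampled window
spectrum = np.abs(np.fft.rfft(signal))
peak = int(np.argmax(spectrum))

# Random numbers: a reproducible 3x2 sample from a normal distribution
rng = np.random.default_rng(0)
sample = rng.normal(size=(3, 2))

print(x, peak, sample.shape)
```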
3.2.3 Flask
3.2.4 Matplotlib
Matplotlib is a very popular Python library for data visualization. Like Pandas, it is
not directly related to machine learning. It comes in particularly handy when a
programmer wants to visualize the patterns in the data. It is a 2D plotting library used
for creating 2D graphs and plots. A module named pyplot makes plotting easy for
programmers, as it provides features to control line styles, font properties, axis
formatting, etc. It provides various kinds of graphs and plots for data visualization,
such as histograms, error charts, bar charts, etc.
3.2.5 TextBlob
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple
API for diving into common natural language processing (NLP) tasks such as part-of-
speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and
more.
3.2.6 PyMySQL
3.2.7 MonkeyLearn
MonkeyLearn provides ready-to-use models for specific text analysis tasks such as
sentiment analysis, keyword extraction, or urgency detection. You just have to upload a
set of texts and tag them manually. After you've fed your model a few examples,
it will start making predictions on its own.
3.2.8 CSV
A CSV file (Comma Separated Values file) is a type of plain text file that uses specific
structuring to arrange tabular data. Because it's a plain text file, it can contain only actual
text data—in other words, printable ASCII or Unicode characters. The structure of a CSV
file is given away by its name.
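As a minimal sketch of how such a file can be read and written, the example below uses Python's standard `csv` module. The column names `review` and `label` and the two sample rows are hypothetical and are not taken from the project's dataset.

```python
import csv
import io

# Hypothetical CSV content; a field containing a comma must be quoted.
raw = '''review,label
"Great app, works well",positive
Crashes on start,negative
'''

# Read each row as a dictionary keyed by the header line.
rows = list(csv.DictReader(io.StringIO(raw)))
for row in rows:
    print(row["label"], "<-", row["review"])

# Write the first row back out; csv.writer re-adds quotes where needed.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["review", "label"])
writer.writerow([rows[0]["review"], rows[0]["label"]])
print(out.getvalue())
```

Using `io.StringIO` keeps the sketch self-contained; with a real file, `open(path, newline="")` would take its place.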
3.3 Tools Used
3.3.1 PYCHARM
A web browser is a computer program that is used to access the web (to view webpages).
A browser can also be used to download files, send and receive email or short messages
across the internet.
1. The DFD is also called a bubble chart. It is a simple graphical formalism that can be
used to represent a system in terms of the input data to the system, the various
processing carried out on this data, and the output data generated by the system.
2. The data flow diagram (DFD) is one of the most important modeling tools. It is used
to model the system components. These components are the system processes, the data
used by each process, the external entities that interact with the system, and the
information flows, including the transformations that are applied as data moves from
input to output.
4.2 Non-Technical Activities
Performance Requirement
1. Dependability
The dependability of a computer system is a property of the system that equates to its
trustworthiness. Trustworthiness essentially means the degree of user confidence that the
system will operate as they expect and that the system will not 'fail' in normal use.
2. Availability
It is the ability of the system to deliver services when requested. The program
executes without errors.
3. Reliability
It is the ability of the system to deliver services as specified. The program is
compatible with all types of operating systems without any failure.
4. Safety
It is the ability of the system to operate without catastrophic failure. This program is
user-friendly and will never harm the system.
5. Security
It is the ability of the system to protect itself against accidental or deliberate intrusion.
CHAPTER 5
INTERNSHIP OUTCOME
In this project, we propose a machine learning based approach using n-gram features and
an automated machine learning tool for sentiment classification. Although n-gram phrases
are considered to be informative and useful compared to single words, using all n-gram
phrases is not a good idea because of the large volume of data and the many useless
features. To address this problem, we utilize n-gram IDF, a theoretical extension of
Inverse Document Frequency (IDF). IDF measures how much information a word provides, but
it cannot handle multiple words.
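To make the starting point concrete, the sketch below computes classical IDF, idf(t) = log(N / df(t)), for single words and counts document frequencies for bigrams. It does not implement the n-gram IDF weighting itself. The helper names `ngrams`, `document_frequency` and `idf`, and the four sample sentences, are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-word phrases in a token list."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def document_frequency(docs, n):
    """Count in how many documents each n-gram occurs at least once."""
    df = Counter()
    for doc in docs:
        df.update(set(ngrams(doc.lower().split(), n)))
    return df


def idf(df, num_docs):
    """Classical IDF: rarer terms receive a higher weight."""
    return {term: math.log(num_docs / count) for term, count in df.items()}


# Hypothetical mini-corpus
docs = [
    "this does not work",
    "it does not work at all",
    "this works fine",
    "does this work",
]

unigram_idf = idf(document_frequency(docs, 1), len(docs))
bigram_df = document_frequency(docs, 2)

print(round(unigram_idf["not"], 3), round(unigram_idf["works"], 3))
print(bigram_df["not work"], bigram_df["does not"])
```

Even on this tiny corpus, the number of distinct bigrams exceeds the number of distinct words, and most bigrams occur in only one document. That illustrates why keeping every n-gram as a feature quickly becomes impractical.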
Snapshots (images not reproduced): Home page, Admin login, User details, Registration,
User login, Profile, Send request
5.4 Conclusion
In this project, we proposed a sentiment classification method using n-gram IDF and
automated machine learning. We applied this method to three datasets: questions and
answers from Stack Overflow, reviews of mobile applications, and comments on JIRA
issue trackers. The good classification performance is not based only on advanced
automated machine learning. N-gram IDF also worked well to capture dataset-specific,
software-engineering-related positive, neutral and negative expressions. Because
n-gram IDF can extract useful sentiment expressions, our method can be applied to
various software engineering datasets.
BIBLIOGRAPHY
[1] Y. Zhang and D. Hou, “Extracting problematic API features from forum discussions,” in
Proceedings of 21st International Conference on Program Comprehension (ICPC), 2013, pp.
142–151.
[3] M. Ortu, B. Adams, G. Destefanis, P. Tourani, M. Marchesi, and R. Tonelli, “Are bullies
more productive?: Empirical study of affectiveness vs. issue fixing time,” in Proceedings of
12th Working Conference on Mining Software Repositories (MSR), 2015, pp. 303–313.