Paper 09
Indian Journal of Science and Technology, Vol 10(17), DOI: 10.17485/ijst/2017/v10i17/106493, May 2017 ISSN (Online) : 0974-5645
Abstract
Background/Objectives: Supervised techniques use human-generated summaries to select features and parameters for
summarization. The main problem with this approach is the reliability of a summary based on human-generated parameters
and features; many studies have reported conflicts among the summaries generated. Because large-scale datasets are so
diverse, supervised summarization also fails to meet the requirements. Big data analytics for text datasets likewise
recommends unsupervised techniques over supervised ones. Unsupervised summarization systems find representative
sentences in a large text dataset. Methods/Statistical Analysis: A co-selection based evaluation measure is applied to
evaluate the proposed research work. The values of recall, precision, f-measure and a similarity measure are determined
to conclude the research outcome for the respective objective. Findings: Algorithms such as KMeans, MiniBatchKMeans,
and graph based summarization techniques are discussed in technical detail. Applying Graph Based Text Summarization
techniques to large scale review and feedback data improved on previously published results based on sentence scoring
using TF and TF-IDF. The graph based sentence scoring method is more efficient than the other unsupervised learning
techniques applied for extractive text summarization. Application/Improvements: Executing the graph based algorithm in
Spark's GraphX programming environment will save execution time for this type of large scale review and feedback
dataset, which is considered a Big Data problem.
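The co-selection measures named in the abstract can be illustrated with a short sketch. It assumes that a system summary and a human reference summary are each given as a collection of selected sentence identifiers; the function name `co_selection_scores` and this input format are hypothetical and are not taken from the paper. The similarity measure mentioned in the abstract is not specified in this section, so it is not covered here.

```python
def co_selection_scores(system_sentences, reference_sentences):
    """Return (precision, recall, f_measure) for two sets of selected sentences.

    Assumption: both arguments are iterables of hashable sentence identifiers
    (for example sentence indices), one chosen by the summarizer and one
    chosen by a human annotator.
    """
    system = set(system_sentences)
    reference = set(reference_sentences)
    overlap = len(system & reference)  # sentences selected by both

    # Precision: share of system-selected sentences that are in the reference.
    precision = overlap / len(system) if system else 0.0
    # Recall: share of reference sentences that the system also selected.
    recall = overlap / len(reference) if reference else 0.0

    # F-measure: harmonic mean of precision and recall.
    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


if __name__ == "__main__":
    # Hypothetical example: the system picked sentences 0, 2, 5, 7 and the
    # reference summary contains sentences 0, 1, 2, 3, 4.
    print(co_selection_scores([0, 2, 5, 7], [0, 1, 2, 3, 4]))
```

In this hypothetical example two of the four system sentences appear in the five-sentence reference, so precision is 2/4 and recall is 2/5.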
tion systems based on the approach in which a number of documents are selected for analyzing the dataset.
Multi-document summarization is more complex than single-document summarization, but it is recommended for
review and feedback summarization. Generic and Query-Based Summarization Systems [7–9] categorize
summarization techniques into specific-request-based and generic summarization. A generic summary is based on
the main topics covered in the text, whereas query-based summarization specifies the request or question that
the summary must address. Another categorization of text summarization systems is based on Supervised and
Unsupervised Techniques [10–12]. Supervised techniques use datasets that are annotated by humans before the
algorithm is applied, whereas unsupervised techniques do not use such human annotations. Unsupervised
techniques use the linguistic and statistical information generated from the dataset for text summarization.
Another categorization of text summarization systems is based on Surface-Level and Deeper-Level
Summarization Systems [13]. Surface-Level and Deeper-Level Summarization Systems summarize the text
according to the purpose of the summary. Generally, this type of summarization is used for news articles,
scientific text, etc.

Extractive Text Summarization selects representative sentences from an available large scale text dataset.
These sentences are selected using different methods. One method is based on Surface Level Approaches [2,3], in
which sentences are selected based on the most frequent words. This type of method gives good results for
query and purpose based summarization, but it is not appropriate for summarizing reviews and feedback.
Another method is based on Statistical Approaches [2,14], which produce the summary based on the relevance of
information extracted from dictionaries. To find relevance information about the selected text, classifier
algorithms such as the Bayesian classifier are used. Another type of method is based on Text Connectivity
Approaches [2,4]. In this approach, the summary is generated from the connectivity of sentences and text, based
on lexical chains and Rhetorical Structure. Another type of method is based on Graph Based Approaches [4]. The
nodes of a directed graph represent the sentences of the text, and the edges represent the similarity between
these sentences. The summary is generated by selecting the sentences with the highest associated similarity.
Another method is based on Machine Learning Based Approaches [13]. Machine learning based summarization
algorithms use techniques such as Naïve-Bayes, Decision Trees, Hidden Markov Model, Log-linear Models, and
Neural Networks. Algebraic Approaches [2–4] such as Latent Semantic Analysis (LSA), Non-negative Matrix
Factorization (NMF), and Semi-discrete Matrix Decomposition (SDD) are also used for text summarization.

The remaining sections of the paper are organized as follows. Section 2 discusses sentence scoring based text
summarization techniques. Section 3 discusses unsupervised learning based text summarization techniques.
Section 4 discusses evaluation methodologies for the generated summary. Section 5 presents the proposed
research work, its techniques and approaches. Section 6 describes the experimental study for the proposed
research work. Section 7 discusses performance analysis based on the evaluation methodologies. Section 8
presents the conclusion and future extensions of the proposed research work.

2. Sentence Scoring Based Text Summarization

Sentence scoring methods discussed in many research papers emphasize word scores, sentence scores and graphs,
where word and sentence scores are computed from the frequencies of words in the given text dataset. Graph
based sentence scoring is based on the relationships between sentences. Many studies focus on analyzing large
scale text available or written in print media. The research work in this paper focuses on analyzing the large
amount of data extracted from the web in the form of reviews and feedback about an enterprise or organization
and its products and services.

3. Unsupervised Learning Based Text Summarization

Supervised techniques use human-generated summaries to select features and parameters for summarization. The
main problem with this approach is the reliability of a summary based on human-generated parameters and
features. Many studies have shown conflicts among the summaries generated. Because large scale datasets are so
diverse, supervised summarization is also a poor fit. Studies on big data analytics for text datasets also
recommend unsupervised techniques, and their adoption, over supervised techniques [15–17]. Unsupervised
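The graph based approach outlined in the introduction (sentences as nodes, similarity as edges, and selection of the sentences with the highest associated similarity) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: similarity is taken to be cosine similarity over raw word counts, which is symmetric, so the graph is treated as undirected although the text describes a directed graph. The names `tokenize`, `cosine_similarity` and `graph_summarize` and the parameter `top_k` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(sentence):
    """Lowercase a sentence and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine_similarity(counts_a, counts_b):
    """Cosine similarity between two word-count Counters (assumed edge weight)."""
    shared = set(counts_a) & set(counts_b)
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def graph_summarize(sentences, top_k=2):
    """Pick the top_k sentences whose edges carry the most total similarity."""
    vectors = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    scores = [0.0] * n

    # Each sentence is a node; each pair of sentences is joined by an edge
    # weighted with their similarity. A node's score is the sum of the
    # weights of its edges.
    for i in range(n):
        for j in range(i + 1, n):
            weight = cosine_similarity(vectors[i], vectors[j])
            scores[i] += weight
            scores[j] += weight

    # Rank nodes by score, keep the best top_k, and return them in their
    # original document order.
    ranked = sorted(range(n), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:top_k])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    # Hypothetical review sentences, used only to exercise the sketch.
    reviews = [
        "The battery life is great",
        "Battery life is really great",
        "Delivery was late",
        "Great battery life overall",
    ]
    print(graph_summarize(reviews, top_k=3))
```

Summing edge weights is the simplest way to express "highest associated similarity"; an iterative ranking over the same graph would be a different design choice and is not implied by the text above.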
2 Vol 10 (17) | May 2017 | www.indjst.org Indian Journal of Science and Technology
Jai Prakash Verma and Atul Patel
Evaluation of Unsupervised Learning based Extractive Text Summarization Technique for Large Scale Review and Feedback
Data
tive text summarization improve the recall, precision, and f-measure. MiniBatchKMeans improves on the results
of K-Means. Graph Based Text Summarization improves the results for recall, precision, and f-measure. Here we
are comparing unsupervised learning techniques with sentence scoring methods for extractive text
summarization.

8. Conclusion

An unsupervised learning based extractive text summarization system is implemented and evaluated with
different algorithms. The graph based sentence scoring method is implemented and evaluated against traditional
sentence scoring methods. Programming with the Spark programming framework on Hadoop Distributed File System
storage gives more efficient execution than MapReduce in a Hadoop environment. The graph based sentence
scoring method gives comparatively better results than the other unsupervised learning techniques applied for
extractive text summarization. Analyzing Amazon's review and feedback dataset can provide a future
enhancement of this work.

9. References

1. Verma JP, Patel B, Patel A. Big Data Analysis. Recommendation System with Hadoop Framework, IEEE International Conference on Computational Intelligence & Communication Technology. 2015. p. 1–6. PMCid:PMC4410521
2. Ferreira R, Cabral LS, Lins RD, Silva GP, Freitas F, George DC, Cavalcanti A, Lima RA, Steven J, Simske B, Favaro L. Assessing sentence scoring techniques for extractive text summarization. Expert Systems with Applications. 2013; 40:5755–64.
3. Xiang Z, Schwartz Z, John H, Gerdes J, Uysal M. What can big data and text analytics tell us about hotel guest experience and satisfaction. International Journal of Hospitality Management. 2015; 44:120–30.
4. Ganesan K, Zhai C, Han. Opinosis. A Graph Based Approach to Abstractive Summarization of Highly Redundant Opinions. Proceedings of the 23rd International Conference on Computational Linguistic, Beijing, China: 2010. p. 1–9.
5. Ittoo A. Text analytics in industry: Challenges, desiderata and trends. Comput Industry. 2016.
6. Khan A, Salim N, Obasa AI. An Optimized Semantic Technique for Multi-Document Abstractive Summarization. Indian Journal of Science and Technology. 2015 Nov; 8(32):1–11.
7. Lloret E, Palomar M. Tackling redundancy in text summarization through different levels of language analysis. Computer Standards & Interfaces. 2013; 35:507–18.
8. Bridge D, Healy P. The GhostWriter-2.0 Case-Based Reasoning system for making content suggestions to the authors of product reviews. Knowledge-Based Systems. 2012; 29:93–103.
9. Online Shopping touched new heights in India in 2012. Hindustan Times, 31 December 2012. 2014 July; 3(7):1–7, Retrieved on 31 December 2012.
10. Bing LI, Keith CC, Chan. A Fuzzy Logic Approach for Opinion Mining on Large Scale Twitter Data. IEEE/ACM 7th International Conference on Utility and Cloud Computing, 2014. p. 652–7.
11. Ghorpade T, Ragha L. Hotel Reviews using NLP and Bayesian Classification. International Conference on Communication, Information & Computing Technology (ICCICT), Mumbai: 2012 Oct 19-20; 84(6):17–22.
12. Khan A, Baharudin B. Sentiment Classification Using Sentence-level Semantic Orientation of Opinion Terms from Blogs IEEE. IEEE. 2011; 1–17.
13. Thiago S, Guzella, Walmir M, Caminhas. A review of machine learning approaches to Spam filtering. Elsevier Journal - Expert Systems with Applications. 2009; 36:10206–22.
14. Sheshasaayee A, Jayanthi R. A Text Mining Approach to Extract Opinions from Unstructured Text. Indian Journal of Science and Technology. 2015 Dec; 8(36):1–4.
15. Nomoto T, Matsumoto Y. A New Approach to Unsupervised Text Summarization. SIGIR’01, Septe, New Orleans, Louisiana, USA: 2001. p. 1–9.
16. Sulthana AR, Subburaj R. An Improvised Ontology based K-Means Clustering Approach for Classification of Customer Reviews, Indian Journal of Science and Technology. 2016 Apr; 9(15):1–6.
17. Anuradha G, Varma DJ. Fuzzy Based Summarization of Product Reviews for Better Analysis. Indian Journal of Science and Technology. 2016 Aug; 9(31):1–9.