Unit-5 Mahout
Mahout
Introduction to Mahout
o Apache Mahout is a library of scalable machine-learning algorithms. It runs on top of
Hadoop, whose mascot is an elephant, and since a mahout is a person who drives an elephant, the library was given the name Mahout.
o Mahout started as a subproject of Apache Lucene, an Apache search library that helps in
searching for and retrieving relevant data from heterogeneous data sources, such as
XML files, MySQL databases and Excel sheets, at very high speed.
o Over time, Mahout acquired functionality of its own and emerged as an
independent package of scalable, fast-executing machine-learning algorithms.
o Apache Mahout makes use of the MapReduce paradigm and the Hadoop framework;
however, it can also run algorithms in standalone mode.
Some features of Apache Mahout are listed below:
❖ Mahout algorithms are written on top of Hadoop and can run on data stored in the Hadoop
Distributed File System (HDFS) as well as on independent platforms.
❖ Mahout algorithms are highly scalable and compatible with the cloud network.
❖ Mahout provides a ready-to-use framework that lets developers perform data
mining on huge volumes of data.
❖ Mahout algorithms are fast and efficient compared with many other Big Data analytic
techniques.
❖ Mahout includes many MapReduce-enabled clustering implementations,
such as K-Means, Fuzzy K-Means, Canopy, Dirichlet and Mean-Shift.
❖ Mahout supports implementations of Distributed Naïve Bayes and Complementary
Naïve Bayes.
❖ Mahout provides matrix and vector libraries.
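To make the idea of a "MapReduce-enabled" clustering algorithm concrete, here is a minimal sketch of one K-Means iteration split into a map step (assign each point to its nearest centroid) and a reduce step (average the points in each group). It is plain in-memory Java, roughly what a standalone run does, and is not Mahout's API; the class and method names (KMeansMapReduceSketch, nearestCentroid, iterate) are hypothetical. In a real Hadoop job the map and reduce steps would run on different machines.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical illustration only; not part of the Mahout API.
public class KMeansMapReduceSketch {

    // "Map" step: index of the centroid nearest to one point (squared Euclidean distance).
    static int nearestCentroid(double[] point, double[][] centroids) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int c = 0; c < centroids.length; c++) {
            double d = 0;
            for (int i = 0; i < point.length; i++) {
                double diff = point[i] - centroids[c][i];
                d += diff * diff;
            }
            if (d < bestDist) { bestDist = d; best = c; }
        }
        return best;
    }

    // "Reduce" step: average of all points assigned to one centroid.
    static double[] mean(List<double[]> points) {
        int dim = points.get(0).length;
        double[] sum = new double[dim];
        for (double[] p : points)
            for (int i = 0; i < dim; i++) sum[i] += p[i];
        for (int i = 0; i < dim; i++) sum[i] /= points.size();
        return sum;
    }

    // One iteration: group points by nearest centroid (map + shuffle), then
    // recompute each centroid (reduce). An empty cluster keeps its old centroid.
    static double[][] iterate(List<double[]> points, double[][] centroids) {
        Map<Integer, List<double[]>> groups = points.stream()
            .collect(Collectors.groupingBy(p -> nearestCentroid(p, centroids)));
        double[][] next = new double[centroids.length][];
        for (int c = 0; c < centroids.length; c++) {
            List<double[]> g = groups.get(c);
            next[c] = (g == null) ? centroids[c].clone() : mean(g);
        }
        return next;
    }

    public static void main(String[] args) {
        List<double[]> points = List.of(
            new double[]{1, 1}, new double[]{1.5, 2}, new double[]{1, 1.5},
            new double[]{8, 8}, new double[]{9, 8.5}, new double[]{8.5, 9});
        double[][] centroids = {{0, 0}, {10, 10}};
        for (int it = 0; it < 5; it++) centroids = iterate(points, centroids);
        // centroids settle at roughly (1.17, 1.5) and (8.5, 8.5)
        for (double[] c : centroids) System.out.println(Arrays.toString(c));
    }
}
```

Each iteration needs only the current centroids and one pass over the data, which is why K-Means maps naturally onto repeated MapReduce jobs.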
Mahout's machine-learning framework processes
data on the basis of the following 3Cs:
➢ Collaborative filtering (also known as Recommendation)
➢ Clustering
➢ Classification
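Collaborative filtering, the first of the 3Cs, recommends items to a user based on what similar users liked. The following is a minimal sketch of user-based collaborative filtering in plain Java, assuming cosine similarity over co-rated items and a similarity-weighted average as the predicted rating. It is not Mahout's recommender API, and the class, method and sample user/item names are hypothetical.

```java
import java.util.*;

// Hypothetical illustration of user-based collaborative filtering; not Mahout's API.
public class UserBasedCfSketch {

    // Cosine similarity computed only over items that both users have rated.
    static double similarity(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, na = 0, nb = 0;
        for (String item : a.keySet()) {
            if (b.containsKey(item)) {
                double x = a.get(item), y = b.get(item);
                dot += x * y; na += x * x; nb += y * y;
            }
        }
        return (na == 0 || nb == 0) ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
    }

    // Predicted rating: other users' ratings for the item, weighted by their similarity.
    static double predict(Map<String, Map<String, Double>> ratings, String user, String item) {
        double num = 0, den = 0;
        for (Map.Entry<String, Map<String, Double>> e : ratings.entrySet()) {
            if (e.getKey().equals(user) || !e.getValue().containsKey(item)) continue;
            double s = similarity(ratings.get(user), e.getValue());
            num += s * e.getValue().get(item);
            den += Math.abs(s);
        }
        return den == 0 ? 0 : num / den;
    }

    // Recommend the item the user has not rated yet with the highest predicted rating.
    static String recommend(Map<String, Map<String, Double>> ratings, String user) {
        Set<String> items = new TreeSet<>();
        ratings.values().forEach(r -> items.addAll(r.keySet()));
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String item : items) {
            if (ratings.get(user).containsKey(item)) continue;
            double p = predict(ratings, user, item);
            if (p > bestScore) { bestScore = p; best = item; }
        }
        return best;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Double>> ratings = new HashMap<>();
        ratings.put("alice", Map.of("book1", 5.0, "book2", 4.0));
        ratings.put("bob",   Map.of("book1", 5.0, "book2", 4.0, "book3", 5.0));
        ratings.put("carol", Map.of("book1", 1.0, "book2", 5.0, "book4", 2.0));
        System.out.println(recommend(ratings, "alice")); // prints "book3"
    }
}
```

Here alice's ratings match bob's exactly, so bob's rating of book3 drives the recommendation. A production recommender would also restrict the calculation to a neighbourhood of the most similar users rather than scanning everyone.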
Machine Learning
▪ Machine learning is a type of Artificial Intelligence (AI) that enables computer
systems to learn from data instead of being explicitly programmed for every task.
▪ Algorithms based on machine-learning techniques enable computer systems to
learn and improve with experience.
▪ Machine learning algorithms target the development of computer systems
and programs that can analyse data, grow and adapt themselves to
changes in the volume and variety of data.
▪ The concept of machine learning is similar to that of data mining, as both search
for and establish patterns in data.
Working of a Machine Learning Algorithm
[Figure: Training Data → Machine Learning Algorithms]
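The flow from training data to a machine-learning algorithm can be shown with a small classifier, classification being the third of the 3Cs. Below is a minimal sketch of a multinomial Naïve Bayes text classifier in plain Java, assuming whitespace tokenisation and add-one (Laplace) smoothing. It is not Mahout's Naïve Bayes implementation, and the class name, method names and sample sentences are hypothetical. Calling train() on labelled examples builds the model (the counts), and classify() uses that model on new text.

```java
import java.util.*;

// Hypothetical minimal multinomial Naive Bayes; not Mahout's classifier API.
public class NaiveBayesSketch {
    private final Map<String, Integer> docCounts = new HashMap<>();               // label -> number of documents
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>(); // label -> word -> count
    private final Map<String, Integer> totalWords = new HashMap<>();              // label -> total word count
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Training: count how often each label and each word-per-label occurs.
    void train(String label, String text) {
        totalDocs++;
        docCounts.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = wordCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String w : text.toLowerCase().split("\\s+")) {
            counts.merge(w, 1, Integer::sum);
            totalWords.merge(label, 1, Integer::sum);
            vocabulary.add(w);
        }
    }

    // Prediction: choose the label with the highest log-probability, using add-one
    // smoothing so that unseen words do not force a probability of zero.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String label : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(label) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(label);
            int total = totalWords.get(label);
            for (String w : text.toLowerCase().split("\\s+")) {
                int c = counts.getOrDefault(w, 0);
                score += Math.log((c + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) { bestScore = score; best = label; }
        }
        return best;
    }

    public static void main(String[] args) {
        NaiveBayesSketch nb = new NaiveBayesSketch();
        nb.train("sports", "match goal team score");
        nb.train("sports", "team wins the match");
        nb.train("tech", "hadoop cluster runs mapreduce jobs");
        nb.train("tech", "mahout runs on hadoop");
        System.out.println(nb.classify("team score goal"));            // prints "sports"
        System.out.println(nb.classify("mahout on a hadoop cluster")); // prints "tech"
    }
}
```

Working with log-probabilities rather than multiplying raw probabilities avoids numeric underflow on long documents.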