MANOJ KUMAR
chalamalamanojkumar@gmail.com
216-800-9620
Data Scientist/ Machine Learning Engineer

SUMMARY:
 Data Scientist with around 6 years of experience in areas including Data Analysis, Statistical Analysis, Machine Learning, Deep Learning, and Data Mining with large data sets of structured and unstructured data.
 Developed various Machine Learning applications with the Python scientific stack and R.
 Experienced with machine learning and deep learning frameworks such as scikit-learn, TensorFlow, and Keras.
 Experienced Data Analyst with a solid understanding of Data Mapping, Data Warehousing (OLTP, OLAP), Data Mining, Data Governance, and Data Management services with Quality Assurance.
 Experience with Machine Learning algorithms such as logistic regression, KNN, SVM, random forest, neural networks, linear regression, lasso regression, and k-means (an illustrative sketch follows this summary).
 Experience in implementing data analysis with various analytic tools, such as Anaconda 4.0, Jupyter Notebook 4.x, R 3.0 (ggplot2, dplyr, caret), and Excel.
 Adept in Statistical Data Analysis, Exploratory Data Analysis, Machine Learning, Data Mining, Java, and Data Visualization using R, Python, Base SAS, SAS Enterprise Guide, SAS Enterprise Miner, Tableau, and SQL.
 Experienced in the full software development lifecycle (SDLC) under Agile, DevOps, and Scrum methodologies, with experience in Big Data technologies like Spark 1.6, Spark SQL, PySpark, Hadoop 2.x, HDFS, and Hive 1.x.
 Strong skills in statistical methodologies such as A/B testing, experiment design, hypothesis testing, and ANOVA.
 Working experience with Python 3.5/2.7 libraries such as NumPy, SQLAlchemy, Beautiful Soup, pickle, PySide, PyMongo, SciPy, and PyTables.
 Highly skilled in using visualization tools like Tableau, ggplot2, and D3.js for creating dashboards.
 Experience in foundational machine learning models and concepts: regression, random forest, boosting, and
deep learning.
 Good knowledge of Hadoop architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, MapReduce concepts, and ecosystem tools including Hive and Pig.
 Ability to write and optimize diverse SQL queries, with working knowledge of RDBMS like SQL Server 2008 and NoSQL databases like MongoDB.
 Experience in Data Warehousing including Data Modeling, Data Architecture, Data Integration (ETL/ELT) and
Business Intelligence.
 Good experience using various Python libraries (Beautiful Soup, NumPy, SciPy, matplotlib, python-twitter, Pandas, MySQLdb for database connectivity).
 Experienced in Big Data technologies including Apache Spark, HDFS, Hive, and MongoDB.
 Used version control tools like Git 2.x and build tools like Apache Maven/Ant.
 Proficient in data mining tools such as R, SAS, Python, SQL, Excel, and Java ecosystems; experienced in staff leadership and Java development.
 Good knowledge of and experience in deep learning algorithms such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), including LSTM/RNN-based speech recognition using TensorFlow.
 Strong working knowledge of SQL, SQL Server, Oracle, SAS, Tableau, and Jupyter while handling various applications in multiple projects.
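
A minimal, illustrative sketch of the supervised classification workflow described above, written in Python with scikit-learn; the synthetic dataset and all parameters are hypothetical placeholders, not project data:

    # Fit and compare two of the classifiers named above on synthetic data.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    # Hypothetical dataset standing in for real structured data.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    for model in (LogisticRegression(max_iter=1000),
                  RandomForestClassifier(n_estimators=100, random_state=42)):
        model.fit(X_train, y_train)
        preds = model.predict(X_test)
        print(type(model).__name__, accuracy_score(y_test, preds))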

EDUCATION:
 Bachelor’s in Computer Science, SASTRA University, India, 2011
TECHNICAL SKILLS:

Programming & Scripting Languages: R (packages: stats, zoo, Matrix, data.table, openssl), Python, SQL, C, C++, Java, JCL, COBOL, HTML, CSS, JSP, JavaScript, Scala
Cloud Technologies: AWS (EC2, S3, RDS, EBS, VPC, IAM, Security Groups), Microsoft Azure, Rackspace
Databases: SQL, MySQL, T-SQL, MS Access, Oracle, Hive, MongoDB, Cassandra, PostgreSQL
Statistical Software: SPSS, R, SAS
Development Tools: RStudio, Notepad++, Python, Jupyter, Spyder IDE
Python Packages: NumPy, SciPy, Pandas, scikit-learn, matplotlib, seaborn, statsmodels, Keras, TensorFlow, Theano, NLTK, Scrapy
Techniques: Machine Learning, Regression, Clustering, Data Mining
Data Science/Data Analysis Tools & Techniques: Generalized Linear Models, Logistic Regression, Boxplots, K-Means, Clustering, SVN, PuTTY, WinSCP, Redmine (Bug Tracking, Documentation, Scrum), Neural Networks, AI, Teradata, Tableau
Algorithm Skills: Machine Learning, Neural Networks, Deep Learning, NLP, Bayesian Learning, Optimization, Prediction, Pattern Identification, Data/Text Mining, Regression, Logistic Regression, Bayesian Belief Networks, Clustering, Classification, Statistical Modeling
Machine Learning: Naïve Bayes, Decision Trees, Regression Models, Random Forests, Time Series, K-Means
Operating Systems: Windows, Linux, Unix, macOS, Red Hat

WORK EXPERIENCE:

Best Buy – Cleveland, OH
August 2018 to Present
Role: Sr. Data Scientist/Machine Learning Engineer
Responsibilities:
 Worked on different data formats such as JSON and XML, and applied machine learning algorithms in Python.
 Set up storage and data analysis tools in the Amazon Web Services (AWS) cloud computing infrastructure.
 Implemented end-to-end systems for Data Analytics, Data Automation and integrated with custom
visualization tools using R, Mahout, Hadoop and MongoDB.
 Worked with Data Architects and IT Architects to understand the movement and storage of data, using ER Studio 9.7.
 Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, Space-Time.
 Coded R functions to interface with Caffe Deep Learning Framework.
 Used Pandas, NumPy, seaborn, SciPy, matplotlib, scikit-learn, and NLTK in Python for developing various machine learning algorithms.
 Installed and used the Caffe Deep Learning Framework.
 Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python, along with a broad variety of machine learning methods including classification, regression, and dimensionality reduction.
 Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, designing and developing POCs using Scala, Spark SQL, and the MLlib libraries (an illustrative sketch follows this role).
 Used Data Quality Validation techniques to validate Critical Data Elements (CDE) and identified various anomalies.
 Developed various QlikView Data Models by extracting and using data from various sources, including files, DB2, Excel, flat files, and Big Data.
 Participated in all phases of data mining: data collection, data cleaning, model development, validation, and visualization; also performed Gap Analysis.
 Good knowledge of Hadoop Architecture and various components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, Secondary NameNode, and MapReduce concepts.
 As Architect, delivered various complex OLAP Databases/Cubes, Scorecards, Dashboards, and Reports.
 Programmed a utility in Python that used multiple packages (SciPy, NumPy, Pandas).
 Implemented Classification using supervised algorithms like Logistic Regression, Decision Trees, KNN, and Naive Bayes.
 Designed both 3NF data models for ODS and OLTP systems, and Dimensional Data Models using Star and Snowflake schemas.
 Updated Python scripts to match training data with our database stored in AWS Cloud Search, so that we
would be able to assign each document a response label for further classification.
 Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
 Designed and developed Use Case, Activity, and Sequence Diagrams and OOD (Object-Oriented Design) using UML and Visio.
Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
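
A minimal sketch of the Spark SQL + MLlib proof-of-concept pattern referenced in this role, shown in PySpark for consistency with the other examples here (the role also used Scala); the input path, column names, and parameters are hypothetical:

    # Load JSON data into a DataFrame, query it with Spark SQL,
    # and fit an MLlib classifier. Path and columns are placeholders.
    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-poc").getOrCreate()

    df = spark.read.json("hdfs:///data/events.json")  # hypothetical path
    df.createOrReplaceTempView("events")
    labeled = spark.sql(
        "SELECT f1, f2, label FROM events WHERE label IS NOT NULL")

    # Assemble feature columns into the single vector column MLlib expects.
    assembler = VectorAssembler(inputCols=["f1", "f2"], outputCol="features")
    train = assembler.transform(labeled).select("features", "label")
    model = LogisticRegression(maxIter=10).fit(train)
    print(model.coefficients)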

Amphora Software Pvt Ltd – Hyderabad, India
May 2016 to June 2018
Role: Data Scientist
Responsibilities:
 Implemented machine learning methods, optimization, and visualization, applying statistical methods such as Regression Models, Decision Trees, Naïve Bayes, Ensemble Classifiers, Hierarchical Clustering, and Semi-Supervised Learning on different datasets using Python.
 Researched and implemented various Machine Learning Algorithms using the R language.
 Devised a machine learning algorithm using Python for facial recognition.
 Used R for a prototype on sample data exploration to identify the best algorithmic approach, and then wrote Scala scripts using the Spark machine learning module.
 Used Scala scripts to execute Spark machine learning library APIs for decision tree, ALS, and logistic and linear regression algorithms.
 Worked on migrating an on-premises virtual machine to an Azure Resource Manager subscription with Azure Site Recovery.
 Provided consulting and cloud architecture for premier customers and internal projects running on the MS Azure platform, ensuring high availability of services and low operational costs.
 Developed structured, efficient, and error-free code for Big Data requirements using knowledge of Hadoop and its ecosystem.
 Developed a web service using Windows Communication Foundation and .NET to receive and process XML files, and deployed it as a Cloud Service on Microsoft Azure.
 Worked on various methods including data fusion and machine learning, and improved the accuracy of distinguishing correct rules from candidate rules.
 Developed Merge jobs in Python to extract and load data into a MySQL database.
 Used a test-driven approach for developing the application and implemented unit tests using the Python unit test framework.
 Wrote unit test cases in Python and Objective-C for other API calls in the customer frameworks.
 Tested various Machine Learning algorithms such as Support Vector Machines, Random Forests, and boosted trees with XGBoost, and concluded that Decision Trees were the champion model (a comparison sketch follows this role).
 Built models using Statistical techniques like Bayesian HMM and Machine Learning classification models like
XGBoost, SVM, and Random Forest.
 Worked on different data formats such as JSON and XML, and applied machine learning algorithms in Python.
Environment: Machine Learning, R, Hadoop, Big Data, Azure, Python, Java, J2EE, Spring, Struts, JSF, Dojo, JavaScript, DB2, CRUD, PL/SQL, JDBC, Coherence, MongoDB, Apache CXF, SOAP, Web Services, Eclipse.
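
A minimal sketch of the champion-model comparison described above (SVM, Random Forest, and XGBoost against Decision Trees), using synthetic data; the dataset and hyperparameters are hypothetical:

    # Cross-validate several candidate classifiers and report mean accuracy.
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from sklearn.tree import DecisionTreeClassifier
    from xgboost import XGBClassifier

    # Hypothetical dataset standing in for the project data.
    X, y = make_classification(n_samples=2000, n_features=30, random_state=0)
    candidates = {
        "SVM": SVC(),
        "RandomForest": RandomForestClassifier(n_estimators=200, random_state=0),
        "XGBoost": XGBClassifier(n_estimators=200, eval_metric="logloss"),
        "DecisionTree": DecisionTreeClassifier(max_depth=5, random_state=0),
    }
    for name, clf in candidates.items():
        scores = cross_val_score(clf, X, y, cv=5)
        print(f"{name}: mean accuracy {scores.mean():.3f}")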

Brillio Technologies Pvt Ltd – Bangalore, India
July 2014 to April 2016
Role: Data Scientist/Analyst
Responsibilities:
 Analyzed trading mechanisms for real-time transactions and built collateral management tools.
 Compiled data from various sources to perform complex analysis for actionable results.
 Utilized machine learning algorithms such as linear regression, multivariate regression, Naive Bayes, Random Forests, K-means, and KNN for data analysis.
 Measured efficiency of the Hadoop/Hive environment, ensuring SLAs were met.
 Developed Spark code using Scala and Spark SQL/Streaming for faster processing of data.
 Prepared the Spark build from source code and ran Pig scripts using Spark rather than MapReduce jobs for better performance.
 Used machine learning to design a classifier that matched the performance of subjective pathologist
interpretations.
 Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
 Developed scripts and batch jobs to schedule various Hadoop programs. Used TensorFlow to train models from insightful data across thousands of examples.
 Tested and debugged SAS programs against the test data.
 Processed the data in SAS for the given requirement using SAS programming concepts.
 Expertise in producing RTF, PDF, and HTML files using the SAS ODS facility.
 Learned new tools and skill sets as needs arose.
 Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
 Wrote Hive queries for data analysis to meet the business requirements.
 Developed Kafka producers and consumers for message handling (a minimal sketch follows this role).
 Responsible for analyzing multi-platform applications using Python.
 Used Storm as an automatic mechanism to analyze large amounts of non-unique data points with low latency and high throughput.
 Developed MapReduce jobs in Python for data cleaning and data processing.
Environment: Machine Learning, AWS, MS Azure, Cassandra, SAS, Spark, HDFS, Hive, Pig, Linux, Anaconda Python, MySQL, Eclipse, PL/SQL, SQL Connector, Spark ML.
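
A minimal sketch of the Kafka producer/consumer pattern mentioned above, using the kafka-python client; the broker address, topic name, and message payload are hypothetical:

    # Produce one message to a topic, then read it back.
    from kafka import KafkaProducer, KafkaConsumer

    producer = KafkaProducer(bootstrap_servers="localhost:9092")  # hypothetical broker
    producer.send("transactions", b'{"id": 1, "amount": 9.99}')   # hypothetical topic/payload
    producer.flush()

    consumer = KafkaConsumer(
        "transactions",
        bootstrap_servers="localhost:9092",
        auto_offset_reset="earliest",
        consumer_timeout_ms=5000,  # stop iterating when no new messages arrive
    )
    for message in consumer:
        print(message.value)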
