
Akshay Godugu

Phone: (424) 272-5152
Email: akshaygodugu45@gmail.com

REQUIRED SKILLS/EXPERIENCE                                                    # YEARS

The contractor must have hands-on R programming knowledge; statistical
background and understanding                                                  6+
Ability to build and deliver robust statistical models                        7+
Deliver high-quality documentation and presentations to support and
maintain model and library use                                                6+
Able to interact with functional and technical consultants to ensure
successful model building and deployment                                      5+
Development and back testing                                                  5+
Deployment                                                                    5+
UAT                                                                           7+
Windows 7                                                                     2+
Windows 10                                                                    3+
R Language                                                                    6+
R Studio                                                                      6+
SQL Manager                                                                   7+
SQL Language                                                                  7+
R Development/Programming                                                     7+
SQL                                                                           7+
ADABAS/Natural                                                                1+
Agile                                                                         3+
Power BI                                                                      4+
Able to work independently                                                    Yes

Summary
• Around 8 years of experience in Machine Learning and Data Mining with large datasets of structured and unstructured data, Data Acquisition, Data Validation, Predictive Modeling, and Data Visualization.
• Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms, and utilized algorithms such as Linear Regression, Multivariate Regression, Naive Bayes, Random Forests, K-Means, and KNN for data analysis.
• Responsible for design and development of advanced R/Python programs to prepare, transform, and harmonize data sets in preparation for modeling.
• Hands-on experience implementing LDA and Naive Bayes; skilled in Random Forests, Decision Trees, Linear and Logistic Regression, SVM, Clustering, Neural Networks, and Principal Component Analysis, with good knowledge of Recommender Systems.
• Proficient in Statistical Modeling and Machine Learning techniques (Linear, Logistic, Decision Trees, Random Forest, SVM, K-Nearest Neighbors, Bayesian, XGBoost) in Forecasting/Predictive Analytics, Segmentation methodologies, Regression-based models, Hypothesis testing, Factor analysis/PCA, and Ensembles.
• Expertise in transforming business requirements into Analytical Models, Designing Algorithms, Building Models, and Developing Data Mining and reporting solutions that scale across massive volumes of structured and unstructured data.
• Developed Logical Data Architecture with adherence to Enterprise Architecture.
• Strong experience in the Software Development Life Cycle (SDLC), including Requirements Analysis, Design Specification, and Testing per cycle, in both Waterfall and Agile methodologies.
• Adept in statistical programming languages like R and Python, including Big Data technologies like Hadoop and Hive.
• Skilled in using dplyr in R and pandas in Python for performing exploratory data analysis.
• Experience working with data modeling tools like Erwin, Power Designer, and ER/Studio.
• Experience in designing Star schema and Snowflake schema for Data Warehouse and ODS architecture.
• Experience in designing stunning visualizations using Tableau and publishing and presenting dashboards and Storylines on web and desktop platforms.
• Improved fraud prediction performance by using random forest and gradient boosting for feature selection with Python Scikit-learn (see the sketch at the end of this summary).
• Designed and implemented system architecture for an Amazon EC2-based cloud-hosted solution for the client.
• Analyzed large data sets, applied machine learning techniques, and developed and enhanced predictive and statistical models by leveraging best-in-class modeling techniques.
• Wrote Python modules to extract/load asset data from the MySQL source database.
• Highly skilled in using Hadoop (Pig and Hive) for basic analysis and extraction of data in the infrastructure to provide data summarization.
• Highly skilled in using visualization tools like Tableau, ggplot2, and D3.js for creating dashboards.
• Worked with and extracted data from various database sources like Oracle, SQL Server, and DB2; regularly used JIRA and other internal issue trackers during project development.
• Skilled in System Analysis, E-R/Dimensional Data Modeling, Database Design, and implementing RDBMS-specific features.
• Knowledge of working with Proofs of Concept (PoCs) and gap analysis; gathered necessary data for analysis from different sources and prepared data for exploration using data munging and Teradata.
• Well experienced in Normalization and De-Normalization techniques for optimum performance in relational and dimensional database environments.
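
As an illustration of the tree-based feature-selection approach mentioned above, a minimal Python/Scikit-learn sketch (synthetic data and hypothetical parameters, not code from an actual engagement):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
    from sklearn.feature_selection import SelectFromModel
    from sklearn.model_selection import train_test_split

    # Synthetic, imbalanced stand-in for a fraud data set.
    X, y = make_classification(n_samples=5000, n_features=30, n_informative=8,
                               weights=[0.97], random_state=42)
    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                        random_state=42)

    # Rank features with a random forest; keep those above median importance.
    selector = SelectFromModel(
        RandomForestClassifier(n_estimators=200, random_state=42),
        threshold="median",
    ).fit(X_train, y_train)
    X_train_sel, X_test_sel = selector.transform(X_train), selector.transform(X_test)

    # Refit a gradient-boosting classifier on the reduced feature set.
    clf = GradientBoostingClassifier(random_state=42).fit(X_train_sel, y_train)
    print(f"held-out accuracy: {clf.score(X_test_sel, y_test):.3f}")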

Technical Skills:
Scripting/programming languages     R (dplyr, ggplot2, shiny, plotly), Python (NumPy, SciPy, Pandas, Scikit-learn, Matplotlib, NLTK, Beautiful Soup, Selenium, Python IDE), PySpark
Machine learning/Deep learning      Classification, Regression (Linear, Logistic, Elastic Net), Clustering analyses using neural nets (MLP), RF, KNN, SVM, GLM, MLR, Logit, K-means algorithms
Database management systems         RDBMS (Microsoft SQL Server, Oracle DB, Teradata)
Big Data                            MySQL, Spark, Hadoop/MapReduce, Hive, Impala
Statistical Analysis Tools          SAS Studio, SAS Enterprise Guide, SAS Enterprise Miner, Python, R, ggplot2, dplyr, caret, SciPy, sklearn
Data storage/processing framework   Hadoop and Spark
Data visualization/reporting        Tableau, Power BI, and shiny
Operating System                    Windows, Unix
Case Tools                          Erwin & ER/Studio

Professional Experience:

East West Bank, Pasadena, CA Oct 2018 to Present


Role: Data Scientist/Machine Learning Engineer
Responsibilities:
• Responsible for working with various teams on a project to develop an analytics-based solution specifically targeting roaming subscribers.
• Worked with several R packages including knitr, dplyr, SparkR, CausalInfer, and Space-Time.
• Coded R functions to interface with the Caffe Deep Learning Framework.
• Used Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK in Python for developing various machine learning algorithms.
• The combination of these elements (travel prediction and multi-dimensional segmentation) would enable operators to conduct highly targeted and personalized roaming services campaigns, leading to significant subscriber uptake.
• Installed and used the Caffe Deep Learning Framework.
• Scaled up Machine Learning pipelines to 4,600 processors and 35,000 GB of memory, achieving 5-minute execution.
• Deployed GUI pages using JSP, JSTL, HTML, DHTML, XHTML, CSS, JavaScript, and AJAX.
• Developed Python, PySpark, and Hive scripts to filter/map/aggregate data; used Sqoop to transfer data to and from Hadoop.
• Configured the project on WebSphere 6.1 application servers.
• Developed a Machine Learning test bed with 24 different model-learning and feature-learning algorithms.
• Through thorough systematic search, demonstrated performance surpassing the state of the art (deep learning).
• Developed on-disk, very large (100 GB+), highly complex Machine Learning models.
• Used SAX and DOM parsers to parse the raw XML documents.
• Used RAD as the development IDE for web applications.
• Utilized Spark, Scala, Hadoop, HBase, Cassandra, MongoDB, Kafka, Spark Streaming, MLlib, and Python with a broad variety of machine learning methods including classification, regression, dimensionality reduction, etc., and utilized the engine to increase user lifetime by 45% and triple user conversions for target categories.
• Used Spark DataFrames, Spark SQL, and Spark MLlib extensively, developing and designing POCs using Scala, Spark SQL, and the MLlib library (see the sketch at the end of this section).
• Redesigned interactive visualization graphs in D3.js.
• Used Data Quality Validation techniques to validate Critical Data Elements (CDEs) and identified various anomalies.
• Extensively worked with the Erwin Data Modeler tool to design the Data Models.
• Developed various QlikView Data Models by extracting and using data from various source files, DB2, Excel, flat files, and Big Data.
• Participated in all phases of Data Mining, Data Collection, Data Cleaning, Developing Models, Validation, and Visualization, and performed Gap Analysis.
• Designed both 3NF data models for ODS and OLTP systems and Dimensional Data Models using Star and Snowflake schemas.
• Updated Python scripts to match training data with our database stored in AWS CloudSearch, so that we would be able to assign each document a response label for further classification.
• Created SQL tables with referential integrity and developed queries using SQL, SQL*Plus, and PL/SQL.
• Designed and developed Use Case, Activity, and Sequence Diagrams and Object-Oriented Design (OOD) using UML and Visio.
• Interacted with Business Analysts, SMEs, and other Data Architects to understand business needs and functionality for various project solutions.
Environment: AWS, R, Informatica, Python, HDFS, ODS, OLTP, Oracle 10g, Hive, OLAP, DB2, Metadata, MS Excel, Mainframes, MS Visio, MapReduce, Rational Rose, SQL, and MongoDB.
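
As a flavor of the Spark MLlib POC work above, a minimal PySpark sketch (toy data and hypothetical column names; assumes a local Spark installation):

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("mllib-poc-sketch").getOrCreate()

    # Toy DataFrame standing in for prepared feature data.
    df = spark.createDataFrame(
        [(0.0, 1.2, 0.7, 0), (1.0, 0.3, 2.1, 1),
         (0.5, 1.8, 0.2, 0), (1.4, 0.1, 2.5, 1)],
        ["f1", "f2", "f3", "label"],
    )

    # Assemble feature columns into the single vector column MLlib expects.
    assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
    train = assembler.transform(df)

    model = LogisticRegression(maxIter=10).fit(train)
    model.transform(train).select("label", "prediction").show()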

Novus International, St. Louis, MO Jun 2016 to Sep 2018


Role: Data Scientist
Responsibilities:
• Developed the prediction model for crop yield based on different kinds of field, weather, and imagery data (see the sketch at the end of this section).
• Performed exploratory data analysis and feature engineering to best fit the regression model.
• Designed a static pipeline in MS Azure for data ingestion and dashboarding; used MS ML Studio for modeling and MS Power BI for dashboarding.
• Analyzed large datasets to provide strategic direction to the company.
• Performed quantitative analysis of product sales trends to recommend pricing decisions.
• Conducted cost-and-benefit analysis on new ideas.
• Developed models on Scala and Spark for user prediction models and sequential algorithms.
• Used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, and NLTK in Python for developing various machine learning algorithms.
• Scrutinized and tracked customer behavior to identify trends and unmet needs.
• Developed statistical models to forecast inventory and procurement cycles.
• Assisted in developing internal tools for data analysis.
• Advised on the suitability of methodologies and suggested improvements.
• Carried out specified data processing and statistical techniques.
• Supplied qualitative and quantitative data to colleagues and clients.
• Used Informatica and SAS to extract, transform, and load source data from transaction systems.
• Created data pipelines using big data technologies like Hadoop and PySpark.
• Familiarity with Hadoop cluster environments and configurations for resource management; used Python, PySpark, and Hive for analytics and for developing dashboards.
• Created statistical models using distributed and standalone models to build various diagnostic, predictive, and prescriptive solutions.
• Utilized a broad variety of statistical packages such as SAS, R, MLlib, Graphs, Hadoop, Spark, MapReduce, Pig, and others.
• Refined and trained models based on domain knowledge and customer business objectives.
• Delivered, or collaborated on delivering, effective visualizations to support the client's objectives.
• Produced solid and effective strategies based on accurate and meaningful data reports, analysis, and keen observations.
• Used Teradata utilities such as FastExport and MultiLoad (MLOAD) for various data migration/ETL tasks from OLTP source systems to OLAP target systems.
• Developed web applications using .NET technologies; worked on bug fixes/issues arising in the production environment and resolved them promptly.
Environment: Power BI, MS Azure, HDFS, SAS, Python, PySpark, Informatica, MapReduce, Pig, Hive, Unix, OLAP, OLTP, ODS, NLTK, XML, JSON, etc.
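
As an illustration of the crop-yield regression described above, a minimal Python sketch (synthetic data; the feature names are hypothetical, not the project's actual variables):

    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "rainfall_mm": rng.normal(600, 120, 500),   # weather data
        "avg_temp_c": rng.normal(22, 3, 500),       # weather data
        "ndvi": rng.uniform(0.2, 0.9, 500),         # imagery-derived vegetation index
    })
    # Simple engineered feature: interaction of moisture and vegetation index.
    df["rain_x_ndvi"] = df["rainfall_mm"] * df["ndvi"]
    # Synthetic yield target for the sketch.
    y = 0.01 * df["rain_x_ndvi"] + 0.5 * df["avg_temp_c"] + rng.normal(0, 1, 500)

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    print(cross_val_score(model, df, y, cv=5, scoring="r2").mean())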

Pacific Global Bank, Chicago, IL Jan 2015 to May 2016


Role: Big Data Analyst/Engineer
Responsibilities:
• Worked with AJAX API calls to communicate with Hadoop through an Impala connection and SQL to render the required data; these API calls are similar to Microsoft Cognitive API calls.
• Good grasp of Cloudera and HDP ecosystem components.
• Used Elasticsearch (Big Data) to retrieve data into the application as required.
• Ran MapReduce programs on the cluster.
• Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
• Performed statistical modeling with ML to bring insights to data under the guidance of the Principal Data Scientist; data modeling with Pig, Hive, and Impala.
• Performed ingestion with Sqoop and Flume.
• Understanding and implementation of text mining concepts, graph processing, and semi-structured and unstructured data processing.
• Analyzed the partitioned and bucketed data and computed various metrics for reporting.
• Involved in loading data from RDBMS and web logs into HDFS using Sqoop and Flume.
• Worked on loading data from MySQL to HBase where necessary using Sqoop.
• Developed Hive queries for analysis across different banners.
• Extracted data from Twitter using Java and the Twitter API; parsed JSON-formatted Twitter data and uploaded it to the database.
• Launched Amazon EC2 cloud instances using Amazon Machine Images (Linux/Ubuntu) and configured the launched instances for specific applications.
• Exported the result set from Hive to MySQL using Sqoop after processing the data.
• Analyzed the data by performing Hive queries and running Pig scripts to study customer behavior.
• Hands-on experience working with SequenceFile, Avro, and HAR file formats and compression.
• Used Hive to partition and bucket data (see the sketch at the end of this section).
• Experience in writing MapReduce programs with the Java API to cleanse structured and unstructured data.
• Wrote Pig scripts to perform ETL procedures on the data in HDFS.
• Created HBase tables to store various formats of data coming from different portfolios.
• Worked on improving the performance of existing Pig and Hive queries.
Environment: HDFS, HBase, Flume, Sqoop, Pig, Hive, Impala, Java API, MapReduce, Elasticsearch, Amazon EC2, etc.
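
To illustrate the partitioning and bucketing mentioned above, a minimal PySpark sketch of the equivalent operation (hypothetical table and column names; the original work used Hive directly, and this assumes a Spark build with Hive support):

    from pyspark.sql import SparkSession

    # Hive support is needed to persist a bucketed table.
    spark = (SparkSession.builder
             .appName("partition-bucket-sketch")
             .enableHiveSupport()
             .getOrCreate())

    df = spark.createDataFrame(
        [("2015-01-03", "chicago", 120.50), ("2015-01-03", "nyc", 88.00)],
        ["txn_date", "branch", "amount"],
    )

    # Partition on date for coarse pruning; bucket on branch for join/scan locality.
    (df.write.mode("overwrite")
       .partitionBy("txn_date")
       .bucketBy(8, "branch")
       .sortBy("branch")
       .saveAsTable("txns_bucketed"))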

The Money Store, Florham Park, NJ Nov 2013 to Dec 2014


Role: Engineer - Data Analytics
Responsibilities:
• Participated in all phases of research including requirement gathering, data cleaning, data mining, model development, and visualization.
• Collaborated with data analysts and others to gain insights into and understanding of the data.
• Used R to manipulate and analyze data for the solution; packages were used for text mining.
• Performed Data Mining, Data Analytics, Data Collection, and Data Cleaning.
• Developed models, performed validation and visualization, and performed gap analysis.
• Cleansed and transformed the data by treating outliers and imputing missing values.
• Used predictive analysis to create models of customer behavior that correlate positively with historical data, and used these models to forecast future results.
• Translated business needs into mathematical models and algorithms and built exceptional machine learning algorithms.
• Tried and implemented multiple models to evaluate predictions and performance.
• Applied various machine learning algorithms and statistical modeling, such as Logistic Regression, Decision Trees, and SVM, to identify volume using the Scikit-learn package in R and Python.
• Applied boosting to the predictive model to improve its efficiency (see the sketch at the end of this section).
• Improved efficiency and accuracy by evaluating the model in R.
• The model predicted with 87.4% accuracy.
• Documented the visualizations and results and submitted them to HR management.
• Presented dashboards to higher management for more insights using Power BI and Tableau.
• Working knowledge of MapReduce coding, including Java, Python, Pig programming, Hadoop Streaming, and Hive, for data analysis of production applications.
Environment: R/RStudio, Python, Tableau, MS SQL Server 2005/2008, MS Access.
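
A minimal Python sketch of the boosting step referenced above (synthetic data; the 87.4% figure came from the actual project, not from this toy example):

    from sklearn.datasets import make_classification
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, n_features=20, random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)

    # Boosting: fit many shallow trees sequentially, each correcting the last.
    model = GradientBoostingClassifier(n_estimators=300, learning_rate=0.05,
                                       random_state=1).fit(X_tr, y_tr)
    print(f"accuracy: {accuracy_score(y_te, model.predict(X_te)):.1%}")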

AN Technologies, India Aug 2012 to Oct 2013


Role: Associate Data Analyst
Responsibilities:
• Performed exploratory analysis on research data, transformed raw information into insight, and provided recommendations using Tableau, Advanced Excel Macros, PL/SQL, and the Omniture analytics tool.
• Derived insights from machine learning algorithms using SAS to analyze web log files and campaign data to recommend/improve promotional opportunities.
• Tuned PL/SQL query procedures and scheduled cron jobs on an Apache server to update/enhance business logic and processes; strong SQL skills with the ability to write complex SQL statements that analyze data and create prototypes.
• Interacted with clients for system study, requirements gathering, business analysis, and scoping of modifications to the existing system.
• Developed an automated report generation tool that improved efficiency by 50% and increased revenue by 80%, using technologies such as Java, Python, and shell scripting on an Apache httpd Linux server.
• Created SQL procedures and functions and performed query performance tuning and index optimization for database efficiency.
• Familiarity with several software delivery methodologies (RUP/Agile/Waterfall); strong knowledge of requirement gathering, use case analysis, and user acceptance testing.
• Troubleshot, tracked, and investigated data-related issues and provided KPIs and campaign-end metrics with actionable insights using SAS and Tableau.
Environment: PL/SQL, Tableau, Java, Python, Shell scripting, SAS, etc.

EDUCATION
• Jawaharlal Nehru Technological University, Hyderabad, India Aug 2008 – May 2012
