Big Data - Resume
SUMMARY:
7+ years of professional IT experience as a software developer, with core expertise in
building Big Data and Hadoop data pipelines.
4+ years of Big Data experience building highly scalable data analytics applications.
Experience working with Hadoop ecosystem components such as HDFS, MapReduce, Hive, HBase,
Sqoop, Oozie, Spark and Kafka.
Strong understanding of distributed systems design, HDFS architecture, and the internal
workings of the MapReduce and Spark processing frameworks.
Solid experience developing Spark applications that perform highly scalable data
transformations using RDDs, DataFrames and Spark SQL.
Good hands-on experience working with various Hadoop distributions, mainly Cloudera
(CDH), Hortonworks (HDP) and Amazon EMR.
Expertise in developing production-ready Spark applications using the Spark Core,
DataFrame, Spark SQL, Spark ML and Spark Streaming APIs.
Strong experience troubleshooting failures in Spark applications and fine-tuning Spark
applications and Hive queries for better performance.
Good experience applying Spark optimization techniques such as broadcast joins,
caching/persisting, right-sizing executors and reducing shuffle stages (a brief sketch
follows this summary).
Worked extensively on Hive for building complex data analytical applications.
Strong experience writing complex MapReduce jobs, including development of custom
InputFormats and RecordReaders.
Sound knowledge of map-side joins, reduce-side joins, shuffle and sort, the distributed
cache, compression techniques, and multiple Hadoop input and output formats.
Good experience working with AWS cloud services such as S3, EMR, Redshift, Athena and the
Glue metastore.
Deep understanding of performance tuning and partitioning for optimizing Spark applications.
Worked on building real time data workflows using Kafka, Spark streaming and HBase.
Extensive knowledge of NoSQL databases such as HBase, Cassandra and MongoDB.
Solid experience working with CSV, text, Avro, Parquet, ORC and JSON data formats.
Extensive experience performing ETL on structured and semi-structured data using Pig Latin
scripts.
Designed and implemented Hive and Pig UDFs in Java for evaluating, filtering, loading and
storing data.
Strong understanding of data modelling and experience with data cleansing, data profiling
and data analysis.
Experience writing test cases in Java using JUnit.
Proficient with IDEs such as Eclipse and NetBeans.
Good knowledge of scalable, secure cloud architectures on Amazon Web Services, including
EC2, CloudFormation, VPC and S3.
Good knowledge of core programming concepts such as algorithms, data structures and
collections.
Good understanding of service-oriented architecture (SOA) and web service technologies such
as XML, XSD, WSDL and SOAP.
Excellent communication and interpersonal skills; flexible and adaptable to new
environments; self-motivated team player and positive thinker who enjoys working in a
multicultural environment.
Analytical, organized and enthusiastic about working in a fast-paced, team-oriented
environment.
Experienced in interacting with business users, understanding their requirements and
providing solutions that meet them.
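The following is a minimal, illustrative Scala sketch of the Spark optimizations referenced above (broadcast joins, caching and shuffle reduction); the bucket paths, table layout and column names are hypothetical and not taken from any specific project.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object JoinOptimizationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-optimization-sketch")
      .getOrCreate()

    // Large fact data and a small lookup table (placeholder paths).
    val transactions = spark.read.parquet("s3://example-bucket/transactions/")
    val customers    = spark.read.parquet("s3://example-bucket/customers/")

    // Broadcast the small side so the large side is joined without a full shuffle.
    val enriched = transactions.join(broadcast(customers), Seq("customer_id"))

    // Cache a DataFrame that several downstream aggregations reuse.
    enriched.cache()
    enriched.groupBy("customer_id").count().show()
    enriched.groupBy("region").count().show()

    spark.stop()
  }
}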
TECHNICAL SKILLS:
Hadoop/Big Data: Spark, Hive, HDFS, MapReduce, Sqoop, Oozie, Kafka, Impala
Programming Languages: Java, Scala, Python
Cloud: AWS (EC2, S3, EMR, RDS, Lambda, Redshift, Athena, Glue Metastore)
Databases: NoSQL (HBase, Cassandra, MongoDB), Teradata, Oracle, DB2, MySQL, Postgres
IDE Tools: Eclipse, IntelliJ
Development Approach: Agile, Waterfall
Version Control: CVS, SVN, Git
Reporting Tools: Tableau, QlikView
PROFESSIONAL EXPERIENCE:
Responsibilities:
Developed Spark applications in Scala using the DataFrame and Spark SQL APIs for faster
data processing.
Developed highly optimized Spark applications to perform data cleansing, validation,
transformation and summarization activities according to requirements (a representative
sketch follows this role).
Built a data pipeline consisting of Spark, Hive, Sqoop and custom-built input adapters to
ingest, transform and analyze operational data.
Developed Spark and Hive jobs to summarize and transform data.
Used Spark for interactive queries, streaming data processing and integration with a
popular NoSQL database for large data volumes.
Involved in converting Hive/SQL queries into Spark transformations using Spark
DataFrames and Scala.
Automated creation and termination of AWS EMR clusters using Amazon Java SDK.
Involved in deploying spark and hive applications in AWS stack.
Imported data from different data sources into S3 using Sqoop and performed
transformations using Hive and Spark.
Exported the analyzed data to Redshift using Spark for visualization and report generation
by the BI team.
Helped DevOps engineers deploy code and debug issues.
Used Hive to analyze the partitioned and bucketed data and compute various metrics for
reporting.
Developed Hive scripts in Hive QL to de-normalize and aggregate the data.
Environment: AWS EMR, S3, Spark, Scala, Hive, Sqoop, ETL, Java, Athena, Glue, Maven, GitHub
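Below is a hedged Scala sketch of the kind of cleansing and summarization job described in this role; the S3 paths, column names and business rules are placeholders, not the actual project code.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, to_date, trim}

object CleansingJobSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cleansing-job-sketch")
      .getOrCreate()

    // Ingest raw CSV landed in S3 by an upstream adapter (placeholder path).
    val raw = spark.read
      .option("header", "true")
      .csv("s3://example-bucket/raw/orders/")

    // Cleansing and validation: parse dates, trim strings, drop bad records.
    val cleansed = raw
      .withColumn("order_date", to_date(col("order_date"), "yyyy-MM-dd"))
      .withColumn("status", trim(col("status")))
      .filter(col("order_id").isNotNull)

    // Express the summarization step as Spark SQL, mirroring the original Hive query.
    cleansed.createOrReplaceTempView("orders")
    val summary = spark.sql(
      """SELECT order_date, status, COUNT(*) AS order_count
        |FROM orders
        |GROUP BY order_date, status""".stripMargin)

    // Publish curated output as Parquet for downstream consumers.
    summary.write.mode("overwrite").parquet("s3://example-bucket/curated/order_summary/")

    spark.stop()
  }
}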
Responsibilities:
Involved in creating data ingestion pipelines for collecting clinical trial and health record
data from various external sources like FTP Servers and S3 buckets.
Involved in migrating the existing Teradata data warehouse to an AWS S3-based data lake.
Involved in migrating existing traditional ETL jobs to Spark and Hive jobs on the new
cloud data lake.
Wrote complex Spark applications for performing various de-normalization of the
datasets and creating a unified data analytics layer for downstream teams.
Primarily responsible for fine-tuning long-running Spark applications, writing custom
Spark UDFs and troubleshooting failures.
Involved in building a real-time pipeline using Kafka and Spark Streaming to deliver
event messages from an external REST-based application to a downstream application team
(see the sketch following this role).
Involved in creating Hive scripts for performing ad hoc data analysis required by the
business teams.
Worked extensively on migrating on-premises workloads to the AWS cloud.
Worked on utilizing AWS cloud services like S3, EMR, Redshift, Athena and Glue
Metastore.
Used broadcast variables, efficient joins, caching and other Spark capabilities for data
processing.
Involved in continuous integration and delivery (CI/CD) of the application using Jenkins.
Responsible for debugging and troubleshooting the running applications in production.
Environment: AWS EMR, Spark, Hive, HDFS, Sqoop, Kafka, Oozie, HBase, Scala, MapReduce.
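A minimal Structured Streaming sketch of the Kafka-to-Spark delivery pipeline mentioned in this role; the broker address, topic name and event schema are assumptions for illustration only, and the console sink stands in for the real downstream target.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, from_json}
import org.apache.spark.sql.types.{StringType, StructType, TimestampType}

object EventStreamSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-stream-sketch")
      .getOrCreate()

    // Assumed schema of the JSON event messages.
    val eventSchema = new StructType()
      .add("event_id", StringType)
      .add("event_type", StringType)
      .add("event_time", TimestampType)

    // Read raw events from Kafka and parse the JSON payload.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker1:9092")
      .option("subscribe", "clinical-events")
      .load()
      .select(from_json(col("value").cast("string"), eventSchema).as("event"))
      .select("event.*")

    // Deliver parsed events; console output is a stand-in for the downstream system.
    val query = events.writeStream
      .format("console")
      .option("checkpointLocation", "s3://example-bucket/checkpoints/events/")
      .start()

    query.awaitTermination()
  }
}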
Responsibilities:
Involved in writing Spark applications using Scala to perform various data cleansing,
validation, transformation, and summarization activities according to the requirement.
Loaded data into Spark RDDs and performed in-memory computation to generate output as per
the requirements.
Developed data pipelines using Spark, Hive and Sqoop to ingest, transform and analyze
operational data.
Worked on performance tuning of Spark application to improve performance.
Streamed data in real time using Spark with Kafka; responsible for handling streaming data
from web server console logs.
Worked with different file formats such as text, SequenceFiles, Avro, Parquet, JSON, XML
and flat files using MapReduce programs.
Developed a daily process for incremental import of data from DB2 and Teradata into Hive
tables using Sqoop.
Wrote Pig scripts to generate MapReduce jobs and performed ETL procedures on the data in
HDFS.
Analyzed the SQL scripts and designed a solution to implement them using Spark.
Solved performance issues in Hive and Pig scripts by understanding how joins, grouping and
aggregation translate to MapReduce jobs.
Worked with cross-functional consulting teams within the data science and analytics group to
design, develop and execute solutions that derive business insights and solve clients'
operational and strategic problems.
Exported the analyzed data to the relational databases using Sqoop for visualization and to
generate reports for the BI team.
Extensively used HiveQL queries to query data in Hive tables and loaded data into HBase
tables.
Extensively worked with partitioning, dynamic partitioning and bucketing of Hive tables;
designed both managed and external tables and optimized Hive queries (see the sketch
following this role).
Involved in collecting and aggregating large amounts of log data using Flume and staging data
in HDFS for further analysis.
Environment: HDFS, Map Reduce, Sqoop, Hive, Pig, Oozie, HBase, Python, Yarn, Spark, Tableau and
Cloudera Manager.
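An illustrative Scala sketch of loading data into a partitioned, bucketed Hive table from Spark, in the spirit of the Hive work above; the database, table, column names and paths are assumptions.

import org.apache.spark.sql.SparkSession

object HiveLoadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("hive-load-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // Read the day's incremental extract (e.g. landed by Sqoop) from HDFS.
    val daily = spark.read.parquet("/data/landing/claims/2020-01-01/")

    // Write into a Hive table partitioned by load date and bucketed by member id.
    daily.write
      .mode("append")
      .partitionBy("load_date")
      .bucketBy(16, "member_id")
      .sortBy("member_id")
      .format("parquet")
      .saveAsTable("analytics.claims")

    spark.stop()
  }
}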
Responsibilities:
Responsible for building scalable distributed data solutions using Hadoop; worked hands-on
with the ETL process using Pig.
Worked on data analysis in HDFS using MapReduce, Hive and Pig jobs.
Worked on MapReduce programming and MapReduce-HBase integration.
Involved in creating external tables and partitioning and bucketing tables in Hive (see the
sketch following this role).
Ensured adherence to guidelines and standards in the project process.
Facilitated testing across different dimensions.
Wrote and modified stored procedures to load and modify data according to business rule
changes.
Worked in a production support environment.
Extracted the data from Teradata into HDFS using Sqoop.
Continuously monitored and managed the Hadoop cluster through Cloudera Manager.
Developed Hive queries to process the data and generate data cubes for visualization.
Implemented Kerberos security to safeguard the cluster.
Worked on a stand-alone as well as a distributed Hadoop application.
Environment: Apache Hadoop, Cloudera, Pig, Hive, Sqoop, Flume, Java/J2EE, Oracle 11g, Crontab,
JBoss 5.1.0 Application Server, Linux OS, Windows OS, AWS.
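A hedged sketch of creating an external, partitioned Hive table over Sqoop-extracted files and registering a partition, run here through Spark SQL; the schema, table names and HDFS locations are illustrative only.

import org.apache.spark.sql.SparkSession

object ExternalTableSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("external-table-sketch")
      .enableHiveSupport()
      .getOrCreate()

    // External table leaves the Sqoop-extracted files in place under /data/raw.
    spark.sql(
      """CREATE EXTERNAL TABLE IF NOT EXISTS staging.transactions (
        |  txn_id BIGINT,
        |  account_id BIGINT,
        |  amount DECIMAL(12,2)
        |)
        |PARTITIONED BY (load_date STRING)
        |STORED AS PARQUET
        |LOCATION '/data/raw/transactions'""".stripMargin)

    // Register the partition for the latest extract.
    spark.sql(
      """ALTER TABLE staging.transactions
        |ADD IF NOT EXISTS PARTITION (load_date = '2019-06-01')
        |LOCATION '/data/raw/transactions/2019-06-01'""".stripMargin)

    spark.stop()
  }
}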
Responsibilities:
Performed analysis of the client requirements based on the detailed design documents.
Developed User Interface using JavaScript and HTML.
Implemented MVC architecture by creating Model, View and Controller classes.
Involved in unit testing, debugging and bug fixing of application modules.
Extensively involved in writing the SQL queries to fetch data from database.
Defined Web Services using XML-based Web Services Description Language.
Built Java APIs/services backing user interface screens using Spring MVC.
Integrated other systems through XML.
Worked with Core Java concepts like Collections Framework, multi-threading, memory
management.
Resolved issues with the JVM and multi-threading; connected to the backend database using
JDBC.
Developed data access objects using JDBC and SQL (see the sketch following this role).
Environment: Java, J2EE, JSP, JDBC, EJB, log4j, XML, Apache Tomcat, JUNIT, DB2, SQL Server, CVS.
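A small data-access sketch in the spirit of the JDBC/DAO work above, rendered in Scala (against the same java.sql API) for consistency with the other examples; the connection URL, credentials, table and columns are placeholders.

import java.sql.{Connection, DriverManager, ResultSet}

object CustomerDao {
  // Look up a customer's name by id; a real implementation would use a pooled DataSource.
  def findName(customerId: Long): Option[String] = {
    val conn: Connection =
      DriverManager.getConnection("jdbc:db2://dbhost:50000/APPDB", "app_user", "secret")
    try {
      val stmt = conn.prepareStatement("SELECT name FROM customers WHERE id = ?")
      stmt.setLong(1, customerId)
      val rs: ResultSet = stmt.executeQuery()
      if (rs.next()) Some(rs.getString("name")) else None
    } finally {
      conn.close()
    }
  }
}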