Vijay Krishna
Vijay.gcp3@gmail.com
(407) 476-6397
Summary:
Experience in building multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP, and in coordinating tasks among the team.
Designed and implemented the various layers of the data lake and designed star schemas in BigQuery.
Used Google Cloud Functions with Python to load data into BigQuery for CSV files arriving in GCS buckets.
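A minimal sketch of such a function, assuming a background Cloud Function triggered by GCS object-finalize events; the dataset and table names are hypothetical placeholders:

# Background Cloud Function: load a newly arrived CSV from GCS into BigQuery.
from google.cloud import bigquery

def load_csv_to_bq(event, context):
    # The trigger event carries the bucket and object name of the arriving file.
    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,   # skip the header row
        autodetect=True,       # infer the schema from the file
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    # "my_dataset.landing_table" is a hypothetical destination table.
    load_job = client.load_table_from_uri(uri, "my_dataset.landing_table", job_config=job_config)
    load_job.result()  # block until the load job completes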
Processed and loaded bounded and unbounded data from Google Pub/Sub topics into BigQuery using Cloud Dataflow with Python.
Designed pipelines with Apache Beam, Kubeflow, and Dataflow, and orchestrated the jobs in GCP.
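A minimal Apache Beam sketch of the Pub/Sub-to-BigQuery flow described in the bullets above, submitted as a streaming Dataflow job; the project, topic, table, and schema are hypothetical placeholders:

# Streaming Beam pipeline: read JSON messages from Pub/Sub and append rows to BigQuery.
import json
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

options = PipelineOptions(
    streaming=True,
    runner="DataflowRunner",
    project="my-project",
    region="us-central1",
    temp_location="gs://my-bucket/tmp",
)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic="projects/my-project/topics/events")
        | "ParseJson" >> beam.Map(lambda msg: json.loads(msg.decode("utf-8")))
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.events",
            schema="event_id:STRING,event_ts:TIMESTAMP,payload:STRING",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )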
Developed and demonstrated a POC to migrate on-prem workloads to Google Cloud Platform using GCS, BigQuery, Cloud SQL, and Cloud Dataproc.
Documented the inventory of modules, infrastructure, storage, and components of the existing on-prem data warehouse to analyze and identify the technologies and strategies suitable for the Google Cloud migration.
Designed, developed, and implemented performant ETL pipelines using the Python API of Apache Spark (PySpark).
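A minimal PySpark sketch of one such ETL step, assuming a Dataproc cluster with the GCS connector; the bucket paths and column names are hypothetical placeholders:

# PySpark ETL: read raw CSVs from GCS, clean and type the data, write curated Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("gcs-etl").getOrCreate()

raw = spark.read.option("header", True).csv("gs://raw-bucket/orders/*.csv")

curated = (
    raw
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("order_date", F.to_date("order_ts"))
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("amount") > 0)
    .dropDuplicates(["order_id"])
)

curated.write.mode("overwrite").partitionBy("order_date").parquet("gs://curated-bucket/orders/")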
Worked on a GCP POC to migrate data and applications from on-prem to Google Cloud.
Exposure to IAM roles in GCP.
Created firewall rules to access Google Dataproc clusters from other machines.
Set up GCP firewall rules to control ingress and egress traffic to and from VM instances based on the specified configuration, and used GCP Cloud CDN (content delivery network) to deliver content from GCP cache locations, drastically improving user experience and latency.
Environment: GCP, Cloud SQL, BigQuery, Cloud Dataproc, GCS, Cloud Composer, Informatica PowerCenter 10.1, Talend 6.4 for Big Data, Hadoop, Hive, Teradata, SAS, Spark, Python, Java, SQL Server.
Data Engineer
PwC | Tampa, FL | Dec 2019 – Jun 2020
Project: US-ASR NQO Consult & Advisory
Responsibilities:
Worked on implementing scalable infrastructure and a platform for large-scale data ingestion, aggregation, integration, and analytics in Hadoop using Spark and Hive.
Worked on developing streamlined workflows using high-performance API services dealing with large amounts of structured and unstructured data.
Developed Spark jobs in Python to perform data transformations using DataFrames and Spark SQL.
Worked on converting unstructured data in JSON format to structured data in Parquet format by performing several transformations using PySpark.
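A minimal sketch of that JSON-to-Parquet conversion; the S3 paths and field names are hypothetical placeholders:

# Flatten semi-structured JSON into a tabular DataFrame and persist it as Parquet.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("json-to-parquet").getOrCreate()

events = spark.read.json("s3://landing-bucket/events/*.json")

flattened = events.select(
    F.col("id").alias("event_id"),
    F.col("user.id").alias("user_id"),            # nested struct field
    F.to_timestamp("timestamp").alias("event_ts"),
    F.explode_outer("tags").alias("tag"),         # array column -> one row per element
)

flattened.write.mode("append").parquet("s3://curated-bucket/events/")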
Developed Spark applications using Spark libraries to perform ETL transformations, thereby eliminating the need for separate ETL tools.
Developed end-to-end data pipelines in Spark using Python to ingest, transform, and analyze data.
Created Hive tables using HiveQL, loaded data into them, and analyzed the data by developing Hive queries.
Created and executed unit test cases to validate that transformations and processing functions work as expected.
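A minimal pytest-style sketch of such a unit test; the transformation under test (clean_amounts) is a hypothetical example:

# Unit test for a PySpark transformation using a local SparkSession.
import pytest
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

def clean_amounts(df):
    # Hypothetical transformation: cast amount to double and drop non-positive rows.
    return df.withColumn("amount", F.col("amount").cast("double")).filter(F.col("amount") > 0)

@pytest.fixture(scope="session")
def spark():
    return SparkSession.builder.master("local[1]").appName("unit-tests").getOrCreate()

def test_clean_amounts_drops_invalid_rows(spark):
    df = spark.createDataFrame([("a", "10.5"), ("b", "-1"), ("c", "0")], ["id", "amount"])
    result = clean_amounts(df)
    assert result.count() == 1
    assert result.first()["amount"] == 10.5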
Scheduled multiple jobs to run through the Control-M workflow engine.
Wrote shell scripts to automate application deployments.
Implemented solutions to switch schemas based on dates so that the transformations would be automated.
Developed custom functions and UDFs in Python to extend the methods and functionality of Spark.
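A minimal sketch of registering and applying such a Python UDF; the normalization logic and column names are hypothetical:

# Register a Python UDF and apply it to a DataFrame column.
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

def normalize_state(value):
    # Hypothetical helper: trim whitespace and upper-case a state code.
    return value.strip().upper() if value else None

normalize_state_udf = udf(normalize_state, StringType())

df = spark.createDataFrame([(" fl ",), ("Tx",), (None,)], ["state"])
df.withColumn("state_clean", normalize_state_udf("state")).show()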
Developed data validation scripts in Hive and Spark and performed validation in Jupyter Notebooks by spinning up query clusters in AWS EMR.
Executed Hadoop and Spark jobs on AWS EMR using data stored in Amazon S3.
Implemented Spark RDD transformations to map business logic and applied actions on top of the transformations.
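A minimal sketch of the RDD transformation/action pattern; the sample records are hypothetical:

# Shape records into key/value pairs (transformations), then trigger an action.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("rdd-example").getOrCreate()
sc = spark.sparkContext

records = sc.parallelize([("FL", 120.0), ("TX", 80.0), ("FL", 45.5)])  # (state, amount)

totals = (
    records
    .map(lambda rec: (rec[0], rec[1]))     # transformation: key/value pairs per state
    .reduceByKey(lambda a, b: a + b)       # transformation: sum amounts per state
)

print(totals.collect())  # action: materializes the results, e.g. [('FL', 165.5), ('TX', 80.0)]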
Worked on data serialization formats (Parquet, Avro, JSON, CSV) for converting complex objects into sequences of bits.
Environment: Hadoop, Hive, Zookeeper, Sqoop, Spark, Control-M, Python, Bamboo, SQL, Bitbucket, AWS, Linux.
Data Engineer
Thomson Reuters | India | Jun 2016 – Jan 2019
Responsibilities:
Created and managed nodes that utilize Java JARs, Python, and shell scripts for scheduling jobs to customize data ingestion.
Developed Pig scripts for change data capture and delta record processing between newly arrived data and existing data in HDFS.
Developed and performed Sqoop imports from Oracle to load data into HDFS.
Created partitions and buckets based on state for further processing using bucket-based Hive joins.
Created Hive tables to store the processed results in a tabular format.
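A minimal sketch of the partitioned, bucketed Hive table setup from the bullets above, expressed through spark.sql; the table and column names are hypothetical:

# Create a Hive table partitioned by state and bucketed by transaction id, then query it.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("hive-buckets").enableHiveSupport().getOrCreate()

spark.sql("""
    CREATE TABLE IF NOT EXISTS customer_txn (
        txn_id STRING,
        amount DOUBLE
    )
    PARTITIONED BY (state STRING)
    CLUSTERED BY (txn_id) INTO 16 BUCKETS
    STORED AS ORC
""")

# Analyze the processed results stored in the table.
spark.sql("SELECT state, SUM(amount) AS total_amount FROM customer_txn GROUP BY state").show()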
Scheduled MapReduce jobs in the production environment using the Oozie scheduler and Autosys.
Developed Kafka producers and set up brokers for message handling.
Imported data into Hadoop using Kafka and implemented Oozie jobs for daily imports.
Configured a Kafka ingestion pipeline to transmit logs from the web servers to Hadoop.
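A minimal sketch of such a Kafka producer using the kafka-python client; the broker address, topic name, and log path are hypothetical placeholders:

# Publish web-server log lines as JSON messages to a Kafka topic.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers=["broker1:9092"],
    value_serializer=lambda record: json.dumps(record).encode("utf-8"),
)

with open("/var/log/httpd/access.log") as log_file:
    for line in log_file:
        producer.send("weblogs", {"raw": line.rstrip("\n")})

producer.flush()  # ensure all buffered messages are delivered
producer.close()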
Worked on POCs for stream processing using Apache NiFi.
Worked on Hortonworks Hadoop solutions with real-time streaming using Apache NiFi.
Analyzed Hadoop logs using Pig scripts to track errors raised by the team's jobs.
Performed MySQL queries for efficient retrieval of ingested data using MySQL Workbench.
Implemented data ingestion and transformation through automated Oozie workflows.
Created and generated audit reports to flag security threats and track all user activity using various Hadoop components.
Designed various plots showing HDFS analytics and other operations performed on the environment.
Worked with the infrastructure team to test the environment after patches, upgrades, and migrations.
Developed multiple Java scripts to deliver end-to-end support while maintaining product integrity.
Environment: HDFS, Hive, MapReduce, Pig, Spark, Kafka, Sqoop, Scala, Oozie, Maven, GitHub, Java,
Python, MySQL, Linux.
Environment: MS SQL Server, SQL, SSIS, MySQL, Unix, Oracle, Java, Python, Shell.