Meghna Reddy Kunta
Skill Summary:
Meghna has 8+ years of experience in information technology as a Data Engineer, with deep expertise in Database Development, ETL Development, Data Modelling, Data Pipelines, Report Development, and Big Data Technologies. Experience in Data Integration and Data Warehousing using ETL tools such as Informatica PowerCenter, AWS Glue, SQL Server Integration Services (SSIS), and Talend. Expertise with Teradata utilities (FastLoad, MultiLoad, TPump) for loading data. Strong command of major programming languages including Java, Scala, Python, and SQL. Extensively used Informatica PowerCenter and Informatica Data Quality (IDQ) as ETL tools for extracting, transforming, loading, and cleansing data from various source inputs to various targets, in batch and in real time. Experience in designing Business Intelligence solutions with Microsoft SQL Server using MS SQL Server Integration Services (SSIS), MS SQL Server Reporting Services (SSRS), and SQL Server Analysis Services (SSAS). Expert-level skill in designing and developing complex mappings to extract data from diverse sources including flat files, RDBMS tables, legacy system files, XML files, applications, COBOL sources, and Teradata. Knowledge of and experience in the Cloudera ecosystem, including HDFS, Hive, Sqoop, HBase, and Kafka, plus data pipelines, data analysis, and processing with Hive SQL, Impala, Spark, and Spark SQL. Created clusters in Google Cloud and managed them using Kubernetes (k8s); used Jenkins to deploy code to Google Cloud, create new namespaces, and build Docker images and push them to the Google Cloud container registry. Expertise in data model development using Erwin. Excellent communication and presentation skills. Available for phone/video conference interview.
Availability/Location:
Phone Interview: Anytime with 24 hours’ notice
WebEx Interview: Anytime with 24 hours’ notice
To Start: 1-2 weeks
Location: Dallas, TX
VZ Reference: NA
PROFESSIONAL SUMMARY:
8+ years of professional experience in information technology as a Data Engineer, with deep expertise in the areas of Database Development, ETL Development, Data Modelling, Report Development, and Big Data Technologies.
Experience in Data Integration and Data Warehousing using various ETL tools such as Informatica PowerCenter, AWS Glue, SQL Server Integration Services (SSIS), and Talend.
Experience in Designing Business Intelligence Solutions with Microsoft SQL Server and using MS
SQL Server Integration Services (SSIS), MS SQL Server Reporting Services (SSRS) and SQL Server
Analysis Services (SSAS).
Extensively used Informatica PowerCenter and Informatica Data Quality (IDQ) as ETL tools for extracting, transforming, loading, and cleansing data from various source data inputs to various targets, in batch and real time.
Experience working with the Amazon Web Services (AWS) cloud and its services, including Snowflake (on AWS), EC2, S3, RDS, EMR, VPC, IAM, Elastic Load Balancing, Lambda, Redshift, ElastiCache, Auto Scaling, CloudFront, CloudWatch, Data Pipeline, DMS, Aurora, ETL, and other AWS services.
Strong expertise in relational database systems such as Oracle, MS SQL Server, Teradata, MS Access, and DB2, with design and database development using SQL, PL/SQL, SQL*Plus, TOAD, and SQL*Loader.
Highly proficient in writing, testing and implementation of triggers, stored procedures, functions,
packages, Cursors using PL/SQL.
Proficient in writing Selenium WebDriver automation scripts in Java, Python, C#, and JavaScript using Maven, Cucumber, Ruby, and TestNG for automated testing of web applications.
Hands-on experience with the AWS Snowflake cloud data warehouse and AWS S3 buckets for integrating data from multiple source systems, including loading nested JSON-formatted data into Snowflake tables.
Experience in building and architecting multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation in GCP.
Extensive experience in integration of Informatica Data Quality (IDQ) with Informatica PowerCenter.
Extensive experience in applying data mining solutions to various business problems and generating data visualizations using Tableau, Power BI, and Alteryx.
Solid knowledge of and experience with the Cloudera ecosystem, including HDFS, Hive, Sqoop, HBase, and Kafka, plus data pipelines, data analysis, and processing with Hive SQL, Impala, Spark, and Spark SQL.
Worked with different scheduling tools such as Talend Administration Center (TAC), UC4/Automic, Tidal, Control-M, Autosys, crontab, and TWS (Tivoli Workload Scheduler).
Experienced in design, development, unit testing, integration, debugging, implementation, and production support, as well as client interaction and understanding of business applications, business data flows, and data relations.
Used Flume, Kafka, and Spark Streaming to ingest real-time or near-real-time data into HDFS.
Analysed data and provided insights with Python Pandas.
Worked on AWS Data Pipeline to configure data loads from S3 into Redshift.
Worked on Data Migration from Teradata to AWS Snowflake Environment using Python and BI
tools like Alteryx.
Experience in moving data between GCP and Azure using Azure Data Factory.
Developed Python scripts to parse flat files, CSV, XML, and JSON files, extract data from various sources, and load it into the data warehouse.
Developed Automated scripts to do the migration using Unix shell scripting, Python, Oracle/TD
SQL, TD Macros and Procedures.
Good knowledge of NoSQL databases such as HBase and Cassandra.
Expert-level mastery in designing and developing complex mappings to extract data from diverse
sources including flat files, RDBMS tables, legacy system files, XML files, Applications, COBOL
Sources & Teradata.
Worked on JIRA for defect/issue logging and tracking, and documented all work using Confluence.
Experience with ETL workflow management tools like Apache Airflow, with significant experience writing Python scripts to implement workflows.
Experience in identifying Bottlenecks in ETL Processes and Performance tuning of the production
applications using Database Tuning, Partitioning, Index Usage, Aggregate Tables, Session
partitioning, Load strategies, commit intervals and transformation tuning.
Worked on performance tuning of user queries by analyzing explain plans, recreating user driver tables with the right primary index, scheduling collection of statistics, and defining secondary or various join indexes.
Experience with scripting languages like PowerShell, Perl, Shell, etc.
Expert knowledge and experience in fact/dimension modelling (star schema, snowflake schema), transactional modelling, and SCDs (slowly changing dimensions).
Created clusters in Google Cloud and managed them using Kubernetes (k8s); used Jenkins to deploy code to Google Cloud, create new namespaces, and build Docker images and push them to the Google Cloud container registry.
Excellent interpersonal and communication skills, experienced in working with senior level managers,
business people and developers across multiple disciplines.
Strong problem-solving and analytical skills, with the ability to work both independently and as part of a team.
Highly enthusiastic, self-motivated, and quick to assimilate new concepts and technologies.
TECHNICAL SKILLS:
ETL: Informatica PowerCenter 10.x/9.6/9.1, AWS Glue, Talend 5.6, SQL Server Integration Services (SSIS)
Databases & Tools: MS SQL Server 2014/2012/2008, Teradata 15/14, Oracle 11g/10g, SQL Assistant, Erwin 8/9, ER Studio
Cloud Environment: AWS Snowflake, AWS RDS, AWS Aurora, Redshift, EC2, EMR, S3, Lambda, Glue, Data Pipeline, Athena, Data Migration Services, SQS, SNS, ELB, VPC, EBS, RDS, Route 53, CloudWatch, AWS Auto Scaling, Git, AWS CLI, Jenkins, Microsoft Azure, Google Cloud Platform (GCP)
Reporting Tools: Tableau, Power BI
Big Data Ecosystem: HDFS, MapReduce, Hive/Impala, Pig, Sqoop, HBase, Spark, Scala, Kafka
Programming Languages: Unix Shell Scripting, SQL, PL/SQL, Perl, Python, T-SQL
Data Warehousing & BI: Star schema, Snowflake schema, fact and dimension tables, SAS, SSIS, Splunk
EDUCATION:
Bachelor's in CS, Jawaharlal Nehru Technological University, Hyderabad, 2012
PROFESSIONAL EXPERIENCE:
TJ Maxx, Boston, MA
Sr. Data Engineer 01/2020 – Present
Responsibilities:
Experience in building and architecting multiple Data pipelines, end to end ETL and ELT process for
Data ingestion and transformation.
Performed Informatica Cloud Services and Informatica PowerCenter administration, ETL strategy, and Informatica ETL mapping development. Set up the Secure Agent and connected different applications and their data connectors to process different kinds of data, including unstructured (logs, click streams, shares, likes, topics, etc.), semi-structured (XML, JSON), and structured (RDBMS).
Worked extensively with AWS services like EC2, S3, VPC, ELB, Auto Scaling Groups, Route 53,
IAM, CloudTrail, CloudWatch, CloudFormation, CloudFront, SNS, and RDS.
Developed Python scripts to parse XML and JSON files and load the data into the AWS Snowflake data warehouse.
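Below is a minimal illustrative sketch of this kind of JSON-to-Snowflake load using the snowflake-connector-python package; the connection parameters, file layout, and STG_ORDERS table are hypothetical placeholders, not the actual project code.

```python
# Sketch only: parse a JSON extract and load flattened rows into a Snowflake
# staging table. Account, credentials, and table names are placeholders.
import json
import snowflake.connector


def load_json_to_snowflake(path: str) -> None:
    with open(path) as f:
        records = json.load(f)  # expects a JSON array of objects

    # Flatten the nested payload into the columns to persist.
    rows = [
        (r["id"], r["customer"]["name"], len(r.get("items", [])))
        for r in records
    ]

    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="***",
        warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
    )
    try:
        conn.cursor().executemany(
            "INSERT INTO STG_ORDERS (ORDER_ID, CUSTOMER_NAME, ITEM_COUNT) "
            "VALUES (%s, %s, %s)",
            rows,
        )
        conn.commit()
    finally:
        conn.close()
```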
Moved data from HDFS to the Azure SQL data warehouse by building ETL pipelines; worked on various methods, including data fusion and machine learning, to improve the accuracy of distinguishing the right rules from potential rules.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (Parquet/text files) into AWS Redshift.
Built a program with Python and Apache Beam and executed it in Cloud Dataflow to run data validation between raw source files and BigQuery tables.
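As an illustration of this validation pattern, the sketch below compares row counts between a raw file in GCS and a BigQuery table with Apache Beam; the project, bucket, and table names are assumptions, and the same pipeline can run on the DirectRunner or on Dataflow.

```python
# Sketch only: Beam pipeline that compares a raw-file row count with a
# BigQuery table row count. Project, bucket, and table names are placeholders.
import logging

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def run():
    opts = PipelineOptions(
        runner="DataflowRunner", project="my-project",
        region="us-central1", temp_location="gs://my-bucket/tmp",
    )
    with beam.Pipeline(options=opts) as p:
        file_count = (
            p
            | "ReadRaw" >> beam.io.ReadFromText("gs://my-bucket/raw/orders.csv")
            | "CountFile" >> beam.combiners.Count.Globally()
        )
        bq_count = (
            p
            | "ReadBQ" >> beam.io.ReadFromBigQuery(
                query="SELECT order_id FROM `my-project.dw.orders`",
                use_standard_sql=True,
            )
            | "CountBQ" >> beam.combiners.Count.Globally()
        )
        # Compare the two counts; a mismatch is logged for follow-up.
        (
            file_count
            | "Compare" >> beam.Map(
                lambda n, bq: logging.warning("Mismatch: file=%s bq=%s", n, bq)
                if n != bq
                else logging.info("Counts match: %s", n),
                bq=beam.pvalue.AsSingleton(bq_count),
            )
        )


if __name__ == "__main__":
    run()
```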
Strong background in data warehousing, business intelligence, and ETL processes (Informatica, AWS Glue), with expertise in working with and analyzing large data sets.
Built a Scala- and Spark-based configurable framework to connect to common data sources such as MySQL, Oracle, Postgres, SQL Server, Salesforce, and BigQuery and load the data into BigQuery.
Extensive knowledge and hands-on experience implementing PaaS, IaaS, and SaaS delivery models inside the enterprise (data center) and in public clouds using AWS, Google Cloud, Kubernetes, etc.
Expertise in Selenium automation using Selenium WebDriver, Selenium Grid, Java, JavaScript, Protractor, AngularJS, JUnit, ANT, and TestNG.
Worked on documentation of all Extract, Transform, and Load work; designed, developed, validated, and deployed the Talend ETL processes for the Data Warehouse team using Pig and Hive.
Applied required transformation using AWS Glue and loaded data back to Redshift and S3.
Extensively worked on making REST API (application program interface) calls to get data as JSON responses and parse them.
Experience in analyzing and writing SQL queries to extract data in JSON format through REST API calls with API keys, admin keys, and query keys, and loading the data into the data warehouse.
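A minimal sketch of such a keyed REST call follows, assuming a hypothetical endpoint, header name, and response shape; in practice the key would come from a secret store and the parsed records would feed the warehouse load.

```python
# Sketch only: call a REST endpoint with an API key, parse the JSON response,
# and return the records to be staged for the warehouse. URL/key are placeholders.
import requests

API_URL = "https://api.example.com/v1/orders"  # hypothetical endpoint
API_KEY = "***"                                # supplied via a secret store


def fetch_orders(since: str) -> list:
    resp = requests.get(
        API_URL,
        headers={"api-key": API_KEY},
        params={"updated_since": since},
        timeout=30,
    )
    resp.raise_for_status()            # fail fast on 4xx/5xx
    payload = resp.json()              # parse the JSON body
    return payload.get("results", [])  # records to land in the warehouse


if __name__ == "__main__":
    for row in fetch_orders("2021-01-01"):
        print(row.get("id"), row.get("status"))
```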
Extensively worked on Informatica tools like source analyzer, mapping designer, workflow manager,
workflow monitor, Mapplets, Worklets and repository manager.
Built ETL data pipelines for data movement to S3 and then to Redshift.
Designed and implemented ETL pipelines from various relational databases to the data warehouse using Apache Airflow.
Worked on Data Extraction, aggregations and consolidation of Adobe data within AWS Glue using
PySpark.
Developed SSIS packages to extract, transform, and load data into the SQL Server database from legacy mainframe data sources.
Built data pipelines in Airflow on GCP for ETL-related jobs using different Airflow operators.
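For illustration, a minimal Airflow 2.x DAG of this shape is sketched below; the DAG id, schedule, task callables, and validation script path are hypothetical stand-ins for the actual jobs.

```python
# Sketch only: Airflow DAG chaining extract -> load -> validate for a GCP ETL
# job. Task bodies and paths are placeholders.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.operators.python import PythonOperator


def extract_to_gcs(**context):
    """Pull data from the source system and stage it as CSV in a GCS bucket."""
    ...  # e.g. query the source DB and upload with google-cloud-storage


def load_to_bigquery(**context):
    """Load the staged files from GCS into the target BigQuery table."""
    ...  # e.g. bigquery.Client().load_table_from_uri(...)


with DAG(
    dag_id="gcp_daily_etl",
    start_date=datetime(2021, 1, 1),
    schedule_interval="0 5 * * *",  # daily at 05:00
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_to_gcs", python_callable=extract_to_gcs)
    load = PythonOperator(task_id="load_to_bigquery", python_callable=load_to_bigquery)
    validate = BashOperator(
        task_id="validate_row_counts",
        bash_command="python /opt/etl/validate_counts.py --date {{ ds }}",
    )
    extract >> load >> validate
```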
Worked in Postman, using HTTP GET requests to retrieve data from RESTful APIs and validate the API calls.
Hands-on experience with Informatica PowerCenter and PowerExchange in integrating with different applications and relational databases.
Prepared dashboards using Tableau for summarizing Configuration, Quotes, Orders and other e-
commerce data.
Created Informatica workflows and IDQ mappings for batch and real time.
Developed PySpark code for AWS Glue jobs and for EMR.
Created custom T-SQL procedures to read data from flat files and load it into the SQL Server database using the SQL Server Import and Export Data wizard.
Designed and architected the various layers of the data lake.
Developed ETL Python scripts for ingestion pipelines running on an AWS infrastructure of EMR, S3, Redshift, and Lambda.
Monitored BigQuery, Dataproc, and Cloud Dataflow jobs via Stackdriver for all environments.
Configured EC2 instances, configured IAM users and roles, and created an S3 data pipe using the Boto API to load data from internal data sources.
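A small sketch of the S3 load step with boto3 follows; the bucket name, prefix, and file path are hypothetical, and credentials are assumed to come from the instance's IAM role.

```python
# Sketch only: push an internal extract to S3 with boto3. Bucket/prefix/paths
# are placeholders; the IAM role on the instance provides credentials.
import boto3

s3 = boto3.client("s3")


def ship_to_s3(local_path: str, run_date: str) -> None:
    key = f"internal-feeds/{run_date}/{local_path.rsplit('/', 1)[-1]}"
    s3.upload_file(local_path, "my-data-lake-bucket", key)


if __name__ == "__main__":
    ship_to_s3("/data/exports/customers.csv", "2021-06-01")
```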
Hands on experience with Alteryx software for ETL, data preparation for EDA and performing spatial
and predictive analytics.
Provided Best Practice document for Docker, Jenkins, Puppet and GIT.
Expertise in implementing a DevOps culture through CI/CD tools such as Repos, CodeDeploy, CodePipeline, and GitHub.
Installed and configured the Splunk Enterprise environment on Linux; configured Universal and Heavy Forwarders.
Developed various shell scripts for scheduling data cleansing scripts and loading processes, and maintained the batch processes using Unix shell scripts.
Backed up AWS Postgres to S3 via a daily job run on EMR using DataFrames.
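A minimal PySpark sketch of such a daily backup is shown below, assuming a hypothetical Postgres host, table, and S3 bucket, with the Postgres JDBC driver available on the EMR cluster.

```python
# Sketch only: read a Postgres table into a Spark DataFrame on EMR and write it
# to S3 as Parquet, partitioned by run date. All names are placeholders.
from datetime import date

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("postgres-s3-backup").getOrCreate()

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:postgresql://db-host:5432/appdb")
    .option("dbtable", "public.orders")
    .option("user", "backup_user")
    .option("password", "***")
    .load()
)

# Each daily run writes its own dt= prefix so backups never overwrite each other.
df.write.mode("overwrite").parquet(
    f"s3://my-backup-bucket/postgres/orders/dt={date.today():%Y-%m-%d}/"
)
```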
Developed a server-based web traffic statistical analysis tool with RESTful APIs using Flask and Pandas.
Analyzed various types of raw files (JSON, CSV, XML) with Python using Pandas, NumPy, etc.
Environment: Informatica PowerCenter 10.x/9.x, IDQ, AWS Redshift, Snowflake, S3, Postgres, Google Cloud Platform (GCP), MS SQL Server, BigQuery, Salesforce, SQL, Python, Postman, Tableau, Unix Shell Scripting, EMR, GitHub.
AbbVie, Chicago, IL
Data Engineer 10/2018 – 12/2019
Responsibilities:
Involved in full Software Development Life Cycle (SDLC) - Business Requirements Analysis,
preparation of Technical Design documents, Data Analysis, Logical and Physical database design,
Coding, Testing, Implementing, and deploying to business users.
Developed complex mappings using Informatica Power Center Designer to transform and load the
data from various source systems like Oracle, Teradata, and Sybase into the final target database.
Analyzed source data coming from different sources such as SQL Server tables, XML files, and flat files, then transformed it according to business rules using Informatica and loaded the data into target tables.
Designed and developed a number of complex mappings using various transformations like Source
Qualifier, Aggregator, Router, Joiner, Union, Expression, Lookup, Filter, Update Strategy, Stored
Procedure, Sequence Generator, etc.
Involved in creating the tables in Greenplum and loading the data through Alteryx for the Global Audit Tracker.
Configured Maven for Java automation projects and developed the Maven project object model (POM).
Analyzed large and critical datasets using HDFS, HBase, Hive, HQL, Pig, Sqoop, and ZooKeeper.
Performed data extraction, aggregation, and consolidation of Adobe data within AWS Glue using PySpark.
Developed Python scripts to automate the ETL process using Apache Airflow and CRON scripts in
the UNIX operating system as well.
Worked on Google Cloud Platform (GCP) services such as Compute Engine, Cloud Load Balancing, Cloud Storage, and Cloud SQL.
Developed data engineering and ETL Python scripts for ingestion pipelines running on an AWS infrastructure of EMR, S3, Glue, and Lambda.
Changed the existing data models using Erwin for enhancements to the existing data warehouse projects.
Designed, developed, and implemented a POM-based automation testing framework utilizing Java, TestNG, and Selenium WebDriver.
Used Talend connectors integrated with Redshift for BI development across multiple technical projects running in parallel.
Performed query optimization with the help of explain plans, collected statistics, and primary and secondary indexes. Used volatile tables and derived queries to break complex queries into simpler ones. Streamlined the script and shell script migration process on the UNIX box.
Used Google Cloud Functions with Python to load data into BigQuery on arrival of CSV files in a GCS bucket.
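A minimal sketch of this trigger is shown below as a first-generation background Cloud Function using the google-cloud-bigquery client; the project, dataset, and table names are hypothetical.

```python
# Sketch only: Cloud Function on a GCS "finalize" trigger that loads a newly
# arrived CSV into BigQuery. Project/dataset/table names are placeholders.
from google.cloud import bigquery

TABLE_ID = "my-project.staging.orders"  # hypothetical target table


def load_csv_to_bq(event, context):
    """Triggered when an object is finalized in the watched GCS bucket."""
    if not event["name"].endswith(".csv"):
        return  # ignore non-CSV objects

    uri = f"gs://{event['bucket']}/{event['name']}"
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
    )
    client.load_table_from_uri(uri, TABLE_ID, job_config=job_config).result()
```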
Created an iterative macro in Alteryx to send JSON requests, download JSON responses from a web service, and analyze the response data.
Migrated data from transactional source systems to the Redshift data warehouse using Spark and AWS EMR.
Experience in Google Cloud components, Google container builders and GCP client libraries.
Supported various business teams with data mining and reporting by writing complex SQL using basic and advanced SQL, including OLAP functions such as ranking, partitioning, and windowing functions.
Expertise in writing scripts for Data Extraction, Transformation and Loading of data from legacy
systems to target data warehouse using BTEQ, FastLoad, MultiLoad, and Tpump.
Worked with EMR, S3, and EC2 services in the AWS cloud, and migrated servers, databases, and applications from on-premises to AWS.
Tuned SQL queries using EXPLAIN plans, analyzing data distribution among AMPs and index usage, collecting statistics, defining indexes, revising correlated subqueries, using hash functions, etc.
Developed shell scripts for job automation that generate a log file for every job.
Extensively used Spark SQL and the DataFrames API in building Spark applications.
Wrote complex SQL using joins, subqueries, and correlated subqueries; expertise in SQL queries for cross-verification of data.
Extensively worked on performance tuning of Informatica and IDQ mappings.
Created, maintained, supported, repaired, and customized system and Splunk applications, search queries, and dashboards.
Experience in data profiling and development of various data quality rules using Informatica Data Quality (IDQ).
Created new UNIX scripts to automate and handle different file processing, editing, and execution sequences with shell scripting, using basic Unix commands and the 'awk' and 'sed' editing languages.
Experience in cloud versioning technologies like GitHub.
Integrated Collibra with the data lake using the Collibra Connect API.
Designed and developed ETL processes in AWS Glue to migrate campaign data from external sources such as S3 (ORC/Parquet/text files) into AWS Redshift.
Created firewall rules to access Google Dataproc from other machines.
Wrote Scala programs for Spark transformations in Dataproc.
Providing technical support and guidance to the offshore team to address complex business problems.
Environment: Informatica PowerCenter 9.5, AWS Glue, Talend, Google Cloud Platform (GCP), PostgreSQL Server, Python, Oracle, Teradata, CRON, Unix Shell Scripting, SQL, Erwin, AWS Redshift, GitHub, EMR