DB Admin Technical Test
Take-Home Tests
This section uses AWS terms, but it should be applicable to other public cloud services as
well.
Overview
You are working as a Data Engineer at ACME, and it is your first day at the office.
The hiring manager shows you around the office and explains the company's current data
ingestion architecture:
1. An Amazon S3 bucket with the following properties:
Property           Value
Bucket Name        interview-bucket
Prefix             datasets/sample.csv
Access Key ID      "accesskeyid"
Secret Access Key  "supersecretaccesskey"
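As an illustration for source (1), the sketch below separates the download from the parsing so that the parsing can be checked without AWS access. It assumes the boto3 library is installed and that the CSV file has a header row; `fetch_csv_bytes` and `parse_csv_bytes` are hypothetical helper names, not part of the test material.

```python
import csv
import io


def fetch_csv_bytes(bucket, key, access_key_id, secret_access_key):
    """Download one object from S3 and return its raw bytes.

    Assumes boto3 is installed; it is imported here so that the
    parser below can be used without it.
    """
    import boto3

    s3 = boto3.client(
        "s3",
        aws_access_key_id=access_key_id,
        aws_secret_access_key=secret_access_key,
    )
    response = s3.get_object(Bucket=bucket, Key=key)
    return response["Body"].read()


def parse_csv_bytes(raw, encoding="utf-8"):
    """Parse CSV bytes into a list of dicts, assuming a header row."""
    text = raw.decode(encoding)
    return list(csv.DictReader(io.StringIO(text)))
```

A call such as `parse_csv_bytes(fetch_csv_bytes("interview-bucket", "datasets/sample.csv", key_id, secret))` would then return one dict per CSV row, where `key_id` and `secret` are read from a secure location rather than hard-coded.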
2. An Amazon RDS for PostgreSQL instance running in the DATABASES account with the
following properties:
Property                                       Value
Hostname                                       interview-db.ap-southeast-3.rds.amazonaws.com
Port                                           5432
Instance name                                  interviewInstance
Schema name                                    interviewSchema
Table name                                     interviewTable
Username                                       candidate
Password                                       candidatepassword
Table size                                     20 GB
Estimated ingestion time from start to finish  3 hours
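A 20 GB table is unlikely to fit in memory in a single fetch, so one option is to stream it in fixed-size batches. The sketch below relies only on the Python DB-API `fetchmany` call; `iter_batches` and `demo_batch_sizes` are hypothetical helpers, and the in-memory SQLite database merely stands in for the PostgreSQL instance (with psycopg2, a named server-side cursor would be the analogous choice, which is an assumption about the driver used).

```python
import sqlite3


def iter_batches(cursor, batch_size=10_000):
    """Yield lists of rows from an executed DB-API cursor until it is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


def demo_batch_sizes(total_rows=25, batch_size=10):
    """Stand-in demo: SQLite replaces PostgreSQL so the sketch runs anywhere."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE interviewTable (id INTEGER)")
    conn.executemany(
        "INSERT INTO interviewTable (id) VALUES (?)",
        [(i,) for i in range(total_rows)],
    )
    cursor = conn.execute("SELECT id FROM interviewTable ORDER BY id")
    sizes = [len(batch) for batch in iter_batches(cursor, batch_size)]
    conn.close()
    return sizes
```

Each batch can be written to the sink before the next one is fetched, which keeps memory use bounded by `batch_size` rather than by the table size.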
3. An on-premises MS-SQL DB server deployed in COLO with the following properties:
Property                                       Value
Hostname                                       INTERVIEWDB.DATASINTESA.NET
Port                                           1433
Instance name                                  interviewInstance
Schema name                                    interviewSchema
Table name                                     interviewTable
Username                                       candidate
Password                                       candidatepassword
Table size                                     50 MB
Estimated ingestion time from start to finish  less than 15 minutes
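The table sizes and time estimates for sources (2) and (3) imply an average throughput, which can help when sizing the pipeline. The sketch below works through that arithmetic, assuming decimal units (1 GB = 1000 MB) and treating the 15-minute figure as an upper bound on duration; the function name is hypothetical.

```python
def required_throughput_mb_per_s(size_mb, duration_s):
    """Return the average throughput (MB/s) needed to move size_mb in duration_s."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return size_mb / duration_s


# Source (2): 20 GB in 3 hours, assuming 1 GB = 1000 MB.
postgres_rate = required_throughput_mb_per_s(20 * 1000, 3 * 60 * 60)

# Source (3): 50 MB in at most 15 minutes, so this is a lower bound on the rate.
mssql_rate = required_throughput_mb_per_s(50, 15 * 60)
```

Under these assumptions, source (2) averages roughly 1.85 MB/s and source (3) needs at least about 0.056 MB/s.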
4. A third-party API that exposes the data layer to the client as a REST endpoint with the
following properties:
Property         Value
URL              https://interview-api-datasintesa.net/table_name=interview_table
Returned object  JSON
Username         candidate
Password         candidatepassword
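For source (4), the sketch below prepares an authenticated request and decodes the JSON body using only the Python standard library. The authentication scheme is not stated above, so HTTP Basic authentication is an assumption; `build_request` and `fetch_json` are hypothetical helper names.

```python
import base64
import json
import urllib.request


def build_request(url, username, password):
    """Build a GET request with an HTTP Basic Authorization header (assumed scheme)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {token}", "Accept": "application/json"},
    )


def fetch_json(url, username, password, timeout=30):
    """Send the request and return the decoded JSON payload."""
    request = build_request(url, username, password)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping request construction separate from the network call makes the authentication logic easy to check without reaching the third-party service.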
Goals
1. Write a brief explanation of how you would access and ingest the datasets from (1), (2), (3), and (4).
2. Write example code to ingest the data from scenarios (1), (2), (3), and (4).
3. In which situation would you use a NoSQL database in your ETL pipeline?
4. Cloud services knowledge check:
   1. Which AWS services would you use, and why? (Note: you can use equivalent Azure or
      GCP services.)
   2. Which programming language and framework would you use, and why?
   3. What is the high-level data flow from source to sink? For example, do you have to
      mutate the data in any way?
   4. How would you manage credentials, i.e. where would you put the username and
      password used to access the source system in your pipeline?
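For the credentials question, one common option is to keep secrets out of the code and read them from the environment at run time (populated, for example, by a secrets manager or the deployment platform). The sketch below shows that pattern only; the variable names `SOURCE_DB_USER` and `SOURCE_DB_PASSWORD` and the helper `load_credentials` are hypothetical.

```python
import os


def load_credentials(env=None, user_var="SOURCE_DB_USER", password_var="SOURCE_DB_PASSWORD"):
    """Return (username, password) from environment variables, failing loudly if unset."""
    env = os.environ if env is None else env
    missing = [name for name in (user_var, password_var) if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return env[user_var], env[password_var]
```

Passing a plain dict as `env` makes the helper easy to exercise in isolation, while the default reads from the real process environment.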
Outputs
1. Submit your answer within 2 days of receiving the questions from the hiring team.
2. Write your answer as a text file (.txt, Word document, etc.).
3. Code blocks must be formatted in a human-readable way.