Data Engineer Assignment
Objective:
Task: Your task is to develop a robust data engineering pipeline.
Project Specifications:
Technical Stack:
Dataset
● Your initial dataset is at the link below; please download it from Kaggle:
○ https://www.kaggle.com/datasets/atulanandjha/temperature-readings-iot-devices/data
● The dataset we will feed in will be on the same order of magnitude in size as the original
dataset, but feel free to suggest in your documentation how you would scale the pipeline up
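As a starting point, here is a minimal loading sketch for this dataset. It assumes the Kaggle download ships as a single CSV named IOT-temp.csv with columns id, room_id/id, noted_date, temp, and out/in; verify the file name and headers against the actual download before relying on it.

    import pandas as pd

    def load_readings(path: str = "IOT-temp.csv") -> pd.DataFrame:
        """Load the raw IoT temperature readings into a typed DataFrame."""
        df = pd.read_csv(path)
        # Normalize the raw headers so downstream code is not tied to them.
        df = df.rename(columns={"room_id/id": "room_id", "out/in": "location"})
        # noted_date arrives as a day-first string, e.g. "08-12-2018 09:30";
        # errors="coerce" turns malformed timestamps into NaT for later triage.
        df["noted_date"] = pd.to_datetime(df["noted_date"], dayfirst=True, errors="coerce")
        return df

    if __name__ == "__main__":
        readings = load_readings()
        print(readings.dtypes)
        print(readings.head())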
Database:
Optional tasks:
Deliverables:
● Source code:
○ Provide a Git repository with source code, Docker files (if applicable).
● README
Evaluation Criteria:
Additional Notes:
This assignment is intended to help us evaluate your technical and problem-solving skills relevant
to Tenderd's innovative approach in fleet management.
It gives you the opportunity to show your proficiency in areas critical to our mission, including
sustainable fleet operations and cutting-edge software practices.
The inclusion of Dockerization as an optional component allows you to display your skills in
modern deployment strategies.
The solutions you provide will give us insight into your potential contributions to revolutionizing
fleet management.
Data Architecture Assignment (presentation)
Purpose: Assess the candidate's skills in tackling a hard, conceptual engineering problem in data
architecture
Introduction
Hello and many thanks for considering Tenderd! We are excited to start your journey with a brief (4-8
hour) assignment. We think this is an essential part of the process: it lets us see how you would help us
handle some of our toughest challenges, and it lets you see whether you would enjoy the work.
The assignment is inspired by some of our current challenges, so please treat it as confidential, but it is
not a one-to-one mapping of those challenges.
This is meant to be a conversational exercise so if you would like to request any further information that will
help you, please do not hesitate to reach out.
Lastly and most importantly: this exercise is not intended to fit you into a checkbox exercise. It is
primarily designed to assess how you think, not whether you fit into some one-size-fits-all scoring criteria.
We are counting on you to demonstrate high-level engineering thinking and work, so feel free to show that
to us in any way you see fit. If you think that parts of this assignment are focusing on the wrong thing,
feel free to tell us. But do try to fulfill the spirit (if not the letter) of the assignment, just as you would
with a trusted manager.
If you have any questions, please feel free to reach out to: Jakub Langr
Objective
Come up with a strategy that will take us to a robust data architecture, one that can support
further growth in:
● scale
● integrity
● complexity
● timeliness
● further engineers being onboarded
And at the same time improve the overall quality of the product.
Inputs
Here is a hypothetical but verisimilar example of our current data ingestion process, to give you an idea of
what to think about rather than having you design in a vacuum:
Full image: TrackerService_saveProvider (1).png
If you would like a brief description of the platform, then we can arrange a call.
Outcomes
The outcome of this assignment should be a 10-20 page slide presentation describing the approach taken,
including key deadlines and milestones. It should focus on the following:
● Description of a “final” state (we understand no data work is ever truly final), but what would be a
good milestone
○ Especially noting what is in scope and out of scope
○ What DB technology to choose and how to organize it
○ What architecture to choose (e.g., microservices) and how to integrate it with the overall
data flow
● Key stages (e.g., what is done when), how each stage is tested, and when it is put into production
● The toolchain selected (e.g., which open-source and cloud tools) to achieve this
outcome:
○ Most quickly
○ Most reliably
○ Most economically
● A sense of the tradeoffs and risks
● Format and Structure: check the standardization of data formats and structures. Example: time
formats must be consistent across the same collection.
● Audit Logs: be able to tell who viewed what data, when, etc. Example: Saurav went to see user
data at 4 pm on Saturday.
● Easily Testable: data quality checks should easily test any logical edge conditions. Examples: easy
to test for non-negative emissions, future or 1990s dates, outlandishly high values (1M t CO2 in a
day), etc.
● Easily Traceable: there should be clear data lineage (ideally with data diff tools) to clearly identify
the source of any thresholds/set values, computations, data involved, etc. Example: the high
productivity value on 10th March was a combination of the threshold value set on 9th March and
the incoming data, which was subsequently backfilled; this was the only row affected.
● Mutation Information: all key data points should have created_at and last_modified_at. Example:
this datapoint was received on 9th March, was modified on 11th March, and (optionally?) how
many times it was modified.
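To make the Easily Testable, Mutation Information, and Audit Logs requirements above concrete, here is a minimal sketch of what such checks could look like. It is purely illustrative: the column names (emissions_t_co2, noted_date, created_at, last_modified_at), the outlier threshold, and the audit-record shape are assumptions for the example, not part of the assignment.

    import pandas as pd

    # Assumed outlier cap for the example, per the "1M t CO2 in a day" case.
    MAX_DAILY_EMISSIONS_T_CO2 = 1_000_000

    def check_quality(df: pd.DataFrame) -> dict:
        """Per 'Easily Testable': return rule name -> offending row count
        for non-negative emissions, plausible dates, and absurd outliers."""
        now = pd.Timestamp.now()
        return {
            "negative_emissions": int((df["emissions_t_co2"] < 0).sum()),
            "future_dates": int((df["noted_date"] > now).sum()),
            "pre_2000_dates": int((df["noted_date"] < pd.Timestamp("2000-01-01")).sum()),
            "outlandish_emissions": int((df["emissions_t_co2"] > MAX_DAILY_EMISSIONS_T_CO2).sum()),
        }

    def stamp_mutation_metadata(df: pd.DataFrame) -> pd.DataFrame:
        """Per 'Mutation Information': every key datapoint carries
        created_at (set once) and last_modified_at (updated on each write)."""
        now = pd.Timestamp.now()
        if "created_at" not in df.columns:
            df["created_at"] = now
        df["last_modified_at"] = now
        return df

    def audit_view(user: str, resource: str) -> dict:
        """Per 'Audit Logs': a minimal record of who viewed what data, when,
        e.g. audit_view("Saurav", "user data")."""
        return {"user": user, "resource": resource, "viewed_at": pd.Timestamp.now().isoformat()}

In practice these rules would live in a testing framework (e.g., pytest, or a dedicated tool such as Great Expectations) and run as part of the pipeline, but the underlying logic is the same.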
Possible Structure
Here is one way to think about the problem. It is certainly not the only, or even the best, way to structure
the work, and in any case we would be looking for a lot more detail in the final solution:
Deadline
The recommended deadline is one week from receipt of the assignment. If you need more time, please let
us know and we will advise further.