
Data Engineer Assignment


Data Engineer Assignment (code)

Data Engineer Assignment (code)
    Objective
    Technical Stack
    Backend Requirements
        API Endpoints
        Database
        Integration with IoT
        Frontend Requirements
        Containerization
        Docker Compose
        Source Code
        Documentation
        Evaluation Criteria
        Additional Notes
Data Architecture Assignment (presentation)
    Introduction
    Objective
    Inputs
    Outcomes
    Wishlist
    Possible Structure
    Deadline
Thank you for your interest in the Senior Data Engineer position at Tenderd. As part of our selection
process, we ask you to complete the following task to demonstrate your abilities and expertise.

Objective:

Task: Your task is to develop a robust data engineering pipeline.

1. You will be given a dataset to calibrate your data pipeline.
2. You will write your code without knowing what anomalous data may be fed through this
pipeline (much like in the real world).
3. You should write a script that:
   a. processes this data and writes it into a database
   b. prints out a summary of the potential data issues
4. After submission we will feed in anomalous data in the same format as the original dataset,
and we would like your script to flag the issues back to us.
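The steps above could be sketched roughly as follows. This is only an illustrative outline, not the expected solution: the column names (`temp`, `noted_date`) are assumed from the Kaggle temperature-readings dataset and the plausibility range is a guess that should be tuned against the calibration data.

```python
# Sketch of step 3: validate rows, write clean rows to a database, and
# print a summary of potential data issues. Column names and thresholds
# are assumptions, to be adjusted to the real dataset.
import sqlite3
import pandas as pd

def flag_issues(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an 'issues' column listing detected problems."""
    df = df.copy()
    issues = []
    for _, row in df.iterrows():
        row_issues = []
        if pd.isna(row["temp"]):
            row_issues.append("missing temp")
        elif not (-40 <= row["temp"] <= 60):  # assumed plausible range in C
            row_issues.append("temp out of range")
        if pd.isna(row["noted_date"]):
            row_issues.append("missing timestamp")
        issues.append("; ".join(row_issues))
    df["issues"] = issues
    return df

def run_pipeline(df: pd.DataFrame, db_path: str = "readings.db") -> pd.DataFrame:
    flagged = flag_issues(df)
    # Step 3a: write only clean rows into the database
    clean = flagged[flagged["issues"] == ""].drop(columns="issues")
    with sqlite3.connect(db_path) as conn:
        clean.to_sql("readings", conn, if_exists="append", index=False)
    # Step 3b: print a summary of the potential data issues
    bad = flagged[flagged["issues"] != ""]
    print(f"{len(bad)} of {len(flagged)} rows flagged")
    return flagged
```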

Project Specifications:

Technical Stack:

Preferred tool: Prefect, but any scripting language and/or framework is acceptable.

Dataset

● Your initial dataset is at the link below, please download it from Kaggle:
○ https://www.kaggle.com/datasets/atulanandjha/temperature-readings-iot-devices/data
● The dataset we will feed in will be of the same order of magnitude in size as the original
dataset, but feel free to suggest in your documentation how you would scale the pipeline up

Database:

Design a database schema in any tool of your choice.

Recommendation: Use SQL as a database system.
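One possible shape for such a schema, expressed here as SQL executed from Python: a table for the readings plus a table for flagged issues. All table and column names are illustrative assumptions, loosely following the Kaggle dataset's fields.

```python
# Illustrative SQLite schema sketch; names, types, and constraints are
# assumptions to be adapted to the chosen database system.
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS readings (
    id          TEXT PRIMARY KEY,
    room_id     TEXT,
    noted_date  TEXT NOT NULL,               -- ISO-8601 timestamp
    temp        REAL NOT NULL,
    location    TEXT CHECK (location IN ('In', 'Out'))
);
CREATE TABLE IF NOT EXISTS data_issues (
    reading_id  TEXT,
    issue       TEXT NOT NULL,               -- e.g. 'temp out of range'
    detected_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""

def init_db(path: str = "readings.db") -> sqlite3.Connection:
    """Create the tables if they do not exist and return the connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Keeping flagged issues in their own table makes the step-3b summary a simple `GROUP BY issue` query.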

Optional tasks:

● Testing and documentation
● Perform unit and integration tests.
● Dockerization
● Deployment strategy (feel free to just write a description or a diagram; no code is necessary
but would be a bonus point)
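For the testing item, one low-friction approach is to keep each data-quality check a small pure function so it can be unit-tested with plain assertions (or pytest). The checker below is a hypothetical stand-in for whatever validation logic the pipeline uses.

```python
# Sketch of a unit test for a single data-quality check; the function and
# its thresholds are illustrative assumptions.
def temp_in_range(temp: float, low: float = -40.0, high: float = 60.0) -> bool:
    """Plausibility check for a temperature reading."""
    return low <= temp <= high

def test_temp_in_range():
    assert temp_in_range(22.0)        # normal reading
    assert not temp_in_range(999.0)   # sensor glitch
    assert not temp_in_range(-100.0)  # below plausible range

test_temp_in_range()
```

An integration test could then exercise the full script against a tiny fixture CSV and assert on the rows written and the issues flagged.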

Deliverables:

● Source code:
○ Provide a Git repository with source code, Docker files (if applicable).
● README

Evaluation Criteria:

● Code quality / readability
● Code performance
● Number of data issues flagged
● Clarity of documentation

Additional Notes:

This assignment is intended to help us evaluate your technical and problem-solving skills relevant
to Tenderd's innovative approach in fleet management.

It gives you the opportunity to show your proficiency in areas critical to our mission, including
sustainable fleet operations and cutting-edge software practices.

The inclusion of Dockerization as an optional component allows you to display your skills in
modern deployment strategies.

The solutions you provide will give us insight into your potential contributions to revolutionizing
fleet management.
Data Architecture Assignment (presentation)
Purpose: Assess the candidate's skills in tackling a hard, conceptual engineering problem in data
architecture.

Introduction
Hello and many thanks for considering Tenderd! We are excited to see how you work through a brief
(4–8 hour) assignment. We think this is an essential part of the process for us to learn how you will help
us handle some of our toughest challenges, and for you to learn whether you would enjoy the work.

The assignment is inspired by some of our current challenges (so please treat it as confidential), but it
is not a one-to-one mapping of them.

This is meant to be a conversational exercise so if you would like to request any further information that will
help you, please do not hesitate to reach out.

Lastly, and most importantly: this exercise is not intended to fit you into a checkbox. It is primarily
designed to assess how you think, not whether you fit some one-size-fits-all scoring criteria. We are
counting on you to demonstrate high-level engineering thinking and work, so feel free to show that
to us in any way you see fit. If you think that parts of this assignment focus on the wrong thing, feel
free to tell us. But do try to fulfill the spirit (if not the letter) of the assignment, just as you would
with a trusted manager.

Any questions, please feel free to reach out to: Jakub Langr

Objective
Come up with a strategy that will take us to a robust data architecture, one that can support further
growth in:

● scale
● integrity
● complexity
● timeliness
● further engineers being onboarded
And at the same time improve the overall quality of the product.
Inputs
Here is a hypothetical but verisimilar example of our current data ingestion process to give you an idea of
what to think about, rather than create this idea in a vacuum:
Full image: TrackerService_saveProvider (1).png

If you would like a brief description of the platform, then we can arrange a call.

Outcomes
The outcome of this assignment should be a 10–20 page slide presentation describing the approach
taken, including key deadlines and milestones. It should focus on:

● Description of a “final” state (we understand no data work is ever truly final), but what would be a
good milestone
○ Especially noting what is in scope and out of scope
○ What DB technology to choose and how to organize it
○ What architecture to choose (e.g., microservices) and how to integrate it with the overall
data flow
● Key stages (e.g., what is done when), how each is tested, and when it is put into production
● Toolchain selected (e.g., which open-source or cloud tools) that would be used to achieve this
outcome
○ Most quickly
○ Most reliably
○ Most economically
● A sense of the tradeoffs and risks

Optionally, please include:

● any written documents you wish to give as further context
● any code / prototype prepared to demonstrate this effort
Wishlist
Below is an incomplete wishlist of what we would like the next solution to look like, but feel free to
focus on just a few key concepts rather than everything.

● Accuracy: Correctness of data; how well it reflects real-world values.
● Completeness: Ensure all data is present without gaps.
● Consistency: Data is uniform across different sources and platforms. Example: granular data
matches data in aggregated tables.
● Timeliness: Reflects how up-to-date the data is; ensure data is coming in at the correct time
without delays or mismatches. Example: data coming in is registered at the correct time and
without delays.
● Integrity: Ensures the accuracy and reliability of relationships between different pieces of data.
Data integrity is crucial for maintaining overall database reliability. Example: fuel consumption
shouldn’t be 0 when the vehicle is active.
● Format and Structure: Check the standardization of data formats and structures. Example: time
formats must be consistent across the same collection.
● Audit Logs: To be able to tell who viewed what data when, etc. Example: Saurav went to see
user data at 4 pm Saturday.
● Composable Human Understanding / Ontology: Excavators can be easily composed into an
excavator view that will have all the associated factors (productivity, maintenance, emissions
etc.), configs (device configs, SIM cards, ingestor config etc.) and group data. Example: easy to
recreate a whole excavator object to quickly understand key information like when this excavator
changed ingestor, how it impacted productivity, what data cap is still remaining, etc.
● Easily Testable: Data quality checks should easily test any logical edge conditions. Example:
easy to test non-negative emissions, future/1990s dates, outlandishly high values (1M t CO2 in a
day), etc.
● Easily Traceable: There should be clear data lineage (ideally with data diff tools) to clearly
identify the source of any thresholds/set values, computations, data involved, etc. Example: the
high productivity value on the 10th March was a combination of the threshold value set on the
9th March and the incoming data, which was subsequently backfilled; this was the only row
affected.
● Versioned Schema: The data schema should be versioned so we can always re-create the state
of the DB at that time. Example: column fuel_adjustments was added on the 11th March and was
populated with 1s.
● Mutation Information: All key data points should have created_at and last_modified_at.
Example: this datapoint was received on the 9th March, was modified on the 11th March, and
(optionally?) how many times it was modified.
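To make the "Easily Testable" item above concrete, one possible pattern is a registry of small named predicates over a record, so that each edge condition (negative emissions, future or pre-1990 dates, outlandishly high values) can be unit-tested in isolation. The field names below are hypothetical.

```python
# Sketch of composable data-quality checks; field names (co2_t, ts) and
# thresholds are illustrative assumptions.
from datetime import datetime, timezone

CHECKS = {
    "non_negative_emissions": lambda r: r["co2_t"] >= 0,
    "plausible_emissions":    lambda r: r["co2_t"] < 1_000_000,  # < 1M t/day
    "date_not_in_future":     lambda r: r["ts"] <= datetime.now(timezone.utc),
    "date_after_1990":        lambda r: r["ts"].year >= 1990,
}

def failed_checks(record: dict) -> list[str]:
    """Return the names of all checks the record fails."""
    return [name for name, check in CHECKS.items() if not check(record)]
```

Because each check is a named entry, the flagged names can feed directly into audit logs and lineage reports, which also supports the "Easily Traceable" item.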
Possible Structure
Here is one way to think about the problem. It is certainly not the only (or even the best) way to
structure the work, but we would be looking for a lot more detail in the final solution:

Deadline
Recommended deadline is a week from receipt of the assignment. If you need more time, please let us
know and we will advise further.
