Task - Data Engineering

The document provides instructions for a scraping task. It asks the user to choose one of two options - either scraping time series data from NASA, NOAA, BEA or Google Data Commons, or scraping projects and tender data from sources like the World Bank, Chinese government procurement sites, or the Indian government eProcurement site. It lists requirements like using a Python class scraper, outputting to CSV, including metadata, and presenting the data. The submission will be evaluated on data quality and code quality like optimization and modularity. The deadline is 3 days and code should be pushed to a specified GitHub repo.


About the Task

Create a scraper to get data from one of the following websites. The scraper file should be in
the .py format, and the scraper must consist of a single Python class that is called to get the required data.
The output should be in CSV format. Requirements:
● Only pick one of the trial tasks from the sources listed below.
Note: This is also a gauge of which type of data structures you are most comfortable with.
● Create the scraper and follow the evaluation guidelines below.
● Follow clean data standards: the output should contain metadata along with all the values present in the
dataset.
● Provide a simple way to present your data in a map, graphs, or charts to give synthesis and show
analytical skills in a short report.
The submission will be evaluated on the quality of the data output as well as the code. The scraper should be
well optimized and able to handle large amounts of data. The deadline for the task is 3 days. Upload
your code to your GitHub repo and push it for us to evaluate. A minimal sketch of the expected scraper shape is shown below.
Learn more about our data standards: https://developer.taiyo.ai/api-doc/StandardLib/
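The exact structure depends on the source you choose; purely as an illustration (the endpoint URL, field names, and class name below are hypothetical, not taken from this task), a minimal sketch of the single-class, CSV-output pattern could look like this:

import csv
import datetime

import requests


class ExampleScraper:
    """Hypothetical single-class scraper that writes its output to CSV."""

    def __init__(self, base_url, output_path="output.csv"):
        self.base_url = base_url            # assumed to return a JSON list of flat records
        self.output_path = output_path
        self.session = requests.Session()

    def fetch(self):
        # Real sources may need pagination, HTML parsing, or API keys instead.
        response = self.session.get(self.base_url, timeout=30)
        response.raise_for_status()
        return response.json()

    def run(self):
        records = self.fetch()
        fetched_at = datetime.datetime.utcnow().isoformat()
        # Keep every value from the dataset and add metadata columns alongside.
        fieldnames = sorted({key for record in records for key in record})
        fieldnames += ["source_url", "fetched_at"]
        with open(self.output_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fieldnames)
            writer.writeheader()
            for record in records:
                record.update(source_url=self.base_url, fetched_at=fetched_at)
                writer.writerow(record)


if __name__ == "__main__":
    ExampleScraper("https://example.com/api/records").run()

Called once, the class fetches the data and produces a single CSV whose rows carry both the dataset values and the collection metadata.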

Pick only ONE (Either 1 or 2) from below

1. Time Series Data (Fork, branch, and push your code to: https://github.com/Taiyo-ai/ts-mesh-pipeline)
Time Series Data Standards (to follow): https://developer.taiyo.ai/api-doc/TimeSeries/

● NASA Earth Data: https://www.earthdata.nasa.gov/engage/open-data-services-and-software/api


● NOAA World Data: https://www.nnvl.noaa.gov/view/globaldata.html
● Bureau of Economic Analysis: Write a generalist harvester that could be scaled across BEA data products
● Google Data Commons: Write a generalist harvester that could be scaled across Data Commons (a sketch of this config-driven pattern follows the list)
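A generalist harvester in this sense is usually a single class driven by per-dataset configuration, so adding a new BEA or Data Commons product means adding a config entry rather than new code. A minimal sketch, where every endpoint and parameter name is a hypothetical placeholder rather than a real BEA or Data Commons API:

import csv

import requests

# Hypothetical per-dataset configuration; real entries would map to actual
# BEA or Data Commons endpoints and query parameters.
DATASET_CONFIGS = {
    "gdp_quarterly": {"endpoint": "https://example.com/api", "params": {"dataset": "gdp", "freq": "Q"}},
    "income_annual": {"endpoint": "https://example.com/api", "params": {"dataset": "income", "freq": "A"}},
}


class GeneralistHarvester:
    """One harvester class, scaled across data products via configuration."""

    def __init__(self, dataset_key):
        self.dataset_key = dataset_key
        self.config = DATASET_CONFIGS[dataset_key]

    def harvest(self, output_path=None):
        output_path = output_path or f"{self.dataset_key}.csv"
        response = requests.get(self.config["endpoint"], params=self.config["params"], timeout=30)
        response.raise_for_status()
        rows = response.json()  # assumed: a list of flat records
        if not rows:
            return
        with open(output_path, "w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=sorted(rows[0]))
            writer.writeheader()
            writer.writerows(rows)


if __name__ == "__main__":
    for key in DATASET_CONFIGS:
        GeneralistHarvester(key).harvest()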

2. Projects and Tenders (Fork, branch, and push your code to:
https://github.com/Taiyo-ai/pt-mesh-pipeline)
Projects and Tenders Data Standards (to follow): https://developer.taiyo.ai/api-doc/ProjectsandTenders/

Scrape data from the following sources by getting details of all the tenders present on each website:
● World Bank Evaluation and Ratings: https://ieg.worldbankgroup.org/data
● China Procurement Sources:
○ https://www.chinabidding.com/en
○ http://www.ggzy.gov.cn/
○ http://en.chinabidding.mofcom.gov.cn/
○ https://www.cpppc.org/en/PPPyd.jhtml
○ https://www.cpppc.org:8082/inforpublic/homepage.html#/searchresult
● E-procurement Government of India: https://etenders.gov.in/eprocure/app

Evaluation Guidelines:
Evaluation is based on the following parameters:
● Web Scraping Standards and Libraries used
○ Update requirements.txt with the packages used in your sample solution
● Modular, DRY Code
○ Follow the sample/dummy project's directory/package structure
○ Proper Python package handling, with a client.py/main.py that calls the different steps/modules of the
code, is a must
● Config params or control params via external ENV variables, unit tests & logging standards
● A working solution whose config/params are driven/triggered through the client.py/main.py package
file (a sketch of this driver pattern follows the list).
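As a rough illustration of the driver pattern named above (the module and class names come from the earlier hypothetical sketch, and the ENV variable names are assumptions), a main.py/client.py entry point might look like:

import logging
import os

from scraper import ExampleScraper  # hypothetical module and class from the sketch above

# Logging level is itself a control parameter taken from the environment.
logging.basicConfig(level=os.environ.get("LOG_LEVEL", "INFO"),
                    format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger(__name__)


def main():
    # Config/control params come from external ENV variables, not hard-coded values.
    source_url = os.environ.get("SOURCE_URL", "https://example.com/api/records")
    output_path = os.environ.get("OUTPUT_PATH", "output.csv")

    logger.info("Starting scrape of %s", source_url)
    ExampleScraper(source_url, output_path).run()
    logger.info("Wrote results to %s", output_path)


if __name__ == "__main__":
    main()

Unit tests can then exercise the scraper class directly, while the driver stays a thin, config-driven shell.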

❖ Kindly find the survey form: Behavioral Survey
