Data Ingestion with Python Cookbook
Gláucia Esppenchutz
BIRMINGHAM—MUMBAI
Data Ingestion with Python Cookbook
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted
in any form or by any means, without the prior written permission of the publisher, except in the case
of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information
presented. However, the information contained in this book is sold without warranty, either express
or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable
for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and
products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot
guarantee the accuracy of this information.
ISBN 978-1-83763-260-2
www.packtpub.com
This book represents a lot and wouldn’t be possible without my loving husband, Lincoln, and his
support and understanding during this challenging endeavor. I want to thank all my friends that
didn’t let me give up and always boosted my spirits, along with my grandmother, who always believed,
helped, and said I would do big things one day. Finally, I want to thank my beloved and four-pawed
best friend, who is at peace, Minduim, for “helping” me to write this book.
– Gláucia Esppenchutz
Contributors
I want to thank my patient and beloved husband and my friends. Thanks also to my mentors in the
Python open source community and the DataBootCamp founders, who guided me at the beginning
of my journey.
Thanks to the Packt team, who helped me through some hard times; you were terrific!
About the reviewers
Bitthal Khaitan is currently working as a big data and cloud engineer with CVS Health, a Fortune
4 organization. He has a demonstrated history of working in the cloud, data and analytics industry
for 12+ years. His primary certified skills are Google Cloud Platform (GCP), the big data ecosystem
(Hadoop, Spark, etc.), and data warehousing on Teradata. He has worked in all phases of the SDLC
of DW/BI and big data projects with strong expertise in the USA healthcare, insurance and retail
domains. He actively helps new graduates with mentoring, resume reviews, and job hunting tips in
the data engineering domain. Over 20,000 people follow Bitthal on LinkedIn. He is currently based
out of Dallas, Texas, USA.
Jagjeet Makhija is a highly accomplished technology leader with over 20 years of experience. They are
skilled not only in various domains including AI, data warehouse architecture, and business analytics,
but also have a strong passion for staying ahead of technology trends such as AI and ChatGPT.
Jagjeet is recognized for their significant contributions to the industry, particularly in complex proofs
of concept and integrating Microsoft products with ChatGPT. They are also an avid book reviewer
and have actively shared their extensive knowledge and expertise through presentations, blog articles,
and online forums.
Krishnan Raghavan is an IT professional with over 20 years of experience in the area of software
development and delivery excellence across multiple domains and technologies, ranging from C++ to
Java, Python, data warehousing, and big data tools and technologies. Krishnan tries to give back to the
community by being part of GDG – Pune Volunteer Group, helping the team in organizing events.
When not working, Krishnan likes to spend time with his wife and daughter, as well as reading fiction,
non-fiction, and technical books. Currently, he is unsuccessfully trying to learn how to play the guitar.
You can connect with Krishnan at krishnan@gmail.com or via
LinkedIn: www.linkedin.com/in/krishnan-raghavan
I would like to thank my wife, Anita, and daughter, Ananya, for giving me the time and space to
review this book.
Table of Contents
Preface
2
Principles of Data Access – Accessing Your Data
Technical requirements
Implementing governance in a data access workflow
  Getting ready
  How to do it…
  How it works…
  See also
Accessing databases and data warehouses
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Accessing SSH File Transfer Protocol (SFTP) files
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Retrieving data using API authentication
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Managing encrypted files
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Accessing data from AWS using S3
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Accessing data from GCP using Cloud Storage
  Getting ready
  How to do it…
  How it works…
  There’s more…
Further reading
3
Data Discovery – Understanding Our Data before Ingesting It
Technical requirements
Documenting the data discovery process
  Getting ready
  How to do it…
  How it works…
Configuring OpenMetadata
  Getting ready
4
Reading CSV and JSON Files and Solving Problems
Technical requirements
Reading a CSV file
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Reading a JSON file
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Creating a SparkSession for PySpark
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Using PySpark to read CSV files
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Using PySpark to read JSON files
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Further reading
5
Ingesting Data from Structured and Unstructured Databases
Technical requirements
Configuring a JDBC connection
  There’s more…
  See also
6
Using PySpark with Defined and Non-Defined Schemas
Technical requirements
Applying schemas to data ingestion
  How to do it…
  How it works…
7
Ingesting Analytical Data
Technical requirements
Ingesting Parquet files
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Ingesting Avro files
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Applying schemas to analytical data
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Filtering data and handling common issues
  How it works…
  There’s more…
  See also
Ingesting partitioned data
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Applying reverse ETL
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Selecting analytical data for reverse ETL
  Getting ready
  How to do it…
  How it works…
  See also
9
Putting Everything Together with Airflow
Technical requirements
Installing Airflow
Configuring Airflow
  Getting ready
  How to do it…
  How it works…
  See also
Creating DAGs
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Creating custom operators
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Configuring sensors
  Getting ready
  How to do it…
  How it works…
  See also
Creating connectors in Airflow
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
10
Logging and Monitoring Your Data Ingest in Airflow
Technical requirements
Installing and running Airflow
Creating basic logs in Airflow
  Getting ready
  How to do it…
  How it works…
  See also
Storing log files in a remote location
  Getting ready
  How to do it…
  How it works…
  See also
Configuring logs in airflow.cfg
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Designing advanced monitoring
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Using notification operators
  Getting ready
  How to do it…
  How it works…
  There’s more…
Using SQL operators for data quality
  Getting ready
  How to do it…
  How it works…
  There’s more…
  See also
Further reading
11
Automating Your Data Ingestion Pipelines
Technical requirements
Installing and running Airflow
Scheduling daily ingestions
  Getting ready
12
Using Data Observability for Debugging, Error Handling, and Preventing Downtime
Technical requirements
Docker images
Setting up StatsD for monitoring
  Getting ready
  How to do it…
  How it works…
  See also
Setting up Prometheus for storing metrics
  Getting ready
  How to do it…
  How it works…
  There’s more…
…
  Getting ready
  How to do it…
  How it works…
  There’s more…
Creating an observability dashboard
  Getting ready
  How to do it…
  How it works…
  There’s more…
Setting custom alerts or notifications
  Getting ready
  How to do it…
  How it works…
Index