03-Jupyter Markdown Python
4.0
fit@hcmus
WHY SHOULD A DATA SCIENTIST HAVE
JUPYTER
NOTEBOOK IN HIS/HER TOOLBOX?
Why Jupyter notebook
A data science process is a process of finding an
answer to a question through data; this is a long
process …
During this process, we alternate between writing text and writing code:
write text, code, write text, code, ...
Documenting the process not only helps us to
review and continue our work the next day but also
helps stakeholders to verify our results at the end.
The coding style in data science is also quite
special: exploratory — code some lines, observe
results, code some lines, observe results, …
Why Jupyter notebook
Jupyter Notebook is a good tool for our needs
Jupyter Notebook is a “notebook” allowing
us to:
Write text (using Markdown)
Write code and run code (using Python, but
other programming languages are also
supported)
See the output of the code
Create interactive data visualizations
Jupyter Notebook also allows us to code in
an exploratory style
Offers a single document that
contains
Visualizations
Mathematical equations
Statistical modeling
Narrative text
Any other rich media
This single-document approach enables users to
develop code, visualize the results, and add information,
charts, and formulas that make the work more
understandable, repeatable, and shareable.
Two variants of the Jupyter
notebook
Jupyter Classic Notebook, with all the
capabilities mentioned above.
JupyterLab, a next-generation
notebook interface designed to be much
more extensible and modular, with
support for a wide variety of workflows
from data science, machine learning, and
scientific computing.
What is Jupyter Notebook
"notebook" or "notebook documents" denote
documents that contain both code and rich
text elements, such as figures, links,
equations, ...
Because of the mix of code and text
elements, these documents are the ideal
place to bring together an analysis
description and its results; they can also
be executed to perform the data analysis in
real time.
The Jupyter Notebook App produces these
documents.
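To make the idea of a document that mixes text and code concrete, here is a minimal sketch of what a notebook document looks like on disk. A notebook is stored as a JSON file with the .ipynb extension; the cell contents below are hypothetical, and the layout assumes version 4 of the notebook format.

```python
import json

# Hypothetical minimal notebook, following the nbformat 4 layout.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {
            # A text cell written in Markdown.
            "cell_type": "markdown",
            "metadata": {},
            "source": ["# Analysis\n", "Narrative text goes here."],
        },
        {
            # A code cell that has not been executed yet.
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["print(1 + 1)"],
        },
    ],
}

# Serializing the dictionary gives the text that an .ipynb file would hold.
ipynb_text = json.dumps(notebook, indent=1)

# Reading it back shows the mix of text cells and code cells.
cell_types = [cell["cell_type"] for cell in json.loads(ipynb_text)["cells"]]
print(cell_types)
```

Because the file is plain JSON, it can be inspected with any text editor, although in practice the Jupyter Notebook App creates and edits it for you.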
Jupyter Notebook App
As a server-client application,
the Jupyter Notebook App
allows you to edit and run your
notebooks via a web browser.
The application can be
executed on a PC without
Internet access, or it can be
installed on a remote server,
where you can access it
through the Internet.
https://www.coursera.org/learn/open-source-tools-for-data-science/
Jupyter Notebook App
Its two main components are the kernels and a
dashboard.
A kernel is a program that runs and introspects
the user’s code. The Jupyter Notebook App has a
kernel for Python code, but there are also kernels
available for other programming languages.
The dashboard of the application not only shows
you the notebook documents that you have made
and can reopen but can also be used to manage
the kernels: you can see which ones are running and
shut them down if necessary.
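The role of a kernel can be illustrated with a toy model. The sketch below is not how Jupyter implements kernels (real kernels run as separate processes and communicate through a messaging protocol); the ToyKernel class and its method names are hypothetical. It only shows the two ideas from the slide: running the user's code while keeping state between cells, and introspecting that state.

```python
import contextlib
import io


class ToyKernel:
    """Toy stand-in for a kernel: runs code strings in one shared namespace."""

    def __init__(self):
        self.namespace = {}

    def run_cell(self, source):
        # Capture whatever the cell prints, as a notebook would display it.
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, self.namespace)
        return buffer.getvalue()

    def introspect(self, name):
        # Report the type of a variable defined by an earlier cell.
        return type(self.namespace[name]).__name__


kernel = ToyKernel()
kernel.run_cell("total = sum(range(5))")      # first "cell" defines a variable
output = kernel.run_cell("print(total * 2)")  # second "cell" reuses it
print(output)
print(kernel.introspect("total"))
```

The second cell can use total only because both cells share one namespace, which is also why restarting a kernel makes all previously defined variables disappear.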
Use your Jupyter Notebooks
Don't forget to name your notebook documents!
Try to keep the cells of your notebook simple: don't
exceed the width of your cell and make sure that you
don't put too many related functions in one cell.
If possible, import your packages in the first code cell
of your notebook
Display the graphics inline.
Sometimes, your notebook can become quite code-heavy,
or maybe you just want to have a cleaner
report. In those cases, you could consider hiding
some of this code. You can already hide some of the
code by moving it into a Python script and using magic
commands such as %run to execute the whole script as if
it were in a notebook cell.
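The idea behind %run can be approximated in plain Python with the standard library's runpy module. This is only a rough analogy, not what the magic command does internally, and the helpers.py script and its clean function are hypothetical names made up for the illustration.

```python
import runpy
import tempfile
from pathlib import Path

# Hypothetical helper script; in a real project this file would already exist.
script_source = (
    "def clean(values):\n"
    "    return [v for v in values if v is not None]\n"
)

with tempfile.TemporaryDirectory() as folder:
    script_path = Path(folder) / "helpers.py"
    script_path.write_text(script_source)

    # In a notebook cell the equivalent would be:  %run helpers.py
    # run_path executes the file and returns the names it defined.
    script_globals = runpy.run_path(str(script_path))

# The function defined in the script is now usable here,
# while its source code stays out of the notebook.
clean = script_globals["clean"]
cleaned_values = clean([1, None, 2, None, 3])
print(cleaned_values)
```

The notebook then contains only the call to the helper, which keeps the report short while the details live in the script.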
How to hide code
Share your Jupyter Notebooks
Share .ipynb
Click “Cell > All Output > Clear”
Click “Kernel > Restart & Run All”
Wait for your code cells to finish executing
and check they did so as expected
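As a rough illustration of what clearing all output means for the shared file, the sketch below removes the stored outputs from a notebook represented as a Python dictionary. It assumes the version 4 notebook layout, the example notebook content is hypothetical, and clear_all_outputs is a made-up helper name rather than a Jupyter function.

```python
import copy


def clear_all_outputs(notebook):
    """Return a copy of a notebook dictionary with code-cell outputs removed."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned


# Hypothetical notebook content; a real one would come from json.load on an .ipynb file.
example = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["Some notes"]},
        {
            "cell_type": "code",
            "execution_count": 7,
            "metadata": {},
            "outputs": [{"output_type": "stream", "name": "stdout", "text": ["2\n"]}],
            "source": ["print(1 + 1)"],
        },
    ],
}

cleaned = clear_all_outputs(example)
print(cleaned["cells"][1]["outputs"])
print(cleaned["cells"][1]["execution_count"])
```

Clearing and then re-running everything from a fresh kernel is what guarantees that the outputs a reader sees were produced by the code in the order shown.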
Share your Jupyter Notebooks
Export from the menu: File > Download as
Or convert from the command line: jupyter nbconvert --to html Untitled4.ipynb
Jupyter Notebooks for Data Science
Teams: Best Practices
Use two types of notebooks for a data science
project, namely, a lab notebook and a
deliverable notebook. The difference between
the two is that individuals control the lab
notebook, while the deliverable notebook is
controlled by the whole data science team.
Use some type of version control (Git, GitHub,
...). Don't forget to also commit the HTML file if
your version control system lacks rendering
capabilities.
Use explicit rules for naming your
documents.
For more information
Learn From the Best Notebooks
Jupyter notebook reference
https://jupyter-notebook.readthedocs.io/en/stable/notebook.html
https://www.edureka.co/blog/wp-content/uploads/2018/10/Jupyter_Notebook_CheatSheet_Edureka.pdf
Markdown
Markdown - Reference
https://www.markdownguide.org/basic-syntax/
https://www.markdowntutorial.com/
Python
Why Python is faster to write
Interpreted language: no need to compile all the code
before running; instead, programmers can write a few
lines of code, run them immediately, and then continue to
code and run from the previous running state
Dynamic typing
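A short sketch of what dynamic typing means in practice: names are not declared with a type, and the same function can work on different kinds of objects. The variable and function names are made up for the illustration.

```python
# The same name can be bound to objects of different types over time.
value = 42
first_type = type(value).__name__
value = "forty-two"
second_type = type(value).__name__


def double(x):
    # No type declarations: this works for any object that supports "*".
    return x * 2


results = [double(21), double(1.5), double("ab"), double([1, 2])]
print(first_type, second_type)
print(results)
```

In a statically typed language such as C, each of these cases would need its own declaration or its own function, which is part of why Python code is quicker to write.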
Why Python runs slower
Interpreted language: interpreted languages often run
slower than compiled languages because, in compiled
languages, the compiler looks at all the source code
before running and performs optimizations
Dynamic typing: to achieve this, a variable in Python is
just a pointer to an object, and this object
contains not only the value but also meta info such as the data
type, …; when performing an operation on 2 objects, the Python
interpreter first has to spend time inspecting these
objects to identify their data types
Automatic memory management: it also costs extra
work and time to know when to free allocated memory
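The points above can be observed from within Python itself. The sketch below assumes the standard CPython interpreter (sys.getrefcount is specific to it), and the exact numbers it prints vary between versions and platforms, so none are quoted here.

```python
import sys

x = 1
# The object behind x stores more than the raw value: it also carries
# bookkeeping such as a reference count and a pointer to its type.
print(type(x).__name__)   # the type is looked up on the object at run time
print(sys.getsizeof(x))   # size in bytes of the whole object, not just the value


# The same "+" does different work depending on the operands' run-time types,
# so the interpreter must check those types on every call.
def add(a, b):
    return a + b


sums = (add(1, 2), add("a", "b"), add([1], [2]))
print(sums)

# Automatic memory management: CPython tracks how many references point
# at each object so that it knows when the memory can be freed.
data = [1, 2, 3]
alias = data
ref_count = sys.getrefcount(data)
print(ref_count)
```

A C program pays none of these costs at run time: an int is just its bytes, the compiler already knows which addition to emit, and memory is freed explicitly by the programmer.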
Homework
Compare the runtime of a matrix multiplication program written in:
C, Python, NumPy.
Test with several different matrix sizes.
Write a report in a Jupyter notebook containing the code and
an explanation.
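As a possible starting point for the pure Python part only, here is a minimal timing sketch. The function names and the matrix sizes are arbitrary choices for illustration; the C and NumPy versions, and a more careful measurement (for example, repeating each run several times), are left to the report.

```python
import random
import time


def matmul(a, b):
    """Multiply two matrices given as lists of rows, using plain Python loops."""
    n, m, p = len(a), len(b), len(b[0])
    result = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for k in range(m):
            a_ik = a[i][k]
            for j in range(p):
                result[i][j] += a_ik * b[k][j]
    return result


def random_matrix(n):
    """Build an n x n matrix of random floats."""
    return [[random.random() for _ in range(n)] for _ in range(n)]


def time_matmul(n):
    """Return the seconds taken to multiply two random n x n matrices."""
    a, b = random_matrix(n), random_matrix(n)
    start = time.perf_counter()
    matmul(a, b)
    return time.perf_counter() - start


timings = {n: time_matmul(n) for n in (10, 20, 40)}  # sizes are arbitrary examples
for n, seconds in timings.items():
    print(f"n={n}: {seconds:.6f} s")
```

Since the work grows with the cube of the matrix size, doubling n should make the pure Python version roughly eight times slower, which is a useful sanity check on the measurements.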
Reference
https://www.datacamp.com/tutorial/tutorial-jupyter-notebook
https://www.markdowntutorial.com/