M56. Basics of Data Analytics Using Python

The document discusses using Python for basic data analytics. It covers introducing Python, exploring data, statistical methods for evaluating data, and various Python libraries for data analytics tasks. It also provides tutorials for using Python to import data, plot random and histogram data, perform summary statistics, scatter plots, and grouping data by date. The tutorials demonstrate basic data manipulation and analysis using Python libraries like Pandas, NumPy, and Matplotlib.

Uploaded by

Fahmi Ramdhani
Copyright
© All Rights Reserved
Week 4

Basics of Data Analytics Using Python

Topics
• Introduction to Python
• Data exploration
• Statistical methods for data evaluation

After completing this lecture and lab session, students are expected to be able to:
• Perform basic data analytics operations using Python
• Apply statistical methods in data analytics using Python

References
• EMC Education Services (2015)
• Ross (2014)
• Wickham and Grolemund (2017)
Anaconda

• An application for programming in Python and R
• Facilitates data analytics and machine learning
• Open source, freely distributed
Python Libraries for Data Analytics

No.  Library  Functions
1  Pandas (Python Data Analysis)
• Indexing, manipulating, renaming, sorting, and merging data frames
• Updating, adding, and deleting columns in a data frame
• Imputing missing values and handling missing data (NaNs)
• Plotting data as histograms or box plots
2  NumPy
• Basic array operations: add, multiply, slice, flatten, reshape, index arrays
• Advanced array operations: stack arrays, split into sections, broadcast arrays
• Working with DateTime or linear algebra
• Basic slicing and advanced indexing
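A few of the Pandas and NumPy operations listed above can be sketched as follows. The arrays and the small data frame are hypothetical sample data made up for this illustration; they do not come from the course files.

```python
import numpy as np
import pandas as pd

# --- NumPy: hypothetical sample array, used only for illustration ---
a = np.arange(6)                     # six integers: 0, 1, 2, 3, 4, 5
reshaped = a.reshape(2, 3)           # 2 rows x 3 columns
flattened = reshaped.flatten()       # back to one dimension
sliced = a[1:4]                      # basic slicing: elements at positions 1..3
masked = a[a % 2 == 0]               # advanced (boolean) indexing: even values
stacked = np.vstack([a, a * 10])     # stack two arrays as rows
parts = np.split(a, 3)               # split into three equal sections
broadcast = reshaped + np.array([100, 200, 300])  # row vector added to each row

# --- Pandas: hypothetical sample data frame ---
df = pd.DataFrame({"name": ["b", "a", "c"], "value": [2.0, None, 3.0]})
df = df.rename(columns={"value": "score"})              # rename a column
df = df.sort_values("name").reset_index(drop=True)      # sort rows by name
df["score"] = df["score"].fillna(df["score"].mean())    # impute missing value with the mean
df["double"] = df["score"] * 2                          # add a derived column

print(broadcast)
print(df)
```

Filling a missing value with the column mean is only one possible imputation choice; the right strategy depends on the data.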
3  SciPy
• Common scientific programming tasks such as linear algebra, integration, calculus, ordinary differential equations, and signal processing
4  Matplotlib
• Line plots
• Scatter plots
• Area plots
• Bar charts and histograms
• Pie charts
• Stem plots
• Contour plots
• Quiver plots
• Spectrograms
5  Seaborn
• Determine relationships between multiple variables (correlation)
• Observe categorical variables for aggregate statistics
• Analyze univariate or bivariate distributions and compare them between different data subsets
• Plot linear regression models for dependent variables
• Provide high-level abstractions and multi-plot grids
6  Scikit-learn
• Classification: spam detection, image recognition
• Regression: drug response, stock prices
• Clustering: customer segmentation, grouping experiment outcomes
• Dimensionality reduction: visualization, increased efficiency
• Model selection: improved accuracy via parameter tuning
• Pre-processing: preparing input data, such as text, for processing with machine learning algorithms
7  Statsmodels
• Linear regression
• Correlation
• Ordinary least squares (OLS)
• Survival analysis
• Generalized linear models and Bayesian models
• Univariate and bivariate analysis, hypothesis testing
8  Plotly
• Basic charts: line, pie, scatter, bubble, dot, Gantt, sunburst, treemap, Sankey, filled area charts
• Statistical and Seaborn-style charts: error bars, box plots, histograms, facet and trellis plots, tree plots, violin plots, trend lines
• Scientific charts: contour, ternary, log, quiver, carpet, radar, heat maps, wind rose and polar plots
• Financial charts
• Maps
• Subplots
• Transforms
• Jupyter widgets interaction
9  TensorFlow
• Voice/sound recognition: IoT, automotive, security, UX/UI, telecom
• Sentiment analysis: CRM or CX
• Text-based apps: threat detection, Google Translate, Gmail smart reply
• Face recognition: Facebook's DeepFace, photo tagging, smart unlock
• Time series: recommendations from Amazon, Google, and Netflix
• Video detection: motion detection, real-time threat detection in gaming, security, airports
10  Keras
• Determine percentage accuracy
• Compute the loss function
• Create custom function layers
• Built-in data and image processing
• Write functions with repeating code blocks: 20, 50, 100 layers deep
Tutorial 1: Console

• From Anaconda, launch Spyder
• Enable only the Editor and the IPython Console
• In the Console, type the following commands:
import numpy as np
dir(np)
dir(np.random)
• Do the same for the following commands:
import matplotlib as mpl
dir(mpl)
import matplotlib.pyplot as plt
dir(plt)
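The output of dir() is long and includes private names. As a minimal sketch, the listing can be filtered with a list comprehension; the variable names below are illustrative only.

```python
import numpy as np

# Keep public names only: skip anything that starts with an underscore.
public_names = [name for name in dir(np.random) if not name.startswith("_")]

# Narrow the list further with a substring search.
normal_like = [name for name in public_names if "normal" in name]

print(len(public_names), "public names in np.random")
print(normal_like)
```

Once a name of interest is found, help(np.random.normal) in the Console displays its documentation.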
Tutorial 2: Plotting Random Data

• In the Editor, type the following commands and save the program as prg1.py
import numpy as np                   # import libraries
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams["figure.dpi"] = 100     # set the figure resolution
plt.clf()                            # clear the current figure
plt.plot(np.random.rand(25), "o")    # plot 25 random values as points
plt.title("Title")
plt.xlabel("x label")
plt.ylabel("y label")
plt.show()                           # display the figure
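Each run of the program above draws different random values. If a repeatable plot is wanted, one option is a seeded generator, sketched below. The seed value 42 is an arbitrary choice for this example, and np.random.default_rng is NumPy's newer generator interface rather than the np.random.rand call used in the tutorial.

```python
import numpy as np

SEED = 42  # arbitrary value chosen for this example

# Two generators created from the same seed produce the same sequence.
rng_a = np.random.default_rng(SEED)
rng_b = np.random.default_rng(SEED)

points_a = rng_a.random(25)   # 25 uniform values in [0, 1)
points_b = rng_b.random(25)

print(np.array_equal(points_a, points_b))
```

Passing points_a to plt.plot in place of np.random.rand(25) would then give the same figure on every run.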
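Tutorial 3: Histogram

• In the Editor, type the following commands and save as prg2.py
import numpy as np                   # import libraries
import matplotlib.pyplot as plt
import matplotlib as mpl
mpl.rcParams["figure.dpi"] = 100     # set the figure resolution
plt.clf()                            # clear the current figure
mu, sigma = 0, 1                     # assumed values; the slide does not define mu and sigma
s1 = np.random.normal(mu, sigma, 1000)   # 1000 samples from a normal distribution
plt.hist(s1, 30, color='y')          # histogram with 30 bins
plt.figure()                         # start a second figure
s2 = np.random.poisson(4, 20000)     # 20000 samples from a Poisson distribution (lambda = 4)
plt.hist(s2, 15, color='b')          # histogram with 15 bins
plt.show()                           # display both figures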
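As a numerical check on the two distributions plotted in Tutorial 3, the sample statistics can be compared with the distribution parameters. This is a sketch under stated assumptions: mu = 0 and sigma = 1 are assumed values (the slide does not give them), and the seeded generator is used only so the check is repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen arbitrarily for repeatability

mu, sigma = 0.0, 1.0             # assumed parameters, not taken from the slide
s1 = rng.normal(mu, sigma, 1000)
s2 = rng.poisson(4, 20000)

# For a normal sample, mean and standard deviation should be close to mu and sigma.
print("normal  mean/std:", s1.mean(), s1.std())

# For a Poisson sample, mean and variance should both be close to lambda (4).
print("poisson mean/var:", s2.mean(), s2.var())

# np.histogram returns the same bin counts that plt.hist draws.
counts, edges = np.histogram(s1, bins=30)
print(counts.sum(), len(edges))
```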
Tutorial 4
• Reuse the Total Plant Data file from the e-learning site, stored in your working folder and reduced to its main worksheet only, for example:
D:\BDT\PRKT1\Total Plant Data EE1.xls
• Rename the file so that the name is short and contains no spaces, for example:
D:\BDT\PRKT1\Total_Data_EE1.xls
• In the Editor, type the following commands and save as prg3.py
import pandas as pd
# raw string (r'...') so the backslashes in the Windows path are not read as escapes
df1 = pd.read_excel(r'D:\BDT\PRKT1\Total_Data_EE1.xls')
# keep only the selected columns (the label 2 is an integer column name)
df1a = pd.DataFrame(df1, columns=['Date', 'Section', 'Scientific name', 2])
print(df1a)
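One property of the pd.DataFrame(df1, columns=[...]) selection style is worth knowing: a misspelled column label does not raise an error but yields a column of NaN values. The sketch below shows this on a small hypothetical data frame that stands in for the worksheet; the values and the misspelling "Sektion" are invented for the illustration.

```python
import pandas as pd

# Hypothetical stand-in for the worksheet; values are made up.
df1 = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02"],
    "Section": [1, 2],
    2: [5, 7],                      # integer column label, as in the tutorial
})

# Selection style used in the tutorial: a misspelled label gives a NaN column.
reindexed = pd.DataFrame(df1, columns=["Date", "Sektion"])
print(reindexed)

# Bracket selection raises KeyError for the same typo, so the mistake is noticed.
try:
    df1[["Date", "Sektion"]]
    raised = False
except KeyError:
    raised = True
print("KeyError raised:", raised)

# Bracket selection with correct labels.
subset = df1[["Date", "Section", 2]]
print(subset)
```

Either style works when the labels are correct; bracket selection simply fails earlier when they are not.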
Tutorial 5
• In the Editor, type the following commands and save as prg4.py
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_excel(r'D:\BDT\PRKT1\Total_Data_EE1.xls')
df1a = pd.DataFrame(df1, columns=['Date', 'Section',
                                  'Scientific name', 1, 2, 3, 4, 5, 'Total'])
df1b = pd.DataFrame(df1, columns=['Section', 'Total'])
df1c = df1b.apply(pd.to_numeric, errors='coerce')   # non-numeric cells become NaN
df1d = df1c.dropna()                                # drop rows that contain NaN
df1e = df1d.reset_index(drop=True)                  # renumber the rows from 0
plt.scatter(df1e['Section'], df1e['Total'], color='red')
plt.title('Section vs. Total Flowers', fontsize=14)
plt.xlabel('Section', fontsize=14)
plt.ylabel('Total Flowers', fontsize=14)
plt.grid(True)
plt.show()
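The cleaning steps in Tutorial 5 (convert to numeric, drop incomplete rows, reset the index) can be tried without the Excel file. The sketch below uses a small hypothetical data frame with deliberately bad cells; the values are invented for the illustration.

```python
import pandas as pd

# Hypothetical raw data with two non-numeric cells ("n/a" and "x").
raw = pd.DataFrame({
    "Section": [1, 2, "n/a", 4],
    "Total": ["10", "x", "7", "12"],
})

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
numeric = raw.apply(pd.to_numeric, errors="coerce")
print(numeric)

# Drop every row that still contains a NaN, then renumber the rows from 0.
clean = numeric.dropna().reset_index(drop=True)
print(clean)
```

Only the rows in which both Section and Total are valid numbers survive, which is why the scatter plot may show fewer points than the worksheet has rows.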
Tutorial 6
• In the Editor, type the following commands and save as prg5.py
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_excel(r'D:\BDT\PRKT1\Total_Data_EE1.xls')
df1a = pd.DataFrame(df1,
                    columns=['Date', 'Section', 'Scientific name', 1, 2, 3, 4, 5, 'Total'])
# summary statistics of the 'Total' column (label without a trailing space)
mean1 = df1a['Total'].mean()
sum1 = df1a['Total'].sum()
max1 = df1a['Total'].max()
min1 = df1a['Total'].min()
count1 = df1a['Total'].count()
median1 = df1a['Total'].median()
std1 = df1a['Total'].std()
var1 = df1a['Total'].var()
print('Mean Total: ' + str(mean1))
print('Sum of Total: ' + str(sum1))
print('Max Total: ' + str(max1))
print('Min Total: ' + str(min1))
print('Count of Total: ' + str(count1))
print('Median Total: ' + str(median1))
print('Std of Total: ' + str(std1))
print('Var of Total: ' + str(var1))
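The eight statistics in Tutorial 6 can also be requested in a single call with Series.agg. The sketch below uses a hypothetical series of totals, invented for the illustration, and also shows that pandas computes the sample variance (ddof=1) by default.

```python
import pandas as pd

# Hypothetical totals, made up for this example.
totals = pd.Series([4, 8, 15, 16, 23, 42], name="Total")

# One call returns all requested statistics as a labeled Series.
summary = totals.agg(["mean", "sum", "max", "min", "count", "median", "std", "var"])
print(summary)

# pandas divides by n - 1 by default (sample variance);
# ddof=0 divides by n instead (population variance).
sample_var = totals.var()
population_var = totals.var(ddof=0)
print(sample_var, population_var)
```

The difference between the two variances matters mainly for small data sets; NumPy's np.var uses ddof=0 by default, so results from the two libraries can differ unless ddof is set explicitly.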
Tutorial 7
• In the Editor, type the following commands and save as prg6.py
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.read_excel(r'D:\BDT\PRKT1\Total_Data_EE1.xls')
df1a = pd.DataFrame(df1, columns=['Date', 'Section', 'Scientific name', 'Total'])
df1b = pd.DataFrame(df1, columns=['Date'])
# sum only the 'Total' column per date, so that text columns are not summed or plotted
groupby_sum1 = df1a.groupby(['Date'])['Total'].sum()
print('Sum, grouped by the Date : ' + str(groupby_sum1))
plt.plot(groupby_sum1, 'ro-')
plt.title('Date vs. Total Flowers', fontsize=14)
plt.xlabel('Date', fontsize=14)
plt.ylabel('Total Flowers', fontsize=14)
plt.grid(True)
plt.show()
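Grouping by date can be tried on a small in-memory table before running it on the worksheet. The sketch below uses hypothetical dates and totals invented for the illustration, and computes several aggregates per date in one step.

```python
import pandas as pd

# Hypothetical observations: several sections counted on each date.
df = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02",
                            "2020-01-03", "2020-01-03"]),
    "Section": [1, 2, 1, 1, 2],
    "Total": [3, 5, 4, 6, 1],
})

# Sum of Total for each date: one value per distinct date.
daily_total = df.groupby("Date")["Total"].sum()
print(daily_total)

# Several aggregates per date at once.
daily_stats = df.groupby("Date")["Total"].agg(["sum", "mean", "count"])
print(daily_stats)
```

Selecting the Total column before aggregating keeps unrelated columns, such as Section, out of the result.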
