Data Exploration with SQL & Python


Installing Required Libraries
# Installing SQLAlchemy, PandaSQL
!pip install SQLAlchemy
!pip install PandaSQL
!pip install pymysql
# sqlite3 is already part of python.

Requirement already satisfied: SQLAlchemy in /usr/local/lib/python3.10/dist-packages (2.0.36)
Collecting PandaSQL
  Downloading pandasql-0.7.3.tar.gz (26 kB)
  Preparing metadata (setup.py) ... done
Building wheels for collected packages: PandaSQL
  Building wheel for PandaSQL (setup.py) ... done
Successfully built PandaSQL
Installing collected packages: PandaSQL
Successfully installed PandaSQL-0.7.3
Collecting pymysql
  Downloading PyMySQL-1.1.1-py3-none-any.whl.metadata (4.4 kB)
  Downloading PyMySQL-1.1.1-py3-none-any.whl (44 kB)
Installing collected packages: pymysql
Successfully installed pymysql-1.1.1

SQLAlchemy
## Endpoint creation:
# Format & Syntax: mysql+pymysql://username:password@host:port/database

# host: learning-activity-rr.cejogcrmn6il.ap-south-1.rds.amazonaws.com
# port: 3306
# Database: assignment
# username: almafolk
# password: 8l39zk60
# Access: Read-only
# Final End Point : mysql+pymysql://almafolk:8l39zk60q@learning-activity-rr.cejogcrmn6il.ap-south-1.rds.a

import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool

def mysql(query: str):
    '''
    Fetches data from the database for the given query and returns the result as a DataFrame.
    '''
    try:
        # Build the engine from the endpoint described above
        engine_db = create_engine('mysql+pymysql://almafolk:8l39zk60q@learning-activity-rr.cejogcrmn6il.ap-south-1.rds.amazonaws.com:3306/assignment')
        conn = engine_db.connect()  # Connection object

        # Reading data
        df = pd.read_sql_query(query, conn)

        # Close the connection and dispose of the engine once the data is fetched
        if not conn.closed:
            conn.close()
        engine_db.dispose()
        return df
    except Exception as e:
        print(e)

mysql('''show tables''')

    Tables_in_assignment
0   abcnews-date-text
1   abcnews_date_text
2   almax_job_scraping
3   campaign_identifier
4   customer_nodes
..  ...
77  subscriptions
78  telecom_churn
79  test_store_csv
80  users
81  weekly_sales

82 rows × 1 columns

Table 1: plans


mysql('''select * from plans''')

   plan_id      plan_name  price
0        0          trial    0.0
1        1  basic monthly    9.9
2        2    pro monthly   19.9
3        3     pro annual  199.0
4        4          churn    NaN

mysql('''select * from plans''').info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):

 #   Column     Non-Null Count  Dtype
---  ------     --------------  -----
 0   plan_id    5 non-null      int64
 1   plan_name  5 non-null      object
 2   price      4 non-null      float64
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes

Table 2: subscriptions


mysql('''select * from subscriptions''')

      customer_id  plan_id  start_date
0               1        0  2020-08-01
1               1        1  2020-08-08
2               2        0  2020-09-20
3               2        3  2020-09-27
4               3        0  2020-01-13
...           ...      ...         ...
2645          999        2  2020-10-30
2646          999        4  2020-12-01
2647         1000        0  2020-03-19
2648         1000        2  2020-03-26
2649         1000        4  2020-06-04

2650 rows × 3 columns

mysql('''select * from subscriptions''').info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2650 entries, 0 to 2649
Data columns (total 3 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   customer_id  2650 non-null   int64
 1   plan_id      2650 non-null   int64
 2   start_date   2650 non-null   object
dtypes: int64(2), object(1)
memory usage: 62.2+ KB
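
Note that start_date comes back with an object dtype. If you want to filter or group by date on the pandas side, a small sketch (reusing the mysql helper defined above; subs is just an illustrative name) converts it explicitly:

# Convert start_date from object to datetime for date arithmetic in pandas
subs = mysql('''select * from subscriptions''')
subs['start_date'] = pd.to_datetime(subs['start_date'])
subs['start_date'].dt.year.value_counts()  # count of subscription events per year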

1. How many customers has the restaurant ever had?

(2 cells hidden)
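
The answer cells for question 1 are hidden in this export. As a minimal sketch (assuming "customers" means distinct customer_id values in the subscriptions table), the count could be fetched with the mysql helper:

mysql('''
SELECT COUNT(DISTINCT customer_id) AS total_customers
FROM subscriptions;
''')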

2. What plan start_date values occur after the year 2020 for our dataset?

Show the breakdown by count of events for each plan_name.

(9 cells hidden)
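
Those answer cells are hidden as well. One way to approach question 2 (a sketch, not the notebook's hidden solution) is to join subscriptions to plans, keep only the start_date values after 2020, and count events per plan_name:

mysql('''
SELECT p.plan_name, COUNT(*) AS num_events
FROM subscriptions s
JOIN plans p ON s.plan_id = p.plan_id
WHERE s.start_date > '2020-12-31'
GROUP BY p.plan_name
ORDER BY num_events DESC;
''')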


SQLite3
SQLite is commonly used in scenarios where lightweight, serverless, and self-contained database capabilities are required. Here are some common use cases:

1. Embedded Systems: SQLite is often used in embedded systems and IoT devices where a compact, self-contained database engine is needed to store and retrieve data locally.
2. Mobile Applications: Many mobile applications use SQLite as their local database engine due to its small footprint, simplicity, and ease of integration with mobile platforms like Android and iOS.
3. Prototyping and Testing: SQLite is frequently used during the prototyping and testing phases of software development due to its simplicity and convenience. Developers can quickly set up and interact with databases without the need for a separate database server.

import sqlite3

# Connecting to the database
conn = sqlite3.connect('coffee_shop.db')  # Creates the DB file if it does not already exist

c = conn.cursor()  # Cursor object, used to run DDL/DML statements

c.execute('''DROP TABLE IF EXISTS transactions;''')

## Cursor object
## A cursor is created from an open connection and is used to execute SQL queries.
## It acts as the middleman between your SQL statements and the SQLite database connection,
## and it only becomes available after the connection has been established.

c.execute('''
CREATE TABLE transactions (
    transaction_id INTEGER PRIMARY KEY,
    date DATE,
    time TIME,
    item TEXT,
    price REAL,
    quantity INTEGER,
    total_amount REAL
);
''')

c.execute('''
INSERT INTO transactions (
transaction_id, date, time, item, price, quantity, total_amount)
VALUES
(1, '2022-01-01', '09:00', 'Coffee', 2.50, 1, 2.50),
(2, '2022-01-01', '10:30', 'Croissant', 1.75, 2, 3.50),
(3, '2022-01-01', '11:15', 'Cappuccino', 3.00, 1, 3.00),
(4, '2022-01-02', '08:45', 'Latte', 3.50, 2, 7.00),
(5, '2022-01-02', '10:00', 'Muffin', 2.25, 1, 2.25),
(6, '2022-01-02', '12:15', 'Espresso', 2.75, 1, 2.75),
(7, '2022-01-03', '09:30', 'Croissant', 1.75, 3, 5.25),
(8, '2022-01-03', '11:00', 'Iced Coffee', 3.50, 2, 7.00),
(9, '2022-01-03', '12:45', 'Hot Chocolate', 3.25, 1, 3.25),
(10, '2022-01-04', '10:15', 'Cappuccino', 3.00, 2, 6.00);
''')
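
Note that sqlite3 wraps these statements in an implicit transaction: the inserted rows are visible to queries on this same connection, but they are only persisted to coffee_shop.db for other connections once the transaction is committed.

conn.commit()  # Persist the CREATE TABLE and INSERT statements to coffee_shop.db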


cursor = c.execute('''
SELECT * from transactions
''')
# Display all rows from the transactions table
for row in cursor:
    print(row)

df1= pd.read_sql_query('''
SELECT * from transactions
''',conn)

df1.head()

df1_g=df1.groupby(["item"])["price"].mean().reset_index()

df1_g.head()

df = pd.read_sql_query('''
SELECT item, avg(price) AS avg_price
FROM transactions
group by item;
''',conn)

df.head()

# DF to SQL table
df.to_sql("item_price", conn, if_exists="replace")
# item_price is the table name
## if_exists options: 'append','replace'

pd.read_sql_query('''
SELECT * from item_price
''',conn)
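
By default, to_sql also writes the DataFrame index as an extra column of the new table. If only the item and avg_price columns are wanted, a small variation of the cell above passes index=False:

# Write the aggregated DataFrame without the pandas index column
df.to_sql("item_price", conn, if_exists="replace", index=False)

pd.read_sql_query('''
SELECT * from item_price
''',conn)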

### Iterative Execution of Query


df = pd.DataFrame()  # Empty DataFrame to collect results

for item in ['Coffee','Croissant','Cappuccino']:
    df_item = pd.read_sql_query(f'''
    SELECT *
    FROM transactions where item = '{item}';
    ''', conn)
    df = pd.concat([df, df_item], axis=0).reset_index(drop=True)
    print(f"Execution of item {item} is done!")

df
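
Interpolating item into the SQL string with an f-string is fine for this hard-coded list, but for values coming from users it is safer to let the driver bind the parameters. A minimal sketch of the same loop using sqlite3's ? placeholder and the params argument of read_sql_query (df_safe is just an illustrative name):

df_safe = pd.DataFrame()
for item in ['Coffee','Croissant','Cappuccino']:
    df_item = pd.read_sql_query(
        'SELECT * FROM transactions WHERE item = ?;',  # ? is the sqlite3 placeholder
        conn,
        params=(item,),
    )
    df_safe = pd.concat([df_safe, df_item], axis=0).reset_index(drop=True)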

df.loc[df.item == 'Coffee',:]

## Iterative
# DF to SQL table
for i in ['Coffee','Croissant','Cappuccino']:
    df.loc[df.item == i, :].to_sql("new_table", conn, if_exists="append")


pd.read_sql_query('''
SELECT * from new_table
''',conn)

# Close the connection


conn.close()

PandaSQL

(7 cells hidden)
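
The PandaSQL cells are hidden in this export. As a rough sketch of what the library offers (assuming the df1 DataFrame built from the transactions table earlier is still in memory), pandasql's sqldf runs SQL directly against in-memory DataFrames:

from pandasql import sqldf

# sqldf resolves table names like df1 from the supplied namespace
q = '''
SELECT item, AVG(price) AS avg_price
FROM df1
GROUP BY item;
'''
sqldf(q, locals())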
