Data Exploration with SQL & Python
SQLAlchemy
## Endpoint creation:
# Format & Syntax: mysql+pymysql://username:password@host:port/database
# host: learning-activity-rr.cejogcrmn6il.ap-south-1.rds.amazonaws.com
# port: 3306
# Database: assignment
# username: almafolk
# password: 8l39zk60
# Access: Read-only
# Final endpoint: mysql+pymysql://almafolk:8l39zk60q@learning-activity-rr.cejogcrmn6il.ap-south-1.rds.amazonaws.com:3306/assignment
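As a quick illustration (not part of the original notebook), the endpoint string can be assembled from the pieces listed above; the variable names below are only for illustration.

# Illustrative only: build the SQLAlchemy URL from its parts
user = 'almafolk'
password = '8l39zk60q'  # password as it appears in the endpoint used below
host = 'learning-activity-rr.cejogcrmn6il.ap-south-1.rds.amazonaws.com'
port = 3306
database = 'assignment'
endpoint = f'mysql+pymysql://{user}:{password}@{host}:{port}/{database}'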
import pandas as pd
from sqlalchemy import create_engine
from sqlalchemy.pool import NullPool
def mysql(query: str):
    '''
    This function fetches data from the database and returns the result as a DataFrame.
    '''
    engine_db = create_engine(
        'mysql+pymysql://almafolk:8l39zk60q@learning-activity-rr.cejogcrmn6il.ap-south-1.rds.amazonaws.com:3306/assignment',
        poolclass=NullPool)  # NullPool: do not keep idle connections in a pool
    try:
        with engine_db.connect() as conn:  ## Connection object, closed automatically
            # Reading data
            df = pd.read_sql_query(query, conn)
        return df
    finally:
        engine_db.dispose()
mysql('''show tables''')
Tables_in_assignment
0 abcnews-date-text
1 abcnews_date_text
2 almax_job_scraping
3 campaign_identifier
4 customer_nodes
... ...
77 subscriptions
78 telecom_churn
79 test_store_csv
80 users
81 weekly_sales
82 rows × 1 columns
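The cell that produced the next output is not visible in this export; it presumably pulled the plans table with the helper above. A minimal sketch (the table name plans is inferred from the columns shown below):

plans = mysql('''SELECT * FROM plans''')
plans.head()
plans.info()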
   plan_id plan_name  price
0        0     trial    0.0
...
4        4     churn    NaN
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 plan_id 5 non-null int64
1 plan_name 5 non-null object
2 price 4 non-null float64
dtypes: float64(1), int64(1), object(1)
memory usage: 248.0+ bytes
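As above, the generating cell is not shown in this export; it presumably queried the subscriptions table (the name is inferred from the table list and the columns below):

subscriptions = mysql('''SELECT * FROM subscriptions''')
subscriptions.head()
subscriptions.info()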
   customer_id  plan_id  start_date
0            1        0  2020-08-01
1            1        1  2020-08-08
2            2        0  2020-09-20
3            2        3  2020-09-27
4            3        0  2020-01-13
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2650 entries, 0 to 2649
Data columns (total 3 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customer_id 2650 non-null int64
1 plan_id 2650 non-null int64
2 start_date 2650 non-null object
dtypes: int64(2), object(1)
memory usage: 62.2+ KB
2. What plan start_date values occur after the year 2020 for our dataset?
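The answer cells are hidden in this export; one way to approach the question with the helper above (a sketch, assuming the subscriptions table and columns shown earlier):

mysql('''
SELECT customer_id, plan_id, start_date
FROM subscriptions
WHERE start_date >= '2021-01-01'
ORDER BY start_date
''')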
SQLite3
SQLite is commonly used in scenarios where lightweight, serverless, and self-contained database capabilities are required. Here are some common use cases:
1. Embedded Systems: SQLite is often used in embedded systems and IoT devices where a compact, self-contained database engine is needed to store and retrieve data locally.
2. Mobile Applications: Many mobile applications use SQLite as their local database engine due to its small footprint, simplicity, and ease of integration with mobile platforms like Android and iOS.
3. Prototyping and Testing: SQLite is frequently used during the prototyping and testing phases of software development because of its simplicity and convenience. Developers can quickly set up and interact with databases without needing a separate database server.
import sqlite3

# Connecting to the database
conn = sqlite3.connect('coffee_shop.db')  # Creates the DB file if it does not exist.

## Cursor object
## A cursor is created from an open connection and is used to run SQL queries;
## it acts as the middleware between your SQL statements and the SQLite connection.
c = conn.cursor()
c.execute('''
CREATE TABLE transactions (
transaction_id INTEGER PRIMARY KEY,
date DATE,
time TIME,
item TEXT,
price REAL,
quantity INTEGER,
total_amount REAL
);
''')
c.execute('''
INSERT INTO transactions (
transaction_id, date, time, item, price, quantity, total_amount)
VALUES
(1, '2022-01-01', '09:00', 'Coffee', 2.50, 1, 2.50),
(2, '2022-01-01', '10:30', 'Croissant', 1.75, 2, 3.50),
(3, '2022-01-01', '11:15', 'Cappuccino', 3.00, 1, 3.00),
(4, '2022-01-02', '08:45', 'Latte', 3.50, 2, 7.00),
(5, '2022-01-02', '10:00', 'Muffin', 2.25, 1, 2.25),
(6, '2022-01-02', '12:15', 'Espresso', 2.75, 1, 2.75),
(7, '2022-01-03', '09:30', 'Croissant', 1.75, 3, 5.25),
(8, '2022-01-03', '11:00', 'Iced Coffee', 3.50, 2, 7.00),
(9, '2022-01-03', '12:45', 'Hot Chocolate', 3.25, 1, 3.25),
(10, '2022-01-04', '10:15', 'Cappuccino', 3.00, 2, 6.00);
''')
conn.commit()  # Persist the CREATE TABLE and INSERT to the database file
cursor = c.execute('''
SELECT * FROM transactions
''')
# Display all rows from the transactions table
for row in cursor:
    print(row)
df1 = pd.read_sql_query('''
SELECT * FROM transactions
''', conn)
df1.head()
# Pandas equivalent of a GROUP BY: average price per item
df1_g = df1.groupby(["item"])["price"].mean().reset_index()
df1_g.head()
df = pd.read_sql_query('''
SELECT item, AVG(price) AS avg_price
FROM transactions
GROUP BY item;
''', conn)
df.head()
# DF to SQL table
df.to_sql("item_price", conn, if_exists="replace")
# item_price is the table name
## if_exists options: 'fail' (default), 'replace', 'append'
pd.read_sql_query('''
SELECT * from item_price
''',conn)
df
df.loc[df.item == 'Coffee',:]
## Iterative
# DF to SQL table: append one item's rows at a time
for i in ['Coffee', 'Croissant', 'Cappuccino']:
    df.loc[df.item == i, :].to_sql("new_table", conn, if_exists="append")
pd.read_sql_query('''
SELECT * from new_table
''',conn)
PandaSQL
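The PandaSQL cells are hidden in this export; a minimal sketch of how pandasql is typically used, here against the df1 DataFrame created earlier (assumes the pandasql package is installed):

from pandasql import sqldf

# Run SQL directly against in-memory pandas DataFrames
query = '''
SELECT item, AVG(price) AS avg_price
FROM df1
GROUP BY item
'''
sqldf(query, globals())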