SQL

The document provides an overview of SQL and its application in managing structured data within relational databases, emphasizing the importance of queries, keys, and debugging practices. It also covers data analysis techniques using Google Sheets, including data cleaning, statistical analysis, and functions for manipulating and extracting insights from datasets. Key concepts include sorting, filtering, and conditional functions, as well as the use of VLOOKUP for searching information in tables.

Uploaded by

bemamdangiu

COMPUTERS IN BUSINESS

SQL
1. Data lives in databases. Just about every company and organization
relies on some form of database to store and organize information.
TRUE

Key takeaways:

• SQL allows you to work with data stored in a database

• The SELECT query is used to get data from a table

1. Data that can be stored in tables is called structured data.


2. SQL makes working with structured data very fast and effective. A
single data query can access thousands and thousands of records in
the blink of an eye.
3. Real databases can contain a very large amount of information.

A query is a way to access only the data you are interested in.

4. Unstructured data is information that is difficult to store in tables.


Ex: sales table: structured; audio file: unstructured.
5. SQL stands for Structured Query Language.
With SQL you'll be able to extract data from massive datasets with
thousands of fields and records.
SQL is used to work with structured data in the form of tables.
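
As a minimal sketch of a SELECT query, the Python snippet below uses the standard sqlite3 module to build a small in-memory table and read it back. The table name `sales` and its rows are illustrative assumptions, not data from these notes.

```python
import sqlite3

# Hypothetical in-memory database with an illustrative "sales" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("laptop", 1200.0), ("mouse", 25.0), ("monitor", 300.0)],
)

# SELECT names the fields to return; FROM names the table they come from.
rows = conn.execute("SELECT product, amount FROM sales").fetchall()
print(rows)
```

Each record comes back as one tuple, with the values in the order the fields were listed after SELECT.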

1. The relational database is the most common type of database.
A relational database can contain several tables.
2. The different tables in a relational database connect to each other
using fields (columns) with values in common. These fields are
called keys.
Ex: Star Wars: Lucasfilm; Frozen: Walt Disney
3. A relational database stores data in tables.
What data category can a relational database handle? Structured
data.
4. Key fields are used to connect the tables in a relational database.
5. You might have used spreadsheets like Excel to work with data before.
Databases are better for storing larger and more complex collections of
data.
Working with data in a single table: spreadsheet.
Working with data in multiple, larger tables: database.
⭐ relational databases store information in interconnected
tables

⭐ keys connect tables in a relational database


⭐ you can query data FROM different tables in the database
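
A small sketch of how a key field connects two tables, using the Star Wars / Frozen example. The table names, the `studio_id` key and the JOIN clause are assumptions added for illustration; JOIN is not covered in these notes and appears here only to show the key at work.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE studios (studio_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE movies (title TEXT, studio_id INTEGER);
INSERT INTO studios VALUES (1, 'Lucasfilm'), (2, 'Walt Disney');
INSERT INTO movies VALUES ('Star Wars', 1), ('Frozen', 2);
""")

# studio_id appears in both tables: it is the key that links them.
query = """
SELECT movies.title, studios.name
FROM movies
JOIN studios ON movies.studio_id = studios.studio_id
ORDER BY movies.title
"""
pairs = conn.execute(query).fetchall()
print(pairs)
```

The movies table stores only the studio's key value; the studio name is looked up in the studios table through the shared field.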

REVIEW
1. A database is an organized collection of data
2. SQL is used to work with the data in a database

Debugging

1. If your SQL query contains a mistake, you will not be able to access the
data.
2. A schema is a visual representation of how a database is organized,
showing its tables, fields and keys. Arrows are used to show how the
different tables are related.
3. The * symbol allows you to select all the fields in a table. This way
you can avoid typos when listing field names.

⭐ Bugs in queries cause error messages

⭐ The schema of a database helps you avoid errors

⭐ SQL queries are organized in different lines so they are easier to read by
humans
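
The sketch below shows one possible bug, a mistyped field name, and the error it produces in SQLite through Python's sqlite3 module. The `movies` table is hypothetical, and other database systems word their error messages differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER)")
conn.execute("INSERT INTO movies VALUES ('Frozen', 2013)")

try:
    # Bug: the field is called "title", not "titel".
    conn.execute("SELECT titel FROM movies")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# SELECT * returns every field, so there are no field names to mistype.
all_fields = conn.execute("SELECT * FROM movies").fetchall()
print(all_fields)
```

Checking field names against the schema before running the query is the simplest way to avoid this kind of bug.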

REVIEW

1. The term for a request for information from a database is a query.
2. Error in code: bug
Information request: query
Visual representation of a database: schema
3. A relational database stores information in tables.
4. What connects different tables in a relational database? Common key
fields

Standards & Best Practices

1. Comments in code are explanations for humans.
2. It's good practice to use comments in your code.
3. You can use comments to temporarily disable a query or part of a
query when testing your code. This way the computer will skip the
instruction.
4. If you need to make comments with multiple lines, you can use /* … */
block comments.
5. Use lowercase for field and table names.
⭐ You can add comments to your code with double hyphens (--)
⭐ You can add a block comment with /* … */
⭐ SQL keywords are case-insensitive: SELECT and select behave the same.
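
A short sketch of both comment styles inside one query, run through SQLite from Python. The `movies` table is an illustrative assumption. The block comment also disables part of the field list, which is one way to switch off a piece of a query while testing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER)")
conn.execute("INSERT INTO movies VALUES ('Frozen', 2013)")

query = """
-- Single-line comment: list every movie title
select title        -- keywords work in lowercase too
/* Block comment spanning several lines;
   it also disables this part of the query:
   , year */
FROM movies
"""
titles = conn.execute(query).fetchall()
print(titles)
```

Because `, year` sits inside the block comment, the database skips it and only the title field is returned.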

1. Sorting consists of putting data in a meaningful order


2. The ORDER BY command is used to sort the extracted data in the
results table.
3. Data is sorted by fields.
4. Your extracted data will be sorted in ascending order by default. If you
need the data sorted in descending order (from largest to smallest) you
need to add the DESC keyword.
5. By default, data is sorted in ascending order. You can use the explicit
ASC keyword to clarify and make your queries more readable,
particularly when writing complex queries.
⭐ You can sort extracted data with the ORDER BY command
⭐ DESC sorts data in descending order (from largest to smallest, or Z
to A)
⭐ ASC sorts data in ascending order (from smallest to largest, or A to
Z)
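
A minimal sketch of ORDER BY with ASC and DESC, assuming a hypothetical `movies` table with a `year` field.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO movies VALUES (?, ?)",
    [("Frozen", 2013), ("Star Wars", 1977), ("Moana", 2016)],
)

# ASC is the default; writing it out only makes the intent explicit.
oldest_first = conn.execute(
    "SELECT title FROM movies ORDER BY year ASC"
).fetchall()

# DESC reverses the order: largest year first.
newest_first = conn.execute(
    "SELECT title FROM movies ORDER BY year DESC"
).fetchall()
print(oldest_first, newest_first)
```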

Limiting Data
1. The LIMIT keyword extracts a limited number of records.
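
As a sketch, LIMIT is often paired with ORDER BY to get a "top N" result. The `sales` table and its values below are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (product TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("laptop", 1200.0), ("mouse", 25.0), ("monitor", 300.0)],
)

# Sort from largest to smallest amount, then keep only the first two records.
top_two = conn.execute(
    "SELECT product, amount FROM sales ORDER BY amount DESC LIMIT 2"
).fetchall()
print(top_two)
```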

GOOGLE SHEETS
Spreadsheets let you carry out many of the main tasks in the data analysis
process:

• Cleaning and manipulating data

• Statistical analysis
• Analyzing data and creating visualizations
• Reporting

All of these require little or no code.

Data analysis: the process of extracting meaningful insights from data. Each
project has its own goals, but most follow a common framework.

Built-in functions: pre-written calculations that are available in formulas.

• Rounding to a number of decimal places: ROUND(value, [places])

o Required argument: value
o Optional argument: places (0 by default)

Exploring data

• Characterize the data

• Identify data quality issues

Summary statistics

1. Measures of frequency: How often does a value occur?

- Count:
• COUNT(): counts the cells in a range that contain numerical data, including
o Dates
o Currencies
• COUNTA(): counts the cells that contain any data type, including
o Empty strings ("")
o Errors (#DIV/0!)
• COUNTBLANK(): counts the blank cells, including
o Empty cells
o Empty strings ("")
2. Measures of center: What does a typical value look like?
• Aim to describe a "typical" value
• Mean ("average"): sum of values / count of values
• Median: the middle number in a sorted list of values; used when
there are outliers (values that disproportionately skew the mean)
3. Measures of spread: How do values vary across the
dataset?
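
To see why the median is preferred when outliers are present, here is a small Python sketch using the standard statistics module. Python stands in for the spreadsheet's AVERAGE and MEDIAN here, and the values are made up for illustration.

```python
from statistics import mean, median

# Illustrative order values with one outlier (500).
values = [10, 12, 11, 13, 500]

average = mean(values)    # pulled upward by the outlier
middle = median(values)   # middle of the sorted list: 10, 11, 12, 13, 500
print(average, middle)
```

The single outlier drags the mean far above every ordinary value, while the median stays at a typical one.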

Identifying data quality issues

Missing data      Erroneous data

COUNTBLANK()      MAX(): maximum value in a range
COUNT()           MIN(): minimum value in a range
COUNTA()

Finding unique values

• Categorical data: can only be one of a finite number of values
UNIQUE(range): returns the unique values in a range (count them to find the number of unique values)

Filtering

• Extract subsets of the dataset for more detailed exploration
FILTER(range, condition1, [condition2, …])
range: the range to filter
condition1: a test applied to the data in the range, such as a comparison on one of its columns
condition2: an optional additional condition

• Identify largest and smallest values

SORT(range, sort_column, is_ascending)
range: one or more columns
sort_column: the column to sort the range by
is_ascending (TRUE/FALSE): whether to sort in ascending or descending order
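
The logic of FILTER and SORT can be sketched in Python as a rough analogue. This is not spreadsheet syntax, and the rows are illustrative assumptions.

```python
# Illustrative rows: (country, order_total).
rows = [("UK", 150), ("France", 90), ("UK", 40), ("Spain", 210)]

# Analogue of FILTER(range, condition1): keep rows whose total exceeds 100.
large_orders = [row for row in rows if row[1] > 100]

# Analogue of SORT(range, 2, FALSE): sort by the second column, descending.
sorted_desc = sorted(rows, key=lambda row: row[1], reverse=True)
print(large_orders)
print(sorted_desc)
```

After a descending sort the largest value is in the first row and the smallest in the last, which is how sorting helps identify extremes.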

Cleaning and preparing data

• 80/20 rule: 80% cleaning, 20% analyzing

• A clean dataset
o can be easily processed during analysis
o will return valid conclusions
o saves time during analysis

Dates and times

• Collected for measurements over time

• Continuous data: can take any value
• Discrete data: can take one of a finite number of categories

Extract the year component from a date: YEAR(date)

MONTH(date)

Convert a month to "mmm" format: TEXT(number, format)

WEEKDAY(date, [type])

• type: the numbering system to use

o 1 (default): Sunday = 1
o 2: Monday = 1
o 3: Monday = 0
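
The three numbering systems can be compared with a small Python sketch. The `weekday` function below is a hypothetical stand-in for WEEKDAY, built on the standard datetime module; it is not part of Google Sheets.

```python
from datetime import date

def weekday(d: date, type_: int = 1) -> int:
    """Hypothetical Python stand-in for WEEKDAY(date, [type])."""
    if type_ == 1:                       # Sunday = 1 ... Saturday = 7
        return d.isoweekday() % 7 + 1
    if type_ == 2:                       # Monday = 1 ... Sunday = 7
        return d.isoweekday()
    if type_ == 3:                       # Monday = 0 ... Sunday = 6
        return d.weekday()
    raise ValueError("type must be 1, 2 or 3")

sunday = date(2024, 1, 7)   # a Sunday
monday = date(2024, 1, 1)   # a Monday
print([weekday(sunday, t) for t in (1, 2, 3)])
print([weekday(monday, t) for t in (1, 2, 3)])
```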

Functions to extract time components:

HOUR(time)

MINUTE(time)
SECOND(time)

Calculating date durations

TODAY()

NOW()

• The value shown in the cell is updated whenever the spreadsheet is refreshed

DATEDIF(start_date, end_date, unit)

• end_date must be later than start_date

• unit: "Y", "M", "D", …

Results are truncated to whole units.

To measure up to the current date, pass TODAY() as end_date (the argument after start_date).
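
The truncation behaviour can be sketched in Python. The `datedif` function below is a hypothetical approximation that covers only the "Y", "M" and "D" units; edge cases such as month ends may differ from the spreadsheet function.

```python
from datetime import date

def datedif(start: date, end: date, unit: str) -> int:
    """Hypothetical sketch of DATEDIF for units "Y", "M" and "D" only."""
    if end < start:
        raise ValueError("end_date must not be earlier than start_date")
    if unit == "D":
        return (end - start).days
    # Whole months elapsed; drop one if the day-of-month has not been reached.
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    if unit == "M":
        return months
    if unit == "Y":
        return months // 12
    raise ValueError("unsupported unit")

# One day short of four full years: the result is truncated to 3.
years = datedif(date(2020, 3, 15), date(2024, 3, 14), "Y")
months = datedif(date(2020, 3, 15), date(2024, 3, 14), "M")
print(years, months)
```

Passing `date.today()` as the end date mirrors using TODAY() as end_date in the spreadsheet.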

Cleaning text data

PROPER(): capitalizes the first letter of each word

LOWER(): lowercase

UPPER(): uppercase

Removing whitespace:

Extra whitespace:

Leading space before text

Trailing space after text

Repeated (more than one) space between characters

TRIM(" text "): removes extra whitespace
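
As a rough Python analogue of these text-cleaning functions, and not spreadsheet syntax, the sketch below cleans one illustrative string. Note that `str.split()` collapses every kind of whitespace, which is an assumption that goes slightly beyond plain spaces.

```python
raw = "  aNNa   van   DIJK "

# Analogue of TRIM: drop leading/trailing spaces and collapse repeats.
trimmed = " ".join(raw.split())

proper = trimmed.title()   # like PROPER
lower = trimmed.lower()    # like LOWER
upper = trimmed.upper()    # like UPPER
print(repr(trimmed), proper, lower, upper)
```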

Combining text data:

CONCATENATE(string1, [string2, …]): joins the strings with no space between them; to add a space, pass " " as one of the arguments

Combining text data – email addresses:


Manipulating text data

LEN(): returns the length of the string, which equals the index of its last character

SEARCH(search_for, text_to_search, [starting_at])

search_for: the string to search for

text_to_search: the text to search through

starting_at (default = 1): the index at which to start searching; by default, the start of the text

LEFT(string, [number_of_characters])

RIGHT(string, [number_of_characters])

SUBSTITUTE(text_to_search, search_for, replace_with, [occurrence_number])

text_to_search: the text to search through

search_for: the string to search for

replace_with: the replacement string

occurrence_number: which occurrence should be substituted
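
These functions are often combined, for example to split an email address at the "@" sign. The Python sketch below mimics that pattern. The helpers `search`, `left` and `right` are hypothetical stand-ins, and the address is made up.

```python
def search(search_for: str, text: str, starting_at: int = 1) -> int:
    """Hypothetical stand-in for SEARCH: 1-based, case-insensitive position."""
    position = text.lower().find(search_for.lower(), starting_at - 1)
    if position == -1:
        raise ValueError("not found")   # the spreadsheet shows an error here
    return position + 1

def left(text: str, n: int = 1) -> str:
    return text[:n]

def right(text: str, n: int = 1) -> str:
    return text[max(len(text) - n, 0):] if n > 0 else ""

email = "jane.doe@example.com"
at = search("@", email)                   # position of "@", counting from 1
user = left(email, at - 1)                # everything before "@"
domain = right(email, len(email) - at)    # everything after "@"
dashed = email.replace(".", "-", 1)       # like SUBSTITUTE(email, ".", "-", 1)
print(at, user, domain, dashed)
```

The last line replaces only the first occurrence, which corresponds to an occurrence_number of 1; other occurrence numbers are not covered by this sketch.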

Conditional functions and logic

Conditional functions: return different results depending on criteria

IF(logical_expression, value_if_true, value_if_false)

AND(logical_expression1, [logical_expression2, …])

Returns TRUE if all logical expressions return TRUE

Conditional aggregations

COUNT()

SUM()

AVERAGE()

COUNTIF(range, criterion)
• Criterion
o String to match, e.g. "United Kingdom"
o Number to match, e.g. 150
o String containing a number and comparison operator, e.g. ">9"

Multiple criteria: COUNTIFS(criteria_range1, criterion1, [criteria_range2, criterion2, …])

SUMIF(range, criterion, [sum_range])

SUMIFS(sum_range, criteria_range1, criterion1, [criteria_range2, criterion2, …])

AVERAGEIF(criteria_range, criterion, [average_range])

AVERAGEIFS(average_range, criteria_range1, criterion1, [criteria_range2, criterion2, …])
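
The logic behind conditional aggregations can be sketched in Python. This is an analogue rather than spreadsheet syntax, and the two columns are illustrative assumptions.

```python
# Two illustrative columns of equal length.
countries = ["United Kingdom", "France", "United Kingdom", "Spain"]
totals = [150, 8, 12, 40]

# Like COUNTIF(countries, "United Kingdom")
uk_orders = sum(1 for c in countries if c == "United Kingdom")

# Like SUMIF(countries, "United Kingdom", totals)
uk_revenue = sum(t for c, t in zip(countries, totals) if c == "United Kingdom")

# Like COUNTIF(totals, ">9")
over_nine = sum(1 for t in totals if t > 9)

# Like AVERAGEIFS(totals, countries, "United Kingdom", totals, ">9")
matching = [t for c, t in zip(countries, totals) if c == "United Kingdom" and t > 9]
uk_average = sum(matching) / len(matching)
print(uk_orders, uk_revenue, over_nine, uk_average)
```

In each case the criterion is tested row by row on the criteria range, and the aggregate is computed only over the rows that pass.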

VLOOKUP

Vertical lookup: search for information in a table based on search keys

Lookup table & main table

VLOOKUP(search_key, range, index, [is_sorted])

• search_key: the value in the main table to search for in the lookup table
• range: the range containing the lookup table (usually absolute
references)
• index: the column index in the lookup table to return
• is_sorted: TRUE/FALSE to indicate whether the lookup table is sorted
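
A minimal Python sketch of what an exact-match lookup does, corresponding to calling VLOOKUP with is_sorted set to FALSE. The `vlookup` function and the lookup table below are hypothetical illustrations, not the spreadsheet implementation.

```python
def vlookup(search_key, table, index):
    """Hypothetical exact-match lookup, like VLOOKUP(..., FALSE)."""
    for row in table:
        if row[0] == search_key:       # keys live in the first column
            return row[index - 1]      # index is 1-based, as in the spreadsheet
    raise KeyError(search_key)         # the spreadsheet would show #N/A

# Lookup table: movie title in column 1, studio in column 2.
lookup_table = [("Star Wars", "Lucasfilm"), ("Frozen", "Walt Disney")]

studio = vlookup("Frozen", lookup_table, 2)
print(studio)
```

The search key is always compared against the first column of the lookup table, and index selects which column of the matching row is returned.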
