B210 Sizing
After completing this module, you will be able to:
- Determine column sizing requirements based on the chosen data type.
- Determine physical table row size.
- Determine table sizing requirements via estimates and empirical evidence.
- Determine index sizing requirements via estimates and empirical evidence.
Row Format
Row Length: 2 bytes
Row ID: Row Hash (4 bytes) + Uniqueness Value (4 bytes)
Partition #: 2 bytes (only for a V2R5 table with a PPI)
Spare & Presence Bits: 1 - n bytes
VARCHAR Offsets: 2 bytes each (only if variable length columns are declared)
Fixed Length Columns
Uncompressed Columns: data for compressible columns that are neither compressed nor NULL
VARCHAR Columns: 0 - n bytes
Ref. Array Ptr.: 2 bytes
Presence Bits
Single-value compression (prior to V2R5): based on the column attributes, Teradata may need 0, 1, or 2 presence bits to represent data storage per column.
Multi-value compression (V2R5 feature): based on the column attributes, Teradata may need 0 to 9 presence bits to represent data storage per column.
Presence bits are stored in one or more whole bytes.

Column Attribute             # of Presence Bits   Description
NOT NULL                     0                    No NULLs; no values are compressed
NOT NULL COMPRESS 'value'    1                    No NULLs; compresses the value specified
Nullable                     1                    Allows NULLs; nothing is compressed
Nullable COMPRESS            1                    Allows NULLs; only NULLs are compressed
Nullable COMPRESS 'value'    2                    Allows NULLs; compresses 'value' specified
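The presence-bit counts above can be expressed as a small Python function. This sketch rests on two assumptions that the table does not state. First, a column that compresses n values needs ceil(log2(n + 1)) compress bits: one code per value plus one code for "not compressed". Second, a nullable column needs one extra bit. Together these reproduce every row of the table. They also reproduce the V2R5 range of 0 to 9 bits, taking the maximum case as a nullable column with 255 compressed values (255 is assumed here as the V2R5 limit). The function names are hypothetical.

```python
def compress_bits(n_values: int) -> int:
    """Bits to encode n compressed values plus the 'not compressed' case.

    Assumption: ceil(log2(n + 1)), which equals n.bit_length() for n >= 0.
    """
    if not 0 <= n_values <= 255:
        raise ValueError("assumed V2R5 limit: 0 to 255 compressed values")
    return n_values.bit_length()


def presence_bits(nullable: bool, n_values: int = 0) -> int:
    """Presence bits for one column: 1 if nullable, plus the compress bits."""
    return (1 if nullable else 0) + compress_bits(n_values)


# The rows of the presence-bit table
print(presence_bits(False))       # NOT NULL -> 0
print(presence_bits(False, 1))    # NOT NULL COMPRESS 'value' -> 1
print(presence_bits(True))        # Nullable, or Nullable COMPRESS (NULLs only) -> 1
print(presence_bits(True, 1))     # Nullable COMPRESS 'value' -> 2
print(presence_bits(True, 255))   # assumed V2R5 maximum -> 9
```

Under the same assumption, the People table below would need 5 bits for Country (nullable, 15 values) and 3 bits for Sex (nullable, 2 values).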
COMPRESS all fixed length columns where at least 10% of the rows participate.
Reduces storage cost by storing more logical data per unit of physical capacity. Performance is improved because there is less physical data to retrieve during scan-oriented queries.
The best candidate for compression is a fixed-width column with a small number of frequently occurring values, in a large table that is frequently accessed by full table scans (FTSs). Compression in V2R4 provided up to 25% capacity savings and scan performance increases of up to 35%. V2R5 compression is expected to yield even greater savings.
Example: compress the top 15 most populated countries, and compress the Sex field.

CREATE TABLE People
  ( Name     VARCHAR(100)
  , Address  VARCHAR(100)
  , Country  CHAR(100) COMPRESS ( 'Australia', 'Bangladesh', 'Brazil', 'China',
                                  'England', 'France', 'Germany', 'India',
                                  'Indonesia', 'Japan', 'Mexico', 'Nigeria',
                                  'Pakistan', 'Russian Federation',
                                  'United States of America' )
  , Sex      CHAR(1) COMPRESS ('F', 'M')
  );
Note: Compression does not carry forward into spool files; compressed values are uncompressed when rows are written to spool. VARCHAR columns are carried into spool as variable length data and are not expanded.
V2R5 Compression
Compression is not supported on the following data types: INTERVAL, TIME, TIMESTAMP, VARCHAR, VARBYTE, VARGRAPHIC.
Compression is supported on the following data types (size in bytes, or length limit, in parentheses): BYTEINT (1), SMALLINT (2), INTEGER (4), DATE (4), DECIMAL (1, 2, 4, or 8), FLOAT/REAL (8), DOUBLE (8), CHAR(n) (n < 256), BYTE(n) (n < 256).
Typical compression candidates: default values, binary indicators (e.g., T/F), education level, state/county/city, and status.
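Data Types
Floating point: FLOAT (REAL and DOUBLE PRECISION are synonyms for FLOAT)
Exact decimal: DECIMAL(n,m) (NUMERIC(n,m) is a synonym)
Character: CHAR(n); VARCHAR(n) or CHAR VARYING(n); LONG VARCHAR
Byte: BYTE(n), VARBYTE(n)
Graphic: GRAPHIC(n); VARGRAPHIC(n), LONG VARGRAPHIC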
BYTEINT (1 byte): -128 to +127 (non-ANSI)
SMALLINT (2 bytes): -32,768 to +32,767
INTEGER (4 bytes): -2,147,483,648 to +2,147,483,647
Each is a signed binary integer; the high-order bit is the sign.
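The integer ranges above follow from signed two's-complement storage of 1, 2, or 4 bytes. The sketch below derives those ranges and picks the narrowest type that holds a column's expected minimum and maximum values. The function names are hypothetical, and BYTEINT is included even though it is non-ANSI.

```python
# Byte sizes from the data type list above (smallest first)
INTEGER_TYPES = [("BYTEINT", 1), ("SMALLINT", 2), ("INTEGER", 4)]


def integer_range(nbytes: int) -> tuple[int, int]:
    """Signed two's-complement range for an integer of nbytes bytes."""
    bits = 8 * nbytes
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def smallest_integer_type(lo: int, hi: int) -> tuple[str, int]:
    """Return (type name, bytes) for the narrowest type covering [lo, hi]."""
    for name, nbytes in INTEGER_TYPES:
        type_min, type_max = integer_range(nbytes)
        if type_min <= lo and hi <= type_max:
            return name, nbytes
    raise ValueError("range exceeds INTEGER; consider DECIMAL")


print(integer_range(2))                  # (-32768, 32767)
print(smallest_integer_type(0, 100))     # ('BYTEINT', 1)
print(smallest_integer_type(0, 40_000))  # ('INTEGER', 4)
```

Choosing the narrowest type matters because every byte saved per column is multiplied by the table's row count.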
TIME values have the format hh:mm:ss.ssssss, and TIMESTAMP combines a date and a time. DATE, TIME, and TIMESTAMP are also SQL functions. CURRENT_DATE, CURRENT_TIME, and CURRENT_TIMESTAMP represent the current values.
DECIMAL(n,m) storage by number of digits (n):
1 to 2 digits: 1 byte
3 to 4 digits: 2 bytes
5 to 9 digits: 4 bytes
10 to 18 digits: 8 bytes
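The DECIMAL storage table can be written as a lookup function, shown below as a short sketch (the function name is hypothetical). Only the total number of digits n is used; the scale m does not change the storage size in this table.

```python
def decimal_bytes(precision: int) -> int:
    """Storage bytes for DECIMAL(n,m), based only on the number of digits n."""
    # (largest digit count in the band, bytes) pairs from the table above
    for max_digits, nbytes in ((2, 1), (4, 2), (9, 4), (18, 8)):
        if 1 <= precision <= max_digits:
            return nbytes
    raise ValueError("DECIMAL precision must be between 1 and 18 here")


print([decimal_bytes(n) for n in (2, 3, 9, 10, 18)])  # [1, 2, 4, 8, 8]
```

For example, a DECIMAL(10,2) balance column takes 8 bytes, while DECIMAL(9,2) would fit in 4. That 4-byte difference is worth checking against the real data before the table is created.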
FLOAT (8 bytes): the range is approximately 2 * 10^-307 to 2 * 10^+308, with 15 significant decimal digits of precision. Values are manipulated in IEEE floating point format, which corresponds to, but is not identical to, IBM normalized 64-bit floating point.
Layout: sign (1 bit), exponent (11 bits), fraction/mantissa (52 bits).
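The 1/11/52-bit layout above can be shown with Python's standard `struct` module, whose `d` format packs a value as an IEEE 754 binary64 number. This illustrates the IEEE format only. It is not Teradata's internal code, and the helper name is hypothetical.

```python
import struct
import sys


def float_fields(x: float) -> tuple[int, int, int]:
    """Split a 64-bit IEEE double into (sign, biased exponent, fraction)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                      # 1 bit
    exponent = (bits >> 52) & 0x7FF        # 11 bits, biased by 1023
    fraction = bits & ((1 << 52) - 1)      # 52 bits
    return sign, exponent, fraction


print(struct.calcsize(">d"))   # 8 (bytes)
print(sys.float_info.dig)      # 15 (significant decimal digits on IEEE platforms)
print(float_fields(1.0))       # (0, 1023, 0)
print(float_fields(-2.0))      # (1, 1024, 0)
```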
Each variable length column has a 2-byte column offset that identifies its location in the row.
GRAPHIC data uses 2 bytes per character. It is used for double-byte Kanji and Hiragana, and Chinese double-byte Hanzi values.
GRAPHIC(n), n = 1 - 32000: fixed length multi-byte character string; n is the length in logical characters.
VARGRAPHIC(n), n = 1 - 32000: variable length multi-byte character string; n is the length in logical characters.
Defining variable length columns requires one additional 2-byte offset that locates the end of the final variable column.
Sizing Considerations
- Compress only columns where at least 10% to 20% of the rows participate.
- COMPRESS creates smaller rows, and smaller rows are generally more efficient.
- Compress columns whose NULL values are not subject to change.
- Compression saves space but costs computational overhead.
- Adding a column that is not compressible expands all rows.
- Adding a compressible column when no spare presence bits remain expands all rows.
- Dropping a column changes the size of every row where data is present.
Row Size Calculation Form
TIMESTAMP/ZONE   ___ * 12 = ___
DECIMAL 1-2      ___ * 1  = ___
DECIMAL 3-4      ___ * 2  = ___
DECIMAL 5-9      ___ * 4  = ___
DECIMAL 10-18    ___ * 8  = ___
FLOAT            ___ * 8  = ___
Fixed            SUM(n)   = ___
Variable         SUM(a)   = ___
Other fixed-size types (for example SMALLINT, INTEGER, DATE, TIME) are entered the same way: count * bytes.
LOGICAL SIZE                                  = ___
Overhead                                      = 14
Partitioned Primary Index Overhead            = 2 (zero if no PPI)
Variable Column Offsets (___ * 2) + 2         = ___ (zero if no variable columns)
Presence bytes: (___ bits for compressible columns + ___ nullable columns) / 8 (quotient only) = ___
PHYSICAL ROW SIZE                             = ___ (for V2, round up to an even number of bytes)

SUM(n) = sum of the CHAR and GRAPHIC column bytes.
SUM(a) = sum of the average number of bytes expected for each variable column.
Using this logical row layout, the next page will size a typical row of the Employee table.
Employee table, typical row (entries from the completed form):
Fixed                                         SUM(n) = 20
Variable                                      SUM(a) = 14
LOGICAL SIZE                                  = 64
Overhead                                      = 14
Partitioned Primary Index Overhead            = 0 (no PPI)
Variable Column Offsets (1 * 2) + 2           = 4
Presence bytes: (0 bits for compressible columns + 3 nullable columns) / 8 (quotient only) = 0
PHYSICAL ROW SIZE = 64 + 14 + 0 + 4 + 0       = 82
Domain definitions for the next example:
DOMAIN NAME          DATA TYPE    MAX BYTES
Area_Code            SMALLINT     2
Call_Number          INTEGER      4
Call_Priority_Code   SMALLINT     2
Call_Status_Code     SMALLINT     2
Call_Type_Code       CHAR (CF)    2
Contact_Number       INTEGER      4
Customer_Number      INTEGER      4
Date                 DATE         4
Employee_Number      INTEGER      4
Extension            INTEGER      4
Part_Category        INTEGER      4
Phone                INTEGER      4
System_Number        INTEGER      4
Time                 TIME         6

This table will be partitioned via RANGE_N on Call_Date with monthly intervals.

Completed form:
SMALLINT          3 * 2  = 6
INTEGER          10 * 4  = 40
DATE              1 * 4  = 4
TIME              1 * 6  = 6
Fixed            SUM(n)  = 2
Variable         SUM(a)  = 0
LOGICAL SIZE                                  = 58
Overhead                                      = 14
Partitioned Primary Index Overhead            = 2
Variable Column Offsets                       = 0 (no variable columns)
Presence bytes: (0 bits for compressible columns + 9 nullable columns) / 8 (quotient only) = 1
Subtotal = 58 + 14 + 2 + 0 + 1                = 75
PHYSICAL ROW SIZE = 75 + 1 (rounded up to an even number of bytes) = 76
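The form's arithmetic can be written as a short Python sketch and checked against both worked examples. The function names and the `TYPE_BYTES` table are illustrative and are not a Teradata tool. The byte values come from this module, and DECIMAL rows are omitted for brevity. The presence-byte term follows the form's "quotient only" rule.

```python
# Bytes per column for fixed-size types used in this module's forms and tables
TYPE_BYTES = {
    "BYTEINT": 1, "SMALLINT": 2, "INTEGER": 4, "DATE": 4,
    "TIME": 6, "FLOAT": 8, "TIMESTAMP/ZONE": 12,
}
ROW_OVERHEAD = 14  # the form's fixed "Overhead" line


def logical_size(type_counts: dict[str, int], sum_n: int = 0, sum_a: int = 0) -> int:
    """Fixed-size columns (count * bytes) + CHAR/GRAPHIC bytes + average VAR bytes."""
    return sum(TYPE_BYTES[t] * count for t, count in type_counts.items()) + sum_n + sum_a


def physical_row_size(logical: int, ppi: bool = False, var_columns: int = 0,
                      compress_bits: int = 0, nullable_columns: int = 0) -> int:
    ppi_overhead = 2 if ppi else 0
    var_offsets = var_columns * 2 + 2 if var_columns else 0
    extra_presence = (compress_bits + nullable_columns) // 8  # quotient only
    size = logical + ROW_OVERHEAD + ppi_overhead + var_offsets + extra_presence
    return size + size % 2  # rows sit on even offsets: round odd sizes up


# Employee example: logical size 64, no PPI, 1 variable column, 3 nullable columns
print(physical_row_size(64, var_columns=1, nullable_columns=3))  # 82

# Partitioned example above
call = logical_size({"SMALLINT": 3, "INTEGER": 10, "DATE": 1, "TIME": 1}, sum_n=2)
print(call)                                                   # 58
print(physical_row_size(call, ppi=True, nullable_columns=9))  # 76
```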
Rows are variable in length, and variable length blocks within a table are also supported. These features make accurate space estimates for tables and their indexes more difficult.
Physical row size and table row count determine space requirements.
Table Headers
There is one table header row per AMP per table. Table headers are a separate subtable. The minimum table header block size is 512 bytes (1 sector) per AMP.
V2R4 example of a table header:
- A standard row header (length, Row ID, presence/spare bytes), followed by offsets for fields 2 through 9 and an extra offset.
- Index descriptors: 36 bytes per index, plus 20 bytes per index column.
- Compressed values are maintained within the table header.
- One field is always NULL.
- Tables with 4 or more columns, and tables with 2 columns and a NUSI, typically need a 1024-byte table header per AMP.
- The table header can grow to approximately 64 KB. The base table header covers all of its secondary index subtables.
The maximum block size is 63.5 KB, and a typical block size is 48 KB (49,152 bytes).
Formula (NO FALLBACK):
RowsPerBlock = (BlockSize - 38) / RowSize (rounded down)
Blocks = RowCount / RowsPerBlock (rounded up)
Header = NumAmps * 1024
Table size = (Blocks * BlockSize) + Header
where BlockSize = typical block size in bytes, NumAmps = number of AMPs in the system, RowCount = number of table rows expected, and RowSize = physical row size.
Note: For large tables, table headers and block overhead (38 bytes per block) add only a small amount to the table size. A quick estimate is therefore row size * number of rows, doubled for Fallback.
Formula (with Fallback):
RowsPerBlock, Blocks, and Header are computed as above.
No Fallback = (Blocks * BlockSize) + Header
Fallback = (Blocks * BlockSize) * 2 + Header

Calculation (48 KB blocks, 98-byte rows, 501,000,000 rows, 20 AMPs):
RowsPerBlock = (49,152 - 38) / 98 = 501 rows per block (rounded down)
Blocks = 501,000,000 / 501 = 1,000,000 blocks
Header = 20 * 1024 = 20,480 bytes for table headers
No Fallback = (1,000,000 * 49,152) + 20,480 = 49,152,020,480 bytes
Fallback = (1,000,000 * 49,152) * 2 + 20,480 = 98,304,020,480 bytes
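The block-based formula can be checked with a short Python sketch that reproduces the calculation above (the function name is hypothetical). Integer floor division and an integer ceiling avoid floating-point rounding at large row counts.

```python
def table_size(row_count: int, row_size: int, num_amps: int,
               block_size: int = 49_152, fallback: bool = False) -> int:
    """Estimated table bytes using the block-based sizing formula."""
    rows_per_block = (block_size - 38) // row_size   # rounded down
    blocks = -(-row_count // rows_per_block)         # rounded up (integer ceiling)
    header = num_amps * 1024
    data = blocks * block_size
    return data * 2 + header if fallback else data + header


print(table_size(501_000_000, 98, 20))                 # 49152020480
print(table_size(501_000_000, 98, 20, fallback=True))  # 98304020480
```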
USI Subtable Row
There is one index row for each base table row.
USI subtable row size = Index value size + 29 (NPPI tables) or + 31 (PPI tables), where:
31 =   4 (row header and Row Ref. Array pointer)
     + 8 (this row's Row ID)
     + 9 (spare, presence, and offset bytes)
     + 2 (optional partition # for PPI tables)
     + 8 (base table Row ID)
To estimate the amount of space needed for a USI subtable, you can use the following formulas:
For tables with NPPI: USI Subtable Size = (Row count) * (index value size + 29)
For tables with PPI: USI Subtable Size = (Row count) * (index value size + 31)
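A minimal Python sketch of the USI formulas follows (the function name is hypothetical). It adds one assumption: the index row size is rounded up to an even number of bytes. That mirrors the "+1" in this section's USI example, where rows are allocated on even offsets.

```python
def usi_subtable_size(row_count: int, index_value_size: int,
                      ppi: bool = False, fallback: bool = False) -> int:
    """Estimated USI subtable bytes: one index row per base table row."""
    row = index_value_size + (31 if ppi else 29)
    row += row % 2  # assumption: round to an even row size (even offsets)
    size = row_count * row
    return size * 2 if fallback else size


# PPI table, 1,000,000 rows, 4-byte index value, with Fallback
print(usi_subtable_size(1_000_000, 4, ppi=True, fallback=True))  # 72000000
```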
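NUSI Subtable Row
There is at least one index row per AMP for each distinct index value that is in the base table on that AMP.
Each NUSI row carries row overhead (including spare and presence bytes, and 7 or more bytes of row offsets and miscellaneous data), the index value, and one or more base table Row IDs of 8 bytes each (10 bytes for PPI tables).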
Reference Index Subtable Row
There is one reference index row for each distinct foreign key value.
Besides the row overhead and row offsets, each row holds the foreign key value, a valid flag, and a count.
To estimate the size of a Reference Index (RI) subtable, you can use the following formula.
RI Subtable Size = (Distinct count) * (index size + 25)
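The RI formula is simple enough to write as a one-line function. The sketch below (function name hypothetical) checks it against the numbers from Lab Exercise 10-3.

```python
def ri_subtable_size(distinct_count: int, index_size: int, fallback: bool = False) -> int:
    """Estimated Reference Index bytes: one row per distinct foreign key value."""
    size = distinct_count * (index_size + 25)
    return size * 2 if fallback else size


# Lab 10-3: 1,000 distinct Employee_Number values, 4-byte INTEGER key, Fallback
print(ri_subtable_size(1_000, 4, fallback=True))  # 58000
```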
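USI example (PPI table, 1,000,000 rows, 4-byte index value, with Fallback; the +1 rounds the 35-byte row up to an even size): 1,000,000 * (4 + 31 + 1) * 2 = 72,000,000 bytes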
NUSI Size = (Row Count * 10) + (#distinct values) * (Index value size + 21) * MIN(#AMPs, Rows per value)
NUSI example (with Fallback): (10,000,000 + (20,000 * (29 + 21) * 20)) * 2 = 60,000,000 bytes
Note: The +1 in the examples reflects that rows are allocated on even offsets within the data block, so an odd row size is rounded up by 1 byte.
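The NUSI formula can be sketched in Python as shown below (names hypothetical). Three points are assumptions rather than statements from the slide:
- The NUSI example's inputs appear to be 1,000,000 rows and 20 AMPs. This is inferred from its arithmetic.
- "Rows per value" is taken as the average rows per distinct value, rounded up.
- `row_id_bytes` defaults to 10 and can be set to 8. This follows the 8/10-byte Row IDs in the NUSI row layout and the Trans lab's use of 8.

```python
def nusi_subtable_size(row_count: int, distinct_values: int, index_value_size: int,
                       num_amps: int, row_id_bytes: int = 10,
                       fallback: bool = False) -> int:
    """Estimated NUSI subtable bytes: Row ID lists plus per-AMP index rows."""
    rows_per_value = -(-row_count // distinct_values)  # average, rounded up
    size = (row_count * row_id_bytes
            + distinct_values * (index_value_size + 21) * min(num_amps, rows_per_value))
    return size * 2 if fallback else size


print(nusi_subtable_size(1_000_000, 20_000, 29, 20, fallback=True))  # 60000000
print(nusi_subtable_size(15_000, 975, 4, 8, row_id_bytes=8))         # 315000
```

The MIN term captures the fact that a value cannot have index rows on more AMPs than it has base rows, or on more AMPs than the system has.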
Empirical Sizing
The best way to size a production table, including its indexes, is:
1. Load a known percentage of rows onto the system.
2. Query the DD through the view DBC.TableSize.
3. Create one index.
4. Query the DD through the view DBC.TableSize.
5. Repeat steps 3 and 4 as necessary.
6. Multiply the results to determine the production size.

Example:
Step 1: Load 1% of a table onto a system.
Step 2:
SELECT SUM(CurrentPerm)
FROM DBC.Tablesize
WHERE DatabaseName = DATABASE
AND TableName = 'Daily_Sales';
Sum(CurrentPerm) = 671,744
Step 3:
CREATE INDEX (sales_date) ON Daily_Sales;
Step 4:
SELECT SUM(CurrentPerm)
FROM DBC.Tablesize
WHERE DatabaseName = DATABASE
AND TableName = 'Daily_Sales';
Sum(CurrentPerm) = 914,944
Therefore, the index size is 914,944 - 671,744 = 243,200 bytes.

Note: The same query without the SUM aggregate returns per-AMP figures, which reveal distribution efficiency.
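Step 6 above ("multiply the results") can be sketched in Python using the Daily_Sales figures. The function name is hypothetical, and the sample percentage is kept as an integer to avoid floating-point error. One caveat is a reasoning step, not a slide statement. Table headers are a fixed cost per AMP, so scaling a small sample straight up slightly overstates the header portion.

```python
def scale_to_production(sample_bytes: int, sample_percent: int) -> int:
    """Scale a measured sample size up to 100% of the rows."""
    return sample_bytes * 100 // sample_percent


before_index = 671_744   # Sum(CurrentPerm) after loading 1% (step 2)
after_index = 914_944    # Sum(CurrentPerm) after CREATE INDEX (step 4)
index_sample = after_index - before_index

print(index_sample)                          # 243200
print(scale_to_production(before_index, 1))  # 67174400
print(scale_to_production(index_sample, 1))  # 24320000
```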
Spool Space
Maximum spool space needs vary with table size, use (type of application), and frequency of use.
Cylinders not currently used for data may be used for spool.
The user's spool amount may be changed dynamically. Avoid unnecessary copying or redistribution of entire tables to spool.
If a user exceeds their spool space limit, they will receive the following error message:
2646 No more spool space in username
If the AMP runs out of spool space (insufficient available cylinders for spool), the following message will be displayed.
Release of Spool
Intermediate Spool
Intermediate Spool results are held until the (LastUse) Explain step.
Output Spool
Output spool results are held until:
- Last spool response (BTEQ)
- CLOSE cursor (PreProcessor)
- ERQ, Terminate function (CLI)
- The session ends (job abort, timeout, logoff, etc.)
- The system is restarted
System Restart - each AMP rebuilds its Master Index from its Cylinder Indexes.
The AMPs delete all spool files by moving them to the Free Cylinder List. This costs only one I/O per spool cylinder, and saves maintaining the Master Index on disk.
If not using Fallback, multiply the amount of raw data by a factor of 2 or 3. If using Fallback, multiply the amount of raw data by a factor of 4 or 5.
Example:
User raw data: 400 GB
Estimate of Vdisk space needed: 1600 - 2000 GB

Proof:
Estimate of Vdisk space: 1600 - 2000 GB
- Spool (20%): 400 GB
- PJs/development/staging (5%): 100 GB
- DBC & Transient Journal (5%): 100 GB
= Space remaining for user data: 1000 - 1400 GB

User raw data: 400 GB
Indexes (20 - 50%): 80 - 200 GB
Fallback (for data and indexes): 960 - 1200 GB

A 4-node system, each node with 7 AMPs and each AMP with 72 GB of Vdisk space, would meet this requirement: 4 x 7 x 72 GB = 2016 GB of available space.
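The Vdisk estimate and its proof can be replayed in a small Python sketch (names hypothetical). Following the example's numbers, the 30% reserve (spool 20%, PJs/development/staging 5%, DBC and Transient Journal 5%) is taken from the upper estimate of 2000 GB. That interpretation is an assumption.

```python
def vdisk_estimate(raw_gb: int, fallback: bool = True) -> tuple[int, int]:
    """Rule of thumb: raw data x 2-3 without Fallback, x 4-5 with Fallback."""
    low, high = (4, 5) if fallback else (2, 3)
    return raw_gb * low, raw_gb * high


def space_left_for_data(estimate: tuple[int, int], reserve_percent: int = 30) -> tuple[int, int]:
    # Assumption: reserves are a percentage of the upper estimate, as in the example
    reserved = estimate[1] * reserve_percent // 100
    return estimate[0] - reserved, estimate[1] - reserved


def space_needed(raw_gb: int, index_percent: tuple[int, int] = (20, 50)) -> tuple[int, int]:
    # Raw data plus 20-50% for indexes, doubled for Fallback
    low, high = (raw_gb * (100 + p) // 100 * 2 for p in index_percent)
    return low, high


estimate = vdisk_estimate(400)
print(estimate)                        # (1600, 2000)
print(space_left_for_data(estimate))   # (1000, 1400)
print(space_needed(400))               # (960, 1200)
print(4 * 7 * 72)                      # 2016 GB: 4 nodes x 7 AMPs x 72 GB
```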
Sizing Summary
Accurate row counts and sizes are needed to get good space estimates.
Also include space for:
- Join Indexes + Fallback
- Hash Indexes + Fallback
- Permanent Journal (dual or single)
- Stored Procedure space
- Spool space
- Temporary space
Review Questions
1. Which of the following can be used with the COMPRESS option?
   a. Referencing columns - Foreign Key
   b. Referenced column - Primary Key as a USI
   c. Unique Primary Index
   d. Non-unique Secondary Index
2. Which section of a row identifies the starting location of variable length column data and is present only if variable length columns are declared?
   a. Uncompressed Columns
   b. VARCHAR Columns
   c. Presence Bits
   d. Column Offsets
3. How can you override the default that a column with a NULL value will require row space?
   a. Use the COMPRESS option on the column as part of the CREATE TABLE statement.
   b. When creating the user, set the default so that columns will default to COMPRESS when creating a table.
   c. Use the NOT NULL option on the column as part of the CREATE TABLE statement.
   d. Use the DEFAULT NULL option on the column as part of the CREATE TABLE statement.
4. What is the minimum space the table headers will take for a 6-column table on a 10 AMP system?
   a. 10240 bytes
   b. 4096 bytes
   c. 5120 bytes
   d. 1024 bytes
5. What DD view can you query to get sizing information about tables? DBC.Tablesize
Lab Exercises
Lab Exercise 10-1
Purpose: In this lab, you will compress multiple values for a column in order to reduce Perm space.
What you need: Populated AU.Accounts table and an empty table in your database
Tasks:
1. Populate your Accounts table from the AU.Accounts table using the INSERT/SELECT statement:
Lab Exercise 10-2
Purpose: In this lab, you will populate tables, determine table sizes, and create secondary indexes.
What you need: Populated AU.Trans table and an empty table in your database
Tasks:
1. Determine the size of your empty Trans table using the DBC.TableSize view (SELECT with and without the SUM aggregate function).
   Size of empty Trans = _______________
   What size are the table headers on each AMP? _______________
2. Using SHOW TABLE, the Row Size Calculation form, and the Sizing a Data Table formula, estimate the size of this table; assume 15,000 rows.
   Estimated size of Trans = _______________
3. Populate your Trans table from the AU.Trans table using the following INSERT/SELECT statement:
   INSERT INTO Trans SELECT * FROM AU.TRANS;
   Use the SELECT COUNT(*) function to verify the number of rows. ___________
Lab Exercise 10-2 (cont.)
Tasks 4. Using the DBC.TableSize view, determine the actual size of the Trans table by using the SUM aggregate function. Size of populated Trans = _______________
5. Create a USI on the Trans_Number column. Estimate the size of the USI = _______________ Actual size of the USI = _______________ (use the empirical sizing technique)
6. Create a NUSI on the Trans_ID column.
   Estimate the size of the NUSI = ______________ (Hint: use the DISTINCT function)
   Actual size of the NUSI = ______________ (use the empirical sizing technique)
Lab Exercise 10-3
Purpose: In this lab, you will determine table sizes and establish referential integrity between two tables.
What you need: Populated PD tables and empty tables in your database
Tasks:
1. Populate your Employee and Emp_Phone tables from the PD.Employee and PD.Emp_Phone tables using the following INSERT/SELECT statements:
   INSERT INTO Employee SELECT * FROM PD.Employee;
   INSERT INTO Emp_Phone SELECT * FROM PD.Emp_Phone;
2. Using the DBC.TableSize view, determine the actual size of the Emp_Phone table by using the SUM aggregate function.
   Size of populated Emp_Phone table = _______________
Lab Exercise 10-3 (cont.)
Tasks:
3. The foreign key is Employee_Number in PD.Emp_Phone, and the primary key is Employee_Number in PD.Employee. Create a References constraint on Employee_Number using the following SQL statement:
   ALTER TABLE Emp_Phone ADD CONSTRAINT fk1 FOREIGN KEY (Employee_Number) REFERENCES Employee (Employee_Number);
   (Use HELP CONSTRAINT Emp_Phone.fk1; to view constraint information.)
4. Using the DBC.TableSize view, determine the actual size of the Emp_Phone table by using the SUM aggregate function.
   Estimate the size of the Reference Index = _______________
   Size of populated Emp_Phone with references index = _______________
   Size of references index = _______________
5. Drop the Foreign Key constraint by executing the following SQL command.
Lab Exercise 10-1 (solution)
CREATE SET TABLE Accounts_MVC, FALLBACK,
     NO BEFORE JOURNAL,
     NO AFTER JOURNAL
  ( ACCOUNT_NUMBER   INTEGER NOT NULL
  , NUMBER           INTEGER
  , STREET           CHAR(25)
  , CITY             CHAR(20) COMPRESS ('Hermosa Beach', 'Culver City', 'Los Angeles', 'Santa Monica')
  , STATE            CHAR(2)
  , ZIP_CODE         INTEGER
  , BALANCE_FORWARD  DECIMAL(10,2)
  , BALANCE_CURRENT  DECIMAL(10,2) )
PRIMARY INDEX ( ACCOUNT_NUMBER );

Populate your Accounts_MVC table from the AU.Accounts table using INSERT/SELECT. Using the DBC.TableSize view, what is the amount of Perm space used?
Accounts_MVC = 1,404,828
Lab Exercise 10-2 (solution)
1. SELECT SUM(CurrentPerm) FROM DBC.Tablesize WHERE DatabaseName = DATABASE AND TableName = 'Trans';
   Sum(CurrentPerm) = 8192
   Size of empty Trans = 8192 (captured from an 8 AMP system)
   What size are the table headers on each AMP? 1024 (8192 / 8 AMPs)
2. Using SHOW TABLE, the Row Size Calculation form, and the Sizing a Data Table formula, estimate the size of this table; assume 15,000 rows.
   Each row is 24 bytes of data plus 14 bytes of overhead = 38 bytes.
   38 x 15,000 = 570,000; 570,000 x 2 (Fallback) = 1,140,000 bytes approx.
   Estimated size of Trans = 1,140,000
3. Populate your Trans table from the AU.Trans table using the following INSERT/SELECT statement:
   INSERT INTO Trans SELECT * FROM AU.TRANS;
   Count(*) = 15000
4. SELECT SUM(CurrentPerm) FROM DBC.Tablesize WHERE DatabaseName = DATABASE AND TableName = 'Trans';
   Sum(CurrentPerm) = 1153024 (estimated size was 1,140,000)
5. Create a USI on the Trans_Number column.
6. Create a NUSI on the Trans_ID column.
   SELECT COUNT(DISTINCT(Trans_ID)) FROM Trans;
   Count(Distinct(TRANS_ID)) = 975
   (15,000 x 8) + (975 x (4 + 21) x 8) = 315,000 bytes approx.
   315,000 x 2 (Fallback) = 630,000 bytes approx.
Actual size of the NUSI = 523,264 (use the empirical sizing technique)
Lab Exercise 10-3 (solution)
1. INSERT INTO Employee SELECT * FROM PD.Employee;
   INSERT INTO Emp_Phone SELECT * FROM PD.Emp_Phone;
2. Using the DBC.TableSize view, determine the actual size of the Emp_Phone table by using the SUM aggregate function.
   Sum(CurrentPerm) = 124,928
   Size of populated Emp_Phone = 124,928
3. The foreign key is Employee_Number in the Emp_Phone table, and the primary key is Employee_Number in the Employee table. Create a References constraint on Employee_Number using the following SQL statement:
   ALTER TABLE Emp_Phone ADD CONSTRAINT fk1 FOREIGN KEY (Employee_Number) REFERENCES Employee (Employee_Number);
   (Use HELP CONSTRAINT Emp_Phone.fk1; to view constraint information.)
4. SELECT COUNT(DISTINCT(Employee_Number)) AS "Count" FROM Emp_Phone;
   Count = 1000
   (4 + 25) x 1,000 = 29,000; 29,000 x 2 (Fallback) = 58,000 bytes approx.
   Estimate the size of the Reference Index = 58,000
   Sum(CurrentPerm) = 190,464
   Size of populated Emp_Phone with references index = 190,464
   Size of references index = 190,464 - 124,928 = 65,536
5. Drop the Foreign Key constraint by executing the following SQL command.