3rd Normal Form Vs Star Schema
We model a database for the same reasons that Boeing builds an aircraft model to test flight characteristics in a wind tunnel: it is simpler and cheaper than repeatedly rebuilding the plane until you get it right. A proper data model should reflect the business components and their possible relationships. The debate today is whether to model the data warehouse using 3rd Normal Form or a Star-Schema model.

3rd Normal Form can be summed up by stating that each attribute (column) must be a fact about the primary key, the whole key, and nothing but the key. Data is placed in the table where it makes the most sense, with no repeating groups, derived data, or optional columns. This allows users to ask any question at any time on all data in the enterprise.

A Star-Schema model consists of a fact table and a number of dimension tables. The fact table has a multi-part key, and each element of that key is itself a foreign key to a single dimension table. The remaining fields in the fact table are known as facts; they are numeric, continuously valued, and additive. Facts can be thought of as measurements taken at the intersection of all of the dimensions. Dimension attributes are mostly textual and are almost always the source of constraints and report breaks. This model enhances performance on known queries.

Most database modelers prefer to create a logical model in 3rd Normal Form, but most database engines run into physical limitations and must compromise the model. The four hardest things for a database to do are:
Join tables
Aggregate data
Sort data
Scan large volumes of data
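To make the star-schema description and the operations listed above more concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (dim_store, dim_product, fact_sales), the columns, and the sample rows are all hypothetical and chosen only for illustration; they do not come from any particular warehouse. The fact table has a two-part key whose elements are foreign keys to the dimension tables, and the report query joins, aggregates, and sorts in one statement.

```python
import sqlite3

# Hypothetical star schema: one fact table keyed by two dimension tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city TEXT);
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE fact_sales (
    store_id INTEGER REFERENCES dim_store(store_id),
    product_id INTEGER REFERENCES dim_product(product_id),
    units_sold INTEGER,   -- additive numeric fact
    revenue REAL,         -- additive numeric fact
    PRIMARY KEY (store_id, product_id)  -- multi-part key of foreign keys
);
""")

# Illustrative sample data only.
conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                 [(1, "Austin"), (2, "Boston")])
conn.executemany("INSERT INTO dim_product VALUES (?, ?)",
                 [(10, "Books"), (20, "Games")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
                 [(1, 10, 3, 30.0), (1, 20, 1, 50.0),
                  (2, 10, 2, 20.0), (2, 20, 4, 200.0)])


def revenue_by_city(connection):
    """Join the fact table to a dimension, aggregate a fact, and sort."""
    return connection.execute("""
        SELECT s.city, SUM(f.revenue) AS total
        FROM fact_sales AS f
        JOIN dim_store AS s ON s.store_id = f.store_id   -- join
        GROUP BY s.city                                  -- aggregate
        ORDER BY total DESC                              -- sort
    """).fetchall()


report = revenue_by_city(conn)
print(report)
```

The textual dimension attribute (city) supplies the report break, while the numeric fact (revenue) is summed. On a table this small the cost is negligible; the point of the sketch is only to show which parts of a typical report query correspond to the join, aggregate, and sort operations.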
In order to get around these system limitations, vendors will suggest models that avoid joins, use summarized data to avoid aggregation, store data in sorted order, and overuse indexes to avoid large scans. The debate exists because Teradata is the only database engine with the power and maturity to use a 3rd Normal Form physical model on databases of any size, even those approaching or exceeding a terabyte. Because of these physical limitations, other databases have had to use a Star-Schema model to enhance performance, but have given up the ability to perform ad hoc queries and data mining. 3rd Normal Form is the model that should be used for the central data warehouse. This allows users the ability to ask any question at any