
Datadeling


What is schema?

A multidimensional schema is designed specifically to model data warehouse systems. These schemas address the unique needs of very large databases built for analytical purposes (OLAP).

Types of Data Warehouse Schema:

Following are the 3 chief types of multidimensional schema: Star Schema, Snowflake Schema, and Galaxy Schema.

What is a Star Schema?


A Star Schema in a data warehouse is a schema in which the center of the star has one fact table with a number of associated dimension tables.

In the following Star Schema example, the fact table is at the center and contains keys to every dimension table, such as Dealer_ID, Model_ID, Date_ID, Product_ID, and Branch_ID, along with other attributes such as units sold and revenue.

Characteristics of Star Schema:

• Every dimension in a star schema is represented by exactly one dimension table.
• Each dimension table contains a set of attributes.
• Each dimension table is joined to the fact table using a foreign key.
• The dimension tables are not joined to each other.
• The fact table contains keys and measures.
• The dimension tables are not normalized. For instance, in the above figure, Country_ID does not have a Country lookup table as an OLTP design would have.
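
The characteristics above can be sketched with Python's built-in sqlite3 module. The table names, column names, and sample rows below (fact_sales, dim_dealer, dim_date) are illustrative assumptions loosely based on the keys mentioned in the example, not the exact schema from the figure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Denormalized dimension tables: one table per dimension.
CREATE TABLE dim_dealer (
    dealer_id   INTEGER PRIMARY KEY,
    dealer_name TEXT,
    country     TEXT            -- kept inline, no separate country lookup table
);
CREATE TABLE dim_date (
    date_id INTEGER PRIMARY KEY,
    year    INTEGER,
    month   INTEGER
);
-- Fact table: a foreign key to each dimension plus the measures.
CREATE TABLE fact_sales (
    dealer_id  INTEGER REFERENCES dim_dealer(dealer_id),
    date_id    INTEGER REFERENCES dim_date(date_id),
    units_sold INTEGER,
    revenue    REAL
);
INSERT INTO dim_dealer VALUES (1, 'North Motors', 'USA'), (2, 'Euro Cars', 'Germany');
INSERT INTO dim_date   VALUES (10, 2023, 1), (11, 2023, 2);
INSERT INTO fact_sales VALUES (1, 10, 5, 100.0), (1, 11, 3, 60.0), (2, 10, 4, 90.0);
""")

# A single join from the fact table reaches any attribute of a dimension.
rows = conn.execute("""
    SELECT d.country, SUM(f.units_sold), SUM(f.revenue)
    FROM fact_sales AS f
    JOIN dim_dealer AS d ON d.dealer_id = f.dealer_id
    GROUP BY d.country
    ORDER BY d.country
""").fetchall()
print(rows)
```

Note that the dimension tables never reference each other; every relationship goes through the fact table.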

What is a Snowflake Schema?


Snowflake Schema in data warehouse is a logical arrangement of tables in a multidimensional
database such that the ER diagram resembles a snowflake shape. A Snowflake Schema is an
extension of a Star Schema, and it adds additional dimensions. The dimension tables are
normalized which splits data into additional tables.

In the following Snowflake Schema example, Country is further normalized into an individual table.

Example of Snowflake Schema

Characteristics of Snowflake Schema:

• The main benefit of the snowflake schema is that it uses less disk space.
• It is easier to implement when a dimension is added to the schema.
• Because of the multiple tables, query performance is reduced.
• The primary challenge with the snowflake schema is that it needs more maintenance effort because of the additional lookup tables.
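
As a rough illustration of the extra join that snowflaking introduces, the sketch below moves the country out of the dealer dimension into its own lookup table. All table names, column names, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The country attribute is normalized out of the dealer dimension.
CREATE TABLE dim_country (
    country_id   INTEGER PRIMARY KEY,
    country_name TEXT
);
CREATE TABLE dim_dealer (
    dealer_id   INTEGER PRIMARY KEY,
    dealer_name TEXT,
    country_id  INTEGER REFERENCES dim_country(country_id)
);
CREATE TABLE fact_sales (
    dealer_id  INTEGER REFERENCES dim_dealer(dealer_id),
    units_sold INTEGER
);
INSERT INTO dim_country VALUES (1, 'USA'), (2, 'Germany');
INSERT INTO dim_dealer  VALUES (1, 'North Motors', 1), (2, 'Euro Cars', 2), (3, 'West Autos', 1);
INSERT INTO fact_sales  VALUES (1, 5), (2, 4), (3, 2);
""")

# Reaching the country name now takes two joins instead of one.
units_by_country = conn.execute("""
    SELECT c.country_name, SUM(f.units_sold)
    FROM fact_sales  AS f
    JOIN dim_dealer  AS d ON d.dealer_id  = f.dealer_id
    JOIN dim_country AS c ON c.country_id = d.country_id
    GROUP BY c.country_name
    ORDER BY c.country_name
""").fetchall()
print(units_by_country)
```

Each country name is stored once, which saves space, but every query that needs it pays for the additional join.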

Star Schema Vs Snowflake Schema: Key Differences


Following are the key differences between Star schema and Snowflake schema:

Star Schema | Snowflake Schema
In a star schema, only a single join creates the relationship between the fact table and any dimension table. | A snowflake schema requires many joins to fetch the data.
Simple DB design. | Very complex DB design.
Denormalized data structure; queries also run faster. | Normalized data structure.

What is a Galaxy Schema?


A Galaxy Schema contains two fact tables that share dimension tables between them. It is also called a Fact Constellation Schema. The schema is viewed as a collection of stars, hence the name Galaxy Schema.

Example of Galaxy Schema

As you can see in the above example, there are two fact tables:

1. Revenue
2. Product

In a Galaxy schema, the shared dimensions are called Conformed Dimensions.
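
A minimal sketch of two fact tables sharing one dimension, using Python's sqlite3 module. The names fact_revenue, fact_product, and dim_date, along with the sample rows, are assumptions made for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- dim_date is shared by both fact tables.
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_revenue (date_id INTEGER REFERENCES dim_date(date_id), revenue REAL);
CREATE TABLE fact_product (date_id INTEGER REFERENCES dim_date(date_id), units_sold INTEGER);
INSERT INTO dim_date     VALUES (1, 2022), (2, 2023);
INSERT INTO fact_revenue VALUES (1, 100.0), (2, 150.0), (2, 50.0);
INSERT INTO fact_product VALUES (1, 10), (2, 30);
""")

# Aggregate each fact table separately, then line the results up on the shared dimension.
report = conn.execute("""
    SELECT d.year, r.total_revenue, p.total_units
    FROM dim_date AS d
    JOIN (SELECT date_id, SUM(revenue) AS total_revenue
          FROM fact_revenue GROUP BY date_id) AS r ON r.date_id = d.date_id
    JOIN (SELECT date_id, SUM(units_sold) AS total_units
          FROM fact_product GROUP BY date_id) AS p ON p.date_id = d.date_id
    ORDER BY d.year
""").fetchall()
print(report)
```

Aggregating each fact table before joining avoids multiplying rows when the two fact tables have different numbers of rows per date.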

What is Star Cluster Schema?


Snowflake schema contains fully expanded hierarchies. However, this can add complexity to the
Schema and requires extra joins. On the other hand, star schema contains fully collapsed
hierarchies, which may lead to redundancy. So, the best solution may be a balance between these
two schemas which is Star Cluster Schema design.

Example of Star Cluster Schema

Overlapping dimensions can be found as forks in hierarchies. A fork happens when an entity acts as a parent in two different dimensional hierarchies. Fork entities are then identified as classifications with one-to-many relationships.

Schema Types In Data Warehouse Modeling – Star & SnowFlake Schema

Data Warehouse Schema


In a data warehouse, a schema is used to define the way to organize the system with all the
database entities (fact tables, dimension tables) and their logical association.

Here are the different types of Schemas in DW:

1. Star Schema
2. SnowFlake Schema
3. Galaxy Schema
4. Star Cluster Schema

#1) Star Schema

This is the simplest and most effective schema in a data warehouse. A fact table in the center
surrounded by multiple dimension tables resembles a star in the Star Schema model.

Each dimension table has a one-to-many relationship with the fact table: every row in the fact table references its dimension table rows through foreign keys.

Due to the above reason, navigation among the tables in this model is easy for querying
aggregated data. An end-user can easily understand this structure. Hence all the Business
Intelligence (BI) tools greatly support the Star schema model.

While designing star schemas the dimension tables are purposefully de-normalized. They are
wide with many attributes to store the contextual data for better analysis and reporting.

Benefits Of Star Schema

• Queries use very simple joins to retrieve the data, which improves query performance.
• It is simple to retrieve data for reporting at any point in time and for any period.

Disadvantages Of Star Schema

• If the requirements change frequently, modifying and reusing the existing star schema is not recommended in the long run.
• Data redundancy is higher because the tables are not hierarchically divided.

An example of a Star Schema is given below.


Querying A Star Schema

An end-user can request a report using Business Intelligence tools. All such requests will be
processed by creating a chain of “SELECT queries” internally. The performance of these queries
will have an impact on the report execution time.

#2) SnowFlake Schema

Benefits of SnowFlake Schema:

• Data redundancy is greatly reduced by creating new dimension tables.
• Compared with the star schema, the snowflaked dimension tables use less storage space.
• The snowflaked tables are easy to update and maintain.

Disadvantages of SnowFlake Schema:

• Because the dimension tables are normalized, the ETL system has to load a larger number of tables.
• Queries may need complex joins because of the additional tables, so query performance is degraded.

An example of a SnowFlake Schema is given below.


#3) Galaxy Schema

A galaxy schema is also known as Fact Constellation Schema. In this schema, multiple fact
tables share the same dimension tables. The arrangement of fact tables and dimension tables
looks like a collection of stars in the Galaxy schema model.

The shared dimensions in this model are known as Conformed dimensions.

This type of schema is used for sophisticated requirements and for aggregated fact tables that are too complex to be supported by the Star schema or SnowFlake schema. This schema is difficult to maintain because of its complexity.

An example of Galaxy Schema is given below.


#4) Star Cluster Schema

A SnowFlake schema with many dimension tables may need more complex joins while
querying. A star schema with fewer dimension tables may have more redundancy. Hence, a star
cluster schema came into the picture by combining the features of the above two schemas.

The star schema is the base for designing a star cluster schema: a few essential dimension tables from the star schema are snowflaked, which in turn forms a more stable schema structure.

An example of a Star Cluster Schema is given below.


Which Is Better Snowflake Schema Or Star Schema?

Star and SnowFlake are the most frequently used schemas in DW. Star schema is preferred if BI
tools allow business users to easily interact with the table structures with simple queries. The
SnowFlake schema is preferred if BI tools are more complicated for the business users to interact
directly with the table structures due to more joins and complex queries.

You can go ahead with the SnowFlake schema either if you want to save some storage space or if
your DW system has optimized tools to design this schema.

Star Schema Vs Snowflake Schema

Given below are the key differences between Star schema and SnowFlake schema.

S.No | Star Schema | Snow Flake Schema
1 | Data redundancy is more. | Data redundancy is less.
2 | Storage space for dimension tables is more. | Storage space for dimension tables is comparatively less.
3 | Contains de-normalized dimension tables. | Contains normalized dimension tables.
4 | Single fact table is surrounded by multiple dimension tables. | Single fact table is surrounded by multiple hierarchies of dimension tables.
5 | Queries use direct joins between fact and dimensions to fetch the data. | Queries use complex joins between fact and dimensions to fetch the data.
6 | Query execution time is less. | Query execution time is more.
8 | Uses top down approach. | Uses bottom up approach.

=============================*==================================

1) What is data modelling?

Data modelling is the process of creating a model for the data to be stored in a database. It is a conceptual representation of data objects and the associations between different data objects.

2) Explain various types of data models

There are mainly three different types of data models:

Conceptual: The conceptual data model defines what the system should contain. This model is typically created by business stakeholders and data architects. The purpose is to organize, scope, and define business concepts and rules.

Logical: Defines how the system should be implemented regardless of the DBMS. This model is
typically created by data architects and business analysts. The purpose is to develop a technical
map of rules and data structures.

Physical: This data model describes how the system will be implemented using a specific DBMS. This model is typically created by DBAs and developers. The purpose is the actual implementation of the database.

3) Explain the fact and fact table

The fact represents quantitative data. For example, the net amount which is due. A fact table
contains numerical data as well as foreign keys from dimensional tables.

4) List out various design schema in data modelling

There are two different types of data modelling schemas: 1) Star Schema, and 2) Snowflake Schema.

5) When should you consider denormalization?

Denormalization is considered when retrieving data involves joining many tables. It is used to construct a data warehouse.

6) Explain dimension and attribute

Dimensions represent qualitative data. For example, product, class, plan, etc. A dimension table
has textual or descriptive attributes. For example, the product category and product name are two
attributes of the product dimension table.

7) What is a factless fact table?

A factless fact table is a fact table that has no fact measurements. It contains only the dimension keys.
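
A small sketch of the idea, using a hypothetical class attendance example (the tables fact_attendance, dim_student, and dim_class are assumptions, not taken from the text). Each row records only that an event happened, so analysis is done by counting rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_student (student_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_class   (class_id   INTEGER PRIMARY KEY, title TEXT);
-- Factless fact table: only dimension keys, no measure columns.
CREATE TABLE fact_attendance (
    student_id INTEGER REFERENCES dim_student(student_id),
    class_id   INTEGER REFERENCES dim_class(class_id)
);
INSERT INTO dim_student VALUES (1, 'Ana'), (2, 'Ben');
INSERT INTO dim_class   VALUES (1, 'SQL'), (2, 'ETL');
INSERT INTO fact_attendance VALUES (1, 1), (2, 1), (1, 2);
""")

# With no measure to sum, the useful aggregate is a row count.
attendance = conn.execute("""
    SELECT c.title, COUNT(*)
    FROM fact_attendance AS a
    JOIN dim_class AS c ON c.class_id = a.class_id
    GROUP BY c.title
    ORDER BY c.title
""").fetchall()
print(attendance)
```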

9) What is the difference between OLTP and OLAP?

OLTP | OLAP
OLTP is an online transactional system. | OLAP is an online analysis and data retrieving process.
It is characterized by a large number of short online transactions. | It is characterized by a large volume of data.
OLTP uses a traditional DBMS. | OLAP uses a data warehouse.
Tables in an OLTP database are normalized. | The tables in OLAP are not normalized.
Its response time is in milliseconds. | Its response time is in seconds to minutes.
OLTP is designed for real-time business operations. | OLAP is designed for the analysis of business measures by category and attributes.

10) What is a table?

A collection of rows and columns is called a table. Every column has a datatype. A table contains related data in a tabular format.

11) What is a column?

A column or field is a vertical arrangement of data that contains related information.

13) What is a composite primary key?

A composite primary key is a primary key that is made up of more than one table column.

14) What is a primary key?

A primary key is a column or group of columns that uniquely identifies each row in the table. The value of a primary key must not be null. A table can have only one primary key.

15) Explain foreign key

A foreign key is a column or group of columns used to link a parent table and a child table. The value of the foreign key column in the child table refers to the value of the primary key in the parent table.
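
The parent/child relationship can be sketched with Python's sqlite3 module. The department and employee tables are hypothetical. Note that SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,
        name    TEXT,
        dept_id INTEGER REFERENCES department(dept_id)  -- foreign key to the parent table
    )
""")
conn.execute("INSERT INTO department VALUES (1, 'Finance')")
conn.execute("INSERT INTO employee VALUES (100, 'Ana', 1)")  # parent row exists

try:
    conn.execute("INSERT INTO employee VALUES (101, 'Ben', 99)")  # no department 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

employee_count = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(rejected, employee_count)
```

The second insert is refused because its dept_id does not match any primary key value in the parent table.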

17) What is data mart?

A data mart is a condensed version of a data warehouse and is designed for use by a specific department, unit, or set of users in an organization, e.g., marketing, sales, HR, or finance.

19) What are the examples of the OLTP system?

Examples of OLTP systems are:

• Sending a text message
• Adding a book to a shopping cart
• Online airline ticket booking
• Online banking
• Order entry

20) What is a check constraint?

A check constraint is used to verify a range of values in a column.
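
For example, a check constraint can restrict an age column to a range. The sketch below uses Python's sqlite3 module; the employee table and the 18 to 65 range are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_id INTEGER PRIMARY KEY,
        age    INTEGER CHECK (age BETWEEN 18 AND 65)  -- allowed range for the column
    )
""")
conn.execute("INSERT INTO employee VALUES (1, 30)")  # inside the range, accepted

try:
    conn.execute("INSERT INTO employee VALUES (2, 12)")  # outside the range
    out_of_range_rejected = False
except sqlite3.IntegrityError:
    out_of_range_rejected = True
print(out_of_range_rejected)
```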

21) List out the types of normalization?

Types of normalization are: 1) first normal form, 2) second normal form, 3) third normal form, 4) Boyce-Codd normal form, 5) fourth normal form, and 6) fifth normal form.

22) What is forward data engineering?

Forward engineering is a technical term used to describe the process of automatically translating a logical model into a physical implementation.

23) What is PDAP?

It is a data cube that stores data as a summary. It helps the user to analyse data quickly. The data
in PDAP is stored in a way that reporting can be done with ease.

27) What is discrete and continuous data?

Discrete data is finite or defined data, e.g., gender, telephone numbers. Continuous data is data that changes in a continuous and ordered manner, e.g., age.

28) What is the time series algorithm?

A time series algorithm is a method to predict continuous values of data in a table. E.g., the past performance of an employee can be used to forecast profit or influence.

30) What is a bitmap index?

Bitmap indexes are a special type of database index that uses bitmaps (bit arrays) to answer
queries by executing bitwise operations.
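
The idea can be sketched in plain Python, using an integer as the bit array for each distinct column value. This is a toy model with made-up rows, not how any particular DBMS stores its bitmap indexes.

```python
from collections import defaultdict

rows = [
    {"id": 0, "gender": "F", "active": True},
    {"id": 1, "gender": "M", "active": True},
    {"id": 2, "gender": "F", "active": False},
    {"id": 3, "gender": "F", "active": True},
]

def build_bitmap_index(rows, column):
    """Map each distinct value to an int whose bit i is set when row i has that value."""
    index = defaultdict(int)
    for position, row in enumerate(rows):
        index[row[column]] |= 1 << position
    return index

gender_index = build_bitmap_index(rows, "gender")
active_index = build_bitmap_index(rows, "active")

# WHERE gender = 'F' AND active = TRUE becomes a single bitwise AND.
matches = gender_index["F"] & active_index[True]
matching_ids = [rows[i]["id"] for i in range(len(rows)) if (matches >> i) & 1]
print(matching_ids)
```

Bitmap indexes suit columns with few distinct values, because one bitmap is kept per value.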

31) Explain data warehousing in detail

Data warehousing is a process for collecting and managing data from varied sources. It provides
meaningful business enterprise insights. Data warehousing is typically used to connect and
analyse data from heterogeneous sources. It is the core of the BI system, which is built for data
analysis and reporting.

32) What is a junk dimension?

A junk dimension combines two or more related low-cardinality attributes into one dimension. These attributes are usually Boolean or flag values.
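
One common way to build such a dimension is to pre-generate every combination of the flag values and give each combination a key. The sketch below assumes three hypothetical flags; the fact table would then carry only the single junk_key column.

```python
from itertools import product

# Hypothetical low-cardinality flags that would otherwise clutter the fact table.
flags = {
    "is_eligible": ["Y", "N"],
    "is_active": ["Y", "N"],
    "payment_type": ["CASH", "CARD"],
}

# One row per combination of flag values, each with its own key.
junk_dimension = [
    {"junk_key": key, **dict(zip(flags, combo))}
    for key, combo in enumerate(product(*flags.values()), start=1)
]

# During loading, a fact row looks up its key from the flag values.
lookup = {tuple(row[name] for name in flags): row["junk_key"] for row in junk_dimension}
print(len(junk_dimension), lookup[("Y", "N", "CARD")])
```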

35) What is database cardinality?


Cardinality is a numerical attribute of the relationship between two entities or entity sets.

36) What are the different types of cardinal relationships?

Different types of key cardinal relationships are:

• One-to-One Relationships
• One-to-Many Relationships
• Many-to-One Relationships
• Many-to-Many Relationships

38) What is data mining?

Data mining is a multi-disciplinary skill that uses machine learning, statistics, AI, and database
technology. It is all about discovering unsuspected / previously unknown relationships amongst
the data.

42) Explain relational data modelling

Relational data modelling is the representation of objects in a relational database, which is usually normalized.

44) What is the difference between logical data model and physical data model?

Logical data model | Physical data model
A logical data model can design the requirement of business logically. | A physical data model provides information about the target database source and its properties.
It is responsible for the actual implementation of data which is stored in the database. | A physical data model helps you to create a new database model from existing and apply the referential integrity constraint.
It contains an entity, primary key attributes, Inversion keys, alternate key, rule, business relation, definition, etc. | A physical data model contains a table, key constraints, unique key, columns, foreign key, indexes, default values, etc.

45) What are the different types of constraints?

The different types of constraints include unique, not null, foreign key, composite key, and check constraints.

50) What are the advantages of using data modelling?

The advantages of using data modelling in data warehousing are:

• It helps you to manage business data by normalizing it and defining its attributes.
• Data modelling integrates the data of various systems to reduce data redundancy.
• It enables efficient database design.
• Data modelling helps the organization's departments to function as a team.
• It makes data easier to access.

51) What are the disadvantages of using data modelling?

The disadvantages of using data modelling are:

• It has less structural independence.
• It can make the system complex.

52) What is an index?

An index is created on a column or group of columns to retrieve data faster.

53) What are the characteristics of a logical data model?

Characteristics of logical data model are:

• Describes data needs for a single project but could integrate with other logical data models based on the scope of the project.
• Designed and developed independently from the DBMS.
• Data attributes will have datatypes with exact precisions and lengths.
• Normalization is applied to the model, typically up to 3NF.

54) What are the characteristics of physical data model?

Characteristics of physical data model are:

• The physical data model describes the data needs for a single project or application. It may be integrated with other physical data models based on project scope.
• The data model contains relationships between tables that address the cardinality and nullability of the relationships.
• It is developed for a specific version of a DBMS, location, data storage, or technology to be used in the project.
• Columns should have exact datatypes, assigned lengths, and default values.
• Primary and foreign keys, views, indexes, access profiles, authorizations, etc. are defined.

55) What are the two types of data modelling techniques?

Two types of data modelling techniques are: 1) entity-relationship (E-R) Model, and 2) UML
(Unified Modelling Language).

56) What is UML?


UML (Unified Modelling Language) is a general-purpose, database development, modelling
language in the field of software engineering. The main intention is to provide a generalized way
to visualize system design.

57) Explain object-oriented database model

The object-oriented database model is a collection of objects. These objects can have associated
features as well as methods.

58) What is a network model?

It is a model built on the hierarchical model. It allows more than one relationship to link records, which means that a record can have multiple parent records. It is possible to construct sets of parent records and child records. Each record can belong to multiple sets, which enables you to model complex table relationships.

59) What is hashing?

Hashing is a technique used to compute the location of the desired data directly from a search key, instead of searching through all the index values. It calculates the direct location of the data record on disk without using an index structure.
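
A minimal in-memory sketch of the idea follows. The bucket count, the use of `zlib.crc32` as the hash function, and the record layout are all assumptions made for illustration; a real DBMS maps the hash value to disk blocks rather than Python lists.

```python
import zlib

NUM_BUCKETS = 8
buckets = [[] for _ in range(NUM_BUCKETS)]

def bucket_for(key):
    """Compute the bucket number directly from the key (crc32 is a stand-in hash function)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_BUCKETS

def put(key, record):
    buckets[bucket_for(key)].append((key, record))

def get(key):
    # Only one bucket is scanned; no index structure is traversed.
    for stored_key, record in buckets[bucket_for(key)]:
        if stored_key == key:
            return record
    return None

put("C001", {"name": "Ana"})
put("C002", {"name": "Ben"})
print(get("C001"), get("C999"))
```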

60) What are business or natural keys?

A business or natural key is a field that uniquely identifies an entity. For example, client ID, employee number, email, etc.

61) What is compound key?

When more than one field is used to represent a key, it is referred to as a compound key.

62) What is first normal form?

First normal form, or 1NF, is a property of a relation in a relational database management system. A relation is in first normal form if the domain of every attribute contains only atomic values and each attribute holds a single value from that domain.
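
As a small illustration, the sketch below takes hypothetical customer rows whose phones field holds several values and rewrites them so that each row holds exactly one phone number.

```python
# The 'phones' attribute holds a list packed into one string, which violates 1NF.
unnormalized = [
    {"customer_id": 1, "name": "Ana", "phones": "555-0100, 555-0101"},
    {"customer_id": 2, "name": "Ben", "phones": "555-0200"},
]

def to_first_normal_form(rows):
    """Emit one row per phone number so every attribute holds a single atomic value."""
    normalized = []
    for row in rows:
        for phone in row["phones"].split(","):
            normalized.append({
                "customer_id": row["customer_id"],
                "name": row["name"],
                "phone": phone.strip(),
            })
    return normalized

customer_phones = to_first_normal_form(unnormalized)
for row in customer_phones:
    print(row)
```

The result satisfies 1NF but repeats the customer name on every row; higher normal forms address that redundancy.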

63) What is the difference between primary key and foreign key?

Primary key | Foreign key
A primary key helps you to uniquely identify a record in the table. | A foreign key is a field in the table that is the primary key of another table.
A primary key never accepts null values. | A foreign key may accept multiple null values.
A primary key is often backed by a clustered index (for example, by default in SQL Server), in which case the table data is physically organized in the sequence of the clustered index. | In many DBMSs, a foreign key does not automatically create an index, clustered or non-clustered. However, you can manually create an index on the foreign key.
You can have only a single primary key in a table. | You can have multiple foreign keys in a table.

64) What are the requirements of the second normal form?

The requirements of second normal form are:

• It should be in first normal form.
• It does not contain any non-prime attribute that is functionally dependent on a proper subset of any candidate key of the relation.

65) What are the rules for third normal form?

Rules for third normal forms are:

• It should be in second normal form.
• It has no transitive functional dependencies.
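
A transitive dependency can be illustrated with a small helper that checks whether a functional dependency holds in some sample rows. The employee data and the `holds` function are hypothetical, and a dependency that holds in sample data is only a hint; whether it holds in general is a business rule.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True

employees = [
    {"emp_id": 1, "zip": "10001", "city": "New York"},
    {"emp_id": 2, "zip": "10001", "city": "New York"},
    {"emp_id": 3, "zip": "60601", "city": "Chicago"},
]

# emp_id -> zip and zip -> city both hold, so city depends on the key only transitively.
transitive = holds(employees, ["emp_id"], ["zip"]) and holds(employees, ["zip"], ["city"])

# 3NF decomposition: move the zip -> city dependency into its own table.
employee_table = [{"emp_id": r["emp_id"], "zip": r["zip"]} for r in employees]
zip_table = sorted({(r["zip"], r["city"]) for r in employees})
print(transitive, zip_table)
```

After the split, each city is stored once per zip code instead of once per employee.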

66) What is the importance of using keys?

• Keys help you to identify any row of data in a table. In a real-world application, a table could contain thousands of records.
• Keys ensure that you can uniquely identify a table record despite these challenges.
• Keys allow you to establish and identify relationships between tables.
• Keys help you to enforce identity and integrity in the relationship.

67) What is a Surrogate Key?

An artificial key that aims to uniquely identify each record is called a surrogate key. Keys of this kind are created when you do not have a natural primary key, and the system generates them so that they are unique. They do not lend any meaning to the data in the table. A surrogate key is usually an integer.
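
In SQLite, for instance, an auto-generated integer column can play this role. The sketch below uses a hypothetical dim_customer table in which customer_key is the surrogate key and email is kept as a unique natural key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, no business meaning
        email        TEXT UNIQUE,                        -- natural key from the source system
        name         TEXT
    )
""")
# The surrogate key is never supplied by the caller; the database generates it.
conn.execute("INSERT INTO dim_customer (email, name) VALUES ('ana@example.com', 'Ana')")
conn.execute("INSERT INTO dim_customer (email, name) VALUES ('ben@example.com', 'Ben')")

keys = conn.execute(
    "SELECT customer_key, email FROM dim_customer ORDER BY customer_key"
).fetchall()
print(keys)
```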

68) Explain alternate key in detail

An alternate key is a column or group of columns in a table that uniquely identifies every row in that table. A table can have multiple candidates for a primary key, but only one can be set as the primary key. All the candidate keys that are not chosen as the primary key are called alternate keys.

69) What is fourth normal form in DBMS?

Fourth normal form is a level of database normalization in which there must not be any non-trivial multivalued dependency other than on a candidate key.

70) What is a database management system?


A database management system, or DBMS, is software for storing and retrieving user data. It consists of a group of programs which manipulate the database.

71) What is the rule of fifth normal form?

A table is in 5th normal form only if it is in 4th normal form, and it cannot be decomposed into
any number of smaller tables without loss of data.

72) What is normalization?

Normalization is a database design technique that organizes tables in a manner that reduces
redundancy and dependency of data. It divides larger tables into smaller tables and links them
using relationships.

73) Explain the characteristics of a database management system

• Provides security and removes redundancy.
• Self-describing nature of the database system.
• Insulation between programs and data abstraction.
• Support of multiple views of data.
• Sharing of data and multiuser transaction processing.
• A DBMS allows entities and the relations among them to form tables.
• It follows the ACID concept (Atomicity, Consistency, Isolation, and Durability).
• A DBMS supports a multi-user environment that allows users to access and manipulate data in parallel.

74) List out popular DBMS software

Popular DBMS software includes:

• MySQL
• Microsoft Access
• Oracle
• PostgreSQL
• dBASE
• FoxPro
• SQLite
• IBM DB2
• Microsoft SQL Server

75) Explain the concept of RDBMS

A Relational Database Management System is software which is used to store data in the form of tables. In this kind of system, data is managed and stored in rows and columns, which are known as tuples and attributes respectively. RDBMS is a powerful data management system and is widely used across the world.

76) What are the advantages of a data model?

Advantages of the data model are:

• The main goal of designing a data model is to make sure that data objects offered by the functional team are represented accurately.
• The data model should be detailed enough to be used for building the physical database.
• The information in the data model can be used for defining the relationships between tables, primary and foreign keys, and stored procedures.
• A data model helps businesses to communicate within and across organizations.
• A data model helps to document data mappings in the ETL process.
• It helps to recognize the correct sources of data to populate the model.

77) What are the disadvantages of Data Model?

Disadvantages of Data model are:

• To develop a data model, one should know the physical characteristics of the stored data.
• This is a navigational system that produces complex application development and management. Thus, it requires knowledge of the biographical truth.
• Even small changes made in the structure require modification of the entire application.
• There is no set data manipulation language in the DBMS.

78) Explain various types of fact tables

Fact tables hold three types of measures:

• Additive: a measure that can be summed across any dimension.
• Non-additive: a measure that cannot be summed across any dimension.
• Semi-additive: a measure that can be summed across some dimensions but not others.
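
An account balance is a typical semi-additive measure: it can be summed across accounts on one day, but summing it across days is meaningless. The sketch below uses made-up balances to show the difference.

```python
balances = [
    {"account": "A", "day": 1, "balance": 100},
    {"account": "B", "day": 1, "balance": 50},
    {"account": "A", "day": 2, "balance": 120},
    {"account": "B", "day": 2, "balance": 40},
]

# Summing across the account dimension for a single day is meaningful.
total_on_day_2 = sum(r["balance"] for r in balances if r["day"] == 2)

# Across the time dimension, take the latest balance instead of a sum.
rows_for_a = [r for r in balances if r["account"] == "A"]
closing_balance_a = max(rows_for_a, key=lambda r: r["day"])["balance"]

# Summing over time double counts the same money.
misleading_sum_a = sum(r["balance"] for r in rows_for_a)
print(total_on_day_2, closing_balance_a, misleading_sum_a)
```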

79) What is aggregate table?

An aggregate table contains aggregated data that can be calculated using functions such as: 1) AVG, 2) MAX, 3) COUNT, 4) SUM, and 5) MIN.
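
A sketch of building such a table from a detailed fact table with Python's sqlite3 module. The fact_sales and agg_sales_monthly tables and their rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE fact_sales (sale_date TEXT, product TEXT, amount REAL);
INSERT INTO fact_sales VALUES
    ('2023-01-05', 'A', 10.0),
    ('2023-01-20', 'A', 30.0),
    ('2023-02-02', 'A', 20.0);

-- Pre-compute the monthly summary once so reports do not rescan the detail rows.
CREATE TABLE agg_sales_monthly AS
    SELECT substr(sale_date, 1, 7) AS sales_month,
           product,
           COUNT(*)    AS sale_count,
           SUM(amount) AS total_amount,
           AVG(amount) AS avg_amount,
           MIN(amount) AS min_amount,
           MAX(amount) AS max_amount
    FROM fact_sales
    GROUP BY substr(sale_date, 1, 7), product;
""")

monthly = conn.execute(
    "SELECT * FROM agg_sales_monthly ORDER BY sales_month"
).fetchall()
print(monthly)
```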

80) What is a Conformed dimension?

A conformed dimension is a dimension which is designed in a way that can be used across many
fact tables in various areas of a data warehouse.

81) List types of Hierarchies in data modelling

There are two types of Hierarchies: 1) Level based hierarchies and 2) Parent-child hierarchies.

82) What is the difference between a data mart and data warehouse?

Data mart | Data warehouse
Data mart focuses on a single subject area of business. | Data warehouse focuses on multiple areas of business.
It is used to make tactical decisions for business growth. | It helps business owners to take a strategic decision.
Data mart follows the bottom-up model. | Data warehouse follows a top-down model.
Data source comes from one data source. | Data source comes from more than one heterogeneous data sources.

83) What is XMLA?

XMLA (XML for Analysis) is considered the standard for accessing data in Online Analytical Processing (OLAP) systems.

84) Explain junk dimension

A junk dimension helps to store data that does not fit properly into the rest of the schema, such as miscellaneous flags and indicators.

85) Explain chained data replication

When a secondary node selects its sync target using ping time, or when the closest node is itself a secondary, the situation is called chained data replication.

86) Explain Virtual Data Warehousing

A virtual data warehouse gives a collective view of the completed data. A virtual data warehouse
does not have historical data. It is considered as a logical data model having metadata.

87) Explain snapshot of data warehouse

A snapshot is a complete view of the data at the time the data extraction process begins.

88) What is a bi-directional extract?

The ability of a system to extract, cleanse, and transfer data in two directions is called a bi-directional extract.

==================================*================================

Q #1) What do you understand by Data Modelling?

Answer: Data Modelling is the diagrammatic representation showing how the entities are related to each other. It is the initial step towards database design. We first create the conceptual model, then the logical model, and finally move to the physical model.

Generally, the data models are created in the data analysis and design phase of the software development life cycle.

Q #2) Explain your understanding of different data models?

Answer: There are three types of data models – conceptual, logical and physical. The level of
complexity and detail increases from conceptual to logical to a physical data model.

The conceptual model shows a very basic high level of design while the physical data model
shows a very detailed view of design.

• The Conceptual Model just portrays entity names and entity relationships. Figure 1, shown in the later part of this article, depicts a conceptual model.
• The Logical Model shows entity names, entity relationships, attributes, primary keys, and foreign keys in each entity. Figure 2, shown inside question #4 in this article, depicts a logical model.
• The Physical Data Model shows primary keys, foreign keys, table names, column names, and column data types. This view elaborates how the model will actually be implemented in the database.

Q #3) Throw some light on your experience in Data Modelling with respect to projects you
have worked on till date?

Note: This was the very first question in one of my Data Modelling interviews. So, before you
step into the interview discussion, you should have a very clear picture of how data modeling fits
into the assignments you have worked upon.

Answer: I have worked on a project for a health insurance provider company where we had interfaces built in Informatica that transform and process the data fetched from the Facets database and send out useful information to vendors.

Note: Facets is an end-to-end solution to manage all the information for the health care industry. The Facets database in my project was created with SQL Server 2012.

We had different entities that were linked together. These entities were subscriber, member,
healthcare provider, claim, bill, enrollment, group, eligibility, plan/product, commission,
capitation, etc.

Below is the conceptual Data Model showing what the project looked like at a high level.

Figure 1:

Each of the data entities has its own data attributes. For example, a data attribute of the provider is the provider identification number, a few data attributes of the membership are subscriber ID and member ID, one of the data attributes of a claim is the claim ID, each healthcare product or plan has a unique product ID, and so on.

Q #4) What are the different design schemas in Data Modelling? Explain with the
example?

Answer: There are two different kinds of schemas in data modeling:

• Star Schema
• Snowflake Schema

Now, I will be explaining each of these schemas one by one.

The simplest of the schemas is star schema where we have a fact table in the center that
references multiple dimension tables around it. All the dimension tables are connected to the fact
table. The primary key in all dimension tables acts as a foreign key in the fact table.

The ER diagram (see Figure 2) of this schema resembles the shape of a star and that is why this
schema is named as a star schema.

Figure 2:

The star schema is quite simple, flexible, and in de-normalized form.

In a snowflake schema, the level of normalization increases. The fact table here remains the same as in the star schema. However, the dimension tables are normalized. Due to several layers of dimension tables, it looks like a snowflake, and thus it is named the snowflake schema.

Figure 3:

Q #5) Which schema did you use in your project & why?

Q #6) Which schema is better – star or snowflake?


Answer: (Combined for Q #5&6): The choice of a schema always depends upon the project
requirements & scenarios.

Since the star schema is in de-normalized form, you require fewer joins for a query. The query is simple and runs faster in a star schema. The snowflake schema, being in normalized form, requires more joins than a star schema, so the query is more complex and executes more slowly.

Another significant difference between these two schemas is that snowflake schema does not
contain redundant data and thus it is easy to maintain. On the contrary, star schema has a high
level of redundancy and thus it is difficult to maintain.

Now, which one should you choose for your project? If the purpose of your project is mainly
dimension analysis, you should go for a snowflake schema. For Example, if you need to find out
“how many subscribers are tied to a particular plan which is currently active?”, go with
the snowflake model.

If the purpose of your project is mainly metrics analysis, you should go with a star
schema. For Example, if you need to find out “what is the claim amount paid to a
particular subscriber?”, go with a star schema.

In my project, we used a snowflake schema because we had to do analysis across several
dimensions and generate summary reports for the business. Another reason for using a snowflake
schema was that it consumes less space.

Q #7) What do you understand by dimension and attribute?

Answer: Dimensions represent qualitative data. For Example, plan, product, class are all
dimensions.

A dimension table contains descriptive or textual attributes. For Example, the product category
& product name are the attributes of the product dimension.

Q #8) What is a fact & a fact table?

Answer: Facts represent quantitative data.

For Example, the net amount due is a fact. A fact table contains numerical data and foreign keys
from related dimensional tables. An example of the fact table can be seen from Figure 2 shown
above.

Q #9) What are the different types of dimensions you have come across? Explain each of
them in detail with an example?

Answer: There are typically five types of dimensions.


a) Conformed dimensions: A dimension that is shared across different subject areas is called a
conformed dimension. It might be used with different fact tables in a single database or across
numerous data marts/warehouses.

For Example, if the subscriber dimension is connected to two fact tables – billing and claim
then the subscriber dimension would be treated as a conformed dimension.

b) Junk Dimension: It is a dimension table consisting of attributes that don’t have a place in
the fact table or in any of the current dimension tables. Generally, these are properties like flags
or indicators.

For Example, it can be a member eligibility flag set as ‘Y’ or ‘N’, any other indicator set as
true/false, any specific comments, etc. If we keep all such indicator attributes in the fact table,
its size increases. So, we combine all such attributes and put them in a single dimension
table called a junk dimension, which has a unique junk ID for each possible combination of the
indicator values.
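
A minimal sketch of how such a table could be generated is shown below. The indicator names (eligibility_flag, is_active) and the fact row are hypothetical and only serve the illustration.

```python
from itertools import product

# Hypothetical indicator attributes and their allowed values.
indicators = {
    "eligibility_flag": ["Y", "N"],
    "is_active": [True, False],
}

# One junk-dimension row per possible combination, keyed by a generated junk ID.
junk_dimension = {
    junk_id: dict(zip(indicators, combo))
    for junk_id, combo in enumerate(product(*indicators.values()), start=1)
}

# Reverse lookup used while loading the fact table.
junk_id_by_combo = {tuple(row.values()): jid for jid, row in junk_dimension.items()}

# The fact row stores a single junk ID instead of two separate indicator columns.
fact_row = {"claim_id": 101, "junk_id": junk_id_by_combo[("Y", False)]}
```

With two indicators of two values each, the junk dimension holds four rows, and the fact table carries one small key in place of every flag column.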

c) Role-Playing Dimension: These are the dimensions that are utilized for multiple purposes in
the same database.

For Example, a date dimension can be used for “Date of Claim”, “Billing date” or “Plan Term
date”. So, such a dimension will be called a Role-playing dimension. The primary key of the
Date dimension will be associated with multiple foreign keys in the fact table.
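
As a rough sketch, the same date dimension can be joined twice under different aliases, once per role. The names dim_date, fact_claim, claim_date_id and billing_date_id are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, calendar_date TEXT);
CREATE TABLE fact_claim (
    claim_id INTEGER PRIMARY KEY,
    claim_date_id INTEGER REFERENCES dim_date(date_id),    -- role: date of claim
    billing_date_id INTEGER REFERENCES dim_date(date_id)   -- role: billing date
);
INSERT INTO dim_date VALUES (1, '2024-01-05'), (2, '2024-01-20');
INSERT INTO fact_claim VALUES (500, 1, 2);
""")

# One physical dimension table, two aliases: cd plays "claim date", bd plays "billing date".
rows = conn.execute("""
SELECT f.claim_id, cd.calendar_date, bd.calendar_date
FROM fact_claim f
JOIN dim_date cd ON cd.date_id = f.claim_date_id
JOIN dim_date bd ON bd.date_id = f.billing_date_id
""").fetchall()
```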

d) Slowly Changing Dimension (SCD): These are the most important amongst all the dimensions.
These are the dimensions where attribute values vary with time. Below are the various types of
SCDs:

 Type-0: These are the dimensions where attribute value remains steady with time. For
Example, Subscriber’s DOB is a type-0 SCD because it will always remain the same
irrespective of the time.
 Type-1: These are the dimensions where the previous value of the attribute is replaced by
the current value. No history is maintained in a Type-1 dimension. For
Example, Subscriber’s address (where the business needs to keep only the current
address of the subscriber) can be a Type-1 dimension.
 Type-2: These are the dimensions where unlimited history is preserved. For Example,
Subscriber’s address (where the business needs to keep a record of all the previous
addresses of the subscriber). In this case, multiple rows are inserted in the table for a
subscriber, one per address. One or more columns, for example ‘Start date’ and ‘End date’,
identify the current address: the row whose ‘End date’ is blank holds the subscriber’s
current address, and all other rows hold previous addresses of the subscriber.
 Type-3: These are the type of dimensions where limited history is preserved. And we use
an additional column to maintain the history. For Example, Subscriber’s address (where
the business needs to keep a record of the current & just one previous address). In this
case, we can split the ‘address’ column into two columns, ‘current address’
and ‘previous address’. So, instead of having multiple rows, we will have just one
row showing the current as well as the previous address of the subscriber.
 Type-4: In this type of dimension, the historical data is preserved in a separate table. The
main dimension table holds only the current data. For Example, the main dimension
table will have only one row per subscriber holding its current address. All other previous
addresses of the subscriber will be kept in the separate history table. This type of
dimension is hardly ever used.
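
The difference between Type-1 and Type-2 handling can be sketched in a few lines. The helper names apply_type1 and apply_type2, the row layout and the sample addresses are assumptions made for this illustration, not a definitive implementation.

```python
from datetime import date


def apply_type1(row, new_address):
    """Type-1: overwrite the value in place, no history is kept."""
    row["address"] = new_address
    return row


def apply_type2(rows, subscriber_id, new_address, change_date):
    """Type-2: close the current row and append a new current row."""
    for row in rows:
        if row["subscriber_id"] == subscriber_id and row["end_date"] is None:
            row["end_date"] = change_date
    rows.append({"subscriber_id": subscriber_id, "address": new_address,
                 "start_date": change_date, "end_date": None})
    return rows


history = [{"subscriber_id": 1, "address": "12 Old Street",
            "start_date": date(2020, 1, 1), "end_date": None}]
apply_type2(history, 1, "34 New Avenue", date(2023, 6, 1))

# The current address is the row whose end date is still blank (None).
current = [r for r in history if r["end_date"] is None]
```

After the Type-2 change the table holds two rows for the subscriber, and only the newest one has a blank end date.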

e) Degenerate Dimension: A degenerate dimension is a dimension attribute that is not a fact but
is present in the fact table as a key. It does not have its own dimension table. We can also
think of it as a single-attribute dimension.

But, instead of keeping it separately in a dimension table and putting an additional join, we put
this attribute in the fact table directly as a key. Since it does not have its own dimension table, it
can never act as a foreign key in the fact table.

Q #10) What is a factless fact table, and why do we use it?

Answer: Factless fact table is a fact table that contains no fact measure in it. It has only the
dimension keys in it.

At times, certain situations may arise in the business where you need to have a factless fact table.

For Example, suppose you are maintaining an employee attendance record system, you can
have a factless fact table having three keys.

Employee_ID
Department_ID
Time_ID

You can see that the above table does not contain any measure. Now, if you want to answer the
question below, you can do so easily using this single factless fact table rather than having
two separate fact tables:

“How many employees of a particular department were present on a particular day?”

So, the factless fact table offers flexibility to the design.
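
A minimal sketch of that attendance question, assuming a hypothetical fact_attendance table that holds only the three keys:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Factless fact table: only dimension keys, no measure column.
CREATE TABLE fact_attendance (employee_id INTEGER, department_id INTEGER, time_id INTEGER);
INSERT INTO fact_attendance VALUES
    (1, 10, 20240101), (2, 10, 20240101), (3, 20, 20240101), (1, 10, 20240102);
""")

# "How many employees of department 10 were present on 2024-01-01?"
present_count = conn.execute(
    "SELECT COUNT(DISTINCT employee_id) FROM fact_attendance "
    "WHERE department_id = ? AND time_id = ?",
    (10, 20240101),
).fetchone()[0]
```

The answer comes from counting rows, since the existence of a row is itself the recorded event.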

Q #11) Distinguish between OLTP and OLAP?

Answer: OLTP stands for Online Transaction Processing & OLAP stands for Online Analytical
Processing. An OLTP system maintains the transactional data of the business & is generally
highly normalized. On the contrary, an OLAP system is for analysis and reporting purposes & is
generally in de-normalized form.

This difference between OLTP and OLAP also guides the choice of schema design. If your system is
OLTP, you should go with a normalized (typically 3NF) design, and if your system is OLAP, you
should go with a dimensional design such as a star or snowflake schema.

Q #12) What do you understand by data mart?

Answer: Data marts are, for the most part, intended for a single line of business. They are
designed for individual departments.

For Example, I used to work for a health insurance provider company that had different
departments in it like Finance, Reporting, Sales and so forth.

We had a data warehouse that held the information pertaining to all these departments,
and we had a few data marts built on top of this data warehouse. These data marts were
specific to each department. In simple words, you can say that a data mart is a subset of a data
warehouse.

Q #13) What are the different types of measures?

Answer: We have three types of measures, namely

 Non-additive measures
 Semi-additive measures
 Additive measures

Non-additive measures are the ones on top of which no aggregation function can be applied. For
Example, a ratio or a percentage column; a flag or an indicator column present in fact table
holding values like Y/N, etc. is a non-additive measure.

Semi-additive measures are the ones on top of which some (but not all) aggregation functions
can be applied. For Example, fee rate or account balance: a balance can be summed across
accounts but not across time periods.

Additive measures are the ones on top of which all aggregation functions can be applied. For
Example, units purchased.
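
The three kinds of measures can be illustrated with a few hypothetical daily snapshot rows. The row layout and values are invented for the example, and the sketch follows the common reading that a semi-additive measure can be summed across some dimensions (accounts) but not across time.

```python
# Hypothetical daily snapshot rows: (day, account, balance, units_purchased, unit_price)
rows = [
    ("2024-01-01", "A", 100.0, 2, 10.0),
    ("2024-01-01", "B", 50.0, 1, 20.0),
    ("2024-01-02", "A", 120.0, 3, 10.0),
    ("2024-01-02", "B", 50.0, 1, 20.0),
]

# Additive: units purchased can be summed across every dimension.
total_units = sum(r[3] for r in rows)

# Semi-additive: balances can be summed across accounts for a single day...
balance_on_day2 = sum(r[2] for r in rows if r[0] == "2024-01-02")
# ...but summing them across days counts the same money twice.
naive_sum_over_time = sum(r[2] for r in rows)

# Non-additive: an average price is a ratio, so it is recomputed from
# additive parts rather than summed or averaged row by row.
average_price = sum(r[3] * r[4] for r in rows) / total_units
```

Here naive_sum_over_time comes out larger than any balance the two accounts ever held together, which is why a snapshot balance is usually reported per period instead of being summed over time.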

Q #14) What is a Surrogate key? How is it different from a primary key?

Answer: A surrogate key is a unique identifier or a system-generated sequence number that
can act as a primary key. Unlike a natural primary key, it is not picked up from the existing
application data fields and carries no business meaning.
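
A small sketch with SQLite, assuming a hypothetical dim_subscriber table in which member_number is the natural key coming from the source application:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_subscriber (
    subscriber_key INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key, system generated
    member_number TEXT NOT NULL UNIQUE,                -- natural key from the source system
    full_name TEXT
);
""")

# The loader never supplies subscriber_key; the database generates it.
conn.execute("INSERT INTO dim_subscriber (member_number, full_name) VALUES ('M-1001', 'Asha Rao')")
conn.execute("INSERT INTO dim_subscriber (member_number, full_name) VALUES ('M-1002', 'Lars Berg')")

rows = conn.execute(
    "SELECT subscriber_key, member_number FROM dim_subscriber ORDER BY subscriber_key"
).fetchall()
```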

Q #15) Is this true that all databases should be in 3NF?

Answer: It is not mandatory for a database to be in 3NF. However, if your purpose is the easy
maintenance of data and less redundancy, then you should go with a normalized (3NF) database; a
de-normalized database is usually chosen when fast read access for reporting matters more.

Q #16) Have you ever come across the scenario of recursive relationships? If yes, how did
you handle it?

Answer: A recursive relationship occurs in the case where an entity is related to itself. Yes, I
have come across such a scenario.

Talking about the health care domain, it is possible for a health care provider (say, a doctor)
to be a patient of another health care provider: if the doctor falls ill and needs
surgery, he or she will have to visit some other doctor for the surgical treatment.

So, in this case, the health care provider entity is related to itself. A foreign key to the health
insurance provider’s number will have to be present in each member’s (patient) record.
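
One way to sketch such a recursive relationship is a self-referencing foreign key. The provider table and the treated_by_provider_id column are hypothetical names chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE provider (
    provider_id INTEGER PRIMARY KEY,
    name TEXT,
    treated_by_provider_id INTEGER REFERENCES provider(provider_id)  -- points back to the same table
);
INSERT INTO provider VALUES (1, 'Dr. Mehta', NULL), (2, 'Dr. Olsen', 1);
""")

# Self-join: the same table plays both the patient role and the doctor role.
rows = conn.execute("""
SELECT patient.name, doctor.name
FROM provider AS patient
JOIN provider AS doctor ON doctor.provider_id = patient.treated_by_provider_id
""").fetchall()
```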

Q #17) List out a few common mistakes encountered during Data Modelling?

Answer: A few common mistakes encountered during Data Modelling are:

 Building massive data models: Large data models are likely to have more design faults.
Try to restrict your data model to not more than 200 tables.
 Lack of purpose: If you do not know what your business solution is intended for,
you might come up with an incorrect data model. So having clarity on the business
purpose is very important to come up with the right data model.
 Inappropriate use of surrogate keys: Surrogate key should not be used unnecessarily.
Use surrogate key only when the natural key cannot serve the purpose of a primary key.
 Unnecessary de-normalization: Don’t denormalize until and unless you have a solid &
clear business reason to do so because de-normalization creates redundant data which is
difficult to maintain.

Q #18) What is the number of child tables that can be created out from a single parent
table?

Answer: The number of child tables that can be created out of the single parent table is equal to
the number of fields/columns in the parent table that are non-keys.

Q #19) Employee health details are hidden from his employer by the health care provider.
Which level of data hiding is this? Conceptual, physical or external?

Answer: This is the scenario of an external level of data hiding.

Q #20) What is the form of fact table & dimension table?

Answer: Generally, the fact table is in normalized form and the dimension table is in de-
normalized form.

Q #21) What particulars would you need to come up with a conceptual model in a health
care domain project?

Answer: For a health care project, the details below would be enough to design a basic
conceptual model:

 Different categories of health care plans and products.
 Type of subscription (group or individual).
 Set of health care providers.
 Claim and billing process overview.

Q #22) Tricky one: If a unique constraint is applied to a column then will it throw an error
if you try to insert two nulls into it?

Answer: No, in most databases it will not throw any error in this case because a null value is
never treated as equal to another null value. So, more than one null can be inserted in the
column without any error. SQL Server is a notable exception: its unique constraint allows only
one null.
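
This behaviour can be checked quickly in SQLite, which is used here only as a convenient example engine. The member table and email column are made up for the test, and other databases may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (member_id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# Two NULLs in a UNIQUE column: both inserts are accepted by SQLite.
conn.execute("INSERT INTO member (email) VALUES (NULL)")
conn.execute("INSERT INTO member (email) VALUES (NULL)")
null_count = conn.execute("SELECT COUNT(*) FROM member WHERE email IS NULL").fetchone()[0]

# Two identical non-null values: the second insert violates the constraint.
conn.execute("INSERT INTO member (email) VALUES ('a@example.com')")
try:
    conn.execute("INSERT INTO member (email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```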

Q #23) Can you quote an example of a sub-type and super-type entity?

Answer: Yes, let’s say we have these different entities – vehicle, car, bike, economy car, family
car, sports car.

Here, a vehicle is a super-type entity. Car and bike are its sub-type entities. Furthermore,
economy cars, sports cars, and family cars are sub-type entities of the super-type entity car.

A super-type entity is the one that is at a higher level. Sub-type entities are ones that are grouped
together on the basis of certain characteristics. For Example, all bikes are two-wheelers and all
cars are four-wheelers. Since both are vehicles, their super-type entity is ‘vehicle’.
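
The same hierarchy can be loosely mirrored with class inheritance. This is only an analogy for the entity relationship, and the wheels attribute is an assumption added for the illustration.

```python
class Vehicle:
    """Super-type: holds what every vehicle has in common."""
    wheels = None


class Car(Vehicle):
    """Sub-type of Vehicle and super-type of the car categories."""
    wheels = 4


class Bike(Vehicle):
    """Sub-type of Vehicle."""
    wheels = 2


class SportsCar(Car):
    """Sub-type of Car; inherits its four wheels."""
```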

Q #24) What is the significance of metadata?

Answer: Metadata is data about data. It tells you what kind of data is actually stored in the
system, what its purpose is and for whom it is intended.
