Datawarefaqs

1) What is the difference between a star schema and a snowflake schema, and when do we use each?

A) Star schema: A star schema is a relational database schema for representing multidimensional data. It is the simplest form of data warehouse schema and contains one or more fact tables and dimension tables. It is called a star schema because the entity-relationship diagram resembles a star, with one fact table connected to multiple dimensions. The center of the star is a large fact table that points towards the dimension tables. The advantages of a star schema are easy slicing of the data, better query performance and data that is easy to understand.

Snowflake schema: A snowflake schema is a star schema structure normalized through the use of outrigger tables, i.e. dimension table hierarchies are broken into simpler tables.

In a star schema every dimension has a primary key.
- In a star schema, a dimension table does not have any parent table, whereas in a snowflake schema a dimension table has one or more parent tables.
- In a star schema, the hierarchies for a dimension are stored in the dimension table itself, whereas in a snowflake schema the hierarchies are broken into separate tables. These hierarchies help to drill down from the topmost level to the lowermost level.

B) Star schema: the fact table is in normalized format and the dimension tables are in denormalized format. It is also known as the basic star schema.

Snowflake schema: both the dimension tables and the fact table are in normalized format. It is also known as the extended star schema. A snowflake schema needs more dimension tables and more foreign keys, and it reduces query performance, but it normalizes the records. Depending on the requirement, we can choose either schema.

C) Both schemas are generally used in data warehousing. A star schema, as the name indicates, resembles a star: there is only one fact table, which is associated with numerous dimension tables (with the help of foreign keys). This schema gives a highly denormalized view of the data. In a snowflake schema the dimension tables are further normalized into different tables (there is still a single fact table), so the data is stored in a more normalized form. The choice depends on the scenario and on how much data is in the data warehouse; generally the star schema is preferred.
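The difference can be sketched with a small example. This is only an illustration using Python's built-in sqlite3 module; the table and column names (sales_fact, star_product_dim, snow_product_dim, category_dim) and the sample rows are made up for the sketch and do not come from the answers above. The same question, sales by category, needs one join in the star layout and two joins in the snowflake layout.

```python
import sqlite3


def build_schemas():
    """Create one fact table plus a star and a snowflake version of the product dimension."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Star: the category hierarchy is stored inside the dimension itself.
        CREATE TABLE star_product_dim (
            product_key   INTEGER PRIMARY KEY,
            product_name  TEXT,
            category_name TEXT
        );
        -- Snowflake: the hierarchy is moved to a parent (outrigger) table.
        CREATE TABLE category_dim (
            category_key  INTEGER PRIMARY KEY,
            category_name TEXT
        );
        CREATE TABLE snow_product_dim (
            product_key   INTEGER PRIMARY KEY,
            product_name  TEXT,
            category_key  INTEGER REFERENCES category_dim(category_key)
        );
        CREATE TABLE sales_fact (
            product_key INTEGER,
            amount      REAL
        );
        INSERT INTO star_product_dim VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
        INSERT INTO category_dim VALUES (10, 'Stationery'), (20, 'Home');
        INSERT INTO snow_product_dim VALUES (1, 'Pen', 10), (2, 'Lamp', 20);
        INSERT INTO sales_fact VALUES (1, 5.0), (1, 7.0), (2, 30.0);
    """)
    return conn


# Star schema: a single join from the fact table to the dimension.
STAR_QUERY = """
    SELECT d.category_name, SUM(f.amount)
    FROM sales_fact f
    JOIN star_product_dim d ON d.product_key = f.product_key
    GROUP BY d.category_name
    ORDER BY d.category_name
"""

# Snowflake schema: an extra join to reach the parent table.
SNOWFLAKE_QUERY = """
    SELECT c.category_name, SUM(f.amount)
    FROM sales_fact f
    JOIN snow_product_dim p ON p.product_key = f.product_key
    JOIN category_dim c ON c.category_key = p.category_key
    GROUP BY c.category_name
    ORDER BY c.category_name
"""


def sales_by_category(conn, query):
    """Run one of the two queries and return (category_name, total_amount) rows."""
    return conn.execute(query).fetchall()
```

Both queries return the same totals. The star version repeats the category name on every product row, while the snowflake version stores it once and pays for that with the extra join.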

2) Summarize the difference between OLTP, ODS and data warehouse?

A) OLTP means online transaction processing. It is nothing but a database; Oracle, SQL Server and DB2 are typical OLTP databases. OLTP databases, as the name implies, handle real-time transactions, which inherently have some special requirements.

ODS stands for Operational Data Store. It is a final integration point in the ETL process: we load the data into the ODS before loading the values into the target.

Data warehouse: a data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data which is used to take management decisions.

B) ODS: this is the operational data store, which means the real-time transactional databases. In a data warehouse, we extract the data from the ODS, transform it in the staging area and load it into the target data warehouse. I think the earlier comments on the ODS are a little bit confusing.

3) What is the role of surrogate keys in a data warehouse and how will you generate them?

A) A surrogate key is a simple primary key which maps one to one with a natural compound primary key. The reason for using one is to relieve the query writer of the need to know the full compound key, and also to speed up query processing by removing the need for the RDBMS to process the full compound key when considering a join. For example, a shipment could have a natural key of ORDER + ITEM + SHIPMENT_SEQ. By giving it a unique SHIPMENT_ID, subordinate tables can access it with a single attribute rather than 3. However, it's important to create a unique index on the natural key as well.
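The answer above does not say how the keys are generated. One common approach is to let the database assign them, for example from a sequence or an auto-incrementing column. The sketch below uses Python's built-in sqlite3 module, with SQLite's AUTOINCREMENT standing in for a sequence; the table layout and the function name surrogate_key_for are assumptions made for illustration, built around the ORDER + ITEM + SHIPMENT_SEQ example.

```python
import sqlite3


def make_db():
    """Create a shipment table with a surrogate key and a unique natural key."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE shipment (
            shipment_id  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
            order_no     TEXT    NOT NULL,                   -- natural key, part 1
            item_no      TEXT    NOT NULL,                   -- natural key, part 2
            shipment_seq INTEGER NOT NULL                    -- natural key, part 3
        );
        -- The natural key still gets its own unique index.
        CREATE UNIQUE INDEX shipment_nk
            ON shipment (order_no, item_no, shipment_seq);
    """)
    return conn


def surrogate_key_for(conn, order_no, item_no, shipment_seq):
    """Return the surrogate key for a natural key, creating the row if it is new."""
    natural_key = (order_no, item_no, shipment_seq)
    row = conn.execute(
        "SELECT shipment_id FROM shipment "
        "WHERE order_no = ? AND item_no = ? AND shipment_seq = ?",
        natural_key,
    ).fetchone()
    if row is not None:
        return row[0]
    cur = conn.execute(
        "INSERT INTO shipment (order_no, item_no, shipment_seq) VALUES (?, ?, ?)",
        natural_key,
    )
    return cur.lastrowid
```

Subordinate tables then carry only shipment_id, and the unique index keeps the same natural key from being loaded twice.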

4) What is data cleaning? How is it done?

A) I can simply describe it as purifying the data. Data cleansing: the act of detecting and removing and/or correcting a database's dirty data (i.e., data that is incorrect, out-of-date, redundant, incomplete, or formatted incorrectly).

B) Data cleansing is nothing but standardizing and reformatting the data (encoding, decoding, data type conversion) before we store it in the warehouse.

5) Junk dimension and degenerate dimension?

A) A junk dimension is a collection of random transactional codes, flags and text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes.

B) A "junk" dimension is a collection of random transactional codes, flags and/or text attributes that are unrelated to any particular dimension. The junk dimension is simply a structure that provides a convenient place to store the junk attributes, whereas a degenerate dimension is data that is dimensional in nature but stored in a fact table.

C) Junk dimension: columns that we use rarely or not at all are grouped into a dimension, and this is called a junk dimension. Degenerative dimension: the columns that we use in a dimension are a degenerative dimension. For example, the emp table has empno, ename, sal, job and deptno, but we take only the columns empno and ename from the emp table and form a dimension; this is called a degenerative dimension.

D) Junk dimension: grouping random flags and text attributes in a dimension and moving them to a separate sub-dimension. Degenerate dimension: keeping the control information in the fact table. For example, consider a dimension table with fields like order number and order line number that has a 1:1 relationship with the fact table. In this case the dimension is removed and the order information is stored directly in the fact table, in order to eliminate unnecessary joins while retrieving order information.

E) A junk dimension is a convenient grouping of flags and indicators. It's helpful, but not absolutely required, if there's a positive correlation among the values. Their benefits:
- Provide a recognizable, user-intuitive location for related codes, indicators and their descriptors in a dimensional framework.
- Clean up a cluttered design that already has too many dimensions. There might be five or more indicators that could be collapsed into a single 4-byte integer surrogate key in the fact table.
- Provide a smaller, quicker point of entry for queries compared to performance from constraining directly on these attributes in the fact table. If your database supports bit-mapped indices, this potential benefit may be irrelevant, although the others are still valid.

6) LOOKUP?

A) When a table is used to check for the presence of some data before loading other data, or the same data, into another table, that table is called a lookup table.

B) When a value for a column in the target table is looked up from another table apart from the source tables, that table is called the lookup table.

C) A lookup is used when we want to get a related value from some other table based on a particular value. Suppose table A has two columns, emp_id and name, and table B has emp_id and address, and in the target table we want emp_id, name and address. We take table A as the source and table B as the lookup table; by matching on emp_id we get the result as three columns: emp_id, name, address.

D) A lookup table is nothing but a "lookup": it gives values to the referencing table (it is a reference), it is used at run time, and it saves joins and space in terms of transformations. For example, a lookup table called states provides the actual state name ("Texas") in place of TX in the output.

E) The lookup table provides detailed information about the attributes. For example, the lookup table for the quarter attribute would include a list of all the quarters available in the data warehouse, i.e. the first quarter of 2001 may be represented as "Q1 2001" or "2001 Q1".

F) If the data is not available in the source systems, then we have to get it from reference tables which are present in the database. These tables are called lookup tables. For example, while loading data from OLTP to OLAP we have only natural keys in the OLTP system and not the corresponding warehouse keys, so we take the target dimension table as the lookup: if the two natural keys match, we get the value of the warehouse key.
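The last lookup answer, swapping natural keys for warehouse keys during a load, can be sketched in a few lines of Python. This is a simplified illustration: the function names, the row shapes and the use of -1 for unmatched keys are all assumptions, not something the answers specify.

```python
def build_lookup(dimension_rows):
    """Build a natural-key -> warehouse-key map from (warehouse_key, natural_key) rows."""
    return {natural_key: warehouse_key for warehouse_key, natural_key in dimension_rows}


def resolve_keys(source_rows, lookup, unknown_key=-1):
    """Replace the natural key in each (natural_key, amount) source row.

    Rows whose natural key is missing from the lookup get `unknown_key`
    (an assumed convention for this sketch).
    """
    return [(lookup.get(natural_key, unknown_key), amount)
            for natural_key, amount in source_rows]
```

For example, with the dimension rows [(101, "C-001"), (102, "C-002")], the source row ("C-002", 50.0) is loaded as (102, 50.0), while a row for an unknown customer such as ("C-999", 5.0) is loaded as (-1, 5.0).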

7) Loading the time dimension?

A) In a data warehouse we load the time dimension manually.

B) Every data warehouse maintains a time dimension. It is kept at the most granular level at which the business runs (for example week day, day of the month and so on). Depending on the data loads, these time dimensions are updated: a weekly process is updated every week and a monthly process every month.

C) The time dimension in a DWH must be loaded manually. We load data into the time dimension using PL/SQL scripts.

D) Generally we load the time dimension using a sequential file as the source stage and one passive stage. In the Transformer stage we manually write functions such as month and year functions to load the time dimension; for the lowest level, i.e. day, we also have a function to load the time dimension.

E) Create a procedure to load data into the time dimension. The procedure needs to run only once to populate all the data. For example, the code below fills the table up to the end of 2015. You can modify the code to suit the fields in your table.

    create or replace procedure QISODS.Insert_W_DAY_D_PR as
      LastSeqID number default 0;
      loaddate  date   default to_date('12/31/1989', 'mm/dd/yyyy');
    begin
      loop
        LastSeqID := LastSeqID + 1;
        loaddate  := loaddate + 1;   -- first row loaded is 01/01/1990
        insert into QISODS.W_DAY_D values (
          LastSeqID,                                      -- surrogate key
          trunc(loaddate),                                -- calendar date
          decode(to_char(loaddate, 'Q'), '1', 1,
                 decode(to_char(loaddate, 'Q'), '2', 1, 2)),  -- half of the year (1 or 2)
          to_number(to_char(loaddate, 'MM')),             -- month number
          to_number(to_char(loaddate, 'Q')),              -- quarter number
          trunc((to_number(to_char(loaddate, 'DDD'))
                 + to_number(to_char(trunc(loaddate, 'YYYY'), 'D'))
                 + 5) / 7),                               -- week number within the year
          to_number(to_char(loaddate, 'YYYY')),           -- year
          to_number(to_char(loaddate, 'DD')),             -- day of month
          to_number(to_char(loaddate, 'D')),              -- day of week (depends on NLS_TERRITORY)
          to_number(to_char(loaddate, 'DDD')),            -- day of year
          1, 1, 1, 1, 1,                                  -- five columns filled with the constant 1
          to_number(to_char(loaddate, 'J')),              -- Julian day number
          ((to_number(to_char(loaddate, 'YYYY')) + 4713) * 12)
            + to_number(to_char(loaddate, 'MM')),         -- running month number
          ((to_number(to_char(loaddate, 'YYYY')) + 4713) * 4)
            + to_number(to_char(loaddate, 'Q')),          -- running quarter number
          to_number(to_char(loaddate, 'J')) / 7,          -- Julian day divided by 7
          to_number(to_char(loaddate, 'YYYY')) + 4713,    -- year offset by 4713
          to_char(loaddate, 'Day'),                       -- day name
          to_char(loaddate, 'Month'),                     -- month name
          decode(to_char(loaddate, 'D'),
                 '7', 'weekend', '6', 'weekend', 'weekday'),
          trunc(loaddate, 'DAY') + 1,                     -- first day of the week plus one day
          decode(last_day(loaddate), loaddate, 'y', 'n'), -- last day of month flag
          to_char(loaddate, 'YYYYMM'),
          to_char(loaddate, 'YYYY') || ' Half'
            || decode(to_char(loaddate, 'Q'), '1', 1,
                      decode(to_char(loaddate, 'Q'), '2', 1, 2)),
          to_char(loaddate, 'YYYY / MM'),
          to_char(loaddate, 'YYYY') || ' Q '
            || trunc(to_number(to_char(loaddate, 'Q'))),
          to_char(loaddate, 'YYYY') || ' Week'
            || trunc(to_number(to_char(loaddate, 'WW'))),
          to_char(loaddate, 'YYYY'));
        if loaddate = to_date('12/31/2015', 'mm/dd/yyyy') then
          exit;
        end if;
      end loop;
      commit;
    end Insert_W_DAY_D_PR;

8) Difference between snowflake and star schema. What are the situations where a snowflake schema is better?

a) Star schema and snowflake schema both serve the purpose of dimensional modeling when it comes to data warehouses. A star schema is a dimensional model with a fact table (large) and a set of dimension tables (small). The whole set-up is totally denormalized. However, in cases where the dimension tables are split into many tables, the schema is slightly inclined towards normalization (to reduce redundancy and dependency), and there comes the snowflake schema. The nature and purpose of the data that is to be fed to the model is the key to your question as to which is better.

b) A star schema contains the dimension tables mapped around one or more fact tables. It is a denormalised model. There is no need to use complicated joins, and queries return results quickly. A snowflake schema is the normalised form of a star schema. It contains deeper joins, because the tables are split into many pieces. We can easily make modifications directly in the tables. We have to use complicated joins, since we have more tables, so there will be some delay in processing the query.

c) Star schema means a centralized fact table surrounded by different dimensions. Snowflake means that, in the same star schema, the dimensions are split into further dimensions. A star schema contains highly denormalized data; a snowflake schema contains partially normalized data. A star schema dimension cannot have a parent table, but a snowflake dimension can have parent tables.

Why go for a star schema: 1) it needs fewer joins, 2) the database is simple, 3) it supports drilling-up options.

Why go for a snowflake schema: sometimes we need to provide separate dimensions out of existing dimensions, and at that time we go for a snowflake schema.

Disadvantage of snowflake: query performance is very low because there are more joins. Enjoy and all the best.

d) Both represent the dimensional model. In a star schema the dimensions are not split, whereas in a snowflake schema you can see a further split in the dimensions. For example, if you use more than one telephone at your desk, and each telephone can also be used by more than one person, then we need a further split in the table, because we need in-depth analysis.

9) What is ODS?

a) ODS stands for Online Data Storage. It is used to maintain and store the current, up-to-date information and the transactions from the source databases taken from the OLTP system. It is directly connected to the source database systems instead of to the staging area. It is further connected to the data warehouse and can moreover be treated as a part of the data warehouse database.

Edit by Admin: ODS stands for Operational Data Store, not Online Data Storage.

b) ODS stands for Operational Data Store. It is the final integration point in the ETL process before loading the data into the data warehouse.

c) ODS stands for Operational Data Store. It contains near-real-time data. In a typical data warehouse architecture, the ODS is sometimes used for analytical reporting as well as being a source for the data warehouse.

d) An Operational Data Store is a hybrid structure that has some aspects of a data warehouse and other aspects of an operational system. It contains integrated data. It can support DSS processing. It can also support high-volume transaction processing. It is placed between the warehouse and the web to support web users.

e) The form that a data warehouse takes in the operational environment. Operational data stores can be updated, provide rapid and consistent response times, and contain only a limited amount of historical data.

f) An Operational Data Store presents a consistent picture of the current data stored and managed by the transaction processing system. As data is modified in the source system, a copy of the changed data is moved into the ODS. Existing data in the ODS is updated to reflect the current status of the source system. The ODS is used to store current data coming through transactional web applications, SAP and MQ Series. Current data means data for a particular period, from one date to another.

An ODS contains 30 to 90 days of data.

g) An Operational Data Store is a collection of data in support of an organization's need for up-to-date, operational, integrated, collective information. The ODS is a purely operational construct to address the operational needs of a corporation. While loading data from staging to the ODS we do the data scrubbing and data validation.

10) What is SCD1, SCD2, SCD3?

-> SCD 1: Complete overwrite.
SCD 2: Preserve all history. Add a row.
SCD 3: Preserve some history. Add an additional column for old/new.

-> SCD Type 1: the attribute value is overwritten with the new value, obliterating the historical attribute values. For example, when the product roll-up changes for a given product, the roll-up attribute is merely updated with the current value.
SCD Type 2: a new record with the new attributes is added to the dimension table. Historical fact table rows continue to reference the old dimension key with the old roll-up attribute; going forward, the fact table rows will reference the new surrogate key with the new roll-up, thereby perfectly partitioning history.
SCD Type 3: attributes are added to the dimension table to support two simultaneous roll-ups, perhaps the current product roll-up as well as "current version minus one", or current version and original.

-> SCD: the values of some dimensions change only very rarely; these are called slowly changing dimensions. There are mainly 3 types:
1) SCD1: the old values are overwritten by the new values.
2) SCD2: additional records are created.
3) SCD3: only the previous and the most recent values are maintained.

Within SCD2 there are again 3 variants:
1) Versioning
2) Flag value
3) Effective date range

Versioning: the updated dimension rows are inserted into the target along with a version number. New dimension rows are inserted into the target along with a primary key.
Flag value: the updated dimension rows are inserted into the target with a flag of 0, and new dimension rows are inserted with a flag of 1.

-> SCD1, SCD2 and SCD3 are also called Type I, Type II and Type III dimensions.
Type I: the changed attribute overwrites the existing one. For example, if the income of a customer changes from 4000 to 5000, 4000 is simply replaced by 5000.
Type II: a new record is created for the changed attribute. For example, if the income of a customer changes from 4000 to 5000, a new record is created with income 5000 and the previous one remains as it is. This helps us to record the history of the data.
Type III: a new column is added to capture the change. For example, if the income of a customer increases from 4000 to 5000, a new column titled "new income" is added to the existing row, so that record has 2 columns, "income" and "new income".

11) What is the difference between OLTP and OLAP?

-> OLTP:
- Current data
- Short database transactions
- Online update/insert/delete
- Normalization is promoted
- High-volume transactions
- Transaction recovery is necessary

OLAP:
- Current and historical data
- Long database transactions
- Batch update/insert/delete
- Denormalization is promoted
- Low-volume transactions
- Transaction recovery is not necessary

-> OLTP is nothing but OnLine Transaction Processing, which contains normalised tables and online data with frequent inserts, updates and deletes. OLAP (Online Analytical Processing) contains the history of the OLTP data, which is non-volatile, acts as a decision support system and is used for creating forecasting reports.

-> Hey, add this point also:
Indexes: OLTP few, OLAP many.
Joins: OLTP many, OLAP few.
Deepa

-> In OLTP systems, data can be inserted, updated and deleted, and the design follows ER modeling. In OLAP systems, data is not inserted, updated or deleted online (it is loaded in batches), and the design follows dimensional modeling.
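Going back to question 10, the three SCD types can be made concrete with a minimal Python sketch built on the customer income example (4000 changing to 5000). The row layout and every name in it (customer_key, customer_id, start_date, end_date, is_current, previous_income, new_customer) are assumptions made for illustration; a real load would normally do this in the ETL tool or in SQL against the dimension table. The Type 2 function combines the flag value and effective date range variants.

```python
from datetime import date


def new_customer(customer_key, customer_id, income, start_date):
    """Build one dimension row; the layout is an assumption for this sketch."""
    return {
        "customer_key": customer_key,  # surrogate key
        "customer_id": customer_id,    # natural key
        "income": income,
        "start_date": start_date,
        "end_date": None,              # None means the row is still open
        "is_current": 1,               # flag value: 1 = current version
    }


def apply_type1(rows, customer_id, new_income):
    """Type 1: overwrite the attribute in place, so no history is kept."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["income"] = new_income


def apply_type2(rows, customer_id, new_income, change_date):
    """Type 2: expire the current row and add a new row with a new surrogate key."""
    next_key = max((row["customer_key"] for row in rows), default=0) + 1
    for row in rows:
        if row["customer_id"] == customer_id and row["is_current"] == 1:
            row["is_current"] = 0          # flag value: 0 = old version
            row["end_date"] = change_date  # close the effective date range
    rows.append(new_customer(next_key, customer_id, new_income, change_date))


def apply_type3(rows, customer_id, new_income):
    """Type 3: keep only the previous value, in an extra column."""
    for row in rows:
        if row["customer_id"] == customer_id:
            row["previous_income"] = row["income"]
            row["income"] = new_income
```

Starting from a single row for customer "C1" with income 4000, apply_type2 leaves two rows: the old one with income 4000, flagged 0 and closed on the change date, and a new current row with income 5000 and a new surrogate key. apply_type1 and apply_type3 both leave a single row.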
