SQL Interview Q&As
By
www.questpond.com
Dedication ........................................................................................................................... 9
About the author ....................................................................................................... 10
Features of the book .................................................................................................. 10
Introduction ............................................................................................................... 11
Chapter 1: Database Design ............................................................................................. 19
What is normalization and what are its benefits? ............................................... 21
What is first normal form, second normal form and third normal form? ............... 22
What is denormalization? ......................................................................................... 23
What is the difference between OLTP and OLAP system? ...................................... 24
For what kind of systems is normalization better as compared to denormalization? 25
What are Facts, Dimension and Measures tables? .................................................... 25
What are cubes? ........................................................................................................ 25
What is the difference between star schema and snowflake design? ......................... 26
Chapter 2 :- Data types ..................................................................................................... 28
How many bytes does “char” consume as compared to “nchar”? ............................ 29
What is the difference between “char” and “varchar” data types? ........................... 29
What is the use of ‘hierarchyid’ data type in SQL Server? ...................................... 29
If you wish to store financial values, which SQL Server data type is more suitable? ... 33
Chapter 3 :- SQL Queries ................................................................................................. 34
Chapter 4:- Page, Extent and Splits .................................................................................. 34
There are two physical files, MDF and LDF; what are they? ........................ 34
What are pages and extents in SQL Server? .............................................. 35
How does SQL Server actually store data internally? .............................................. 36
What is page split? .................................................................................................... 37
Chapter 3:- Indexes (Clustered and Non-Clustered) ......................................................... 40
Why do we need Indexes? ........................................................................................ 40
How does an index make your search faster? .............................................. 40
How does a balanced tree (B-tree) make your search faster? ..................... 40
To get LIVE training, new topic release video updates install Telegram app & join us using - https://tinyurl.com/QuestPondChannel
What are page splits in indexes? .............................................................. 42
So does page split affect performance? .................................................................... 44
So how do we overcome the page split performance issue? ..................................... 44
What exactly is fill factor? ........................................................................................ 44
What are “Table Scans” and “Index Scans”? ......................................... 44
(Q) What are the two types of indexes? Explain them in detail. ......................... 45
(DB) What is “FillFactor” concept in indexes? ........................................................ 48
(DB) What is the best value for “FillFactor”? .......................................................... 48
(Q) What are “Index statistics”? ............................................................................... 48
(DB) How can we see statistics of an index? ............................................................ 49
(DB) How do you reorganize your index, once you find the problem? ................... 51
(Q) What is Fragmentation?...................................................................................... 52
(DB) How can we measure Fragmentation? ............................................................. 53
(DB) How can we remove the Fragmented spaces? ................................................. 53
(Q) What are the criteria you will look in to while selecting an index? ................... 54
(DB) What is “Index Tuning Wizard”? .................................................................... 54
(Q) How do you see the SQL plan in textual format? .............................................. 62
(DB) What are Nested Loop join, Hash join and Merge join in a SQL query plan? 62
(Q) What joins are good in what situations? ............................................................. 65
(DB) What is RAID and how does it work? ............................................................. 65
Chapter 4:- Stored procedures, views, cursors, functions and triggers ........................ 66
What are triggers and what are the different kinds of triggers? ............................... 66
In what scenarios will you use an INSTEAD OF trigger versus an AFTER trigger? 67
What are inserted and deleted tables? ....................................................................... 68
What is a SQL Server view? ..................................................................................... 68
How do you create a view? ....................................................................................... 69
What are the benefits of using a view? ..................................................................... 69
Are SQL Server views updatable? ............................................................................ 70
Chapter 2:- SQL Server Data types ................................................................... 71
Chapter 2:- Constraints (Primary keys, unique keys) ....................................................... 71
Is it possible to insert a NULL value into a unique key? ............................ 72
Chapter 4:- MSBI ( SSIS , SSAS and SSRS) ................................................................... 72
Explain Business intelligence and ETL? .................................................................. 72
What is the difference between data warehouse and data mart? .............................. 73
What is the difference between OLTP and OLAP system? ...................................... 73
What is the difference between star schema and snowflake design?....................... 74
What are Facts, Dimension and Measures tables? .................................................... 74
What are Cubes? ....................................................................................................... 74
Can you explain ROLAP, MOLAP and HOLAP? ................................................... 74
Where do SSIS, SSAS and SSRS fit in? .............................................................. 76
Chapter 4:- Business intelligence (SSIS) .......................................................................... 77
What role does SSIS play in BI?............................................................................... 77
What is a package, control flow and data flow? ....................................................... 77
Can you explain architecture of SSIS (SQL Server integration services)? .............. 78
What are the different locations of storing SSIS packages? ..................................... 80
How can we execute SSIS packages? ....................................................................... 81
What are the different types of variables in SSIS? ................................................... 81
Explain difference between “For loop container” and “Foreach loop container”? .. 82
What are precedence constraints in SSIS? ................................................................ 83
What are sequence containers in SSIS and how do they benefit? ............................ 84
How can we consume web services in SSIS? ........................................................... 86
How to check quality of data using SSIS? ................................................................ 86
What kind of profile requests exist in SSIS? ........................................................... 87
What is the difference between Merge and Merge join transformation? .................. 89
If the data is unsorted, will Merge and Merge Join work? .................................... 89
How can you send a single data source output to multiple SSIS controls? ............ 89
You have millions of records in production; you want to sample some data to test an SSIS package? .......................................................... 90
What is the use of SCD (Slowly Changing Dimension)? .......................................... 90
Using SSIS how can we standardize “Indian”, “India” and “Ind” to “Ind”? ............. 90
How can we convert “string” to “int” data type in SSIS? ........................................ 90
What is the use of the “Audit” component? .................................................... 91
Chapter 5:- Business intelligence (SSAS) ........................................................................ 91
How can we apply scale-out architecture for SQL Server Analysis Services? ........ 92
How do you create cubes in SSAS (SQL Server Analysis Services)? ...................... 93
You want your cube to support localization? .......................................................... 94
What kind of tables will go in fact and dimension tables? ...................................... 94
In what kind of scenario will you use a KPI? .......................................................... 94
How can you create a pre-calculated measure in SSAS? ........................................ 94
Chapter 6:- Business intelligence (SSRS)......................................................................... 94
Can you explain SSRS architecture? ........................................................................ 94
Chapter 2:- SQL Server 2012 ........................................................................................... 95
What are the new features which are added in SQL Server 2012? ........................... 95
What are column store indexes? ............................................................................... 95
(DB) Can we have a different collation for database and table? ............................ 108
Chapter 2: SQL ............................................................................................................. 108
(Q) Revisiting basic syntax of SQL? ...................................................................... 109
(Q) What are “GRANT” and “REVOKE” statements? .......................................... 110
(Q) What is Cascade and Restrict in DROP table SQL? ........................................ 110
(Q) How to import table using “INSERT” statement? ........................................... 110
(Q) What is a DDL, DML and DCL concept in RDBMS world? .......................... 110
(Q) What are different types of joins in SQL? ........................................................ 110
(Q) What is “CROSS JOIN”? ................................................................................. 111
(Q) You want to select the first record in a given set of rows? .............................. 111
(Q) How do you sort in SQL? ................................................................................. 111
(Q) How do you select unique rows using SQL? ................................................... 112
(Q) Can you name some aggregate functions in SQL Server? ................................. 112
(Q) What is the default “SORT” order for a SQL? ................................................. 112
(Q) What is a self-join? ........................................................................................... 112
What is the difference between DELETE and TRUNCATE? ................................ 112
(Q) Select addresses which are between ‘1/1/2004’ and ‘1/4/2004’?..................... 113
(Q) What are Wildcard operators in SQL Server? .................................................. 113
(Q) What is the difference between “UNION” and “UNION ALL”? .................... 114
(Q) What are cursors and what are the situations you will use them? .................... 116
(Q) What are the steps to create a cursor? .............................................................. 116
(Q) What are the different Cursor Types? .............................................................. 117
(Q) What are “Global” and “Local” cursors? ......................................................... 119
(Q) What is “Group by” clause? ............................................................................. 119
(Q) What is ROLLUP? ........................................................................................... 120
(Q) What is CUBE? ................................................................................................ 122
(Q) What is the difference between “HAVING” and “WHERE” clause? .............. 122
(Q) What is “COMPUTE” clause in SQL?............................................................. 123
(Q) What is “WITH TIES” clause in SQL? ............................................................ 123
(Q) What does “SET ROWCOUNT” achieve? ......................................................... 125
What are Sub-Queries? .......................................................................................... 125
What are correlated queries? ................................................................................... 126
What is the difference between a correlated query and a sub-query? .................... 127
Can you explain COALESCE in SQL Server? ........................................................ 128
What is a CTE (Common Table Expression)? ........................................................ 129
Can we use a CTE multiple times in a single execution? ......................................... 129
Can you give some real-time examples where CTE is useful? ................................ 130
How to delete duplicate records that do not have a primary key? ........................... 130
Temp variables vs. temp tables ............................................................................. 132
(Q) What is “ALL” and “ANY” operator? ............................................................. 133
(Q) What is a “CASE” statement in SQL? ............................................................. 133
(Q) What does COLLATE Keyword in SQL signify? ........................................... 133
(Q) What is TRY/CATCH block in T-SQL? .......................................................... 133
(Q) What is PIVOT feature in SQL Server? ........................................................... 134
(Q) What is UNPIVOT? ......................................................................................... 135
(Q) What are RANKING functions? ...................................................................... 135
(Q) What is ROW_NUMBER()? ............................................................................ 135
(Q) What is RANK()? ............................................................................................. 135
(Q) What is DENSE_RANK()? .............................................................................. 136
(Q) What is NTILE()? ............................................................................................. 136
(DB) What is SQL injection? ................................................................................... 137
Chapter 3: .NET Integration ........................................................................................ 137
(Q) What are the steps to load .NET code in SQL SERVER 2005?.......................... 137
(Q) How can we drop an assembly from SQL SERVER?...................................... 138
(Q) Are changes made to assembly updated automatically in database? ............... 138
(Q) Why do we need to drop assembly for updating changes? .............................. 138
(Q) How to see assemblies loaded in SQL Server? ................................................ 138
(Q) I want to see which files are linked with which assemblies? ........................... 138
(Q) Do the .NET CLR and SQL SERVER run in different processes?..................... 139
(Q) Does .NET control SQL SERVER or is it vice-versa? ....................................... 139
(Q) Is SQLCLR configured by default?.................................................................. 140
(Q) How to configure CLR for SQL SERVER? ..................................................... 140
(Q) Is .NET feature loaded by default in SQL Server?........................................... 141
(Q) How does SQL Server control .NET run-time? ............................................... 141
(Q) In previous versions of .NET this was done via the COM interface “ICorRuntimeHost”; in .NET 2.0 it is done by “ICLRRuntimeHost”. ..................... 141
(Q) What is a “SAND BOX” in SQL Server 2005? ............................................... 141
(Q) What is an application domain? ....................................................................... 142
(Q) How is .NET Appdomain allocated in SQL SERVER 2005? .......................... 143
(Q) What is Syntax for creating a new assembly in SQL Server 2005? ................. 143
(Q) Do Assemblies loaded in database need actual .NET DLL? ............................ 143
(Q) You have an assembly which is dependent on other assemblies; will SQL Server load the dependent assemblies? ................................................................... 144
(Q) Does SQL Server handle unmanaged resources? ............................................. 144
(Q) What is Multi-tasking? ..................................................................................... 144
(Q) What is Multi-threading?.................................................................................. 144
(Q) What is a Thread? ............................................................................................. 144
(Q) Can we have multiple threads in one App domain? ......................................... 144
(Q) What is Non-preemptive threading?................................................................. 144
(Q) What is pre-emptive threading? ....................................................................... 145
(Q) Can you explain threading model in SQL Server? ........................................... 145
(Q) How does .NET and SQL Server thread work? ............................................... 145
(Q) How is exception in SQLCLR code handled?.................................................. 145
(Q) Are all .NET libraries allowed in SQL Server?................................................ 145
(Q) What is “Hostprotectionattribute” in SQL Server 2005? ................................. 146
(Q) How many types of permission level are there for an assembly? .................... 146
(Q) In order that an assembly gets loaded in SQL Server, what type of checks are done? ....................................................................................................................... 146
(Q) Can you name system tables for .NET assemblies? ......................................... 147
(Q) Are two versions of the same assembly allowed in SQL Server? ..................... 148
(Q) How are changes made in assembly replicated? .............................................. 148
(Q) In one of the projects the following steps were done; will it work? .................. 149
(Q) What does Alter assembly with unchecked data signify? ................................ 149
(Q) How do I drop an assembly? ............................................................................ 149
(Q) Can we create SQLCLR using .NET framework 1.0? ..................................... 150
(Q) While creating a .NET UDF, what checks should be done? .............................. 150
(Q) How do you define a function from the .NET assembly? ................................ 150
(Q) Can you compare between T-SQL and SQLCLR? .......................................... 150
(Q) With respect to .NET is SQL SERVER case sensitive?................................... 151
(Q) Does case sensitive rule apply for VB.NET? ................................................... 151
(Q) Can nested classes be accessed in T-SQL? ...................................................... 151
(Q) Can we have SQLCLR procedure input as array? ........................................... 151
(Q) Can object data type be used in SQLCLR? ...................................................... 151
(Q) How is precision handled for decimal data types in .NET? ............................. 152
(Q) How do we define INPUT and OUTPUT parameters in SQLCLR? ............... 152
(Q) Is it good to use .NET data types in SQLCLR? ............................................... 153
(Q) How to move values from SQL to .NET data types? ....................................... 153
(Q) What is SQLContext? ....................................................................................... 153
(Q) Can you explain essential steps to deploy SQLCLR? ...................................... 154
(Q) How do you create a function in SQL Server using .NET? ............................... 158
(Q) How do we create trigger using .NET? ............................................................ 158
(Q) How to create User Define Functions using .NET? ......................................... 159
(Q) How to create aggregates using .NET? ............................................................ 159
(Q) What is Asynchronous support in ADO.NET? ................................................ 159
(Q) What is MARS support in ADO.NET? ............................................................ 160
(Q) What is SQLbulkcopy object in ADO.NET? ................................................... 160
(Q) How to select range of rows using ADO.NET? ............................................... 160
(Q) If we have multiple AFTER triggers on a table, how can we define the sequence of the triggers? ......................................................................................................... 161
(Q) How can you raise custom errors from stored procedure? ............................... 161
Chapter 6: Service Broker ............................................................................................... 162
(Q) Why do we need queues? ................................................................................ 162
(Q) What is “Asynchronous” communication? ...................................................... 162
(Q) What is SQL Server Service broker? ............................................................... 163
(Q) What are the essential components of SQL Server Service broker? ................ 163
(Q) What is the main purpose of having Conversation Group?.............................. 163
(Q) How to implement Service Broker? ................................................................. 164
(Q) How do we encrypt data between Dialogs? ..................................................... 168
(Q) What is XML? .................................................................................................. 168
(Q) What is the version information in XML? ....................................................... 169
(Q) What is ROOT element in XML? .................................................................... 169
(Q) If XML does not have a closing tag, will it work?............................................... 169
(Q) Is XML case sensitive?..................................................................................... 169
(Q) What is the difference between XML and HTML? ......................................... 169
(Q) Is XML meant to replace HTML? .................................................................... 169
(Q) Can you explain why your project needed XML? ........................................... 169
(Q) What is DTD (Document Type definition)?..................................................... 170
(Q) What is well formed XML?.............................................................................. 170
(Q) What is a valid XML? ...................................................................................... 170
(Q) What is CDATA section in XML? ................................................................... 170
(Q) What is CSS? .................................................................................................... 170
(Q) What is XSL? ................................................................................................... 170
(Q) What is Element and attributes in XML? ......................................................... 170
(Q) Can we define a column as XML? ................................................................... 170
(Q) How do we specify the XML data type as typed or untyped? ......................... 171
(Q) How can we create the XSD schema? .............................................................. 171
(Q) How do I insert in to a table that has XSD schema attached to it? .................. 172
(Q) What is maximum size for XML data type? .................................................... 173
(Q) What is Xquery? ............................................................................................... 173
(Q) What are XML indexes?................................................................................... 173
(Q) What are secondary XML indexes? ................................................................. 174
(Q) What is FOR XML in SQL Server? ................................................................. 174
(Q) Can I use FOR XML to generate SCHEMA of a table and how?.................... 174
(Q) What is the OPENXML statement in SQL Server? ......................................... 174
(Q) I have a huge XML file which I want to load into the database? ........................ 174
(Q) How to call stored procedure using HTTP SOAP? .......................................... 174
(Q) What is XMLA? ............................................................................................... 175
Chapter 8: Data Warehousing / Data Mining ................................................................. 175
(Q) What is “Data Warehousing”? ......................................................................... 175
(Q) What are Data Marts? ....................................................................................... 175
(Q) What are Fact tables and Dimension Tables? .................................................. 176
(DB) What is Snowflake Schema design in a database? .......................................... 178
(DB) What is ETL process in Data warehousing? .................................................. 179
(DB) How can we do ETL process in SQL Server? ............................................... 179
(Q) What is “Data mining”? ................................................................................... 180
(Q) Compare “Data mining” and “Data Warehousing”? ........................................ 180
(Q) What is BCP? ................................................................................................... 181
(Q) How can we import and export using BCP utility? .......................................... 182
(Q) During BCP we need to change the field position or eliminate some fields; how can we achieve this?................................................................................................ 183
(Q) What is Bulk Insert? ......................................................................................... 184
(Q) What is DTS? ................................................................................................... 186
(DB) Can you brief about the Data warehouse project you worked on? ................ 187
(Q) What is an OLTP (Online Transaction Processing) System?........................... 187
(Q) What is an OLAP (On-line Analytical processing) system? ............................ 187
(Q) What is Conceptual, Logical and Physical model? .......................................... 187
(DB) What is Data purging? ................................................................................... 188
(Q) What is Analysis Services? .............................................................................. 188
(DB) What are CUBES? ......................................................................................... 188
(DB) What are the primary ways to store data in OLAP? ...................................... 188
(DB) What is META DATA information in Data warehousing projects? ............. 189
(DB) What is multi-dimensional analysis? ............................................................. 189
(DB) What is MDX? ............................................................................................... 190
(DB) How did you plan your Data warehouse project? .......................................... 191
(Q) What are different deliverables according to phases? ...................................... 193
(DB) Can you explain how analysis service works? .............................................. 194
(Q) What are the different problems that “Data mining” can solve? ...................... 206
(Q) What are different stages of “Data mining”? ................................................... 207
(DB) What is Discrete and Continuous data in Data mining world? ...................... 209
(DB) What is a MODEL in the Data mining world? ................................................ 209
(DB) How are models actually derived? .................................................................. 210
(DB) What is a Decision Tree Algorithm? ............................................................. 210
(DB) Can decision tree be implemented using SQL? ............................................. 212
(DB) What is Naïve Bayes Algorithm? .................................................................. 212
(DB) Explain clustering algorithm? ........................................................................ 213
(DB) Explain in detail Neural Networks? ............................................................... 213
(DB) What is Back propagation in Neural Networks? ........................................... 216
(DB) What is Time Series algorithm in data mining? ............................................ 216
(DB) Explain Association algorithm in Data mining? ............................................ 217
(DB) What is Sequence clustering algorithm?........................................................ 217
(DB) What are algorithms provided by Microsoft in SQL Server? ........................ 217
(DB) How does data mining and data warehousing work together? ...................... 218
(Q) What is XMLA? ............................................................................................... 220
(Q) What is Discover and Execute in XMLA? ....................................................... 220
Chapter 9: Integration Services / DTS ............................................................................ 220
(Q) What is Integration Services import / export wizard? ...................................... 220
(Q) What are prime components in Integration Services? ...................................... 224
(Q) How can we develop a DTS project in Integration Services? .......................... 226
Chapter 10: Replication .................................................................................................. 237
(Q) What's the best way to update data between SQL Servers? ............................. 237
(Q) In what scenarios will you need multiple databases with the same schema? ... 237
(DB) How will you plan your replication? ............................................................. 238
(Q) What are publisher, distributor and subscriber in “Replication”? .................... 239
(Q) What is “Push” and “Pull” subscription? ......................................................... 239
(DB) Can a publication support push and pull at one time? ................................... 240
(Q) What are different models / types of replication? ........................................... 240
(Q) What is Snapshot replication? .......................................................................... 240
(Q) What are the advantages and disadvantages of using Snapshot replication? ... 240
(Q) What type of data will qualify for “Snapshot replication”? ............................. 240
(Q) What is the actual location where the distributor runs? ................................... 241
(Q) Can you explain in detail how exactly “Snapshot Replication” works? .......... 241
(Q) What is merge replication? ............................................................................... 241
(Q) How does merge replication work? ................................................................ 241
(Q) What are advantages and disadvantages of Merge replication? ....................... 242
(Q) What is conflict resolution in Merge replication? ............................................ 242
(Q) What is transactional replication? .................................................................. 243
(Q) Can you explain in detail how transactional replication works? ...................... 243
(Q) What are data type concerns during replications? ............................................ 243
Chapter 11: Reporting Services ...................................................................................... 248
(Q) Can you explain how can we make a simple report in reporting services?...... 248
(Q) How do I specify stored procedures in Reporting Services?............................ 254
(Q) What is the architecture for “Reporting Services”? ........................................ 255
Chapter 13: Transaction and Locks ................................................................................ 256
(Q) What is a “Database Transaction”? ............................................................... 256
(Q) What is ACID? ................................................................................................. 257
(Q) What is “Begin Trans”, “Commit Tran”, “Rollback Tran” and “SaveTran”? . 257
(DB) What are “Checkpoint’s” in SQL Server? ..................................................... 258
(DB) What are “Implicit Transactions”? ................................................................ 259
(DB) Is it good to use “Implicit Transactions”? ..................................................... 259
(Q) What is Concurrency? ...................................................................................... 259
(Q) How can we solve concurrency problems? ...................................................... 259
(Q) What kind of problems occur if we do not implement a proper locking strategy? ... 260
(Q) What are “Dirty reads”? ................................................................................... 260
(Q) What are “Unrepeatable reads”? ...................................................................... 261
(Q) What are “Phantom rows”? .............................................................................. 262
(Q) What are “Lost Updates”? ................................................................................ 264
(Q) What are different levels of granularity of locking resources? ........................ 265
(Q) What are different types of Locks in SQL Server? .......................................... 265
(Q) What are different Isolation levels in SQL Server? ......................................... 267
(Q) What are different types of Isolation levels in SQL Server? ............................ 268
(Q) If you are using COM+ what “Isolation” level is set by default? .................... 268
(Q) What are “Lock” hints? .................................................................................... 269
(Q) What is a “Deadlock”? ..................................................................................... 269
(Q) What are the steps you can take to avoid “Deadlocks”? .................................. 269
(DB) How can I know what locks are running on which resource? ....................... 270
What is the use of SQL Server governor? .............................................................. 270
How to combine table row in to a single column / variable ?................................. 271
What is hashing? ..................................................................................................... 271
What is CDC (Change data capture) in SQL Server? ............................................. 272
How to enable CDC on SQL Server ? .................................................................... 272
How can we know in CDC what kind of operations have been done on a record? 273
Will CDC work if SQL Server Agent is not running ? ........................................... 274
Dedication
This book is dedicated to my kid Sanjana, whose dad's playtime has been stolen and given to this
book. I am thankful to my wife for constantly encouraging me and to BPB Publication for giving a
newcomer a platform to perform. Finally, above all, thanks to two old eyes, my mom and
dad, for always blessing me. I am blessed to have Raju as my brother, who always keeps my
momentum going.
I am grateful to Bhavnesh Asar, who initially conceptualized the idea; I believe concept thinking is
more important than execution. Tons of thanks to my reviewers, whose feedback was an
essential tool in improving my writing.
Just wanted to point out that Miss Kadambari S. Kadam took all the pains to review for anything
left out, without which this book would never have seen the quality light.
• Reporting and Analysis Services, which can really surprise developers during interviews,
are also dealt with in great care.
• A complete chapter on ADO.NET makes the book stronger from a programmer's
perspective. In addition, new ADO.NET features are highlighted, which can be pain
points around the new features released with SQL Server.
• A must for developers who are looking to crack a SQL Server interview, whether for a
DBA position or a programmer position.
• A must for freshers who want to avoid unnecessary pitfalls during the interview.
• Every answer is precise and to the point rather than beating around the bush. Some
questions are answered in greater detail with practical implementation in mind.
• Every question is classified as DB or NON-DB level. DB-level questions are mostly for
those who are looking for high-profile DBA jobs. All questions other than DB level
are NON-DB level, which every programmer must know.
• The tips and tricks for interviews, resume making and the salary negotiation section take
this book to a greater height.
Introduction
When my previous book ".NET Interview Questions" reached the readers, the only voice heard
was "more SQL Server". Ok guys, we have heard it loud and clear, so here is my complete
book on SQL Server: "SQL Server Interview Questions". However, there is a second, stronger
reason for writing this book, which stands taller than the readers' demand, and that is SQL Server
itself. Almost 90% of projects in the software industry need databases or persistent data in some
form or other. When it comes to persisting data in .NET, SQL Server is the most preferred database
for the job. There are projects which use ORACLE, DB2 and other database products, but SQL Server
still has the major market chunk when the language is .NET and especially when the operating system
is Windows. I treat this great relationship between .NET, SQL Server and Windows OS as a family
relationship.
In my previous book we had only one chapter dedicated to SQL Server, which is
complete injustice to this beautiful product.
So why an interview question book on SQL Server? If you look at any .NET interview conducted
on your premises, both parties (employer and candidate) pay no attention to SQL Server even
though it is such an important part of a development project. They will go on talking about the stars
(OOP, AOP, design patterns, MVC patterns, Microsoft Application Blocks, project management
etc.) but on the database side there are rarely any questions. I am not saying these things are not
important, but if you look at development or maintenance, the majority of the time you will be
either in your IDE or in SQL Server.
Secondly, many candidates come across as real heroes when answering questions on OOP, AOP,
design patterns, architecture, remoting etc., but when it comes to simple, basic questions on SQL
Server like SQL and indexes (forget DBA-level questions), they are completely off track.
Third, and very important: IT is changing and people expect more out of less. That means they
expect a programmer to be an architect, coder, tester and, yes, a DBA as well. For mission-critical
data there will always be a separate position for a DBA, but many interviewers now expect
programmers to also do the job of a DBA, data warehousing etc. This is the major place where
developers fall short when facing these kinds of interviews.
Therefore, this book will walk you through those surprising questions which can spring
from the SQL Server side. I have tried not to go too deep, as that would defeat the whole purpose
of an interview question book. I think an interview book should run you through those
surprising questions and prepare you in a short duration (probably a night or so). I hope
this book really points out those pitfalls that can come up during SQL Server interviews.
I hope this book takes you to a better height and gives you an extra confidence boost during
interviews. Best of luck and happy job-hunting.
These ratings are given by the author and can vary according to companies
and individuals.
Compared to my previous book ".NET Interview Questions", which had three levels (Basic,
Intermediate and Advanced), this book has only two levels (DBA and NON-DBA) because of the
subject. While reading, you will come across sections marked "Note", which highlight special
points of that section. You will also come across tags like "TWIST", which is nothing but
another way of asking the same question; for instance, "What is replication?" and "How do I
move data between two SQL Server databases?" point to the same answer.
All DBA-level questions are marked with the (DB) tag. Questions which do not have tags are
NON-DBA level. Every developer should have a know-how of all the NON-DBA-level questions,
but for DBA guys every question is important. For instance, if you are going for a developer
position and you flunk a simple ADO.NET question, you know the result. Vice versa, if you are
going for a DBA position and you cannot answer basic query optimization questions, you will
probably never reach the HR round.
So the best way to read this book is to read a question and judge for yourself: do you think you will be
asked this type of question? For instance, many times you know you will be asked only about
data warehousing, and rather than beating around the bush you would like to target that section
more. In addition, many a time you know your weakest area and would only like to brush
up those sections. You can say this book is not a book that has to be read from start to end; you
can start from any chapter or question and, when you think you are ok, close it.
If you are looking for a DBA position, you will be asked around 20% ADO.NET questions and 80%
questions on query optimization, profiler, replication, data warehousing, data mining and others.
Note: - In small-scale and mid-scale software companies there are
chances they expect a developer to do the job of programming, DBA
work, data mining and everything else. But in big companies you can
easily see the difference, where DBA jobs are specifically done by
SQL Server specialists rather than developers. Nowadays, though, some
big companies believe in a developer doing multitasking jobs to remove
dependency on a single resource.
Above is a figure of the general hierarchy across most IT companies (well, not always, but I hope
most of the time). Because of inconsistent HR ways of working, you will see differences between
companies.
Note: - There are many small and medium software companies which do not
follow this hierarchy; they have their own ad hoc way of defining
positions in the company.
So why is there a need for hierarchy in an interview?
Resume Preparation Guidelines
First impression is the last impression.
Before the interviewer even meets you, he will first meet your resume. An interviewer looking at
your resume is almost 20% of the interview happening without you knowing it. I was always a bad
guy when it came to resume preparation, but when I looked at my friends' resumes they were
gorgeous. Now that I am writing a series of books on interviews, I thought this would be a good
point to put in. You can happily skip it if you are confident about your resume. There is no hard
and fast rule that you have to follow the same pattern, but just see that all these checklist items are
attended to.
• Use plain text when you are sending resumes through email. For instance, you send your
resume as a Microsoft Word document; what if the interviewer is using Linux? He will never
be able to read your resume. And you cannot be sure either way: you send your resume in Word
2000 and the guy has Word 97…uuhhh.
• Attach a covering letter; it really impresses and makes you look traditionally formal. Yes,
even if you are sending your CV through e-mail, send a covering letter.
Checklist of content you should have in your resume:-
• Start with an objective or summary, for instance: "Working as a senior database
administrator for more than 4 years. Implemented quality web-based applications.
Followed the industry's best practices and adhered to and implemented processes which
enhanced the quality of technical delivery. Pledged to deliver the best technical solutions
to the industry."
• Specify your core strengths at the start of the resume, by which the interviewer can make
a quick decision on whether you are eligible for the position. For example:-
o Looked after data mining and data warehousing department independently.
Played a major role in query optimization.
o Worked extensively in database design and ER diagram implementation.
o Well versed with CMMI process and followed it extensively in projects.
Mention your past employers to give the interviewer an overview of what type of companies you
have associated yourself with.
Now it's time to mention all the projects you have worked on till now. Best is to start in descending
order, that is, from your current project and go backwards. For every project, try to put in these
things:-
• Project name / client name (it is sometimes unethical to mention the client's name; I leave
that to the readers).
• Number of team members.
• Time span of the project.
• Tools, language, RDBMS and technology used to complete the project.
• Brief summary of the project.
Senior people who have huge experience tend to inflate their CV by putting in a summary
for every project. Best for them is to put down a description of only the first three projects in
descending order; the rest they can describe verbally during the interview. I have seen CVs above 15
pages… I doubt anyone can read them.
• Finally come your education and personal details.
• If you are trying for onsite, do not forget to mention your passport number.
• Some guys tend to make their CV large and huge. I think the optimal size should be no
more than 4 to 5 pages.
• Do not mention your salary in the CV. You can talk about it during the interview with HR or the
interviewer.
• When you write the summary for a project, make it effective by using verbs like
"managed a team of 5 members", "architected the project from start to finish" etc. It carries
huge weight.
• This is essential, very essential: take 4 to 5 photocopies of your resume; you will need them
now and then.
• Just in case, take at least 2 passport photos with you. You can skip this, but many times
you will need them.
• Carry all your current office documents, especially your salary slips and joining letter.
Salary Negotiation
Ok, that is what we all do it for: money… well, not everyone, right? This is probably the weakest area
for techno-savvy guys. They are not good negotiators. I have seen so many guys who, at the first
instance, will smile and say "NEGOTIABLE SIR". So here are some points:-
• Do a study of what the salary trend is. For instance, have some kind of baseline: what is
the salary trend for your number of years of experience? Discuss this out with your
friends.
• Do not mention your expected salary on the resume.
• Let the employer make the salary offer first. Try to delay the salary discussion until the
end.
• If they ask, "What do you expect?", come up with a figure at the slightly higher end and say
"negotiable". Remember, never say negotiable on the amount you have actually aimed for; HR
guys will always bring it down. So negotiate on AIMED SALARY + something extra.
• The normal trend is that they look at your current salary and add a little to it so that they can
pull you in. Do your homework: my salary is this much and I expect this much, so
whatever happens I will not go below this.
• Do not be harsh during salary negotiations.
• It is good to aim high. For instance, I want 1 billion dollars a month; but at the same time,
be realistic.
• Some companies have hidden costs attached to the salary; clarify those rather than be
surprised by the first salary package.
• Many companies add extra performance compensation into your basic pay, which can be
surprising at times. So get a detailed break-down. Best is to discuss in-hand salary rather
than NET.
• Ask the employer at what frequency hikes happen.
• Take everything in writing, go back to your house and have a look with a cool head:
is the offer worth more than what your current employer is giving?
• Do not forget that once you have a job offer in hand you can go back to your current employer for
negotiation, so keep that in mind.
• Remember, the worst part is cribbing after joining the company that your colleague is
getting this much. So be careful during interview negotiations, or be sportive and be a better
negotiator in the next interview.
• One very important thing: the best negotiation ground is not the new company you
are going to but the old company you are leaving. So once you have an offer in hand,
get back to your old employer, show them the offer, and then make your next move. It is
my experience that negotiating with the old employer is easier than with the new
one… Frankly, if approached properly, rarely will anyone say no. Just do not be
aggressive or egoistic about having an offer in hand.
Above all, some things are worth more than money: JOB SATISFACTION. So whatever
you negotiate, if you think you can get the JOB SATISFACTION aspect on higher grounds, go for it. I
think it's worth more than money.
Points to remember
• One of the first questions asked during an interview is "Can you say something about
yourself?"
• Can you describe yourself and what you have achieved till now?
• Why do you want to leave the current company?
• Where do you see yourself after three years?
• What are your positive and negative points?
• How would you rate yourself in .NET and SQL Server on a scale of one to ten?
• Are you looking for onsite opportunities? (Be careful, do not show your desperation for
abroad journeys.)
• Why have you changed so many jobs? (Prepare a decent answer; do not blame companies
and individuals for your frequent changes.)
• Never talk for more than 1 minute straight during the interview.
• Have you worked with previous versions of SQL Server?
• Would you be interested in a full-time database administrator job?
• Do not mention client names in the resume. If asked, say that it is confidential, which brings
out qualities like honesty.
• When you make your resume keep your recent projects at the top.
• Find out what the employer is looking for by asking him questions at the start of the interview,
and best is before going to the interview. For example, if a company has projects on server
products, the employer will be looking for BizTalk or CS CMS experts.
• Can you give a brief about your family background?
• As you are a fresher, do you think you can really do this job?
• Have you heard about our company? Can you say five points about our company? Just read at least
once about the company you are going to.
• Can you describe the best project you have worked on?
• Do you work on Saturdays and Sundays?
• What is the biggest team size you have worked with?
• Can you describe the current project you have worked on?
• How much time will you need to join our organization? What is the notice period at your
current company?
• What certifications have you cleared?
• Do you have passport-size photos, last year's mark sheet, previous companies' employment
letters, last month's salary slip, passport and other necessary documents?
• What is the most important thing that motivates you?
• Why did you want to leave the previous organization?
• Which type of job gives you the greatest satisfaction?
• What type of environment are you looking for?
• Do you have experience in project management?
• Do you like to work in a team or as an individual?
• Describe the best project manager you have worked with.
• Why should I hire you?
• Have you ever been fired or forced to resign?
• Can you explain some important points that you have learnt from your past project
experiences?
• Have you gone through any unsuccessful projects? If yes, can you explain why the
project failed?
• Will you be comfortable with a location shift? If you have personal problems, say no right at
the first stage… or else within two months you will have to read my book again.
• Do you work during late nights? The best answer: if there is a project deadline, yes. Do not show
that it is your culture to work during nights.
• Any special achievements in your life till now? Tell them about the best project you have done
in your career.
• Any plans of opening your own software company? Beware, do not start pouring out your Bill
Gates dream to him… it can create a wrong impression.
This is an interesting topic and yes, it's the most discussed one when it comes to SQL
Server interviews.
In SQL Server interviews the database design conversation goes into two wide discussions:
one is normalization and the other is denormalization.
So in the normalization section the interviewer can ask you questions around the 3 normal forms,
i.e. 1st normal form, second normal form and 3rd normal form. This looks like a very
simple question, but you would be surprised to know that even veteran database designers
forget the definitions, thus giving the interviewer the impression that they do not know
database design.
Irrespective of whether you are senior or junior, everyone expects you to answer all 3 normal
forms. There are exceptions where an interviewer has asked about the 4th and 5th normal
forms as well, but you can excuse those if you wish. I personally think that's too much to
ask for.
When it comes to database design techniques, the interviewer can query the other side of
the coin, i.e. denormalization. One of the important questions interviewers ask is the
difference between denormalization and normalization. Most interviewers expect the
differences to be answered from the perspective of performance and type of application.
As the discussion moves ahead there is a high possibility of getting into OLTP and OLAP
discussions, which can further trigger discussions around the database design techniques of
star and snowflake schemas.
What is normalization and what are the benefits of the same?
It's a database design technique to avoid repetitive data and maintain the integrity of the
data.
Note: - After that sweet one-liner, I can bet the interviewer will ask
you to clarify more on those two words: repetitive and integrity.
Let's first start with repetitive. Say you have a simple user table as shown below.
You can see how the city is repeated again and again, so you would like to improve on
this.
To solve the problem you simply apply normalization: you split the repetitive
data into a separate table (city master) and put a reference foreign key, as shown in the
below figure.
Now for the second word, "data integrity". "Data integrity" means how accurate and
consistent your data is.
For instance, in the below figure you can see how the name of the country is inconsistent.
"Ind" and "India" mean the same thing, and "USA" and "United States" mean the same
thing. This kind of inconsistency leads to more complications and problems in
maintenance.
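As a sketch of how the split might look in T-SQL (the table and column names here are illustrative, not taken from the book's figures):

```sql
-- Master table: each city name is stored exactly once.
CREATE TABLE CityMaster
(
    CityId   INT IDENTITY(1,1) PRIMARY KEY,
    CityName VARCHAR(50) NOT NULL UNIQUE   -- no duplicates, no "Ind"/"India" drift
);

-- User rows now hold only a small foreign key instead of the full city name.
CREATE TABLE UserMaster
(
    UserId   INT IDENTITY(1,1) PRIMARY KEY,
    UserName VARCHAR(50) NOT NULL,
    CityId   INT NOT NULL REFERENCES CityMaster (CityId)
);
```

Because every user row must point at an existing CityMaster row, the foreign key also enforces the integrity side: nobody can invent a new spelling of a city on the fly.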
One of the most important things in technical interviews like SQL,
.NET, Java etc. is that you need to use proper technical vocabulary.
For example, in the above answer the word "data integrity" attracts
the interviewer more than the word "inaccurate". The right technical
vocabulary will make you shine as compared to people who use plain English.
What is 1st normal form, second normal form and 3rd normal form?
• First normal form is all about breaking data into smaller, atomic pieces, with no
repeating groups.
• In second normal form, every column should depend fully on the whole key and not
on part of it.
• In third normal form, no non-key column should depend on another non-key column
(no transitive dependencies).
For an in-depth explanation you can also see the video "Can you explain
First, Second & Third Normal forms in database?" provided in the DVD.
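As a small illustration of the third normal form, here is a hypothetical schema (the names are mine, not from the book) where Country depends on City rather than on the key, together with the usual fix:

```sql
-- Transitive dependency: Country is determined by City, not by OrderId.
CREATE TABLE OrdersBad
(
    OrderId INT PRIMARY KEY,
    City    VARCHAR(50),
    Country VARCHAR(50)          -- violates 3rd normal form
);

-- 3NF fix: move the dependent column into its own table.
CREATE TABLE CityMaster
(
    CityId  INT PRIMARY KEY,
    City    VARCHAR(50),
    Country VARCHAR(50)
);

CREATE TABLE Orders
(
    OrderId INT PRIMARY KEY,
    CityId  INT REFERENCES CityMaster (CityId)
);
```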
What is denormalization?
But what if we want to analyze historical data, do forecasting, do heavy calculations
etc.? For these kinds of requirements a normalized design is not suited. Forecasting and
analysis are heavy operations, and with historical data they become heavier. If your
design follows normalization, then your SQL needs to pull data from many different tables,
making the select process slow.
Normalization is all about reducing redundancy, while denormalization is all about
increasing redundancy and minimizing the number of tables.
In the above figure you can see that on the left-hand side we have a normalized design while on
the right-hand side we have a denormalized design. A query on the denormalized table on the
right will be faster than on the left-hand side design, because the normalized design involves
more tables and hence more joins.
You will use a denormalized design when your application is mostly meant for reporting,
forecasting and analyzing historical data, where read performance is more important.
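The read-performance difference can be sketched with two hypothetical queries (the table names are illustrative): a report over a normalized design has to join its way to the answer, while the denormalized flat table answers in a single scan.

```sql
-- Normalized design: several joins before we can aggregate.
SELECT co.CountryName, SUM(s.SalesAmount) AS TotalSales
FROM   Sales   s
JOIN   City    ci ON ci.CityId    = s.CityId
JOIN   Country co ON co.CountryId = ci.CountryId
GROUP BY co.CountryName;

-- Denormalized design: the country name is repeated in every row,
-- so the same report is a single-table scan.
SELECT CountryName, SUM(SalesAmount) AS TotalSales
FROM   SalesFlat
GROUP BY CountryName;
```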
What is the difference between OLTP and OLAP system?
Both OLTP and OLAP are types of IT systems. OLTP (Online Transaction Processing)
deals with transactions (insert, update, delete and simple search) while OLAP
(Online Analytical Processing) deals with analyzing historical data, forecasting etc.

                 OLTP                                 OLAP
Design           Normalized (1st, second and          Denormalized (Dimension and
                 third normal form).                  Fact design).
Source           Daily transactions.                  OLTP.
Motive           Faster inserts, updates and          Faster analysis and search by
                 deletes; improved data quality       combining tables.
                 by reducing redundancy.
SQL complexity   Simple and medium.                   Highly complex due to analysis
                                                      and forecasting.
For what kind of systems is normalization better as compared to
denormalization?
Normalized design is better suited to OLTP systems, where fast inserts, updates and
deletes and good data quality matter; denormalized design is better suited to OLAP
systems, where read-heavy analysis matters.
What are Facts, Dimension and Measures tables?
The most important goal of an OLAP application is analysis of data. The most important
things in any analysis are "NUMBERS". So with an OLAP application we would like to get
those numbers, forecast them and analyze them for better business growth. These numbers
can be total sales, number of customers etc.
These numbers are termed "measures", and measures are mostly stored in "Fact"
tables.
"Dimension" tables describe what these measures actually mean. For example, in the below
table you can see we have two measures, 3000 units and 1500 $. One dimension is
"ProductWiseSales" and the other dimension is "AgeWiseSalary".

Dimension            Measures
ProductWiseSales     3000 units
AgeWiseSalary        1500 $
What are cubes?

Product   Units   Year   Country
Pants     600     2011   Nepal
Pants     1200    2012   Nepal

So if we arrange these dimensions and measures in a cube format we get a better
picture. In other words, a cube is the intersection of multiple measures and their dimensions. If
you visualize it in a graphical model, the below image is how it looks.
What is the difference between star schema and snow flake design?
Star schema consists of fact and dimension tables. The fact tables have the measures and
dimension tables give more context to the fact tables.
In the below figure “Star design” you can see we have four dimension tables, each of them referencing the fact table for measure values. The references between dimension and fact tables are simple foreign key relationships.
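As a rough sketch (all table and column names here are hypothetical, not from the book's figure), a star design can be expressed with plain foreign keys like this:

```sql
-- Hypothetical star design: one fact table referencing dimension tables.
CREATE TABLE DimProduct (
    ProductId INT PRIMARY KEY,
    ProductName VARCHAR(50)
);

CREATE TABLE DimDate (
    DateId INT PRIMARY KEY,
    [Year] INT
);

CREATE TABLE FactSales (
    SalesId INT PRIMARY KEY,
    ProductId INT FOREIGN KEY REFERENCES DimProduct(ProductId),
    DateId INT FOREIGN KEY REFERENCES DimDate(DateId),
    UnitsSold INT  -- the measure
);
```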
Figure: - Star design
The snowflake design is very similar to the star design. The exception is the dimension tables: in a snowflake design the dimension tables are normalized, as shown in the below figure “Snow flake design”. The design is very similar to the star design shown previously, but the products table and vendor table are split into separate tables.
The relationship is more of a normalized format. So, summing up, the star design is a purely denormalized design while the snowflake design can have normalized dimension tables.
Figure: - Snow flake design
How many bytes does “char” consume as compared to “nchar”?
The “char” data type consumes 1 byte per character while “nchar” consumes 2 bytes per character. The reason is that “char” stores only ASCII characters while “nchar” can store UNICODE content.
In case you are new to ASCII and UNICODE: ASCII accommodates 256 characters, i.e. English letters, punctuation, numbers, etc. But if we want to store a Chinese character, or characters from some other language, it is difficult to do so with ASCII; that's where UNICODE comes into the picture. An ASCII character needs only 1 byte to represent a character while UNICODE needs 2 bytes.
What is the difference between “char” and “varchar” data types?
“char” is a fixed-length data type. That means if you create a char of length 10, it always consumes 10 bytes, irrespective of whether you store 1 character or 10 characters. “varchar” is a variable-length data type. That means if you create a “varchar” of length 10, it consumes storage equivalent to the number of characters actually stored. So if we store 3 characters, it only consumes 3 bytes (plus a small overhead to record the length).
So if you have data like a country code which always consumes 3 characters (USA, IND, NEP, etc.), the “char” data type is a good choice. If you are not sure about the number of characters, like the “Name” of a person, “varchar” is a better fit.
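As a small sketch (the table and column names are only illustrative), the choice looks like this:

```sql
-- Illustrative only: fixed-length code vs variable-length name.
CREATE TABLE Customer (
    CountryCode CHAR(3),       -- always 3 characters: 'USA', 'IND', 'NEP'
    CustomerName VARCHAR(50)   -- varies per customer; storage matches actual length
);
```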
What is the use of ‘hierarchyid’ data type in SQL Server?
Many times we need to store data which has a deep tree-structure hierarchy. Below is a simple example which represents a sales team organization hierarchy. Now if you want to get all the people who work under “Shaam”, you really need to work hard on both database design and logic.
In the past many developers created this design using a self-referencing primary and foreign key relationship. You can see in the below figure we have a simple table which stores sales persons. Every sales person is identified by a primary key called “SalesId”. We have one more column called “SalesIdFk” which references the primary key “SalesId”.
Now to establish the tree-structure hierarchy we need to enter data in a linked-list format. For instance, you can see in the below data entry snapshot that the first row, “Shiv”, is the top sales person in the hierarchy. Because “Shiv” is at the top of the hierarchy, his “SalesIdFk” value is null. “Raju” and “Shaam” report to “Shiv”, so they
have a “SalesIdFk” value of “1”, which is nothing but the primary key value of “Shiv”. Using this approach we can represent any deep hierarchy.
This approach works, but it needs a cryptic DB design and some complicated logic to process the data. For instance, if I want to find how many people work under “Shaam”, I really need to write some complicated recursive logic at the backend.
So here's the good news: SQL Server supports the “hierarchyid” data type, which can accommodate such complex tree structures.
So the first step is to get rid of the “SalesIdFk” column and add a “SalesHid” column of data type “hierarchyid”.
Now the HID (Hierarchy id) data type column uses the below format to represent tree
structure data:-
• All data in the HID column should start with “/” and end with “/”.
• The top-level root is represented by “/”.
• The next level below the root is represented by “/1/”. If you have one more person on the same level, you enter it as “/2/”.
• If you want to enter one more child below “/1/”, you enter “/1/1/”.
Below is a pictorial representation of how HID values map with the tree structure levels.
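Assuming a table like the one described above, the paths can be entered as in this sketch (the fourth name, “Ram”, is hypothetical and only added to show a deeper level):

```sql
-- Sketch: storing the sales hierarchy with hierarchyid paths.
CREATE TABLE SalesPerson (
    SalesId INT PRIMARY KEY,
    SalesPersonName VARCHAR(50),
    SalesHid HIERARCHYID
);

INSERT INTO SalesPerson VALUES (1, 'Shiv',  '/');      -- root
INSERT INTO SalesPerson VALUES (2, 'Raju',  '/1/');    -- reports to Shiv
INSERT INTO SalesPerson VALUES (3, 'Shaam', '/2/');    -- reports to Shiv
INSERT INTO SalesPerson VALUES (4, 'Ram',   '/2/1/');  -- reports to Shaam
```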
Now if you want to find who is below “Shaam”, you can fire a query using the “IsDescendantOf” function, which evaluates to true if a record is a child (at any depth) of the given node. This is easy compared to creating a custom design and writing the logic yourself.
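A hedged sketch of such a query, with column names following the example above:

```sql
-- Find everyone under 'Shaam'.
-- Note: IsDescendantOf also returns 1 for the node itself.
DECLARE @manager HIERARCHYID;
SELECT @manager = SalesHid FROM SalesPerson WHERE SalesPersonName = 'Shaam';

SELECT SalesPersonName
FROM SalesPerson
WHERE SalesHid.IsDescendantOf(@manager) = 1;
```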
If you wish to store financial values which SQL Server data type is
more suitable ?
The money data type is the most suitable data type for storing financial values, as it is accurate to one ten-thousandth of the monetary unit it represents. In financial figures accuracy matters: small cents add up to millions later.
Further, the money data type has two variations: “money” and “smallmoney”. “money” takes 8 bytes of storage while “smallmoney” takes 4 bytes.
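A small sketch (the table name and columns are illustrative):

```sql
-- Illustrative: money for large amounts, smallmoney for small ones.
CREATE TABLE Invoice (
    InvoiceId INT PRIMARY KEY,
    TotalAmount MONEY,       -- 8 bytes, range roughly +/- 922 trillion
    ShippingFee SMALLMONEY   -- 4 bytes, range roughly +/- 214,748
);
```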
There are 2 physical files MDF and LDF, what are they?
The MDF file is the primary database file where the actual user data and schema get stored. In other words, table data, table structure, stored procedures, views and all SQL Server objects get stored in this file.
The LDF file is a transaction log file which stores transaction logs. It does not contain the actual user data or schema definitions. Every MDF has an associated LDF. While recovering a database it's always advisable to keep the LDF so that you do not lose any transaction information.
What are page and extents in SQL Server?
Extents and pages define how data is stored in SQL Server. The actual data is stored in pages. Each page is 8 KB in size. You can visualize pages as the fundamental unit of data storage: your table's row data is actually stored in pages.
Pages are further grouped into extents. One extent is a collection of eight pages, so the size of each extent is 64 KB (8 pages x 8 KB).
Finally, these extents make up the MDF physical file which you see on your hard disk.
What are heap tables ?
Heap tables are tables which do not have a clustered index.
SQL Server stores data in 8 KB pages. An 8 KB page is further divided into 3 sections: page header, data rows and row offsets; see the below figure for a visual.
1. The page header stores information about the page, such as the page type, the next and previous page if it's an index page, free space in the page, etc.
2. After the page header the data row section follows. This is where your data is actually stored.
3. Row offset information is stored at the end of the page, i.e. after the data row section. Every data row has a row offset, and each row offset is 2 bytes per row. The row offset records how far the row is from the start of the page.
Putting it in simple words, the complete page equation comes out as shown below:
Page (8 KB / 8192 bytes) = Page header (96 bytes) + Actual data (whatever bytes) + Row offsets (2 bytes per row).
The actual data (table row data) is stored in pages, and pages are 8 KB in size. So when new data no longer fits on a page, SQL Server creates a new page and moves part of the data to it to accommodate the new row. This phenomenon is termed a page split.
SELECT in_row_data_page_count
FROM sys.dm_db_partition_stats
WHERE object_id = OBJECT_ID('dbo.table1');
Chapter 3:- Indexes (Clustered and Non-Clustered)
When you create an index on a column, SQL Server creates a “B-Tree” (Balanced Tree) structure for the column, which results in faster searches.
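As a sketch (the table and column names are only illustrative):

```sql
-- Illustrative: a simple non-clustered index to speed up searches on a column.
CREATE NONCLUSTERED INDEX IX_Customer_CustomerCode
ON Customer (CustomerCode);
```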
It's very important to answer to the point, short and sweet. For
example, in the above question many people start explaining the B-tree
search, which becomes lengthy and boring for the interviewer. Gauge the
interviewer's interest as well.
When you search data which is not indexed, the search happens sequentially. For instance, in the below figure you can see that to find “50” the search engine has to browse through all the records sequentially until it reaches “50”.
Figure: - Sequential
Figure: - B-tree structure
In a B-tree, data is divided into root nodes, non-leaf nodes and leaf nodes. With this structure in place, if you want to search for “50” the following steps happen:
• It first looks at the root level and compares with the first node, i.e. 30. It checks whether 50 is less than or equal to 30; it's not, so it bypasses the entire 30 subtree and proceeds to the 50 root node.
• It then compares with the second root node, i.e. 50. Now 50 is equal to 50, so it moves down to the non-leaf nodes of 50, i.e. 40 and 50.
• In the non-leaf nodes the comparison is done again; it bypasses the 40 subtree and follows the non-leaf node 50.
• Finally it travels sequentially through the leaf values from 40 to 50 to find the exact match.
If you look at the steps closely, it has scanned only about 10 records, as compared to the sequential scan, where it scans 49 records.
Note: - If you are serious about the job, do not shy away from drawing
diagrams to communicate better.
Indexes are organized in a B-tree structure divided into root nodes, intermediate nodes and leaf nodes. The leaf nodes of the B-tree actually contain the data. A leaf index page is 8 KB, i.e. 8192 bytes, in size. So when data exceeds the 8 KB size, SQL Server has to create new 8 KB pages to fit the data in. This creation of a new page to accommodate new data is termed a page split.
Let me explain page split in more depth. Consider a simple table with two fields, “Id” and “MyData”, with data types “int” and “char(2000)” respectively, as shown in the below figure. The “Id” column has a clustered index.
That means each row is about 2008 bytes in size (2000 bytes for “MyData” plus the “Id” column and per-row overhead).
So if we have four records, the total size is 8032 bytes (2008 x 4), which leaves 160 bytes free on the page. Do look at the above image for a visual representation.
So if one more record is added there is no place left to accommodate it, and the index page is forced to go for a page split.
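The setup described above can be sketched as follows (the table name is hypothetical; the sizes are the book's example):

```sql
-- Sketch of the page-split demo: each row is ~2 KB,
-- so only four rows fit on one 8 KB page.
CREATE TABLE PageSplitDemo (
    Id INT PRIMARY KEY CLUSTERED,
    MyData CHAR(2000)
);

INSERT INTO PageSplitDemo VALUES (1, 'a');
INSERT INTO PageSplitDemo VALUES (2, 'b');
INSERT INTO PageSplitDemo VALUES (3, 'c');
INSERT INTO PageSplitDemo VALUES (5, 'e');
-- Inserting Id = 4 now has no room on the page and forces a page split.
INSERT INTO PageSplitDemo VALUES (4, 'd');
```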
So does a page split affect performance?
If your database is highly insert-intensive it can lead to a large number of page splits. Performing those page splits costs CPU cycles and extra I/O, which can lead to performance issues.
These are the ways in which SQL Server searches for a record in a table. In a “Table Scan”, SQL Server loops through all the records to get to the destination. For instance, if you have 1, 2, 5, 23, 63 and 95 and you want to search for 23, it will go through 1, 2 and 5 to reach it. Worse, if it wants to find 95 it will loop through all the records.
An “Index Scan”, on the other hand, uses the “B-TREE” fundamentals to get to a record. For “B-TREE”, refer to the previous questions.
(Q) What are the two types of indexes and explain them in detail?
Twist: - What is the difference between clustered and non-clustered indexes?
figure number here), there were a leaf level and a non-leaf level. The leaf level holds the key, which is used to identify the record, and the non-leaf level points to the leaf level.
In a clustered index, the leaf level holds the actual data.
In a non-clustered index, the leaf nodes hold pointers (row IDs) which then point to the actual data.
Figure 12.4: - Non-Clustered Index has pointers.
So here is the main difference between clustered and non-clustered: in a clustered index, when we reach the leaf nodes we are on the actual data. In a non-clustered index, we get a pointer, which then points to the actual data.
Therefore, given the above fundamentals, the following are the basic differences between them:
• Note that with a clustered index the actual data has to be sorted in the same way as the clustered index. With non-clustered indexes we have pointers, which is a logical arrangement, so we do not have this compulsion.
• We can have only one clustered index on a table, as there can be only one physical order, while we can have more than one non-clustered index.
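A small sketch of the difference (table and index names are illustrative):

```sql
-- Illustrative: one clustered index (the physical row order) and
-- additional non-clustered indexes on the same table.
CREATE TABLE Customer (
    CustomerId INT,
    CustomerCode VARCHAR(10),
    PinCode VARCHAR(10)
);

CREATE CLUSTERED INDEX IX_Customer_Id ON Customer (CustomerId);
CREATE NONCLUSTERED INDEX IX_Customer_Code ON Customer (CustomerCode);
```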
If we create a non-clustered index on a table which has a clustered index, how does the architecture change?
The only change is that the leaf nodes of the non-clustered index hold the clustered index key instead of a row ID. That clustered index key is then used to locate the actual data. So the difference is that, instead of pointers, the leaf nodes have clustered keys: when we create a non-clustered index on a table which has a clustered index, lookups go through the clustered index.
(DB) What is the “FillFactor” concept in indexes?
When SQL Server creates new indexes, the pages are by default full. “FillFactor” is a percentage value (from 1 to 100) which says how full your index pages will be. The default “FillFactor” value is zero, which behaves the same as 100, i.e. completely full pages.
Note: - If you want to create an index you can use either the “Create
Index” statement or the GUI.
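A sketch of setting a fill factor at index creation (table, column and index names are illustrative):

```sql
-- Leave 30% free space on each leaf page to absorb future inserts.
CREATE NONCLUSTERED INDEX IX_Customer_Code
ON Customer (CustomerCode)
WITH (FILLFACTOR = 70);
```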
Figure 12.6: - Index Details
Note: - Before reading this you should have all the answers of the
previous section clear, especially about extents, pages and indexes.
DECLARE
@ID int,
@IndexID int,
@IndexName varchar(128)
-- input your table and index name
SELECT @IndexName = 'AK_Department_Name'
SET @ID = OBJECT_ID('HumanResources.Department')
SELECT @IndexID = IndID
FROM sysindexes
WHERE id = @ID AND name = @IndexName
--run the DBCC command
DBCC SHOWCONTIG (@id, @IndexID)
Just a short note here: “DBCC”, i.e. “Database Consistency Checker”, is used for checking the health of many entities in SQL Server. Here we are using it to see index health. After the command runs you will see the following output. You can also run “DBCC SHOW_STATISTICS” to see statistics details, such as when the statistics were last updated.
Figure 12.7 DBCC SHOWSTATISTICS
Scan Density
This is the percentage ratio of Best Count to Actual Count. Best Count is the number of extent changes when everything is perfectly contiguous; it acts as a baseline. Actual Count is the actual number of extent changes observed.
Logical Scan Fragmentation
The percentage of out-of-order pages returned from scanning the leaf pages of an index. An out-of-order page is one for which the next physical page is a different page than the page pointed to by the next-page pointer in the leaf page.
Extent Scan Fragmentation
This tells us whether an extent is physically located next to the extent it is logically adjacent to. In other words, it shows whether the leaf pages of your index are physically in order (they can still be logically in order), and what percentage of the extents this problem pertains to.
Avg. Bytes free per page
This figure tells how many bytes are free per page. If it's a table with heavy inserts, or a highly transactional table, more free space per page is desirable so that there will be fewer page splits.
If it's just a reporting system then having this closer to zero is good, as SQL Server can then read the same data from fewer pages.
Avg. Page density (full)
Average page density (as a percentage). It is nothing but:
1 - (Avg. Bytes free per page / 8096)
where 8096 is the number of usable bytes in a page.
Note: - Read each of the above sections carefully. In case you are
looking for a DBA job you will need the above fundamentals to be very
clear. Normally the interviewer will shoot questions like “If you
see the fill factor is this much, what will you conclude?” or “If you
see the scan density is this much, what will you conclude?”
(DB) How do you reorganize your index once you find the problem?
You can rebuild your indexes using “DBCC DBREINDEX”. You can either request that a particular index be rebuilt, or re-index all indexes of the table.
The command below rebuilds the “AK_Department_Name” index of “HumanResources.Department” with a fill factor of 70.
DBCC DBREINDEX ([HumanResources.Department],[AK_Department_Name],70)
You can then again run DBCC SHOWCONTIG to see the results.
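DBCC DBREINDEX is an older command; the same job can also be done with ALTER INDEX. A hedged sketch, using the same index as above:

```sql
-- Rebuild one index (heavier: rebuilds the whole B-tree):
ALTER INDEX AK_Department_Name
ON HumanResources.Department
REBUILD WITH (FILLFACTOR = 70);

-- Or reorganize (lighter: defragments leaf pages in place):
ALTER INDEX ALL ON HumanResources.Department REORGANIZE;
```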
Now over a period of time, some extents and pages undergo deletes. Here is the modified database scenario. One observation you can make is that some pages are not removed even when they hold no data. Second, if SQL Server wants to fetch all “Females” it has to span two extents and multiple pages within them. This is called “fragmentation”, i.e. to fetch data you span across a lot of pages and extents. This is also termed “scattered data”.
If the fragmentation is removed, you only have to search two extents and two pages. This will definitely be faster, as we are spanning fewer entities.
Figure 12.11: - Fragmentation removed
Figure 12.12: - sp_updatestats in action
• DBCC INDEXDEFRAG: - This is a less thorough way of removing fragmentation, as it only defragments the leaf nodes.
(Q) What are the criteria you will look into while selecting an index?
Note: - Some answers I have received for this question:
• I will create an index wherever possible.
• I will create a clustered index on every table.
That is why DBAs are always needed.
• How often the field is used as a selection criterion. For example, in a “Customer” table you have “CustomerCode” and “PinCode”. Most of the searches are going to be performed on “CustomerCode”, so it's a better candidate for indexing than “PinCode”. In short, you can look at the “WHERE” clauses of your SQL to figure out whether a column is a right choice for indexing.
• If the column has a high level of unique values and is used in selection criteria, it is again a valid candidate for indexing.
• If a “foreign” key of the table is used extensively in joins (inner, outer and cross), it is again a good candidate for indexing.
• If the table is highly transactional (huge inserts, updates and deletes), it is probably not a good candidate for many indexes. Remember the page split problems with indexes.
• You can use the “Index Tuning Wizard” for index suggestions.
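The foreign-key criterion above can be sketched like this (the “Orders” table and its columns are hypothetical, added only to illustrate the join case):

```sql
-- A foreign key used heavily in joins is a good candidate for an index,
-- e.g. Orders.CustomerId joined to Customer.
CREATE NONCLUSTERED INDEX IX_Orders_CustomerId
ON Orders (CustomerId);
```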
Note: - This book refers to SQL Server 2005, so if you have
SQL Server 2000 installed you will find the SQL Profiler under Start -
Programs - Microsoft SQL Server - Profiler. But in this whole book we
refer only to SQL Server 2005. We will go step by step through this
answer, explaining exactly how the “Index Tuning Wizard” can be used.
OK, before we define any indexes, let's try to understand what a “workload file” is. A “workload file” captures the complete activity that has happened on the server for a specified period of time. All the activity is recorded into a “.trc” file, which is called a “trace file”. Later the “Index Tuning Wizard” runs over the trace file and, for every query fired, tries to find which columns are valid candidates for indexes.
Following are the steps to use the “Index Tuning Wizard”:
• Create the trace file using “SQL Profiler”.
• Then feed the trace file to the “Database Tuning Advisor” to find out which columns should be indexed.
Create Trace File
Once you have opened the “SQL Profiler”, click on “New Trace”.
It will prompt you for the trace file details, for instance the “Trace Name” and the file where it should be saved. After providing the details, click on the “Run” button provided below. I have given the trace file the name “Testing.trc”.
Figure 12.14: - Trace File Details
And the action starts. You will notice that the profiler has started tracing queries hitting “SQL Server” and logging all those activities into the “Testing.trc” file. You can also see the actual SQL and the time when it was fired.
Let the trace run for some amount of time. In an actual production environment I run the trace for almost two hours during peak load to capture the actual load on the server. You can stop the trace by clicking on the red icon shown above.
Figure 12.16: - Stop Trace File.
You can go to the folder and see your “.trc” file. If you try to open it in Notepad you will see binary data; it can only be opened using the profiler. So now that we have the workload file, we just have to hand the advisor our problem (the trace file) and ask it to suggest some good indexes to improve database performance.
Using Database Tuning Advisor
To open the “Database Tuning Advisor” go to “Tools” - “Database Tuning Advisor”.
To supply the workload file you have to start a new session in the “Database Tuning Advisor”. After you select “New Session” you have to supply all the details for the session. There are two primary inputs you need to provide:
• Session Name
• “Work Load File” or “Table”
(Note: you can either create a trace file or have the profiler log
into a SQL Server table.)
I have provided my “Testing.trc” file, which was created when I ran the SQL Profiler. You can also filter which databases you need index suggestions for; at this moment I have checked all the databases. After all the details are filled in, click on the green icon with the arrow; you can see the tool tip “Start analysis” in the image below.
While analyzing the trace file it performs the following major steps:
• Submits the configuration information.
• Consumes the workload data (which can be in the form of a file or a database table).
• Performs analysis on all the SQL executed in the trace file.
• Generates reports based on the analysis.
• Finally gives the index recommendations.
You can see all the above steps have run successfully, which is indicated by “0 Errors and 0 Warnings”.
Figure 12.20: - Session completed with out Errors
Now it's time to see what index recommendations SQL Server has given us. Also note it has added two new tabs after the analysis was done: “Recommendations” and “Reports”.
You can see that on “AdventureWorks” SQL Server has given me a large number of recommendations. For example, on “HumanResources.Department” it has told me to create the index “PK_Department_DepartmentId”.
In case you want to see detailed reports you can click on the “Reports” tab; there is a wide range of reports which you can use to analyze how your database performs on that workload file.
Figure 12.22: - Reports by Advisor
Note: - The whole point of walking through this step by step was so
that you have a complete understanding of how to make “automatic index
decisions” using SQL Server. During an interview, one question that is
almost certain is “How do you increase the performance of SQL Server?”,
and talking about the “Index Tuning Wizard” can fetch you some decent
points.
Figure 12.23: - Click here to see execution plan
In the bottom window pane you will see the complete breakup of how your SQL query will execute. Following is the way to read it:
• Data flows from right to left: the right-most nodes are the data retrieval nodes. I have marked the two such nodes with arrows.
• Any execution plan sums to a total of 100%. For instance, in the below figure it is 18 + 28 + 1 + 1 + 52, so the biggest cost is the index scan at 52 percent. We can probably look into that logic and optimize this query.
• In the below figure you can see some arrows are thick and some are thin: the thicker the arrow, the more data is transferred.
• There are three types of join logic: nested loops join, hash join and merge join.
If you move your mouse gently over any node in the execution plan, you will see a detailed breakup of how that node is distributed.
Figure 12.25: - Complete Break up estimate
(DB) What is Nested join, Hash join and Merge join in SQL Query
plan?
A join is whenever two inputs are compared to produce an output.
There are three basic strategies for this:
• Nested loops join,
• Merge join and
• Hash join.
When a join happens, the optimizer determines which of these three algorithms is best for the given problem; however, any of the three could be used for any join. All of the costs related to the join are analyzed and the most cost-efficient algorithm is picked. These algorithms are executed in memory by SQL Server.
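Normally the optimizer picks the algorithm, but for experimentation you can force its choice with a join hint; table and column names in this sketch are hypothetical:

```sql
-- For experimentation only: force the optimizer's join choice with a hint.
SELECT c.Name, o.OrderId
FROM Customer c
JOIN Orders o ON o.CustomerId = c.CustomerId
OPTION (HASH JOIN);  -- or LOOP JOIN / MERGE JOIN
```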
Nested Loops Join
If you have less data this is the best strategy. It has two loops: an outer loop and an inner loop. For every row in the outer loop, it loops through all records in the inner loop. You can see the two loop inputs given to the operator: the top index scan is the outer loop, and the bottom index seek is the inner loop executed once for every outer record.
matches using the hash table created previously from the build table, does the processing and gives the output.
Figure 12.29: - Merge Join
Note: - Previously we discussed table scans and index scans; do revise
them, as they are also important for reading query plans.
RAID 5 works by writing parts of data across all drives in the set (it requires at least three drives). If a drive failed, the entire set would be worthless. To combat this problem, one of the drives stores a "parity" bit. Think of a math problem such as 3 + 7 = 10. You can think of the drives as each storing one of the numbers, and the 10 is the parity part. If you lose any one of the numbers, you can get it back by referring to the other two, like this: 3 + X = 10. Of course, losing more than one would be disastrous. RAID 5 is optimized for reads.
RAID 10 is a bit of a combination of both types. It does not store a parity bit, so it is fast, but it
duplicates the data on two drives to be safe. You need at least four drives for RAID 10. This type
of RAID is probably the best compromise for a database server.
What are triggers and what are the different kinds of triggers?
Triggers are a special kind of stored procedure. They execute in response to data modification on a table. There are two types of triggers: “INSTEAD OF triggers” and “AFTER triggers”.
An "INSTEAD OF trigger" executes in place of the data modification (the original statement does not run unless the trigger performs it), while an "AFTER trigger" executes after the data modification has happened.
Figure: - Triggers
In what scenarios will you use an INSTEAD OF trigger and an AFTER trigger?
You will use an “INSTEAD OF trigger” to take an alternative action in place of the update. An “AFTER trigger” is useful when you want to execute trigger logic after the data has been updated, for example:
• For recording an audit trail, where you want new and old values to be inserted into an audit table.
• Updating values after the update has happened. For instance, in a sales table, if the number of products and the price per product are inserted, you can create an AFTER trigger which calculates the total sales amount using these two values.
What are the "inserted" and "deleted" tables in triggers?
During triggers we sometimes need the old and new values. "Inserted" and "deleted" are
temporary tables created by SQL Server itself which hold the new and old values. The
inserted table has one record for the newly added data and the deleted table has one
record for the old version of the data.
So for instance, let's say we add a new record called "Shiv". The inserted table will
have "Shiv" and the deleted table will be empty because the record did not exist earlier.
Now let's say some user updates "Shiv" to "Raju". Then the inserted table will have
"Raju" and the deleted table will have "Shiv".
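A minimal audit-trail sketch using these two tables (all table and column names here are hypothetical):

```sql
-- On update, log the old and new customer names to an audit table
CREATE TRIGGER trgCustomerAudit ON tblCustomer
AFTER UPDATE
AS
BEGIN
    INSERT INTO tblCustomerAudit (CustomerId, OldName, NewName, ChangedOn)
    SELECT d.CustomerId, d.Name, i.Name, GETDATE()
    FROM deleted d
    JOIN inserted i ON d.CustomerId = i.CustomerId;
END
```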
What is a view?
A view is a virtual table which contains data from different base tables. A base table
means an actual physical table. For instance, you can see in the below figure we have a view
which takes data from two base tables, "Customer" and "Address". From "Customer" the
view fetches the "Name" field and from "Address" it fetches the phone number field.
In essence a view does not contain data; it's an encapsulated query which fetches data
from multiple tables.
How do you create a view?
To create a view, right click on the "Views" folder and then write the query for the
view.
Or you can also use the "Create View" statement.
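Based on the Customer/Address example above, a sketch of such a statement (the table and column names are assumptions, not taken from the book's figure):

```sql
-- View exposing only the customer's name and phone number
CREATE VIEW vwCustomerPhone
AS
SELECT c.Name, a.PhoneNumber
FROM Customer c
JOIN Address a ON c.CustomerId = a.CustomerId;
```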
There are three big benefits of using a view:-
• Simplifying things, i.e. hiding complexity: - If you have a complex SQL query,
rather than writing it again and again, create a view and keep calling the view.
• Security: - Sometimes you would like to give controlled access to a table. So
you can create a view and grant security on it accordingly. For instance, if you
want a particular user to access only certain fields, you can create a view of
those fields and give access on that view, while the user is not allowed to access
the actual physical table.
• Changes: - Due to unavoidable situations we sometimes change field names. Now
if those tables are used in queries directly, the queries will crash. So what we can do is
create views and use those views in the queries. Now if any field name changes,
we just need to change the view and not all the queries.
You can only update columns of a view if they come from the same base table. If your
view uses multiple base tables, you cannot update columns from multiple base tables in a
single statement.
For instance, if you have a view which uses the "Customer" and "Address" base tables, you
can either update the "Customer" base table's columns or the "Address" base table's
columns at a time, but not both of them in one go.
In case you try to update multiple base tables, you will get an error as shown in the
above figure.
Is it possible to insert NULL value in to unique keys ?
The whole point of unique keys is to have unique values. SQL Server treats NULL as one
of those values, so it allows you to insert one NULL value (stressing it, exactly one
NULL value) in to the table.
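A quick sketch demonstrating this behavior (the table name is illustrative):

```sql
-- A unique key column accepts exactly one NULL in SQL Server
CREATE TABLE tblUniqueDemo
(
    Code INT NULL,
    CONSTRAINT UQ_Code UNIQUE (Code)
);
INSERT INTO tblUniqueDemo (Code) VALUES (NULL);  -- succeeds
INSERT INTO tblUniqueDemo (Code) VALUES (NULL);  -- fails: duplicate key
```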
• Collect data: - Data in an enterprise can be stored in various formats. These
formats can vary from normalized structured RDBMS to Excel sheets or probably
unstructured file formats. So the first step in BI is to collect all this unstructured
and scattered data and bring it in to one uniform format.
What is the difference between data warehouse and data mart?
A data warehouse is a database which is used to store data for reporting and data analysis.
Data in a data warehouse comes from disparate sources like structured RDBMS, files or
any other source. ETL fetches data from these sources and loads it in to the data warehouse.
A data warehouse can be further divided in to data marts. A data warehouse focuses on
enterprise-wide data while a data mart focuses on a single process.
Data warehouses are also termed OLAP systems. The database designs for these
systems do not follow conventional normalization (1st, 2nd or 3rd normal form) design.
Most of them use denormalized designs like star schema and snow flake design. They
normally store data in fact and dimension tables.
What is the difference between ROLAP, MOLAP and HOLAP?
ROLAP stands for Relational Online Analytical Processing, MOLAP stands for
Multidimensional Online Analytical Processing and HOLAP stands for Hybrid Online
Analytical Processing.
When we talk about BI it has two kinds of data, one is the actual data and other is the
aggregated data. For instance you can see in the below table we have customer name,
quantity, per price and total amount. In this case the total amount column is a derived or
aggregated data while all the other fields are actual data (detail data).
ROLAP, MOLAP and HOLAP define storage structure for Business intelligence. It
defines how the actual (detail) data and aggregated data are stored.
ROLAP stores both detail data and aggregated data in relational format. In this case query
performance is low but latency is also low. Low latency means you get
aggregated/calculated data almost instantly.
MOLAP stores data in cubes i.e. multi-dimensional format. It uses facts, measures and
dimension table structure to form a cube (read the previous question for cubes). Multi-
dimensional database design format is optimized for better query performance. So the
performance is much better as compared to ROLAP but the latency is high.
The latency is high in MOLAP because it needs to fetch data from relational database
(operational data), do calculations / aggregations / and convert the relational structure
data to cube structure data.
HOLAP is a hybrid approach. It’s a combination of MOLAP and ROLAP. HOLAP stores
detail data in to relational database i.e. (ROLAP) and the aggregated data is stored in to
MOLAP (cubes).
Due to this approach query is faster than ROLAP but not as fast as MOLAP. Latency is
less than MOLAP but higher than ROLAP.
        Storage structure     Query performance   Latency
ROLAP   Relational            Low                 Low
MOLAP   Cubes                 High                High
HOLAP   Relational + Cubes    Medium              Medium
What is the difference between SSIS, SSAS and SSRS?
SSIS (SQL Server Integration Services) helps to collect data. In other words it does ETL
(Extract, Transform and Load) as explained in the previous question.
SSAS (SQL Server Analysis Services) helps us to analyze data by creating cubes, facts,
dimensions and measures.
SSRS (SQL Server Reporting Services) helps us to view this analyzed data in different
formats like graphical, tabular etc.
Chapter 4:- Business intelligence (SSIS)
Data flow defines the flow of data from a source to a destination. In other words it defines
an individual ETL.
Control flow defines workflow (iterations, conditional checks etc.) and component
execution (example: - send email, copy files, invoke web services, FTP etc.). Control
flow has nothing to do with data; the data flow alone deals with data, i.e. movement,
transformation and loading.
For instance, below is a simple example; let's understand where data flow and control flow
fit in. In this example we need to read files from a folder and load each one in
SQL Server. Once a file is loaded, delete the file and proceed with the next file in the
folder. This continues until there are no more files present in the folder.
In the above process, the data flow will take care of reading the file, mapping and loading it to
SQL Server. The control flow will delete the file once the data flow has loaded it in
to the database, check if there are more files in the folder and accordingly invoke the
data flow again.
Data flow task: - It defines source, mapping and transformation. It defines the core ETL
process.
Control flow task: - This section helps to define logic and invoke tasks like data flow
tasks, the send email task, the web service task etc.
Package: - It’s a collection of control flow tasks. It’s the unit of work which is retrieved,
executed and saved.
Client: - The SSIS system can be connected to via various clients like the SSIS designer which
comes with the BI IDE, custom applications, the SSIS wizard etc. These clients use the object
model to communicate with the SSIS system.
Connection manager and data sources: - They help us to define connection objects
with data sources which can be reused in data flow tasks.
DTSX files, MSDB and integration service: - The complete package can be stored in
DTSX files or in the database. These packages can later be connected to and invoked using the
integration service which runs in the background as a Windows service.
Event handlers: - When you run any SSIS package you would like to trap various events
like OnError, OnPostExecute, OnPreExecute etc. to run certain logic like logging to a
database, sending emails etc.
What are the different locations of storing SSIS packages?
You can store SSIS packages in three locations: physical files (file system), the SSIS package
store or SQL Server.
There are two types of variables: user defined and system defined variables. System
defined variables are ready-made variables given by the system. Some examples of
system defined variables are package name, last modified etc.
User defined variables are created by SSIS developers. User defined variables can be
created by right clicking on the package / data flow – variables – add variables.
User defined variables can have a package scope, data flow scope, container scope etc.
The "For Loop Container" executes a specified number of times, like 10 times or 20 times,
until the specified condition is met.
The "Foreach Loop Container" runs over an iterator. This iterator can be files from a
folder, records from ADO, data from a variable etc.
What are precedence constraints in SSIS?
Precedence constraints link tasks/containers and also specify the conditions on which
those tasks/containers should execute.
To add a precedence constraint, right click on any task and click “Add precedence
constraint”. You can then specify precedence constraint using the editor as shown in the
below figure.
Sequence containers group a set of tasks logically. They help to define multiple control
flows inside a package.
Following are the benefits of using sequence containers:-
• Makes your package more readable and easier to maintain. You can expand and
collapse the container for ease of reading during design mode.
• You can define transactions for a group of tasks by specifying the transaction attribute
at the sequence container level.
• Define variables which are scoped only to the container.
• You can focus debugging on a particular container and disable other
containers.
• Manage properties at the container level rather than on individual tasks. For instance
you can enable and disable a sequence container rather than enabling and
disabling each task.
How can we consume web services in SSIS?
We can consume web services in SSIS by using the "Web Service" task. The data which
is received from the web service can be directed to a file or an SSIS variable.
Many times you get raw data (like the sample shown below) and you would like to understand
what kind of quality this data has. For example, for the below data you would
probably like to know:-
Yadav 91-022-2130928933 8/13/1983 1000 IND AQPR D010 IND 5
Dinesh dinesh@yahoo.com 1/17/1966 5000 IND E007 D011 IND 5
This can be achieved by using the data profiling task. The data profiling task is available
in the control flow toolbox.
The following steps need to be followed:-
• Create a profile request in the data profiling task.
• Once you run the data profiling task it creates an XML output.
• You can then view the XML output using the data profile viewer. The data profile viewer
exists in the "C:\Program Files (x86)\Microsoft SQL Server\100\DTS\Binn"
directory.
Below are more details of what kind of data analysis is performed by these 8 profile
requests.
For example, the value inclusion profile request checks the overlap of values between
columns and helps to detect a likely foreign key column.
What is the difference between the Merge and Merge Join components?
In Merge, the data from two inputs is merged as one. In Merge Join, two inputs are
merged on the basis of a common key. In Merge Join you can specify a left join, right join or
inner join depending on the join key.
A big “NO”.
How can you send a single data source output to multiple SSIS
controls?
This can be achieved by using the Multicast control. For instance you can see in the below
figure we have a single ADO.NET source. Data coming from this single data source
needs to be used in a lookup control (search customer), a sort control (for sorting by
customer name) and a conditional split control (check sales > 10000).
You can see how the SSIS multicast control has three outputs which are broadcast to all
these three controls.
You have millions of records in production; you want to sample some
data to test an SSIS package ?
Many times we would like to test an SSIS project against production data. Normally
production data is huge in quantity. If you run the project against the full production data
it can take ages to run and test your SSIS project.
For those kinds of scenarios we have two great SSIS components: "Percentage
Sampling" and "Row Sampling". "Row Sampling" extracts an exact number of records. So
for example, if you have 150 records and you want to sample only 50, "Row Sampling" is the
way to go.
But in some scenarios we want to sample a percentage of the data rather than an absolute
number of rows. For instance, from 150 records we may want to sample 22%, which comes
to 33 records. For those kinds of scenarios we can use "Percentage Sampling".
Both these components are found in the SSIS toolbox as shown in the below figure.
What is the use of “Audit” component ?
How can we apply scale-out architecture for SQL Server Analysis
Services?
Before we answer this question let's first define scalability, scale-up and scale-out.
Scalability is the ability of a system to handle a growing amount of load without degrading
performance. Consider you have an SSAS system which runs efficiently with 100 users.
Let's say over a period of time your load increases to 500 users; your system should still have
the ability to handle the load and provide the same efficiency.
You can achieve scalability in two ways: you can either scale up or scale out. In scale-
up we have only one machine and we increase the processing power of the machine by
adding more processors, more RAM, more hard disk etc. So if your load increases you add
more RAM, more processors etc., but you do not add extra physical machines.
In scale-out, as your load increases you add more computers and you have load balancers
in front of those machines to distribute the load appropriately.
Now when we talk about SQL Server Analysis Services we have two big tasks: one is
querying the analysis services cube and the other is processing the cube. So we can create a
scale-out architecture by having dedicated machines for serving the SQL Server
Analysis Services queries and dedicated machines for processing the cube.
Below is how the physical architecture shapes up. The boxes represent physical
computers. So you can see how we have separate physical machines to handle SQL Server
Analysis Services queries and separate physical machines which pull data from the data
source, process the cube and replicate the cube data to the query servers.
You want your cube to support localization ?
Chapter 2:- SQL Server 2012
What are the new features which are added in SQL Server 2012?
Relational databases store data "row wise". These rows are further stored in pages of
8 KB size.
For instance you can see in the below figure we have a table with two columns, "Column1"
and "Column2". You can see how the data is stored in two pages, i.e. "Page1" and
"Page2". "Page1" has two rows and "Page2" also has two rows.
Now if you want to fetch only "Column1", you have to pull records from both pages, i.e.
"Page1" and "Page2"; see below for the visuals.
As we have to fetch data from two pages, it's a bit performance intensive.
If somehow we can store data column wise, we can avoid fetching data from multiple
pages. That's what column store indexes do. When you create a column store index it
stores the same column's data in the same pages.
You can see from the below visuals, we now need to fetch “column1” data only from one
page rather than querying multiple pages.
Let's discuss the second point. When data is stored row wise it's not ideal for
compression, as you get disparate data from various columns. When you create
column store indexes, the compression algorithm exploits repeating/similar data and makes
compression more efficient. You can expect 50% or more benefit in space saving when
you use column store indexes.
To create a column store index, expand the table, right click on the indexes folder and
click on "Non-Clustered Columnstore Index". Once you click on this menu you can
specify the columns on which the column store index should be applied. See the below figure
for visuals.
Column store indexes were mainly created for OLAP applications to increase SELECT
performance. Due to this focused implementation, below are some of the important limitations:-
• Insert, update and delete SQL commands do not work on a table which has a column store
index.
• Column store indexes do not support the following data types :-
o decimal greater than 18 digits
o binary and varbinary
o BLOB
o CLR
o (n)varchar(max)
o datetimeoffset with precision greater than 2
• Replication cannot be implemented.
• Indexed views cannot be applied.
BISM stands for Business Intelligence Semantic Model. It actually unifies the multi-dimensional
model and the tabular model. In previous SSAS (SQL Server Analysis Services) versions, if you
wanted to do analysis, your database design structure needed to comply with the OLAP model.
In other words you needed to define the structure in facts and measures (star schema or snow flake).
But there are certain classes of people (developers, people working with Excel) who are very
comfortable with tabular formats, who think in terms of tables and relationships. That's
where the tabular model was introduced.
So if you are a pure corporate BI guy, use the multi-dimensional model, and if you are a
personal BI person (a developer or a simple end user who uses Excel) you can use the
tabular model.
What are Sequence objects ?
A sequence object generates a sequence of unique numeric values as per its specification.
Many developers will now be thinking we have something similar to this
called "identity" columns. But the big difference is that a sequence object is independent of a
table, while identity columns are attached to a table.
Below is a simple code snippet to create a sequence object. You can see we have created a
sequence object called "MySeq" with the following specification:-
• Starts with value 1.
• Increments by value 1.
• Minimum value is zero.
• Maximum it will go to is 100.
• "no cycle" defines that once it reaches 100 it will throw an error. If you want it to restart
from the minimum value you should specify "cycle".
• "cache 50" specifies that 50 values are pre-allocated in memory to
reduce IO. If you specify "no cache" it will hit the disk for every increment.
create sequence dbo.MySeq as int
start with 1   -- Starts with value 1
increment by 1 -- Increments by value 1
minvalue 0     -- Minimum value to start is zero
maxvalue 100   -- Maximum it can go to 100
no cycle       -- Do not go above 100
cache 50       -- Increment 50 values in memory rather than
               -- incrementing from IO
To increment the value we need to call the below select statement. This is one more big
difference as compared to identity: in identity the value increments when rows are added,
here we need to make an explicit call.
SELECT NEXT VALUE FOR dbo.MySeq AS seq_no;
What are the "OFFSET" and "FETCH" commands?
These are new commands in SQL Server 2012. They help to do pagination. See the next
question which answers the same in more detail.
There are instances when you want to display large result sets to the end user. The best
way to display a large result set is to split it, i.e. apply pagination. Developers have had
their own hacky ways of achieving pagination, using "top", "row_number" etc.
But from SQL Server 2012 onwards we can do pagination by using the "OFFSET" and
"FETCH" commands.
For instance, let's say we have the following customer table which has 12 records. We
would like to split the records in to 6 and 6.
• First specify from which row to start by using the "OFFSET" command.
• Second specify how many rows you want to fetch by using the "FETCH" command.
You can see in the below code snippet we have used "OFFSET" to mark the start of the rows
at position "0". A very important note: an ORDER BY clause is compulsory for the "OFFSET"
command.
select * from
tblcustomer order by customercode
offset 0 rows
In the below code snippet we have specified we want to fetch “6” rows.
fetch next 6 rows only
Now if you run the above SQL you should see 6 rows.
To fetch the next 6 rows just change your "OFFSET" position. You can see in the below
code snippet I have modified the offset to 6. That means the row start position will be from
"6".
select * from
tblcustomer order by customercode
offset 6 rows
The above code snippet displays the next “6” records , below is how the output looks.
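Putting the two clauses together, a complete page query looks like the sketch below (tblcustomer is the sample table assumed above):

```sql
-- Fetch the second page of 6 customers
select * from tblcustomer
order by customercode
offset 6 rows
fetch next 6 rows only;
```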
What are contained databases in SQL Server?
This is a great feature for people who go through the pain of SQL Server database
migration again and again. One of the biggest pains in migrating databases is user
accounts. SQL Server users reside either in Windows Active Directory or at the SQL
Server level as SQL Server users. So when we migrate a SQL Server database from one
server to another, these users have to be recreated again. If you have lots of users, you
would need one dedicated person just sitting and recreating them for you.
So one of the requirements from an easy-migration perspective is to create databases which
are self-contained. In other words, can we have a database with the meta-data information,
security information etc. within the database itself, so that when we migrate the database,
we migrate everything with it. That's where "contained" databases were introduced in
SQL Server 2012.
Step 1:- The first thing is to enable contained databases at the SQL Server instance level.
You can do the same by right clicking on the SQL Server instance and setting "Enabled
Contained Database" to "true".
You can achieve the same by using the below SQL statements as well.
sp_configure 'show advanced options',1
GO
RECONFIGURE WITH OVERRIDE
GO
sp_configure 'contained database authentication', 1
GO
RECONFIGURE WITH OVERRIDE
GO
Step 2:- The next step is to enable containment at the database level. So when you create a
new database, set "Containment type" to partial as shown in the below figure.
You can also create database with “containment” set to “partial” using the below SQL
code.
CREATE DATABASE [MyDb]
CONTAINMENT = PARTIAL
ON PRIMARY
( NAME = N'My', FILENAME = N'C:\My.mdf')
LOG ON
( NAME = N'My_log', FILENAME =N'C:\My_log.ldf')
Step 3:- The final thing now is to test whether the "contained" database fundamental is
working or not. We want the user credentials to be part of the database, so we need to
create the user as a "SQL user with password".
You can achieve the same by using the below script.
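A sketch of such a script (the user name and password are illustrative):

```sql
-- Create a user whose password is stored inside the database itself,
-- not at the SQL Server instance level
USE MyDb;
GO
CREATE USER MyContainedUser WITH PASSWORD = 'Str0ng@Password';
```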
Now if you try to login with the user created, you get an error as shown in the below
figure. This proves that the user is not available at the SQL Server instance level.
Now click on options and specify the database name in "Connect to database"; you
should be able to login, which proves that the user is part of the database and not of the
SQL Server instance.
(Q) What is Collation in SQL Server?
Collation refers to a set of rules that determine how data is sorted and compared. Character data is
sorted using rules that define the correct character sequence, with options for specifying case-
sensitivity, accent marks, kana character types, and character width.
Figure 1.19: - Collation according to language
Case sensitivity
If A and a, B and b, etc. are treated in the same way then it is case-insensitive. A computer treats
A and a differently because it uses ASCII codes to differentiate the input. The ASCII value of A is
65, while a is 97. The ASCII value of B is 66 and b is 98.
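As a quick illustration, the result of a comparison depends on the collation applied (the SQL_Latin1_General collations ship with SQL Server):

```sql
-- Case-insensitive collation: 'A' equals 'a'
SELECT CASE WHEN 'A' = 'a' COLLATE SQL_Latin1_General_CP1_CI_AS
            THEN 'equal' ELSE 'not equal' END;  -- equal

-- Case-sensitive collation: 'A' does not equal 'a'
SELECT CASE WHEN 'A' = 'a' COLLATE SQL_Latin1_General_CP1_CS_AS
            THEN 'equal' ELSE 'not equal' END;  -- not equal
```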
Accent sensitivity
If "a" and "á", "o" and "ó" are treated in the same way, then it is accent-insensitive. A computer
treats "a" and "á" differently because it uses character codes to differentiate the input. The code
of "a" is 97 and "á" is 225. The code of "o" is 111 and "ó" is 243.
Kana Sensitivity
When Japanese kana characters Hiragana and Katakana are treated differently, it is called Kana
sensitive.
Width sensitivity
When a single-byte character (half-width) and the same character when represented as a double-
byte character (full-width) are treated differently then it is width sensitive.
Chapter 2: SQL
Note: - This is one of the crazy things which I did not want to put in
my book. But when I sampled some real interviews conducted
across companies, I was stunned to find some interviewers judging
developers on syntax. I know many people will conclude this is
childish, but it's the interviewer's decision. If you think this
chapter is not useful you can happily skip it, but I think freshers
should not.
Note: - I will be heavily using the "AdventureWorks" database, which is
a sample database shipped with SQL Server 2005 (in previous versions
we had the famous "Northwind" sample database). Below is a
view expanded from "SQL Server Management Studio".
Select * from Person.Address
Select AddressLine1, City from Person.Address
Select AddressLine1, City from Person.Address where city ='Sammamish'
Orders.Customerid. So this SQL will only give you result with customers who have orders. If the
customer does not have order, it will not display that record.
(Q) You want to select the first record in a given set of rows?
select top 1 * from sales.salesperson
select * from sales.salesperson order by salespersonid desc
               Delete                             Truncate
Filters        We can specify a where clause.     Deletes all records; we cannot
                                                  specify filters.
Data removal   Removes data one row at a time.    Removes data by deallocating pages.
                                                  Pages are 8 KB units which store
                                                  row data.
Performance    Slower, as delete happens          Faster as compared to delete, as
               row wise.                          data is removed in pages.
Triggers       Triggers are executed, as the      Delete triggers are not fired.
               data is removed row wise.
Identity       The identity value is retained.    The identity is reset back to its
                                                  seed value.
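The identity behavior in the last row can be sketched with a hypothetical table:

```sql
-- Contrast DELETE and TRUNCATE identity behavior
create table tblDemo (id int identity(1,1), name varchar(50));
insert into tblDemo (name) values ('A'), ('B');   -- ids 1, 2

delete from tblDemo;                              -- identity is retained
insert into tblDemo (name) values ('C');          -- gets id 3

truncate table tblDemo;                           -- identity reset to seed
insert into tblDemo (name) values ('D');          -- gets id 1
```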
Select AddressLine1 from person.address where AddressLine1 like '_h%'
So all data where second letter is “h” is returned.
This returns 39228 rows ("union all" does not check for duplicates, so duplicate records
show up and the count is double).
Note: - The selected columns should have the same data types or else the syntax
will not work.
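A small sketch of the difference, using the Person.Address table from AdventureWorks:

```sql
-- UNION ALL keeps duplicates, so every city value appears twice
select City from Person.Address
union all
select City from Person.Address;

-- UNION removes duplicates, so each distinct city appears once
select City from Person.Address
union
select City from Person.Address;
```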
(Q) What are cursors and what are the situations you will use them?
SQL statements are good for set-at-a-time operations, so they are good at handling sets of
data. But there are scenarios where you want to process rows one by one, updating each row
depending on certain criteria. That is where cursors come into the picture.
This is a small sample which uses the "Person.Address" table. This T-SQL program will only
display records which have "@Provinceid" equal to "7".
DECLARE @provinceid int
-- Declare Cursor
DECLARE provincecursor CURSOR FOR
SELECT stateprovinceid
FROM Person.Address
-- Open cursor
OPEN provincecursor
-- Fetch data from cursor in to variable
FETCH NEXT FROM provincecursor
INTO @provinceid
WHILE @@FETCH_STATUS = 0
BEGIN
-- Do operation according to row value
if @Provinceid=7
begin
PRINT @Provinceid
end
-- Fetch the next cursor
FETCH NEXT FROM provincecursor
INTO @provinceid
END
-- Finally do not forget to close and deallocate the cursor
CLOSE provincecursor
DEALLOCATE provincecursor
[FOR UPDATE [OF column_list]]
STATIC
A STATIC cursor is a fixed snapshot of a set of rows. This fixed snapshot is stored in tempdb. Because the cursor uses a private snapshot, any external changes to the set of rows will not be visible in the cursor while browsing through it. You can define a static cursor using the "STATIC" keyword.
KEYSET
With KEYSET, the key values of the rows are saved in tempdb. For instance, say the cursor has fetched the data shown below; only the "supplierid" (the key) is stored. Any new inserts that happen are not reflected in the cursor, but any updates to the key-set rows are reflected in the cursor. Because the cursor is identified by key values, you can also fetch rows absolutely, using "FETCH ABSOLUTE 12 FROM mycursor".
DYNAMIC
In a DYNAMIC cursor you can see every kind of change happening: inserts of new records, changes to existing records, and even deletes. That is why DYNAMIC cursors are the slowest and have the least performance.
FORWARD_ONLY
As the name suggests, these cursors only move forward, and each row can be fetched only once. On every fetch the cursor is evaluated, which means any changes to the data are visible, unless you have specified "STATIC" or "KEYSET".
FAST_FORWARD
These cursors are forward-only and read-only, and they are not re-evaluated on every fetch. This makes them a good choice for increasing performance.
(Q) What are “Global” and “Local” cursors?
By default, cursors are global to a connection. That means you can declare a cursor in one stored procedure and access it outside as well. Local cursors are accessible only inside the object in which they are declared (which can be a stored procedure, trigger or function). You can declare a cursor as "LOCAL" or "GLOBAL" in the "DECLARE CURSOR" syntax; refer to the "DECLARE" statement of the cursor in the previous sections.
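As a sketch, reusing the Person.Address cursor from the previous question, the only change is the LOCAL (or GLOBAL) keyword in the declaration:

```sql
-- Visible only inside the batch/procedure/trigger where it is declared;
-- omit LOCAL (or write GLOBAL) to keep it accessible for the whole connection
DECLARE provincecursor CURSOR LOCAL FOR
    SELECT stateprovinceid FROM Person.Address
OPEN provincecursor
-- ... FETCH as shown earlier ...
CLOSE provincecursor
DEALLOCATE provincecursor
```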
Figure 2.9: - Group by applied
Figure 2.10: - Salesorder displayed without ROLLUP
So after using ROLLUP you can see the sub-totals. The first row is the grand total, followed by sub-totals for each combination of "Productid" and "Specialofferid". ROLLUP retrieves a result set that contains aggregates for a hierarchy of values in the selected columns.
Figure 2.11: - Subtotal according to product using ROLLUP
select sales.salesterritory.name ,
count(sales.salesperson.territoryid) as numberofsalesperson
from sales.salesperson
inner join sales.salesterritory on
sales.salesterritory.territoryid=sales.salesperson.territoryid
group by sales.salesperson.territoryid,sales.salesterritory.name
having count(sales.salesperson.territoryid) >= 2
Note: - You can see the HAVING clause applied. In this case you cannot specify the condition in a "WHERE" clause; it will throw an error. In short, the "HAVING" clause applies a filter on a group, while the "WHERE" clause filters individual rows.
Figure 2.14: - Actual Data
So when we do a TOP 3 on the "ProductCost" table we will see three rows, as shown below. But p4 has the same value as p3; SQL simply cut the result off at three rows. So if you want to display tied data like this, you can use "WITH TIES".
Note: - You must have an "ORDER BY" clause and the "TOP" keyword specified, or else "WITH TIES" is not of much use.
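A minimal sketch; the ProductCost table and its column names here are assumptions standing in for the figure:

```sql
-- TOP 3 alone would cut the result at exactly three rows;
-- WITH TIES also returns any rows that tie with the last one (e.g. p4 tied with p3)
SELECT TOP 3 WITH TIES ProductId, Cost
FROM ProductCost
ORDER BY Cost DESC    -- the ORDER BY is mandatory for WITH TIES
```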
(Q) What does the "SET ROWCOUNT" syntax achieve?
Twist: - What is the difference between “SET ROWCOUNT” and “TOP” clause
in SQL?
"SET ROWCOUNT" limits the number of rows returned or affected by subsequent statements. It looks very similar to the "TOP" clause, but there is a major difference in the way the SQL is executed. "TOP" is part of the individual SELECT statement, so the query optimizer can take the row limit into account when building the plan. "SET ROWCOUNT" is a session-level setting applied at execution time: it stops processing as soon as the specified number of rows has been produced, and it also affects INSERT, UPDATE and DELETE statements until it is reset with SET ROWCOUNT 0.
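A small sketch of the session-level behaviour:

```sql
SET ROWCOUNT 5                      -- limit subsequent statements to 5 rows
SELECT * FROM sales.salesperson     -- returns at most 5 rows
SET ROWCOUNT 0                      -- 0 switches the limit off again
```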
A subquery is a query inside another query. Many times we would like a chain of SQL statements, where the output of one SQL statement serves as the input to another.
For example, in the figure below you can see two queries (Query 1 and Query 2). Query 1 (the inner query) fetches records whose salaries are greater than 150, and that result is fed to Query 2 (the outer query). The outer query takes the data given by the inner query and displays the address and phone number.
This type of query is called a subquery.
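Since the figure is not reproduced here, the table and column names below are assumptions; the shape of a subquery is what matters:

```sql
-- Inner query: ids of employees whose salary is greater than 150
-- Outer query: displays address and phone number for those ids
SELECT Address, Phone
FROM Employee
WHERE EmployeeId IN (SELECT EmployeeId
                     FROM Employee
                     WHERE Salary > 150)
```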
What are correlated queries?
Correlated queries are like subqueries, i.e. a query inside a query, but in a correlated query data passes back and forth between the inner and outer query.
One of the scenarios where correlated queries come in handy is finding the second highest, third lowest, etc. from a table. Below is a simple correlated query which finds the second highest value in a table.
First a record from the outer query is passed to the inner query; the inner query then evaluates it, and if the evaluation is true the row is output. This process continues until all records are finished.
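A sketch of the second-highest pattern against a hypothetical Employee table; note how the inner query references e1 from the outer query, which is what makes it correlated:

```sql
-- A salary is the second highest when exactly one distinct salary is larger
SELECT Salary
FROM Employee e1
WHERE 1 = (SELECT COUNT(DISTINCT Salary)
           FROM Employee e2
           WHERE e2.Salary > e1.Salary)
```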
What is the difference between a correlated query and a subquery?
Can you explain COALESCE in SQL Server?
COALESCE returns the first non-NULL value from a list of columns. For instance, in the table below either "FirstName" or "SurName" has NULL values in each row. If you pull values using the COALESCE function, it will return, for each row, the value from the column which is not NULL.
So:-
• If "SurName" is NULL and "FirstName" is not NULL, then "FirstName" is returned.
• If "FirstName" is NULL and "SurName" is not NULL, then "SurName" is returned.
Below is the output.
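A sketch, assuming the table from the figure is called Names:

```sql
-- Per row, COALESCE returns the first of its arguments that is not NULL
SELECT COALESCE(FirstName, SurName) AS DisplayName
FROM Names
```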
What is a CTE (Common Table Expression)?
A CTE is a temporary result set which can be used within the execution of a SINGLE insert, update, delete or select query.
A CTE can be used only once in the same execution. In the code snippet below we have created a simple CTE called "MyTemp". In the same execution I have tried to use it twice, and you can see how the second select failed to identify "MyTemp".
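A sketch of the scoping rule, reusing the "MyTemp" name from the text (the Names table is an assumption):

```sql
WITH MyTemp AS
(
    SELECT Name FROM Names
)
SELECT * FROM MyTemp    -- works: the CTE feeds this single statement
-- SELECT * FROM MyTemp -- would fail here: the CTE is already out of scope
```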
Can you give some real-world examples where a CTE is useful?
• Complex SQL queries can be broken down using CTEs, which makes your code more readable. Note that this can have a performance side effect.
• Recursive queries.
• A replacement for views when you do not want to store the metadata.
• When you want to use aggregate functions in a WHERE clause.
• When you want to group by scalar values which are derived from a result set.
How do we delete duplicate records from a table which does not have a primary key?
Let me first explain what this question is about. Say you have a table with names as shown in the figure below. The Names table has duplicate records (e.g. Shiv) and no primary key.
So the question is: how can we delete the duplicate records and keep one record from each set of duplicates? For example, from the table below, how can we delete one "Shiv" and keep the other one?
Note: - One more constraint many SQL Server interviewers add here is that you cannot add an identity column to the table.
Now there are lots of ways of doing this, and the best I found personally was by using "ROW_NUMBER" and a "CTE" (common table expression). In case you are not aware of CTEs and ROW_NUMBER, please refer to the previous questions.
It is a two-step process:-
• Create a temporary result set using a CTE which has a new column generated by "ROW_NUMBER" with a partition.
"ROW_NUMBER" with a partition restarts the numbering for each distinct value and increments the number when records are the same. For instance, you can see that for the duplicate Shiv records it has numbered them 1 and 2, but for "Raju" and "Shyam" it has started fresh number sequences.
• Once the CTE is created, delete the records whose row sequence number is greater than one.
Below is the complete code snippet for the same.
with TempNames as
(
    select row_number() over (partition by Name order by Name) as RowNo, Name
    from Names
)
delete from TempNames where RowNo > 1
key.)
Truncate: Temp table - Yes | Table variable - No
Alter Table: Temp table - Yes | Table variable - No (it is just a variable)
Affected by SQL Server optimization: Temp table - Yes | Table variable - No
Parallelism: Temp table - Yes | Table variable - No
BEGIN TRY
DELETE table1 WHERE id=122
END TRY
BEGIN CATCH
SELECT
ERROR_NUMBER() AS ErrNum,
ERROR_SEVERITY() AS ErrSev,
ERROR_STATE() as ErrSt,
ERROR_MESSAGE() as ErrMsg;
END CATCH
(Q) What is the PIVOT feature in SQL Server?
The PIVOT feature converts row data to columns for a better analytical view. Below is a simple PIVOT fired over a CTE: the first section is the CTE, which is the input, and then PIVOT is applied over it.
WITH PURCHASEORDERHEADERCTE(Orderdate,Status,Subtotal) as
(
Select year(orderdate),Status,isnull(Subtotal,0) from
purchasing.PURCHASEORDERHEADER
)
Select Status as OrderStatus,isnull([2001],0) as 'Yr 2001'
,isnull([2002],0) as 'Yr 2002' from PURCHASEORDERHEADERCTE
pivot (sum(Subtotal) for Orderdate in ([2001],[2002])) as pivoted
You can see from the above SQL that the top WITH statement is the CTE supplied to the PIVOT. After that, PIVOT is applied on Subtotal and Orderdate. You have to specify the values you want to pivot on (here 2001 and 2002). Below is the output of the CTE table.
After the PIVOT is applied, you can see the rows are now grouped column-wise with the subtotal assigned to each. To summarize, PIVOT presents your data in cross-tab format.
(Q) What is UNPIVOT?
It is exactly the reverse of PIVOT: you have pivoted data and you want to unpivot it back into rows.
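A sketch that reverses the PIVOT example above; the pivoted output is assumed to be stored in a hypothetical PivotedOrders table:

```sql
-- The year columns [2001] and [2002] are turned back into
-- (Orderdate, Subtotal) rows, one row per non-NULL column value
SELECT Status, Orderdate, Subtotal
FROM PivotedOrders
UNPIVOT (Subtotal FOR Orderdate IN ([2001], [2002])) AS unpivoted
```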
The ROW_NUMBER() function adds a column that displays a number corresponding to the row's position in the query result. If the column that you specify in the OVER clause is not unique, it still produces an incrementing sequence based on that column. In the figure below I have applied the ROW_NUMBER function over column col2, and you can see the incrementing numbers that are generated.
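A sketch, with the table and column names assumed to match the figure:

```sql
-- The numbers keep incrementing even where col2 has duplicate values
SELECT col2,
       ROW_NUMBER() OVER (ORDER BY col2) AS RowNo
FROM t
```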
Figure 2.20: - RANK
Figure 2.22: - NTILE in Action
• Write the managed code and compile it into a DLL / assembly.
• After the DLL is compiled, you can load the assembly into SQL Server using the "CREATE ASSEMBLY" command. Below is the CREATE command which loads "mycode.dll" into SQL Server.
(Q) I want to see which files are linked with which assemblies.
The assembly-files system tables keep track of which files are associated with which assemblies.
Note: - You can create SQL Server projects using VS 2005, which provides ready-made templates to make development easy.
Figure 3.1 : - Creating SQL SERVER Project using VS2005
(Q) Does .NET CLR and SQL SERVER run in different process?
The .NET CLR engine (and hence all the .NET code) and SQL Server run in the same process, or address space. This "same address space" architecture is implemented so that there are no speed issues. If the architecture were implemented the other way (i.e. SQL Server and the .NET CLR engine running in different process memory areas), there would have been a noticeable speed penalty.
Figure 3.2:- CLR Controlled by Host Control
Note: - You can see that after running the SQL, the "clr enabled" property is changed from 0 to 1, which indicates that the CLR was successfully configured for SQL Server.
“Sandbox is a safe place for running semi-trusted programs or scripts,
often originating from a third party.”
For SQL Server, .NET is the external third party that is running, and SQL Server has to ensure that a .NET runtime crash does not affect its own working. So, in order that SQL Server runs properly, there are three sandboxes in which user code can run:-
Figure 3.4:- Application Domain architecture
Note: - This can be pretty confusing during interviews so just make one
note “One Appdomain per Owner Identity per Database”.
(Q) What is Syntax for creating a new assembly in SQL Server 2005?
CREATE ASSEMBLY customer FROM 'c:\customers\customer.dll'
(Q) You have an assembly, which is dependent on other assemblies;
will SQL Server load the dependent assemblies?
OK, let me make the question clearer. If you have "Assembly1.dll", which uses "Assembly2.dll", and you catalog "Assembly1.dll" in SQL Server, will it catalog "Assembly2.dll" also? Yes, it will. SQL Server will look into the manifest for the dependencies associated with the DLL and load them accordingly.
(Q) What is pre-emptive threading?
In pre-emptive threading, the operating system schedules which thread should run, rather than the threads making their own decisions.
(Q) What is “Hostprotectionattribute” in SQL Server 2005?
As said previously, .NET 2.0 provides the capability to be hosted, and that is how SQL Server interacts with the framework. But there should be some mechanism by which the host loading .NET assemblies is alerted if potentially risky code, such as threading or synchronization, is running. That is exactly the use of "HostProtectionAttribute": it acts as a signal to the hosting process saying what type of code an assembly contains. When .NET Framework 2.0 was in development, Microsoft tagged this attribute on many assemblies so that SQL Server could decide whether or not to load those namespaces. For example, if you look at System.Windows you will see this attribute.
So during runtime SQL Server uses the reflection mechanism to check whether the assembly has valid protection or not.
(Q) How many types of permission level are there for an assembly?
There are three permission levels for an assembly:-
Safe
Safe is the default and the most restrictive level: the code is limited to computation and in-process data access, with no access to external resources.
External Access
It is like Safe, but you can additionally access network resources such as files from the network, the file system, the DNS system, event viewers etc.
Unsafe
In unsafe code you can run anything you want. You can use PInvoke, call external resources like COM, etc. Every DBA will want to avoid this, and every developer should avoid writing unsafe code unless it is very much essential. When we create an assembly, we specify the permission set at that time.
Note: - We had talked about sand boxes in the previous question. Just
small note sandboxes are expressed by using the permission level
concepts.
(Q) In order that an assembly gets loaded in SQL Server what type of
checks are done?
SQL Server uses the reflection API to determine if the assembly is safe to load in SQL Server.
Following are the checks done while the assembly is loaded in SQL Server:-
• It does the metadata and IL verification, to check that the IL syntax is appropriate.
• If the assembly is marked as SAFE or EXTERNAL ACCESS, then the following checks are also done:
• It checks static variables; only read-only static variables are allowed.
• Some attributes are not allowed for SQL Server, and those attributes are checked.
• The assembly has to be type-safe, which means no unmanaged code or pointers are allowed.
• No finalizers are allowed.
Note: - SQL Server checks the assembly using the reflection API, so the
code should be IL compliant.
You can do this small exercise to check whether SQL Server validates your code. Compile the simple code below, which has a static variable defined in it. Because the static variable is not read-only, it should throw an error.
using System;
namespace StaticDll
{
    public class Class1
    {
        static int i;
    }
}
After you have compiled the DLL, use the Create Assembly syntax to load the DLL in SQL
Server. While cataloging the DLL you will get the following error:-
Msg 6211, Level 16, State 1, Line 1 CREATE ASSEMBLY failed because type
'StaticDll.Class1' in safe assembly 'StaticDll' has a static field 'i'.
Attributes of static fields in safe assemblies must be marked readonly
in Visual C#, ReadOnly in Visual Basic, or initonly in Visual C++ and
intermediate language.
We can do a small practical hands-on to see what the assembly tables look like. Let us create a simple class, Class1. The code is shown below.
using System;
using System.Collections.Generic;
using System.Text;
namespace Class1
{
public class Class1
{
}
}
Figure 3.6: - Assembly related system files.
Then we create the assembly with the name "X1" using the CREATE ASSEMBLY syntax. The image above is the query output of the three main tables, in this sequence: sys.assemblies, sys.assembly_files, and sys.assembly_references.
(Q) In one of the projects the following steps were done; will it work?
Twist: - Are public signature changes allowed in “Alter assembly” syntax?
Following are the steps:-
• Created the following class and method inside it.
public class clscustomer
{
public void add()
{
(Q) Can we create SQLCLR using .NET framework 1.0?
No. At this moment only .NET 2.0 and above is supported with SQL Server.
were probably written in assembly and "C", which makes them much faster for data access.
• Pure non-data-access code, like computation, string-parsing logic etc., should be written in .NET. If you want to access web services, want to exploit OOP for better reusability, or need to read external files, it is good to go for .NET.
We can categorize our architecture decision into three types of logic:-
• Pure data-access functionality - go for T-SQL.
• Pure non-data-access functionality - go for .NET.
• A mixture of data access and non-data access - needs an architecture decision.
The first two decisions are straightforward. But for the third one you will have to do a code review and see what works best. Probably also run it practically, benchmark, and see which will be the best choice.
(Q) How is precision handled for decimal data types in .NET?
Note: - Precision is the total number of digits, while scale is the number of digits after the decimal point, which determines how accurate the result is. For example, "9.29" has a scale of two decimal places (.29).
In .NET, we declare decimal data types without precision and scale. However, in SQL Server you can define them.
decimal i; --> .NET Definition
decimal (9,2) --> SQL Server Definition
This creates a conflict when we want a .NET function to be used in T-SQL as SQLCLR and we want the precision facility.
Here is the answer: you define the precision in SQL Server when you use the CREATE syntax. So even though .NET does not support declaring the precision, we can define it in SQL Server.
.NET definition
func1(decimal x1)
{
}
SQL Server definition
create function func1(@x1 decimal(9,2))
returns decimal
as external name CustomerAssembly.[CustomerNameSpace.ClsCustomer].func1
As you can see in the sample code above, func1 is defined with a plain decimal, but later, when we create the function definition in SQL Server, we define the precision.
For "out" parameters, however, there are no mappings defined. That is logical: "out" parameter types do not have any equivalent in SQL Server.
(Q) Is it good to use .NET data types in SQLCLR?
No, it is always recommended to use SQL Server data types for SQLCLR code, as they integrate better. For example, the .NET "int" data type cannot hold a NULL value and will crash, but with the SQL data type SqlInt32 NULLs are handled. All SQL data types are available in the "System.Data.SqlTypes" namespace, so you have to reference this namespace in order to take advantage of them.
SqlInt32 x = 3;
int y = x.Value;
means that there is already an existing connection through which you can access the SqlContext. Any new connection created to access the SqlContext is a waste, as a connection to SQL Server is already open.
All these things are handled by SqlContext.
Which are the four static methods of SqlContext?
Below are the four static methods of SqlContext:-
GetConnection(): returns the current connection.
GetCommand(): gets a reference to the current batch.
GetTransaction(): if you have used transactions, this gets the current transaction.
GetPipe(): helps us send results to the client. The output is in Tabular Data Stream format. Using this method you can fill a data reader or data set, which can later be used by the client to display data.
Note: - In an earlier question I showed how we can manually register DLLs in SQL Server, but in real projects nobody would do that; rather, we would use Visual Studio .NET to accomplish the same. So we will run through a sample of how to deploy DLLs using VS.NET, and in parallel we will also run through how to use SqlContext.
Figure 3.8: - Template dialog box
As these DLL’s need to be deployed on the server, you will need to specify the server details also.
So for the same you will be prompted to specify database on which you will deploy the .NET
stored procedure. Select the database and click ok. In case you do not see the database, you can
click on “Add reference” to add the database to the list.
Once you specify the database, you are inside the Visual Studio .NET editor. At the right-hand side you can see the solution explorer, with some basic files created by Visual Studio in order to deploy the DLL on SQL Server. Right-click on the SQL Server project and click Add --> New Item; the available items are displayed as shown in the figure below.
• Get the command from the context.
• Set the command text; at this moment we need to select everything from the "Production.Product" table.
• Finally, get the pipe and execute the command.
Figure 3.14: - SelectProductall listed in database
Just to test, I executed the stored procedure and everything is working fine.
// Put your code here
SqlTriggerContext objtrigcontext = SqlContext.GetTriggerContext();
SqlPipe objsqlpipe = SqlContext.GetPipe();
SqlCommand objcommand = SqlContext.GetCommand();
if (objtrigcontext.TriggerAction == TriggerAction.Insert)
{
    objcommand.CommandText = "insert into table1 values('Inserted')";
    objsqlpipe.Execute(objcommand);
}
}
process. After you finish the process, you can come back and see the results. Long-running queries can really benefit from asynchronous support.
conn.Open();
IAsyncResult myResult = mycommand.BeginExecuteReader();
while (!myResult.IsCompleted)
{
// execute some other process
}
// Finally process the data reader output
command.ExecutePageReader(CommandBehavior.Default, 1, 10);
You can see in the above example that I have selected 10 rows starting from row one. This functionality is used mainly when you want to do paging on the UI side. For instance, if you want to show 10 records at a time to the user, this can really ease a lot of pain.
(Q) How can you raise custom errors from stored procedure?
The RAISERROR statement is used to produce an ad hoc error message or to retrieve a custom
message that is stored in the sysmessages table. You can use this statement with the error
handling code presented in the previous section to implement custom error messages in your
applications. The syntax of the statement is shown here.
RAISERROR ('An error occurred updating the Nonfatal table', 10,1)
--Results--
An error occurred updating the nonfatal table
The statement does not have to be used in conjunction with any other code, but for our purposes,
it will be used with the error handling code presented earlier. The following alters the
ps_NonFatal_INSERT procedure to use RAISERROR.
USE tempdb
go
ALTER PROCEDURE ps_NonFatal_INSERT
@Column2 int =NULL
AS
DECLARE @ErrorMsgID int
INSERT Nonfatal VALUES (@Column2)
SET @ErrorMsgID =@@ERROR
IF @ErrorMsgID <>0
BEGIN
RAISERROR ('An error occurred updating the Nonfatal table',10,1)
END
When an error-producing call is made to the procedure, the custom message is passed to the
client. The following shows the output generated by Query Analyzer.
notification from the end user. The user just sends the message to a queue, which is later picked up by the mailing system and sent to the desired end user.
Note: - MSMQ does messaging and queuing, but the queuing functionality has now also been brought into SQL Server 2005, due to its practical needs.
• You can lock a conversation group during reading, so that no other process can read those queue entries.
• The most difficult thing in an asynchronous messaging system is maintaining state. There can be a huge delay between the arrivals of two messages, so conversation groups maintain state using a state table. They use an instance ID to identify the messages in a group.
Figure 6.2: - Message, contract and service
The figure above shows how the SQL Server Service Broker works. Clients who want to use the queues do not have to understand the complexity of the queues. They only communicate with the logical view of the Service Broker objects (messages, contracts, and services). In turn, these objects interact with the queues below and shield the client from the physical complexities of the queues.
Below is a simple practical implementation of how this works. Try running the statements below from T-SQL and see the output.
-- Create a message type and do not do any data type validation for it
CREATE MESSAGE TYPE MessageType
VALIDATION = NONE
GO
-- Create a message contract defining what type of users can send these
-- messages; at this moment we define the current service as an initiator
CREATE CONTRACT MessageContract
(MessageType SENT BY INITIATOR)
GO
-- Declare the two endpoints, that is, the sender and receiver queues
CREATE QUEUE SenderQ
CREATE QUEUE ReceiverQ
GO
-- Create service and bind them to the queues
CREATE SERVICE Sender
ON QUEUE SenderQ
CREATE SERVICE Receiver
ON QUEUE ReceiverQ (MessageContract)
GO
-- Send message to the queue
DECLARE @conversationHandle UNIQUEIDENTIFIER
DECLARE @message NVARCHAR(100)
BEGIN
BEGIN TRANSACTION;
BEGIN DIALOG @conversationHandle
FROM SERVICE Sender
TO SERVICE 'Receiver'
ON CONTRACT MessageContract
- Sending message
SET @message = N'SQL Server Interview Questions by Shivprasad
Koirala';
SEND ON CONVERSATION @conversationHandle
MESSAGE TYPE MessageType (@message)
COMMIT TRANSACTION
END
GO
-- Receive a message from the queue
To get LIVE training, new topic release video updates install Telegram app & join us using - https://tinyurl.com/QuestPondChannel
RECEIVE CONVERT(NVARCHAR(max), message_body) AS message
FROM ReceiverQ
-- Just dropping all the object so that this sample can run
successfully
DROP SERVICE Sender
DROP SERVICE Receiver
DROP QUEUE SenderQ
DROP QUEUE ReceiverQ
DROP CONTRACT MessageContract
DROP MESSAGE TYPE MessageType
GO
After executing the above T-SQL commands, you can see the output below.
Note: - In case Service Broker is not active for your database, you
will get the error shown below. In order to remove that error, you have
to enable Service Broker for the database.
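The enabling statement itself is not reproduced in the text; a minimal sketch (the database name is a placeholder for your own database):

```sql
-- Enable Service Broker for the database so that queues, services,
-- and dialogs work.
ALTER DATABASE YourDatabase SET ENABLE_BROKER
GO
```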
Figure 6.4: - Error Service broker not active
Chapter 7 :- XML Integration
Note: - In this chapter we will first just skim through basic XML
interview questions so that you do not get stuck on the simple ones.
<?xml version="1.0" encoding="ISO-8859-1"?>
<invoice>
<productname>Shoes</productname>
<qty>12</qty>
<totalcost>100</totalcost>
<discount>10</discount>
</invoice>
XML tags are not predefined; you define them according to your needs. For instance, in the
invoice example above, all tags are defined according to business needs. The XML document is
self-describing; anyone looking at the XML data can easily understand what exactly it means.
made in Java and the other in .NET. But both platforms understand XML, so one application can
emit an XML file which is consumed and parsed by the other application.
You can describe a scenario of two applications working separately and explain how you chose
XML as the data transport medium.
<invoice in number=1002></invoice>
Figure 7.1: - Specify XML data type
After you have created the schema, you can see the MyXSD schema in the schema collections folder.
Figure 7.2: - You can view the XSD in explorer of Management Studio
When you create a column of the XML data type, you can assign MyXSD to that column.
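As a sketch, the same steps can be done in T-SQL. The XSD body below is a minimal stand-in for the MyXSD schema this chapter builds through the designer (only the Orderid and CustomerName elements discussed here):

```sql
-- Register a minimal XML schema collection, then create a table
-- whose XML column is typed (validated) against it.
CREATE XML SCHEMA COLLECTION MyXSD AS
N'<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
             targetNamespace="http://MyXSD"
             xmlns="http://MyXSD" elementFormDefault="qualified">
    <xs:element name="MyXSD">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="Orderid" type="xs:int"/>
          <xs:element name="CustomerName" type="xs:string"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>
  </xs:schema>'
GO
CREATE TABLE xmltable (TestXml XML(MyXSD))
GO
```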
(Q) How do I insert into a table that has an XSD schema attached to it?
I know many developers will just say, what is the problem with a simple insert statement? Well,
it is not that easy once the XSD is attached; the column is now a typed XML data type. I have
named the above table xmltable. In the schema we specified two nodes, one for the order id and
the other for the customer name. So here is the insert.
Insert into xmltable values
('<MyXSD xmlns="http://MyXSD"><Orderid>1</Orderid><CustomerName>Shiv</CustomerName></MyXSD>')
First of all, XQuery is not something Microsoft invented; it is a language defined by the W3C to
query and manipulate XML data. For instance, in the above scenario we can use XQuery to drill
down to a specific element in the XML.
So to drill down, here is the XQuery:
SELECT * FROM xmltable
WHERE TestXml.exist('declare namespace xd="http://MyXSD";
/xd:MyXSD[xd:Orderid eq "4"]') = 1
Note: - It is out of the scope of this book to discuss XQuery in depth.
I hope interviewers will not grill you too hard in this section. In
case you have doubts, visit www.w3c.org or SQL Server Books Online;
they have a lot of material on this.
CREATE PRIMARY XML INDEX xmlindex ON xmltable(TestXML)
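The primary XML index above can optionally be complemented by secondary XML indexes; a sketch, assuming the xmltable / TestXML / xmlindex names used here:

```sql
-- A secondary XML index (PATH, VALUE, or PROPERTY) is layered on top
-- of the primary XML index to speed up specific query shapes.
CREATE XML INDEX xmlindex_path ON xmltable(TestXML)
USING XML INDEX xmlindex FOR PATH
GO
```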
(Q) Can I use FOR XML to generate SCHEMA of a table and how?
The below SQL syntax will return the SCHEMA of the table.
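The statement itself appears as a screenshot in the original; against the sample table it can be sketched as:

```sql
-- Return the rows of xmltable along with an inline XSD schema
-- describing the result set.
SELECT * FROM xmltable
FOR XML AUTO, XMLSCHEMA
GO
```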
SITE = 'server'
)
FOR SOAP (
WEBMETHOD 'http://tempUri.org/'.'GetTotalSalesOfProduct'
(name='AdventureWorks.dbo.GetTotalSalesOfProduct',
schema=STANDARD ),
BATCHES = ENABLED,
WSDL = DEFAULT,
DATABASE = 'AdventureWorks',
NAMESPACE = 'http://AdventureWorks/TotalSales'
)
Note: - “Data mining” and “Data Warehousing” are very wide concepts,
and it is beyond the scope of this book to discuss them in depth. So if
you are specifically looking for a “Data mining / warehousing” job, it
is better to go through some reference books. But the questions below
can shield you to a good extent.
achieve this, the IT department can set up data marts in all branch offices and a central data
warehouse where all the data will finally reside.
Figure 8.2: - Dimensional Modeling
In the above example, we have three tables, which are transactional tables:-
• Customer: - It has the customer information details.
• Salesperson: - The salespersons who actually sell products to customers.
• CustomerSales: - This table has data on which salesperson sold to which customer and
what the sales amount was.
Below is the expected report of Sales / Customer / Month. You may be thinking that a simple
join query across the three tables can easily produce this output. However, imagine you have
huge numbers of records in these three tables; that can really slow down your reporting process.
So we introduce an additional table, “CustomerSalesByMonth”, which has foreign keys to the
other tables and the aggregated amount by month. This table becomes the fact table, and the
tables it references become the dimension tables. All major data warehousing designs use the
fact and dimension model.
Figure 8.3: - Expected Report.
(DB) What is ETL process in Data warehousing?
Twist: - What are the different stages in “Data warehousing”?
ETL (Extraction, Transformation and Loading) describes the different stages in data
warehousing. Just as software development follows stages like requirement gathering,
designing, coding and testing, data warehousing follows these ETL stages.
Extraction:-
In this process, we extract data from the source. In actual scenarios the data source can be in
many forms: Excel, Access, delimited text, CSV (Comma Separated Values) files, etc. Therefore,
the extraction process handles the complexity of understanding the data source and loading it
into the structure of the data warehouse.
Transformation:-
This process can also be called the cleaning-up process. Data is not necessarily clean and valid
after extraction. For instance, some financial figures may have NULL values, but you want them
to be zero for better analysis. Therefore, you can have a stored procedure that runs through all
extracted records and sets such values to zero.
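The stored procedure itself is not shown; the core of such a cleanup step can be sketched as a single statement (StagingSales and Amount are hypothetical names):

```sql
-- Replace NULL financial figures with zero in a hypothetical staging
-- table before loading it into the warehouse.
UPDATE StagingSales
SET Amount = 0
WHERE Amount IS NULL
GO
```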
Loading:-
After transformation, you are ready to load the information into your final data warehouse
database.
• Bulk Insert
• DTS (Data Transformation Services). DTS is now called Integration Services.
Figure 8.6: - Data Warehouse and Data mining
The above figure gives a picture of how these concepts are quite different. The “Data
Warehouse” collects, cleans, and filters data from different sources like “Excel”, “XML”, etc.
“Data Mining” sits on top of the data warehouse database and generates intelligent reports. It
can either export them to a different database or generate reports using some reporting tool like
“Reporting Services”.
There are times when you want to move huge numbers of records in and out of SQL Server;
that is where this old and cryptic friend comes to the rescue. BCP is a command-line utility.
Below is the detailed syntax:-
Figure 8.7: - After executing BCP command prompts for some properties
Figure 8.9: - FMT file with fields eliminated
If we want to change the sequence, we just have to change the original sequence numbers. For
instance, we have swapped the sequence numbers 9 and 5; see the figure below.
Once you have changed the FMT file, you can specify the .FMT file in the BCP command
arguments as shown below.
Below is the detailed syntax of BULK INSERT. You can run it from “SQL Server Management
Studio”, T-SQL, or ISQL.
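The full syntax appears as a screenshot in the original; a minimal working sketch (the table name and file path are placeholders):

```sql
-- Load a comma-delimited file into a staging table, skipping the
-- header row; dbo.SalesStaging and the path are placeholders.
BULK INSERT dbo.SalesStaging
FROM 'C:\data\sales.txt'
WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\n', FIRSTROW = 2)
GO
```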
(Q) What is DTS?
Note: - DTS is now a part of Integration Services in SQL Server 2005.
DTS provides functionality similar to BCP and BULK INSERT, but those two tools have some
major shortcomings that DTS addresses:-
• BCP and BULK INSERT do not have a user-friendly interface. Well, some DBAs do still
enjoy those DOS-prompt commands, which make them feel they are doing something
worthy.
• Using BCP and BULK INSERT we can import only from files. What if we wanted to
import from other databases like FoxPro, Access, or Oracle? That is where DTS is king.
• One important thing that BCP and BULK INSERT miss is transformation, which is one of
the important parts of the ETL process. They allow you to extract and load data, but
provide no means of transformation. For example, if you receive sex as “1” and “2”, you
would like to transform this data to “M” and “F” respectively when loading into the data
warehouse.
• DTS also allows you to do direct programming and write scripts, giving you huge control
over the loading and transformation process.
• DTS allows a lot of operations to happen in parallel. For instance, if you want
transformation to happen in parallel while data is being read, DTS is the right choice.
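The sex-code transformation mentioned above can be sketched in T-SQL (SourceData is a hypothetical staging table):

```sql
-- Transform coded sex values ('1' / '2') into 'M' / 'F' while
-- reading from a hypothetical staging table.
SELECT CASE Sex
         WHEN '1' THEN 'M'
         WHEN '2' THEN 'F'
       END AS Sex
FROM SourceData
GO
```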
You can see the DTS Import / Export wizard in the SQL Server 2005 menu.
Note: - DTS is the most used technology when you are doing data
warehousing with SQL Server. In order to implement the ETL fundamentals
properly, Microsoft has rewritten DTS from scratch using .NET and named
it “Integration Services”. There is a complete chapter dedicated to
“Integration Services”, which covers DTS indirectly in huge detail. Any
interviewer looking for a data warehousing professional in SQL Server
2005 will expect candidates to know DTS properly.
(DB) Can you brief about the Data warehouse project you worked on?
Note: - This is the trickiest question, asked to get insight into your
experience and to spawn further question threads. If you have worked on
a data warehouse project, you can be very sure of this one. If not,
then you really have to prepare a project to talk about… I know it is
unethical to even suggest this in a book, but still.
I leave this to the readers, as everyone would like to think of a project of their own. But just try
to include the ETL process, which every interviewer expects to be followed in a data warehouse
project.
Note: - OLTP systems are good at putting data into the database but are
of no good when it comes to analyzing data.
The conceptual model involves only identifying entities and the relationships between them.
Fields / attributes are not planned at this stage; it is just an identification stage, not detailed.
The logical model involves actually identifying the attributes, primary keys, many-to-many
relationships, etc. of the entities. In short, it is the complete detailed planning of what actually
has to be implemented.
The physical model is where you develop your actual structure: tables, fields, primary keys,
foreign keys, etc. You can say it is the actual implementation of the project.
Note: - To design the conceptual and logical models, Visio is mostly
used, and some companies combine both models into one, so you will not
be able to distinguish between them.
ROLAP
Relational OLAP (ROLAP) stores aggregates in relational database tables. ROLAP’s use of
relational databases allows it to take advantage of existing database resources and allows
ROLAP applications to scale well. However, ROLAP’s use of tables to store aggregates usually
requires more disk storage than MOLAP, and it is generally not as fast.
HOLAP
As its name suggests, hybrid OLAP (HOLAP) is a cross between MOLAP and ROLAP. Like
ROLAP, HOLAP leaves the primary data stored in the source database. Like MOLAP, HOLAP
stores aggregates in a persistent data store that is separate from the primary relational database.
This mix allows HOLAP to offer the advantages of both MOLAP and ROLAP. However, unlike
MOLAP and ROLAP, which follow well-defined standards, HOLAP has no uniform
implementation.
Figure 8.12: - Single Dimension view.
The above table gives a multi-dimensional view; you can add more dimensions according to the
depth of your analysis. For example, from the above multi-dimensional view I can predict that
“Calcutta” is the only place where “Shirts” and “Caps” are selling; the other metros do not show
any sales for these products.
Note: - If you are preparing for a data warehousing position using SQL
Server 2005, MDX will be a favorite of the interviewers. MDX itself is
such a huge and beautiful beast that we cannot cover it in this small
book. I suggest you at least try to grab some basic MDX syntax, like
SELECT, before going to the interview.
• OLAP Cube Design
This is where you define your CUBES and DIMENSIONS on the data warehouse database that
was loaded by the ETL process. CUBES and DIMENSIONS are designed using the requirement
specification. For example, if the customer wants a “Sales per month” report, you can define the
CUBES and DIMENSIONS that will later be consumed by the front end and presented to the
end user.
• Front End Development
Once all your CUBES and DIMENSIONS are defined, you need to present them to the user.
You can build front ends using C#, ASP.NET, VB.NET, or any language that can consume the
CUBES and DIMENSIONS. The front end stands on top of the CUBES and DIMENSIONS and
delivers the reports to the end users. Without a front end, the data warehouse is of no use from
the user’s perspective.
• Performance Tuning
Many projects tend to overlook this process. However, just imagine a poor user waiting 10
minutes to view “Yearly Sales”… frustrating, no? There are three areas to look at when your
data warehouse is performing slowly:-
o While data is loading into the database (the “ETL” process).
This is probably the major area where you can optimize your database. It is best to
look into the DTS packages and see if they can be tuned for speed.
o OLAP CUBES and DIMENSIONS.
CUBES and DIMENSIONS are executed against the data warehouse. You can look
into the queries and see if some optimization can be done.
o Front-end code.
Front ends are mostly coded by programmers, and this can be a major bottleneck for
optimization. So you can look for loops and check whether the front end is straying
too far from the CUBES.
• User Acceptance Test ( UAT )
UAT means asking the customer, “Is this product OK with you?” It is a testing phase done either
by the customer (and mostly it is done by the customer) or by your own internal testing
department, to ensure the product matches the customer requirements gathered during the
requirement phase.
• Rolling out to Production
Once the customer has approved your UAT, it is time to roll out the data warehouse to
production so that the customer can get the benefit of it.
• Production Maintenance
I know, the most boring phase from a programmer’s point of view, but the most profitable from
an IT company’s point of view. In data warehousing this will mainly involve doing backups,
optimizing the system, and removing any bugs. It can also include enhancements if the customer
wants them.
• Tool Selection: - POC (proof of concept) documents comparing each tool according to
project requirement.
Note: - A POC answers the question “can we do it?”. For instance, if
you have a requirement that 2000 users at a time should be able to use
your data warehouse, you will probably write some sample code or read
through documentation to ensure the tool can handle it.
• Data modeling: - Logical and Physical data model diagram. This can be ER diagrams or
probably some format that the client understands.
• ETL: - DTS packages, Scripts and Metadata.
• OLAP Design:-Documents which show design of CUBES / DIMENSIONS and OLAP
CUBE report.
• Front end coding: - Actual source code, Source code documentation and deployment
documentation.
• Tuning: - This will be a performance-tuning document: what performance level we are
looking at and how we will achieve it, or what steps will be taken to do so. It can also
include which areas / reports we are targeting for performance improvements.
• UAT: - This is normally the test plan and test case document. It can include steps
describing how to run the test cases and the expected results.
• Production: - In this phase, normally the entire data warehouse project is the deliverable.
But you can also have handover documents for the project, hardware, and network
settings; in short, how the environment is set up.
• Maintenance: - This is an ongoing process and mainly has documents like errors fixed,
issues solved, within what time issues should be solved, and within what time they
actually were solved.
Figure 8.15: - Start Analysis Server
As said before, we are going to use the “Northwind” database to show the Analysis Server
demo. We are not going to use all the tables from “Northwind”; below are the only tables we
will be operating on. Apart from “FactTableCustomerByProduct”, all the tables are self-
explanatory. OK, I know I still have not told you what we want to derive from this whole
exercise. We will try to derive a report of how many products are bought by which customer
and how many products are sold in which country. So I have created the fact table with three
fields: Customerid, Productid, and the total products sold. All the data in the fact table is loaded
from “Orders” and “Order Details”; that is, I have taken all customerid and productid values
with their respective totals and made entries in the fact table.
Figure 8.17: - Fact Table
OK, I have created my fact table and populated it using our ETL process. Now it is time to use
this fact table for analysis.
So let us start BI Development Studio as shown in the figure below.
Figure 8.19: - Select Analysis Services Project
I have named the project “Analysis Project”. You can see the view of the Solution Explorer.
Data Sources: - This is where we will define our database and connection.
Figure 8.21: - Create new data Source
After that, click Next; you then have to define the connection for the data source, which you can
do by clicking the New button. Click Next to complete the data source process.
Figure 8.23: - Create new Data source view
So here we will select only the “Customers” and “Products” tables and the fact table.
We said previously that the fact table is the central table for the dimension tables. You can see
that the Products and Customers tables form the dimension tables and the fact table is the central
point. Now drag and drop from the “Customerid” field of the fact table to the “Customerid” field
of the Customers table. Repeat the same for the “Productid” field with the Products table.
Check “Auto build”, as we are going to let Analysis Services decide which tables it wants to
treat as “fact” and “dimension” tables.
Figure 8.26: - Check Auto build
After that comes the most important step: deciding which are the fact tables and which are the
dimension tables. SQL Server Analysis Services decides by itself, but we will change the values
as shown in the figure below.
Figure 8.28: - Specify measures
Figure 8.29: - Deploy Solution
• Dimensions: - Works with the cube dimensions
• Calculations: - Works with calculations for the cube
• KPIs: - Works with Key Performance Indicators for the cube
• Actions: - Works with cube actions
• Partitions: - Works with cube partitions
• Perspectives: - Works with views of the cube
• Translations: - Defines optional translations for the cube
• Browser: - Enables you to browse the deployed cube
Once you are done with the complete process, drag and drop the fields as shown by the arrows below.
Figure 8.32: - Drag and Drop the fields over the designer
Figure 8.33: - Final look of the CUBE
Once you have dragged and dropped the fields, you can see the wonderful information revealed:
which customer has bought how many products.
This is the second report, which shows how many products were sold in each country.
(Q) What are the different problems that “Data mining” can solve?
There are basically four problems that “Data mining” can solve:-
Analyzing Relationships
This is also often called “link analysis”. For instance, a company that sold adult products did an
age survey of its customers and found its products were mostly bought by customers between
the ages of 25 and 29. It further guessed that many of these customers must have kids around 2
to 5 years old, as that fits the usual age of marriage. Further analysis confirmed that most of its
customers were married with kids. The company can now also try selling kids’ products to the
same customers, who are likely to be interested in buying them, which can tremendously boost
sales. Here, link analysis between “age” and “kids” was used to decide a marketing strategy.
Choosing right Alternatives
If a business wants to make a decision between alternatives, data mining can come to the
rescue. For example, one company saw a major wave of resignations, so HR decided to look at
employees’ joining dates. They found that most of the resignations came from employees who
had stayed in the company for more than 2 years, while only some came from freshers. So HR
decided to focus on motivating the freshers rather than the employees who had completed 2
years, as HR thought it was easier to motivate freshers than old employees.
Prediction
Prediction is about forecasting how the business will move ahead. For instance, the company
has sold 1000 shoe items; if the company puts a discount on the product, sales can go up to
2000.
Improving the current process.
Past data can be analyzed to see how the business process can be improved. For instance, for the
past two years a company has been distributing product “X” in plastic bags and product “Y” in
paper bags. The company observed that product “Y” sold the same amount as product “X” but
with much higher profits. Further analysis showed that the major cost of product “X” was due to
packaging the product in plastic bags. Now the company can improve the process by switching
to paper bags, bringing down the cost and thus increasing profits.
This can also be called loading and cleaning of data, or removing unnecessary information to
simplify the data. For example, you may be getting title data as "Mr.", "M.r.", "Miss", "Ms",
etc. It can get worse if these values are maintained in numeric format, like "1", "2", "6". This
data needs to be cleaned for better results.
You also need to consolidate data from various sources like Excel, delimited text files, and
other databases (Oracle, etc.).
Microsoft SQL Server 2005 Integration Services (SSIS) contains tools that can be used for
cleaning and consolidating data from these various sources.
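A cleaning step like the title example above can be sketched in T-SQL (Staging and Title are hypothetical names):

```sql
-- Normalize inconsistent title values in a hypothetical staging table.
UPDATE Staging
SET Title = CASE Title
              WHEN 'M.r.' THEN 'Mr.'
              WHEN 'Ms'   THEN 'Miss'
              ELSE Title
            END
GO
```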
Exploring Models
Exploring models means calculating the minimum and maximum values, looking into any
serious deviations, and seeing how the data is distributed. Once you see the data, you can check
whether it is flawed. For instance, a day has 24 hours; if you see some data with more than 24
hours, it is not logical, and you can then look into correcting it.
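The kind of exploration described above can be sketched with simple aggregate queries (Timesheet and HoursWorked are hypothetical names):

```sql
-- Profile the column, then flag rows that log more hours than a day has.
SELECT MIN(HoursWorked) AS MinHours, MAX(HoursWorked) AS MaxHours
FROM Timesheet;
SELECT * FROM Timesheet WHERE HoursWorked > 24;
GO
```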
The Data Source View Designer in BI Development Studio contains tools that let you explore
the data.
Building Models
Data derived from Exploring models will help us to define and create a mining model. A model
typically contains input columns, an identifying column, and a predictable column. You can then
define these columns in a new model by using the Data Mining Extensions (DMX) language or
the Data Mining Wizard in BI Development Studio.
After you define the structure of the mining model, you process it, populating the empty
structure with the patterns that describe the model. This is known as training the model. Patterns
are found by passing the original data through a mathematical algorithm. SQL Server 2005
contains a different algorithm for each type of model that you can build, and you can use
parameters to adjust each algorithm.
A mining model is defined by a data mining structure object, a data mining model object, and a
data mining algorithm.
Verifying the Models
By using the viewers in Data Mining Designer in BI Development Studio, you can test and
verify how well these models are performing. If you find the model needs any refinement, you
have to iterate back to the first step.
Figure 8.36: - Data mining life Cycle.
(DB) How are models actually derived?
Twist: - What are data mining algorithms?
Data mining models are created using data mining algorithms. To derive a model, you apply a
data mining algorithm to a set of data; the algorithm then looks for specific trends and patterns
and derives the model.
Note: - Now we will go through some algorithms used in the data mining
world. If you are looking for pure data mining jobs, these basic
questions will surely be asked. Data mining algorithms are not
Microsoft proprietary; they are old mathematics that Microsoft SQL
Server makes use of. The section below will look like we are moving
away from SQL Server, but trust me… if you are looking for data mining
jobs, these questions can be a turning point.
Figure 8.38: - First Iteration Decision Tree
• If we target customers in the 18-25 age group, we will have good sales.
• All income earners above 5000 will always generate sales.
Classification
• Customer classification by Age.
• Customer classification depending on income amount.
P (Shirts) = 0.1
P (Pants) = 0.01
Now suppose a customer comes to buy pants: what is the probability that he will also buy a
shirt, and vice versa? According to the theorem:-
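The worked figures appear as a screenshot in the original. As a sketch, Bayes' theorem says P(Shirts | Pants) = P(Pants | Shirts) × P(Shirts) / P(Pants); assuming a hypothetical conditional probability P(Pants | Shirts) = 0.05 alongside the figures above, the arithmetic can be checked directly:

```sql
-- Bayes' theorem with the probabilities above; 0.05 is an assumed
-- value for P(Pants | Shirts), used only to illustrate the arithmetic.
SELECT (0.05 * 0.1) / 0.01 AS ProbShirtsGivenPants  -- = 0.5
```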
So you can see that if a customer is buying shirts, there is a high probability that he will buy
pants also. Thus the Naïve Bayes algorithm is used for prediction based on existing data.
Figure 8.41: - Artificial Neuron Model
Above is the figure, which shows a neuron model. We have inputs (I1, I2 ... IN) and for every input there is a weight (W1, W2 ... WN) attached to it. The ellipse is the “NEURON”. Weights can have negative or positive values. The activation value is the sum of each input multiplied by its weight, computed inside the nucleus.
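As a quick sketch (with made-up numbers, since the figure's values are not in the text), the activation is just a weighted sum:

```python
# Activation of a single artificial neuron: multiply each input by its
# weight and sum the results. The values below are illustrative only.
inputs = [1.0, 0.5, -2.0]    # I1..IN
weights = [0.4, -0.6, 0.1]   # W1..WN, positive or negative
activation = sum(w * i for w, i in zip(weights, inputs))
# 0.4*1.0 + (-0.6)*0.5 + 0.1*(-2.0) = -0.1
```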
Figure 8.42: - Neural Network Data
For instance, take the case of the top customer sales data. Below is the neural network defined for
the above data.
Figure 8.44: - Practical Neural Network
(DB) Explain Association algorithm in Data mining?
The Association algorithm tries to find relationships between specific categories of data. First it scans for unique values, and then the frequency of each value across the transactions is determined. For instance, let's say we have a city master table and a transactional customer sales table. The Association algorithm first finds the unique instances of all cities and then sees how many occurrences of each city appear in the customer sales transaction table.
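The counting phase described above can be sketched with Python's collections.Counter; the city names below are invented sample data, not from any real table:

```python
from collections import Counter

# Phase 1 of the Association algorithm as described above: find the unique
# cities, then count how often each city occurs in the sales transactions.
sales_transactions = ["Mumbai", "Delhi", "Mumbai", "Pune", "Mumbai", "Delhi"]

unique_cities = set(sales_transactions)       # unique instances of all cities
city_frequency = Counter(sales_transactions)  # occurrences per city
# Counter({'Mumbai': 3, 'Delhi': 2, 'Pune': 1})
```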
Note: - I understand algorithms are dreaded-level questions and will never be asked for a programmer-level job, but for guys looking for data mining jobs these questions are basic. It's difficult to cover all the algorithms existing in the data mining world, as it is a complete area by itself. This being an interview question book, I have covered the algorithms which are absolutely essential from a SQL Server point of view. Now that we know the algorithms, we can classify where they can be used. There are two important classifications in the data mining world: Prediction / Forecasting and Grouping. So we will classify all the algorithms shipped in SQL Server into these two sections only.
• Microsoft Decision Trees Algorithm
• Microsoft Clustering Algorithm
Finding groups of similar items, for example, to segment demographic data into groups to better understand the relationships between attributes.
• Microsoft Sequence Clustering Algorithm
The reason we went through all these concepts is that when you create a data mining model you have to specify one of these algorithms. Below is a snapshot of all the algorithms existing in SQL Server.
Note: - During interviews it's mostly the theory that counts, and the way you present it. For data mining I am not showing anything practical as such; I will probably try to cover that in my second edition. But a piece of advice: please do try to make a small project and see how these techniques are actually used.
(DB) How does data mining and data warehousing work together?
Twist: - What is the difference between data warehousing and data mining?
This question will normally be asked to get an insight into how well you know the whole process of data mining and data warehousing. Many new developers tend to confuse data mining with warehousing (especially freshers). Below is the big picture, which shows the relation between “data warehousing” and “data mining”.
Figure 8.46: - Data mining and Data Warehousing
Let us start from the left-hand side of the image. The first section is the transactional database; this is the database in which you collect data. The next process is the ETL process. This section extracts data from the transactional database and sends it to your data warehouse, which is designed using the STAR or SNOWFLAKE model. Finally, when your data is loaded into the data warehouse, you can use SQL Server tools like OLAP, Analysis Services, BI, Crystal Reports or Reporting Services to deliver the data to the end user.
Note: - The interviewer will always try to goof you up by asking why we should not run OLAP, Analysis Services, BI, Crystal Reports or Reporting Services directly on the transactional data. That is because transactional databases are in completely normalized form, which can make the data mining process very slow. By doing data warehousing we denormalize the data, which makes the data mining process more efficient.
(Q) What is XMLA?
XML for Analysis (XMLA) is fundamentally based on web services and SOAP. Microsoft SQL
Server 2005 Analysis Services uses XMLA to handle all client application communications to
Analysis Services.
XML for Analysis (XMLA) is a Simple Object Access Protocol (SOAP)-based XML protocol,
designed specifically for universal data access to any standard multidimensional data source
residing on the Web. XMLA also eliminates the need to deploy a client component that exposes Component Object Model (COM) or Microsoft .NET Framework interfaces.
Note: - We saw some questions on DTS in the previous chapter, “Data Warehousing”. But in order to do complete justice to this topic, I have included them under Integration Services.
Figure 9.1: - Location of DTS Import and Export Wizard
Figure 9.3: - Specify the Data Source.
The next step is to specify the destination to which the source data will be moved. At this moment, we are moving data inside “AdventureWorks” itself, so specify the same database as the source.
Figure 9.5: - Specify option
Finally choose which object you want to map where. You can map multiple objects if you want.
Figure 9.6: - “Salesperson” is mapped to “SalesPersonDummy”
When everything goes successfully you can see the screen below, which shows the series of steps DTS has gone through.
Figure 9.8: - Data Transformation Pipeline
• Container: - A container logically groups tasks. For instance, suppose you have a task to load a CSV file into the database. You will probably have two or three tasks:-
o Parse the CSV file.
o Check the field data types.
o Map the source fields to the destination.
So you can define all the above work as tasks and group them logically into a Container.
• Package: - Packages are executed to actually do the data transfer.
The DTP and DTR models expose APIs, which can be used from .NET languages for better control.
Figure 9.11: - New Project DTS
Give the project the name “Salesperson”. Before moving ahead, let me give a brief about what we are trying to do. We are going to use the “Sales.SalesPerson” table from the “AdventureWorks” database. “Sales.SalesPerson” has a field called “Bonus”. We have the following tasks to accomplish:-
Figure 9.13 : - Snapshot of my database with both tables
Once you have selected the “Data transformation project”, you will see a designer explorer as shown below. I understand you must be saying it's cryptic... it is. But let's try to simplify it. You can see the designer pane, which has a lot of objects on it, and four tabs (Control Flow, Data Flow, Event Handlers and Package Explorer).
Control flow: - It defines how the whole process will flow. For example, if you are loading a CSV file, you will probably have tasks like parsing, cleaning and then loading. You can see a lot of control flow items which can make your data transformation task easy. First we have to define a task in which we will define all our data flows. You can see the curved arrow, which indicates what you have to drag and drop on the control flow designer, and the arrow tip, which defines the output point from the task.
Figure 9.15: - Multiple Task CSV
Data Flow: - Data flow defines how the objects will flow inside a task. So a data flow is a subset of a task, defining the actual operations.
Event Handlers: - The best part of DTS is that we can handle events. For instance, if there is an error, what action do you want it to take? Probably log your errors in an error log table or a flat file, or be more interactive and send a mail.
Now that you have defined your task, it's time to define the actual operations that will happen within the task. We have to move data from “Sales.SalesPerson” to “Sales.SalesPerson5000” (rows whose “Bonus” field equals 5000) and “Sales.SalesPersonNot5000” (rows whose “Bonus” field does not equal 5000). In short, we have “Sales.SalesPerson” as the source and the other two tables as the destinations. So click on the “Data Flow” tab and drag the OLE DB Source data flow item onto the designer; we will define the source in this item. You can see that there is an error, shown by a cross on the icon. This signifies that you need to specify the source table, which is “Sales.SalesPerson”.
Figure 9.19: - Connection Manager
If the connection credentials are proper, you can see the connection in the “Connections” tab as shown in the figure below.
Figure 9.20: - Connection Added Successfully
Now that we have defined the connection, we have to associate it with the OLE DB source. So right-click and select the “Edit” menu. Once you click Edit, you will see a dialog box as shown below. In data access mode select “Table or View” and select the “Sales.SalesPerson” table. To specify the mapping, click on the “Columns” tab and then press OK.
Figure 9.22: - Specify Connection Values
If the credentials are OK, you can see the red cross is gone and the OLE DB source is now ready to connect further. As said before, we need to move data to the appropriate tables depending on the “Bonus” field value. So from the data flow items drag and drop the “Conditional Split” data flow item.
Right-click on the “Conditional Split” data flow item so that you can specify the criteria. The dialog gives you a list of the fields in the table, which you can drag and drop; you can also drag and drop the operators to specify the criteria. I have made two outputs from the conditional split: one for rows equal to 5000 and a second for rows not equal to 5000.
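What the conditional split does can be sketched in plain Python; the rows below are made-up sample data, not real AdventureWorks rows:

```python
# Sketch of the Conditional Split: route each row to one of two outputs
# depending on whether its "Bonus" field equals 5000.
rows = [
    {"SalesPersonID": 1, "Bonus": 5000},
    {"SalesPersonID": 2, "Bonus": 3000},
    {"SalesPersonID": 3, "Bonus": 5000},
]
bonus_5000 = [r for r in rows if r["Bonus"] == 5000]      # -> Sales.SalesPerson5000
bonus_not_5000 = [r for r in rows if r["Bonus"] != 5000]  # -> Sales.SalesPersonNot5000
```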
Figure 9.24: - Specifying Conditional Split Criteria
The conditional split now has two outputs: one which will go into “Sales.SalesPerson5000” and the other into “Sales.SalesPersonNot5000”. Therefore, you have to define two destinations and associate the respective tables with them. So drag two OLE DB destination data flow items and connect them to the two outputs of the conditional split.
Figure 9.25: - Specify Destination
When you drag from the conditional split items over the OLE DB destination items, a dialog will pop up asking which output this destination has to be connected to. Select one from the drop-down and press OK. Repeat this step for the other destination object.
Figure 9.27: - Final DTS
It's time to build and run the solution, which you can do from the menu. To run the DTS package, press the green icon pointed to by the arrow in the figure below. After you run it, query both tables to verify that they have the appropriate values.
Note: - You can see various data flow items on the right-hand side; it's out of the scope of this book to cover all of them (you must be wondering how many times this author will say “out of scope”, but it's a fact, guys, some things you have to explore). In this sample project we needed the conditional split, so we used it. Depending on your projects you will need to explore the toolbox. It's rare that any interviewer will ask about individual items; they will rather ask fundamentals or a general overview of how you did DTS.
(Q) What are the scenarios where you will need multiple databases with the same schema?
Following are the situations where you can end up with a multi-database architecture:-
24x7 uptime for online systems
This can be one of the major requirements for duplicating SQL Servers across a network. For instance, you have a system which is supposed to be online 24 hours. This system is hosted on a central database which is far away geographically. Because the system should be online 24 hours, to cover any break in connectivity to the central server we host one more server inside the premises. The application detects that it cannot connect to the online server, so it connects to the premises server and continues working. Later in the evening, using replication, all the data from the local SQL Server is sent to the central server.
License problems
SQL Server per-user licensing has a financial impact. So many companies decide to use MSDE, which is free, so that they do not have to pay for the client licenses. Every evening, or at some specific interval, all this data is uploaded to the central server using replication.
Geographical Constraints
This applies when the central server is far away and speed is one of the deciding criteria.
Reporting Server
In big multi-nationals, sub-companies are geographically far away and the management wants to host a central reporting server for sales, which they want to use for decision making and marketing strategy. Here, the transactional SQL Server databases are scattered across the sub-companies, and weekly or monthly we push all the data to the central reporting server.
Figure 10.1: - Replication in Action
You can see from the above figure how data is consolidated, using replication, into a central server hosted in India.
Finalize the database schema and deploy it to the client side; any changes after this will be a new version.
One of the primary requirements of replication is that the schemas being replicated should be consistent. If you keep on changing the schema of the server, replication will have huge difficulty synchronizing. So if you are going to have huge and continuous changes in the database schema, rethink the replication option; otherwise, proper project management will help you solve this.
(DB) Can a publication support push and pull at one time?
A publication mechanism can have both. But a subscriber can have only one model for one
publication. In short, a subscriber can either be in push mode or pull mode for a publication, but
not both.
Note: - In snapshot replication you will also be sending data which has not changed.
(Q) What is the actual location where the distributor runs?
You can configure from SQL Server where the distributor will run. But normally, for a pull subscription it runs at the subscriber end, and for a push subscription it runs on the publisher side.
Figure 10.4: - Merge Replication
The merge agent stands between the subscriber and the publisher. Any conflicts are resolved through the merge agent, which in turn uses conflict resolution. Depending on how you have configured conflict resolution, the conflicts are resolved by the merge agent.
(Q) What is a transactional replication?
Transactional replication, as compared to snapshot replication, does not replicate the full data; it replicates only when anything changes or something new is added to the database. So whenever there are INSERT, UPDATE and DELETE operations on the publisher side, these changes are tracked and only these changes are sent to the subscriber end. Transactional replication is one of the most preferred replication methodologies, as it sends the least amount of data across the network.
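The idea of shipping only the changes can be sketched as follows (the table and operations are invented for illustration; real transactional replication reads changes from the transaction log):

```python
# Publisher side: a large table, plus a log that records only DML changes.
full_table = [{"id": i} for i in range(10_000)]  # never shipped as a whole

change_log = []  # the only thing sent to subscribers

def track(operation, row):
    """Record one INSERT/UPDATE/DELETE so it can be replicated later."""
    change_log.append((operation, row))

track("INSERT", {"id": 10_000})
track("UPDATE", {"id": 42, "name": "X"})
track("DELETE", {"id": 7})
# 3 tracked changes cross the network instead of 10,000 rows.
```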
Figure 10.6: - Create new publication
Figure 10.8: - Specify Type of replication
Figure 10.9: - Specify under which agent it will run under
Figure 10.11: - Specify which objects you want to replicate
Figure 10.12: - Replication in Action
Note: - I know everyone is screaming that this is a part of data mining and warehousing. I echo the same voice with you, my readers, but not necessarily. When you want to derive reports on OLTP systems this is the best way to get your work done. Secondly, reporting services is used so heavily in projects nowadays that it would be completely unfair to discuss this topic briefly as a subsection of some chapter.
(Q) Can you explain how we can make a simple report in reporting services?
We will be using the “AdventureWorks” database for this sample. We would like to derive a report of how much sales quantity was done per product. For this sample we will have to refer to three tables: SalesOrderDetail, SalesOrderHeader and Product. Below is the SQL, which also shows the relationship between those tables:-
select production.product.Name,
sum(Sales.Salesorderdetail.OrderQty) as TotalSales
from Sales.Salesorderdetail
inner join Sales.Salesorderheader
on Sales.Salesorderheader.salesorderid=
Sales.Salesorderdetail.Salesorderid
inner join production.product
on production.product.productid=sales.salesorderdetail.productid
group by production.product.Name
Therefore, we will be using the above SQL and trying to derive the report using reporting
services.
First, click on the Business Intelligence Studio menu in SQL Server 2005 and say File --> New --> Project. Select the “Report” project wizard and give the project the name “TotalSalesByProduct”. You will see a startup wizard as shown below.
Click Next and you will be prompted to input data source details like the type of server, connection string and name of the data source. If you have the connection string, just paste it in the text area, or else click Edit to specify the connection string values through the GUI.
Figure 11.2: - Specify Data Source Details
As we are going to use SQL Server for this sample, specify the OLE DB provider for SQL Server and click Next.
After selecting the provider, specify the connection details, which will build your connection string. You will need to specify the following details: server name, database name and security details.
This is the most important step of reporting services: specifying the SQL. Remember the SQL we specified at the top? That is what we are pasting here. If you are not sure about the query, you can use the query builder to build it.
Figure 11.5: - SQL Query
Now it is time to include the fields in the report. At this moment, we have only two fields: the name of the product and the total sales.
Figure 11.7: - Specify field positions
Finally, you can preview your report. In the final section there are three tabs: Data, Layout and Preview. In the Data tab, you see your SQL or data source. In the Layout tab, you can design your report; most look-and-feel aspects are done in this section. Finally, below is the Preview, where you can see your results.
Figure 11.8: - Final view of the report
Figure 11.9: - Stored procedure in the query builder
You have to also specify the command type from the Data tab.
Figure 11.10: - Specify the command type from the Data tab.
Figure 11.11: - Reporting Services Architecture
Report designer
This is an interactive GUI, which will help you to design and test your reports.
Reporting Services Database
After reports are designed, they are stored in XML format, known as RDL (Report Definition Language). All these RDL files are stored in the Reporting Services database.
Report Server
The Report Server is nothing but an ASP.NET application running on IIS. The Report Server renders and manages these RDL files.
Report Manager
It's again an ASP.NET web-based application, which can be used by administrators to control security and manage reports. From an administrative perspective, they have the authority to create the report, run the report, etc.
You can also see the various formats which can be generated (XML, HTML, etc.) using the report server.
(Q) What is ACID?
“ACID” is a set of rules which are laid down to ensure that a database transaction is reliable. A database transaction should follow the ACID rules to be safe. “ACID” is an acronym which stands for:-
• Atomicity
A transaction allows for the grouping of one or more changes to tables and rows in the database
to form an atomic or indivisible operation. That is, either all of the changes occur or none of them
do. If for any reason the transaction cannot be completed, everything this transaction changed can
be restored to the state it was in prior to the start of the transaction via a rollback operation.
• Consistency
Transactions always operate on a consistent view of the data and when they end always leave the
data in a consistent state. Data may be said to be consistent as long as it conforms to a set of
invariants, such as no two rows in the customer table have the same customer id and all orders
have an associated customer row. While a transaction executes these invariants may be violated,
but no other transaction will be allowed to see these inconsistencies, and all such inconsistencies
will have been eliminated by the time the transaction ends.
• Isolation
To a given transaction, it should appear as though it is running all by itself on the database. The
effects of concurrently running transactions are invisible to this transaction, and the effects of this
transaction are invisible to others until the transaction is committed.
• Durability
Once a transaction is committed, its effects are guaranteed to persist even in the event of subsequent system failures. Until the transaction commits, any changes made by that transaction are not durable and are guaranteed not to persist in the face of a system failure, as crash recovery will roll back their effects.
The simplicity of ACID transactions is especially important in a distributed database environment
where the transactions are being made simultaneously.
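Atomicity is easy to demonstrate with Python's built-in sqlite3 module; the account rows and the simulated failure below are purely illustrative:

```python
import sqlite3

# Either both halves of the transfer commit, or neither does.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 0)])
conn.commit()

try:
    with conn:  # transaction: commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
        raise RuntimeError("simulated crash mid-transfer")
except RuntimeError:
    pass  # the half-finished debit was rolled back automatically

balances = [r[0] for r in conn.execute("SELECT balance FROM accounts ORDER BY id")]
# balances == [100, 0]: the database returned to its pre-transaction state
```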
Figure 13.1: - Different Types of Transaction Points
There are two paths defined in the transaction: one which rolls back to the initial state and another which rolls back to “tran1”. You can also see that “tran1” and “tran2” are planted in multiple places as bookmarks so you can roll back to those states.
Brushing up the syntaxes
To start a transaction:
BEGIN TRAN Tran1
To create a bookmark (savepoint):
SAVE TRAN PointOne
To roll back to the savepoint:
ROLLBACK TRAN PointOne
To commit all the data from the BEGIN TRAN point:
COMMIT TRAN Tran1
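The same flow can be tried out with Python's built-in sqlite3 module. Note that SQLite uses the standard SAVEPOINT / ROLLBACK TO syntax rather than T-SQL's SAVE TRAN, and the table below is invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # manage transactions manually
conn.execute("CREATE TABLE accounts (id INTEGER, balance INTEGER)")

conn.execute("BEGIN")                                  # like BEGIN TRAN Tran1
conn.execute("INSERT INTO accounts VALUES (1, 100)")
conn.execute("SAVEPOINT point_one")                    # like SAVE TRAN PointOne
conn.execute("UPDATE accounts SET balance = 0 WHERE id = 1")
conn.execute("ROLLBACK TO point_one")                  # like ROLLBACK TRAN PointOne
conn.execute("COMMIT")                                 # like COMMIT TRAN Tran1

balance = conn.execute("SELECT balance FROM accounts WHERE id = 1").fetchone()[0]
# balance == 100: the UPDATE was undone, the INSERT before the savepoint survived
```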
(DB) What are “Implicit Transactions”?
In order to initiate a transaction we use “BEGIN TRAN Tran1”, and later, when we want to save the complete data, we use “COMMIT TRAN <TransactionName>”. In SQL Server you can define transactions to start by default, i.e. without firing “BEGIN TRAN Tr1”. You can set this by using:-
SET IMPLICIT_TRANSACTIONS ON
After the above command is fired, any SQL statements that are executed will by default be in a transaction. You only have to fire “COMMIT TRAN <Transaction Name>” to close the transaction.
For instance, the above figure depicts the concurrency problem. “Mr X” started viewing “Record1”; after some time “Mr Y” picks up “Record1” and starts updating it. So “Mr X” is viewing data which is not consistent with the actual database.
Figure 13.3: - Locking implemented
In our first question we saw the problem; above is how locking will work on it. “Mr. X” retrieves “Record1” and locks it. When “Mr Y” comes in to update “Record1” he cannot do it, as it has been locked by “Mr X”.
Note: - What I have shown is a small glimpse; in actual situations there are different types of locks. We will go through each of them in the coming questions.
Figure 13.4: - Dirty Reads
A “Dirty Read” occurs when one transaction reads a record which is part of the half-finished work of another transaction. The figure above depicts the “Dirty Read” problem pictorially. I have defined all activities in steps which show the sequence in which they happen (i.e. Step 1, Step 2, etc.).
• Step 1: - “Mr. Y” fetches “Record1”, which has “Value=2”, in order to update it.
• Step 2: - In the meantime, “Mr. X” also retrieves “Record1” for viewing. He also sees it as “Value=2”.
• Step 3: - While “Mr. X” is viewing the record, “Mr. Y” concurrently updates it to “Value=5”. Boom... the problem: “Mr. X” is still seeing it as “Value=2”, while the actual value is “5”.
Figure 13.5: - Unrepeatable Read
If you get different values every time you read the same data, it's an “Unrepeatable Read” problem. Let's iterate through the steps of the figure above:-
• Step 1: - “Mr. X” gets “Record1” and sees “Value=2”.
• Step 2: - In the meantime, “Mr. Y” comes and updates “Record1” to “Value=5”.
• Step 3: - “Mr. X” gets “Record1” again; ohh... the value has changed from “2”... confusion.
Figure 13.6: - Phantom Rows
If “UPDATE” and “DELETE” SQL statements seem not to affect the data, it can be a “Phantom Rows” problem.
• Step 1: - “Mr. X” updates all records with “Value=2” in “Record1” to “Value=5”.
• Step 2: - In the meantime, “Mr. Y” inserts a new record with “Value=2”.
• Step 3: - “Mr. X” wants to ensure that all records are updated, so he issues a SELECT for “Value=2”... and surprisingly finds records with “Value=2”.
So “Mr. X” thinks that his “UPDATE” SQL commands are not working properly.
(Q) What are “Lost Updates”?
“Lost Updates” are scenarios where an update which was successfully written to the database is overwritten by another transaction's update. So let's try to understand the steps of the figure above:-
• Step 1: - “Mr. X” updates all records with “Value=2” to “Value=5”.
• Step 2: - “Mr. Y” comes along at the same time and updates all records with “Value=5” to “Value=2”.
• Step 3: - Finally, “Value=2” is saved in the database, which is inconsistent according to “Mr. X”, as he thinks all the values are equal to “5”.
Figure 13.8: - Different Lock sequence in actual scenarios
• Step 1: - The first transaction issues a “SELECT” statement on the resource, thus acquiring a “Shared” lock on the data.
• Step 2: - The second transaction also executes a “SELECT” statement on the resource, which is permitted, as a “Shared” lock is honored by a “Shared” lock.
• Step 3: - The third transaction tries to execute an “UPDATE” SQL statement. As it's an “UPDATE” statement it ultimately needs an “Exclusive” lock, but because we already have “Shared” locks on the data, it acquires an “Update” lock.
• Step 4: - The final transaction tries to fire a “SELECT” on the data and tries to acquire a “Shared” lock, but it cannot do so until the “Update” lock is released.
So “Step 4” will not complete until “Step 3” is executed. When “Step 1” and “Step 2” are done, “Step 3” converts its lock into “Exclusive” mode and updates the data. Finally, “Step 4” completes.
• Intent Locks: - When SQL Server wants to acquire a “Shared” lock or an “Exclusive” lock lower down the lock hierarchy, it first places “Intent” locks at the higher levels. For instance, when a transaction takes row-level locks, an intent lock on the table signals that intention to other transactions, so they do not have to examine every row to detect a conflict. Below are the different flavors of “Intent” locks, all with the one main purpose of signaling locks at a lower level:-
o Intent shared (IS)
o Intent exclusive (IX)
o Shared with intent exclusive (SIX)
o Intent update (IU)
o Update intent exclusive (UIX)
o Shared intent update (SIU)
• Schema Locks: - Whenever you perform an operation related to the “schema” of an object, this lock is acquired. There are basically two flavors:-
o Schema modification lock (Sch-M):- Any object structure change using ALTER, DROP, CREATE, etc. takes this lock.
o Schema stability lock (Sch-S):- This lock prevents “Sch-M” locks; it is used while queries are being compiled. It does not block any transactional locks, but while a Schema stability (Sch-S) lock is held, DDL operations cannot be performed on the table.
• Bulk Update locks: - Bulk Update (BU) locks are used during bulk copying of data into a table, for example when a batch process runs against the database at midnight.
• Key-Range locks: - Key-Range locks are used by SQL Server to prevent phantom insertions or deletions in a set of records accessed by a transaction.
Below are different flavors of “Key-range” locks
o RangeI_S
o RangeI_U
o RangeI_X
o RangeX_S
o RangeX_U
(Q) What are different types of Isolation levels in SQL Server?
Following are the different isolation levels in SQL Server, from least to most restrictive:-
• READ UNCOMMITTED
• READ COMMITTED (the default)
• REPEATABLE READ
• SNAPSHOT (available from SQL Server 2005 onwards)
• SERIALIZABLE
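The isolation level for a session can be changed before starting a transaction; a small sketch (the “Accounts” table is a hypothetical example):

```sql
-- Raise the session's isolation level before the transaction starts
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

BEGIN TRANSACTION;
    -- Shared locks on rows read here are held until COMMIT,
    -- so a second read inside this transaction sees the same values
    SELECT Value FROM Accounts WHERE Id = 1;
COMMIT TRANSACTION;
```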
(Q) If you are using COM+ what “Isolation” level is set by default?
In order to maintain integrity COM+ and MTS set the isolation level to “SERIALIZABLE”.
(Q) What are “Lock” hints?
Lock hints give you more control over how locking is used. You can specify in your SQL queries how locking should be applied by providing optimizer hints, which tell SQL Server to use a specific lock level for that statement.
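For example, a query along the following lines asks SQL Server to put a table-level lock while executing the SELECT (the “Sales” table here is just an illustration):

```sql
-- TABLOCK asks SQL Server to take a table-level shared lock
-- for the duration of this statement instead of row locks
SELECT * FROM Sales WITH (TABLOCK);
```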
(Q) What are the steps you can take to avoid “Deadlocks”?
Below are some guidelines for avoiding “Deadlocks”:-
• Keep the database as normalized as possible. The smaller the pieces, the finer the granularity you have to lock on, which avoids a lot of clashing.
• Do not hold locks while a user is making input on a screen; keep lock time to a minimum through good design.
• As far as possible, avoid cursors.
• Keep transactions as short as possible. One way to accomplish this is to reduce the number of round trips between your application and SQL Server by using stored procedures or by keeping transactions within a single batch. Another way of reducing the time a transaction takes to complete is to make sure you are not performing the same reads repeatedly. If you do need to read the same data more than once, cache it by storing it in a variable or an array, and then re-read it from there.
• Reduce lock time. Try to develop your application so that it grabs locks at the latest possible time, and then releases them at the very earliest time.
• If appropriate, reduce lock escalation by using the ROWLOCK or PAGLOCK hints.
• Consider using the NOLOCK hint to prevent locking if the data being read is not modified often.
• If appropriate, use as low an isolation level as possible for the user connection running the transaction.
• Consider using bound connections.
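A couple of the hints mentioned above can be sketched as follows; the table and column names are only illustrative:

```sql
-- Keep the transaction short and ask for row-level locks only,
-- which reduces lock escalation and the window for deadlocks
BEGIN TRANSACTION;
    UPDATE Sales WITH (ROWLOCK)
    SET Value = 5
    WHERE Id = 1;
COMMIT TRANSACTION;

-- Read without taking shared locks (use with care: this is a
-- dirty read, so uncommitted data may be returned)
SELECT * FROM Sales WITH (NOLOCK);
```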
(DB) How can I know what locks are running on which resource?
In order to see the current locks on an “object” or a “process”, expand the Management tree and open the current activity view. So in case you want to see “deadlocks”, or you want to terminate the process causing a “deadlock”, you can use this facility to get a bird’s-eye view.
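The same information can also be queried directly from the `sys.dm_tran_locks` dynamic management view; a minimal sketch:

```sql
-- List current lock requests: which session holds or wants them,
-- what kind of resource they target, the lock mode requested,
-- and whether the request was granted or is still waiting
SELECT request_session_id,
       resource_type,
       resource_database_id,
       request_mode,
       request_status
FROM sys.dm_tran_locks;
```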
What is the use of the SQL Server Resource Governor?
SQL Server is a giant processing engine which processes various kinds of workloads, such as SQL queries, transactions, etc. In order to process these workloads, appropriate CPU power and RAM memory need to be allocated.
Now, workloads are of different natures: some are light while some are heavy. You would never want heavy SQL operations hijacking the complete CPU and memory resources and thus affecting other operations.
One way to achieve this is by identifying those SQL queries and putting a restriction on the maximum CPU and memory resources available to them. So, for example, as shown in the below figure, if you have some heavy SQL which does reporting you would like to allocate 80% of the CPU and memory resources, while for lightweight SQL you would like to allocate only 20%.
This is achieved by using the SQL Server Resource Governor.
See the video “What is the use of SQL Server Resource Governor?” to see how to configure it.
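The configuration described above can be sketched roughly as follows; the pool and group names (“ReportPool”, “ReportGroup”) are hypothetical, and a classifier function is still needed to route sessions into the group:

```sql
-- Create a resource pool that limits the CPU and memory
-- available to sessions mapped into it
CREATE RESOURCE POOL ReportPool
WITH (MAX_CPU_PERCENT = 80, MAX_MEMORY_PERCENT = 80);

-- Create a workload group bound to that pool
CREATE WORKLOAD GROUP ReportGroup
USING ReportPool;

-- Apply the new configuration
ALTER RESOURCE GOVERNOR RECONFIGURE;
```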
Others
What is hashing?
Hashing is a process of converting a string of characters into a shorter, fixed-length value or key which represents the original string. It has many uses, such as computing message digests for integrity and password checks (MD5, SHA1, etc.) or creating a key by which the original values can be retrieved quickly (Hashtable collections in .NET). Note that hashing is one-way: unlike encryption, the original string cannot be recovered from the hash value.
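In SQL Server, a hash can be computed with the built-in `HASHBYTES` function; a small sketch:

```sql
-- Compute a SHA1 hash of a string; the result is a fixed-length
-- VARBINARY value regardless of the length of the input
SELECT HASHBYTES('SHA1', 'Hello World') AS HashValue;
```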
What is Change Data Capture (CDC)?
Change Data Capture helps to capture insert, update and delete activity in SQL Server. CDC must first be enabled at the database level:
EXEC sys.sp_cdc_enable_db
Once CDC is enabled at the database level, we also need to specify which tables need to be enabled for CDC. Below is a simple code snippet which shows how the “Sales” table is enabled for CDC.
EXEC sys.sp_cdc_enable_table
@source_schema=N'dbo',
@source_name=N'Sales',
@role_name=NULL
Once CDC is enabled, you will find the below tables created for CDC. The most important is the “_CT” table. For example, as you can see in the below image, for the “Sales” table it has created a “dbo_Sales_CT” table.
Now, if we modify any data in the “Sales” table, the “dbo_Sales_CT” table will be affected. After an update on the “Sales” table, the “dbo_Sales_CT” table will contain two rows: one with the old value and the other with the new value. The below image shows that “Rajendra” has been modified to “Raju” in the “Sales” table.
How can we know in CDC what kind of operations have been done on
a record?
If you look at the _CT table, it has a column called “__$operation”. This field helps us identify what kind of transaction was done on the data. Below are the possible values depending on the operation performed:-
• Delete statement = 1
• Insert statement = 2
• Value before update statement = 3
• Value after update statement = 4
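The change rows can then be read straight from the capture table; a sketch against the “dbo_Sales_CT” table created above:

```sql
-- __$operation: 1 = delete, 2 = insert,
-- 3 = value before update, 4 = value after update
SELECT __$operation, *
FROM cdc.dbo_Sales_CT;
```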
Does CDC work without SQL Server Agent?
No, CDC needs SQL Server Agent; the capture and cleanup jobs run as Agent jobs. Without it, CDC will not function.