SQL Server Interview Questions
Pakistan
M/s. Vanguard Books P Ltd, 45 The Mall, Lahore, Pakistan (Tel: 0092-42-7235767, 7243783 and
7243779 and Fax: 7245097)
E-mail: vbl@brain.net.pk
http://www.prakashbooks.com/details.php3?id=17875&c=Computer Books
http://www.amazon.co.uk/exec/obidos/ASIN/8183330770/qid%3D1139594062/026-8105897-7667603
http://www.prakashbooks.com/details.php3?id=19008&c=Computer Books
http://www.amazon.co.uk/exec/obidos/ASIN/8183331033/qid%3D1136610981/026-1344994-2263615#product-details
If you want to purchase the book directly from BPB Publications, Delhi, India:
bpb@bol.net or bpb@vsnl.com
www.questpond.com
Titles written by Shivprasad Koirala
-- .NET Interview questions
-- SQL Server Interview questions
-- Java Interview questions
-- C# and ASP.NET Projects
-- How to prepare Software quotations
-- Excel for office people.
-- Software Testing Interview questions
-- Hacking for beginners
-- Full one year course for C#, SQL Server - 40000 Rupees (installment facility available). Theory on Saturday and Sunday; practicals from Monday to Friday.
-- Full one year course for JAVA / J2EE - 40000 Rupees (installment facility available). Theory on Saturday and Sunday; practicals from Monday to Friday.
-- Full one year course for Testing - 40000 Rupees (installment facility available). Theory on Saturday and Sunday; practicals from Monday to Friday.
-- Crash course for C#, SQL Server - 10000 Rupees (one month course).
-- Crash course for JAVA / J2EE - 10000 Rupees (one month course).
-- Crash course for Testing - 10000 Rupees (one month course).
-- Estimation course for project managers (full estimation course with hands-on techniques covering Function Points, UCP and others). Every Saturday, two-hour course for 5000 Rupees.
-- Architecture course. Every Saturday, two-hour course for 5000 Rupees.
From the Author
First of all, thanks to all those who have sent me complaints and also appreciation for whatever titles I have written till today. The interview question series is very near to my heart, as I can understand the pain of searching for a job. Thanks to my publishers (BPB), readers and reviewers for always excusing all the stupid things which I keep doing.
So why is this PDF free? Well, I always wanted to distribute things for free, especially when it is an interview question book which can fetch a job for a developer. But I am also bound by the publisher's rules and regulations, and why not? They have a whole team of editors, printers, designers, distributors and shopkeepers, including me. But there is the other aspect too: readers should know what they are buying, its quality, and whether it is really worth buying this book. So here are sample questions which I am giving out free to readers to see the worth of the book.
I can be contacted at shiv_koirala@yahoo.com. It is a bit difficult to answer every mail, but as I get time I do it.
We have recently started a career counselling drive, absolutely free, for newcomers and experienced people. I have enlisted the following people on the panel; thanks to all of them for accepting the consulting role. Feel free to shoot them questions, just put a title in the mail saying “Question about Career”. I have always turned to them when I had a serious career decision to take.
Shivprasad Koirala :- Not a great guy, but as I have written the complete book I have to take up one of the positions. You can contact me at shiv_koirala@yahoo.com for the technical career aspect.
Tapan Das :- If you are aiming at becoming a project manager, he is the right person to consult. He can answer all your questions regarding how to groom your career as a project manager: tapand@vsnl.com.
Kapil Siddharth :- If you are thinking of growing as an architect in a company, then he is the guy. When it comes to a role model as an architect, I rate him at the top. You can contact him at kapilsiddharth@hotmail.com.
Second, if you think you can help developers, mail me at shiv_koirala@yahoo.com and if I find you fitting the panel I will display your mail address. Please note there are no financial rewards as such, but I am sure you will be proud of the work you are doing, and who knows what can come up.
Let's make the software industry a better place to work ..... Happy Job Hunting and Best of Luck.
Sample SQL Server Interview
Questions
by
Shivprasad Koirala
Including SQLCLR, XML Integration, Database
optimization, Data warehousing, Data mining and
reporting services
The table of contents is different from what is available in traditional books. So rather than reading through the whole book, just look at the questions you feel uncomfortable with and revise those.
Contents
Introduction ................................................................................................................. 38
Dedication .................................................................................................................... 38
About the author .......................................................................................................... 38
Introduction ................................................................................................................. 40
How to read this book .................................................................................................. 41
Software Company hierarchy ...................................................................................... 41
Resume Preparation Guidelines................................................................................... 45
Salary Negotiation ....................................................................................................... 47
Points to remember ...................................................................................................... 48
1. Database Concepts ................................................................................................... 51
What is database or database management systems (DBMS)? ................................... 51
What’s difference between DBMS and RDBMS ?...................................................... 52
(DB)What are CODD rules?........................................................................................ 53
Is Access database an RDBMS? ..................................................................................... 56
What’s the main difference between ACCESS and SQL SERVER? ........................... 56
What’s the difference between MSDE and SQL SERVER 2000?............................... 60
What is SQL SERVER Express 2005 Edition? ........................................................... 61
(DB) What is SQL Server 2000 Workload Governor? ................................................ 62
What’s the difference between SQL SERVER 2000 and 2005?.................................. 62
What are E-R diagrams? .............................................................................................. 65
How many types of relationship exist in database designing? .................................... 66
What is normalization? What are different types of normalization? ............................ 69
What is denormalization ? ........................................................................................... 71
(DB) Can you explain Fourth Normal Form? ............................................................. 72
(DB) Can you explain Fifth Normal Form? ................................................................ 72
(DB) What’s the difference between Fourth and Fifth normal form? ......................... 74
(DB) Have you heard about sixth normal form? ......................................................... 74
What is Extent and Page? ............................................................................................ 74
(DB)What are the different sections in Page? ............................................................. 74
What are page splits? ................................................................................................... 75
In which files does SQL Server actually store data? ................................................... 75
What is Collation in SQL Server? ............................................................................... 76
(DB)Can we have a different collation for database and table? .................................. 78
2. SQL .......................................................................................................................... 79
Revisiting basic syntax of SQL? ................................................................................. 79
What are “GRANT” and “REVOKE” statements? ...................................................... 80
What is Cascade and Restrict in DROP table SQL? .................................................... 80
How to import table using “INSERT” statement? ....................................................... 81
What is a DDL, DML and DCL concept in RDBMS world? ...................................... 81
What are different types of joins in SQL? ................................................................... 81
What is “CROSS JOIN”? ............................................................................................ 82
You want to select the first record in a given set of rows? .......................................... 82
How do you sort in SQL? ............................................................................................ 82
How do you select unique rows using SQL? ............................................................... 83
Can you name some aggregate functions in SQL Server? ........................................... 83
What is the default “SORT” order for a SQL? ............................................................ 83
What is a self-join? ...................................................................................................... 83
What’s the difference between DELETE and TRUNCATE ? ..................................... 84
Select addresses which are between ‘1/1/2004’ and ‘1/4/2004’? ................................ 84
What are Wildcard operators in SQL Server? ............................................................. 84
What’s the difference between “UNION” and “UNION ALL” ?................................ 86
What are cursors and what are the situations you will use them? ............................... 88
What are the steps to create a cursor?.......................................................................... 88
What are the different Cursor Types? .......................................................................... 90
What are “Global” and “Local” cursors? .................................................................... 92
What is “Group by” clause? ........................................................................................ 92
What is ROLLUP?....................................................................................................... 94
What is CUBE? ........................................................................................................... 96
What is the difference between “HAVING” and “WHERE” clause? ......................... 97
What is “COMPUTE” clause in SQL? ........................................................................ 98
What is “WITH TIES” clause in SQL? ....................................................................... 98
What does “SET ROWCOUNT” syntax achieve? ..................................................... 100
What is a Sub-Query? ................................................................................................ 100
What is “Correlated Subqueries”? ............................................................................. 101
What is “ALL” and “ANY” operator? ...................................................................... 101
What is a “CASE” statement in SQL? ....................................................................... 101
What does COLLATE Keyword in SQL signify? ..................................................... 101
What is CTE (Common Table Expression)?.............................................................. 101
Why should you use CTE rather than simple views? ................................................ 102
What is TRY/CATCH block in T-SQL? .................................................................... 102
What is PIVOT feature in SQL Server? .................................................................... 102
What is UNPIVOT?................................................................................................... 104
What are RANKING functions?................................................................................ 104
What is ROW_NUMBER()? ..................................................................................... 104
What is RANK() ? ..................................................................................................... 104
What is DENSE_RANK()? ....................................................................................... 105
What is NTILE()? ...................................................................................................... 106
(DB)What is SQL injection? ....................................................................................... 107
3. .NET Integration .................................................................................................... 108
What are steps to load a .NET code in SQL SERVER 2005? ................................... 108
How can we drop an assembly from SQL SERVER? ............................................... 108
Are changes made to assembly updated automatically in database? ......................... 108
Why do we need to drop assembly for updating changes?........................................ 108
How to see assemblies loaded in SQL Server?.......................................................... 108
I want to see which files are linked with which assemblies? .................................... 108
Do .NET CLR and SQL SERVER run in different processes? .................................. 109
Does .NET control SQL SERVER or is it vice-versa? ............................................... 110
Is SQLCLR configured by default? ........................................................................... 110
How to configure CLR for SQL SERVER? .............................................................. 110
Is .NET feature loaded by default in SQL Server? .................................................... 111
How does SQL Server control .NET run-time? ......................................................... 111
What’s a “SAND BOX” in SQL Server 2005? ......................................................... 112
What is an application domain? ................................................................................. 113
How are .NET Appdomain allocated in SQL SERVER 2005?.................................. 114
What is Syntax for creating a new assembly in SQL Server 2005? .......................... 115
Do Assemblies loaded in database need actual .NET DLL? ..................................... 115
You have an assembly which is dependent on other assemblies, will SQL Server load
the dependent assemblies? .................................................................................... 115
Does SQL Server handle unmanaged resources? ...................................................... 115
What is Multi-tasking? .............................................................................................. 115
What is Multi-threading? ........................................................................................... 115
What is a Thread ? ..................................................................................................... 116
Can we have multiple threads in one App domain? .................................................. 116
What is Non-preemptive threading? .......................................................................... 116
What is pre-emptive threading? ................................................................................. 116
Can you explain threading model in SQL Server? .................................................... 116
How does .NET and SQL Server thread work? ......................................................... 116
How is exception in SQLCLR code handled? ........................................................... 117
Are all .NET libraries allowed in SQL Server? ......................................................... 117
What is “HostProtectionAttribute” in SQL Server 2005? .......................................... 117
How many types of permission level are there for an assembly?.............................. 118
In order that an assembly gets loaded in SQL Server what type of checks are done? ... 118
Can you name system tables for .NET assemblies? .................................................. 119
Are two version of same assembly allowed in SQL Server? ..................................... 121
How are changes made in assembly replicated?........................................................ 121
Is it a good practice to drop an assembly for changes? ............................................... 121
In one of the projects the following steps were done, will it work? ............................ 121
What does Alter assembly with unchecked data signify? .......................................... 122
How do I drop an assembly? ..................................................................................... 122
Can we create SQLCLR using .NET framework 1.0? ............................................... 123
While creating .NET UDF what checks should be done? ......................................... 123
How do you define a function from the .NET assembly? ......................................... 123
Can you compare between T-SQL and SQLCLR? .................................................... 124
With respect to .NET is SQL SERVER case sensitive?............................................. 124
Does case sensitive rule apply for VB.NET? ............................................................ 125
Can nested classes be accessed in T-SQL? ................................................................ 125
Can we have SQLCLR procedure input as array? ..................................................... 125
Can object datatype be used in SQLCLR? ................................................................ 125
How’s precision handled for decimal datatypes in .NET? ........................................ 125
How do we define INPUT and OUTPUT parameters in SQLCLR? ......................... 126
Is it good to use .NET datatypes in SQLCLR? .......................................................... 127
How to move values from SQL to .NET datatypes? ................................................. 127
What is System.Data.SqlServer? ............................................................................... 127
What is SQLContext? ................................................................................................ 127
Can you explain essential steps to deploy SQLCLR? .............................................. 128
How do we create a function in SQL Server using .NET? ......................................... 133
How do we create trigger using .NET? ..................................................................... 134
How to create User Define Functions using .NET? .................................................. 134
How to create aggregates using .NET? ..................................................................... 135
What is Asynchronous support in ADO.NET? .......................................................... 135
What is MARS support in ADO.NET? .................................................................... 136
What is SQLbulkcopy object in ADO.NET? ............................................................. 136
How to select range of rows using ADO.NET?......................................................... 136
What are different types of triggers in SQL SERVER 2000? .................................... 136
If we have multiple AFTER Triggers on table how can we define the sequence of the
triggers ? ............................................................................................................... 137
How can you raise custom errors from stored procedure ? ....................................... 137
4. ADO.NET .............................................................................................................. 140
Which are namespaces for ADO.NET? ..................................................................... 140
Can you give an overview of ADO.NET architecture? .............................................. 140
What are the two fundamental objects in ADO.NET? .............................................. 141
What is difference between dataset and datareader? ................................................. 142
What are major difference between classic ADO and ADO.NET? ........................... 142
What is the use of connection object? ....................................................................... 142
What are the methods provided by the command object? ......................................... 142
What is the use of “Dataadapter”? ............................................................................. 143
What are basic methods of “Dataadapter”? ............................................................... 143
What is Dataset object? ............................................................................................. 144
What are the various objects in Dataset? ................................................................... 144
How can we connect to Microsoft Access, FoxPro, Oracle etc? ............................... 144
What’s the namespace to connect to SQL Server? .................................................... 145
How do we use stored procedure in ADO.NET? ....................................................... 146
How can we force the connection object to close? .................................................... 147
I want to force the datareader to return only schema? ............................................... 147
Can we optimize command object when there is only one row? .............................. 147
Which is the best place to store connectionstring? .................................................... 147
What are steps involved to fill a dataset? ................................................................. 148
What are the methods provided by the dataset for XML? ......................................... 149
How can we save all data from dataset? .................................................................... 149
How can we check for changes made to dataset? ...................................................... 149
How can we add/remove rows in “DataTable” object of “DataSet”? ........................ 150
What’s basic use of “DataView”? .............................................................................. 151
What’s difference between “DataSet” and “DataReader”? ....................................... 151
How can we load multiple tables in a DataSet? ........................................................ 152
How can we add relations between tables in a DataSet? ............................................ 152
What’s the use of CommandBuilder? ........................................................................ 153
What’s difference between “Optimistic” and “Pessimistic” locking? ....................... 153
How many ways are there to implement locking in ADO.NET? ............................... 153
How can we perform transactions in .NET? .............................................................. 154
What’s the difference between Dataset.Clone and Dataset.Copy? ............................. 155
What’s the difference between Dataset and ADO Recordset? ................................... 155
5. Notification Services ............................................................................................. 156
What are notification services?.................................................................................. 156
(DB)What are basic components of Notification services?....................................... 156
(DB)Can you explain architecture of Notification Services? .................................... 158
(DB)Which are the two XML files needed for notification services? ....................... 159
(DB)What is Nscontrols command? .......................................................................... 161
What are the situations you will use “Notification” Services? .................................. 162
6. Service Broker ....................................................................................................... 163
Why do we need Queues? ........................................................................................... 163
What is “Asynchronous” communication?................................................................ 163
What is SQL Server Service broker? ......................................................................... 163
What are the essential components of SQL Server Service broker? ......................... 163
What is the main purpose of having Conversation Group? ....................................... 164
How to implement Service Broker? .......................................................................... 164
How do we encrypt data between Dialogs? ............................................................... 170
7. XML Integration .................................................................................................... 171
What is XML? ........................................................................................................... 171
What is the version information in XML?................................................................. 171
What is ROOT element in XML? .............................................................................. 171
If XML does not have closing tag will it work? ........................................................ 171
Is XML case sensitive? .............................................................................................. 172
What’s the difference between XML and HTML? .................................................... 172
Is XML meant to replace HTML? ............................................................................. 172
Can you explain why your project needed XML? ..................................................... 172
What is DTD (Document Type definition)? .............................................................. 172
What is well formed XML? ....................................................................................... 172
What is a valid XML? ............................................................................................... 173
What is CDATA section in XML? ............................................................................. 173
What is CSS? ............................................................................................................. 173
What is XSL?............................................................................................................. 173
What is Element and attributes in XML? .................................................................. 173
Can we define a column as XML? ............................................................................ 173
How do we specify the XML data type as typed or untyped? ................................... 174
How can we create the XSD schema? ....................................................................... 174
How do I insert in to a table which has XSD schema attached to it? ........................ 176
What is maximum size for XML datatype? ............................................................... 176
What is Xquery? ........................................................................................................ 176
What are XML indexes? ............................................................................................ 177
What are secondary XML indexes? ........................................................................... 177
What is FOR XML in SQL Server? ........................................................................... 177
Can I use FOR XML to generate SCHEMA of a table and how? ............................. 177
What is the OPENXML statement in SQL Server? ................................................... 177
I have huge XML file which we want to load in database? ....................................... 178
How to call stored procedure using HTTP SOAP? ................................................... 178
What is XMLA? ........................................................................................................ 179
8. Data Warehousing/Data Mining ............................................................................ 180
What is “Data Warehousing”? ................................................................................... 180
What are Data Marts? ................................................................................................ 180
What are Fact tables and Dimension Tables? ............................................................ 180
(DB)What is Snow Flake Schema design in database? ............................................. 183
(DB)What is ETL process in Data warehousing? ...................................................... 184
(DB)How can we do ETL process in SQL Server? ................................................... 185
What is “Data mining”? ............................................................................................. 185
Compare “Data mining” and “Data Warehousing”?.................................................. 186
What is BCP?............................................................................................................. 187
How can we import and export using BCP utility? ................................................... 188
During BCP we need to change the field position or eliminate some fields how can we
achieve this? ......................................................................................................... 189
What is Bulk Insert? .................................................................................................. 191
What is DTS?............................................................................................................. 192
(DB)Can you brief about the Data warehouse project you worked on? .................... 193
What is an OLTP (Online Transaction Processing) System?..................................... 194
What is an OLAP (On-line Analytical processing) system? ..................................... 194
What is Conceptual, Logical and Physical model? ................................................... 195
(DB)What is Data purging? ....................................................................................... 195
What is Analysis Services? ........................................................................................ 195
(DB)What are CUBES? ............................................................................................. 196
(DB)What are the primary ways to store data in OLAP? .......................................... 196
(DB)What is META DATA information in Data warehousing projects? .................. 197
(DB)What is multi-dimensional analysis? ................................................................. 197
(DB)What is MDX?................................................................................................... 199
(DB)How did you plan your Data ware house project? ............................................ 199
What are different deliverables according to phases? ............................................... 202
(DB)Can you explain how analysis service works? .................................................. 203
What are the different problems that “Data mining” can solve? ............................... 219
What are different stages of “Data mining”? ............................................................. 220
(DB)What is Discrete and Continuous data in Data mining world? ......................... 223
(DB)What is MODEL in Data mining world? ............................................................ 223
(DB)How are models actually derived? ...................................................................... 224
(DB)What is a Decision Tree Algorithm? ................................................................. 224
(DB)Can decision tree be implemented using SQL? ................................................. 226
(DB)What is Naïve Bayes Algorithm? ...................................................................... 226
(DB)Explain clustering algorithm? ........................................................................... 227
(DB)Explain in detail Neural Networks? .................................................................. 228
(DB)What is Back propagation in Neural Networks? ............................................... 231
(DB)What is Time Series algorithm in data mining? ................................................ 232
(DB)Explain Association algorithm in Data mining?................................................ 232
(DB)What is Sequence clustering algorithm? ........................................................... 232
(DB)What are algorithms provided by Microsoft in SQL Server?............................ 232
(DB)How does data mining and data warehousing work together? .......................... 234
What is XMLA? ........................................................................................................ 235
What is Discover and Execute in XMLA? ................................................................ 236
9. Integration Services/DTS ...................................................................................... 237
What is Integration Services import / export wizard? ............................................... 237
What are prime components in Integration Services? ............................................... 243
How can we develop a DTS project in Integration Services? ................................... 245
10. Replication ........................................................................................................... 258
What’s the best way to update data between SQL Servers? ....................................... 258
What are the scenarios you will need multiple databases with schema? ................... 258
(DB)How will you plan your replication? ................................................................. 259
What are publisher, distributor and subscriber in “Replication”? ............................. 260
What is “Push” and “Pull” subscription? .................................................................. 261
(DB)Can a publication support push and pull at one time? ....................................... 261
What are different models / types of replication? ...................................................... 262
What is Snapshot replication? ................................................................................... 262
What are the advantages and disadvantages of using Snapshot replication? ............ 262
What type of data will qualify for “Snapshot replication”? ...................................... 262
What’s the actual location where the distributor runs?.............................................. 263
Can you explain in detail how exactly “Snapshot Replication” works? ................... 263
What is merge replication? ........................................................................................ 264
How does merge replication works?.......................................................................... 264
What are advantages and disadvantages of Merge replication? ................................ 265
What is conflict resolution in Merge replication? ..................................................... 265
What is a transactional replication? ........................................................................... 266
Can you explain in detail how transactional replication works? ............................... 266
What are data type concerns during replications? ..................................................... 267
11. Reporting Services ............................................................................................... 272
Can you explain how can we make a simple report in reporting services? ............... 272
How do I specify stored procedures in Reporting Services? ..................................... 279
What is the architecture for “Reporting Services “?.................................................. 280
12. Database Optimization ........................................................................................ 283
What are indexes? ...................................................................................................... 283
What are B-Trees? ..................................................................................................... 283
I have a table which has a lot of inserts, is it a good database design to create indexes on
that table?.............................................................................................................. 284
What are “Table Scans” and “Index Scans”? .............................................................. 285
What are the two types of indexes and explain them in detail? ................................ 286
(DB)What is “FillFactor” concept in indexes? .......................................................... 289
(DB) What is the best value for “FillFactor”? ........................................................... 289
What are “Index statistics”? ...................................................................................... 289
(DB)How can we see statistics of an index? ............................................................. 290
(DB) How do you reorganize your index, once you find the problem? .................... 294
What is Fragmentation? ............................................................................................. 294
(DB)How can we measure Fragmentation? ............................................................... 296
(DB)How can we remove the Fragmented spaces? ................................................... 296
What are the criteria you will look in to while selecting an index? .......................... 297
(DB)What is “Index Tuning Wizard”? ...................................................................... 298
(DB)What is an Execution plan? ............................................................................... 305
How do you see the SQL plan in textual format? ...................................................... 308
(DB)What is nested join, hash join and merge join in SQL Query plan?.................. 308
What joins are good in what situations? .................................................................... 310
(DB)What is RAID and how does it work ? .............................................................. 311
13. Transaction and Locks ......................................................................................... 313
What is a “Database Transaction”? ............................................................................. 313
What is ACID?........................................................................................................... 313
What is “Begin Trans”, “Commit Tran”, “Rollback Tran” and “Save Tran”? .......... 314
(DB)What are “Checkpoint’s” in SQL Server? ......................................................... 315
(DB)What are “Implicit Transactions”? .................................................................... 315
(DB)Is it good to use “Implicit Transactions”? ......................................................... 315
What is Concurrency? ............................................................................................... 316
How can we solve concurrency problems? ............................................................... 316
What kind of problems occur if we do not implement a proper locking strategy? ..... 317
What are “Dirty reads”? ............................................................................................ 317
What are “Unrepeatable reads”?................................................................................ 319
What are “Phantom rows”? ....................................................................................... 320
What are “Lost Updates”? ......................................................................................... 321
What are different levels of granularity of locking resources? ................................. 322
What are different types of Locks in SQL Server? .................................................... 322
What are different Isolation levels in SQL Server? ................................................... 325
What are different types of Isolation levels in SQL Server? ..................................... 325
If you are using COM+ what “Isolation” level is set by default?.............................. 326
What are “Lock” hints? ............................................................................................. 327
What is a “Deadlock” ? ............................................................................................. 327
What are the steps you can take to avoid “Deadlocks” ? .......................................... 327
(DB)How can I know what locks are running on which resource? ........................... 328
“Cheers to the true fighting spirit of IT professionals”
Introduction
Dedication
This book is dedicated to my kid Sanjana, whose dad's play time has been stolen and given to this book. I am thankful to my wife for constantly encouraging me, and also to BPB Publication for giving a newcomer a platform to perform. Finally, on top of it all, thanks to two old eyes, my mom and dad, for always blessing me. I am blessed to have Raju as my brother, who always keeps my momentum moving on.
I am grateful to Bhavnesh Asar who initially conceptualized the idea; I believe concept thinking is more important than execution. Tons of thanks to my reviewers, whose feedback provided an essential tool to improve my writing capabilities.
I just want to point out that Miss Kadambari S. Kadam took all the pains to review the book for left-outs, without which it would never have seen the quality light.
About the author
The author works in a big multinational company and has over 8 years of experience in the software industry. He is presently working as a project lead and in the past has led projects in banking, travel and financial sectors.
But on top of it all, I am a simple developer like all of you out there, doing an 8-hour job. Writing is something I do extra, and I love doing it. No one is perfect and the same holds true for me. So anything you want to comment, suggest, or point out, typo / grammar mistakes or technical mistakes regarding the book, you can mail me at shiv_koirala@yahoo.com. Believe me, guys, your harsh words will be received with love and treated with the topmost priority. Without all you guys I am not an author.
Writing an interview question book is really a great deal of responsibility. I have tried to cover the maximum questions for the topic, because I always think that probably leaving out one silly question will cost someone a job. But the huge natural variations in an interview are difficult to cover in this small book. So if you have come across questions during an interview which are not addressed in this book, do mail them to shiv_koirala@yahoo.com. Who knows, probably that question can save some other guy's job.
√ This book goes best in combination with my previous book “.NET Interview questions”. One takes care of the front-end aspect and this one the back end, which will make you really stand out during .NET interviews.
√ Around 400 plus SQL Server Interview questions sampled from real SQL Server
Interviews conducted across IT companies.
√ Other than core-level interview questions, DBA topics like database optimization and locking are also addressed.
√ Replication is a section where most developers stumble, so a full chapter is dedicated to it so that during the interview you really look like a champ.
√ SQLCLR, that is .NET integration, which is one of the favorites of every interviewer, is addressed with great care. This makes the developer more comfortable during the interview.
√ XML is one of the must-answer topics during an interview. All new XML features are covered with great elegance.
√ Areas like data warehousing and data mining are handled in complete depth.
√ Reporting and Analysis Services, which can really surprise developers during interviews, are also dealt with great care.
√ A complete chapter on ADO.NET makes it stronger from a programmer's aspect. In addition, new ADO.NET features are also highlighted, which can be pain points for the new features released with SQL Server.
√ A must for developers who are looking to crack a SQL Server interview for a DBA or programmer position.
√ A must for freshers who want to avoid unnecessary pitfalls during the interview.
√ Every answer is precise and to the point rather than beating around the bush. Some questions are answered in greater detail with practical implementation in mind.
√ Every question is classified as DB or NON-DB level. DB-level questions are mostly for people who are looking for high-profile DBA jobs. All questions other than DB level are NON-DB level, which every programmer must know.
√ The tips and tricks for interview, resume making and salary negotiation sections take this book to a greater height.
Introduction
When my previous book ".NET Interview Questions" reached the readers, the only voice heard was "more SQL Server". Ok guys, we have heard it loud and clear, so here's my complete book on SQL Server: "SQL Server Interview Questions". But there's a second, stronger reason for writing this book which stands taller than the readers' demand, and that is SQL Server itself. Almost 90% of projects in the software industry need databases or persistent data in some form or other. When it comes to persisting data from .NET, SQL Server is the most preferred database to do it. There are projects which use ORACLE, DB2 and other database products, but SQL Server still has the major market chunk when the language is .NET and especially when the operating system is Windows. I treat this great relationship between .NET, SQL Server and Windows OS as a family relationship.
In my previous book we had only one chapter dedicated to SQL Server, which is a complete injustice to this beautiful product.
So why an interview question book on SQL Server? If you look at any .NET interview conducted in your premises, both parties (employer and candidate) pay no attention to SQL Server, even though it is such an important part of a development project. They will go on talking about the stars (OOP, AOP, design patterns, MVC patterns, Microsoft Application Blocks, project management etc.), but on the database side there are rarely any questions. I am not saying these things are not important, but if you look at development or maintenance, the majority of the time you will be either in your IDE or in SQL Server.
Secondly, many candidates really come out as heroes when answering questions on OOP, AOP, design patterns, architecture, remoting etc., but when it comes to simple basic questions on SQL Server like SQL or indexes (forget DBA-level questions) they are completely off track.
Third, and very important: IT is changing, and people expect more out of less. That means they expect a programmer to be an architect, coder, tester and, yes, a DBA also. For mission-critical data there will always be a separate position for a DBA. But now many interviewers expect programmers to also do the job of a DBA, data warehousing etc. This is the major place where developers lack when facing these kinds of interviews.
So this book will walk you through those surprising questions which can spring up from the SQL Server aspect. I have tried not to go too deep, as that would defeat the complete purpose of an interview question book. I think that an interview book should make you run through those surprising questions and prepare you in a short duration (probably within a night or so). I hope this book really points out those pitfalls which can come up during SQL Server interviews.
I hope this book takes you to a better height and gives you an extra confidence boost during interviews. Best of Luck and Happy Job-Hunting.............
How to read this book
If you can read English, you can read this book....kidding. In this book there are some legends which will make your reading more effective. Every question has simple tags which mark the rating of the question.
These ratings are given by the author and can vary according to companies and individuals. Compared to my previous book ".NET Interview Questions", which had three levels (Basic, Intermediate and Advanced), this book has only two levels (DBA and NON-DBA) because of the subject. While reading you will come across sections marked as "Note", which highlight special points of that section. You will also come across tags like "TWIST", which is nothing but another way of asking the same question; for instance, "What is replication?" and "How do I move data between two SQL Server databases?" point to the same answer.
All DBA-level questions are marked with the (DB) tag. Questions which do not have the tag are NON-DBA level. Every developer should have a know-how of all NON-DBA level questions, but for DBA guys every question is important. For instance, if you are going for a developer position and you flunk a simple ADO.NET question, you know the result. Vice versa, if you are going for a DBA position and you cannot answer basic query optimization questions, probably you will never reach the HR round.
So the best way to read this book is to read a question and judge for yourself: do you think you will be asked this type of question? For instance, many times you know you will only be asked about data warehousing, and rather than beating around the bush you would like to target that section more. Many times you know your weakest area and you would only like to brush up on those sections. You can say this book is not a book which has to be read from start to end; you can start from any chapter or question and close it when you think you are ok.
Software Company hierarchy
It's very important during an interview to be clear about what position you are targeting. Depending on the position you are targeting, the interviewer shoots questions at you. For example, if you are looking for a DBA position you will be asked around 20% ADO.NET questions and 80% questions on query optimization, profiler, replication, data warehousing, data mining and others.
Note :- In small-scale software houses and mid-scale software companies there are chances that they expect a developer to do the job of programming, DBA, data mining and everything. But in big companies you can easily see the difference, where DBA jobs are specifically done by SQL Server specialists rather than developers. However, nowadays some big companies believe in a developer doing multitasking jobs to remove dependencies on a single resource.
Figure :- 0.1 IT Company hierarchy
Above is a figure of the general hierarchy across most IT companies (well, not always, but I hope most of the time). Because of inconsistent HR ways of working you will see differences between companies.
Note :- There are many small and medium software companies which do not follow this hierarchy; they have their own ad hoc way of defining positions in the company.
So why is there a need for hierarchy in an interview?
"An interview is a contract between the employer and candidate to achieve specific goals."
So the employer is looking for a suitable candidate and the candidate for a better career. Normally in interviews the employer is very clear about what type of candidate he is looking for. But 90% of the time the candidate is not clear about the position he is looking for.
How many times has it happened to you that you have given a whole interview and, when you mentioned the position you are looking for, pat comes the answer: we do not have any requirements for this position. So be clear about the position right when you start the interview.
Following are the number of years of experience according to position.
√ Junior engineers are mostly freshers and work under software engineers.
√ Software engineers have around 1 to 2 years of experience. Interviewers expect software engineers to have know-how of how to code ADO.NET with SQL Server.
√ Senior software engineers have around 2 to 4 years of experience. Interviewers expect them to be technically very strong.
√ Project leads should handle the majority of the technical aspects of a project and should have around 4 to 8 years of experience. They are also actively involved in defining the architecture of the project. Interviewers expect them to be technically strong and to have managerial skills.
√ Project managers are expected to be around 40% technically strong and should have experience of 10 years plus. But they are interviewed more from the aspect of project management, client interaction, people management, proposal preparation etc.
√ Pure DBAs do not come into the hierarchy as such in pure development projects. They do report to the project managers or project leads, but they mainly work across the hierarchy helping everyone in a project. In small companies software developers can also act as DBAs depending on company policy. Pure DBAs normally have around 6 or more years of experience in that particular database product.
√ When it comes to maintenance projects where you have special DBA positions, a lot of things are ad hoc. That means one or two guys work on fulfilling maintenance tickets.
So now judge where you stand and where you want to go..........
Resume Preparation Guidelines
√ Do not mention your salary in your CV. You can talk about it during the interview with HR or the interviewer.
√ When you are writing your summary for a project, make it effective by using verbs like "managed a team of 5 members", "architected the project from start to finish" etc. It carries huge weight.
√ This is essential, very essential: take 4 to 5 Xerox copies of your resume; you will need them now and then.
√ Just in case, take at least 2 passport photos with you. You may escape without them, but many times you will need them.
√ Carry all your current office documents, especially your salary slips and joining letter.
Salary Negotiation
Ok, that's what we all do it for: money… not everyone, right? This is probably the weakest area for techno-savvy guys. They are not good negotiators. I have seen so many guys who at the first instance will smile and say "NEGOTIABLE SIR". So here are some points:-
√ Do a study of the salary trend; have some kind of baseline. For example, what's the salary trend for your number of years of experience? Discuss this with your friends.
√ Do not mention your expected salary on the resume.
√ Let the employer first make the salary offer. Try to delay the salary discussion till the
end.
√ If they ask what you expect, come up with a figure a little toward the higher end and say negotiable. Remember, never say negotiable on the figure you have actually aimed for; HR guys will always bring it down. So negotiate on AIMED SALARY + something extra.
√ The normal trend is that they look at your current salary and add a little to it so that they can pull you in. Do your homework: my salary is this much and I expect this much, so whatever happens I will not come below this.
√ Do not be harsh during salary negotiations.
√ It's good to aim high, for instance "I want 1 billion dollars a month", but at the same time be realistic.
√ Some companies have hidden costs attached to the salary; clarify that rather than being surprised at the first salary package.
√ Many companies add extra performance compensation to your basic, which can be surprising at times. So get a detailed breakdown. Best is to discuss on-hand salary rather than NET.
√ Talk with the employer in what frequency does the hike happen.
√ Take everything in writing , go back to your house and have a look once with a cool
head is the offer worth it of what your current employer is giving.
√ Do not forget that once you have a job offer in hand you can go back to your current
employer for negotiation, so keep that in mind.
√ Remember, the worst part is cribbing after joining the company that your colleague is
getting this much. So be careful during interview negotiations, or be sporting about it and
be a better negotiator in the next interview.
√ One very important thing: the best negotiation ground is not the new company where
you are going but the old company which you are leaving. So once you have an offer in
hand, go back to your old employer, show them the offer and then make your
next move. It is my experience that negotiating with the old employer is easier than
with the new one. Frankly, if approached properly, rarely will anyone say no. Just
do not be aggressive or egoistic about having an offer in hand.
Above all, sometimes some things are worth more than money:- JOB SATISFACTION. So
whatever you negotiate, if you think you can get the JOB SATISFACTION aspect on higher
grounds, go for it. I think it's worth more than money.
Points to remember
√ One of the first questions asked during an interview is "Can you say something about
yourself"?
√ Can you describe yourself and what you have achieved till now?
√ Why do you want to leave the current company?
√ Where do you see yourself after three years?
√ What are your positive and negative points?
√ How much do you rate yourself in .NET and SQL Server, on a scale of one to ten?
√ Are you looking for onsite opportunities? (Be careful do not show your desperation
of abroad journeys)
√ Why have you changed so many jobs? (Prepare a decent answer; do not blame
companies and individuals for your frequent changes.)
√ Never talk for more than 1 minute straight during interview.
√ Have you worked with previous version of SQL Server?
√ Would you be interested in a full time Database administrator job?
√ Do not mention client names in the resume. If asked, say that it's confidential; this
brings out qualities like honesty.
√ When you make your resume keep your recent projects at the top.
√ Find out what the employer is looking for by asking him questions at the start of the
interview, and best of all before going to the interview. For example, if a company has
projects on server products, the employer will be looking for BizTalk and CMS experts.
√ Can you give brief about your family background?
√ As you are a fresher, do you think you can really do this job?
√ Have you heard about our company? Can you say five points about our company? Just
read at least once about the company you are going to.
√ Can you describe your best project you have worked with?
√ Do you work on Saturday and Sunday?
√ Which is the biggest team size you have worked with?
√ Can you describe your current project you have worked with?
√ How much time will you need to join our organization? What’s notice period for
your current company?
√ What certifications have you cleared?
√ Do you have passport-size photos, last year's mark sheet, previous companies'
employment letters, last month's salary slip, passport and other necessary documents?
√ What’s the most important thing that motivates you?
√ Why do you want to leave the previous organization?
√ Which type of job gives you greatest satisfaction?
√ What is the type of environment you are looking for?
√ Do you have experience in project management?
√ Do you like to work as a team or as individual?
√ Describe your best project manager you have worked with?
√ Why should I hire you?
√ Have you ever been fired or forced to resign?
√ Can you explain some important points that you have learnt from your past project
experiences?
√ Have you gone through some unsuccessful projects? If yes, can you explain why the
project failed?
√ Will you be comfortable with a location shift? If you have personal problems, say no
right at the first stage, or else within two months you will have to read my book again.
√ Do you work late nights? The best answer: yes, if there is a project deadline. Do not
show that it's your culture to work during nights.
√ Any special achievements in your life till now? Tell about the best project you
have done in your career.
√ Any plans of opening your own software company? Beware, do not start pouring out
your Bill Gates dream to him; it can create a wrong impression.
1. Database Concepts
What is database or database management systems
(DBMS)?
Twist: - What’s the difference between file and database? Can files qualify as a database?
Note: - Probably these questions are too basic for experienced SQL SERVER guys. But
from a fresher's point of view it can be the difference between getting a job and being jobless.
A database provides a systematic and organized way of storing, managing and retrieving
a collection of logically related information.
Secondly, the information has to be persistent; that means even after the application is
closed the information should still be there.
Finally, it should provide an independent way of accessing data and should not be
dependent on the application to access the information.
Ok, let me spend a few sentences more on explaining the third aspect. Below is a simple
figure of a text file which has personal detail information. The first column of the
information is Name, the second is address and finally the phone number. This is a simple text
file which was designed by a programmer for a specific application.
It works fine within the boundary of the application. Now some years down the line a third
party application has to be integrated with this file, so in order for the third party application
to integrate properly it has the following options :-
√ Use the interface of the original application.
√ Understand the complete detail of how the text file is organized, for example the
first column is Name, then address and finally phone number. After analyzing,
write code which can read the file, parse it etc…. Hmm, a lot of work, right?
That's the main difference between a simple file and a database; a database has an
independent way (SQL) of accessing information while simple files do not (that answers
my twisted question defined above). A file meets the storing, managing and retrieving parts
of a database but not the independent way of accessing data.
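As a small illustrative sketch (the PersonalDetails table name is hypothetical, just standing in for the text file above), any application could read the same data with plain SQL instead of writing its own file parser:
-- Hypothetical table holding the same columns as the text file
SELECT Name, Address, Phone
FROM PersonalDetails
WHERE Name = 'Shiv'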
Note: - Many experienced programmers think that the main difference is that a file can not
provide multi-user capabilities, which a DBMS provides. But if you look at some old
COBOL and C programs where files were the only means of storing data, you can see
functionality like locking, multi-user access etc. provided very efficiently. So it's a matter of
debate; if some interviewers think of this as the main difference between files and databases,
accept it… going into a debate will probably lose you the job.
(Just a note for freshers: multi-user capability means that at one moment of time more than
one user should be able to add, update, view and delete data. All DBMS provide this as
built-in functionality, but if you are storing information in files it is up to the application to
write the logic to achieve it.)
Many DBMS companies claimed their DBMS product was RDBMS compliant, but
according to industry rules and regulations, if the DBMS fulfills the twelve CODD rules
it's truly an RDBMS. Almost all DBMS (SQL SERVER, ORACLE etc.) fulfill the
twelve CODD rules and are considered truly RDBMS.
Note: - One of the biggest debate, Is Microsoft Access a RDBMS? We will be answering
this question in later section.
Rule 3: Systematic treatment of null values.
"Null values (distinct from the empty character string or a string of blank characters and
distinct from zero or any other number) are supported in fully relational DBMS for
representing missing information and inapplicable information in a systematic way,
independent of data type."
In SQL SERVER, if there is no data for a column a NULL value is assigned to it. Note that NULL
values in SQL SERVER do not represent spaces, blanks or a zero value; NULL is a distinct
representation of missing information, and this satisfies rule 3 of CODD.
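A minimal sketch of this behavior, assuming a hypothetical Customers table with a nullable Phone column: a missing phone number is stored as NULL (not as zero or an empty string) and has to be tested with IS NULL.
-- Phone is not supplied, so SQL SERVER stores NULL for it
INSERT INTO Customers (Name) VALUES ('Shiv')
-- NULL is distinct from '' and 0; it must be tested with IS NULL
SELECT Name FROM Customers WHERE Phone IS NULL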
Database vendors were providing their own flavors of syntax until, in the 80s, ANSI-SQL
came in to standardize this variation between vendors. As ANSI-SQL is quite limited,
every vendor, including Microsoft, introduced their own additional SQL syntax on top of
the ANSI-SQL support. So you can see SQL syntax varying from vendor to vendor.
68
In SQL SERVER you can specify data types (integer, nvarchar, Boolean etc.), which puts
the data type checks in SQL SERVER rather than in the application programs.
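A minimal sketch of such declarative type checking (reusing the ColorTable sample that appears later in the SQL chapter; the sortorder column is just an assumption for illustration):
CREATE TABLE ColorTable
(
    code nvarchar(10),       -- only character data is accepted here
    colorvalue nvarchar(50),
    sortorder int            -- inserting 'abc' into this column fails with a conversion error
)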
√ Access uses file server design and SQL SERVER uses the Client / Server
model. This forms the major difference between SQL SERVER and ACCESS.
Note: - Just to clarify what client server and file server are, I will give a quick description of
the widely accepted architectures. There are three types of architecture:-
• Main frame architecture (This is not related to the above explanation but just
mentioned it as it can be useful during interview and also for comparing with
other architectures)
• File sharing architecture (Followed by ACCESS).
• Client Server architecture (Followed by SQL SERVER).
In Mainframe architecture all the processing happens on the central host server. Users interact
through dumb terminals which only send keystrokes and information to the host, where all the
main processing happens. The advantage of this type of architecture is that you need only
minimally configured clients. The disadvantage is that you need a robust central host server
like a mainframe.
In file sharing architecture, which is followed by the Access database, all the data is sent to the
client terminal and then processed. For instance, say you want to see the customers who stay in
INDIA; in file sharing architecture all customer records will be sent to the client PC,
regardless of whether the customer belongs to INDIA or not. On the client PC the customer
records from India are sorted/filtered out and displayed; in short, all processing logic happens
on the client PC. So in this architecture the client PC should have a heavy configuration,
and it also increases network traffic as a lot of data is sent to the client PC. The advantage
of this architecture is that your server can be of low configuration.
Figure 1.2:- File Server Architecture of Access
In client-server architecture the above limitation of the file server architecture is removed. In
client-server architecture you have two entities, the client and the database server. The file
server is now replaced by a database server. The database server takes up the load of processing
any database related activity, and the client handles any validation aspects of the database. As the
work is distributed between the entities, it increases scalability and reliability. Secondly, the
network traffic also comes down as compared to the file server model. For example, if you
request customers from INDIA, the database server will sort/filter and send only the Indian
customer details to the client, thus bringing down the network traffic tremendously. SQL
SERVER follows the client-server architecture.
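A small sketch of the idea, using a hypothetical Customers table: the filtering runs on the database server and only the matching rows travel over the network.
-- Only the rows satisfying the WHERE clause are sent back to the client
SELECT CustomerID, CustomerName
FROM Customers
WHERE Country = 'India'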
Figure 1.3:- Client Server Architecture of SQL SERVER
√ The second issue comes in terms of reliability. In Access the client directly interacts
with the Access file; in case there is some problem in the middle of a transaction,
there are chances that the Access file can get corrupted. But in SQL SERVER the
engine sits between the client and the database, so in case of any problem in the
middle of a transaction it can revert back to its original state.
Note: - SQL SERVER maintains a transaction log by which you can revert back to your
original state in case of any crash.
√ When your application has to cater to huge load demands, a highly transactional
environment and high concurrency, then it's better to go for SQL SERVER or
MSDE.
√ But when it comes to cost and support, Access stands better than SQL
SERVER. In case of SQL SERVER you have to pay a per-client license, but the
Access runtime is free.
Summarizing: - SQL SERVER gains points in terms of network traffic, reliability and
scalability; vice versa, Access gains points in terms of the cost factor.
What’s the difference between MSDE and SQL SERVER
2000?
MSDE is a royalty-free, redistributable and cut-down version of the giant SQL SERVER
database. It's primarily provided as a low cost option for developers who need a database
server which can easily be shipped and installed. It can serve as a good alternative to the
Microsoft Access database as it overcomes quite a lot of the problems which Access has.
Below is a complete list which can give you a good idea of the differences:-
√ Size of database: - MS ACCESS and MSDE have a limitation of 2 GB while
SQL SERVER has 1,048,516 TB.
√ Performance degrades in MSDE 2000 when the number of concurrent
operations reaches eight or more. It does not mean that you can not have
more than eight concurrent operations, but the performance degrades. The
eight-connection performance degradation is implemented by the SQL SERVER
2000 workload governor (we will look in more detail at how it works).
In comparison, SQL SERVER 2000 can have 32,767 concurrent
connections.
√ MSDE does not provide OLAP and Data ware housing capabilities.
√ MSDE does not have support facility for SQL mail.
√ MSDE 2000 does not have GUI administrative tools such as Enterprise Manager,
Query Analyzer or Profiler. But there are roundabout ways by which you can
manage MSDE 2000 :-
√ The old command line utility OSQL.EXE (see the example after this list).
√ VS.NET IDE Server Explorer: - inside the VS.NET IDE you have
functionality which gives you a nice GUI administrative tool to manage
MSDE.
√ SQL SERVER WEB Data administrator installs a web based GUI which
you can use to manage your database. For any details refer http://
www.microsoft.com/downloads/details.aspx?familyid=c039a798-c57a-
419e-acbc-2a332cb7f959&displaylang=en
√ SQL-DMO objects can be used to build your custom UI.
√ There are lots of third party tools which provide an administrative GUI,
which is out of the scope of this book as it's only meant for interview
questions.
√ MSDE does not support Full text search.
Summarizing: - There are two major differences: first is the size limitation (2 GB) of the
database, and second is the concurrent connections (eight), which are throttled using the
workload governor. During the interview this answer will suffice if he is
really testing your knowledge.
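A quick sketch of the OSQL option mentioned above (the instance name is only an assumption for illustration):
REM Connect to a local MSDE instance using Windows authentication and run a query
osql -E -S .\MYMSDEINSTANCE -Q "SELECT @@VERSION"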
defined types can now be written using your own favorite .NET language
(VB.NET, C#, J# etc .). This support was not there in SQL SERVER
2000 where the only language was T-SQL. In SQL 2005 you have support for
two languages T-SQL and .NET.
√ (PG) SQL SERVER 2005 has reporting services for reports, which is a newly
added feature and does not exist in SQL SERVER 2000; it was a separate
installation for SQL Server 2000.
√ (PG) SQL SERVER 2005 has introduced two new data types varbinary (max)
and XML. If you remember in SQL SERVER 2000 we had image and text
data types. The problem with the image and text data types is that they assign the same
amount of storage irrespective of what the actual data size is. This problem is
solved by varbinary (max), which sizes itself depending on the amount of data. One
more new data type, "XML", is included, which enables you to store XML
documents and also do schema validation. In SQL SERVER 2000 developers
used the varchar or text data types and all validation had to be done
programmatically.
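A minimal sketch of the two new data types in a table definition (the table and column names are purely illustrative):
CREATE TABLE DocumentStore
(
    DocID int PRIMARY KEY,
    DocBody varbinary(max),   -- replaces the old image type; storage grows with the data
    DocMeta xml               -- stores XML documents, optionally validated against a schema collection
)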
√ (PG) SQL SERVER 2005 can now process incoming HTTP requests directly,
without the IIS web server. Also, stored procedure invocation is enabled using the
SOAP protocol.
√ (PG) An asynchronous mechanism is introduced using server events. In the server
event model the server posts an event to the SQL Broker service; later the
client can come and retrieve the status by querying the broker.
√ For huge databases SQL SERVER has provided a cool feature called "Data
partitioning". In data partitioning you break a single database object such as a
table or an index into multiple pieces. But for the client application accessing
the single database object, the partitioning is transparent.
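A minimal sketch of the idea, with hypothetical names and year boundaries: a partition function defines the ranges, a partition scheme maps them to filegroups, and the table is created on the scheme.
-- Rows are split by OrderYear, but queries still see one logical table
CREATE PARTITION FUNCTION pfOrderYear (int)
    AS RANGE LEFT FOR VALUES (2001, 2002, 2003)
CREATE PARTITION SCHEME psOrderYear
    AS PARTITION pfOrderYear ALL TO ([PRIMARY])
CREATE TABLE OrdersPartitioned
(
    OrderID int,
    OrderYear int
) ON psOrderYear (OrderYear)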
√ In SQL SERVER 2000, if you rebuilt a clustered index even the non-clustered
indexes were rebuilt. But in SQL SERVER 2005 rebuilding the clustered index
does not rebuild the non-clustered indexes.
√ Bulk data uploading in SQL SERVER 2000 was done using BCP (bulk copy
program) format files. In SQL SERVER 2005, bulk data uploading
can also use XML format files.
√ In SQL SERVER 2000 there were a maximum of 16 instances, but in 2005 you
can have up to 50 instances.
√ SQL SERVER 2005 has support for "Multiple Active Result Sets", also called
"MARS". In the previous version, SQL SERVER 2000, on one connection
you could only have one active result set. But now, on one SQL connection, you can
query and have multiple active result sets.
√ In the previous version, SQL SERVER 2000, the system catalog was stored in the
master database. In SQL SERVER 2005 it's stored in the resource database and
exposed as sys objects; you can not access the sys objects directly the way
we were accessing the master database in the older version.
√ This is one of the hardware benefits which SQL SERVER 2005 has over SQL
SERVER 2000 – support for hyper-threading. WINDOWS 2003 supports hyper-
threading; SQL SERVER 2005 can take advantage of the feature, unlike
SQL SERVER 2000 which did not support hyper-threading.
Note: - Hyper threading is a technology developed by INTEL which creates two logical
processors on a single physical hardware processor.
√ SMO will be used for SQL Server Management.
√ AMO (Analysis Management Objects) to manage Analysis Services servers,
data sources, cubes, dimensions, measures, and data mining models. You can
map AMO in old SQL SERVER with DSO (Decision Support Objects).
√ Replication is now managed by RMO (Replication Management Objects).
Note: - SMO, AMO and RMO are all using .NET Framework.
√ SQL SERVER 2005 uses the current user's execution context to check rights, rather
than the ownership chain, which was how it was done in SQL SERVER 2000.
Note: - There is a question on this later see for execution context questions.
√ In previous versions of SQL SERVER the schema and the user name were the
same, but in the current version the schema is separated from the user. Now the
user owns a schema.
Note: - There are questions on this, refer – “Schema” later.
Note:-Ok below are some GUI changes.
√ Query analyzer is now replaced by query editor.
√ Business Intelligence development studio will be used to create Business
intelligence solutions.
√ OSQL and ISQL command line utility is replaced by SQLCMD utility.
√ SQL SERVER Enterprise manager is now replaced by SQL SERVER
Management studio.
√ SERVER Manager which was running in system tray is now replaced by SQL
Computer manager.
√ The database mirroring concept is supported in SQL SERVER 2005, which was not
present in SQL SERVER 2000.
√ In SQL SERVER 2005 indexes can be rebuilt online while the database is
in actual production. If you look back at SQL SERVER 2000, you could not do
insert, update and delete operations while indexes were being built.
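A minimal sketch of an online rebuild (the index and table names are just placeholders):
-- The table stays available for inserts, updates and deletes while the index is rebuilt
ALTER INDEX IX_Customer_Name ON dbo.Customer
REBUILD WITH (ONLINE = ON)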
√ (PG) Other than the Serializable, Repeatable Read, Read Committed and Read
Uncommitted isolation levels, there is one more new isolation level, "Snapshot
Isolation".
Note: - We will see “Snapshot Isolation level” in detail in coming questions.
Summarizing: - The major significant differences between SQL SERVER 2000 and SQL
SERVER 2005 are the support for .NET integration, the snapshot isolation level,
native XML support, direct handling of HTTP requests, web service support and data
partitioning. You do not have to really say all the above points during the interview; a sweet
summary and you will rock.
Figure 1.4: - Asset management ER diagram.
Figure 1.5 : - One-to-One relationship ER diagram
√ One-to-many
In this, one record in one table corresponds to many records in the other table. Example:
- one customer can have multiple sales, so there exists a one-to-many relationship
between the customer and sales tables.
One "Asset" can have multiple "Maintenance" records, so the "Asset" entity has a one-to-many
relationship with "Maintenance", as the ER model below shows.
Figure 1.6 : - One-to-Many Relationship ER diagram
√ Many-to-many
In this, one record in one table corresponds to many rows in the other table, and vice-
versa. For instance :- in a company one employee can have many skills like Java, C# etc.,
and one skill can also belong to many employees.
Given below is a sample many-to-many relationship. One employee can have knowledge
of multiple "Technology" records. So in order to implement this we have one more table,
"EmployeeTechnology", which is linked to the primary keys of the "Employee" and
"Technology" tables (a DDL sketch of this link table follows the figure below).
Figure 1.7 : - Many-to-Many Relationship ER diagram
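A minimal DDL sketch of that link table (the names follow the example above; the key columns are assumptions for illustration):
CREATE TABLE EmployeeTechnology
(
    EmployeeID int NOT NULL REFERENCES Employee (EmployeeID),
    TechnologyID int NOT NULL REFERENCES Technology (TechnologyID),
    PRIMARY KEY (EmployeeID, TechnologyID)   -- one row per employee/technology pair
)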
form. But believe this book, answering three normal forms will put you in decent shape during
the interview.
Following are the three normal forms :-
In the above example city1 and city2 are repeating. In order for this table to be in first
normal form you have to modify the table structure as follows. Also note that the Customer
Name is now broken down into first name and last name (in first normal form, data should be
broken down to the smallest unit).
Figure 1.10 :- Normalized customer table.
So now the "Total" field is removed, as it is simply the multiplication of Unit Price * Qty.
What is denormalization ?
Denormalization is the process of putting one fact in numerous places (it is the reverse of
normalization). Only one valid reason exists for denormalizing a relational design - to
enhance performance. The price paid for that performance is increased redundancy in the
database.
In the above table you can see there are two many-to-many relationships, between "Supplier"
/ "Product" and "Supplier" / "Location" (or, in short, multi-valued facts). In order for the
above example to satisfy fourth normal form, both many-to-many relationships should
go into different tables.
(DB) Can you explain Fifth Normal Form?
Note: - UUUHHH, if you get this question, after joining the company do ask him whether he
himself really ever used it.
Fifth normal form deals with reconstructing information from smaller pieces of
information. These smaller pieces of information can be maintained with less redundancy.
Example: - “Dealers” sells “Product” which can be manufactured by various “Companies”.
"Dealers", in order to sell the "Product", should be registered with the "Company". So
these three entities have a mutual relationship among them.
The above table shows some sample data. If you observe closely, a single record is created
from a lot of smaller pieces of information. For instance: - "JM Associate" can sell sweets only
under the following two conditions:-
√ "JM Associate" should be an authorized dealer of "Cadbury".
√ "Sweets" should be manufactured by the "Cadbury" company.
These two smaller pieces of information form one record of the above given table. So in order
for the above information to be in "Fifth Normal Form", all the smaller pieces of information
should be kept in three different places. Below is the complete fifth normal form of the database.
Figure 1.16 : - Complete Fifth Normal Form
Twist: - What’s the relationship between Extent and Page?
An extent is the basic unit of storage which provides space for tables. Every extent has a number of
data pages. As new records are inserted, new data pages are allocated. There are eight data
pages in an extent, so as soon as the eight pages are consumed a new extent with data pages
is allocated.
While an extent is the basic unit of storage from the database point of view, a page is the unit
of allocation within an extent.
Figure 1.17 : - General view of a Extent
Figure 1.18 : - MDF and LDF files.
Figure 1.19 : - Collation according to language
Case sensitivity
If A and a, B and b, etc. are treated in the same way, then it is case-insensitive. A computer
treats A and a differently because it uses ASCII codes to differentiate the input. The ASCII
value of A is 65, while a is 97. The ASCII value of B is 66 and b is 98.
Accent sensitivity
If "a" and "á", "o" and "ó" are treated in the same way, then it is accent-insensitive. A
computer treats "a" and "á" differently because it uses character codes to differentiate the
input. The code for "a" is 97 and for "á" it is 225. The code for "o" is 111 and for "ó" it
is 243.
Kana Sensitivity
When Japanese kana characters Hiragana and Katakana are treated differently, it is called
Kana sensitive.
Width sensitivity
When a single-byte character (half-width) and the same character when represented as a
double-byte character (full-width) are treated differently then it is width sensitive.
(DB)Can we have a different collation for database and
table?
Yes, you can specify a different collation sequence for each of the two entities separately.
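A minimal sketch of the idea (the database, table and collation names are purely illustrative):
-- Database-level collation
CREATE DATABASE SalesDB COLLATE SQL_Latin1_General_CP1_CI_AS
GO
USE SalesDB
GO
-- A column in a table can override the database collation with its own
CREATE TABLE CustomerNames
(
    Name nvarchar(100) COLLATE Latin1_General_CS_AS
)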
2. SQL
Note: - This is one of the crazy things which I did not want to put in my book. But when I
did a sampling of some real interviews conducted across companies, I was stunned to find some
interviewers judging developers on syntax. I know many people will conclude this is childish,
but it's the interviewer's decision. If you think that this chapter is not useful you can
happily skip it. But I think at a fresher's level you should not.
Note: - I will be heavily using the "AdventureWorks" database, which is a sample database
shipped with SQL Server 2005 (in the previous version we had the famous "Northwind" sample
database). Below is a view expanded from "SQL Server Management Studio".
INSERT INTO ColorTable (code, colorvalue) VALUES ('b1', 'Brown')
DELETE FROM ColorTable WHERE code = 'b1'
UPDATE ColorTable SET colorvalue = 'Black' WHERE code = 'b1'
DROP TABLE table-name {CASCADE|RESTRICT}
GRANT SELECT ON ColorTable TO SHIVKOIRALA WITH GRANT
OPTION
REVOKE SELECT, INSERT, UPDATE (ColorCode) ON ColorTable FROM
Shivkoirala
COMMIT [WORK]
ROLLBACK [WORK]
SELECT * FROM Person.Address
SELECT AddressLine1, City FROM Person.Address
SELECT AddressLine1, City FROM Person.Address WHERE City = 'Sammamish'
How to import a table using the "INSERT" statement?
I have made a new temporary color table which is populated using the below SQL.
The structures of both tables should be the same in order for this SQL to execute properly.
INSERT INTO TempColorTable
SELECT code,ColorValue
FROM ColorTable
INNER JOIN
Inner join shows matches only when they exist in both tables. For example, in the below SQL
there are two tables, Customers and Orders, and the inner join is made on
Customers.CustomerID and Orders.CustomerID. So this SQL will only give you results
for customers who have orders. If a customer does not have any order, that record will not be
displayed.
SELECT Customers.*, Orders.* FROM Customers INNER JOIN Orders ON
Customers.CustomerID =Orders.CustomerID
LEFT OUTER JOIN
Left join will display all records in left table of the SQL statement. In SQL below customers
with or without orders will be displayed. Order data for customers without orders appears
as NULL values. For example, you want to determine the amount ordered by each
customer and you need to see who has not ordered anything as well. You can also see the
LEFT OUTER JOIN as a mirror image of the RIGHT OUTER JOIN (Is covered in the
next section) if you switch the side of each table.
SELECT Customers.*, Orders.* FROM Customers LEFT OUTER JOIN Orders ON
Customers.CustomerID =Orders.CustomerID
Using the “ORDER BY” clause, you either sort the data in ascending manner or descending
manner.
select * from sales.salesperson order by salespersonid asc
select * from sales.salesperson order by salespersonid desc
What is a self-join?
If you want to join two instances of the same table you can use self-join.
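A minimal sketch, assuming a hypothetical Employee table where ManagerID points back to EmployeeID in the same table:
-- Two aliases of the same table: emp for the employee, mgr for his manager
SELECT emp.Name AS EmployeeName, mgr.Name AS ManagerName
FROM Employee emp
INNER JOIN Employee mgr ON emp.ManagerID = mgr.EmployeeID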
What’s the difference between DELETE and TRUNCATE
?
Following are the differences between them:
√ DELETE logs every deleted row, thus making the delete operation
slow. TRUNCATE TABLE does not log the individual rows; it only logs
the deallocation of the data pages of the table. So TRUNCATE TABLE is faster
compared to DELETE.
√ DELETE can be rolled back while TRUNCATE can not be.
√ DELETE can have criteria while TRUNCATE can not.
√ TRUNCATE TABLE can not have triggers.
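A quick sketch of the two statements against the ColorTable used earlier (purely illustrative):
-- Removes only the rows matching the criteria; every row is logged
DELETE FROM ColorTable WHERE code = 'b1'
-- Removes all rows; only the page deallocations are logged and no WHERE clause is allowed
TRUNCATE TABLE ColorTable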
Figure 2.2 : - “%” operator in action.
Figure 2.3 : - “_” operator in action
union all
Select * from person.address
This returns 39228 rows ("UNION ALL" does not check for duplicates, so double the
records show up).
Figure 2.5 : - Union All in action (39228 rows)
Note: - Selected records should have same data type or else the syntax will not work.
Note: - In the coming questions you will see some 5 to 6 questions on cursors. Though not a
much-discussed topic, still, from my survey, 5% of interviews have asked questions on
cursors. So let's leave the interviewer no reason to reject us.
What are cursors and what are the situations you will
use them?
SQL statements are good for set-at-a-time operations, so they are good at handling sets of data.
But there are scenarios where you want to process rows one by one depending on certain criteria:
you loop through all the rows and update the data accordingly. That's where cursors come into
the picture.
√ Declare
√ Open
√ Fetch
√ Operation
√ Close and Deallocate
This is a small sample which uses the "person.address" table. This T-SQL program will
only print rows which have a province id equal to 7.
DECLARE @provinceid int
-- Declare Cursor
DECLARE provincecursor CURSOR FOR
SELECT stateprovinceid
FROM Person.Address
-- Open cursor
OPEN provincecursor
-- Fetch data from cursor in to variable
FETCH NEXT FROM provincecursor
INTO @provinceid
WHILE @@FETCH_STATUS = 0
BEGIN
-- Do operation according to row value
if @Provinceid=7
begin
PRINT @Provinceid
end
-- Fetch the next cursor
FETCH NEXT FROM provincecursor
INTO @provinceid
END
-- Finally do not forget to close and deallocate the cursor
CLOSE provincecursor
DEALLOCATE provincecursor
[LOCAL | GLOBAL]
[FORWARD_ONLY | SCROLL]
[STATIC | KEYSET | DYNAMIC | FAST_FORWARD]
[READ_ONLY | SCROLL_LOCKS | OPTIMISTIC]
[TYPE_WARNING]
FOR select_statement
[FOR UPDATE [OF column_list]]
STATIC
A STATIC cursor is a fixed snapshot of a set of rows. This fixed snapshot is stored in the
tempdb database. As the cursor uses a private snapshot, any external changes to the set of
rows will not be visible in the cursor while browsing through it. You can define a
static cursor using the "STATIC" keyword.
DECLARE cusorname CURSOR STATIC
FOR SELECT * from tablename
WHERE column1 = 2
KEYSET
In KEYSET the key values of the rows are saved in tempdb. For instance let’s say the
cursor has fetched the following below data. So only the “supplierid” will be stored in the
database. Any new inserts happening is not reflected in the cursor. But any updates in the
key-set values are reflected in the cursor. Because the cursor is identified by key values
you can also absolutely fetch them using “FETCH ABSOLUTE 12 FROM mycursor”
106
Figure 2.7 : - Key Set Data
DYNAMIC
In DYNAMIC cursor you can see any kind of changes happening i.e. either inserting new
records or changes in the existing and even deletes. That’s why DYNAMIC cursors are
slow and have least performance.
FORWARD_ONLY
As the name suggest they only move forward and only a one time fetch is done. In every
fetch the cursor is evaluated. That means any changes to the data are known, until you
have specified “STATIC” or “KEYSET”.
FAST_FORWARD
These types of cursor are forward only and read-only and in every fetch they are not re-
evaluated again. This makes them a good choice to increase performance.
107
territory wise how many sales people are there. So in the second figure I made a group by
on territory id and used the “count” aggregate function to see some meaningful data.
“Northwest” has the highest number of sales personnel.
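The SQL behind that figure would be something like the following sketch against the AdventureWorks sample tables:
-- Number of sales people per territory
SELECT territoryid, COUNT(*) AS SalesPeople
FROM sales.salesperson
GROUP BY territoryid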
Figure 2.9 : - Group by applied
What is ROLLUP?
ROLLUP enhances the totaling capabilities of the "GROUP BY" clause.
Below is a GROUP BY SQL which is applied on "SalesOrderDetail" on "ProductID" and
"SpecialOfferID". You can see products 707, 708, 709 etc. grouped according to
"SpecialOfferID", and the third column represents the total for each pair of
"ProductID" and "SpecialOfferID". Now you want to see sub-totals for each group of
"ProductID" and "SpecialOfferID".
Figure 2.10: - Salesorder displayed with out ROLLUP
So after using ROLLUP you can see the sub-total. The first row is the grand total or the
main total, followed by sub-totals according to each combination of “Productid” and
“Specialofferid”. ROLLUP retrieves a result set that contains aggregates for a hierarchy
of values in selected columns.
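The SQL for these figures would be roughly as follows (a sketch against AdventureWorks; only WITH ROLLUP is added to the plain GROUP BY):
-- ROLLUP adds sub-total rows per ProductID plus a grand-total row
SELECT ProductID, SpecialOfferID, SUM(LineTotal) AS Total
FROM Sales.SalesOrderDetail
GROUP BY ProductID, SpecialOfferID WITH ROLLUP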
Figure 2.11: - Subtotal according to product using ROLLUP
What is CUBE?
CUBE retrieves a result set that contains aggregates for all combinations of values in the
selected columns. ROLLUP retrieves a result set that contains aggregates for a hierarchy
of values in selected columns.
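Reusing the previous sketch, only the modifier changes; CUBE additionally produces totals per SpecialOfferID across all products:
SELECT ProductID, SpecialOfferID, SUM(LineTotal) AS Total
FROM Sales.SalesOrderDetail
GROUP BY ProductID, SpecialOfferID WITH CUBE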
Figure 2.12: - CUBE in action
inner join sales.salesterritory on
sales.salesterritory.territoryid=sales.salesperson.territoryid
group by sales.salesperson.territoryid,sales.salesterritory.name
having count(sales.salesperson.territoryid) >= 2
Note:- You can see the HAVING clause applied. You can not specify this condition with a
"WHERE" clause; it will throw an error. In short, the "HAVING" clause applies a filter on
groups, while the "WHERE" clause filters individual rows.
(PERCENT) rows. So what does that sentence mean? See the figure below: there are four
products p1, p2, p3 and p4. The "UnitCost" of p3 and p4 is the same.
So when we do a TOP 3 on the "ProductCost" table we see three rows as shown
below. But even though p3 has the same value as p4, SQL just picked one of them. So if you
want tied data like this to be displayed, you can use "WITH TIES".
You can see that after firing the SQL with "WITH TIES" we are able to see all the products
properly.
Figure 2.16: - WITH TIES in action
Note: - You should have an "ORDER BY" clause and the "TOP" keyword specified, or else
"WITH TIES" is not of much use.
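A minimal sketch against the ProductCost table from the figures (the column names are assumptions based on the figure):
-- p3 is returned as well when it ties with the third-highest UnitCost
SELECT TOP 3 WITH TIES ProductName, UnitCost
FROM ProductCost
ORDER BY UnitCost DESC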
What is a Sub-Query?
A query nested inside a SELECT statement is known as a subquery and is an alternative
to complex join statements. A subquery combines data from multiple tables and returns
results that are inserted into the WHERE condition of the main query. A subquery is
always enclosed within parentheses and returns a column. A subquery can also be referred
to as an inner query and the main query as an outer query. JOIN gives better performance
than a subquery when you have to check for the existence of records.
For example, to retrieve all EmployeeID and CustomerID records from the ORDERS
table that have the EmployeeID greater than the average of the EmployeeID field, you
can create a nested query, as shown:
SELECT DISTINCT EmployeeID, CustomerID
FROM ORDERS
WHERE EmployeeID > (SELECT AVG(EmployeeID)
FROM ORDERS)
The WITH statement defines the CTE and later using the CTE name I have displayed the
CTE data.
Select year(orderdate),Status,isnull(Subtotal,0) from
purchasing.PURCHASEORDERHEADER
)
Select Status as OrderStatus,isnull([2001],0) as 'Yr 2001' ,isnull([2002],0) as 'Yr
2002' from PURCHASEORDERHEADERCTE
pivot (sum(Subtotal) for Orderdate in ([2001],[2002])) as pivoted
You can see from the above SQL that the top WITH statement is the CTE supplied to the
PIVOT. After that, PIVOT is applied on Subtotal and OrderDate. You have to specify the
values on which you want to pivot (here 2001 and 2002). Below is the output of the CTE.
After the PIVOT is applied you can see the rows are now spread column-wise with the
subtotal assigned to each. You can summarize that PIVOT presents your data in a cross-
tab format.
What is UNPIVOT?
It's exactly the reverse of PIVOT. That means you have pivoted data and you
want to unpivot it back into rows.
What is ROW_NUMBER()?
The ROW_NUMBER() function adds a column that displays a number corresponding
to the row's position in the query result. If the column that you specify in the OVER clause
is not unique, it still produces an incrementing number based on the column specified in
the OVER clause. You can see in the figure below that I have applied the ROW_NUMBER
function over column col2, and you can notice the incrementing numbers generated.
What is RANK() ?
The RANK() function works much like the ROW_NUMBER() function in that it numbers
records in order. When the column specified by the ORDER BY clause contains unique
values, ROW_NUMBER() and RANK() produce identical results. They differ in the
way they work when duplicate values are contained in the ORDER BY expression.
ROW_NUMBER will increment the number by one on every record, regardless of
duplicates. RANK() produces a single number for each distinct value in the result set. You can
see that for duplicate values it does not increment the number.
What is DENSE_RANK()?
DENSE_RANK() works the same way as RANK() does, but it eliminates the gaps in the
numbering. When I say gaps: you can see in the previous results that 4 and 5 were skipped
because of the duplicates in COL2, but DENSE_RANK() overlooks the gap.
Figure 2.21 :- DENSE_RANK() in action
What is NTILE()?
NTILE() breaks the result set into a specified number of groups and assigns the same
number to each record in a group. So NTILE just groups the data depending on the number
given, or you can say it divides the data. For instance, I told NTILE to use 3; there are 6 total
rows, so it grouped them in sets of 2.
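A combined sketch of the four functions over a hypothetical table RankingDemo with a single column col2:
-- ROW_NUMBER always increments; RANK leaves gaps after ties;
-- DENSE_RANK closes those gaps; NTILE(3) splits the rows into 3 groups
SELECT col2,
    ROW_NUMBER() OVER (ORDER BY col2) AS RowNo,
    RANK() OVER (ORDER BY col2) AS RankNo,
    DENSE_RANK() OVER (ORDER BY col2) AS DenseRankNo,
    NTILE(3) OVER (ORDER BY col2) AS GroupNo
FROM RankingDemo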
(DB)What is SQl injection ?
It is a Form of attack on a database-driven Web site in which the attacker executes
unauthorized SQL commands by taking advantage of insecure code on a system connected
to the Internet, bypassing the firewall. SQL injection attacks are used to steal information
from a database from which the data would normally not be available and/or to gain
access to an organization’s host computers through the computer that is hosting the
database.
SQL injection attacks typically are easy to avoid by ensuring that a system has strong
input validation.
As the name suggests, we inject SQL, which can be quite dangerous for the database.
For example, this is a simple SQL statement:-
SELECT email, passwd, login_id, full_name
FROM members
WHERE email = 'x'
Now somebody does not put "x" as the input but puts "x'; DROP TABLE members; --".
So the actual SQL which will execute is :-
SELECT email, passwd, login_id, full_name
FROM members
WHERE email = 'x'; DROP TABLE members; --'
Think what will happen to your database.
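One hedged sketch of the usual defence on the T-SQL side: pass the value as a parameter through sp_executesql instead of concatenating it into the statement, so the input stays plain data and can never terminate the string and append its own commands.
DECLARE @mail nvarchar(100)
SET @mail = N'x''; DROP TABLE members; --'   -- the malicious input remains just a value
EXEC sp_executesql
    N'SELECT email, passwd, login_id, full_name FROM members WHERE email = @email',
    N'@email nvarchar(100)',
    @email = @mail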
√ If there is an error in a UDF, it stops executing. But an SP just ignores the
error and moves on to the next statement.
√ A UDF can not make permanent changes to the server environment, while an SP can
change some of the server environment.
3. .NET Integration
What are steps to load a .NET code in SQL SERVER
2005?
Following are the steps to load a managed code in SQL SERVER 2005:-
√ Write the managed code and compile it to a DLL / Assembly.
√ After the DLL is compiled using the “CREATE ASSEMBLY” command you
can load the assembly in to SQL SERVER. Below is the create command
which is loading “mycode.dll” in to SQL SERVER using the “CREATE
ASSEMBLY” command.
CREATE ASSEMBLY mycode FROM 'c:/mycode.dll'
Does .NET control SQL SERVER or is it vice-versa?
SQL Server controls the way .NET application will run. Normally .NET framework
controls the way application should run. But in order that we have high stability and good
security SQL Server will control the way .NET framework works in SQL Server
environment. So lot of things will be controlled through SQL Server example threads,
memory allocations, security etc .
SQL Server controls the .NET framework through the "Host Control" mechanism provided by
.NET Framework 2.0. Using "Host Control", external applications can
control the way memory management is done, how threads are allocated and a lot more.
SQL Server uses this "Host Control" mechanism exposed by .NET 2.0 to control the
framework.
go
EXEC sp_configure 'clr enabled' , '1'
go
reconfigure;
go
Note :- You can see after running the SQL “clr enabled” property is changed from 0 to 1 ,
which indicates that the CLR was successfully configured for SQL SERVER.
In previous versions of .NET it was done via the COM interface
"ICorRuntimeHost".
In the previous version you could only do the following with the COM interface:
√ Specify whether it is the server or workstation DLL.
√ Specify the version of the CLR (e.g. version 1.1 or 2.0).
√ Specify garbage collection behavior.
√ Specify whether or not JIT-compiled code may be shared across AppDomains.
Safe Access sandbox
This will be the favorite setting of DBAs if they are ever compelled to run the CLR - safe
access. Safe means you only have access to in-proc data access functionality. So you can
create stored procedures, triggers, functions, data types etc. But you can not
access memory or disk, create files etc. In short, you can not hang SQL Server.
Figure 3.4 :- Application Domain architecture
Note: - This can be pretty confusing during interviews so just make one note “One
Appdomain per Owner Identity per Database”.
What is Syntax for creating a new assembly in SQL
Server 2005?
CREATE ASSEMBLY customer FROM 'c:\customers\customer.dll'
What is Multi-tasking?
It's a feature of modern operating systems with which we can run multiple programs at the
same time, for example Word, Excel etc.
What is Multi-threading?
Multi-threading forms a subset of multi-tasking: instead of having to switch between programs,
this feature switches between different parts of the same program. For example, you are
writing in Word and at the same time Word is doing a spell check in the background.
What is a Thread ?
A thread is the basic unit to which the operating system allocates processor time.
so that .NET threads do not consume full resource and go out of control. SQL Server
introduced blocking points which allows this transition to happen between SQLCLR and
SQL Server threads.
many assemblies , so that SQL Server can be alerted to load those namespaces or not.
For example, if you look at System.Windows you will see this attribute.
So during runtime SQL Server uses the reflection mechanism to check whether the assembly
has valid protection or not.
Note :- HostProtection is checked only when you are executing the assembly in SQL Server
2005.
External Access
It's like safe, but you can also access external resources like files on the network, the file
system, the DNS system, event viewers etc.
Unsafe
In unsafe code you can run anything you want. You can use PInvoke, call external
resources like COM etc. Every DBA would like to avoid this and every developer should
avoid writing unsafe code unless it is very much essential. When we create an assembly we
specify the permission set at that time.
Note: - We had talked about sand boxes in the previous question. Just small note
sandboxes are expressed by using the permission level concepts.
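A minimal sketch of how the permission level is specified while cataloging the assembly (the name and path are placeholders):
CREATE ASSEMBLY customer
FROM 'c:\customers\customer.dll'
WITH PERMISSION_SET = SAFE   -- or EXTERNAL_ACCESS / UNSAFE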
Following are the checks done while the assembly is loaded in SQL Server:-
√ It does the metadata and IL verification, to see that the IL is well formed.
√ If the assembly is marked as safe or external access, then the following checks are done:
! It checks for static variables; only read-only static variables are allowed.
! Some attributes are not allowed for SQL Server, and those attributes are
also checked.
! The assembly has to be type-safe, which means no unmanaged code or
pointers are allowed.
! No finalizers are allowed.
Note: - SQL Server checks the assembly using the reflection API, so the code should be IL
compliant.
You can do this small exercise to check whether SQL Server validates your code or not. Compile
the simple code below, which has a static variable defined in it. Because the static
variable is not read-only, it should throw an error.
using System;
namespace StaticDll
{
    public class Class1
    {
        static int i;
    }
}
After you have compiled the DLL, use the Create Assembly syntax to load the DLL in
SQL Server. While cataloging the DLL you will get the following error:-
Msg 6211, Level 16, State 1, Line 1 CREATE ASSEMBLY failed because type
'StaticDll.Class1' in safe assembly 'StaticDll' has a static field 'i'. Attributes of static
fields in safe assemblies must be marked readonly in Visual C#, ReadOnly in Visual
Basic, or initonly in Visual C++ and intermediate language.
We can do a small practical hands-on to see what the assembly tables look like. Let's try to
create a simple class, Class1. The code is shown below.
using System;
using System.Collections.Generic;
using System.Text;
namespace Class1
{
public class Class1
{
}
}
Then we create the assembly with the name "X1" using the CREATE ASSEMBLY syntax. The
above image shows the query output of all three main tables, in this sequence: sys.assemblies,
sys.assembly_files and sys.assembly_references.
Note :- In the second select statement we have a content field in which the actual binary
data is stored. So even if we do not have the actual assembly file, it can be loaded from this
content field.
Are two version of same assembly allowed in SQL
Server?
Yes. You can give a different assembly name in each CREATE ASSEMBLY statement, pointing
to a different file version.
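A hedged sketch of the idea (the names and paths are purely illustrative):
-- Each version is cataloged under its own assembly name
CREATE ASSEMBLY CustomerV1 FROM 'c:\releases\v1\customer.dll'
CREATE ASSEMBLY CustomerV2 FROM 'c:\releases\v2\customer.dll'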
}
}
Compiled the project successfully.
√ Using the create assembly cataloged it in SQL Server.
√ Later made the following changes to the class
public class clscustomer
{
    public void add(string code)
    {
    }
}
Note: - The add method signature is now changed.
√ After that using “Alter” we tried to implement the change.
Using the ALTER syntax you can not change public method signatures; in that case you will
have to drop the assembly and re-create it.
Note: - If the assembly is referencing any other objects like triggers, stored procedures,
UDT, other assemblies then the dependents should be dropped first, or else the drop
assembly will fail.
Then my stored procedure should be defined accordingly and in the same order. That means
in the stored procedure you should define the parameters in the same order as in the .NET
method.
So what does that mean? Well, if you define a .NET DLL and catalog it in SQL Server, all
the method and class names are case sensitive, while the assembly name is not case sensitive.
For instance, I have cataloged the following DLL which has the following details:-
√ Assembly Name is "CustomerAssembly".
√ Class Name in the "CustomerAssembly" is "ClsCustomer".
√ Function "GetCustomerCount()" in class "ClsCustomer".
When we catalog the above assembly in SQL Server, we can not address
"ClsCustomer" as "CLSCUSTOMER" or the function "GetCustomerCount()" as
"getcustomercount()" in the SQL Server T-SQL language. But the assembly "CustomerAssembly"
can be addressed as "customerassembly" or "CUSTOMERASSEMBLY"; in short, the
assemblies are not case sensitive.
In .NET we declare decimal datatypes without precision. But in SQL Server you can
define the precision part also.
decimal i; --> .NET Definition
decimal(9,2) --> SQL Server Definition
This creates a conflict when we want the .NET function to be used in T-SQL as SQLCLR
and we want the precision facility.
Here's the answer: you define the precision in SQL Server when you use the CREATE syntax.
So even though .NET does not support the precision facility, we can define the precision in SQL
Server.
.NET definition
func1(decimal x1)
{
}
SQL Server definition
create function func1(@x1 decimal(9,2))
returns decimal
as external name CustomerAssembly.[CustomerNameSpace.ClsCustomer].func1
If you see the above code sample, func1 is defined with a plain decimal, but later, when we
create the function definition in SQL Server, we define the precision.
But for "out" parameters there is no mapping defined. It is logical: "out"
parameters do not have any equivalent in SQL Server.
Note: - When we define ByRef in .NET, it means that if the variable value is changed it will be
reflected outside the subroutine, so it maps to SQL Server input/output (OUTPUT) parameters.
What is System.Data.SqlServer?
When you have functions, stored procedures etc. written in .NET, you will use this provider
rather than the traditional System.Data.SqlClient. If you are accessing objects created
using the T-SQL language, then you need a connection to connect to them, because you
need to specify which server you will connect to, the password and other credentials.
But if you are accessing objects made using .NET itself, you are already residing in SQL
Server, so you will not need a connection but rather a context.
What is SQLContext?
As said previously, when we use ADO.NET to execute a stored procedure created in T-SQL,
we are outside the SQL Server boundary, so we need to provide a SqlConnection object to
connect to SQL Server. But when we need to execute objects which are created using a
.NET language, we only need the context in which the objects are running.
Figure 3.7 : - SQLConnection and SQLContext
So you can see in the above figure that SQLConnection is used when you are completely
outside the SQL Server database, while SQLContext is used when you are inside the SQL
Server database. That means a connection already exists through which you can access
the SQLContext, and any extra connection created just to access SQLContext is a waste, as a
connection to SQL Server is already open.
All these things are handled by SQLContext.
Which are the four static methods of SQLContext?
Below are the four static methods in SQLContext:-
GetConnection() :- This will return the current connection
GetCommand() :- Get reference to the current batch
GetTransaction() :- If you have used transactions this will get the current transaction
GetPipe() :- This helps us send results to the client. The output is in Tabular Data Stream
format. Using this method you can fill a datareader or dataset, which can later be used
by the client to display data.
Note: - In the question above I showed how we can manually register the DLLs in SQL
Server, but in real projects nobody would do that; rather, we would use the VS.NET
studio to accomplish the same. So we will run through a sample of how to deploy DLLs
using VS.NET and, in parallel, we will also run through how to use SQLContext.
Let's start. Step 1: go to Visual Studio --> New Project --> expand Visual C# (+) --> select
Database; you will see the SQL Server project. Select the SQL Server project template, give it
a name, then click OK.
As these DLLs need to be deployed on the server, you will also need to specify the server
details. So you will be prompted to specify the database on which you will
deploy the .NET stored procedure. Select the database and click OK. In case you do not
see the database, you can click on "Add reference" to add the database to the list.
Figure 3.9 : - Select database
Once you specify the database you are inside the Visual Studio .NET editor. At the right
hand side you can see the solution explorer with some basic files created by Visual Studio
in order to deploy the DLL on SQL Server. Right click on the SQL Server project and
click on Add --> New Item; the items are displayed as shown in the figure below.
You can see in the below figure that you can create different objects using VS.NET. At this
point of time we only need to create a stored procedure which will fetch data from
"Production.Product".
This section is where the real action happens. As said previously, you do not need to
open a connection but use the context. So below are the steps:-
• Get the reference of the context.
• Get the command from the context.
• Set the command text; at this moment we need to select everything from the
"Production.Product" table.
• Finally get the Pipe and execute the command.
Figure 3.12 : - Simple code to retrieve product table
After that you need to compile it to a DLL and then deploy the code to SQL Server.
You can use the "Build Solution" menu to compile and "Deploy Solution" to deploy
it on SQL Server.
After deploying the solution you can see the stored procedure "SelectProductAll" in the
stored procedures section as shown below.
Just to test, I have executed the stored procedure and everything is working fine.
In order to create the function you have to select the User Defined Function template from the
Visual Studio installed templates. Below is the sample code. Then again follow the same
procedure of compiling and deploying the solution.
Figure 3.16 :- Function source code
SqlDataReader rdr = mycommand.EndExecuteReader(myResult);
Note: - Here’s a small project which you can do with Asynchronous processing. Fire a heavy
duty SQL and in UI show how much time the SQL Server took to execute that query.
INSTEAD OF triggers
INSTEAD OF triggers fire in place of the triggering action. For example, if an INSTEAD
OF UPDATE trigger exists on the Sales table and an UPDATE statement is executed
against the Sales table, the UPDATE statement will not change a row in the Sales table.
Instead, the UPDATE statement causes the INSTEAD OF UPDATE trigger to be
executed, which may or may not modify data in the Sales table.
AFTER triggers
AFTER triggers execute following the SQL action, such as an insert, update, or delete. This
is the traditional trigger which has existed in SQL SERVER.
INSTEAD OF triggers get executed automatically before the Primary Key and Foreign
Key constraints are checked, whereas the traditional AFTER triggers get executed after
these constraints are checked.
Unlike AFTER triggers, INSTEAD OF triggers can be created on views.
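A minimal sketch of an INSTEAD OF trigger on a Sales table like the one in the example (the column names are assumptions for illustration):
CREATE TRIGGER trgSalesInsteadOfUpdate
ON Sales
INSTEAD OF UPDATE
AS
BEGIN
    -- Runs in place of the original UPDATE; here we only carry over the Amount change
    UPDATE s
    SET s.Amount = i.Amount
    FROM Sales s
    INNER JOIN inserted i ON s.SalesID = i.SalesID
END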
[ WITH option [ ,...n ] ]
A description of the components of the statement follows.
msg_id :-The ID for an error message, which is stored in the error column in sysmessages.
msg_str :-A custom message that is not contained in sysmessages.
severity :- The severity level associated with the error. The valid values are 0–25. Severity
levels 0–18 can be used by any user, but 19–25 are only available to members of the
fixed-server role sysadmin. When levels 19–25 are used, the WITH LOG option is required.
state :- A value that indicates the invocation state of the error. The valid values are 0–127.
This value is not used by SQL Server.
Argument, . . .
One or more variables that are used to customize the message. For example, you could
pass the current process ID (@@SPID) so it could be displayed in the message.
WITH option, . . .
The three values that can be used with this optional argument are described here.
LOG - Forces the error to be logged in the SQL Server error log and the NT application log.
NOWAIT - Sends the message immediately to the client.
SETERROR - Sets @@ERROR to the unique ID for the message or 50,000.
The number of options available for the statement makes it seem complicated, but it is
actually easy to use. The following shows how to create an ad hoc message with a severity
of 10 and a state of 1.
RAISERROR ('An error occurred updating the NonFatal table',10,1)
--Results--
An error occurred updating the NonFatal table
The statement does not have to be used in conjunction with any other code, but for our
purposes it will be used with the error handling code presented earlier. The following
alters the ps_NonFatal_INSERT procedure to use RAISERROR.
USE tempdb
go
ALTER PROCEDURE ps_NonFatal_INSERT
@Column2 int =NULL
AS
DECLARE @ErrorMsgID int
INSERT NonFatal VALUES (@Column2)
SET @ErrorMsgID =@@ERROR
IF @ErrorMsgID <>0
BEGIN
RAISERROR ('An error occurred updating the NonFatal table',10,1)
END
When an error-producing call is made to the procedure, the custom message is passed to
the client. The following shows the output generated by Query Analyzer.
4. ADO.NET
Which are the namespaces for ADO.NET?
Following are the namespaces provided by .NET for data management:-
System.Data
This contains the basic objects used for accessing and storing relational data, such as
DataSet, DataTable and DataRelation. Each of these is independent of the type of data
source and the way we connect to it.
System.Data.OleDB
This contains the objects that we use to connect to a data source via an OLE-DB provider,
such as OleDbConnection, OleDbCommand, etc. These objects inherit from the common
base classes and so have the same properties, methods, and events as the SqlClient
equivalents.
System.Data.SqlClient:
This contains the objects that we use to connect to a data source via the Tabular Data
Stream (TDS) interface of Microsoft SQL Server (only). This can generally provide better
performance as it removes some of the intermediate layers required by an OLE-DB
connection.
System.Xml
This contains the basic objects required to create, read, store, write, and manipulate
XML documents according to W3C recommendations.
√ DataAdapter (this object acts as a bridge between the data store and the dataset).
√ DataReader (this object reads data from the data store in forward-only mode).
The DataSet object represents disconnected and cached data. If you see the diagram, it is not
in direct connection with the data store (SQL SERVER, ORACLE etc.); rather it talks to the
DataAdapter, which is responsible for filling the dataset. A DataSet can have one or more
DataTables and relations.
The DataReader and the DataSet are the two fundamental objects in ADO.NET.
√ ExecuteNonQuery :- Executes the command defined in the CommandText
property against the connection defined in the Connection property for a query
that does not return any rows (an UPDATE, DELETE or INSERT). It returns
an integer indicating the number of rows affected by the query.
√ ExecuteReader :- Executes the command defined in the CommandText property
against the connection defined in the Connection property. Returns a "reader"
object that is connected to the resulting rowset within the database, allowing
the rows to be retrieved.
√ ExecuteScalar :- Executes the command defined in the CommandText property
against the connection defined in the Connection property. Returns only a
single value (effectively the first column of the first row of the resulting rowset).
Any other returned columns and rows are discarded. Fast and efficient when
only a "singleton" value is required.
This is similar to the UpdateBatch method provided by the ADO Recordset object, but with
a DataSet it can be used to update more than one table.
objOLEDBCommand = New OleDbCommand("Select FirstName from Employees")
objOLEDBCon.Open()
objOLEDBCommand.Connection = objOLEDBCon
objOLEDBReader = objOLEDBCommand.ExecuteReader()
Do While objOLEDBReader.Read()
    lstNorthwinds.Items.Add(objOLEDBReader.GetString(0))
Loop
Catch ex As Exception
Throw ex
Finally
objOLEDBCon.Close()
End Try
End Sub
What’s the namespace to connect to SQL Server?
Below is a sample code which shows a simple connection with SQL Server.
Private Sub LoadData()
' note :- With and End With makes your code more readable
Dim strConnectionString As String
Dim objConnection As New SqlConnection
Dim objCommand As New SqlCommand
Dim objReader As SqlDataReader
Try
    ' this gets the connection string from the app.config file.
    ' note :- if this gives an error, check the connection string in app.config
    ' and point it to your SQL Server database
    strConnectionString = AppSettings.Item("ConnectionString")
    ' take the connection string and initialize the connection object
    With objConnection
        .ConnectionString = strConnectionString
        .Open()
    End With
    objCommand = New SqlCommand("Select FirstName from Employees")
    With objCommand
        .Connection = objConnection
        objReader = .ExecuteReader()
    End With
    ' looping through the reader to fill the list box
    Do While objReader.Read()
        lstData.Items.Add(objReader.Item("FirstName"))
    Loop
Catch ex As Exception
    Throw ex
Finally
    objConnection.Close()
End Try
End Sub
Now from an interview point of view you are definitely not going to recite the whole source
code given in the book. The interviewer only expects a broad answer about what steps are
needed to connect to SQL SERVER. For fundamentals' sake the author has explained
the whole source code. In short you have to explain the "LoadData" method in a broad
way. Following are the steps to connect to SQL SERVER :-
√ First, import the namespace "System.Data.SqlClient".
√ Create a connection object as shown in “LoadData” method.
With objConnection
.ConnectionString = strConnectionString
.Open()
End With
√ Create the command object with the SQL. Also assign the created connection
object to the command object and execute the reader.
objCommand = New SqlCommand(“Select FirstName from Employees”)
With objCommand
.Connection = objConnection
objReader = .ExecuteReader()
End With
√ Finally loop through the reader and fill the list box. If old VB programmers are
expecting the MoveNext command, it has been replaced by Read(), which returns true
if there is any data to be read. If Read() returns false it means we have reached the
end of the datareader and there is no more data to be read.
Do While objReader.Read()
lstData.Items.Add(objReader.Item(“FirstName”))
Loop
√ Finally do not forget to close the connection object.
ADO.NET provides the SqlCommand object, which gives us the functionality to execute
stored procedures.
If txtEmployeeName.Text.Length = 0 Then
objCommand = New SqlCommand(“SelectEmployee”)
Else
objCommand = New SqlCommand(“SelectByEmployee”)
objCommand.Parameters.Add(“@FirstName”,
Data.SqlDbType.NVarChar, 200)
objCommand.Parameters.Item(“@FirstName”).Value =
txtEmployeeName.Text.Trim()
End If
In the above sample not a lot has changed, only that the SQL has moved to stored
procedures. There are two stored procedures: one is "SelectEmployee" which selects all
the employees, and the other is "SelectByEmployee" which returns employee names starting
with a specific character. As you can see, to provide parameters to the stored procedures
we are using the parameter collection of the command object. In such questions the interviewer
expects two simple answers: one, that we use the command object to execute stored procedures,
and two, that we use the parameter object to provide parameters to the stored procedure. The above
sample is provided only for getting the actual feel of it. Be short, be nice and get a job.
End Sub
In such types of questions the interviewer is looking from a practical angle: have you
worked with datasets and data adapters. Let me try to explain the above code first and then
we move on to what steps to say during the interview.
Dim objConn As New SqlConnection(strConnectionString)
objConn.Open()
First step is to open the connection. Again note the connection string is loaded from the
config file.
Dim objCommand As New SqlCommand(“Select FirstName from Employees”)
objCommand.Connection = objConn
Second step is to create a command object with appropriate SQL and set the connection
object to this command.
Dim objDataAdapter As New SqlDataAdapter()
objDataAdapter.SelectCommand = objCommand
Third step is to create the Adapter object and pass the command object to the adapter
object.
objDataAdapter.Fill(objDataSet)
Fourth step is to load the dataset using the “Fill” method of the dataadapter.
lstData.DataSource = objDataSet.Tables(0).DefaultView
lstData.DisplayMember = “FirstName”
lstData.ValueMember = “FirstName”
Fifth step is to bind the loaded dataset to the GUI. At this moment the sample has a
listbox as the UI. Binding of the UI is done by using the DefaultView of the dataset. Just to
revise: every dataset has tables and every table has views. In this sample we have only
loaded one table, i.e. the Employees table, so we are referring to it with an index of zero.
Just say all the five steps during the interview and you will see the smile on the interviewer's
face.....Hmm, and an appointment letter in your hand.
GetChanges
Returns a dataset containing only the rows that have changed since it was loaded or since
AcceptChanges was executed.
HasChanges
This property indicates whether any changes have been made since the dataset was loaded or
the "AcceptChanges" method was executed.
If we want to revert or abandon all changes since the dataset was loaded, use
"RejectChanges".
Note:- One of the most misunderstood things about these properties is that people think they
track changes in the actual database. That's a fundamental mistake; the changes are tracked
only within the dataset and have nothing to do with changes happening in the actual database,
as datasets are disconnected and do not know anything about the changes happening in the
actual database.
Add
Adds a new row to the DataTable.
Remove
Removes a "DataRow" object from the "DataTable".
RemoveAt
Removes a "DataRow" object from the "DataTable" depending on its index position in the
"DataTable".
Find
Takes an array of values and returns the index of the row.
FindRow
This also takes an array of values but returns a collection of "DataRow" objects.
If we want to manipulate the data of a "DataTable" object, create a "DataView" (using the
"DefaultView" property we can create the "DataView" object) from the "DataTable" object and
use the following functionality :-
AddNew
Adds a new row to the "DataView" object.
Delete
Deletes the specified row from the "DataView" object.
√ "DataSet" is a disconnected architecture, while "DataReader" has a live
connection while reading data. So if we want to cache data and pass it to a
different tier, "DataSet" forms the best choice, and it has decent XML support.
√ When the application needs to access data from more than one table, "DataSet"
forms the best choice.
√ If we need to move back while reading records, "DataReader" does not support
this functionality.
√ But one of the biggest drawbacks of DataSet is speed. As "DataSet" carries
considerable overhead because of relations, multiple tables etc., it is slower
than "DataReader". Always try to use "DataReader" wherever possible, as
it is meant specially for speed and performance.
updating, and if there is any mismatch in the timestamp it will not update the records. This is
the best practice used by the industry for locking.
Update table1 set field1=@test where LastTimeStamp=@CurrentTimeStamp
√ Check the original values stored in SQL SERVER against the actual changed values.
In the stored procedure, check before updating that the old data is the same as the
current data. For example, in the SQL shown below, before updating field1 we check
whether the old field1 value is still the same. If not, then someone else has updated it and
the necessary action has to be taken.
Update table1 set field1=@test where field1 = @oldfield1value
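As a rough illustration of the timestamp approach, here is a minimal sketch of a stored procedure;
the table, column and parameter names are hypothetical and not from the book:
-- Hypothetical table: table1(id INT PRIMARY KEY, field1 INT, LastTimeStamp ROWVERSION)
CREATE PROCEDURE ps_UpdateField1
@id INT,
@test INT,
@CurrentTimeStamp VARBINARY(8)   -- the rowversion value the client read earlier
AS
UPDATE table1
SET field1 = @test
WHERE id = @id
AND LastTimeStamp = @CurrentTimeStamp   -- succeeds only if nobody changed the row in between
IF @@ROWCOUNT = 0
RAISERROR ('The record was modified by another user', 16, 1)
The caller passes back the timestamp it originally read; if the update affects zero rows the row
was changed by someone else and the caller can decide what to do next.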
Locking can be handled on the ADO.NET side or on the SQL SERVER side, i.e. in stored
procedures. For more details of how to implement locking in SQL SERVER read "What
are different locks in SQL SERVER?" in the SQL SERVER chapter.
Note:- This is one of the favorite questions of interviewers, so cram it....When I say cram
it I do not mean it.... I mean understand it. This book has tried to cover ADO.NET as
much as possible, but the unpredictable nature of ADO.NET interview questions makes it
difficult to do it full justice. But I hope that the above questions will make you quite
confident during interviews.
√ Commit or roll back the transaction using the Commit or Rollback method of
the transaction object.
√ Close the database connection.
5. Notification Services
What are Notification Services?
Notification Services help you deliver messaging applications which can send
customized messages to a huge group of subscribers.
In short it's a software application which sits between the information and the recipient.
√ Subscriptions: - Subscriptions are nothing but users showing interest in certain
events and registering for those events to receive information. For instance a user
may subscribe to a "heavy rainfall" event. In short a subscription links the user
and the event.
√ Notifications: - Notification is the actual action which takes place, that is, a
message is sent to the user who has shown interest in the event.
Notifications can be in various formats and sent to a variety of devices.
√ Notification engine: - This is the main coordinator which monitors for any
events; if any event occurs it matches it with the subscribers and sends the
notifications.
In short the notification engine is the central engine which manages "Events", "Subscriptions"
and "Notifications".
(DB)Can you explain architecture of Notification
Services?
Note: - This will go more as a DBA question.
Following are the detailed sections in SQL Notification Services:-
Notification Service application:- It's a simple user application which will be used to add
subscriptions to the subscription database.
Event providers :- All events come in through event providers. There are two event providers
which are provided by default: "File System watcher" and "SQL Server event provider".
"File System watcher" detects changes in operating system files. "SQL Server event
provider" watches a SQL Server or Analysis Services database for changes. You can also
plug in custom event providers. When event providers find any change in the database they
write the event to the "Event" table.
Generator :- The generator checks the event database; when it finds any event it tries to
match it with the subscriptions and sends it to the notification database. So the generator, in
short, is the decision maker between the subscribers and the events.
Distributor: - The distributor continuously polls the "Notification" database for any
"Notifications" to be processed. If the distributor finds any entry it retrieves it and formats
it so that it can be delivered to the end recipient. Formatting is normally done using
"XML" and "XSLT" for rendering purposes. After the formatting is done it is then pushed
to the "distribution providers". They are nothing but the medium of delivery. There are three
built-in providers:-
√ SMTP provider
√ File provider
√ HTTP provider
Figure 5.3 : - Detail architecture of SQL notification services.
The ADF file describes the event, subscription, rules, and notification structure that will be
employed by the Notification Services application.
After these files have been defined you have to load the ADF file using the command-line
utility or the UI provided by SQL Server 2005. Click on the server browser as shown
below, expand "Notification Services", then right-click to bring up the "Notification
Services" dialog box.
You will also have to code the logic to add subscriptions; here's a small sample.
using Microsoft.SqlServer.NotificationServices;
using System.Text;

public class NSSubscriptions
{
    private string AddSubscription(string instanceName, string applicationName,
        string subscriptionClassName, string subscriberId)
    {
        NSInstance myNSInstance = new NSInstance(instanceName);
        NSApplication myNSApplication = new NSApplication(myNSInstance, applicationName);
        Subscription myNSSubscription = new Subscription(myNSApplication, subscriptionClassName);
        myNSSubscription.Enabled = true;
        myNSSubscription.SubscriberId = subscriberId;
        myNSSubscription["Emailid"] = "shiv_koirala@yahoo.com";
        string subscriptionId = myNSSubscription.Add();
        return subscriptionId;
    }
}
Note: - As this is an interview book it's beyond its scope to go into the detail
of how to create notifications. It's better to create a small sample using MSDN and get some
fundamentals clear about how "Notification Services" is used in practice. Try to understand
the full format of both the XML files.
For creating Notification Services you can either use the Notification Services dialog box
or use the command-line utility "Nscontrol". So just in short, "Nscontrol" is a
command-line tool that's used to create and administer Notification Services applications.
Note: - You can refer to MSDN for the "Nscontrol" commands.
6. Service Broker
Why do we need Queues?
There are instances when we expect that the other application with which we are interacting
is not available. For example when you chat on a messaging system like Yahoo, MSN, ICQ
etc., you do not expect that the other user is guaranteed to be online. That is where we
need queues. So during chatting, if the user is not online all the messages are sent to a
queue. Later when the user comes online he can read all the messages from the queue.
message is part of a conversation and it has a unique identifier as well as a unique sequence
number to enforce message ordering.
√ Dialog
A Dialog ensures messages are read in the same order as they were put into the queue between
endpoints. In short it ensures a properly ordered sequence of events at both ends for a message.
√ Conversation Group
A Conversation Group is a logical grouping of Dialogs. To complete a task you may need one
or more dialogs. For instance an online payment gateway can have two Dialogs: the first is the
"Address Check" and the second is the "Credit Card Number" validation; both these dialogs
together form your complete "Payment process". So you can group both the dialogs in one
Conversation Group.
√ Message Transport
Message transport defines how the messages will be sent across networks. Message
transport is based on TCP/IP and FTP. There are two basic protocols: the "Binary Adjacent
Broker Protocol" which is like TCP/IP, and the "Dialog Protocol" which is like FTP.
√ Further, you have to assign these Message types to a Contract. Message types are grouped
in Contracts. A Contract is an entity which describes the messages for a particular Dialog.
So a contract can have multiple message types.
√ Contracts are further grouped into a Service. A Service has all the dialogs needed to complete
one process.
√ A Service can further be attached to multiple queues. The Service is the basic object from
the SQL Server Service Broker point of view.
√ So when any client wants to communicate with a queue he opens a dialog with the
service.
Figure 6.2 : - Message, contract and service
The above figure shows how SQL Server Service Broker works. Clients who want to use the
queues do not have to understand the complexity of queues. They only communicate
with the logical view of SQL Server Service Broker objects (Messages, Contracts and
Services). In turn these objects interact with the queues below and shield the client from
any physical complexities of the queues.
Below is a simple practical implementation of how this works. Try running the below
statements from T-SQL and see the output.
-- Create a Message type and do not do any data type validation for this
CREATE MESSAGE TYPE MessageType
VALIDATION = NONE
GO
-- Create the message contract: what type of users can send these messages;
-- at this moment we are defining the current endpoint as an initiator
CREATE CONTRACT MessageContract
(MessageType SENT BY INITIATOR)
GO
-- Declare the two end points that’s sender and receive queues
CREATE QUEUE SenderQ
CREATE QUEUE ReceiverQ
GO
-- Create service and bind them to the queues
CREATE SERVICE Sender
ON QUEUE SenderQ
CREATE SERVICE Receiver
ON QUEUE ReceiverQ (MessageContract)
GO
-- Send message to the queue
DECLARE @conversationHandle UNIQUEIDENTIFIER
DECLARE @message NVARCHAR(100)
BEGIN
BEGIN TRANSACTION;
BEGIN DIALOG @conversationHandle
FROM SERVICE Sender
TO SERVICE 'Receiver'
ON CONTRACT MessageContract
-- Sending message
SET @message = N'SQL Server Interview Questions by Shivprasad Koirala';
SEND ON CONVERSATION @conversationHandle
MESSAGE TYPE MessageType (@message)
COMMIT TRANSACTION
END
GO
-- Receive a message from the queue
RECEIVE CONVERT(NVARCHAR(max), message_body) AS message
FROM ReceiverQ
-- Just dropping all the object so that this sample can run successfully
DROP SERVICE Sender
DROP SERVICE Receiver
DROP QUEUE SenderQ
DROP QUEUE ReceiverQ
DROP CONTRACT MessageContract
DROP MESSAGE TYPE MessageType
GO
After executing the above T-SQL command you can see the output below.
Figure 6.3 : - Output of the above sample
Note:- In case your SQL Server Service Broker is not active you will get the error shown
below. In order to remove that error you have to enable the service broker by using
ALTER DATABASE [DatabaseName] SET ENABLE_BROKER
At this moment I have created all these samples in the sample database
"AdventureWorks".
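Assuming you are working in the same sample database, the full statement would look something
like the one below (WITH ROLLBACK IMMEDIATE disconnects other sessions that would otherwise
block the ALTER; adjust the database name to your own):
ALTER DATABASE AdventureWorks SET ENABLE_BROKER WITH ROLLBACK IMMEDIATE
GO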
Figure 6.4 : - Error Service broker not active
7. XML Integration
Note: - In this chapter we will first just skim through basic XML interview questions so
that you do not get stuck on the simple questions.
What is XML?
XML (Extensible Markup Language) is all about describing data. Below is an XML document
which describes invoice data.
<?xml version="1.0" encoding="ISO-8859-1"?>
<invoice>
<productname>Shoes</productname>
<qty>12</qty>
<totalcost>100</totalcost>
<discount>10</discount>
</invoice>
An XML tag is not something predefined; it is something you have to define according
to your needs. For instance in the above invoice example all tags are defined according
to business needs. The XML document is self-explanatory; anyone can easily understand
what it means just by looking at the XML data.
Is XML case sensitive?
Yes, XML is case sensitive.
What is a valid XML?
If an XML document conforms to the rules of its DTD then it's a valid XML document.
What is CSS?
With CSS you can format an XML document.
What is XSL?
XSL (the eXtensible Stylesheet Language) is used to transform an XML document into some
other document. So it's a transformation document which can convert XML to some other
format. For instance you can apply XSL to XML and convert it to an HTML document or
probably CSV files.
Figure 7.1 : - Specify XML data type
After you have created the schema you can see the MyXSD schema in the schema collections
folder.
Figure 7.2 : - You can view the XSD in the explorer of Management Studio
When you create the XML data type you can assign MyXSD to the column.
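As a rough T-SQL sketch of the same thing (the element names below are assumed from the INSERT
example in the next answer, not taken from the book's figure):
-- Create the schema collection that the typed XML column will be validated against
CREATE XML SCHEMA COLLECTION MyXSD AS
N'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
targetNamespace="http://MyXSD" xmlns="http://MyXSD"
elementFormDefault="qualified">
<xsd:element name="MyXSD">
<xsd:complexType>
<xsd:sequence>
<xsd:element name="Orderid" type="xsd:string"/>
<xsd:element name="CustomerName" type="xsd:string"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:schema>'
GO
-- Table with a typed XML column bound to the schema collection
CREATE TABLE xmltable (TestXml XML(MyXSD))
GO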
How do I insert into a table which has an XSD schema
attached to it?
I know many developers will just ask what the problem is with a simple insert statement. Well
guys, it's not that easy: with the XSD attached it is now a well-formed, typed column. The above
table I have named xmltable. We had specified two nodes in the schema: one is Orderid
and the other CustomerName. So here's the insert.
Insert into xmltable values ('<MyXSD xmlns="http://MyXSD"><Orderid>1</Orderid><CustomerName>Shiv</CustomerName></MyXSD>')
What is XQuery?
In a typical XML table, the data shown below is the kind of data you see. Now I want to retrieve
order id "4". I know many will jump up saying use the "LIKE" keyword. If you say
that, the interviewer will be very sure that you do not know the real power of the XML support
provided by SQL Server.
Well, first of all XQUERY is not something Microsoft invented; it's a language defined
by the W3C to query and manipulate data in XML. For instance, in the above scenario we
can use XQUERY and drill down to a specific element in the XML.
So to drill down, here's the XQUERY
SELECT * FROM xmltable
WHERE TestXml.exist('declare namespace xd="http://MyXSD";
/xd:MyXSD[xd:Orderid eq "4"]') = 1
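A related method worth knowing is value(), which extracts a single value out of the XML. A small
sketch against the same table and column (assumed names as above):
SELECT TestXml.value('declare namespace xd="http://MyXSD";
(/xd:MyXSD/xd:Orderid)[1]', 'varchar(20)') AS OrderId
FROM xmltable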
Note: - It's out of the scope of this book to discuss XQUERY. I hope, and only hope, that not many
interviewers will bang on this section. In case you have doubts visit www.w3c.org or
SQL Server Books Online; they have a lot of material on this.
I have a huge XML file which I want to load into the
database?
Twist: - Can I do a BULK load of XML into the database?
Below is the SQL statement which will insert from "MyXml.xml" into "MyTable".
INSERT INTO MyTable(MyXMlColumn) SELECT * FROM OPENROWSET
(BULK 'c:\MyXml.xml', SINGLE_CLOB) AS abc
BATCHES = ENABLED,
WSDL = DEFAULT,
DATABASE = 'AdventureWorks',
NAMESPACE = 'http://AdventureWorks/TotalSales'
)
What is XMLA?
XMLA stands for XML for Analysis. Analysis Services is covered in depth in the data
mining and data warehousing chapters. Using XMLA we can expose Analysis Services
data to the external world as XML, so that any data source can consume it, as XML is
universally known.
8. Data Warehousing/Data Mining
Note: - "Data mining" and "Data warehousing" are very wide concepts and it's beyond the
scope of this book to discuss them in depth. So if you are specifically looking for a
"Data mining / warehousing" job it's better to go through some reference books. But the below
questions can cover you to a good extent.
Twist: - What is Star Schema Design?
When we design a transactional database we always think in terms of normalizing the design
to its least form. But when it comes to designing for a data warehouse we think more in
terms of "denormalizing" the database. Data warehousing databases are designed using
"Dimensional Modeling". Dimensional Modeling uses the existing relational database
structure and builds on that.
There are two basic kinds of tables in dimensional modeling:-
√ Fact Tables.
√ Dimension Tables.
Fact tables are the central tables in data warehousing. Fact tables hold the actual aggregate
values which will be needed in a business process, while dimension tables revolve around
fact tables and describe the attributes of the fact tables. Let's try to understand these
two conceptually.
Figure 8.2 : - Dimensional Modeling
In the above example we have three tables which are transactional tables:-
√ Customer: - It has the customer information details.
√ Salesperson: - Sales persons who are actually selling products to customers.
√ CustomerSales: - This table has data about which sales person sold to which
customer and what the sales amount was.
Below is the expected report: Sales / Customer / Month. You may be wondering why we do not
just make a simple join query across all three tables to get this output. But imagine
if you have huge numbers of records in these three tables: it can really slow down your reporting
process. So we introduce a fourth table, "CustomerSalesByMonth", which will
have foreign keys to all the tables and the aggregate amount by month. This table becomes
the fact table and the other tables become the dimension tables. All major data warehousing
designs use the Fact and Dimension model.
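To make the idea concrete, here is a minimal sketch of what such a star schema could look like in
T-SQL; the table and column names are illustrative, not taken from the book's figure:
CREATE TABLE DimCustomer
(
CustomerKey INT PRIMARY KEY,
CustomerName VARCHAR(50),
City VARCHAR(50)
)
CREATE TABLE DimSalesPerson
(
SalesPersonKey INT PRIMARY KEY,
SalesPersonName VARCHAR(50)
)
-- The fact table holds foreign keys to every dimension plus the aggregated measure
CREATE TABLE FactCustomerSalesByMonth
(
CustomerKey INT REFERENCES DimCustomer(CustomerKey),
SalesPersonKey INT REFERENCES DimSalesPerson(SalesPersonKey),
SalesMonth DATETIME,
SalesAmount MONEY
)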
Figure 8.4 : - Snow Flake Schema
Extraction:-
In this process we extract data from the source. In actual scenarios the data source can be in
many forms: EXCEL, ACCESS, delimited text, CSV (comma-separated files) etc. So the
extraction process handles the complexity of understanding the data source and loading
it into the structure of the data warehouse.
Transformation:-
This process can also be called the cleaning-up process. It's not necessarily true that after the
extraction process the data is clean and valid. For instance some financial figures may have NULL
values but you want them to be ZERO for better analysis. So you can have some kind of
stored procedure which runs through all extracted records and sets the value to zero.
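A minimal sketch of such a clean-up step (the staging table and column names are hypothetical):
-- Replace NULL financial figures with zero in the staging table before loading
UPDATE StagingSales
SET SalesAmount = 0
WHERE SalesAmount IS NULL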
Loading:-
After transformation you are ready to load the information into your final data warehouse
database.
"Data mining" is a concept by which we can analyze the current data from different
perspectives and summarize the information in a more useful manner. It's mostly used
either to derive some valuable information from the existing data or to predict sales and
increase market share.
There are two basic aims of "Data mining":-
√ Prediction: - From the given data we can focus on how the customer or market
will perform. For instance we are having sales of 40000 $ per month in India;
if the same product is sold with a discount, how much in sales can the
company expect?
√ Summarization: - To derive important information to analyze the current
business scenario. For example a weekly sales report will give the top
management a picture of how we are performing on a weekly basis.
Figure 8.6 : - Data Warehouse and Data mining
The above figure gives a picture of how these concepts are quite different. The "Data Warehouse"
collects, cleans and filters data from different sources like "Excel", "XML" etc. But
"Data Mining" sits on top of the "Data Warehouse" database and generates intelligent
reports. Now it can either export them to a different database or just generate reports using some
reporting tool like "Reporting Services".
What is BCP?
Note: - It's not necessary that this question will be asked for data mining. But if an
interviewer wants to know your DBA capabilities he will love to ask this question. If he is
a guy who has worked with the old versions of SQL Server he will expect this to be answered.
There are times when you want to move huge numbers of records in and out of SQL Server; that's
where this old and cryptic friend comes in useful. It's a command-line utility. Below is the
detailed syntax:-
bcp {[[<database name>.][<owner>].]{<table name>|<view name>}|"<query>"}
{in | out | queryout | format} <data file>
[-m <maximum no. of errors>] [-f <format file>] [-e <error file>]
[-F <first row>] [-L <last row>] [-b <batch size>]
[-n] [-c] [-w] [-N] [-V (60 | 65 | 70)] [-6]
[-q] [-C <code page>] [-t <field term>] [-r <row term>]
[-i <input file>] [-o <output file>] [-a <packet size>]
[-S <server name>[\<instance name>]] [-U <login id>] [-P <password>]
[-T] [-v] [-R] [-k] [-E] [-h "<hint> [,...n]"]
UUUHH, lots of attributes there. But during the interview you do not have to remember so
much. Just remember that BCP is a utility with which you can do import and export of
data.
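As an illustration, exporting and re-importing the AdventureWorks sales person table could look
something like the lines below (a hedged sketch: the file path and the use of a trusted connection
via -T are just assumptions; -c means character format):
bcp AdventureWorks.Sales.SalesPerson out c:\salesperson.txt -c -T
bcp AdventureWorks.Sales.SalesPersonDummy in c:\salesperson.txt -c -T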
Figure 8.7 : - After executing BCP command prompts for some properties
The FMT file is basically the format file which governs how BCP should map the file to the table.
Let's say from our salesperson table we want to eliminate commissionpct, salesytd and
saleslastyear. Then you have to modify the FMT file as shown below. We have made the
values zero for the fields which have to be eliminated.
If we want to change the sequence you just have to change the original sequence number.
For instance we have changed the sequence from 9 to 5 --> 5 to 9; see the figure below.
Once you have changed the FMT file you can specify the .FMT file in the BCP command
arguments as shown below.
bcp adventureworks.sales.salesperson in c:\salesperson.txt -f c:\bcp.fmt -T
Note: - We have given the .FMT file in the BCP command using the -f switch.
[[,] KEEPIDENTITY ]
[[,] KEEPNULLS ]
[[,] KILOBYTES_PER_BATCH [ = kilobytes_per_batch ]]
[[,] LASTROW [ = last_row ]]
[[,] MAXERRORS [ = max_errors ]]
[[,] ORDER ( { column [ ASC | DESC ]}[ ,…n ])]
[[,] ROWS_PER_BATCH [ = rows_per_batch ]]
[[,] ROWTERMINATOR [ = ‘row_terminator’ ]]
[[,] TABLOCK ]
)]
Below is a simplified version of BULK INSERT which we have used to import a comma-
separated file into "SalesPersonDummy". The first row contains the column names, so we specified
to start importing from the second row. The other two attributes define how the fields and
rows are separated.
bulk insert adventureworks.sales.salespersondummy from 'c:\salesperson.txt' with
(
FIRSTROW=2,
FIELDTERMINATOR = ',',
ROWTERMINATOR = '\n'
)
What is DTS?
Note :- It's now a part of Integration Services in SQL Server 2005.
DTS provides similar functionality to what we had with BCP and Bulk Insert. There are a few
major problems with BCP and Bulk Insert:-
√ BCP and Bulk Insert do not have a user-friendly user interface. Well, some DBAs do
still enjoy using those DOS prompt commands, which makes them feel they are doing
something worthy.
√ Using BCP and Bulk Insert we can import only from files; what if we wanted to
import from other databases like FoxPro, Access, and Oracle? That is where DTS is
the king.
√ One of the important things that BCP and Bulk Insert miss is transformation,
which is one of the important parts of the ETL process. BCP and Bulk Insert allow you
to extract and load data, but do not provide any means by which you can do
transformation. So for example if you are getting gender as "1" and "2", you would like to
transform this data to "M" and "F" respectively when loading it into the data warehouse.
√ DTS also allows you to do direct programming and write scripts, by which you can have
huge control over the loading and transformation process.
√ It allows a lot of parallel operations to happen. For instance, while you are reading data
you also want the transformation to happen in parallel; then DTS is the right choice.
You can see DTS Import / Export wizard in the SQL Server 2005 menu.
Note: - DTS is the most-used technology when you are doing data warehousing using
SQL Server. In order to implement the ETL fundamentals properly Microsoft has rewritten
the whole of DTS from scratch using .NET and named it "Integration Services". There is
a complete chapter dedicated to "Integration Services" which covers DTS
indirectly in huge detail. Any interviewer who is looking for a data warehousing professional
in SQL Server 2005 will expect that candidates know DTS properly.
Note: - This question is tricky and is shot to gain insight into where the interviewer
can spawn question threads from. If you have worked on a data warehouse project
you can be very sure of this. If not, then you really have to prepare a project to talk
about…. I know it's unethical to even talk like this in a book, but?
I leave this to the readers, as everyone would like to think of a project of his own. But just try
to include the ETL process, which every interviewer thinks should be followed for a data
warehouse project.
√ Transactions are mainly batch transactions, so there are no huge
volumes of online transactions.
√ There is no need for a recovery process as such, unless the project specifically requires it.
return results. As we are not going through any joins (because the data is in denormalized
form), SQL queries are executed faster and in a more optimized way.
MOLAP
Multidimensional OLAP (MOLAP) stores dimension and fact data in a persistent data
store using compressed indexes. Aggregates are stored to facilitate fast data access.
MOLAP query engines are usually proprietary and optimized for the storage format used
by the MOLAP data store. MOLAP offers faster query processing than ROLAP and
usually requires less storage. However, it doesn’t scale as well and requires a separate
database for storage.
ROLAP
Relational OLAP (ROLAP) stores aggregates in relational database tables. ROLAP's use
of relational databases allows it to take advantage of existing database resources,
plus it allows ROLAP applications to scale well. However, ROLAP's use of tables to
store aggregates usually requires more disk storage than MOLAP, and it is generally not
as fast.
HOLAP
As its name suggests, hybrid OLAP (HOLAP) is a cross between MOLAP and ROLAP.
Like ROLAP, HOLAP leaves the primary data stored in the source database. Like MOLAP,
HOLAP stores aggregates in a persistent data store that’s separate from the primary
relational database. This mix allows HOLAP to offer the advantages of both MOLAP
and ROLAP. However, unlike MOLAP and ROLAP, which follow well-defined standards,
HOLAP has no uniform implementation.
Figure 8.12 : - Single Dimension view.
The above table gives a three-dimensional view; you can have more dimensions according
to your depth of analysis. For example from the above multi-dimensional view I am able to predict
that "Calcutta" is the only place where "Shirts" and "Caps" are selling; the other metros do
not show any sales for this product.
(DB)What is MDX?
MDX stands for Multi-Dimensional eXpressions. When it comes to viewing data from
multiple dimensions SQL lacks many functionalities; that's where MDX queries are useful.
MDX queries are fired against OLAP databases. SQL is good for transactional databases
(OLTP databases), but when it comes to analysis queries MDX stands at the top.
Note: - If you are applying for a data warehousing position using SQL Server 2005, MDX
will be a favorite of the interviewers. MDX itself is such a huge and beautiful beast that
we cannot cover it in this small book. I suggest you at least try to grab some basic syntax of
MDX, like SELECT, before going to the interview.
Once you are OK with the requirements it's time to select which tools can do good work for
you. This book only focuses on SQL Server 2005, but in reality there are many tools for
data warehousing. Probably SQL Server 2005 will sometimes not fit your project
requirements and you would like to opt for something else.
√ Data Modeling and design
This is where the actual designing takes place. You do the conceptual and logical designing of
your database and the star schema design.
√ ETL Process
This forms the major part of any data warehouse project. Refer to the previous section to see
what an ETL process is. ETL is the execution phase of a data warehouse project. This is
the place where you will define your mappings, create DTS packages, define workflow,
write scripts etc. A major issue when we do the ETL process is performance, which should
be kept in mind while executing this process.
Note: - Refer “Integration Services” for how to do the ETL process using SQL Server
2005.
√ OLAP Cube Design
This is the place where you define your CUBES and DIMENSIONS on the data warehouse
database which was loaded by the ETL process. CUBES and DIMENSIONS are designed by
using the requirement specification. For example if you see that the customer wants a report
"Sales Per Month", you can define the CUBES and DIMENSIONS which will later be
consumed by the front end for viewing by the end user.
√ Front End Development
Once all your CUBES and DIMENSIONS are defined you need to present them to the user.
You can build your front ends for the end user using C#, ASP.NET, VB.NET or any language
which has the ability to consume the CUBES and DIMENSIONS. The front end stands on
top of the CUBES and DIMENSIONS and delivers the reports to the end users. Without any
front end the data warehouse will be of no use from the user's perspective.
√ Performance Tuning
Many projects tend to overlook this process. But just imagine a poor user sitting for 10 minutes
waiting to view "Yearly Sales"….frustrating, no? There are three areas where you can
really look at why your data warehouse is performing slowly:-
! While data is loading into the database, i.e. the "ETL" process.
This is probably the major area where you can optimize your database. The best is to
look into the DTS packages and see if you can make them better to optimize speed.
! OLAP CUBES and DIMENSIONS.
CUBES and DIMENSIONS are something which will be executed against the data
warehouse. You can look into the queries and see if some optimization can be
done.
! Front end code.
Front ends are mostly coded by programmers and this can be a major bottleneck for
optimization. So you can probably look for loops, and also see if the front end is
straying too far away from the CUBES.
√ User Acceptance Test ( UAT )
UAT means asking the customer "Is this product OK with you?". It's a testing phase
which can be done either by the customer (and it is mostly done by the customer) or by your
own internal testing department, to ensure that it matches the customer requirements
which were gathered during the requirement phase.
√ Rolling out to Production
Once the customer has approved your UAT, it's time to roll out the data warehouse to
production so that the customer can get the benefit of it.
√ Production Maintenance
I know, the most boring aspect from a programmer's point of view, but the most profitable
from an IT company's point of view. In data warehousing this will mainly involve doing
backups, optimizing the system and removing any bugs. This can also include any enhancements
if the customer wants them.
Figure 8.14 : - Data ware house project life cycle
√ Requirement phase: - System requirement documents, project management plan,
resource allocation plan, quality management document, test plans and the number
of reports the customer is looking at. I know many people from IT will start raising
their eyebrows: hey, do not mix project management with requirement gathering.
But that's a debatable issue; I leave it to you guys if you want to split it further.
√ Tool Selection: - POC (proof of concept) documents comparing each tool according
to the project requirements.
Note: - POC means "can we do it?". For instance you have a requirement that 2000 users at a
time should be able to use your data warehouse. So you will probably write some sample
code or read through documents to ensure that the tool can do it.
√ Data modeling: - Logical and Physical data model diagram. This can be ER diagrams
or probably some format which the client understands.
√ ETL: - DTS packages, Scripts and Metadata.
√ OLAP Design:-Documents which show design of CUBES / DIMENSIONS and
OLAP CUBE report.
√ Front end coding: - Actual source code, Source code documentation and deployment
documentation.
√ Tuning: - This will be a performance tuning document: what performance level we
are looking at and how we will achieve it, or what steps will be taken to do so. It can
also include which areas / reports we are targeting for performance improvements.
√ UAT: - This is normally the test plan and test case document. It can be a document
which has steps on how to create the test cases and the expected results.
√ Production: - In this phase normally the entire data warehouse project is the
deliverable. But you can also have handover documents for the project, hardware and
network settings; in short, how the environment is set up.
√ Maintenance: - This is an ongoing process and mainly has documents like errors
fixed, issues solved, within what time the issues should be solved and within what
time they were solved.
with a small project. For this complete explanation I am taking Microsoft's old sample
database "NorthWind".
First and foremost ensure that your service is started, so go to Control Panel, Services and
start the "Analysis Server" service.
As said before, we are going to use the "NorthWind" database for the Analysis Server
demo.
Figure 8.16 : - NorthWind Snapshot.
We are not going to use all the tables from "NorthWind". Below are the only tables we will be
operating on. Leaving aside "FactTableCustomerByProduct", all the other tables are self-
explanatory. OK, I know I have still not told you what we want to derive from this whole
exercise. We will try to derive a report of how many products are bought by which customer
and how many products are sold per country. So I have created the fact
table with three fields: CustomerId, ProductId and the TotalProducts sold. All the data in the
fact table I have loaded from "Orders" and "Order Details". This means I have taken every
CustomerId and ProductId with their respective totals and made entries in the fact table.
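Just to give a feel for that load step, a minimal T-SQL sketch against the standard NorthWind
tables could look like this (the fact table definition itself is assumed; only Orders and Order
Details come from NorthWind):
CREATE TABLE FactTableCustomerByProduct
(
CustomerId NCHAR(5),
ProductId INT,
TotalProducts INT
)
GO
INSERT INTO FactTableCustomerByProduct (CustomerId, ProductId, TotalProducts)
SELECT o.CustomerID, od.ProductID, SUM(od.Quantity)
FROM Orders o
INNER JOIN [Order Details] od ON od.OrderID = o.OrderID
GROUP BY o.CustomerID, od.ProductID
GO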
Figure 8.17: - Fact Table
OK, I have created my fact table and also populated it using our ETL process. Now it's time
to use this fact table to do analysis.
So let's start our BI studio as shown in the figure below.
Figure 8.19 : - Select Analysis Services Project
I have named the project "AnalysisProject". You can see the view of the solution explorer.
Data Sources :- This is where we will define our database and connection.
To add a new "Data Source" right-click and select "New Data Source".
Figure 8.21 : - Create new data Source
After that, click Next and you have to define the connection for the data source, which you
can do by clicking on the New button. Click Next to complete the data source process.
Figure 8.22 : - Define Data source connection details
Figure 8.23 : - Create new Data source view
So here we will select only the "Customers" and "Products" tables and the fact table.
We had said previously that the fact table is the central table for the dimension tables. You can see
the Products and Customers tables form the dimension tables and the fact table is the central point.
Now drag and drop from the "Customerid" of the fact table to the "Customerid" field of the
Customers table. Repeat the same for the "productid" field with the Products table.
Check "Autobuild" as we are going to let Analysis Services decide which tables it wants
to treat as "fact" and "dimension" tables.
Figure 8.26 : - Check Auto build
After that comes the most important step: specifying which are the fact tables and which are the
dimension tables. SQL Analysis Services decides by itself, but we will change the values as shown
in the figure below.
Figure 8.27 : - Specify Fact and Dimension Tables
Figure 8.28 : - Specify measures
Figure 8.29 : - Deploy Solution
√ Cube Builder :- Works with the cube measures
√ Dimensions :- Works with the cube dimensions
√ Calculations :- Works with calculations for the cube
√ KPIs :- Works with Key Performance Indicators for the cube
√ Actions :- Works with cube actions
√ Partitions :- Works with cube partitions
√ Perspectives :- Works with views of the cube
√ Translations :- Defines optional translations for the cube
√ Browser :- Enables you to browse the deployed cube
Once you are done with the complete process, drag and drop the fields as shown by the arrows
below.
Figure 8.32: - Drag and Drop the fields over the designer
Figure 8.33: - Final look of the CUBE
Once you have dragged and dropped the fields you can see the wonderful information unfold:
which customer has bought how many products.
Figure 8.34: - Product and Customer Report
This is the second report, which shows how many products I have sold in which country.
Figure 8.35: - Product Sales by country
Note: - I do not want my book to grow in pages just because of images, but sometimes the
nature of the explanation demands it. Now you can just summarize to the interviewer, from
the above steps, how you work with Analysis Services.
Analyzing Relationships
This term is also often called "Link Analysis". For instance, one company which
sold adult products did an age survey of its customers. It found its entire product range
was bought by customers between the ages of 25 – 29. It further reasoned that
all of its customers must have kids around 2 to 5 years old, as that's the normal age of
marriage. It analyzed further and found that most of its customers were married
with kids. Now the company can also try selling kid products to the same customers, as
they will be interested in buying them, which can tremendously boost up its sales. Here
the link analysis was done between "age" and "kids" to decide a marketing strategy.
Prediction
Prediction is more about forecasting how the business will move ahead. For instance the
company has sold 1000 shoe items; if the company puts a discount on the product,
sales can go up to 2000.
Problem Definition.
This is the first step in "Data mining": define the metrics by which the model will be
evaluated. For instance if it's a small travel company it would like to measure its model
on the number of tickets sold, but if it's a huge travel company with a lot of agents it would
like to see it as the number of tickets sold per agent. If it's a different industry altogether, like a
bank, they would like to see the actual amount of transactions done per day.
There can be several models which a company wants to look into. For instance in our
previous travel company model, they would like to have the following metrics:-
√ Tickets sold per day
√ Number of tickets sold per agent
√ Number of tickets sold per airline
√ Number of refunds per month
So you should have the following checklist:-
√ Which attribute do you want to measure and predict?
√ What type of relationship do you want to explore? In our travel company example you
would like to explore the relationship between the number of tickets sold and the holiday
patterns of a country.
Exploring Models
Exploring models in data mining means calculating the min and max values, looking into any
serious deviations that are happening, and seeing how the data is distributed. Once you see the
data you can check whether the data is flawed or not. For instance the normal number of hours in
a day is 24, and if you see some data has more than 24 hours, that is not logical. You can then
look into correcting the same.
Data Source View Designer in BI Development Studio contains tools which can let you
analyze data.
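Even outside the designer, a plain T-SQL query gives you a quick feel for the data; a hedged sketch
(the staging table and column are hypothetical):
SELECT MIN(HoursWorked) AS MinHours,
MAX(HoursWorked) AS MaxHours,
AVG(HoursWorked) AS AvgHours,
COUNT(*) AS TotalRows,
SUM(CASE WHEN HoursWorked > 24 THEN 1 ELSE 0 END) AS SuspectRows
FROM StagingTimeSheet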
Building Models
Data derived from Exploring models will help us to define and create a mining model. A
model typically contains input columns, an identifying column, and a predictable column.
You can then define these columns in a new model by using the Data Mining Extensions
(DMX) language or the Data Mining Wizard in BI Development Studio.
After you define the structure of the mining model, you process it, populating the empty
structure with the patterns that describe the model. This is known as training the model.
Patterns are found by passing the original data through a mathematical algorithm. SQL
Server 2005 contains a different algorithm for each type of model that you can build. You
can use parameters to adjust each algorithm.
A mining model is defined by a data mining structure object, a data mining model object,
and a data mining algorithm.
Figure 8.36 : - Data mining life Cycle.
A MODEL is about extracting and understanding different patterns from the data. Once the patterns
and trends of how the data behaves are known we can derive a model from them. Once
these models are decided we can see how they can be helpful for prediction /
forecasting, analyzing trends, improving the current process etc.
Based on the above data we have made the following decision tree. So you can see a decision
tree takes data and then starts applying attribute comparisons on every node recursively.
√ Ages 18-25 always buy an internet connection, irrespective of income.
√ Income earners above 5000 always buy an internet connection, irrespective of age.
Using this data we have made predictions that if we market using the above criteria we
can make more "Internet Connection" sales.
So we have achieved two things from the "Decision tree":-
Prediction
√ If we market to age groups between 32-40 with income below 5000 we will not have
decent sales.
√ If we target customers in the age group 18-25 we will have good sales.
√ All income earners above 5000 will always generate sales.
Classification
√ Customer classification by age.
√ Customer classification depending on income amount.
Figure 8.40 : - Bayesian Sample Data
If you look at the sample we can say that 80% of the time a customer who buys pants also buys
shirts.
P (Shirt | Pants) = 0.8
More customers buy shirts than buy pants; we can say 1 out of every 10
customers will buy shirts and 1 out of every 100 customers will buy pants.
P (Shirts) = 0.1
P (Pants) = 0.01
Now suppose a customer comes in to buy pants: how likely is it that he will also buy
a shirt, and vice-versa? According to the theorem:-
Probability of buying a shirt if he bought pants = P (Shirt | Pants) = 0.8
Probability of buying pants if he bought a shirt = P (Shirt | Pants) x P (Pants) / P (Shirts) = 0.8 x 0.01 / 0.1 = 0.08
So you can see that if the customer is buying pants there is a huge probability that he will also buy
a shirt, while a customer buying a shirt is far less likely to also buy pants. Either way, the naïve
Bayes algorithm is used for predicting based on existing data.
√ Exclusive: A member belongs to only one cluster.
√ Overlapping: A member can belong to more than one cluster.
√ Probabilistic: A member can belong to every cluster with a certain amount of
probability.
√ Hierarchical: Members are divided into hierarchies, which are sub-divided into clusters
at a lower level.
Figure 8.41 : - Artificial Neuron Model
Above is the figure which shows a neuron model. We have inputs (I1, I2 … IN) and for
every input there is a weight (W1, W2 …. WN) attached to it. The ellipse is the
"NEURON". Weights can have negative or positive values. The activation value is the
sum of all the inputs multiplied by their weights:
Activation Value = I1 * W1 + I2 * W2 + I3 * W3 + I4 * W4 + …… + IN * WN
There is a threshold value specified in the neuron; the neuron fires (evaluates to a Boolean
or some value) if the activation value exceeds the threshold value.
So probably by feeding in a customer's sales records we can come out with an output saying
whether the sales department is in profit or loss.
For instance take the case of the customer sales data above. Below is the neural network
defined for the above data.
You can see the neuron has calculated the total as 5550, and as it's greater than the threshold of
2000 we can say the company is in profit.
The above example was explained from a simplification point of view. But in an actual situation
there can be many neurons, as shown in the figure below. It's a completely hidden layer from the
data miner's perspective. He only looks at the inputs and outputs for that scenario.
you are expecting values between 0 and 6000 maximum). So you can always go back and
look at whether you have some wrong inputs or weights. So the error is fed back into
the neural network and the weights are adjusted accordingly. This is also called training
the model.
√ Microsoft Decision Trees Algorithm
√ Microsoft Naive Bayes Algorithm
√ Microsoft Clustering Algorithm
√ Microsoft Neural Network Algorithm
Predicting a continuous attribute, for example, to forecast next year's sales.
√ Microsoft Decision Trees Algorithm
√ Microsoft Time Series Algorithm
Predicting a sequence, for example, to perform a click stream analysis of a company's
Web site.
√ Microsoft Sequence Clustering Algorithm
Finding groups of common items in transactions, for example, to use market basket analysis
to suggest additional products to a customer for purchase.
√ Microsoft Association Algorithm
√ Microsoft Decision Trees Algorithm
Finding groups of similar items, for example, to segment demographic data into groups
to better understand the relationships between attributes.
√ Microsoft Clustering Algorithm
√ Microsoft Sequence Clustering Algorithm
The reason we went through all these concepts is that when you create a data mining model you
have to specify one of the algorithms. Below is a snapshot of all the existing SQL Server algorithms.
Figure 8.45: - Snapshot of the algorithms in SQL Server
Note: - During an interview it's mostly the theory that counts, and the way you present it. For
data mining I am not showing anything practical as such; I will probably try to cover this
in my second edition. But as a piece of advice, please do try to make a small project and see how
these techniques are actually used.
Figure 8.46 : - Data mining and Data Warehousing
Let's start from the left-hand side of the image. The first section is the transactional
database. This is the database in which you collect data. The next process is the ETL process.
This section extracts data from the transactional database and sends it to your data warehouse,
which is designed using the STAR or SNOWFLAKE model. Finally, when your data
is loaded in the data warehouse, you can use SQL Server tools like OLAP,
Analysis Services, BI, Crystal Reports or Reporting Services to deliver the data to
the end user.
Note: - The interviewer will always try to goof you up by asking why we should not run OLAP,
Analysis Services, BI, Crystal Reports or Reporting Services directly on the transactional
data. That is because transactional databases are in completely normalized form, which can
make the data mining process completely slow. By doing data warehousing we denormalize the
data, which makes the data mining process more efficient.
What is XMLA?
XML for Analysis (XMLA) is fundamentally based on web services and SOAP. Microsoft
SQL Server 2005 Analysis Services uses XMLA to handle all client application
communications to Analysis Services.
XML for Analysis (XMLA) is a Simple Object Access Protocol (SOAP)-based XML
protocol, designed specifically for universal data access to any standard multidimensional
data source residing on the Web. XMLA also eliminates the need to deploy a client
component that exposes Component Object Model (COM) or Microsoft .NET Framework.
9. Integration Services/DTS
Note: - We have seen some questions on DTS in the previous chapter, "Data Warehousing".
But in order to do complete justice to this topic I have included them under Integration
Services as well.
Figure 9.2 : - Import / Export Wizard
The next step is to specify the source from which you want to copy data. You have to specify the
data source name and server name. For understanding purposes we are going to move
data within the "AdventureWorks" database. I have created a dummy table called
"SalesPersonDummy" which has the same structure as the "SalesPerson" table. But
the only difference is that "SalesPersonDummy" does not have data.
Figure 9.3 : - Specify the Data Source.
The next step is to specify the destination to which the source will be moved. At this moment
we are moving data inside "AdventureWorks" itself, so specify the same database as the
source.
Figure 9.4 : - Specify Destination for DTS
The next step is to specify the option saying from where you want to copy data. For the time being
we are going to copy from a table, so select the first option.
Finally choose which object you want to map where. You can map multiple objects if you
want.
Figure 9.6 : - “Salesperson” is mapped to “SalesPersonDummy”
When everything goes successfully you can see the screen below, which shows the series of
steps DTS has gone through.
Figure 9.7 : - Successful execution after series of checks
Figure 9.8 : - Data Transformation Pipeline
√ Container: - A container logically groups tasks. For instance you have a task to
load a CSV file into the database. So you will probably have two or three tasks :-
√ Parse the CSV file.
√ Check the field data types.
√ Map the source fields to the destination.
So you can define all the above work as tasks and group them logically into a
container.
√ Package: - Packages are executed to actually do the data transfer.
The DTP and DTR models expose APIs which can be used from .NET languages for better control.
Note : - I can hear the shout: practical.. practical. I think I have confused you guys over there.
So let's warm up on some practical DTS stuff. 1000 words are equal to one compiled
program – Shivprasad Koirala ? I really want to invent some proverbs, if you do not mind
it.
Click File – New – Project and select "Data Transformation Project".
Name the project "Salesperson". Before moving ahead let me give a
brief about what we are trying to do. We are going to use the "Sales.SalesPerson" table from
the "AdventureWorks" database. The "Sales.Salesperson" table has a field called "Bonus".
We have the following tasks to be accomplished:-
Note: - Both these tables have to be created manually by you. I suggest you use the CREATE
statements and just make both tables. You can see in the image below there are two tables,
"SalesPerson5000" and "SalesPersonNot5000".
√ Whenever the "Bonus" field is equal to 5000 the row should go into
"Sales.Salesperson5000".
√ Whenever the "Bonus" field is not equal to 5000 the row should go into
"Sales.SalespersonNot5000".
Once you have selected the "Data Transformation Project" you will be presented with a designer
as shown below. I understand you must be saying it is cryptic… it is. But let's try to
simplify it. On the right-hand side you can see the designer pane, which has a lot of objects on it,
and four tabs (Control Flow, Data Flow, Event Handlers and Package Explorer).
Control flow: - It defines how the whole process will flow. For example, if you are loading a
CSV file you will probably have tasks like parsing, cleaning and then loading. You can see a
lot of control flow items which can make your data tasks easy. But first we have to
define a task in which we will define all our data flows. You can see the curved arrow,
which shows what you have to drag and drop on the control flow designer, and the arrow tip,
which marks the output point from the task.
Figure 9.14 : - Data Flow Task
In this project I have defined only one task, but in a real project something like the below
can be seen (Extraction, Transformation and Loading: ETL). One task feeds as input into the
next task and the final task loads the data into SQL Server.
Data Flow: - The data flow says how data will flow inside a task. So the data flow is a subset
of a task, defining the actual operations.
Event Handlers: - The best part of DTS is that we can handle events. For instance, if
there is an error, what action do you want taken? Probably log your errors in an error log
table or a flat file, or be more interactive and send a mail.
Now that you have defined your task it is time to define the actual operations that will
happen within the task. We have to move data from "Sales.SalesPerson" to
"Sales.SalesPerson5000" (if the "Bonus" field equals 5000) and
"Sales.SalesPersonNot5000" (if the "Bonus" field is not equal to 5000). In short, we
have "Sales.SalesPerson" as the source and the other two tables as destinations. So click on
the "Data Flow" tab and drag the OLEDB Source data flow item onto the designer; we will
define the source in this item. You can see that there is an error, shown by a cross
on the icon. This signifies that you need to specify the source table, that is
"Sales.SalesPerson".
In order to specify the source tables we need to specify a connection for the OLEDB source.
So right click on the "Connections" tab below and select "New OLEDB Connection".
You will be presented with a screen as shown below. Fill in all the details, specify the
database as "AdventureWorks" and click "OK".
Figure 9.19 : - Connection Manager
If the connection credentials are proper you can see the connection in the "Connections"
tab, as shown in the figure below.
Figure 9.20 : - Connection Added Successfully
Now that we have defined the connection we have to associate that connection with the
OLE DB source. So right click on it and select the "Edit" menu.
Once you click Edit you will see a dialog box as shown below. In data access mode select
"Table or View" and select the "Sales.SalesPerson" table. To specify the mapping click on
the "Columns" tab and then press OK.
Figure 9.22 : - Specify Connection Values
If the credentials are OK you can see the red cross is gone and the OLE DB source is now
ready to be connected further. As said before, we need to move data to the appropriate table
depending on the "Bonus" field value. So from the data flow items drag and drop the
"Conditional Split" data flow item.
Right click on the "Conditional Split" data flow item so that you can specify the criteria.
It gives you a list of fields in the table which you can drag and drop, and you can also drag and
drop the operators and specify the criteria. I have made two outputs from the conditional
split: one which is equal to 5000 and a second which is not equal to 5000.
The conditional split now has two outputs: one which will go into "Sales.SalesPerson5000" and
the other into "Sales.SalesPersonNot5000". So you have to define two destinations and
associate the respective tables with them. Drag two OLE DB destination data flow items and
connect them to the two outputs of the conditional split.
Figure 9.25 : - Specify Destination
When you drag from the conditional split item to an OLEDB destination item, a dialog will
pop up asking which output this destination has to be connected to. Select
one from the drop-down and press OK. Repeat this step for the other destination object.
That is the final data flow structure expected.
It is time to build and run the solution, which you can do from the menu. To run the
DTS package press the green icon, as pointed to by the arrow in the figure below. After you run it,
query both tables to check whether they have the appropriate values.
Note: - You can see various data flow items on the right-hand side; it is out of scope to
cover all of them (you must be wondering how many times this author will say "out of scope",
but it is a fact, guys, some things you have to explore). In this sample project we needed the
conditional split so we used it. Depending on the project you will need to explore the toolbox.
It is rare that any interviewer will ask about individual items; rather they ask fundamentals
or a general overview of how you did DTS.
10. Replication
What's the best way to update data between SQL
Servers?
By using replication we can solve this problem. Many developers end up saying
DTS, BCP or distributed transaction management, but replication is one of the most reliable
ways to maintain consistency between databases.
License problems
SQL Server per-user licensing has a financial impact. So many companies decide to
use MSDE, which is free, so that they do not have to pay for client licenses. Then
every evening, or at some specific interval, all this data is uploaded to the central server
using replication.
Note: - MSDE supports replication.
Geographical Constraints
This applies if the central server is far away and speed is one of the deciding criteria.
Reporting Server
In a big multi-national, the sub-companies are geographically far apart and the management
wants to host a central reporting server for sales, which they want to use for decision
making and marketing strategy. Here the transactional SQL Server databases are
scattered across the sub-companies, and weekly or monthly we can push all the data to
the central reporting server.
You can see from the above figure how data is consolidated into a central server,
hosted in India, using replication.
Data planning
It is not necessary that you will need to replicate the complete database. For example, you
have a Sales database which has Customer, Sales, Event logs and History tables. You have a
requirement to host a centralized reporting server which will be used by top management
to know "Sales by Customer". To achieve this you do not need the whole database on the
reporting server; from the above you will only need the "Sales" and "Customer" tables.
Frequency planning
As defined in the example above, let's say management wants only "Sales by Customer" weekly;
then you do not need to update every day, rather you can plan weekly. But if top
management is looking for "Sales by Customer" per day then your frequency of
updates would probably be every night.
Figure 10.2 : - Publisher, Distributor and Subscriber in action
What are different models / types of replication?
√ Snapshot replication
√ Merge replication
√ Transactional replication
Note: - Below I will go through each of them in detail.
Figure 10.3 : - Snapshot replication in Action
Figure 10.4 : - Merge Replication
The merge agent sits between the subscriber and the publisher. Any conflicts are resolved through
the merge agent, which in turn uses conflict resolution. Depending on how you have configured
conflict resolution, the conflicts are resolved by the merge agent.
There can be practical situations where the same row is affected by one or many publishers
and subscribers. In such critical situations the merge agent will look at what conflict resolution
is defined and make changes accordingly.
SQL Server uniquely identifies each row in a published table using a globally unique
identifier. If the table already has a uniqueidentifier column, SQL Server will
automatically use that column; else it will add a rowguid column to the table and create
an index based on that column.
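A minimal sketch of what this amounts to is shown below; the table, column and index names are illustrative (SQL Server generates its own when it prepares a table for merge publication), so treat this as an approximation rather than the exact statements the engine runs.
-- Merge replication needs a ROWGUIDCOL to identify each row globally
ALTER TABLE dbo.Orders
    ADD rowguid UNIQUEIDENTIFIER ROWGUIDCOL NOT NULL DEFAULT NEWID()
-- An index on the rowguid column speeds up change tracking lookups
CREATE UNIQUE INDEX IX_Orders_rowguid ON dbo.Orders (rowguid)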
Triggers will be created on the published tables at both the Publisher and the Subscribers.
These are used to track data changes based on row or column changes.
Figure 10.5 : - Transactional Replication
Figure 10. 6 : - Create new publication
Figure 10.8 : - Specify Type of replication
Figure 10.10 : - Security Details
Figure 10.12 : - Replication in Action
11. Reporting Services
Note: - I know everyone is screaming that this is a part of data mining and warehousing. I echo
the same voice with you, my readers, but not necessarily. When you want to derive reports from
OLTP systems this is the best way to get the work done. Secondly, Reporting Services is used
so heavily in projects nowadays that it would be completely unfair to discuss this topic
briefly as a subsection of some chapter.
Figure 11. 1: - Welcome reporting services wizard
Click Next and you will be prompted to input data source details like the type of server, the
connection string and the name of the data source. If you have the connection string just paste it
in the text area, or else click Edit to specify the connection string values through the GUI.
Figure 11.2: - Specify Data Source Details
As we are going to use SQL Server for this sample, specify the OLEDB provider for SQL
Server and click Next.
Figure 11.3: - Specify provider
After selecting the provider, specify the connection details which will build your connection
string. You will need to specify the following details: server name, database name and
security details.
Figure 11.4 : - Specify Connection Details
This is the most important step of Reporting Services: specifying the SQL. Remember the
SQL we specified earlier; we paste the same query here. If you are not sure about the
query you can use the query builder to build it.
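The original query is not reproduced in the text, so the sketch below is only an assumption of what a "total sales by product" query against the AdventureWorks sample database could look like (Production.Product and Sales.SalesOrderDetail are standard AdventureWorks tables).
-- Total sales amount per product, largest first
SELECT p.Name AS ProductName,
       SUM(sod.LineTotal) AS TotalSales
FROM Production.Product AS p
INNER JOIN Sales.SalesOrderDetail AS sod
        ON sod.ProductID = p.ProductID
GROUP BY p.Name
ORDER BY TotalSales DESC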
Figure 11.5 : - SQL Query
Now it is time to include the fields in the report. At this moment we have only two fields:
name of product and total sales.
Finally you can preview your report. In the final section there are three tabs: Data, Layout
and Preview. In the Data tab you see your SQL or the data source. In the Layout tab you design
your report; most of the look-and-feel work is done in this section. Finally, the Preview tab
is where you can see your results.
Figure 11.8 : - Final view of the report
Figure 11.9 : - stored procedure in the query builder
You also have to specify the command type from the Data tab.
Figure 11.10 : - Specify the command type from the Data tab.
"Reporting Services" is not a stand-alone system but rather a group of server sub-systems
which work together for the creation, management and deployment of reports across the
enterprise.
Report designer
This is an interactive GUI which will help you to design and test your reports.
Report Server
The Report Server is nothing but an ASP.NET application running on IIS. The Report
Server stores the report definitions (RDL files) and renders the reports from them.
Report Manager
It is again an ASP.NET web-based application which can be used by administrators to
control security and manage reports: from an administrative perspective, who has the
authority to create a report, run a report and so on.
You can also see the various formats which can be generated (XML, HTML etc.) using the
Report Server.
12. Database Optimization
What are indexes?
An index makes your searches faster. So defining appropriate indexes on your database tables
will make searching faster.
Above is a sample diagram which explains how the B-Tree fundamental works. The
diagram shows how an index works for numbers from 1 to 50. Let's say you want to
search for 39. SQL Server will first start from the first node, i.e. the root node.
√ It sees that the number is greater than 30, so it moves to the 50 node.
√ Further, in the non-leaf nodes it compares whether the value is more than 40 or less than 40. As
it is less than 40 it scans the leaf nodes which belong to the 40 node.
You can see that this is all attained in only two steps… faster, aah. That is exactly how
indexes work in SQL Server.
Figure 12.2 : - Page split for Indexed tables
If you look at the first-level index there are "2" and "8"; now let's say we want to insert "6". In
order to balance the "B-TREE" structure it will try to split the rows into two pages, as shown.
Even though the second page after the split has some empty area it will go ahead, because the
primary thing for SQL Server is balancing the "B-TREE" for fast retrieval.
Now if you look at what happens during the split, it is doing some heavy-duty work here:-
√ Creates a new page to balance the tree.
√ Shuffles and moves the data to the pages.
So if your table has heavy inserts, which means it is transactional, you can visualize
the amount of splitting it will be doing. This will not only increase insert time but will also
upset the end-user who is sitting at the screen.
So when you forecast that a table will have a lot of inserts it is not a good idea to create many indexes.
These are the ways by which SQL Server searches for a record in a table. In a "Table Scan"
SQL Server loops through all the records to get to the destination. For instance, if you
have 1, 2, 5, 23, 63 and 95 and you want to search for 23, it will go through 1, 2 and 5 to
reach it. Worse, if it wants to search for 95 it will loop through all the records.
An "Index Scan", on the other hand, uses the "B-TREE" fundamental to get to a record. For the
"B-TREE" refer to the previous questions.
Note: - Which way to search is chosen by the SQL Server engine. For example, if it finds that the
table has very few records it will go for a table scan; if it finds the table is huge it will go for
an index scan.
Figure 12.3 : - Clustered Index Architecture
In a non-clustered index the leaf nodes contain pointers (row IDs) which then
point to the actual data.
Figure 12.4 : - Non-Clustered Index has pointers.
So here is the main difference between clustered and non-clustered: in a clustered index, when
we reach the leaf nodes we are on the actual data. In a non-clustered index we get a
pointer, which then points to the actual data.
So with the above fundamentals, the following are the basic differences between them
(a short T-SQL sketch follows these points):-
√ In a clustered index the actual data has to be sorted in the same way as the
index is. In a non-clustered index we have pointers, which is a logical arrangement,
so we do not have this compulsion.
√ So we can have only one clustered index on a table, as there can be only one
physical order, while we can have more than one non-clustered index.
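A minimal sketch of creating both kinds of index, on a hypothetical dbo.Customer table (table and index names are only illustrative), would be:
-- Only one clustered index per table: the table data itself is sorted on CustomerID
CREATE CLUSTERED INDEX IX_Customer_CustomerID ON dbo.Customer (CustomerID)
-- Any number of non-clustered indexes: their leaf level holds pointers / clustered keys
CREATE NONCLUSTERED INDEX IX_Customer_LastName ON dbo.Customer (LastName)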
If we make a non-clustered index on a table which has a clustered index, how does the
architecture change?
The only change is that the leaf nodes point to the clustered index key. This clustered
index key is then used to finally locate the actual data. So the difference is that in one case the leaf
nodes hold row pointers while in the other they hold clustered index keys. So if we create a non-clustered
index on a table which has a clustered index, it uses the clustered index to reach the data.
Figure 12.5 : - Index creation in Action.
Note: - Before reading this you should be clear about all the answers of the previous section,
especially about extents, pages and indexes.
DECLARE
@ID int,
@IndexID int,
@IndexName varchar(128)
-- input your table and index name
SELECT @IndexName = 'AK_Department_Name'
SET @ID = OBJECT_ID('HumanResources.Department')
SELECT @IndexID = IndID
FROM sysindexes
WHERE id = @ID AND name = @IndexName
--run the DBCC command
DBCC SHOWCONTIG (@id, @IndexID)
Just a short note here: "DBCC", i.e. the "Database Consistency Checker", is used for checking
the health of a lot of entities in SQL Server. Here we will be using it to see index health.
After the command is run you will see the following output. You can also run "DBCC
SHOW_STATISTICS" to see the index statistics, including when the statistics were last updated.
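For instance, a minimal call, reusing the same table and index as the script above, looks like this (the header of the output includes the date the statistics were last updated):
DBCC SHOW_STATISTICS ('HumanResources.Department', 'AK_Department_Name')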
Figure 12.7 : - DBCC SHOW_STATISTICS
Pages Scanned
The number of pages in the table (for a clustered index) or index.
Extents Scanned
The number of extents in the table or index. If you remember, we said earlier that an extent
contains pages. The more extents for the same number of pages, the higher the
fragmentation.
Extent Switches
The number of times SQL Server moves from one extent to another. The more switches it
has to make for the same number of pages, the more fragmented the index is.
Avg. Bytes free per page
This figure tells how many bytes are free per page. If it is a table with heavy inserts, or a
highly transactional one, then more free space per page is desirable so that there will be fewer
page splits.
If it is just a reporting system then having this closer to zero is good, as SQL Server can
then read the data from fewer pages.
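The free space left in each page when an index is built is controlled with the FILLFACTOR option. A hedged sketch for a heavily inserted table (table and index names are illustrative) would be:
-- Leave roughly 20% free space in each leaf page so inserts cause fewer page splits
CREATE NONCLUSTERED INDEX IX_Orders_CustomerID
    ON dbo.Orders (CustomerID)
    WITH (FILLFACTOR = 80)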
What is Fragmentation?
Speed issues occur because of two major things:
√ Fragmentation.
√ Splits.
Splits have been covered in the first questions. But one other big issue is fragmentation.
When the database grows it will lead to splits, but what happens when you delete something
from the database… hehe, life has a lot of turns, right? OK, let's say you have two extents
and each has two pages with some data. Below is a graphical representation. Well, actually
that is not exactly how things are inside, but for the sake of clarity a lot of detail has been removed.
Now over a period of time some extent and page data undergoes deletes. Here is the
modified database scenario. One observation you can make is that some pages are not
removed even when they do not have data. Second, if SQL Server wants to fetch all
"Females" it has to span across two extents and multiple pages within them. This is
called "Fragmentation", i.e. to fetch data you span across a lot of pages and extents. This
is also termed "Scattered Data".
If the fragmentation is removed, you only have to search in two extents and two
pages. Definitely this will be faster, as we are spanning across fewer entities.
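To actually remove fragmentation you can rebuild or defragment the index. A minimal sketch using the commands available in this release (the table is from AdventureWorks; rebuilding "all indexes" and targeting index id 1, the clustered index, are just illustrative choices) would be:
-- Rebuild every index on the table (offline, removes fragmentation completely)
DBCC DBREINDEX ('Sales.SalesPerson', '')
-- Or defragment just the clustered index (index id 1) page by page, online
DBCC INDEXDEFRAG ('AdventureWorks', 'Sales.SalesPerson', 1)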
Figure 12.11 : - Fragmentation removed
Figure 12.12: - sp_updatestats in action
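The figure above shows sp_updatestats being run. A minimal sketch of keeping statistics current by hand (the second statement targets one table only) is:
-- Refresh out-of-date statistics for every table in the current database
EXEC sp_updatestats
-- Or refresh statistics for a single table
UPDATE STATISTICS Sales.SalesPerson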
√ If a column has a high level of unique values and is used in selection criteria, it is
again a valid candidate for creating an index.
√ If the "foreign" key of a table is used extensively in joins (inner, outer and cross),
it is again a good candidate for creating an index.
√ If you find the table to be highly transactional (huge inserts, updates and deletes),
it is probably not a good entity to create many indexes on. Remember the split problems
with indexes.
√ You can use the "Index Tuning Wizard" for index suggestions.
Figure 12.13 : - Create New Trace File.
It will prompt you for all the trace file details, for instance the "Trace Name" and the file where
it should be saved. After providing the details click on the "Run" button provided below. I have
given the trace file the name "Testing.trc".
HUH, and the action starts. You will notice that the profiler has started tracing queries which
are hitting SQL Server and logging all those activities into the "Testing.trc" file. You can
also see the actual SQL and the time when the SQL was fired.
Let the trace run for some amount of time. In an actual practical environment I run the trace
for almost two hours at peak load to capture the actual load on the server. You can stop the trace
by clicking on the red icon given above.
You can go to the folder and see your ".trc" file created. If you try to open it in Notepad you
will see binary data; it can only be opened using the profiler. So now that we have the load
file, we just have to say to the advisor: hey advisor, here is my problem (the trace file), can you
suggest some good indexes to improve my database performance?
To open the "Database Tuning Advisor" you can go to "Tools" – "Database Tuning
Advisor".
In order to supply the workload file you have to start a new session in the "Database Tuning
Advisor".
After you have selected "New Session" you have to supply all the details for the session. There
are two primary requirements you need to provide to the session:-
√ Session Name
√ "Workload File" or "Table" (note: while running the profiler you can write the trace
either to a file or to a SQL Server table).
I have provided my "Testing.trc" file which was created when I ran the SQL Profiler. You
can also filter which databases you need index suggestions for. At this moment I have
checked all the databases. After all the details are filled in you have to click on the green
icon with the arrow. You can see the tool tip "Start analysis" in the image below.
While analyzing the trace file it performs four major steps:-
√ Submits the configuration information.
√ Consumes the workload data (which can be in the form of a file or a database
table).
√ Performs analysis on all the SQL executed in the trace file.
√ Generates reports based on the analysis.
√ Finally gives the index recommendations.
You can see that all the above steps have run successfully, which is indicated by "0 Errors and
0 Warnings".
Now it is time to see what index recommendations SQL Server has provided us. Also note that
it has added two new tabs after the analysis was done: "Recommendations" and
"Reports".
You can see that on "AdventureWorks" SQL Server has given me a lot of recommendations.
For example, on "HumanResources.Department" it has told me to create an index on
"PK_Department_DepartmentId".
Figure 12.21 : - Recommendations by SQL Server
In case you want to see detailed reports you can click on the "Reports" tab; there is a
wide range of reports which you can use to analyze how your database is performing with
that workload file.
Figure 12.22 : - Reports by Advisor
Note: - The whole point of putting all this in step by step was so that you have a complete
understanding of how to make "automatic index decisions" using SQL Server. During an
interview one question that is almost certain is "How do you increase the performance of
SQL Server?", and talking about the "Index Tuning Wizard" can fetch you some decent
points.
Click on the icon in SQL Server Management Studio as shown in the figure below.
In the bottom window pane you will see the complete breakup of how your SQL query will
execute. The following is the way to read it:-
√ Data flows from right to left.
√ Any execution plan sums to a total of 100%. For instance in the figure below it is 18 +
28 + 1 + 1 + 52. So the highest cost, 52 percent, is taken by the index scan. Probably we can
look into that logic and optimize this query.
√ The right-most nodes are the actual data retrieval nodes. I have shown the two nodes with
arrows.
√ In the figure below you can see some arrows are thick and some are thin. The thicker the
arrow, the more data is transferred.
√ There are three types of join logic: nested join, hash join and merge join.
Figure 12.24 : - Largest Cost Query
If you move your mouse gently over any node in the execution plan you will see a detailed breakup
of that node's costs.
How do you see the SQL plan in textual format?
Execute "SET SHOWPLAN_TEXT ON" and after that execute your SQL; you will
see a textual plan of the query. In the previous question what I discussed was a graphical view
of the query plan. Below is a view of how a textual query plan looks. In older versions
of SQL Server, where there was no way of seeing the query plan graphically, "SHOWPLAN"
was the most used option. Today, if anyone is still using it, I think he is either putting on a show or
is a newcomer.
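A minimal sketch (the query itself is just an illustration against AdventureWorks) is:
SET SHOWPLAN_TEXT ON
GO
-- The statement is not executed; only its textual plan is returned
SELECT * FROM HumanResources.Department WHERE Name = 'Engineering'
GO
SET SHOWPLAN_TEXT OFF
GO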
Nested Join
If you have less data this is the best logic. It has two loops: one is the outer and the other
is the inner loop. For every row in the outer loop, it loops through all the records in the inner loop.
You can see the two loop inputs given to the join. The top index scan is the outer loop and the
bottom index seek is the inner loop, executed once for every outer record.
Hash Join
A hash join has two inputs, the "Probe" input and the "Build" input. First the "Build" input is processed
and then the "Probe" input; whichever input is smaller is the "Build" input. SQL Server
first builds a hash table using the build input. After that it loops through the probe
input, finds the matches using the hash table created previously from the build input,
does the processing and gives the output.
Figure 12.28 : - Hash Join
Merge Join
In a merge join both inputs are sorted on the merge columns. The merge columns are
determined by the join condition defined in the SQL. Since each input is sorted, the
merge join takes a row from each input and compares them for equality. If there is equality then a
matching row is produced. This is processed until the end of the rows.
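If you want to see each join type appear in the plan you can force it with a query hint. A hedged sketch against AdventureWorks (the hint choice here is only for experimentation; normally the optimizer picks the join type) is:
-- Force a merge join; swap in OPTION (HASH JOIN) or OPTION (LOOP JOIN) to see the other two
SELECT soh.SalesOrderID, sod.ProductID
FROM Sales.SalesOrderHeader AS soh
INNER JOIN Sales.SalesOrderDetail AS sod
        ON sod.SalesOrderID = soh.SalesOrderID
OPTION (MERGE JOIN)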
Note :- It is difficult to cover every aspect of RAID in this book. It is better to pick up a
decent SQL Server book for in-depth knowledge, but from an interview aspect you can
probably get by with this answer.
13. Transaction and Locks
What is a "Database Transaction"?
It is a unit of interaction with a database which should be independent of other
transactions.
What is ACID?
"ACID" is a set of rules which are laid down to ensure that a database transaction is
reliable. A database transaction should principally follow the ACID rules to be safe. "ACID" is
an acronym which stands for:-
√ Atomicity
A transaction allows for the grouping of one or more changes to tables and rows in the
database to form an atomic or indivisible operation. That is, either all of the changes
occur or none of them do. If for any reason the transaction cannot be completed, everything
this transaction changed can be restored to the state it was in prior to the start of the
transaction via a rollback operation.
√ Consistency
Transactions always operate on a consistent view of the data and when they end always
leave the data in a consistent state. Data may be said to be consistent as long as it conforms
to a set of invariants, such as no two rows in the customer table have the same customer
id and all orders have an associated customer row. While a transaction executes these
invariants may be violated, but no other transaction will be allowed to see these
inconsistencies, and all such inconsistencies will have been eliminated by the time the
transaction ends.
√ Isolation
To a given transaction, it should appear as though it is running all by itself on the database.
The effects of concurrently running transactions are invisible to this transaction, and the
effects of this transaction are invisible to others until the transaction is committed.
√ Durability
Once a transaction is committed, its effects are guaranteed to persist even in the event of
subsequent system failures. Until the transaction commits, not only are any changes made
by that transaction not durable, but they are guaranteed not to persist in the face of a system
failure, as crash recovery will roll back their effects.
The simplicity of ACID transactions is especially important in a distributed database
environment where the transactions are being made simultaneously.
There are two paths defined in the transaction: one which rolls back to the initial state and
another which rolls back to "tran1". You can also see that "tran1" and "tran2" are planted in
multiple places as bookmarks (save points) to roll back to.
Brushing up the syntaxes
-- To start a transaction
BEGIN TRAN Tran1
-- Creates a bookmark (save point)
SAVE TRAN PointOne
-- This rolls back to the save point "PointOne"
ROLLBACK TRAN PointOne
-- This commits all changes made since the BEGIN TRAN point
COMMIT TRAN Tran1
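A minimal end-to-end sketch of the same syntax, assuming a hypothetical dbo.Accounts table, would be:
BEGIN TRAN Tran1
    UPDATE dbo.Accounts SET Balance = Balance - 100 WHERE AccountID = 1
    SAVE TRAN PointOne
    UPDATE dbo.Accounts SET Balance = Balance + 100 WHERE AccountID = 2
    -- Undo only the second update; the first one is still pending
    ROLLBACK TRAN PointOne
-- Commit whatever is still pending since BEGIN TRAN (here, only the first update)
COMMIT TRAN Tran1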
No. If a developer forgets to issue the "COMMIT TRAN" it can leave a lot of open transactions,
which can bring down SQL Server performance.
What is Concurrency?
In a multi-user environment, two users trying to perform operations (add, modify and
delete) on the same data at the same time is termed "Concurrency". In such scenarios there can be a lot
of conflicts about data consistency and about following the ACID principles.
For instance, the above figure depicts the concurrency problem. "Mr. X" started viewing
"Record1"; after some time "Mr. Y" picks up "Record1" and starts updating it. So "Mr. X"
is viewing data which is not consistent with the actual database.
Figure 13.3: - Locking implemented
In our first question we saw the problem; above is how locking will work. "Mr. X" retrieves
"Record1" and locks it. When "Mr. Y" comes in to update "Record1" he cannot do it, as
it has been locked by "Mr. X".
Note: - What I have shown is a small glimpse; in actual situations there are different types
of locks, which we will go through in the coming questions.
Figure 13.4 : - Dirty Reads
A "Dirty Read" occurs when one transaction reads a record which is part of the half-finished
work of another transaction. The above figure shows the "Dirty Read" problem in a
pictorial format. I have defined all activities in steps which show the sequence in which they
happen (i.e. Step 1, Step 2 etc.).
√ Step 1: - "Mr. Y" fetches "Record1", which has "Value=2", for updating it.
√ Step 2: - In the meantime "Mr. X" also retrieves "Record1" for viewing. He also
sees it as "Value=2".
√ Step 3: - While "Mr. X" is viewing the record, "Mr. Y" concurrently updates it
to "Value=5". Boom… the problem: "Mr. X" is still seeing it as "Value=2",
while the actual value is "5".
If on every read of the same data you get different values, it is an "Unrepeatable Read" problem.
Let's iterate through the steps of the above figure:-
√ Step 1: - "Mr. X" gets "Record1" and sees "Value=2".
√ Step 2: - "Mr. Y" in the meantime comes and updates "Record1" to "Value=5".
√ Step 3: - "Mr. X" again gets "Record1"… ohh, the value has changed from "2" …
confusion.
If "UPDATE" and "DELETE" SQL statements seem not to affect the data, it can
be a "Phantom Rows" problem.
√ Step 1: - "Mr. X" updates all records with "Value=2" to "Value=5".
√ Step 2: - In the meantime "Mr. Y" inserts a new record with "Value=2".
√ Step 3: - "Mr. X" wants to ensure that all records are updated, so he issues a SELECT
for "Value=2"… and surprisingly finds records where "Value=2".
So "Mr. X" thinks that his "UPDATE" SQL command is not working properly.
"Lost Updates" is the scenario where an update which is successfully written to the database
is over-written by the update of another transaction. So let's try to understand all the
steps of the above figure:-
√ Step 1: - "Mr. X" updates all records with "Value=2" to "Value=5".
√ Step 2: - "Mr. Y" comes along at the same time and updates all records with
"Value=5" to "Value=2".
√ Step 3: - Finally "Value=2" is saved in the database, which is inconsistent
according to "Mr. X", as he thinks all the values are now equal to "5".
Step 1: - The first transaction issues a "SELECT" statement on the resource, thus acquiring a
"Shared" lock on the data.
Step 2: - A second transaction also executes a "SELECT" statement on the resource, which
is permitted, as a "Shared" lock is compatible with another "Shared" lock.
Step 3: - A third transaction tries to execute an "UPDATE" SQL statement. As it is an "UPDATE"
statement it wants to acquire an "Exclusive" lock. But because we already have "Shared" locks
on the resource, it acquires an "Update" lock for now.
Step 4: - The final transaction tries to fire a "SELECT" on the data and tries to acquire a
"Shared" lock. But it cannot do so until the "Update" lock is done with.
So "Step 4" will not be completed until "Step 3" has executed. When "Step 1" and
"Step 2" are done, "Step 3" converts its lock into "Exclusive" mode and updates the data.
Finally "Step 4" is completed.
√ Intent Locks: - When SQL Server wants to acquire a "Shared" lock or an
"Exclusive" lock lower down the hierarchy, it places "Intent" locks higher up. For instance,
if one of the transactions wants a row-level lock, an "Intent" lock is placed at the
table level. Below are the different flavors of "Intent" locks, but all with one main
intention: to signal locks at a lower level:-
! Intent shared (IS)
! Intent exclusive (IX)
! Shared with intent exclusive (SIX)
! Intent update (IU)
! Update intent exclusive (UIX)
! Shared intent update (SIU)
√ Schema Locks: - Whenever you are doing any operation related to the
schema, this lock is acquired. There are basically two flavors of this:-
! Schema modification lock (Sch-M): - Any object structure change
using ALTER, DROP, CREATE etc. will take this lock.
! Schema stability lock (Sch-S): - This lock prevents "Sch-M" locks.
These locks are used when compiling queries. This lock does not block
any transactional locks, but while the schema stability (Sch-S) lock is
held, DDL operations cannot be performed on the table.
√ Bulk Update locks: - Bulk Update (BU) locks are used during the bulk copying of
data into a table, for example when we are executing a batch process over a database
at midnight.
√ Key-Range locks: - Key-range locks are used by SQL Server to prevent phantom
insertions or deletions into a set of records accessed by a transaction.
Below are the different flavors of "Key-Range" locks:
! RangeI_S
! RangeI_U
! RangeI_X
! RangeX_S
! RangeX_U
Note: - By default SQL Server uses the "READ COMMITTED" isolation level.
Read Committed
Any "Shared" lock taken under "Read Committed" is released as soon as the SQL
statement has executed. So if you are executing several "SELECT" statements under "Read
Committed", the shared locks are freed as soon as each statement finishes.
But when it comes to SQL statements like "UPDATE", "DELETE" and "INSERT", the locks
are held for the duration of the transaction.
With "Read Committed" you can prevent "Dirty Reads", but "Unrepeatable Reads" and
"Phantoms" still occur.
Read Uncommitted
This isolation level says "do not apply any locks while reading". This increases performance but can
introduce "Dirty Reads". So why does this isolation level exist? Well, sometimes
when you do not want other transactions to be affected and you only want to draw some
rough report, this is a good isolation level to opt for.
Repeatable Read
This type of read prevents "Dirty Reads" and "Unrepeatable Reads".
Serializable
It is the king of everything. All concurrency issues are solved by using "Serializable" except
for "Lost Updates". That means all other transactions have to wait if any transaction holds a
"Serializable" isolation level on the data.
Note: - Syntax for setting the isolation level:-
SET TRANSACTION ISOLATION LEVEL { READ COMMITTED | READ UNCOMMITTED |
REPEATABLE READ | SERIALIZABLE }
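A minimal usage sketch (the table name is illustrative) showing the trade-off discussed above:
-- Allow dirty reads for a rough report; the SELECT will not be blocked by writers
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SELECT COUNT(*) FROM dbo.Orders
-- Switch back to the default afterwards
SET TRANSACTION ISOLATION LEVEL READ COMMITTED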
What are “Lock” hints?
Lock hints give you more control over how locking is used. You can specify how locking should be
applied in your SQL queries by providing optimizer (table) hints. These hints tell SQL Server
to use a specific lock level. For example, the query below says to take an exclusive table
lock while executing the SELECT SQL.
SELECT * FROM MasterCustomers WITH (TABLOCKX)
What is a “Deadlock” ?
Deadlocking occurs when two user processes have locks on separate objects and each
process is trying to acquire a lock on the object that the other process has. When this
happens, SQL Server ends the deadlock by automatically choosing one and aborting the
process, allowing the other process to continue. The aborted transaction is rolled back
and an error message is sent to the user of the aborted process. Generally, the transaction
that requires the least amount of overhead to roll back is the transaction that is aborted.
√ If appropriate, reduce lock escalation by using the ROWLOCK or PAGLOCK hints
(a short sketch follows this list).
√ Consider using the NOLOCK hint to prevent locking if the data being read
is not modified often.
√ If appropriate, use as low an isolation level as possible for the user connection
running the transaction.
√ Consider using bound connections.
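A hedged sketch of the first two suggestions (table and column names are illustrative, not from the book):
-- Keep the lock at row level instead of letting it escalate to a page or table lock
UPDATE dbo.Orders WITH (ROWLOCK) SET Status = 'Shipped' WHERE OrderID = 42
-- Read without taking shared locks (accepting the risk of dirty reads)
SELECT * FROM dbo.Orders WITH (NOLOCK) WHERE CustomerID = 7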