Explain The Advantages of Using ASM?
I. High availability with reduced downtime: ASM supports hot-swappable disks. You can add or
remove disks from a disk group while the database is online. When you add or remove disks from a
disk group, ASM automatically redistributes the file contents, eliminating the need for downtime.
II. Data redundancy: ASM provides flexible server-based mirroring options. The ASM normal and high
redundancy disk groups enable 2-way and 3-way mirroring respectively. Alternatively, you can use
external redundancy and let a RAID storage subsystem perform the mirroring.
III. Reduced administrative overhead: ASM accomplishes this by consolidating data storage into a
small number of disk groups. This enables you to consolidate the storage for multiple databases and
provides improved I/O performance.
IV. Load balancing: ASM provides dynamic parallel load balancing, which helps prevent hot spots. I/O
is spread evenly across the available disks.
V. Compatibility with other storage systems: ASM can coexist with other storage management options
such as raw disks and third-party file systems. This feature simplifies the integration of ASM into
pre-existing environments.
VI. ASM uses Oracle Managed Files (OMF) for simplified database file management.
VII. It has easy-to-use management interfaces such as SQL*Plus, the ASMCMD command-line interface,
and Oracle Enterprise Manager (OEM). Using the OEM wizard you can easily migrate non-ASM database
files to ASM.
VIII. In an ASM instance, you can store database datafiles, redo log files, archive log files, backup files,
Data Pump dump files, change-tracking files, and control files of one or several Oracle databases. Another
feature of ASM is the elimination of fragmentation.
A. Oracle ASM Instance: An Oracle ASM instance has a System Global Area (SGA) and background
processes that are similar to those of Oracle Database, but the Oracle ASM SGA is much smaller than a
database SGA, so Oracle ASM has a minimal performance effect on a server. An Oracle ASM instance
mounts ASM disk groups to make Oracle ASM files available to database instances. Oracle ASM and
database instances require shared access to the disks in a disk group. Oracle ASM instances manage the
metadata of the disk group and provide file layout information to the database instances.
The Oracle ASM metadata includes the following information:
1. The disks that belong to a disk group.
2. The amount of space that is available in a disk group.
3. The filenames of the files in a disk group.
4. The location of disk group data file extents.
5. A redo log that records information about atomically changing metadata blocks.
6. Oracle ADVM volume information
There is one Oracle ASM instance for each cluster node. If there are several database instances for
different databases on the same node, then the database instances share the same single Oracle ASM
instance on that node. If the Oracle ASM instance on a node fails, then all of the database instances on that
node also fail.
B. Oracle ASM Disk Groups: A disk group consists of multiple ASM disks and contains the metadata
that is required for the management of space in the disk group. The Disk group components are ASM
disks, ASM files, and allocation units. Any Oracle ASM file is completely contained within a single disk
group. However, a disk group might contain files belonging to several databases and a single database can
use files from multiple disk groups.
Mirroring protects data integrity by storing copies of data on multiple disks. When you create a disk group,
you specify an Oracle ASM disk group type based on one of the following three redundancy levels:
Normal redundancy: Oracle ASM provides two-way mirroring by default, which means that all files are
mirrored so that there are two copies of every extent. A loss of one Oracle ASM disk is tolerated.
High redundancy: Oracle ASM provides triple mirroring by default, which means that all files are mirrored
so that there are three copies of every extent. A loss of two Oracle ASM disks in different failure groups is
tolerated.
External redundancy: Oracle ASM does not provide mirroring; redundancy, if any, is provided by the
underlying storage (for example, a RAID array).
Normal redundancy disk groups require at least two failure groups. High redundancy disk groups require at
least three failure groups. Disk groups with external redundancy do not use failure groups.
C. Oracle ASM Failure Groups: A failure group is a subset of the disks in an ASM disk group. Failure
groups are used to store mirror copies of data. When Oracle ASM allocates an extent for a normal
redundancy file, Oracle ASM allocates a primary copy and a secondary copy. Oracle ASM chooses the
disk on which to store the secondary copy so that it is in a different failure group than the primary copy.
There are always failure groups even if they are not explicitly created. If you do not specify a failure group
for a disk, then Oracle automatically creates a new failure group containing just that disk, except for disk
groups containing disks on Oracle Exadata cells.
D. Oracle ASM Disks: Oracle ASM disks are the storage devices that are provisioned to Oracle ASM disk
groups. Oracle ASM spreads the files proportionally across all of the disks in the disk group. This
allocation pattern maintains every disk at the same capacity level and ensures that all of the disks in a disk
group have the same I/O load.
An Oracle ASM file consists of one or more file extents. A file extent consists of one or more allocation
units. Every Oracle ASM disk is divided into allocation units (AUs) within the ASM disk group.
When you create a disk group, you can set the Oracle ASM allocation unit size with the AU_SIZE disk
group attribute. The values can be 1, 2, 4, 8, 16, 32, or 64 MB, depending on the specific disk group
compatibility level.
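You can verify the AU size of existing disk groups from the V$ASM_DISKGROUP view; a quick check
(output will vary per environment):
SQL> select name, allocation_unit_size from v$asm_diskgroup;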
E. Oracle ASM Files: Files that are stored in Oracle ASM disk groups are called Oracle ASM files. Each
Oracle ASM file is contained within a single Oracle ASM disk group. Oracle Database communicates with
Oracle ASM in terms of files.
The following are the files which are stored in ASM Disk groups
1. Control files
2. Data files, temporary data files, and data file copies
3. SPFILEs
4. Online redo logs, archive logs, and Flashback logs
5. RMAN backups
6. Disaster recovery configurations
7. Change tracking bitmaps
8. Data Pump dumpsets
Oracle ASM automatically generates Oracle ASM file names as part of file creation and tablespace
creation. Oracle ASM file names begin with a plus sign (+) followed by a disk group name.
The contents of Oracle ASM files are stored in a disk group as a set, or collection of extents that are stored
on individual disks within disk groups. Each extent resides on an individual disk. Extents consist of one or
more allocation units (AU).
Extent size equals the disk group AU size for the first 20000 extent sets (0 – 19999).
Extent size equals 4*AU size for the next 20000 extent sets (20000 – 39999).
Extent size equals 16*AU size for extent sets 40000 and higher.
Background Processes:
1. ARBx: These are the slave processes that do the rebalance activity.
2. RBAL: This opens all device files as part of discovery and coordinates the rebalance activity for disk
groups.
3. ASMB: At database instance startup, ASMB connects as a foreground process into the ASM instance.
All communication between the database and ASM instances is performed via this bridge, including
physical file changes such as data file creation and deletion. Over this connection, periodic messages are
exchanged to update statistics and to verify that both instances are healthy.
4. SMON: This process is the system monitor and also acts as a liaison to the Cluster Synchronization
Services (CSS) process (in Oracle Clusterware) for node monitoring.
5. PSP0: This process spawner is responsible for creating and managing other Oracle processes.
Following are the three redundancy levels which we can define while creating the
diskgroup.
Normal, for 2-way mirroring: data is written to one failure group and mirrored to another
failure group. Normal redundancy disk groups therefore require at least two failure groups.
High, for 3-way mirroring: data is written to one failure group and mirrored to two other
failure groups. High redundancy disk groups require at least three failure groups.
External, for no Oracle ASM mirroring: useful when the disk group contains RAID devices.
No FAILGROUP clause is needed here.
Example:
SQL> create diskgroup diskgr1 NORMAL REDUNDANCY
  2  FAILGROUP controller1 DISK
  3  '/devices/diska1' NAME dasm_d1
  4  FAILGROUP controller2 DISK
  5  '/devices/diskb1' NAME dasm_d2
  6  ATTRIBUTE 'au_size'='4M';
What is RAID?
RAID (redundant array of independent disks) is a setup consisting of multiple disks
for data storage. They are linked together to prevent data loss and/or speed up
performance. Having multiple disks allows the employment of various techniques
like disk striping, disk mirroring, and parity.
RAID 0: Striping
RAID 0, also known as a striped set or a striped volume, requires a minimum of two
disks. The disks are merged into a single large volume where data is stored evenly
across the number of disks in the array.
This process is called disk striping and involves splitting data into blocks and writing
them across multiple disks. Configuring the striped disks as a single partition increases
performance, since multiple disks perform read and write operations simultaneously.
Therefore, RAID 0 is generally implemented to improve speed and efficiency.
Advantages of RAID 0
Cost-efficient and straightforward to implement.
Increased read and write performance.
No overhead (all of the capacity is usable).
Disadvantages of RAID 0
Doesn't provide fault tolerance or redundancy.
RAID 1: Mirroring
RAID 1 is an array consisting of at least two disks where the same data is stored on
each to ensure redundancy. The most common use of RAID 1 is setting up a
mirrored pair consisting of two disks in which the contents of the first disk are
mirrored to the second. This is why such a configuration is also called mirroring.
Unlike with RAID 0, where the focus is solely on speed and performance, the primary
goal of RAID 1 is to provide redundancy. It protects against data loss and
downtime by allowing a failed drive to be replaced by its replica.
Advantages of RAID 1
Increased read performance.
Provides redundancy and fault tolerance.
Simple to configure and easy to use.
Disadvantages of RAID 1
Uses only half of the storage capacity.
More expensive (needs twice as many drives).
Requires powering down your computer to replace a failed drive.
It is also suitable for smaller servers with only two disks, as well as if you are
searching for a simple configuration you can easily set up (even at home).
When the database instance opens a datafile, the ASM instance ships the file's extent map
to the RDBMS instance, where it is stored in the SGA.
Using that extent map, the RDBMS can perform I/O on the ASM files directly, without going
through the ASM instance.
The disk group attribute disk_repair_time determines how long ASM waits before a disk is permanently
dropped from an ASM disk group after it was taken offline for whatever reason. The default value of
disk_repair_time is 3.6 hours (216 minutes).
Drop disk before repair time expired
-- check the disk group attributes
select * from v$asm_attribute;
Here we see a value of "1200" under the REPAIR_TIMER column (in V$ASM_DISK); this value is the time in
seconds after which the disk will be dropped automatically. It is derived from a diskgroup attribute
called DISK_REPAIR_TIME, which I will discuss below.
In 10g, if a disk went missing, it would immediately get dropped and the REBALANCE operation would
kick in immediately, whereby ASM would start redistributing the ASM extents across the available
disks in the ASM diskgroup to restore the redundancy.
DISK_REPAIR_TIME
Starting with 11g, Oracle provides a diskgroup attribute called DISK_REPAIR_TIME, with a
default value of 3.6 hours. This means that if a disk goes missing, the disk is not
dropped immediately; ASM waits for the disk to come back online or be replaced. This feature
helps in scenarios where a disk is plugged out accidentally, or a storage server/SAN gets
disconnected/rebooted, leaving an ASM diskgroup without one or more disks. During the
time the disk(s) remain unavailable, ASM keeps track of the extents that are candidates for
being written to the missing disk(s), and immediately starts writing to the disk(s) as soon as the
missing disk(s) come back online (this feature is called fast mirror resync). If the disk(s) do not
come back online within the DISK_REPAIR_TIME threshold, the disk(s) are dropped and a rebalance starts.
FAILGROUP_REPAIR_TIME
Starting with 12c, another attribute can be set for the diskgroup:
FAILGROUP_REPAIR_TIME, with a default value of 24 hours. This attribute is similar to
DISK_REPAIR_TIME, but is applied to the whole failgroup. In Exadata, all disks belonging to a storage
server can belong to one failgroup (to avoid a mirror copy of an extent being written to a disk from the
same storage server), and this attribute is quite handy in an Exadata environment when a complete
storage server is taken down for maintenance or some other reason.
In the following we can see how to set values for the diskgroup attributes explained above.
SQL> select name,value from v$asm_attribute where group_number=3 and name like
'%repair_time%';
NAME VALUE
------------------------------ --------------------
disk_repair_time 3.6h
failgroup_repair_time 24.0h
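The attributes are changed with ALTER DISKGROUP ... SET ATTRIBUTE. A sketch of the commands that
would produce the modified values queried below (the disk group name DATA is an assumption):
SQL> alter diskgroup DATA set attribute 'disk_repair_time'='1h';
Diskgroup altered.
SQL> alter diskgroup DATA set attribute 'failgroup_repair_time'='10h';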
Diskgroup altered.
SQL> select name,value from v$asm_attribute where group_number=3 and name like
'%repair_time%';
NAME VALUE
------------------------------ --------------------
disk_repair_time 1h
failgroup_repair_time 10h
ORA-15042
If a disk is offline/missing from an ASM diskgroup, ASM may not mount the diskgroup automatically
during instance restart. In this case, we might need to mount the diskgroup manually with the FORCE
option, as sketched below.
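A minimal sketch, assuming the disk group is named DATA:
SQL> alter diskgroup DATA mount;
-- fails with ORA-15042 because a disk is missing from the group
SQL> alter diskgroup DATA mount force;
Diskgroup altered.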
After a disk goes offline, the clock starts ticking, and the value of REPAIR_TIMER can be monitored to
see how much time remains to make the disk available again before it is automatically dropped.
NAME PATH STATE HEADER_ REPAIR_TIMER MODE_ST MOUNT_S
DATA1 ORCL:DATA1 NORMAL MEMBER 0 ONLINE CACHED
DATA2 ORCL:DATA2 NORMAL MEMBER 0 ONLINE CACHED
DATA3 ORCL:DATA3 NORMAL MEMBER 0 ONLINE CACHED
DATA4 NORMAL UNKNOWN 649 OFFLINE MISSING
-- We can confirm that no rebalance has started yet by querying V$ASM_OPERATION:
select * from v$asm_operation;
no rows selected
If we are able to make this disk available/replaced before DISK_REPAIR_TIME lapses, we can bring
the disk back online. Please note that we need to bring it ONLINE manually, as sketched below.
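A sketch of the command, assuming the disk group DATA and the disk DATA4 from the output above:
SQL> alter diskgroup DATA online disk DATA4;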
Diskgroup altered.
select name,path,state,header_status,REPAIR_TIMER,mode_status,mount_status from
v$asm_disk;
NAME PATH STATE HEADER_ REPAIR_TIMER MODE_ST MOUNT_S
DATA1 ORCL:DATA1 NORMAL MEMBER 0 ONLINE CACHED
DATA2 ORCL:DATA2 NORMAL MEMBER 0 ONLINE CACHED
DATA3 ORCL:DATA3 NORMAL MEMBER 0 ONLINE CACHED
DATA4 NORMAL UNKNOWN 465 SYNCING CACHED
-- V$ASM_OPERATION still returns no rows:
no rows selected
-- After some time, everything becomes normal.
select name,path,state,header_status,REPAIR_TIMER,mode_status,mount_status from
v$asm_disk;
NAME PATH STATE HEADER_ REPAIR_TIMER MODE_ST MOUNT_S
DATA1 ORCL:DATA1 NORMAL MEMBER 0 ONLINE CACHED
DATA2 ORCL:DATA2 NORMAL MEMBER 0 ONLINE CACHED
DATA3 ORCL:DATA3 NORMAL MEMBER 0 ONLINE CACHED
DATA4 ORCL:DATA4 NORMAL MEMBER 0 ONLINE CACHED
If the disk cannot be made available or replaced, either ASM will auto-drop the disk after
DISK_REPAIR_TIME has lapsed, or we can manually drop the ASM disk. A rebalance occurs after the
disk drop.
Since the disk status is OFFLINE, we need to use the FORCE option to drop the disk, as sketched
below. After dropping the disk, the rebalance starts and can be monitored from the V$ASM_OPERATION view.
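A sketch, again assuming disk group DATA and disk DATA4:
SQL> alter diskgroup DATA drop disk DATA4 force;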
Diskgroup altered.
2 REBAL RESYNC DONE 9 0 0
2 REBAL REBALANCE DONE 9 42 42
2 REBAL COMPACT RUN 9 1 0
Later we can replace the faulty disk and then add the new disk back into this diskgroup, as sketched
below. Adding the disk back initiates a rebalance once again.
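A sketch, assuming the replacement disk is presented at the same path as before:
SQL> alter diskgroup DATA add disk 'ORCL:DATA4' name DATA4;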
Diskgroup altered.
2 REBAL RESYNC DONE 9 0 0
2 REBAL REBALANCE RUN 9 37 2787
2 REBAL COMPACT WAIT 9 1 0
20. For very large databases, should we use a small AU or a large AU?
If the database is very big, then a larger AU is recommended, because it gives:
Reduced SGA size to manage the extent maps in the RDBMS instance.
Increased file size limits.
Reduced database open time, because VLDBs usually have many big
datafiles (in 11g, this has been mitigated by fetching extents only on
demand).
In Oracle 11g, only the first 60 extents in the extent map are sent at file-open time. The
rest are sent in batches as required by the RDBMS.
24. In your diskgroup, all the disks are of the same size. But you still
find that the disks are not balanced. What could be the reason?
1. Either an ASM disk was added with a rebalance power of 0 (ZERO).
2. Or a previous rebalance was aborted for some reason and was never
completed afterwards.
With Oracle Flex ASM, clients can connect to a remote ASM instance over a network connection
(i.e., the ASM network). If a server running an ASM instance fails, Oracle Clusterware starts
a new ASM instance on a different server to maintain the cardinality. If a 12c database
instance is using a particular ASM instance, and that instance is lost because of a server
crash or ASM instance failure, the Oracle 12c database instance reconnects to an existing
ASM instance on another node. These features are collectively called Oracle Flex ASM.
62. Suppose the spfile location inside the gpnp profile is missing.
Will ASM start during cluster startup?
When an Oracle ASM instance searches for an initialization parameter file, the search
order is:
1. The location of the initialization parameter file specified in the Grid Plug and
Play (GPnP) profile
2. If the location has not been set in the GPnP profile, then the search order
changes to:
a. SPFILE or PFILE in the Oracle ASM instance home. For example,
the SPFILE for Oracle ASM has the following default path in the
Oracle Grid Infrastructure home in a Linux environment:
$GRID_HOME/dbs/spfile+ASM.ora
Note:
A PFILE or SPFILE is required if your configuration uses nondefault initialization
parameters for the Oracle ASM instance.
63. A user ran select * from EMP, where the datafile is in ASM.
Explain how it will get the data from the ASM disks.
64. How do you estimate how long a rebalance will take?
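While a rebalance is running, the EST_MINUTES column of V$ASM_OPERATION gives a running estimate of
the time remaining; a quick check:
SQL> select group_number, operation, pass, state, power, est_minutes from v$asm_operation;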
Three SCAN listeners are sufficient for any RAC setup. It is not mandatory for a SCAN listener to run
on every node.
10. Why RAC has separate undo tablespace for each node?
If we keep only one undo tablespace, it would need more coordination between nodes and would
increase the traffic between the instances.
In RAC, the local_listener parameter points to the node VIP and remote_listener is set to the SCAN.
The purpose of the remote listener is to connect all instances with all listeners, so the instances
can propagate their load balance advisories to all listeners. A listener uses the advisories
to decide which instance should service a client request. If the listener learns from the
advisories that its local instance is least loaded and should service the client request, the
listener passes the client request to the local instance. If the local instance is overloaded, the
listener can use a TNS redirect to redirect the client request to a less loaded, i.e. remote,
instance. This phenomenon is also called server-side load balancing.
12. What are local registry and cluster registry?
13. What is client side load balancing and server side load
balancing?
14. What are the RAC related background processes?
LMON ->
(Global Enqueue Service Monitor) It manages global enqueues and resources.
LMON detects instance transitions and performs reconfiguration of GES
and GCS resources.
It usually does the job of dynamic remastering.
LMD ->
Referred to as the GES (Global Enqueue Service) daemon, since its job is to
manage global enqueue and global resource access.
The LMD process also handles deadlock detection and remote enqueue requests.
LCK0 -> (Instance Lock Manager) This process manages non-cache-fusion resource
requests such as library and row cache requests.
LMS -> (Global Cache Service process)
Its primary job is to transport blocks across the nodes for cache-fusion
requests.
GCS_SERVER_PROCESSES -> the number of LMS processes, specified as an init.ora
parameter.
Increase this parameter if global cache activity is very high.
ACMS:
Atomic Controlfile to Memory Service.
It ensures that a distributed SGA memory update is either globally committed on
success or globally aborted if a failure occurs.
RMSn: Oracle RAC Management Processes (RMSn)
They usually help in the creation of services when a new instance is added.
LMHB:
Global Cache/Enqueue Service Heartbeat Monitor.
LMHB monitors the heartbeat of the LMON, LMD, and LMSn processes to ensure
they are running normally without blocking or spinning.
15. What is TAF?
TAF provides runtime failover of connections. There are different options we can specify
while creating a TAF policy.
Let's say we created TAF with the SELECT option. Now suppose a user connects using the
TAF-enabled service and runs a SELECT statement. While the SELECT statement is running, the node on
which it is running crashes. The SELECT statement is then transparently failed over to another
node, where it completes and the results are fetched. An example policy is sketched below.
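A typical tnsnames.ora entry with a SELECT-type TAF policy might look like the sketch below (host,
port, and service names are placeholders):
ORCL_TAF =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = rac-scan.example.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = orcl_srv)
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = BASIC)(RETRIES = 30)(DELAY = 5))
    )
  )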
Oracle Clusterware uses the VD (voting disk) to determine which nodes are members of a cluster.
The Oracle Cluster Synchronization Service daemon (OCSSD) on each cluster node updates
the VD with the current status of the node every second. The VD is used to determine
which RAC nodes are still in the cluster should the interconnect heartbeat between the
RAC nodes fail.
GES and GCS are two important parts of the GRD (Global Resource Directory).
GES and GCS maintain memory structures in the GRD, which is distributed across the
instances and stored partly in the shared pool.
Global Enqueue Service (GES) handles the enqueue mechanism in Oracle RAC. It
performs concurrency control on dictionary cache locks, library cache locks, and transactional
locks. This mechanism ensures that all the instances in the cluster know the locking status of
each other; i.e., if node 1 wants to lock a table, it needs to know what type of lock is
present on the other nodes. The background processes involved are LCK0, LMD, and LMON.
Global Cache Service (GCS) handles block management. It maintains and tracks the
location and status of blocks, and is responsible for block transfer across instances. LMS is
its primary background process.
1. Oracle performs roll-forward recovery against the blocks, applying all
transactions recorded in the redo log.
2. Once the redo transactions are applied, all undo records are applied, which
eliminates non-committed transactions.
3. Database is now fully available to surviving nodes.
NOTE – For HAIP to fail over to another interconnect, there has to be another physical
interconnect.
It indicates an issue with the interconnect. If a requested block is not received by the instance
within 0.5 seconds, the block is considered to be lost.
57. Suppose you have only one voting disk. i.e the diskgroup on
which voting disk resides is in external redundancy. And if that
voting disk is corrupted. What will be your action plan?
– stop crs on all the nodes (crsctl stop crs -f)
– start crs in exclusive mode on one of the nodes (crsctl start crs -excl)
– start the asm instance on that node using a pfile, as the asm spfile is inside the asm diskgroup
– create a new diskgroup NEW_VOTEDG
– move the voting disk to the NEW_VOTEDG diskgroup (crsctl replace votedisk
+NEW_VOTEDG) – it will automatically recover the voting disk from the latest OCR
backup
– stop crs on node1
– restart crs on all the nodes
– start the cluster on all the nodes
63. If olr file is missing ,How can you restore olr file from backup
# crsctl stop crs -f
# touch $GRID_HOME/cdata/<node>.olr
# chown root:oinstall $GRID_HOME/cdata/<node>.olr
# ocrconfig -local -restore $GRID_HOME/cdata/<node>/backup_<date>_<num>.olr
# crsctl start crs
64. Someone deleted the OLR file by mistake and currently no
backups are available. What will be the impact and how can you
fix it?
If the OLR is missing and the cluster is already running, the cluster will keep running fine.
But if you try to restart it, it will fail.
So you need to do the below activities.
On the failed node:
# $GRID_HOME/crs/install/rootcrs.pl -deconfig -force
# $GRID_HOME/root.sh
In this case, you can run the kfod command to find the missing patch.
Action plan:
Run kfod op=PATCHES on all the nodes and see whether any patch is missing on any node.
Let's say you found that patch 45372828 is missing on node 2. Then on node 2, as the root
user, run the below command:
root# $GRID_HOME/bin/patchgen commit 45372828
After that you can run the below commands to verify whether the patch level is the same:
kfod op=PATCHLVL
kfod op=PATCHES
After the confirmation, you can run rootcrs.sh -patch
73. In a RAC system, what will happen if I kill the PMON process?
The instance will crash, and Oracle Clusterware will restart it automatically.
80. OCR file has been corrupted, there is no valid backup of OCR.
What will be the action plan?
In this case, we need to deconfigure and reconfigure the cluster.
Deconfig can be done using the rootcrs.sh -deconfig option,
and reconfig can be done using the gridsetup.sh script.
81. Can i have 7 voting disks in a 3 node RAC? Let’s say in your
grid setup currently only 3 voting disks are present. How can we
make it 7?
82. I have a 3 node RAC. where node 1 is master node. If node 1 is
crashed. Then out of node2 and node3 , which node will become
master?
83. Is dynamic remastering good or bad?
84. What will happen if I kill the crs process in oracle rac node?
85. What will happen if I kill the ohasd process in oracle rac node?
86. What will happen if I kill the database archiver process in
oracle rac node?
It will be restarted.
90. CSSD is not coming up? What will you check, and where will you
check it?
1. Voting disk is not accessible.
2. Issue with the private interconnect.
3. The AUTO_START parameter is set to NEVER for the ora.cssd resource. (To fix the issue,
change it to always using crsctl modify resource.)
92. crsctl stat res -t -init command, what output it will give?
93. What are the different types of heart beats in Oracle RAC?
There are two types of heart beat.
Network heartbeat is across the interconnect: every second, a sending thread of
CSSD sends a network TCP heartbeat to itself and all other nodes, and a receiving thread
of CSSD receives the heartbeats. If network packets are dropped or corrupted, the error
correction mechanism of TCP retransmits the packet; Oracle does not retransmit in this
case. In the CSSD log, you will see a WARNING message about a missing heartbeat if a node
does not receive a heartbeat from another node for 15 seconds (50% of misscount).
Another warning is reported in the CSSD log if the same node has been missing for 22
seconds (75% of misscount), and similarly at 90% of misscount; when the heartbeat is
missing for 100% of the misscount (i.e. 30 seconds by default), the node is evicted.
Disk heartbeat is between the cluster nodes and the voting disk. The CSSD process on each
RAC node maintains a heartbeat in a block of size 1 OS block, at a specific offset in the
voting disk, using read/write system calls (pread/pwrite). In addition to maintaining its
own disk block, each CSSD process also monitors the disk blocks maintained by the CSSD
processes running on the other cluster nodes. The written block has a header area with the
node name and a counter which is incremented with every beat (pwrite) from the
other nodes. If a node has not written a disk heartbeat within the I/O timeout, the node is
declared dead. Nodes that are in an unknown state, i.e. cannot definitively be said to be dead,
and are not in the group of nodes designated to survive, are evicted: the node's kill block
is updated to indicate that it has been evicted.
Reference – https://databaseinternalmechanism.com/oracle-rac/network-disk-heartbeats/
[grid@Linux-01 ~]$ crsctl get css misscount
CRS-4678: Successful get misscount 30 for Cluster Synchronization Services.
[grid@Linux-01 ~]$ crsctl get css disktimeout
CRS-4678: Successful get disktimeout 200 for Cluster Synchronization Services.
Overview:
Below are the questions asked of a candidate with 10 years of experience in Oracle DBA
and Postgres. There were 2 rounds.
Round 1 :
What are your day to day activities. And what is your role in the Team?
Explain cluster startup sequence.
Suppose users are connected to node1, and the OS team wants to do emergency
maintenance and needs to reboot. What will happen to the transactions on node1,
and can we move them to node 2?
What are the advantages of partitioning?
Can we convert non-partitioned table to partitioned table online( in
production)?
What are the cloning methods. And what is the difference between duplicate
method and restore method?
How can we open the standby database as read-write for testing and, once
testing is done, revert to the previous position – but without using the
snapshot standby command?
Explain how you did grid upgrade.
Which advanced security features you have used ?
What is audit purging?
Let's say a query was running fine for the last few days, but today it is not
performing well. How will you troubleshoot it?
What is oracle in memory ?
Explain what type of patches you applied on grid.
As per your resume, you have worked on Ansible. So how good are you with
Ansible, and what type of scripts have you written?
How can you troubleshoot a shell script?
2nd round:
Explain the ASM architecture: whenever a user runs a query, how does it get
the data from ASM?
Why is ASM performance better?
Why is a VIP used? Before 10g, VIPs were not used. What benefits does a VIP
provide?
What is Current block and CR block in RAC
What wait events are there in RAC??
What is gc buffer busy wait
What is gc 2 way current wait, gc 3 way current wait.
What is node eviction. In which cases it happens.
How you do load balancing using services in your project. Explain
What activities you did using goldengate
How you did upgrade/migration using goldengate.
What is goldengate instantiation.
What is bounded recovery in goldengate
What is the difference between CSN in goldengate vs SCN in database?
Which type of os tool/commands you have used for os
monitoring/performance. And how you read it.
How do you troubleshoot I/O issues in Oracle?
What type of SQL performance tools do you use?
How do you migrate a plan from one DB to another DB?
What is the DBMS package for baselines and SQL tuning?
Let's say, despite importing the baseline, the query is not picking the plan. What
can be the reason?
What is index fast full scan and index range scan?
What is the purpose of standby redolog.
Any automation tools you have used?
Did you work on Jenkins or GitLab?
What activities you did in postgres.
How you implement backup in postgres.
Is your postgres cluster active active or active passive?
Can we create a active active postgres cluster?
Do you remember any postgres extensions?
What type of scripts you have written on ansible?
Tell me some modules of ansible.
How can we encrypt the server credentials in ansible.
First round:
Oracle DBA questions:
What happens during hot backup?
Why hot backup generates lot of redo?
Difference between smallfile tablespace and bigfile tablespace. What are the
pros and cons of both types of tablespace?
Configure controlfile autobackup on – what is the use and is it mandatory?
Can I drop the controlfile of an ASM instance?
How does ASM interact with the database?
How do I improve the I/O performance between the ASM and database instances?
Difference between expired and obsolete
What is partition pruning?
Can I restore a database from obsolete backup?
Difference between retention 7 days and recovery window 7 days
What is the role of the background process LMON?
What is I/O fencing?
How can you find the master node in Oracle RAC?
Between raw device and asm , whose performance is better
Can I have disks of different sizes in a diskgroup
What is the minimum number of diskgroups in rac and why ?
A user is complaining that he is getting a blocked session in RAC, but when we
check by connecting to that instance, we don't see any blocking session. Even
the sid is not available. What could be the issue?
What does the g stand for in gv$sql?
Different parameters in an ASM instance
What is hugepages and how it is related to RAC?
What is the default pagesize in linux
Where can you find the oratab entry in Sun Solaris?
Can I move the oratab entry to /etc/oratab
What is ACMS?
Are you aware of hangcheck?
How comfortable you are with shell scripting?
Yesterday the RMAN backup ran for 10 min, but today it is taking more than 1
hour and is still running. Note – the database size is the same, and load is normal.
Do you know what is hangcheck timer
What is load balancing advisory?
Postgres question:
What is ctid in postgres?
I want to rebuild all the indexes in postgres. How can I do that?
How can we reset a password in postgres?
How can you check the version in postgres?
Explain datatypes in postgres
What type of querying language postgres support?
What is the latest postgres version. And which postgres version currently
you are using?
Which tools do you use to manage and monitor postgres?
Compare postgres and oracle
While dropping a database in postgres, I am getting an error and am unable to
drop it. What could be the issue?
2nd round:
ROUND 1: ( Technical Round)
Background processes in oracle database
Explain about dbwr and lgwr
What is the shared pool? What does it do?
What is the role of PGA
What it contains?
What is the persistent area in PGA
Explain how update statement works.
Let's say one user is updating and another one is selecting. How will it get the
data?
What is touchcount and its related question?
What redo buffer contains.
How recovery happens?
What type of issues you face in oracle database
How can we control the DB writer processes? What should be the value of the db
writer processes parameter?
Different status of buffer in buffer cache.
Different status of redo logfile
Can I drop an online redo log from Oracle?
What core dba issues you face
How transaction recovery happens
How update statement work flow happens
RAC STARTUP SEQUENCE.
DIFFERENT BACKGROUND PROCESSES in RAC.
Role of ocr, vd.
How many voting disks for an 8-node RAC?
Which file is read first while starting the cluster, and what happens next?
Why OLR is required?
What is gpnp profile
Role of LMS, LMON, LMD, LCK,
What is dynamic remastering
What happens during instance reconfiguration
Which process is responsible for instance reconfig
What is GCS,GES and GRD and which processes are responsible for this.
What is past image
Instance recovery in RAC
Which process does node eviction? Which node gets evicted?
Different protection modes in dataguard
Which process gets the data from primary to standby
Can we convert protection mode
AFFIRM and NOAFFIRM
What type of issues come in standby dataguard
What are some parameters in dataguard
What is fal_server
What is log_archive_dest_1 and log_archiv_dest_2
What is db_name and db_unique_name
New features of oracle 19c dataguard
What is far sync
What is db_flashback_retention_target
Different between dataguard and active dataguard
Difference between force logging and supplemental logging
Difference between classic and integrated
What is the parallel integrated apply
What is the coordinated integrated apply
What is handle collision
What is the common issues in goldengate and how you handle it
What is discard file
Where LCRs are stored
Let's say the processes limit has been reached and you are unable to log in as
sysdba. What will you do?
What is huge pages and why we need to enable hugepages.
If database is running slow, what are the things we need to check.
Why is DBWR called a lazy writer?
What is the voting disk timeout value?
ROUND 2:( Technical Round)
How does select statement processing happen?
How does insert statement processing happen?
Explain cache fusion
Explain cluster startup sequence
How asm gets started
Explain flex asm
How to enable flex asm
What is asm proxy?
How can you recover from undo tablespace corruption?
While installing grid, what happens when you run the root.sh script?
In flex asm , how database connects with asm
If you lost your OLR , How will you troubleshoot?
If your cluster node gets rebooted, which logs do you usually observe?
Explain how you apply patch manually in RAC
Let's say you applied a patch on node 2 and ran rootcrs.sh -post, and now
the command is not completing and errors out with a patch mismatch. When you
check the Oracle inventory, you find that the patches are the same. How will
you troubleshoot it?
Explain steps for node addition
Explain cache fusion. Which processes are responsible for cache fusion?
Explain write write scenarios in cache fusion
What is dynamic resource remastering? Is it good or bad?
If I remove the entry for spfile in gpnp profile, then will the crs start?
How you check the private interconnect issues.
How to do db upgrade.
What happens during db upgrade
How many phases are there in db upgrade
If I don’t do the timezone upgrade, will the database work.
What is far sync?
Which process send the redologs to standby
Which process received the data in standby
If block corruption happens on the standby, how will you recover it?
What is multi instance redo apply . And how I can enable multi instance
redo apply in dataguard.
What is sql quarantine .
ROUND 3( Managerial Round):
Mostly questions about
The candidate education history and work experience history.
Some high level question on database migration.
High level question on RAC node eviction.
Asked some question about some real time RAC issues that you faced in the
past.
Asked about the certifications.
Asked whether any knowledge on oracle cloud or any other technologies
Asked whether any knowledge on exadata.
Asked how good you are at weblogic troubleshooting
Some discussion happened on salary and work related discussion
RMAN Interview QA
5. Can we take RMAN backup when the database is down?
For RMAN backup , database has to be in mount or open state. For cold backup only the
database has to be down completely.
6. What happens when we put the database in hot backup mode, i.e.
alter database begin backup? Why does it generate a lot of redo?
DBWn checkpoints the tablespace (writes out all dirty blocks as of a given
SCN)
CKPT stops updating the Checkpoint SCN field in the datafile headers and
begins updating the Hot Backup Checkpoint SCN field instead
LGWR begins logging full images of changed blocks the first time a block is
changed after being written by DBWn
Why a lot of redo?
Full block image logging during backup eliminates the possibility that the backup will
contain unresolvable split blocks. To understand this reasoning, you must first
understand what a split block is. Typically, Oracle database blocks are a multiple of O/S
blocks. For example, most Unix filesystems have a default block size of 512 bytes, while
Oracle’s default block size is 8k. This means that the filesystem stores data in 512 byte
chunks, while Oracle performs reads and writes in 8k chunks or multiples thereof. While
backing up a datafile, your backup script makes a copy of the datafile from the
filesystem, using O/S utilities such as copy, dd, cpio, or OCOPY. As it is making this copy,
your process is reading in O/S-block-sized increments. If DBWn happens to be writing a
DB block into the datafile at the same moment that your script is reading that block’s
constituent O/S blocks, your copy of the DB block could contain some O/S blocks from
before the database performed the write, and some from after. This would be a split
block. By logging the full block image of the changed block to the redologs, Oracle
guarantees that in the event of a recovery, any split blocks that might be in the backup
copy of the datafile will be resolved by overlaying them with the full legitimate image of
the block from the archivelogs. Upon completion of a recovery, any blocks that got
copied in a split state into the backup will have been resolved by overlaying them with
the block images from the archivelogs. All of these mechanisms exist for the benefit of
the backup copy of the files and any future recovery. They have very little effect on the
current datafiles and the database being backed up. Throughout the backup, server
processes read datafiles and DBWn writes them, just as when a backup is not taking place.
The only difference in the open database files is the frozen Checkpoint SCN and the
active Hot Backup Checkpoint SCN.
7. What is a snapshot control file?
The snapshot controlfile is a copy of the controlfile that RMAN creates before the actual
backup starts, in order to have a read-consistent view of the controlfile during the backup
and during a resync catalog.
This snapshot controlfile ensures that the backup is consistent to a point in time. So if you
add a tablespace after the backup has started, the new file will not be backed up.
No retention policy:
Means the backups will never become obsolete.
CONFIGURE RETENTION POLICY TO NONE;
19. How can I take an RMAN backup to multiple directories in the local file
system?
Yes, we can do that, as sketched below.
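One sketch: allocate multiple disk channels, each with its own FORMAT destination (the paths are
placeholders):
RMAN> run {
  allocate channel c1 device type disk format '/u01/rman/%U';
  allocate channel c2 device type disk format '/u02/rman/%U';
  backup database;
}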
22. Let's say there is a requirement that your daily RMAN backup
must complete within 40 minutes, and if it exceeds that, the backup should
be stopped automatically. Can we do that?
Yes, we can use the DURATION clause of the BACKUP command:
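A sketch (DURATION takes hours:minutes; PARTIAL prevents RMAN from raising an error for the portion
left unfinished when the window expires):
RMAN> backup duration 0:40 partial database;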
27. Can we restore one table or table partition from an RMAN backup?
Explain how it works internally.
Yes, from 12c onwards we can do it, as sketched below.
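A sketch of the 12c RECOVER TABLE syntax (schema, table, SCN, and auxiliary path are placeholders).
Internally, RMAN creates an auxiliary instance under the auxiliary destination, restores the needed
tablespaces there, and then uses Data Pump to export the table and import it back into the target:
RMAN> recover table scott.emp
  until scn 1234567
  auxiliary destination '/u01/aux';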
28.When we duplicate the database using rman active duplication
method, does the dbid gets changed for the new database?
29. Can we change the dbid to a value of our own choice?
Yes we can do it.
38. For a schema, its tables are under tablespace DATA and its
indexes are under tablespace IDX. Now if I run an RMAN duplicate
skipping the IDX tablespace, what will happen? Will it work?
We cannot skip the IDX tablespace, as it is not a self-contained tablespace; there is a
dependency between objects in the DATA and IDX tablespaces. So we either need to skip
both or duplicate both.
45. Explain how RMAN works internally when you are running an
RMAN database backup.
Let's say you connected with RMAN to the target DB and ran backup database;
rman target /
RMAN> backup database;
1. RMAN makes the bequeath connection with the target database.
2. It then connects as the SYS user and spawns multiple channels as specified
in the script.
3. Then RMAN makes a call to sys.dbms_rcvman to request database schema
information from the controlfile (like datafile info and SCNs).
4. After getting the datafile list, it prepares for the backup. To guarantee consistency,
it either builds or refreshes the snapshot control file.
5. Now RMAN makes a call to the sys.dbms_backup_restore package to create
backup pieces.
6. If controlfile autobackup is set to ON, it also takes a backup of the spfile and
controlfile into a backupset.
7. During the backup, datafile blocks are read into a set of input buffers, where they
are validated/compressed/encrypted and copied to a set of output buffers.
The output buffers are then written to backup pieces on either disk or tape
(DEVICE TYPE DISK or SBT).
46. An RMAN full database backup started at 5:00 AM; while it was still running
at 5:30 AM, a new datafile was added. The backup completed
at 6:00 AM. Will the new datafile be part of this RMAN
backup set?
No, it will not be included in the RMAN backup.
49. Let's say I have taken a full backup at 5 AM, and after 2 hours (7
AM) I added 2 datafiles. Now at 9 AM the DB
crashed/got corrupted, so I need to restore/recover the database. Can
I use the full backup to restore/recover the database, including
those datafiles which were added later?
Yes, we will restore the full backup, then recover using the archivelogs to the current
point in time.
For corrupted blocks: query V$DATABASE_BLOCK_CORRUPTION to identify them, then connect with RMAN
and run the block recover command, as sketched below.
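A sketch of the block recovery, driven by the corruption list recorded in that view:
RMAN> recover corruption list;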
51. What is the relation between large pool size and rman backup?
52. How can we take an RMAN cold backup?
Even if the database is in noarchivelog mode, you can start the database in mount
stage and take the backup.
53. Suppose someone dropped a table mistakenly in production
database. What will be your immediate action plan to recover that
table?
54. What is backup optimization in RMAN?
WalWriter -> WAL buffers are written out to disk at every transaction commit. The
default value (of wal_buffers) is -1.
Checkpointer -> performs checkpoints, writing all dirty data buffers out to disk.
Stats Collector -> collects and reports information about database activities. The
permanent statistics are stored in the pg_catalog schema in the global subdirectory.
Archiver -> setting up the database in archive mode means capturing the WAL data of
each segment file once it is filled, and saving that data somewhere before the segment file
is recycled for reuse.
4. What are the memory components in postgres?
Shared memory:
shared_buffers ->
- Sets the number of shared memory buffers used by the database server
- Each buffer is 8K bytes
- Minimum value must be 16 and at least 2 x max_connections
- Default setting is managed by dynatune
- 6% – 25% of available memory is a good general guideline
- You may find better results keeping the setting relatively low and using the
operating system cache more instead
wal_buffers ->
- Number of disk-page buffers allocated in shared memory for WAL data
- Each buffer is 8K bytes
- Needs to be only large enough to hold the amount of WAL data created by a
typical transaction, since the WAL data is flushed out to disk upon every
transaction commit
- Minimum allowed value is 4
- Default setting is -1 (auto-tuned)
clog buffers
Private memory used by each server process:
temp_buffers -> for temp table operations.
work_mem ->
- Amount of memory in KB to be used by internal sorts and hash tables before
switching to temporary disk files
- Minimum allowed value is 64 KB
- It is set in KB and the default is managed by dynatune
- Increasing work_mem often helps in faster sorting
- work_mem settings can also be changed on a per-session basis
maintenance_work_mem ->
- Maximum memory in KB to be used in maintenance operations such as
VACUUM, CREATE INDEX, and ALTER TABLE ADD FOREIGN KEY
- Minimum allowed value is 1024 KB
autovacuum_work_mem ->
- Maximum amount of memory to be used by each autovacuum worker process
wal_keep_size =
wal_level =
A checkpoint occurs in the following situations:
1. pg_start_backup
2. CREATE DATABASE
3. pg_ctl stop|restart
4. pg_stop_backup
5. When we issue the CHECKPOINT command manually
For periodic checkpoints, the below parameters play an important role:
checkpoint_timeout = 5min
max_wal_size = 1GB
A checkpoint is begun every checkpoint_timeout seconds, or if max_wal_size is about to
be exceeded, whichever comes first. The default settings are 5 minutes and 1 GB,
respectively.
With these (default) values, PostgreSQL will trigger a CHECKPOINT every 5 minutes, or after
the WAL grows to about 1 GB on disk.
9. Let's say your work_mem is 4MB. However, your query is doing
heavy sorting, for which the work_mem required might be more than
4MB. So will the query execution be successful?
Yes, it will be successful. Once it has fully utilized work_mem, the query starts using temp
files on disk.
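One way to observe this is to run EXPLAIN ANALYZE on a heavy sort and check the Sort Method line of
the plan (big_table and col1 are placeholder names):
explain analyze select * from big_table order by col1;
A line like "Sort Method: external merge Disk: ..." indicates temp files were used, whereas
"Sort Method: quicksort Memory: ..." means the sort fit within work_mem.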
Trust: for this option users can connect to the database without specifying a
password. When using this option one should be cautious.
Reject: This option rejects a connection to a database(s) for a user for a
particular record in the file.
Password: this option prompts the user for a password before connecting to
the database. When this method is specified the password is not encrypted
between the client and the database.
Md5: this option prompts the user for a password before connecting to the
database. When this method is specified, the client is required to supply a
double-MD5-hashed password for authentication.
Ident – Obtain the operating system user name of the client by contacting
the ident server on the client and check if it matches the requested database
user name. Ident authentication can only be used on TCP/IP connections.
When specified for local connections, peer authentication will be used
instead.
If you make any changes to the pg_hba.conf file, you need to reload/restart the
cluster for the changes to take effect.
The first bit, if set, indicates that the page is all-visible (meaning the page does not need
to be vacuumed).
The second bit, if set, means all tuples on this page have been frozen (no need to
vacuum).
Note -> Visibility map bits are set by the VACUUM operation. If data is modified, the bits
are cleared.
When a tuple is inserted, postgres uses the FSM (Free Space Map) of the respective table to
select the page where it can be inserted.
The VACUUM process also updates the Free Space Map.
To deal with this problem, PostgreSQL introduced a concept called frozen txid, and
implemented a process called FREEZE.
In PostgreSQL, a frozen txid, which is a special reserved txid 2, is defined such that it is
always older than all other txids. In other words, the frozen txid is always inactive and
visible.
In version 9.4 or later, the XMIN_FROZEN bit is set to the t_infomask field of tuples rather
than rewriting the t_xmin of tuples to the frozen txid
The disadvantage is that it causes bloating. It also needs more storage to keep multiple
versions of the data. If your database does a lot of DML activity, postgres needs to keep all
the old transaction records as well (updated or deleted), and a maintenance activity like
vacuum needs to be run to remove the dead tuples.
The physical location of the row version within its table. Note that although the ctid can
be used to locate the row version very quickly, a row’s ctid will change if it is updated or
moved by VACUUM FULL.
27. What is oid?
Every row in postgres will have an object identifier called an oid.
28. Difference between oid and relfilenode?
29. Difference between postgresql.conf and postgresql.auto.conf files?
postgresql.conf is the configuration file of the postgres cluster. But when we make any
config changes using the ALTER SYSTEM command, those parameters are written to the
postgresql.auto.conf file.
When postgres starts, it first reads postgresql.conf and then reads
postgresql.auto.conf.
When this parameter is on, the PostgreSQL server writes the entire content of each disk
page to WAL during the first modification of that page after a checkpoint. This is needed
because dirty write that is in process during an operating system crash might be only
partially completed, leading to an on-disk page that contains a mix of old and new data.
The row-level change data normally stored in WAL will not be enough to completely
restore such a page during post-crash recovery. Storing the full page image guarantees
that the page can be correctly restored.
32. What is xmin and xmax in postgres?
xmin and xmax are in the row header.
When a row is inserted, the value of xmin is set equal to the id of the transaction that
performed the INSERT command, while xmax is null.
When a row is deleted, the xmax of the current version is set to the id of the
transaction that performed the DELETE.
The new row version will have an xmin equal to the xmax of the previous version.
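These header fields can be inspected directly; a quick check (the table name t is a placeholder):
select xmin, xmax, ctid, * from t;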
1. Bulk reads
2. Bulk writes (like CTAS, COPY FROM, ALTER TABLE)
3. During the autovacuum process
Once these processes are completed, the ring buffer is released.
Note – Don't confuse this with an Oracle RAC cluster, where multiple instances run from
multiple nodes for one database.
ADMINISTRATION:
1. What is the default port in postgres?
Default is 5432.
It is also important to note that the name of the tablespace must not start with pg_, as
these are reserved for the system tablespaces.
postgres=# \db+
4. What are the tablespaces created by default after installing
postgres cluster?
pg_global -> PGDATA/global -> used for cluster-wide tables and the system catalog
pg_default -> PGDATA/base directory -> stores databases and relations
Whenever any user creates a table or index, it is created under pg_default, and
this is the default_tablespace setting.
However you can change the default_tablespace setting using below one.
template0 -> No changes are allowed to this database. If you mess up your
template1 database, you can restore it by creating a new one from template0.
For template1, datallowconn is true, but for template0 it is false.
8. What are common database object names in postgres compared
to industry terms?
smart -> This is the normal mode; shutdown waits for all existing connections to be
terminated, which might take a lot of time. No new connections are allowed during this.
fast -> It terminates all existing sessions and performs a checkpoint. It is comparatively
quick, but if there are a lot of transactions that need to be written to disk, the checkpoint
might take a lot of time.
immediate -> It aborts the instance, i.e. it terminates all existing sessions and shuts
down the instance without performing a checkpoint. It is the quickest, but upon
instance startup, recovery will happen.
So if you have a lot of pending transactions in your database, the best way is to perform a
checkpoint manually and then stop the server.
Because these extensions use shared memory, we need to restart the postgres cluster.
The reason we preload these libraries is to avoid the library startup cost when
the library is first used.
16. What different types of streaming replication are present in postgres?
Explain analyze -> It will execute the query and provide statistics of the executed
query. This gives more accurate plan details. Please be careful while running
DML commands like INSERT, UPDATE, and DELETE with explain analyze, as it will actually
run the query and cause data changes.
char, varchar,
int,float(n)
uuid
date
27. What are some key difference between oracle and postgres?
28. Between postgres and nosql database like mongodb , which
one is better?
29. What is table partitioning in postgres? What are the
advantages?
30. How you monitor long running queries in postgres?
We can use pg_stat_activity to track them.
pg_track_settings
pg_profile
33. What are the popular tools for managing backup and recovery
in postgres?
edb bart , barman etc
MAINTENANCE:
1. What is vacuuming in postgres?
Whenever tuples are deleted or become obsolete due to an update, they are not
removed physically. These tuples are known as dead tuples. Vacuuming clears
those dead tuples and releases the space.
autovacuum_vacuum_scale_factor = 0.2
autovacuum_analyze_scale_factor = 0.1
autovacuum_vacuum_threshold (integer) = 50
autovacuum_analyze_threshold (integer) = 50
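These settings combine into the documented trigger conditions:
vacuum threshold = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * number of tuples
analyze threshold = autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * number of tuples
With the defaults above, a table is autovacuumed once its dead tuples exceed 50 + 20% of the table's rows.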
5. How can we increase the performance of auto vacuuming?
The autovacuum_max_workers parameter helps; it allows that many autovacuum worker
processes to run in parallel.
In Oracle, we call this the clustering factor. A low clustering factor means the table data
is stored in an ordered manner.
If we are accessing only a single row, the clustering factor doesn't impact
performance. However, if you are accessing multiple rows using an index, a good
clustering factor will improve the performance.
NOTE -> CLUSTER puts an exclusive lock on the table during this activity.
1. Index is bloated
REPLICATION:
1. Explain the basic streaming replication architecture.
1. When we start the standby server, the wal receiver process gets started on the
standby.
2. The wal receiver sends a connection request to the primary.
3. When the primary receives the wal receiver's connection request, it starts a walsender
process, and a connection is established between the wal sender and the wal receiver.
4. Now the wal receiver sends the standby's latest LSN to the primary.
5. If the standby's LSN < the primary's LSN, the wal sender sends the WAL data
required to keep the standby in sync.
6. The received wal data is replayed on the standby.
2. How can you check whether the database is primary or
standby(replication).
method 1:
We can check using the below query. If the output is t (true), it is a standby (slave
server). If false, it is the primary (master).
select pg_is_in_recovery();
method 2:
You can run below query.
1. off -> commit does not wait for the transaction record to be flushed to
disk.
2. local -> commit waits until the transaction record is flushed to the local disk.
3. on -> commit waits until the standby servers mentioned in
synchronous_standby_names confirm that the data has been flushed to the standby disk.
4. remote_write -> commit waits until the standby servers mentioned in
synchronous_standby_names confirm that the data has been written to the OS, but not
necessarily to disk.
5. remote_apply -> commit waits until the standby servers mentioned in
synchronous_standby_names have applied the changes to the database.
11. Suppose A is primary and B is standby. Now a failover
happened, and A became standby and B became primary.
Now I want to make A the primary again and B the standby. How
can I achieve that?
12. Explain the architecture of EFM?
You need one master, one slave and one witness server.
So to avoid this issue, we can create a replication slot. A replication slot will
ensure that WAL data which is not yet applied on the standby will not be deleted.
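A minimal sketch of creating a physical slot on the primary (the slot name is hypothetical; the standby then sets primary_slot_name to it):

select * from pg_create_physical_replication_slot('standby1_slot');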
pg_dump -> used for taking a backup of databases, schemas or tables. The backup
format can be plain SQL, tar file or directory.
pg_dumpall -> used for taking a backup of the complete cluster. The backup format is plain SQL.
2. Explain how you do a major upgrade in postgres?
STEPS:
1. Install the new pg binary.
2. Initialize a new cluster with the new binary.
3. Shut down both the old and new pg clusters.
4. Run pg_upgrade with the -c option to verify.
5. Run the actual pg_upgrade (either with or without the link option).
3. What is the use of the link option in pg_upgrade?
While doing the upgrade, if we use the link option, then data will not be copied
from the old directory to the new directory. Only hard links will be created in
the new data directory.
For using the link option, both the old and new data directories need to be in the
same file system.
Cons: if we use the link option, then we won't be able to use the old cluster once
the new cluster has been started.
PERFORMANCE RELATED:
Apart from this, we can add the NOLOGGING parameter in the create index statement.
Create index generates a lot of redo, and these logs are of no use. So we can
create the index with NOLOGGING as below.
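A minimal sketch (emp and emp_name are hypothetical names):

CREATE INDEX emp_name_idx ON emp(emp_name) NOLOGGING;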
A low CLUSTERING_FACTOR means the data is stored in an ordered manner in the
table, i.e. we can say it is a good clustering factor. The minimum clustering
factor is equal to the number of blocks of the table.
A high CLUSTERING_FACTOR means the data is randomly distributed, i.e. a bad
clustering factor. The maximum clustering factor is equal to the number of rows of the table.
Please note – rebuilding the index will not improve the clustering factor. You
need to recreate the table (in the order of the index key) to fix this.
For calculating the CF, Oracle fully scans the index leaf blocks. These leaf
blocks contain rowids, and from a rowid it can find the table block details. So
while scanning the index, whenever there is a change in the table block id, it
increments the CF by one.
GOOD clustering factor:
In the below diagram, for the first 4 rows the block is 15, so the CF is 1 at that
point. But the fifth row is in a different block, so the CF is incremented by 1.
Similarly, with every block change, the CF is incremented by one.
BAD clustering factor:
In the below diagram, the block id changes very frequently across consecutive
rows, so the CF is very high, which is bad.
When we create an index, the rowids and the column values are stored in the leaf
blocks of the index. A rowid contains:
1. object_id
2. block number
3. position of the row in the block
4. datafile number in which the row resides (i.e. relative file number)
13. What data is stored in the index leaf block?
Index leaf block stores the indexed column values and its corresponding rowid( which is
used to locate the actual row).
14. How oracle uses index to retrieve data?
When a query hits the database, the optimizer creates an execution plan involving
the index. The index is then used to retrieve the rowid, and using the rowid, the
row is located in the datafile and block.
15. Explain the scenario where the data will be retrieved using
only the index, without accessing the table at all?
In the case of a covering index, i.e. when we are retrieving only the indexed column data, the table is not accessed at all.
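A minimal sketch (emp, dept_id and emp_name are hypothetical names):

CREATE INDEX emp_dept_name_idx ON emp(dept_id, emp_name);
-- this can be answered from the index alone, without touching the table:
SELECT emp_name FROM emp WHERE dept_id = 10;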
18. Why is my query doing a full table scan, despite having an index on
the predicate column?
The optimizer might not use the index in the below scenarios.
But when we make the index unusable, the Oracle optimizer will not use it, and the
index will no longer be maintained by Oracle. So if we want to use the index
again, we need to rebuild it. This is usually used in large environments as a
prelude to dropping an index: in a large database, dropping an index may take a
lot of time, so it can first be made unusable and then dropped during low business hours.
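A minimal sketch (emp_name_idx is the hypothetical index from above):

ALTER INDEX emp_name_idx UNUSABLE;
-- later, to use it again:
ALTER INDEX emp_name_idx REBUILD;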
20. Why does moving a table make the index unusable? Do you know
any other scenarios which will make the index unusable?
When we move the table, the rows are moved to a different location and get new
rowids. But the index still points to the old rowids. So we need to rebuild the
index, which makes the index entries use the new set of rowids for the table rows.
21. Do you know any scenario where making the index unusable
can be helpful?
In DWH environments, when huge data loading operations are performed on tables
having lots of indexes, performance slows down because all the respective indexes
need to be updated. So to avoid this, we can make the indexes unusable and load
the data. Once loading is completed, we can recreate/rebuild the indexes.
22. How does an index work in a query with a null value? Let's say I am
running a query select * from emp where dept_name is null; and an
index is present on dept_name. In that case, will Oracle use the
index?
First we need to find what type of index it is. If the index is a B-tree index,
then null values are not stored in it, so the query will do a TABLE FULL SCAN.
But if the index is a BITMAP index, then the index will be used even for null
values (i.e. it will do an index scan), because a bitmap index stores null values too.
23. Between b-tree index and bitmap index, which index creation is
faster?
Bitmap index creation is faster.
26. What is the difference between heap organized table and index
organized table?
It is also used where the requirement is fast access to primary key column data.
It is not recommended in DWH environments, because they involve bulk data loading,
which makes the physical guesses (used by IOT secondary indexes) go stale very quickly.
35. Which index scan method causes the db file scattered read wait
event (ideally indexes cause db file sequential read)?
Index fast full scan causes the db file scattered read wait event, because this
method uses multiblock read operations to read the index. This type of operation can also run in parallel.
36. If I use a not-equal condition in the where clause, will it pick the
index? e.g. select * from table_tst where object_id <> 100?
If this query returns a high number of rows, then the optimizer will do a FULL
TABLE SCAN, i.e. the index will not be used.
And if you try to force the query to use the index with an index hint, it will do
an INDEX FULL SCAN, but not an index range scan.
If the query is like select * from emp where emp_name like 'SCO%'; -> then it will
use the index.
And
if the query is like select * from emp where emp_name like '%SCO'; -> this will
not use the index, because of the leading wildcard.
38. What is a functional Index?
If the query contains a function on an indexed column in the WHERE clause, then a
normal index will not be used.
i.e. for a query like select * from emp where upper(emp_name)='VIKRAM'; the
optimizer will not use a normal index on emp_name.
So in this case, we can create a function-based index as below, which will be used
by the optimizer.
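A minimal sketch matching the query above:

CREATE INDEX emp_upper_name_idx ON emp(upper(emp_name));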
optimizer_index_cost_adj
Because an index rebuild needs roughly twice the space of the index to complete
the activity. It also generates a lot of redo during the activity, and there might
be an impact on CPU usage.
The general belief is that an index rebuild balances the index tree, improves the
clustering factor and reuses deleted leaf blocks. But this is a myth: a B-tree
index is always balanced, and rebuilding an index doesn't improve the clustering factor.
1. 50-50 splitting -> to accommodate the new key, a new block is
created; 50 percent of the entries stay in the existing leaf block and 50
percent go to the new block.
2. 90-10 splitting -> if the index entry is the right-most value (when the
index key is sequentially increasing, like a transaction_date), then the leaf
block is split and the new index key is added to the new leaf block.
HISTOGRAM and STATISTICS
1. What are statistics in oracle? What type of data do they store?
2. Difference between statistics and histogram?
3. What is a histogram ? What are different types of histogram?
A histogram is a type of column statistic that provides more information
about the data distribution in a table column.
Pre-12c, there were only 2 types of histogram:
Height balanced
Frequency
From 12c, two more types of histograms were introduced, apart from the above two:
Top-N frequency
Hybrid
But if the data is non-uniform, i.e. skewed, then the cardinality estimate will be
wrong. This is where histograms come into the picture: they help in calculating
the correct cardinality of the filter or predicate columns.
COUNT(*)
----------
17933
Execution Plan
----------------------------------------------------------
Plan hash value: 3640793332
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 9 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 9 | | |
|* 2 | INDEX RANGE SCAN| TEST_IDX2 | 3843 | 34587 | 1 (0)| 00:00:01 | ---
optimizer estimates incorrect number of rows.
-------------------------------------------------------------------------------
2 - access("OBJECT_TYPE"='TABLE')
SQL>
SQL> BEGIN
DBMS_STATS.GATHER_TABLE_STATS (
ownname => 'SYS',
tabname => 'HIST_TEST',
cascade => true, ---- For collecting stats for respective indexes
granularity => 'ALL',
estimate_percent =>dbms_stats.auto_sample_size,
degree => 8);
END;
/
COUNT(*)
----------
17933
Execution Plan
----------------------------------------------------------
Plan hash value: 3640793332
-------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------
| 0 | SELECT STATEMENT | | 1 | 9 | 1 (0)| 00:00:01 |
| 1 | SORT AGGREGATE | | 1 | 9 | | |
|* 2 | INDEX RANGE SCAN| TEST_IDX2 | 17933 | 157K| 1 (0)| 00:00:01 | ---
Optimizer using the exact number of rows.
-------------------------------------------------------------------------------
8. When does Oracle create a histogram?
When we run the gather stats command with method_opt set to AUTO, it checks the
sys.col_usage$ table to see whether the columns of the table are used as a join or
predicate (i.e. in a where clause or join). If such columns are present, then
Oracle will create histograms for them (only for the columns having skewed data).
i.e. if a column has not been used in a join or where clause of any query, then
even if we run gather stats, a histogram will not be created.
no rows selected
-- Run status:
COUNT(*)
----------
625700
DBMS_STATS.REPORT_COL_USAGE(OWNNAME=>'SYS',TABNAME=>'HISTO_TAB')
--------------------------------------------------------------------------------
... (REPORT_COL_USAGE output truncated) ...
1. Creation of histogram.
Syntax ->
FOR ALL INDEXED COLUMNS – gathers base statistics for columns included in an index.
FOR ALL HIDDEN COLUMNS – gathers base statistics for virtual columns.
AUTO means -> Oracle will create histograms automatically as per the usage
recorded in the col_usage$ table.
If we say method_opt => 'FOR ALL COLUMNS SIZE 1', it means base statistics will be
collected for all columns, and as the bucket size is 1, no histogram will be created.
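A sketch of explicitly requesting a histogram on one column (the schema, table and column names are hypothetical):

BEGIN
  DBMS_STATS.GATHER_TABLE_STATS (
    ownname    => 'SCOTT',
    tabname    => 'EMP',
    method_opt => 'FOR COLUMNS DEPT_ID SIZE 254');
END;
/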
11. Is there any way we can configure the database so that the
statistics will never go stale (despite having thousands of
transactions)?
From 19c onwards it is possible. 19c introduced real-time statistics, meaning that
for DML activities the statistics are collected in real time (so stale stats will never occur).
We can combine both options to speed things up (but note that this can increase
the load on the system).
GENERIC:
1. What is an ITL?
An ITL (Interested Transaction List) is a structure in the header of each data
block, containing slots for the transactions that are actively modifying rows in that block.
2. Let's say a user wants to update 50 rows in a table (i.e. one block);
how many ITL slots are required?
We need one ITL slot per transaction, not per row. Each slot is for one
transaction id, so if the 50 rows are updated in a single transaction, only one ITL slot is required.
34. As we know, AWR reports only a few top sql queries. But I
have a requirement that a specific sql query should be reported in
the AWR report, whether it is a top sql or not. Can we do that?
Yes, we can do this. We need to get the sql_id of the sql query and mark it as
colored using the dbms_workload_repository package.
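A minimal sketch (the sql_id value is hypothetical):

BEGIN
  DBMS_WORKLOAD_REPOSITORY.ADD_COLORED_SQL(sql_id => '4d0b2kq8xh7sb');
END;
/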
FIRST_ROWS
FIRST_ROWS_n (n = 1, 10, 100, 1000)
34. Is it true that parallel query scans use direct path reads,
bypassing the buffer cache?
Yes, parallel scans are direct path reads; they bypass the buffer cache, so they
don't have to worry about reading blocks from the buffer cache.
But what if there is a dirty buffer in the buffer cache which has not yet been
written to disk? If the direct read happens from disk, the parallel query would
give wrong results. So to fix this, the parallel query first issues a segment
checkpoint, so that DBWR writes all the dirty buffers of that segment to disk.
37. What is bind peeking? What is the problem with bind peeking?
What did Oracle do to fix this issue?
The problem with binds was that the query optimizer doesn't know the literal
values. So especially when we use range predicates like < and > or between, the
optimizer plan might differ from the one chosen when literal values are passed in
the where clause (i.e. full table scan vs use of an index). So the optimizer might
give wrong estimates when using binds.
To overcome this, the bind peeking concept was introduced in Oracle 9i (as per the
dictionary, peeking means to have a quick glance). With this, before the execution
plan is prepared, the optimizer peeks into the bind values and uses them to
prepare the execution plan. Now that the optimizer is aware of the actual values,
it can produce a good execution plan.
But the problem with bind peeking is that the optimizer peeks into the bind values
only on the first execution. Subsequent executions reuse the execution plan of the
first one, even though a better execution plan might exist for their values.
Adaptive cursor sharing was introduced in Oracle 11g. Adaptive cursor sharing
kicks in when the application query uses binds or when cursor_sharing is set to FORCE.
39. What is the significance of cursor_sharing parameter? What is
the default one?
Below are the 3 possible values of the cursor_sharing parameter; EXACT is the
default (SIMILAR is deprecated in recent versions).
EXACT
FORCE
SIMILAR
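A minimal sketch of changing it (session level shown; it can also be set system-wide):

ALTER SESSION SET cursor_sharing = 'FORCE';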
The public database link needs to be created on the primary with the user
sys$umf, pointing to the primary db itself.
1 row selected.
SQL_ID: 11838osd6slfjh88
SPID -> It is the operating system process id. SPID is mostly used by DBAs for
tracing or killing sessions.
HASH JOIN:
Hash joins are used for joining large data sets. The Optimizer uses the smaller of the two
tables or data sources to build a hash table, based on the join key, in memory. It then
scans the larger table and performs the same hashing algorithm on the join column(s). It
then probes the previously built hash table for each value and if they match, it returns a
row.
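A sketch of nudging the optimizer toward a hash join with a hint (emp and dept are hypothetical tables):

SELECT /*+ USE_HASH(e d) */ e.emp_name, d.dept_name
FROM emp e JOIN dept d ON e.dept_id = d.dept_id;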
SORT MERGE JOIN:
Sort merge joins are useful when the join condition between two tables is an
inequality condition such as <, <=, >, or >=. Sort merge joins can perform better
than nested loop joins for large data sets. The join consists of two steps: first
both inputs are sorted on the join key, and then the sorted lists are merged together.
REFERENCE – > https://sqlmaria.com/2021/02/02/explain-the-explain-plan-join-
methods/
48. What is lost write in oracle?
49. Which checks are performed in parsing phase of a query?
SYNTAX CHECK: checks whether the syntax is correct or not.
SEMANTIC CHECK: checks whether the user has permission to access the table and
whether the table and column names are correct.
SHARED POOL CHECK: checks whether it should do a soft parse or a hard parse. To
explain in detail: whenever a sql query comes in, the db calculates the hash value
of that statement and then searches for that hash value in the shared pool
(specifically the shared sql area). If it is already present, then it will try to
reuse the same execution plan; in that case we call it a soft parse. If the hash
value is not found or the existing plan cannot be reused, then hard parsing happens.
VMSTAT (virtual memory statistics): details like cpu run queues, swap, disk
operations per second, paging details and load information.
usage -> vmstat 5 10 (report every 5 seconds, 10 times)
52. Explain about ORA-1555 snapshot too old error. What is the
action plan for this error?
If the application is issuing a lot of commits, then log file sync can be a top
wait event in AWR.
This wait event can be reduced by checking with the application team whether they
can reduce unnecessary commits and do commits in batches.
One solution is to increase the log buffer size and reduce the number of commits.
Also try to avoid putting the database in hot backup mode.
3. How can you fix the log file switch (checkpoint incomplete) wait
event?
To complete a checkpoint, DBWR must write every associated dirty buffer to disk,
and every datafile header and the controlfile must be updated with the latest
checkpoint number.
Let's say LGWR finished writing to log file 2 and is ready to switch to log file 1
and start writing. However, DBWR is still writing the dirty buffers protected by
logfile 1's redo to disk. So LGWR cannot reuse logfile 1 until that checkpoint is
completed (i.e. until DBWR finishes its writing).
To fix this, increase the redo log size or add more redo log groups.
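A minimal sketch of adding a redo log group (the group number, path and size are hypothetical):

ALTER DATABASE ADD LOGFILE GROUP 4 ('/u01/oradata/ORCL/redo04.log') SIZE 2G;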
Solution:
When rows in the table are in random order, the clustering factor will be very
high, i.e. more table blocks need to be visited to fetch the rows pointed to by
the index blocks.
Also, when index blocks are fragmented, more blocks need to be visited, which can
cause this wait event.
This wait happens when a server process got a latch on the hash bucket, but
another session is holding the block in the buffer (either it is writing to that
block, or the buffer is being flushed to disk). Sometimes this is known as 'read
by other session'.
To troubleshoot this issue, first we need to find the responsible object. We can
get the p1 (file_id), p2 (block), p3 (reason) details of the query from
v$session_wait, and by using the p1 and p2 values we can get the segment details from dba_extents.
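A sketch of mapping p1/p2 to a segment (:p1 and :p2 are the values taken from v$session_wait):

SELECT owner, segment_name, segment_type
FROM   dba_extents
WHERE  file_id = :p1
AND    :p2 BETWEEN block_id AND block_id + blocks - 1;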
If it is a data block -> the query might be inefficient, or we may need to try
moving the hot rows to a different block (for example by recreating the object
with a higher PCTFREE).
Buffer header arrays are allocated in the shared pool. These buffer headers store
attribute and status details of the buffers. The buffer headers are chained
together using a doubly linked list and linked to a hash bucket. There are
multiple hash buckets, and these buckets or chains are protected by the cache
buffers chains (CBC) latch.
If a process wants to search or modify a buffer chain, then first it needs to get
the corresponding CBC latch.
So when multiple users want to access the same block, or other blocks on the same
hash bucket, there will be contention for getting the latch. When the contention
is severe (i.e. processes are finding it difficult to get the latch), this
latch: cache buffers chains wait event will occur.
It is not only about accessing the same block: when simultaneous inserts and
updates run on the same block, cloned buffer copies also get attached to the same
hash bucket, which increases the buffer chain length. So a process, after getting
the latch, takes more time to scan through the long chain.
The wait event occurs when logical I/O is high. So to avoid this wait event, we need to
find a way to reduce the logical I/O.
Solution:
If the issue is due to the number of users accessing the same data set, then we
can increase the PCTFREE of the table/index, so that there will be fewer rows per
block and the data will spread across multiple chains.
When the requested block is not available in the buffer cache, the server process
needs to read the block from disk into the buffer cache. For that it needs a free
buffer, so the server process searches through the LRU list to get a free buffer.
To search or access an LRU list, it first needs to get a latch, which is called
the cache buffers lru chain latch.
Also, when DBWR writes buffers to disk, it first scans through the LRU list to get
the dirty buffers which need to be flushed. For this purpose it also needs to get the latch.
Below are some of the activities which are responsible for this wait event.
Small buffer cache -> dirty blocks need to be written to disk very frequently,
which will increase the latch contention.
Lots of full table scans -> many free buffers are needed to read the data from
disk, so the latch needs to be obtained very often.
So when concurrent DMLs happen on a block and there are no free ITL slots,
sessions wait for a free ITL slot, which causes this wait event.
To fix this, we can increase the INITRANS value; if the issue is still present,
then increase PCTFREE.
Steps to change initrans (the table name is omitted in the original; supply your own):
ALTER TABLE <table_name> INITRANS 60;
But the problem is that the new value applies only to new blocks. So to apply the
change to all blocks, we need to move the table and rebuild the respective indexes.
If the issue is still there after updating INITRANS, then update PCTFREE and move
the table and rebuild the indexes.
Because inserts cause index block splits; if transactions are accessing the index
during these splits, there will be waits.
There are a few solutions which can be applied, but before making changes they
need to be tested thoroughly.
1. Create a reverse key index (this helps where only inserts happen, but
it might impact range scans); see the sketch after this list.
2. Increase the PCTFREE value (this will reduce index splitting).
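A minimal sketch of option 1 (transactions and txn_date are hypothetical names):

CREATE INDEX txn_date_idx ON transactions(txn_date) REVERSE;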
12. What are different types of mutex wait events?
cursor: mutex X
cursor: mutex S
cursor: pin S
cursor: pin X
cursor: pin S wait on X
library cache: mutex X
library cache: mutex S
13. Library cache lock wait event?
Below are some reasons which might cause this wait event.
Reasons:
Small shared pool
Less cursor sharing, i.e. more hard parsing
Library cache objects invalidated and reloaded frequently
Huge number of child cursors for a parent cursor (i.e. high version count)
Application queries not using binds
DDL during busy activity
Solutions:
Increase the shared pool size
Always run critical DDLs during non-busy hours
Avoid object compilations during peak time
Check queries with literals (check the cursor_sharing parameter)