Performance Counters
Ramanjaneyulu Narra
10/21/2009
Table of Contents
Foreword
Windows
UNIX Flavors
Apache Server
JVM Statistics
Tomcat Server
WebLogic Server
WebSphere
Oracle Database
Windows:
CPU - % Processor Time – Percentage of time the processor is busy executing non-idle threads
CPU - % User Time – Percentage of processor time spent in user mode (% Processor Time minus % User Time
yields the OS, or privileged, time)
Page Faults – The number of times a requested page was not found in the working set of the process. If the
page is still found elsewhere in RAM, it is called a soft fault; if it is not in RAM and has to be retrieved
from the hard disk, it is called a hard page fault. Soft faults are cheap because the page is read from RAM.
A high rate of hard faults indicates that the machine has too little RAM.
Disk – Reads per Second (number of read operations performed on the disk in a second)
Avg Disk Queue Length – If the disk is already in use when another request arrives, that request waits in a
queue. This counter reports the average length of that queue. A consistently high value indicates a disk
bottleneck, typically because too much logging is enabled or because there is too little RAM.
Network – Bytes Sent per Sec (total bytes sent through the network card in a second)
Bytes Received per Sec (total bytes received through the network card in a second)
Total Bytes per Sec (total bytes sent and received through the network card in a second)
TCP – Segments Retransmitted (number of packets (segments) retransmitted; a high retransmission rate
indicates a network issue, because the receiver did not acknowledge receipt of the packets)
TCP – Connections Established (number of TCP connections currently established; should follow the same
pattern as the user load)
TCP – Active Connections (number of connections initiated by this machine to other servers)
TCP – Passive Connections (number of connections initiated by other servers or machines to this machine)
TCP – Connections Reset (number of connections that were reset; a high count indicates a network issue or
port unavailability)
Processor Queue Length (from the System object) – The number of threads waiting in the processor queue.
Ideally, the average value should not be more than double the number of processors.
If we are monitoring the Web/App/DB servers, we can also add the Process object in PerfMon and select the
relevant processes to see CPU and memory utilization at the process level. For example, if we are
monitoring the Apache server, we can go to the Process object, select the Apache process, and select
counters such as % Processor Time, % User Time, Private Bytes, etc.
To monitor Windows resources, we use PerfMon, a built-in utility in the Windows OS.
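The two rules of thumb above (OS time as the difference between processor time and user time, and a processor queue no longer than twice the number of processors) can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the input values are assumed to have been read from PerfMon by some other means.

```python
def privileged_time_pct(processor_time_pct, user_time_pct):
    """Approximate OS (privileged) time as % Processor Time minus % User Time."""
    return max(processor_time_pct - user_time_pct, 0.0)


def processor_queue_is_healthy(avg_queue_length, processor_count):
    """Rule of thumb from the text: the average queue length should not
    exceed double the number of processors."""
    return avg_queue_length <= 2 * processor_count


# Illustrative values only; they are not measurements from a real server.
print(privileged_time_pct(72.5, 60.0))
print(processor_queue_is_healthy(9, 4))
```

The threshold is a guideline rather than a hard limit, so a short spike above it is less important than a sustained average above it.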
UNIX Flavors:
CPU – User Time, System Time, Idle Time
Memory – Free Memory, Cached Memory, Swapped Memory (if this counter keeps increasing, it indicates that
the machine has too little RAM)
Network – Rx Bytes (Received Bytes), Tx Bytes (Transmitted Bytes) on various ethernet cards.
Note: These names may vary with the flavor of the OS, but the ones above are typical. We use vmstat to get
the CPU and memory statistics, iostat to get the disk statistics, and netstat -s -t and netstat -i to
monitor the network and TCP statistics.
We can connect to the UNIX machine through PuTTY and execute these commands so that they write their
output to a log file. Later, these log files can be copied to the local machine with another utility named
WinSCP, which provides a UI for the UNIX file structure.
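Once a vmstat log has been copied to the local machine, it can be summarized with a small script. The following is a minimal sketch in Python, assuming the common Linux vmstat layout, in which a row of column names (including us, sy, id and swpd) is followed by numeric rows. Column names differ on other UNIX flavors, and the sample values here are invented for illustration.

```python
SAMPLE_VMSTAT = """\
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 2  0   1024 812340  40212 955120    0    0     3    12  210  480 35 10 54  1  0
 4  1   4096 640200  40300 955800   12   30    40   150  390  900 60 25 10  5  0
"""


def parse_vmstat(text):
    """Parse vmstat output into one dict per sample, keyed by column name."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    # The column-name row is the one that contains the CPU columns "us" and "id".
    header_index = next(i for i, cols in enumerate(rows) if "us" in cols and "id" in cols)
    header = rows[header_index]
    samples = []
    for cols in rows[header_index + 1:]:
        # Keep only purely numeric rows; vmstat repeats its headers periodically.
        if len(cols) == len(header) and all(c.isdigit() for c in cols):
            samples.append(dict(zip(header, map(int, cols))))
    return samples


def swap_is_growing(samples):
    """True if swapped memory (swpd) is higher in the last sample than in the first."""
    return len(samples) >= 2 and samples[-1]["swpd"] > samples[0]["swpd"]


samples = parse_vmstat(SAMPLE_VMSTAT)
for s in samples:
    print("user", s["us"], "system", s["sy"], "idle", s["id"], "swapped", s["swpd"])
print("swap growing:", swap_is_growing(samples))
```

Keying each row by the header names, rather than by column position, keeps the script usable when a vmstat variant adds or reorders columns.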
Apache Server:
Ready Children – The number of available threads (in Windows) / processes (in UNIX flavours) ready to serve
incoming requests
Busy Children – The number of threads/processes that are currently serving incoming requests
MinSpare Servers – The minimum number of servers that should be available to serve new requests. If this
value is set to 10, there should always be 10 idle processes available to take up new load. If the number
of idle processes falls below this value, new processes are started automatically.
MaxSpare Servers – The maximum number of processes that can be kept idle. If this value is set to 20 and
there are 45 idle processes, 25 of them will be terminated so that the idle processes come down to 20.
Threads per Process – The maximum number of threads that can be created per process (applicable to
threaded configurations such as Windows)
Keep Alive (ON/OFF) – Determines whether the connection is closed after a request is processed. If this is
set to On, the connection stays open and waits for another request until the KeepAliveTimeout expires. If
no request arrives within the KeepAliveTimeout value, the connection is closed automatically. This
reduces the CPU time spent establishing new connections.
Maximum Requests per Process (MaxRequestsPerChild) – The maximum number of requests that a single process
serves before it is recycled. Setting this to zero means unlimited. Setting it to a finite value releases
the memory occupied by the process when it is recycled, which is helpful if there are memory leaks in the
application. When KeepAlive is on, the separate MaxKeepAliveRequests setting limits the number of requests
that can be served on a single connection.
While increasing any of these values, we should consider the hardware constraints. Each process
typically occupies about 15 MB; this may be higher, depending on the application type.
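The spare-server arithmetic and the memory constraint described above can be checked with a short calculation. This is a rough sketch in Python with hypothetical function names. It takes the 15 MB per process figure from the text as a default, and it ignores the fact that the real server starts and stops processes gradually rather than all at once.

```python
def processes_to_stop(idle_processes, max_spare_servers):
    """Idle processes above the MaxSpare Servers limit are candidates for termination."""
    return max(idle_processes - max_spare_servers, 0)


def processes_to_start(idle_processes, min_spare_servers):
    """New processes needed to bring the idle pool back up to MinSpare Servers."""
    return max(min_spare_servers - idle_processes, 0)


def estimated_memory_mb(process_count, mb_per_process=15.0):
    """Rough memory footprint, using the typical per-process size from the text."""
    return process_count * mb_per_process


print(processes_to_stop(45, 20))    # the example from the text: 45 idle, limit 20
print(processes_to_start(4, 10))
print(estimated_memory_mb(150))
```

Comparing the estimated footprint with the RAM available on the server gives a quick sanity check before raising any of the process limits.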
We can monitor the Apache server using the status page. For example, the URL
http://localhost/server-status?refresh=10 refreshes the status page every 10 seconds.
To enable status reports only for browsers from the foo.com domain, add this code to your httpd.conf
configuration file:
<Location /server-status>
SetHandler server-status
Order Deny,Allow
Deny from all
Allow from .foo.com
</Location>
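The status page also has a machine-readable form (server-status?auto) that returns plain "Key: value" lines, which is convenient for logging busy and idle children during a load test. Below is a minimal sketch in Python that parses such output. The field names BusyWorkers and IdleWorkers are assumed from Apache 2.x, the helper names are hypothetical, and the sample text is invented for illustration.

```python
SAMPLE_STATUS = """\
Total Accesses: 1520
Total kBytes: 8042
Uptime: 3600
BusyWorkers: 12
IdleWorkers: 8
Scoreboard: WW_K__W...
"""


def parse_status_auto(text):
    """Turn 'Key: value' lines from server-status?auto into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if sep:  # skip lines that are not in 'Key: value' form
            fields[key.strip()] = value.strip()
    return fields


def worker_counts(fields):
    """Return (busy, idle) worker counts; missing fields are treated as 0."""
    return int(fields.get("BusyWorkers", 0)), int(fields.get("IdleWorkers", 0))


busy, idle = worker_counts(parse_status_auto(SAMPLE_STATUS))
print("busy:", busy, "idle:", idle)
```

In a real run, the text would come from an HTTP request to the status URL, issued from a host that the access rules above allow.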
JVM Statistics:
JVM stands for Java Virtual Machine. Monitoring the JVM is key to identifying memory leaks in a
Java-based application.
The main memory areas to monitor are:
» Young generation
» Permanent generation
» Overall heap
jstat is used to monitor JVM memory usage. The main counters that we can get out of jstat are the number
of young-generation collections (YGC), the time spent in them, the number of full collections (FGC), the
time spent in them, and the total GC time. If full GCs happen frequently, either the heap settings are
too small for the JVM or there are too many live objects (objects that are not being cleared, which
suggests a memory leak).
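To judge whether full GCs are becoming frequent, it helps to compare two jstat samples taken some time apart. The following is a minimal sketch in Python, assuming the column layout printed by jstat -gcutil on JVMs of this era (S0, S1, E, O, P, YGC, YGCT, FGC, FGCT, GCT) and assuming that every value is numeric. The helper names and sample values are invented for illustration.

```python
SAMPLE_JSTAT = """\
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00  45.10  62.33  40.20  71.05    120    1.250     2    0.800    2.050
  0.00  12.75  10.08  85.60  71.30    180    1.900     9    4.100    6.000
"""


def parse_jstat_gcutil(text):
    """Parse jstat -gcutil output into one dict of floats per sample row."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    header, data = rows[0], rows[1:]
    return [dict(zip(header, map(float, r))) for r in data if len(r) == len(header)]


def gc_activity(first, last):
    """Difference in GC counts and total GC time between two samples."""
    return {
        "young_gcs": int(last["YGC"] - first["YGC"]),
        "full_gcs": int(last["FGC"] - first["FGC"]),
        "gc_seconds": round(last["GCT"] - first["GCT"], 3),
    }


samples = parse_jstat_gcutil(SAMPLE_JSTAT)
print(gc_activity(samples[0], samples[-1]))
```

A growing number of full GCs together with old-generation occupancy (the O column) that stays high after each collection is the pattern that points to a leak rather than to an undersized heap.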
Tomcat Server:
Similar to Apache, Tomcat can also be monitored using a status page. The URL looks like
http://localhost/manager/status. Again, we need to add a configuration snippet to the Tomcat user file:
add the manager role and user entries to $CATALINA_HOME/conf/tomcat-users.xml.
The Tomcat server status page also gives information such as the total number of threads available, busy,
and idle. We can see each request and the time taken to process it in both the Apache and Tomcat status
pages. This helps us identify the network delay at each layer. For example, suppose the Login transaction
takes 14.35 seconds at the load testing tool, around 8.35 seconds at the web server level, and 7.05
seconds at the application server level. The difference between the load test environment and the web
server (about 6 seconds) is then much larger than the difference between the web server and the app
server (about 1.3 seconds), which indicates that the network latency between the load test environment
and the web server is the larger of the two.
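The layer-by-layer comparison in the Login example is simple subtraction, sketched here in Python with a hypothetical function name. Note that each difference includes the processing time of the outer layer itself, not only network latency, so the result is an upper bound on the network delay.

```python
def latency_breakdown(load_tool_s, web_server_s, app_server_s):
    """Split an end-to-end response time into per-layer differences (seconds)."""
    return {
        "load_tool_to_web": round(load_tool_s - web_server_s, 3),
        "web_to_app": round(web_server_s - app_server_s, 3),
        "app_server": app_server_s,
    }


# The Login transaction timings quoted in the text.
print(latency_breakdown(14.35, 8.35, 7.05))
```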
WebLogic Server:
WebLogic Server is an industry-standard application server, and it has its own administration console
for monitoring. We can monitor the JVM statistics, the thread statistics, and the bean statistics, and we
can use a thread dump to inspect hogging threads and their status before they crashed. WebLogic uses a
self-tuning mechanism, so we do not need to set a thread count anywhere. Some more information available
about WebLogic Server:
» Thread Pools – Active execute threads, execute thread total count, queue length,
pending user request count, completed request count, hogging thread count, standby
thread count. In addition, we can get information about each thread, its status,
the request it is processing, etc.
» Metrics of Web Application – Servlets, current sessions, sessions high, total sessions
» JDBC Connection Pools (to establish a connection to the DB) – Active connections
in the data source, total available connections in the data source, prepared statement
cache, requests waiting for a connection
WebSphere:
Similar to WebLogic, WebSphere also has its own administration console from which we can monitor the
performance data.
Basic Counters:
Category                  Counter                        Description
Enterprise Beans          CreateCount                    The number of times that beans were created
JVM Runtime               ProcessCpuUsage                The CPU usage (in percent) of the Java virtual machine
JCA Connection Pools      CreateCount                    The total number of managed connections that are created
JCA Connection Pools      CloseCount                     The total number of managed connections that are destroyed
Servlet Session Manager   LiveCount                      The total number of sessions that are currently live
System Data               CPUUsageSinceLastMeasurement   The average CPU utilization since the last query
Web Applications          RequestCount                   The total number of requests that a servlet processed
The counters listed above are the basic counters provided by WebSphere (as recommended by WebSphere).
Other counters can be added by selecting the Extended or Custom option. The definition of each
counter is available under the PMI custom monitoring levels.
Oracle Database:
We use AWR reports to monitor Oracle databases from 10g onwards. To monitor Oracle 9i, we use
STATSPACK. Depending on the requirement, we take snapshots every 15 minutes or so. We can then
generate HTML reports that compare two snapshots. The major things to look at while analyzing an AWR
report are:
The SGA size – The System Global Area is shared by all the users that access the database. It contains
data such as execution plans and the data blocks fetched from the physical files (these are stored in the
database buffer cache, a part of the SGA). If the library cache hit ratio is not close to 100%, it often
indicates that bind variables are not being used (this can be confirmed by a high number of hard parses
of the SQL statements).
The PGA size – The Program Global Area is the private space for each user session. It is used to sort,
join, and union tables and rows, and it contains the session information of the user.
Top Timed Events – Used to find the reasons for a high elapsed time of a SQL statement. If the CPU time
of a SQL statement is low but its elapsed time is high, the SQL is waiting for resources other than the
CPU. A common example is a table that is already locked: one user holds the lock and another user waits
for the same table. Such waits show up in the report as events related to latches or locks. If the SQL
has to fetch many rows and sort them, the execution time itself will be high. With a larger PGA, there is
more space to sort the results. The AWR report includes an SGA advisory and a PGA advisory to help find
the optimal settings for the SGA and PGA. These values are changed in the init.ora file of the Oracle
database.
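The comparison between elapsed time and CPU time can be expressed as a small calculation. This is an illustrative sketch in Python: the function name is hypothetical, and the two inputs are assumed to be the elapsed time and CPU time reported for a statement in the AWR report.

```python
def wait_share(elapsed_s, cpu_s):
    """Return (wait seconds, fraction of elapsed time spent waiting).

    Time not spent on the CPU is treated as time waiting for other
    resources, such as locks or I/O.
    """
    wait_s = max(elapsed_s - cpu_s, 0.0)
    fraction = wait_s / elapsed_s if elapsed_s > 0 else 0.0
    return wait_s, fraction


# Illustrative values only: 120 s elapsed, of which 30 s were on the CPU.
print(wait_share(120.0, 30.0))
```

A statement with a high wait fraction is a candidate for lock or I/O analysis, while one with a low wait fraction is more likely to need a better execution plan or more sort space.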
Top SQLs – This section lists the SQL statements that took the most time. If we suspect that there is a
problem with a particular SQL statement, we can get its Explain Plan and confirm whether the indexing is
appropriate.
MS SQL Server:
We can monitor MS SQL Server using PerfMon or using SQL Profiler, which ships with SQL Server by
default. The major things that we should consider in MS SQL Server are:
• Locks/Sec
• DeadLocks
• Processing Time