Abstract: Recent projects have stressed the "need for speed" while handling large amounts of data with near-zero downtime. An analysis of multiple environments identified optimizations and architectures that improve both performance and reliability. The session covers data gathering and analysis, discussing everything from the network (multiple NICs, nearby catalog servers, high-speed Ethernet) to the latest features of eXtreme Scale. Performance analysis helps pinpoint where time is spent (bottlenecks), and we discuss optimization techniques (MQ tuning, IIB performance best practices) as well as helpful IBM SupportPacs. Log analysis pinpoints system stress points (e.g., CPU starvation), and we outline steps on the path to near-zero downtime.
Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliability, Featuring TBC
1. CONNECT WITH US:
Architecting & Tuning IIB / eXtreme Scale for Maximum Performance and Reliability
AJ Aronoff
Infrastructure Practice Director, Prolifics
Robert Gus Fort
Manager, Integration Development
Agenda
Speaker Introduction
Challenge: The need for Speed & Scalability
Extreme Scale Basics
Use Cases
Performance Tuning
Measurement - Bottlenecks
Infrastructure
Maximizing Reliability and Reducing Downtime
References & What’s Next
Speaker Introduction
A.J. Aronoff
Infrastructure Practice Director, Prolifics
AJ Aronoff is a Practice Director covering the IBM integration product suite (IIB, eXtreme Scale, MQ, WAS, SmartCloud APM, DataPower) for Prolifics. During his twenty-plus years at Prolifics he has focused on reliability (zero downtime, high availability, disaster recovery), scalability, performance and monitoring. He is a frequent speaker on IIB, MQ, and security, as well as on industry-specific topics.
Robert Gus Fort
Manager, Integration Development
Prolifics Customer
Gus Fort learned about the need for speed and 100% reliability in the U.S. Air Force. Since then he has applied that same quest for reliability and performance in his current role with a leading automotive services retailer. His current projects use eXtreme Scale to give customers a satisfying and rewarding experience on a high-speed, reliable system, nearly doubling the number of stores while reducing risk and increasing customer satisfaction.
Prolifics at a Glance
Years in Business: 35+
Offices: 14
Employees Worldwide: 1,500
Global Presence: United States, United Kingdom, Germany, India
5-Year Compound Annual Growth Rate: 19%
Rate of Repeat Engagements*: 91%
Technology Accelerators: 20+
Technical Certifications: 550+
Awards: Over 10 technology and solutions awards since 2009, including Business Agility, Customer Integration and Digital Experience, and the first-ever Beacon Laureate for Business Agility
Fortune 1000: Over 160 global customers are currently Fortune 1000 companies
Technology Expertise: Best-in-class architects and specialty experts in BPM, Integration, Digital Experience, Security, Testing, Business Analytics and Enterprise Content Management; end-to-end project expertise
*Based on % revenue. Source: December 2013 internal revenue metrics
Extreme Scale Caching & The Need for Speed
“As the world becomes more instrumented, interconnected
and intelligent, Internet-based activities, online
transactions and data volumes increase. Further, these
increasing amounts of data, along with rising consumer
expectations and the need to maintain a competitive edge
require fast and reliable performance…. Elastic caching is
the answer.”
- IBM Software Technical White Paper
- ftp://ftp.software.ibm.com/software/au/data/pdf/Elastic_CachingWP.pdf
3/5/2015
Challenge: The Need for Speed & Scalability
• How scalable is your system?
• When the peak load doubles, does response time:
• A) stay the same, B) double, C) quadruple, or D) fail?
• Have you tested your system to destruction?
• How high can the load go before failure?
• Where are the bottlenecks and single points of failure?
• Focus on eXtreme Scale
• Benefit: the approach taken for measurement, monitoring, and bottleneck identification will work on many systems.
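One way to answer these questions is a step-load ramp: keep doubling the offered load until latency breaches the SLA or the system fails. The sketch below relies on a caller-supplied probe, `latency_at(load)`, which is hypothetical. In practice it would drive a load-test tool and return the measured average latency in ms, or None on failure. `toy_latency` is an invented queueing-style model used only to exercise the loop. It is not data from the projects described here.

```python
def ramp_to_failure(latency_at, start_load, sla_ms, max_steps=20):
    """Double the offered load each step until latency breaches the SLA or the system fails.

    latency_at(load) is a hypothetical probe returning average latency in ms,
    or None if the system failed at that load.
    Returns (last load that met the SLA, first load that did not, history).
    """
    history = []
    load = start_load
    last_good = None
    for _ in range(max_steps):
        latency = latency_at(load)
        history.append((load, latency))
        if latency is None or latency > sla_ms:
            return last_good, load, history
        last_good = load
        load *= 2
    return last_good, None, history


def toy_latency(load, capacity=1000.0, service_ms=5.0):
    """Toy model: latency rises sharply as utilization nears 1; failure at or beyond capacity."""
    utilization = load / capacity
    if utilization >= 1.0:
        return None
    return service_ms / (1.0 - utilization)
```

With this toy model, `ramp_to_failure(toy_latency, 50, sla_ms=50)` reports 800 as the last good load and 1600 as the first failure. Doubling the load from 400 to 800 roughly triples latency, from about 8.3 ms to 25 ms. That is the kind of nonlinear answer the second question is probing for.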
Extreme Scale Basics: Traditional Cache Operation
[Diagram: several application servers (App), each holding its own local copy of cached item A in front of the EIS, plus a newly added App server. Callouts: invalidation chatter between servers; new server with cold cache; redundant copies of data at different versions; invalidation load increases with cluster size; high load on EIS.]
• Traditional in-JVM cache
• Cache capacity determined by individual JVM size
• Invalidation load per server increases as the cluster grows
• Cold-start servers hit the information system even when the data is cached elsewhere in the cluster
• Lower performance as load increases, due to invalidation chatter
• No redundancy of cached data
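Here is a rough model of why invalidation chatter grows with cluster size. It assumes that every update to a local cache is sent as one point-to-point invalidation to each other server. Real products may use multicast or batching, which changes the constants but not the trend.

```python
def invalidation_load(servers, updates_per_server):
    """Rough invalidation cost for N servers with local in-JVM caches.

    Assumes each update is sent as one invalidation message to every other server.
    Returns (total messages in the cluster, messages received by each server).
    """
    if servers < 1:
        raise ValueError("need at least one server")
    peers = servers - 1
    total = servers * updates_per_server * peers
    received_per_server = peers * updates_per_server
    return total, received_per_server


for n in (2, 4, 8, 16):
    total, per_server = invalidation_load(n, updates_per_server=100)
    print(f"{n:>2} servers: {total:>6} messages total, {per_server:>5} received per server")
```

In this model, inbound load per server grows linearly with N, while total cluster traffic grows roughly with N squared. Going from 4 to 8 servers takes total traffic from 1,200 to 5,600 messages.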
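Basics: WXS-Based Cache Operation
[Diagram: the same application servers in front of the EIS, now sharing one grid in which each data partition (A, B, C, D) is held once as a primary, with replicas (A', B', C', D'). Callouts: "Cache is 4x larger!" and "Cache is 5x larger!"]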
• Cluster-coherent cache
• Cache capacity determined by total cluster size:
Size of each cache = M
Number of JVMs = N
Total cache = M × N
• No invalidation chatter
• Linearly scalable
• Less load on the database and no cold-start spikes
• Predictable performance as load increases
• Cached data can be stored redundantly
The cache cluster can be co-located with the application or run in its own tier.
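The bullets above can be made concrete with a toy partitioned map. This is a hypothetical sketch, not the placement algorithm that the eXtreme Scale catalog service actually uses. Keys hash to a fixed number of partitions. Each partition's primary lives on one JVM and its replica on a different JVM, so every key has exactly one primary copy (nothing to invalidate elsewhere) plus a redundant copy. The sketch also shows how replicas trade against the M × N capacity figure.

```python
import hashlib


def partition_for(key: str, num_partitions: int) -> int:
    """Stable key-to-partition mapping (Python's built-in hash() of str varies between processes)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions


def place_partitions(num_partitions: int, jvms: list) -> dict:
    """Round-robin primaries across JVMs; put each replica on the next JVM."""
    if len(jvms) < 2:
        raise ValueError("a replica needs a different JVM than its primary")
    placement = {}
    for p in range(num_partitions):
        primary = jvms[p % len(jvms)]
        replica = jvms[(p + 1) % len(jvms)]
        placement[p] = (primary, replica)
    return placement


def usable_capacity(m_per_jvm: float, n_jvms: int, replicas: int) -> float:
    """Raw grid memory is M x N; each replica of an entry uses as much memory as its primary."""
    return m_per_jvm * n_jvms / (1 + replicas)
```

For example, four JVMs with 1024 MB each give 4096 MB of raw cache. With one replica per primary, about 2048 MB of distinct data fits.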
Extreme Scale Basics: Architecture
[Diagram: tiered architecture with a Web Server Tier, an Application or IIB Tier, an Elastic Cache, and Back End Services]
Use Case 1: Universal Application State
Single replacement for multiple local caches
Consistent response times
Reduces application server JVM heap size
Improved memory utilization: more memory for applications
Faster application server start-up
Removes invalidation chatter of local caches
Applications move application state to the grid
Stateless applications scale elastically
Application state can be shared across data centers for high availability
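Here is a minimal sketch of moving application state into the grid. A plain dict stands in for a grid map, which is an assumption. A real grid client would also need transactions or locking for concurrent updates, and this sketch omits them. Neither hypothetical `AppServer` instance keeps the cart in its own heap, so any server can handle the next request for the session.

```python
class AppServer:
    """Hypothetical stateless app server: session state lives only in the shared grid."""

    def __init__(self, name, grid):
        self.name = name
        self.grid = grid  # dict standing in for a shared grid map

    def add_to_cart(self, session_id, item):
        key = ("cart", session_id)
        cart = list(self.grid.get(key, []))  # copy, then write back the new state
        cart.append(item)
        self.grid[key] = cart
        return cart


shared_grid = {}
server_a = AppServer("a", shared_grid)
server_b = AppServer("b", shared_grid)
server_a.add_to_cart("session-1", "tire")
print(server_b.add_to_cart("session-1", "battery"))  # ['tire', 'battery']
```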
Use Case 2: Side Cache Pattern
The client first checks the grid before using the data access layer to connect to a back-end data store
If an object is not returned from the grid (a cache “miss”), the client uses the data access layer as usual to retrieve the data
The result is put into the grid to enable faster access the next time
The back end remains the system of record, and usually only a small amount of the data is cached in the grid
An object is stored only once in the cache, even if multiple clients use it
Thus more memory is available for caching and more data can be cached, which increases the cache hit rate
Improves performance and offloads unnecessary workload from back-end systems
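Here is the side cache (cache-aside) flow described above as a minimal sketch. `grid` is any dict-like object standing in for a grid map, and `load_from_backend` is a hypothetical data-access function. With a real grid client, the get and put calls would go through that client's API instead.

```python
class SideCache:
    """Cache-aside: try the grid first; on a miss, read the system of record and populate the grid."""

    def __init__(self, grid, load_from_backend):
        self.grid = grid
        self.load_from_backend = load_from_backend
        self.hits = 0
        self.misses = 0

    def get(self, key):
        value = self.grid.get(key)
        if value is not None:
            self.hits += 1
            return value
        self.misses += 1
        value = self.load_from_backend(key)  # the back end stays the system of record
        if value is not None:
            self.grid[key] = value  # stored once, shared by every client of the grid
        return value

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```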
Use Case 2: ESB - Side Cache Pattern
Easily integrates into the existing business process
No code changes to the client application or back-end application
Simply add the side cache mediation at the ESB layer
Significantly reduces the load on the back-end system by eliminating redundant requests
Saves costly MIPS by eliminating redundant requests
Allows for more “REAL” work to be performed
Improves overall response time
Minimizes the need to scale hardware to increase processing capacity, since the back-end system no longer has to handle redundant requests
Response time from the elastic cache is in milliseconds
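At the ESB layer, the mediation must recognize that two requests are the same even when their fields arrive in a different order. It also needs some expiry so that cached responses do not go stale forever. The sketch below shows that logic in Python purely as an illustration. The canonical JSON key, the `ttl_seconds` parameter, and the `call_backend` callback are all assumptions, not IIB or DataPower APIs.

```python
import hashlib
import json
import time


def cache_key(operation, request_fields):
    """Canonical key: same operation and fields give the same key regardless of field order."""
    canonical = json.dumps({"op": operation, "req": request_fields},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class SideCacheMediation:
    """Answer repeated requests from the cache; forward only misses and expired entries."""

    def __init__(self, call_backend, ttl_seconds, clock=time.monotonic):
        self.call_backend = call_backend
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (expires_at, response)
        self.backend_calls = 0

    def handle(self, operation, request_fields):
        key = cache_key(operation, request_fields)
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # redundant request: no back-end work
        self.backend_calls += 1
        response = self.call_backend(operation, request_fields)
        self.entries[key] = (now + self.ttl, response)
        return response
```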
Use Case 2: ESB - Side Cache Pattern
Product Search
Existing performance without cache
Targeted SLA
Full Load of cache data
Delta processing
Crash preparation
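The last three items describe how the product cache is kept current. Here is a minimal sketch. It assumes the back end can supply a full snapshot plus change records that carry a monotonically increasing sequence number; the `sku`, `seq`, and `op` field names are hypothetical. Deltas at or below the stored high-water mark are skipped, so replaying the same batch after a crash is harmless.

```python
def full_load(grid, records):
    """Initial load: replace the cached products with a snapshot from the system of record."""
    grid.clear()
    for record in records:
        grid[record["sku"]] = record


def apply_deltas(grid, deltas, high_water_mark):
    """Apply change records newer than high_water_mark in sequence order; return the new mark.

    Persisting the returned mark is the crash-preparation step: after a restart,
    replaying the same deltas skips everything already applied.
    """
    for change in sorted(deltas, key=lambda c: c["seq"]):
        if change["seq"] <= high_water_mark:
            continue
        if change["op"] == "delete":
            grid.pop(change["sku"], None)
        else:  # treat anything else as an upsert
            grid[change["sku"]] = change["record"]
        high_water_mark = change["seq"]
    return high_water_mark
```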
Performance Tuning - Measurement
Several useful tools include:
For MQ performance, the MS0P SupportPac allows us to see which queues had a buildup of messages
For IIB, WebAdmin interfaces with the flow statistics to identify nodes in a flow that are slow (high elapsed time) or inefficient from a CPU usage point of view
For DataPower (DP), an AJAX web "dashboard" helps us track overall CPU, TPS, memory, and specific MPG nodes and their respective elapsed times
We captured CPU time for all servers and the TPS rate (end to end as well as per product) by performing multiple runs using the IBM Performance Harness
SupportPac MS0P: MQ Explorer plug-ins
A variety of WMQ Explorer plug-ins are provided
Format event messages into human-readable text
Trace route function to determine the path messages take
Export WMQ Explorer display panels to CSV
Remote management of Windows, UNIX and Linux systems
Utilities
Qtune
MQIDecode turns codes into symbolic names
OAM logging utility records PCF commands sent to CMDSVR
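This tiny illustration shows what turning codes into symbolic names buys you when reading logs. It is not MQIDecode and covers only the six common MQ reason codes listed. The real utility covers the full set.

```python
import re

# A few common MQ reason codes and their symbolic names.
MQRC_NAMES = {
    2009: "MQRC_CONNECTION_BROKEN",
    2033: "MQRC_NO_MSG_AVAILABLE",
    2035: "MQRC_NOT_AUTHORIZED",
    2053: "MQRC_Q_FULL",
    2059: "MQRC_Q_MGR_NOT_AVAILABLE",
    2085: "MQRC_UNKNOWN_OBJECT_NAME",
}

_CODE = re.compile(r"\b(2\d{3})\b")


def annotate(line):
    """Append the symbolic name after each known reason code; leave other numbers alone."""
    def repl(match):
        name = MQRC_NAMES.get(int(match.group(1)))
        return f"{match.group(1)} ({name})" if name else match.group(0)
    return _CODE.sub(repl, line)
```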
SupportPac MS0P: Event Message Context Menu
After installing SupportPac MS0P, new context menu items appear in WebSphere MQ Explorer. The "Format Event Message" option shows up when right-clicking on an event queue.
Details of Extreme Scale running inside WAS 8.5
Use PMI for monitoring
WebSphere Application Server provides a sophisticated performance
monitoring infrastructure (PMI). It can be used to gather performance
statistics of all components inside an application server. WebSphere eXtreme
Scale takes advantage of this infrastructure to collect performance statistics
relevant to the grid. Metrics are grouped within the following PMI modules:
objectGridModule
Provides metrics about transaction response time.
mapModule
Provides metrics about map and index count and time statistics. These include hit rate per map, number of entries per map, and number of results per index.
agentManagerModule
Provides statistics related to map-based agents. These include number of
agent executions, number of partitions involved, time required to run map
and reduce operations, time required to serialize agents, and related
results.
queryModule
Provides statistics about query processing and execution. These include
the amount of time to create a query plan and run the query, the number of
times that a query has been run, and so on.
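PMI counts are often cumulative since the server started or the statistics were reset. A rate computed from lifetime totals can therefore hide a recent drop. This small sketch assumes you can sample a cumulative (hits, requests) pair for a map at two points in time. How you read those counters, whether through PMI or a monitoring tool, is outside its scope.

```python
def hit_rate(hits, requests):
    """Fraction of map get requests answered from the grid (0.0 if there were none)."""
    return hits / requests if requests else 0.0


def interval_hit_rate(earlier, later):
    """Hit rate between two cumulative (hits, requests) samples of the same map."""
    d_hits = later[0] - earlier[0]
    d_requests = later[1] - earlier[1]
    if d_hits < 0 or d_requests < 0:
        raise ValueError("counters went backwards; was the server restarted or were counters reset?")
    return hit_rate(d_hits, d_requests)


lifetime = hit_rate(950, 1100)                        # about 0.86 since start
recent = interval_hit_rate((900, 1000), (950, 1100))  # 0.5 in the last interval
```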
Infrastructure: Avoid Single Points of Failure
Fast network: 10 gigabit +
Multiple NICs
Nearby catalogs
Fast disk drives: the new solid state drives (SSDs) are incredible.
Trace route: on multiple occasions we have found very unexpected network delays. Measure early and often.
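Connect time is a cheap first measurement before reaching for traceroute. This sketch assumes the catalog servers are listed as a comma-separated `host:port` string; the hosts and the string itself are placeholders. It times several TCP connects to each endpoint and orders them nearest first, so unexpected delays show up early. TCP connect time is only a proxy for round-trip latency and does not show where on the path a delay occurs.

```python
import socket
import time


def parse_endpoints(spec):
    """Parse a comma-separated 'host:port' list into (host, port) tuples."""
    endpoints = []
    for item in spec.split(","):
        host, _, port = item.strip().rpartition(":")
        endpoints.append((host, int(port)))
    return endpoints


def connect_latency_ms(host, port, timeout=2.0):
    """Time a single TCP connect in milliseconds; None if it fails or times out."""
    start = time.perf_counter()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    except OSError:
        return None
    return (time.perf_counter() - start) * 1000.0


def nearest_first(spec, samples=3):
    """Best-of-N connect time per endpoint; reachable endpoints fastest first, unreachable last."""
    results = []
    for host, port in parse_endpoints(spec):
        times = [t for t in (connect_latency_ms(host, port) for _ in range(samples)) if t is not None]
        results.append(((host, port), min(times) if times else None))
    return sorted(results, key=lambda r: (r[1] is None, r[1] or 0.0))
```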
Erasmus: Prevention is Better than Cure
WXS 8.6 makes it easier to recover from crashes.
Log analysis was used to explore the causes of those crashes.
The next few slides show how WAS 8.5 can prevent some of those crashes.
WebSphere Application Server V8.5
Three primary goals:
Provide intelligent management and enhanced resiliency for your application server environment
Improve operation, security, and control
Integration of the application server, providing a fast, flexible, and simplified application development environment that allows you to deliver rich user experiences faster
Health Management
Automatically detect and handle application health problems
Without requiring administrator time, expertise, or intervention
Intelligently handle health issues in a way that maintains continuous availability
Each health policy consists of a condition, one or more actions, and a target set of processes
Includes health policies for common application problems
Customizable health conditions and health actions
WebSphere Application Server Version 8.5 can monitor servers for common health problems and take corrective action. Several health conditions can be defined using health policies. When a violation of a health policy is detected, an action plan can be put into effect automatically, including sending email, capturing diagnostic data, and restarting the application server. Application server restarts are handled intelligently, in a way that prevents outages and service policy violations.
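The policy structure described above (a condition, actions, and targets) can be modeled in a few lines. This is a hypothetical sketch of the idea, not the WAS API; WAS health policies are configured through its administrative tools. The action names and the one-restart-at-a-time rule are assumptions. They stand in for the outage-avoiding restart behavior the slide mentions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class HealthPolicy:
    """Hypothetical model: a condition, one or more actions, and a target set of servers."""
    name: str
    condition: Callable[[dict], bool]
    actions: List[str]
    targets: List[str]


def build_action_plan(policies, metrics_by_server: Dict[str, dict], max_restarts: int = 1):
    """Return (actions to run now, restarts deferred so the cluster keeps serving)."""
    plan: List[Tuple[str, str, str]] = []
    deferred: List[Tuple[str, str, str]] = []
    restarts = 0
    for policy in policies:
        for server in policy.targets:
            metrics = metrics_by_server.get(server)
            if metrics is None or not policy.condition(metrics):
                continue
            for action in policy.actions:
                if action == "restart":
                    if restarts >= max_restarts:
                        deferred.append((server, policy.name, action))
                        continue
                    restarts += 1
                plan.append((server, policy.name, action))
    return plan, deferred
```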
Benefits of Extreme Scale running inside WAS 8.5
Catalog and container servers as managed application servers
Catalog service domain configuration
Simplified connection to the grid
Simplified grid container management
Simplified administration and management
Use PMI for monitoring
Details of Extreme Scale running inside WAS 8.5
Catalog and container servers as managed application servers
Catalog and container servers are defined as standard application server instances within a WebSphere Application Server cell. They receive all the health and resiliency benefits of running inside WAS 8.5.
For high availability, use dedicated clusters for catalog and container servers. Best results are achieved with at least three catalog server instances in different physical locations.
Also, use a distinct WebSphere cluster to host the grid infrastructure (container servers) to decouple it from catalog servers and applications.
Catalog service domain configuration
Catalog service domains define a group of catalog servers that are associated with a grid infrastructure. Catalog service domains can be easily defined as administered resources within a WAS ND environment.
Simplified connection to the grid
In a stand-alone environment, connecting to the grid requires knowledge of the host name and port of the catalog servers. In an integrated environment, this requirement can be simplified by using a catalog service domain. More specifically, the catalog service domain can be queried to obtain the list of catalog service endpoints that the client uses to establish a connection to the catalog service.
Details of Extreme Scale running inside WAS 8.5
Simplified grid container management
Grid configurations can be packaged within a Java Platform, Enterprise Edition module (Web or EJB module) and deployed as a standard enterprise archive (EAR) file. When a Java Platform, Enterprise Edition application starts, WebSphere eXtreme Scale checks for grid configuration files (objectGrid.xml and objectGridDeployment.xml) within the META-INF folder of the EJB and Web modules. If only objectGrid.xml is found, the application server is assumed to be a grid client. If both objectGrid.xml and objectGridDeployment.xml are present, the application server acts as a container by implementing the specified deployment policies.
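A pre-deployment check in a build script could apply the same rule to an unpacked module directory. This is a hypothetical helper, not part of WebSphere eXtreme Scale, and it only reproduces the file-presence logic described above. A module that has the deployment XML but no objectGrid.xml is reported as "none", because the text does not cover that case.

```python
from pathlib import Path


def grid_role(module_dir):
    """Classify a Web/EJB module by the grid files in its META-INF folder.

    objectGrid.xml alone -> grid client;
    objectGrid.xml plus objectGridDeployment.xml -> container.
    """
    meta_inf = Path(module_dir) / "META-INF"
    has_grid = (meta_inf / "objectGrid.xml").is_file()
    has_deployment = (meta_inf / "objectGridDeployment.xml").is_file()
    if has_grid and has_deployment:
        return "container"
    if has_grid:
        return "client"
    return "none"  # no usable grid configuration in this module
```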
Simplified administration and management
Catalog service domains define a group of catalog servers that are associated with a grid infrastructure. Catalog service domains can be easily defined as administered resources within a WebSphere Application Server Network Deployment environment.
Monitoring helps find problems
Monitor for operational exceptions and to maintain the health of the system:
Expand the default size of error logs
Check error logs for exceptions
Archive error logs for forensic analysis
Alert on messages in the DLQ or error queue
Distinguish between operational and application alerts
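Here is a sketch of the last two items: alert whenever a DLQ or error queue holds messages, and tag each alert so it reaches the right team. The queue names and the routing are assumptions. SYSTEM.DEAD.LETTER.QUEUE is a commonly used dead-letter queue name, and ORDERS.ERROR is hypothetical. Queue depths would come from your monitoring tool.

```python
from dataclasses import dataclass

DEAD_LETTER_QUEUES = {"SYSTEM.DEAD.LETTER.QUEUE"}  # assumed DLQ name
APPLICATION_ERROR_QUEUES = {"ORDERS.ERROR"}         # hypothetical application error queue


@dataclass
class Alert:
    category: str  # "operational" or "application"
    queue: str
    depth: int


def queue_alerts(depths):
    """Raise one alert per DLQ or error queue that currently holds messages."""
    alerts = []
    for queue, depth in sorted(depths.items()):
        if depth <= 0:
            continue
        if queue in DEAD_LETTER_QUEUES:
            alerts.append(Alert("operational", queue, depth))
        elif queue in APPLICATION_ERROR_QUEUES:
            alerts.append(Alert("application", queue, depth))
    return alerts
```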
FFSTSummary
Why learn to use the tools that summarize FFST and error files?
“Because that’s where the errors are.”
FFST files contain useful information on serious problems
http://hursleyonwmq.wordpress.com/2007/05/04/introduction-to-ffsts/
What is FFST?
FFST stands for First Failure Support Technology, and is “designed to create detailed reports for IBM Service with information about the current state of a part of a queue manager together with historical data”
FFSTSummary is a tool that produces one-line summaries of these serious problems, arranged in chronological order
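Here is a rough sketch of the same idea: pull a few header fields out of each FFST and print one line per report, oldest first. The `Key :- Value` header layout and the field names (`Probe Id`, `Component`, `Major Errorcode`) are assumed from typical FFST files. The caller supplies each report's timestamp, for example its file modification time. Prefer the FFSTSummary tool itself where it is available.

```python
import re

# Matches header lines such as "| Probe Id          :- XC130003    |" (layout assumed).
FIELD = re.compile(r"^\|?\s*([A-Za-z][A-Za-z /]*?)\s*:-\s*(.*?)\s*\|?\s*$")


def parse_ffst_header(text):
    """Collect 'Key :- Value' fields, keeping the first occurrence of each key."""
    fields = {}
    for line in text.splitlines():
        m = FIELD.match(line)
        if m and m.group(1) not in fields:
            fields[m.group(1)] = m.group(2)
    return fields


def summarize(reports):
    """reports: iterable of (timestamp, text). Returns one line per report, oldest first."""
    lines = []
    for ts, text in sorted(reports, key=lambda r: r[0]):
        f = parse_ffst_header(text)
        lines.append(f"{ts} {f.get('Probe Id', '?')} {f.get('Component', '?')} {f.get('Major Errorcode', '?')}")
    return lines
```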
Lessons Learned
Install and use the best tools for measurement
Identify bottlenecks (flows, queues)
Use the best available hardware:
Solid state drives
Fast network: 10 gigabit +
Multiple NICs
Tune one thing at a time
It is a capital crime to theorize in advance of your data
Learn More… Prolifics at InterConnect
Monday
• How Broadcast Music, Inc. Devised and Enabled Enterprise Architecture from Corporate Strategy (12:15 PM - 1:15 PM)
• Integrating Salesforce.com and Oracle ERP Using IBM WebSphere Cast Iron (2:00 PM - 3:00 PM)
• Recommended Design Considerations for Enterprise Monitoring using SCAPM and Netcool OMNIbus (5:00 PM - 6:00 PM)
Tuesday
• Smarter Integration Using the IBM SOA Foundation Stack: Best Practices and Lessons Learned (8:00 AM - 9:00 AM)
• Best Practices for Monitoring Your Cloud Environment and Applications (9:30 AM - 10:30 AM)
• Applicability of IBM SOA Approach In Manual Processes Automation (11:30 AM - 11:50 AM)
• Leveraging Governance in the IBM WebSphere Service Registry and Repository for IIB and DataPower (12:30 PM - 1:30 PM)
• Broadcast Music Inc. Release Rockstars: Program-Wide DevOps Success with UrbanCode Deploy (3:30 PM - 4:30 PM)
• Empowering SmartCloud APM - Predictive Insights and Analysis: A Use Case Scenario (5:30 PM - 6:30 PM)
Wednesday
• Architecting and Tuning IIB/eXtreme Scale for Maximum Performance and Reliability, Featuring TBC (8:00 AM - 9:00 AM)
• MasterCard's Modeling and Governance of Decisions and Processes for Improved Fraud (11:00 AM - 12:00 PM)
• How BMI is Revolutionizing the Music Business Using IBM’s BPM and Integration Technology (2:00 PM - 3:00 PM)
• Integrating IBM Pure Application Systems and IBM Urbancode Deploy: A GE Capital Case Study (2:00 PM - 3:00 PM)
Thursday
• Aetna’s Vision for a Healthier World: Smarter Architecture and a Scalable Integration Bus (9:00 AM - 10:00 AM)
• Meet the Expert - Delivering Enterprise Applications: Faster. Cheaper. Better (12:00 PM - 12:50 PM)
• Using the Power of IBM Tivoli Common Reporting to Make Smart Decisions: The Untold Story (2:30 PM - 3:30 PM)
Let’s Continue the Conversation…
A.J. Aronoff
aj@prolifics.com
Visit these useful links on the Prolifics website:
Case Studies http://www.prolifics.com/resources/case-studies
Webcasts http://www.prolifics.com/resources/webcasts
Videos http://www.prolifics.com/resources/videos
Solution Briefs http://www.prolifics.com/resources/solution-briefs
Blog http://www.prolifics.com/blog
Twitter http://www.twitter.com/prolifics
Facebook http://www.facebook.com/ProlificsTech
Prolifics TV http://www.youtube.com/prolificstv
Thank You
Your Feedback is Important!
Access the InterConnect 2015 Conference CONNECT Attendee Portal to complete your session surveys from your smartphone, laptop or conference kiosk.