Global Mirror Whitepaper
Version: V2
Date: 15/09/2008
Contents

2. Design objectives
3. Overall architecture
4.4 Collisions
5. Recovery process
6. Autonomic behaviour
7. Management tools
8. Solution scenarios
9. Performance considerations
9.4 Performance at distance
9.5 Secondary disk subsystem performance
9.5.1 Placement of devices on secondary disk subsystem
10. References
10.1 Redbooks, Manuals and Whitepapers
10.2 Websites
Table of Figures

Figure 1 Global Mirror device topology
Figure 2 Dependent write consistency
Figure 3 Communication links between primary disk subsystems
Figure 4 Global Mirror setup process
Figure 5 z/OS Global Mirror architecture
Figure 6 Consistency group creation process
Figure 7 Collisions in Global Mirror
Figure 8 Parts of consistency group formation process where recovery action is required
Figure 9 Global Mirror recovery process
Figure 10 Process to take an additional copy for testing
Figure 11 Inband and out of band management of Global Mirror
Figure 12 TPC for Replication environment
Figure 13 GDPS/GM environment
Figure 14 PowerHA for i environment
Figure 15 Asymmetrical Global Mirror environment
Figure 16 Return to primary site with asymmetrical Global Mirror environment
Figure 17 Symmetrical Global Mirror configuration
Figure 18 Running in secondary location with symmetrical configuration
Figure 19 Testing without an additional copy of data
Figure 20 Testing with Global Mirror
Figure 21 Cascading configuration with Global Mirror
Figure 22 Production workload profile and Global Mirror bandwidth options
Figure 23 Percentage of production writes sent by Global Mirror
Figure 24 RPO of Global Mirror at very long distances
Figure 25 Throughput of Global Mirror at very long distances
Figure 26 Copy on write processing
Figure 27 Spreading B and C volumes over different ranks
Figure 28 Options for placement of D volumes
Figure 29 Volume layout with two drive sizes on secondary disk subsystem
2. Design objectives
Global Mirror (Asynchronous PPRC) is a two-site, unlimited distance data replication solution for both System z and Open Systems data. IBM's Global Mirror solution addresses customer requirements for a long distance (typically over 300km) storage-based data replication solution for both Open Systems and System z data that scales and provides cross volume and cross storage subsystem data integrity and data consistency. Before examining in detail the architecture and operation of Global Mirror, it is helpful to first review the design objectives that guided the development of this solution. The paragraphs below provide some background on how these objectives influenced the development of Global Mirror.

Achieve an RPO of 3-5 seconds with sufficient bandwidth and resources
Many customers have implemented z/OS Global Mirror (previously known as XRC) as an asynchronous replication solution and have become accustomed to the RPO that this best of breed solution is able to provide. Global Mirror has been designed to provide a similar RPO of 3-5 seconds when sufficient bandwidth and resources are available.

Do not impact production applications when insufficient bandwidth and/or resources are available
Previous asynchronous replication solutions have used a cache sidefile to store updates before transmission to the remote site. As a result, they have included pacing mechanisms to slow down production write activity and allow the mirroring to continue if the replication solution falls behind. Global Mirror has been designed not to use a sidefile and so requires no pacing mechanism.

Be scalable and provide consistency across multiple primary and secondary disk subsystems
A replication solution should not be limited to a single primary or secondary disk subsystem, as this may result in an existing solution becoming non-viable if the storage and throughput requirements outgrow the capabilities of a single disk subsystem. Allowing a single consistency group to span multiple disk subsystems also allows different cost storage to exist within the same Global Mirror environment (for example, a Global Mirror session might span an ESS, DS6800 and DS8300).

Allow for removal of duplicate writes within a consistency group before sending to the remote site
Bandwidth is one of the most expensive components of an asynchronous replication solution, and minimising the usage of bandwidth can provide significant cost savings. If multiple updates are made to the same piece of data within a consistency group then only the latest update need be sent. Depending on the access patterns of the production workload, significant savings might be seen in the bandwidth required for the solution.

Allow for less than peak bandwidth to be configured, accepting a higher RPO at peak times
Many customers have significant peaks in the write activities of their workloads, which may be 2-3 times higher than the average write throughput. These peaks are often at times when maintenance or batch activities are taking place, and so there may not be sufficient justification to provide the bandwidth to maintain a very small RPO at these times. However, the activities that take place are likely to be time critical, and so the production workload should not be impacted if sufficient bandwidth is not available.

Provide consistency between different platforms, especially between System z and open systems
With the increase in applications spanning multiple servers and platforms, there is a requirement to provide a consistent asynchronous replication solution that can handle workloads from multiple servers, and specifically both CKD (System z) and FB (open systems) data. Global Mirror can be used on any devices that are defined on the disk subsystem, including System z, System i and UNIX/Windows workloads.
3. Overall architecture
For any asynchronous replication function, we can consider three major functions that are required in order to provide consistent mirroring of data. These are:
1. Creation of a consistent point across the replicated environment
2. Transmission of the required updates to the secondary location
3. Saving of consistent data to ensure a consistent image of the data is always available
With Global Mirror, these functions are provided as shown in Figure 1.
[Figure 1: Global Mirror device topology. Consistency group co-ordination and formation takes place at the primary (A) devices, data transmission uses Global Copy to the B devices, and the consistency group save uses FlashCopy to the C devices.]
The primary disk subsystems provide functionality to co-ordinate the formation of consistency groups across all involved devices, which are said to be in a Global Mirror session. Fibre channel links provide low latency connections between multiple disk subsystems, ensuring that this process has negligible impact on the production applications. The consistency group information is held in bitmaps rather than requiring the updates to be maintained in cache. These consistency groups are sent to the secondary location using Global Copy (previously known as PPRC-XD). Using Global Copy means that duplicate updates within a consistency group are not sent, and that if the data to be sent is still in the cache on the primary disk subsystem only the changed blocks are sent. Once the consistency group has been sent to the secondary location, this consistent image of the primary data is saved using FlashCopy. This ensures that there is always a consistent image of the primary data at the secondary location.
[Figure 2: Dependent write consistency. A log update (1) is followed by a database update (2); if update 2 is not mirrored then the subsequent update (3) must also not be mirrored.]
Using the data freeze concept, consistency is obtained by temporarily inhibiting write IO to the devices in an environment and then performing the actions required to create consistency. Once all devices have performed the required actions, the write IO is allowed to resume. This might be suspending devices in a Metro Mirror environment or performing a FlashCopy when using consistent FlashCopy. With Global Mirror, this action is the creation of the bitmaps for the consistency group, and we are able to create the consistent point in approximately 1-3ms; section 4.3 discusses the impact of consistency group formation on average IO response times. Other solutions using a data freeze are not as highly optimised as Global Mirror and might therefore take longer to perform the consistency creation process. For example, creating a consistent FlashCopy might take a number of seconds depending on the size of the environment.
[Figure 3: Communication links between primary disk subsystems. A master disk subsystem communicates with the LSSs of the subordinate disk subsystems: one path between a master LSS and one LSS in each subordinate is required (although two paths are recommended), and a set of paths is not required for every LSS participating in the session. Data transmission and FlashCopy commands are performed using PPRC links from the subordinates to the secondary disk subsystems.]
With Global Mirror, a controlling function, known as the master, runs on one of the primary disk subsystems. This process coordinates the consistency group formation process over all the subordinate disk subsystems using the PPRC links. The master function can be started on any primary disk subsystem so long as the required connectivity is available.
[Figure 4: Global Mirror setup process. The visible steps include establishing the Global Copy pairs between the local (LH) and remote (RH) devices, establishing the PPRC control paths between the master and its subordinates, and starting Asynchronous PPRC with a Start command to the master; RJ denotes the FlashCopy journal devices at the remote site.]
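As an illustration of this setup flow, the following DS CLI sequence sketches the main steps. It is not taken from this paper: the device IDs, WWNN, LSS numbers, port pairs, volume IDs and session number are hypothetical examples, and the exact parameters should be verified against the DS CLI reference for the installed code level.

# Hypothetical example: establish a minimal Global Mirror configuration
# 1. PPRC paths from a primary LSS to the corresponding secondary LSS
mkpprcpath -dev IBM.2107-7512345 -remotedev IBM.2107-7567890 -remotewwnn 5005076303FFC123 -srclss 10 -tgtlss 20 I0010:I0110
# 2. Global Copy (extended distance) pairs A -> B
mkpprc -dev IBM.2107-7512345 -remotedev IBM.2107-7567890 -type gcp 1000:2000
# 3. FlashCopy B -> C with the attributes Global Mirror requires
mkflash -dev IBM.2107-7567890 -record -persist -nocp -tgtinhibit 2000:2100
# 4. Define the session on the primary LSS and add the A volumes to it
mksession -dev IBM.2107-7512345 -lss 10 01
chsession -dev IBM.2107-7512345 -lss 10 -action add -volume 1000 01
# 5. Start Global Mirror (Asynchronous PPRC) on the chosen master LSS
mkgmir -dev IBM.2107-7512345 -lss 10 -session 01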
[Figure 5: z/OS Global Mirror architecture. Write IO to the production (A) volumes at the primary site is read by the System Data Mover and applied to the secondary and journal (J) volumes at the recovery site.]
System z only scope for z/OS Global Mirror
z/OS Global Mirror is only able to manage CKD devices, and it can only provide consistency beyond the scope of a single storage control session where the production operating system is timestamping the application writes. Global Mirror is able to handle both CKD and FB devices as it uses the data freeze concept to provide consistency and so does not require a common time source for all servers.
Use of cache by z/OS Global Mirror
z/OS Global Mirror, like many other asynchronous replication solutions, temporarily holds the production workload updates in the cache on the primary disk subsystem. If the environment is configured with sufficient bandwidth and resources this is not an issue, but as cache is a finite resource there are two options when the cache fills up: we can either suspend the mirroring or we can pace the production systems in order to prevent further increase in cache usage. Global Mirror does not use a cache sidefile and so there is no requirement to pace production or suspend the mirror when it falls behind.
Predictability of RPO vs. impact to production
As z/OS Global Mirror is often configured to pace the production writes, it is possible to know both an average RPO (referred to as delay by z/OS Global Mirror) under non-stress conditions and what a reasonable maximum RPO might be. However, it is possible that the mirror will either suspend or production workloads will be impacted if the capability of the replication environment is exceeded, due to unexpected peaks in the workload or an under-configured environment. With Global Mirror, as we do not pace the production writes, it is possible that the RPO will increase significantly if the production workload exceeds the resources available for Global Mirror. However, we will not have to perform a suspension and subsequent resynchronisation of the mirror, and we will not inject any delays into the production IO activity.
Requirement for additional copy during resynchronisation
When z/OS Global Mirror suspends the mirror, due to a network outage or other cause, a point in time copy of the secondary devices must be taken before resynchronisation in order to preserve a consistent image at the secondary site. With Global Mirror, as the architecture already includes two copies of the data at the secondary site, the C devices will contain a useable image of the production systems even during resynchronisation.
Removal of duplicate updates before transmission
With z/OS Global Mirror, the formation of consistency groups is performed by a z/OS software component (the SDM), and so every write must be read from the primary disk subsystem. With Global Mirror, as consistency group formation takes place on the primary disk subsystem, duplicate updates within a consistency group can be removed before they are sent to the secondary disk subsystem. Note: z/OS Global Mirror will also remove duplicate updates before writing to the secondary disk subsystems. However, this does not result in bandwidth savings because, for a disaster recovery solution, the SDM is located in the secondary location.
Mixed vendor implementations
z/OS Global Mirror is also supported on disk subsystems from other vendors who have licensed and implemented the interfaces from IBM, and it is possible to run a heterogeneous environment with disks from multiple vendors. Currently Global Mirror is only available on IBM disk subsystems and so a heterogeneous environment is not possible. It is possible that other vendors will choose to license the Global Mirror architecture as they have done with Metro Mirror and z/OS Global Mirror. Target z/OS Global Mirror volumes can also be from any vendor, even if the target subsystem does not support z/OS Global Mirror, thus enabling investment protection. However, in order to provide for failback, customers will likely prefer that the remote subsystems also support z/OS Global Mirror.
[Figure 6: Consistency group creation process. The consistency group is drained and sent to the remote disk subsystem using Global Copy, while application writes for the next consistency group are recorded in the change recording bitmap; a maximum drain time (e.g. 30s) applies.]
If the maximum drain time is exceeded then Global Mirror will determine how long it is likely to be before another consistency group could be formed and will transition to Global Copy mode until it is possible to form consistency groups again. The production performance will be protected but the RPO will be allowed to increase. In this way, Global Mirror maximises the efficiency of the data transmission process by removing duplicate updates before they are sent and by allowing all primary disk subsystems to send data independently. Global Mirror will check on a regular basis whether it is possible to start forming consistency groups again, and will do so as soon as it calculates that this is possible.
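The consistency group interval, maximum coordination time and maximum drain time are tunable when Global Mirror is started or resumed. As a hedged illustration only (the device ID, LSS, session number and values are hypothetical, and the parameter names should be checked against the DS CLI reference), they might be set as follows:

mkgmir -dev IBM.2107-7512345 -lss 10 -session 01 -cginterval 0 -coordinate 50 -drain 30
# -cginterval: seconds to wait between consistency groups (0 forms them continuously)
# -coordinate: maximum time in milliseconds allowed for co-ordinating consistency group formation
# -drain:      maximum time in seconds allowed to drain a consistency group before reverting to Global Copy mode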
4.4 Collisions
Global Mirror does not use a cache sidefile in order to avoid the issues with cache filling up that are seen with other asynchronous replication solutions. However, this does have the implication that if updates are made to tracks in a previous consistency group that have not yet been sent then this previous image needs to be protected. We call this situation a collision. In order to do this we will delay the completion of the write until the previous track image has been sent to the secondary disk subsystem. This is done immediately and with the highest priority.
[Figure 7: Collisions in Global Mirror. Data in the consistency group is sent to the B devices using Global Copy; the first update to a track that has not yet been sent causes that track to be sent synchronously to the secondary, and subsequent updates to the track are not affected.]
For most intensive write workloads, such as log volumes, updates are performed in a sequential fashion and the same piece of data is not updated on a regular basis. Even for workloads such as an IMS WADS dataset, the collision for a particular track will only occur once for each consistency group, and so the synchronous overhead is only seen once per consistency group. Considerable analysis was performed on the potential impact of collisions for the IMS WADS dataset, and it was determined that only a very small percentage of the writes would experience collisions; similar to the impact of consistency group formation, this effect would be very small.
5. Recovery process
This section aims to give a high-level overview of the recovery process for a Global Mirror environment. More details are available in the various manuals referred to in the References section.
[Figure 8: Parts of the consistency group formation process where recovery action is required. Action is required when the FlashCopy sequence numbers are equal and there is a mix of revertible and non-revertible volumes.]
If the Global Mirror environment was part way through the FlashCopy process when the failure occurred then this is similar to an in-flight transaction in a database environment. If the commit process has not yet started then we must revert the FlashCopy to back out the consistency group, and if the commit process has started we must complete this process to commit the consistency group.
1) Commit or revert the FlashCopy relationships if required.
The second stage is to recover the environment, enable production systems to be restarted on the B devices and prepare for a potential return to the primary site. This is performed with the following steps:
2) Failover the B devices. This places the B devices in a primary suspended state and allows a resynchronisation of the Global Copy relationship to be performed in order to return to the primary site, assuming the primary disk subsystem has survived.
3) Fast Reverse Restore the FlashCopy relationship with the C devices. This restores the latest consistency group to the B devices and starts a background copy for those tracks that have been modified since the latest consistency group.
4) FlashCopy from the B devices to the C devices to save an image of the last consistency group. This step is optional but preserves an image of the production devices at the recovery point in case it is required.
5) Restart the production systems.
Figure 9 shows the Global Mirror recovery process, with green indicating the location of the restartable production image. A hedged command-level sketch of these recovery steps follows.
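The following DS CLI sequence sketches steps 1 to 4 above. It is illustrative only: the device IDs and volume ranges are hypothetical, the choice between commitflash and revertflash depends on the state reported for the FlashCopy relationships, and the exact parameters should be verified against the DS CLI reference.

# Hypothetical example of the recovery sequence on the secondary disk subsystem
# 1. Commit (or revert) any in-flight FlashCopy sequence
commitflash -dev IBM.2107-7567890 2000
# 2. Failover the B devices (B becomes primary suspended)
failoverpprc -dev IBM.2107-7567890 -remotedev IBM.2107-7512345 -type gcp 2000:1000
# 3. Fast Reverse Restore the last consistency group from the C devices to the B devices
reverseflash -dev IBM.2107-7567890 -fast -tgtpprc 2000:2100
# 4. Optionally FlashCopy B -> C again to preserve the recovery point
mkflash -dev IBM.2107-7567890 -record -persist -nocp 2000:2100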
[Figure 9: Global Mirror recovery process. 1) Pause Global Mirror and suspend Global Copy (checking the revert/commit status), 2) revert or commit the FlashCopy if required and failover to the B devices, 3) Fast Reverse Restore from the journal (C) devices to the host (B) devices; green indicates the location of the restartable production image.]
[Figure 10: Process to take an additional copy (D) for testing, using the B, C and D devices at the recovery site.]
This is similar to the procedure for recovering the Global Mirror environment, with the addition of steps to pause and resume the mirroring plus the creation of the additional test copy.
6. Autonomic behaviour
Global Mirror is designed to handle certain conditions, such as a loss of connectivity, automatically without requiring user intervention.
PPRC paths
If PPRC paths are removed unexpectedly for some reason then the disk subsystem will automatically re-establish these paths when the error situation is resolved. In a z/OS environment, the following messages will be seen on the console indicating that these events have occurred. Similar SNMP alerts are produced for open systems environments.
IEA498I 4602,PRSA53,PPRC-PATH ALL PPRC PATHS REMOVED UNEXPECTEDLY SSID=4600 (PRI)=0175-38711,CCA=02
IEA498I 4100,IMNA76,PPRC-PATH ONE OR MORE PPRC PATHS RESTORED SSID=4100 (PRI)=0175-38711,CCA=00
PPRC pairs
If Global Copy pairs suspend for some reason other than a user command then the disk subsystem will automatically attempt to restart the mirroring for these pairs. As the consistent set of disks is the FlashCopy secondary devices, this will not compromise the integrity at the secondary site. This is different from Metro Mirror, where the resynchronisation is always via command because the Metro Mirror secondaries are the consistent devices.
*IEA491E 4003,PRSA60,PPRC SUSPENDED,COMMUNICATION_TO_SECONDARY_FAILURE, (PRI)=0175-38711,CCA=03(SEC)=0100-63271,CCA=03
IEA494I 4003,PRSA60,PPRC PAIR SUSPENDED,SSID=4000,CCA=03
IEA494I 4601,IMNA02,PPRC PAIR PENDING,SSID=4600,CCA=01
It is possible to disable this behaviour if desired.
Global Mirror session
Once the Global Copy pairs are restarted and have resynchronised, Global Mirror will resume the formation of consistency groups unless doing so might result in inconsistent data on the secondary disk subsystem. One example of such a condition is where a communications failure occurs half way through a FlashCopy event. In this case we have to perform a revert/commit action as described in section 4 before restarting Global Mirror, and the Global Mirror session will have entered what is called a Fatal state.
7. Management tools
There are a number of options for management tools when using Global Mirror. In a System z environment inband management is available, while in an Open Systems environment out-of-band management over TCP/IP is used.
[Figure 11: Inband and out of band management of Global Mirror.]
Command interfaces for Global Mirror are available in both System z and Open Systems environments, allowing a customer to develop their own automation solution as well as providing problem diagnosis capabilities if a solution offering is used to manage the environment. IBM's disaster recovery and replication management solution offerings have also been extended to provide support for Global Mirror. In most cases, if these solutions fit the requirements of the environment, they are the recommended option as they provide a supported management solution without requiring code to be written specifically for each environment.
The DS CLI requires IP connectivity to the HMC or copy services server of the disk subsystems it is managing, both in the primary site and in the recovery site, for the actions required for recovery or testing.
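As a hedged example of the kind of DS CLI queries that can be used for problem diagnosis (the device ID, LSS and output interpretation are hypothetical examples, not taken from this paper), the state of a Global Mirror session and its consistency group statistics might be checked with:

showgmir -dev IBM.2107-7512345 -metrics 10    # consistency group formation statistics for the master LSS
lssession -dev IBM.2107-7512345 10            # session state and volume membership for LSS 10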
[Figure 12: TPC for Replication environment. An active and a standby TPC-R server, each with TCP/IP connectivity to the disk subsystems, manage the Global Mirror relationship between the H1 volumes and the H2, J2 and F2 volumes at the recovery site.]
TPC-R supports a number of different configurations for Global Mirror, including the creation of an additional testing copy for disaster recovery testing and the ability to return to the production site after it has become available again.
As well as managing the operational aspects of Global Mirror, GDPS/GM also provides facilities to restart System z production systems in the recovery site. Its scripting facilities provide a complete solution for the restart of a System z environment in a disaster situation without requiring expert manual intervention to manage the recovery process.
[Figure 13: GDPS/GM environment. The GDPS K-sys at the production site and the GDPS R-sys at the recovery site, each with TCP/IP connectivity to the disk subsystems, manage the Global Mirror relationship to the B and C volumes, with an additional FC1 copy available for testing.]
GDPS provides the capability to use an additional set of devices on the remote site for testing purposes. GDPS supports both System z and Open Systems devices in a Global Mirror environment. However, GDPS requires that the disk subsystems be shared between the System z and open systems environments, as it requires CKD device addresses in order to issue the commands to manage the environment.
[Figure 14: PowerHA for i environment. Two systems, S1 and S2, each with their own SYSBASE, with Global Mirror replicating the independent ASP (iASP) between the sites under HASM control.]
For earlier versions of i5/OS the Copy Services Toolkit provides support for Global Mirror. This also supports the use of Global Mirror to replicate a full system environment if not using independent ASPs.
8. Solution scenarios
This section discusses a number of scenarios and topologies for implementation of Global Mirror.
[Figure 15: Asymmetrical Global Mirror environment. Global Mirror runs from the production (A) volumes to the B volumes, with FlashCopy to the C volumes, at the recovery site only.]
Once production workloads have been moved to the recovery site, Global Copy must be used to return to the primary site. As no disaster recovery capability would be provided in the reverse direction, it is unlikely that in this type of configuration we would choose to run for extended periods of time in the secondary location unless forced to by unavailability of the primary site.
[Figure 16: Return to the primary site with an asymmetrical Global Mirror environment. Global Copy is used in the reverse direction from the B volumes back to the primary site.]
As Global Mirror uses two copies of data in the secondary location, there would be twice as many physical drives in this location as in the production location if the same size drives were used. In some situations, it may be cost effective to use larger drives in the secondary location. Spreading the production data over all these drives should provide equivalent performance in a disaster situation while reducing the overall cost of the solution.
[Figure 17: Symmetrical Global Mirror configuration. A and F volumes at the production site, B and C volumes at the recovery site.]
As we have FlashCopy capacity in both sites, it is possible to provide a disaster recovery solution using Global Mirror in both directions between the two sites. This type of configuration would typically be used where the production workloads might run for extended periods of time in either location.
[Figure 18: Running in the secondary location with the symmetrical configuration. Global Mirror is run in the reverse direction from the recovery site back to the original production site, which holds the A and F volumes.]
[Figure 19: Testing without an additional copy of data, using the consistent image for DR that is held at the recovery site.]
In order to perform testing while the Global Mirror environment is running we need to provide an additional copy of data in the recovery location. Global Mirror is briefly suspended in order to take this copy and then can be restarted while the testing takes place on this extra copy.
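As a rough illustration only (the device IDs, volume ranges and session details are hypothetical, and the exact procedure, including the source of the D copy and any revert/commit checking, should follow the recovery steps in section 5 and the product documentation), the pause, copy and resume might look like:

pausegmir -dev IBM.2107-7512345 -lss 10 -session 01
mkflash -dev IBM.2107-7567890 -nocp 2000:2200    # take the additional D copy at the recovery site
resumegmir -dev IBM.2107-7512345 -lss 10 -session 01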
[Figure 20: Testing with Global Mirror. An additional D copy at the recovery site is used for testing while Global Mirror continues to the B and C volumes.]
[Figure 21: Cascading configuration with Global Mirror, combining Metro Mirror between the local copies with Global Mirror to the remote site.]
9. Performance considerations
This section contains information on the performance aspects of Global Mirror.
[Figure 22: Production workload profile over a 24-hour period and Global Mirror bandwidth options.]
With Global Mirror, the production workloads will continue without significant impact during the peak period and the Global Mirror solution will transition automatically to Global Copy mode when a consistency group is not transmitted to the secondary site within the maximum drain time. When the peak activity has passed and Global Copy is once more able to drain the changed data in a timely manner the disk subsystems will transition automatically back to Global Mirror and resume the formation of consistency groups. At all points, a consistent restartable image of the production data will be available in the recovery location.
[Figure 23: Percentage of production writes sent by Global Mirror.]
The graph in Figure 23 shows the MB/s sent by Global Mirror compared to the production write activity for a particular customer environment over a period of a day. It also shows the percentage of the production writes that were sent by Global Mirror. The data is sorted by host activity from high to low. In this case, the environment is not bandwidth constrained and the RPO is generally of the order of a few seconds. Even so, we can see that a reasonable percentage of the production writes do not need to be transmitted to the secondary location. Another factor is that when the activity is higher the percentage savings are actually lower, because this workload has a higher proportion of sequential activity at those times. Sequential updates will not generally be duplicated, and so the data that is written to the disk subsystem will have to be sent by Global Mirror.
With Global Mirror there is no requirement to suspend the mirror (requiring manual intervention to resolve) or to impact production if the resources available are not sufficient. In normal operation, the data sent by Global Mirror is still held in the read cache of the primary disk subsystem, as it is sent before being prioritised out of cache by the cache management algorithms. If Global Mirror falls behind in its transmission of data to the secondary disk subsystem then the data that needs to be sent may have to be read from the RAID ranks on the primary disk subsystem. This activity is performed at a lower priority than production IO, so production performance will be protected at the expense of the speed of data transmission to the secondary disk subsystem. As well as improving the performance of the production workloads, having a balanced production workload across the available primary disk subsystem resources will improve the capability of Global Mirror to catch up when it falls behind and limit the increase in the RPO.
[Figure 24: RPO of Global Mirror at very long distances. RPO plotted against link latency (ms) for pre-R2.4 and R2.4 code levels.]
In order to send large amounts of data over long distances, a significant degree of parallelism is required to ensure that the bandwidth can be filled. If there is not enough parallelism then the throughput will be reduced, as more and more time is spent waiting for acknowledgements indicating that data has been received at the remote location. For example, with a 100ms round trip time a single outstanding 64KB transfer can achieve at most around 0.64 MB/s, so a large number of concurrent operations is required to fill a high-bandwidth link. However, at shorter distances the same degree of parallelism may be counter-productive.
Global Mirror provides an extreme distance RPQ for environments where the distance/latency between the local and remote sites is over 5000km. The effect of distance is felt more for smaller updates, as the bandwidth is less likely to be the bottleneck than it is for larger updates. Figure 25 shows the results of some extreme distance testing for 4KB random writes, both with the default Global Mirror settings and with the changes provided by the extreme distance RPQ.
[Figure 25: Throughput of Global Mirror at very long distances. Throughput plotted against link latency (ms, where 1ms is approximately 100km) for the default settings and the extreme distance RPQ.]
Note: Global Mirror will send multiple updates to the same track in a single operation, so for write streams such as a database log the number of writes from the server will be significantly higher than the number of individual updates that must be sent by Global Mirror. On open systems, 16 4KB writes to a database log could be sent in a single operation.
[Figure 26: Copy on write processing.]
In normal operation, when consistency groups are being formed every 3-5 seconds, the copy on write processing does not occur as we do not need to perform the destage before the next consistency group occurs. However, when Global Mirror falls behind or is catching up there may be additional activity on the secondary RAID ranks. It is possible that in some environments an unbalanced workload may result in bottlenecks on the RAID ranks of the secondary disk subsystems. With features such as Arrays Across Loops and the increased RAID rank bandwidth of switched fibre channel disks, the chance of this occurring is much reduced. However, there are a number of optimisations that can be made to improve the performance of the copy on write processing by following some simple rules for the placement of devices on the secondary disk subsystem.
If we have an additional D copy for testing purposes then we might choose to place the D volumes on the same ranks as the B and C volumes. Alternatively, if we intend to perform heavy activity such as backup processing or stress testing, we might dedicate a rank to the D volumes in order to protect the Global Mirror environment from this workload.
[Figure 28: Options for placement of D volumes.]
A customer might also choose to implement smaller, faster drives for the B volumes but larger drives for the C and D volumes in order to take advantage of the cost efficiencies in doing so. This does result in the ranks containing the B volumes having a higher workload than the larger ranks and so might not be
recommended for optimal performance. Using a single drive size and spreading all the devices over all drives is generally a better approach. If two drive sizes are used, a better configuration is to spread the B and C volumes over both sets of drives for optimal performance of both Global Mirror and the production workloads, with the D volumes then utilising the additional space on the larger drives.
[Figure 29: Volume layout with two drive sizes on the secondary disk subsystem.]
There have also been further optimisations to FlashCopy that enhance the copy on write processing when the FlashCopy source and target are managed by the same storage server processor complex (cluster) on the disk subsystem. This means that the recommendation is to place a particular FlashCopy source and target both on odd LSSs or both on even LSSs.
10. References
This section contains references to other publications and websites containing further information on Global Mirror.
10.2 Websites
IBM Storage Business Continuity website: http://www-03.ibm.com/systems/storage/solutions/business_continuity/index.html
GDPS website: http://www-03.ibm.com/servers/eserver/System z/resiliency/gdps.html
TPC for Replication website: http://www-03.ibm.com/systems/storage/software/center/replication/index.html
IBM PowerHA website: http://www-03.ibm.com/systems/power/software/availability/index.html
System i Copy Services Toolkit website: http://www-03.ibm.com/systems/i/support/itc/copyservicessystemi.html