ReplicationServerNotes PDF
Table of Contents
Document Revision 1.5
Introduction & Disclaimer
Repserver Components
More Detailed Look at the Components
Examine replication environment
Repserver BASICS
General Install
Table Defs Install
Warm Standby Install
Warm Standby Switch over
Database (MSA) repdef
Manually set up connections
setup primary db's for rep
Function Repdefs (stored procedure replication)
Replication Tuning Notes
Golden rules
Find Bottlenecks
Configure the rep agent to trace LTL--write output to a trace file (not to ASE log)
Turn on Rep Agent tracing and DSI/function string tracing
Turn off Rep Agent tracing and DSI/function string tracing
Tuning
Tuning RSSD
Tuning Replicate DB
Tuning DSI
Monitor Counters
Not requiring Setup
Requiring Setup
Disaster Recovery Notes
Recover from reloading Primary Database
Skipping transactions
Stop Replication
Replaying Transaction Logs
Rebuild a Stable Device - with tran log
Rebuild a Stable Device - without tran log
Restore the RSSD from backup
General Troubleshooting
Stable Queue Full
Ignoring duplicate keys when we have a lot, use error class!
Reverse Engineering an Error Class
HowTo determine the error class configured for a connection
Displays all Replication Server configuration parameters.
Determine Latency
Dropping Subscriptions Fast
Detecting loss
Repserver Trace Flags
Configure the rep agent to trace LTL--write output to a trace file (not to ASE log)
Turn on Rep Agent tracing and DSI/function string tracing
Turn off Rep Agent tracing and DSI/function string tracing
Appendix A Shell scripts
rs_checkreplag.ksh
www.ddsafe.co.uk 2
Version 1.5 Page 3 03/05/2013
sp__queueinfo
Appendix B troubleshooting
Uninstall repserver program
Logical Connection will not Drop
Repserver Components
====================
SQM (Stable Queue Manager) manages inserts and deletes on a stable queue and prevents
duplicates. There is one SQM per queue.
Inbound queue. Holds transactions from the LTM. 'admin who, sqm' shows these, e.g. 456:1;
the ':1' means inbound.
Outbound queue. Holds transactions to be replicated. 'admin who, sqm' shows these, e.g.
457:0; the ':0' means outbound. There are 2 types of outbound queue: Data Server
Interface (DSI) and Replication Server Interface (RSI), which is used across routes.
SQT (Stable Queue Transaction Manager) ensures queues are accessed in a transactional
manner. SQT has 4 queues:-
* Open queue holds transactions until a commit or rollback is read from the LTM.
* Closed queue holds completed transactions.
* Read queue holds data that has been read from the Closed queue. Once a receipt for
  the transaction has been received, the transaction is removed from the queue.
* Truncation queue holds the begin tran record. This queue is used to determine which
  transactions can be deleted.
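The queue numbering described above can be checked mechanically when scanning 'admin who, sqm' output. The following is a minimal sketch only; the function name queue_direction is hypothetical, and it relies solely on the ':1' inbound / ':0' outbound convention stated in these notes.

```shell
#!/bin/sh
# Classify a stable queue identifier as shown by 'admin who, sqm'.
# Convention taken from the notes: suffix :1 = inbound, suffix :0 = outbound.
# queue_direction is a hypothetical helper name, not a Sybase utility.
queue_direction() {
    case "$1" in
        *:1) echo "inbound" ;;
        *:0) echo "outbound" ;;
        *)   echo "unknown" ;;
    esac
}

# Example calls using the queue ids quoted in the notes
queue_direction "456:1"
queue_direction "457:0"
```

Anything that does not end in ':0' or ':1' is reported as unknown rather than guessed.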
rs_helpdb -- in RSSD
rs_helproute -- in RSSD
rs_helpsub -- in RSSD, details table subscriptions
rs_helpdbsub -- in RSSD
rs_helppubsub -- in RSSD, if using publications
rs_helpdbpub -- in RSSD, details publication subscriptions, articles and subscribers
rs_helpuser -- in RSSD
Example:-
admin config, "connection", <servername>, <dbname>, dsi_quoted_identifier
#-----------------------------------#
Repserver BASICS
#-----------------------------------#
If you use rs_init to configure replication and it fails, you can sometimes get more
information out of the rs_init log files. These are located at
$SYBASE/$SYBASE_REP/init/logs
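When rs_init fails, the most recent log file is usually the one of interest. A small sketch for finding it follows; the helper name latest_init_log is hypothetical, and the default directory is simply the path given above (it assumes SYBASE and SYBASE_REP are set in the environment).

```shell
#!/bin/sh
# Print the most recently modified file in the rs_init log directory.
# latest_init_log is a hypothetical helper; pass a directory to override
# the default location quoted in the notes.
latest_init_log() {
    dir="${1:-${SYBASE:-}/${SYBASE_REP:-}/init/logs}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    # ls -t sorts newest first; keep only the first entry
    newest=$(ls -t "$dir" | head -n 1)
    if [ -n "$newest" ]; then
        echo "$dir/$newest"
    fi
}

# Usage (not run here): tail -50 "$(latest_init_log)"
```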
General Install
Use rs_init to install repserver & set up the RSSD. Create stable queue files first
(using touch). Once this is complete, you need to add connections to the primary and
replicate dataservers and databases. See the sections below on how to do this. To use
the GUI (rs_init), create a rep maint user in the DB using sp_adduser. Remove this later
and add as alias to dbo using sp_addalias.
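The "create stable queue files first (using touch)" step can be wrapped in a small check so that a missing directory is caught before rs_init is started. This is a sketch under assumptions: prepare_partition_file is a hypothetical helper name and the example path is illustrative only.

```shell
#!/bin/sh
# Create an empty stable queue file with touch and confirm it is writable.
# prepare_partition_file is a hypothetical helper, not part of rs_init.
prepare_partition_file() {
    file=$1
    dir=$(dirname "$file")
    if [ ! -d "$dir" ]; then
        echo "missing directory: $dir" >&2
        return 1
    fi
    touch "$file" || return 1
    if [ -w "$file" ]; then
        echo "ready: $file"
    else
        echo "not writable: $file" >&2
        return 1
    fi
}

# Usage (not run here; path is an example):
#   prepare_partition_file /usr/replication/queue1.dat
```

Run it as the OS user that will own the Replication Server process, since that user needs write access to the file.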
In RS
=====
1> create replication definition prim_tab1_repdef with primary at SRV01_ASE.pdb1
2> with all tables named prim_tab1 (a int, b char(10)) primary key (a)
3> go
Alter repdef
============
** This also fixes the subscription automatically
alter replication definition prim_tab1_repdef
add c char(10) null
Testing
=======
declare @cnt int
declare @b_val char(10)
declare @c_val char(10)
select @cnt=2
while @cnt<10
BEGIN
select @b_val='test' + convert(char(5), @cnt)
select @c_val='test' + convert(char(5), @cnt+@cnt)
INSERT INTO pdb1..prim_tab1(a,b,c) values (@cnt, @b_val, @c_val)
select @cnt=@cnt+1
END
Logins
------
sp_addlogin warmsby_maint, thisisapassword
go
sp_role 'grant', replication_role, warmsby_maint
go
USE warmsby
go
sp_addalias 'warmsby_maint','dbo'
go
Sync syslogins (make sure that warmsby_maint is on both ASE servers)
* BCP OUT/IN syslogins between servers
sp_setreplicate rs_marker,"true"
go
sp_setreplicate rs_update_lastcommit,"true"
go
Dump'n'Load databases
---------------------
Immediately dump the Active database and load it into the Standby database.
Make sure "warmsby_maint" has SELECT, DELETE, etc. permissions on the Standby
database
or
use warmsby_copy
go
sp_dropuser 'warmsby_maint'
go
sp_addalias 'warmsby_maint', 'dbo'
go
In RS
-----
resume connection to SRV2.warmsby_copy
go
In RS
-----
isql -Uuser [-Syourrepserver]
--To switch over to the warm standby server
admin logical_status
go
--switch active for <logicalserver.logicaldb> to <wsserver.wsdb>
switch active for logical_srv.logical_db to SRV2.warmsby_copy
go
admin logical_status
go
In RS
-----
resume connection to SRV2.warmsby_copy
go
Note: if the old primary database has been shut down or is no longer contactable, its
logical status will remain as Suspended/Waiting for Enable Marker until it is
fixed. Once the server comes back online, resume the connection and Operation in
Progress will go back to None.
In PDB
=======
sp_reptostandby $DBNAME,"all"
sp_config_rep_agent pdb1, 'send warm standby xacts', 'true'
In RS
=====
1> create database replication definition pdb1_dbrepdef
2> with primary at SRV1_ASE.pdb1
3> replicate ddl
4> replicate functions
In RDB
======
To avoid any permission issues in replicate DB
Use test_rep_db
go
sp_addalias 'test_rep_db_maint','dbo'
go
At this point the live database should be dumped and loaded into replicate database.
When the dumps have completed, resume the connection to the standby sites.
In PRS
======
resume connection to SRV1_ASE.test_rep_db
go
--If your maint user is not dbo in the replicate db, then execute this in the RDB
grant execute on sp__testtable1_insert to maint_user
RS
==
Applied function = sp is executed by the maint user
Request function = sp is executed by the same user who executed the SP at the primary database
'create function replication definition' is deprecated in repserver 15; use the applied or
request form instead.
-- create subscription
create subscription sp__testtable1_insert_sub
for sp__testtable1_insert_repdef
with replicate at SRV2_ASE.test_rep_db
without materialization
go
check subscription sp__testtable1_insert_sub
for sp__testtable1_insert_repdef
with replicate at SRV2_ASE.test_rep_db
go
-- TESTING in PDB--
sp__testtable1_insert 'gary', 1234
go
Golden rules
==================
1. Never have repdefs, which are not subscribed to. All transactions on replicated
tables are sent to the Inbound Queue (IBQ), sorted into commit order and translated to
Log Transfer Language(LTL). Only then are they checked for subscriptions. This results
in wasted space in the IBQ and processing by the SQT manager.
2. Make sure SQT has enough memory allocated. Also, check memory_limit
configure replication server set sqt_max_cache_size to 'xxxxx'
Find Bottlenecks
=======================
select * from master..syslogshold --check for large uncommitted transactions.
Measure diff between repagent position and end of log (1TP & 2TP)
-----------------------------------------------------------------
--rep agent - value of 'Current Marker' column, example (53550,1)
sp_help_rep_agent <db_name>
-- read until end of log
dbcc traceon(3604)
dbcc pglinkage(<dbid>, <current_marker>, 0,2,0,1)
example: dbcc pglinkage(5, 53550, 0,2,0,1)
example output: "3909 pages scanned"
-- So repagent is 3909 pages behind log truncation marker.
-- We should have very little lag!
(see rs_checklag.ksh in
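The two text-handling steps in the lag measurement above (taking the page number out of the 'Current Marker' value and taking the page count out of the dbcc pglinkage output) can be sketched as shell helpers. The helper names are hypothetical, and the 2 KB default page size in pages_to_kb is an assumption; pass the server's real page size as the second argument.

```shell
#!/bin/sh
# Extract the page number from a Current Marker value such as "(53550,1)".
marker_page() {
    echo "$1" | sed -e 's/[()]//g' | awk -F',' '{print $1}'
}

# Read dbcc pglinkage output on stdin and print the number of pages scanned,
# e.g. "3909 pages scanned" -> 3909.
pages_scanned() {
    grep 'pages scanned' | awk '{print $1}'
}

# Convert a page count to KB. Second argument is the page size in KB;
# the default of 2 is an assumption, not something stated in the notes.
pages_to_kb() {
    echo $(( $1 * ${2:-2} ))
}

# Example using the values quoted in the notes
marker_page "(53550,1)"
echo "3909 pages scanned" | pages_scanned
pages_to_kb 3909
```

Expressing the lag in KB rather than pages makes it easier to compare servers with different page sizes.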
Repserver Trace Flags
The following Rep Server traceflags will track the commands being written to the stable
queue, and being passed to the Replicate dataserver.
Both the module and the trace flag can be in either upper or lower case.
Replication Server accepts trace flags from the config file. The syntax is
trace=module,trace_flag
Keep in mind that these will trace ALL commands, so will produce large amounts of output.
sp_who
go
--get spid of RA
dbcc pss
dbcc stacktrace (<spid>)
Tuning
======
Tuning Primary DB
-----------------
sp_help_rep_agent <db_name>, 'config'
sp_config_rep_agent <db_name>, 'scan batch size', '10000' --max num records sent to RS
sp_config_rep_agent <db_name>, 'batch ltl', 'true' --LTL cmds batched up then sent to RS
sp_config_rep_agent <db_name>, 'send buffer size', '16k' -- network packet size
sp_config_rep_agent <db_name>, 'priority', '2' --default is 5. lower=higher priority
WARNING: making changes to the rep agent can cause a warm standby connection to fail if the
replicate DB name is different. Recovery requires a 'resume connection ... skip transaction',
and the config changes must be repeated at the replicate's rep agent.
Tuning RSSD
-----------
sp_config_rep_agent <db_name>, 'priority', '2' --RSSD can have its own repagent
Put on same machine as RS.
use 'localhost <port>' in interfaces file for ASE and RS
example:
REP1_RS
master tcp ether localhost 10010
master tcp ether <server> 10010
query tcp ether <server> 10010
Tuning Replicate DB
-------------------
change maint user priority in ASE
drop referential integrity checks (foreign keys)
use func. strings instead of triggers.
Tuning DSI
----------
Increase replicate-ASE no. of locks
dsi_max_xacts_in_group
alter connection to RDS.rdb set db_packet_size to 'xxx'
switch on replicate minimal columns --use all columns if replicating to non-Sybase DB
Use parallel DSI threads (do not do this lightly):-
parallel_dsi (sets standard values on multiple settings below)
dsi_num_threads
dsi_serialization_method
{none|wait_for_commit|isolation_level_3|single_transaction_per_origin}
dsi_sqt_max_cache_size
dsi_large_xact_size
dsi_num_large_xact_threads
dsi_partitioning_rule
Monitor Counters
========================
Requiring Setup
-------------
select * from rs_statdetails, rs_statrun
setup:
set stat_sampling to 'on'
admin stats_intrusive_counter, 'on'
stats_flush_rssd to on
stat_reset_afterflush to on
stat_daemon_sleep_time to '600'
admin stat_config_module, 'all_modules', 'on'
admin stat_config_connections
admin statistics, flush_statistics
See White paper: "Sybase Replication Performance and Tuning" by Jeff Tallman
http://my.sybase.com/detail?id=1015811
**If only a few tables are out of sync, you can use the Sybase command-line utility
rs_subcmp
Skipping transactions
---------------------
--If we encounter a duplicate insert error
#on RS:
resume connection to <ase>.<rdb> skip transaction
#on RSSD:
--find transaction id
rs_helpexception
--get SQL
rs_helpexception <tran_id>, v
Stop Replication
----------------
#on pdb:
select * from master..syslogshold where dbid=db_id(<pdb>)
go
sp_stop_rep_agent <pdb>
go
dbcc settrunc(ltm, ignore)
go
allow connections
go
-- Method shows the use of temporary database to hold database.
create database called 'temp_rep' then configure for replication.
use temp_rep
go
exec sp_config_rep_agent temp_rep, 'enable', '<RS>', 'sa', '<passwd>'
go
use master
go
load database temp_rep from '<dump_file>'
go
-- the "connect database" refers to <pdb>
exec sp_start_rep_agent temp_rep, recovery, '<ase>', '<pdb>', '<RS>'
go
--Once complete, RepAgent will shutdown
--Now repeat these steps for each tran. log. Load and start RepAgent.
--** Check replication Server errorlog for any messages about "loss detection". If none
found...
--restart RS in normal mode.
#on pdb
--put back 2TP
dbcc settrunc(ltm, valid)
go
sp_start_rep_agent <pdb>
go
--drop temp_rep!
General Troubleshooting
Stable Queue Full
Double check queue is full
In RSSD
=======
rs_helppartition
In RS
-----
suspend connection to server1.pdb
resume connection to server1.pdb
touch /usr/replication/queue10.dat
The problem with this approach is that if there are a lot of duplicate keys, not only
could you be sitting for a while skipping the transactions, you also run the risk of
skipping a transaction that isn't a duplicate key, for example if someone deleted the
table on the replicate database. You could easily make a mess of things if you
arbitrarily skip transactions.
Replication Server has a feature called error classes, which let you define the course of
action if an error occurs on a DSI connection. The only real issue is granularity: the
finest level is a single DSI connection and the coarsest is all replicated systems of a
given DBMS type (i.e. ASE). To create an error class:
The error classes can be inherited, so if you wanted one error class to ignore duplicate
keys and another to stop replication on a duplicate key, you would do something like so:
Sybase ASE's error number for a duplicate key is 2601, but ASE will also raise the 3621
(aborted transaction) error. We need to set the error class ASEallowdupsErrorClass to
ignore duplicate keys:
Now that we've created the error class and set it to ignore duplicates, we need to do two
last things:
* alter the DSI connections to use the new error class
* suspend and then resume the DSI connections so the DSIs use the new error class
Generally, applications should not be entering the same data in more than one of the
replicated databases, since propagating it is what Replication Server is for.
First file
==========
select ds_errorid, action=v.name
from rs_erroractions e, rs_classes c, rs_tvalues v
where e.errorclassid=c.classid
and e.action=v.value
and v.type='ERR'
and c.classname='rs_sqlserver_error_class'
order by 1
go
Second File
===========
select ds_errorid, action=v.name
from rs_erroractions e, rs_classes c, rs_tvalues v
where e.errorclassid=c.classid
and e.action=v.value
and v.type='ERR'
and c.classname='ASEallowdupsErrorClass'
order by 1
go
Now do a diff against these files and any different codes will be displayed. To find
out what the codes are, in RSSD
rs_helperror 2601, v
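The "diff against these files" step can be sketched as below, assuming the output of the two queries above has been saved to two text files. compare_error_actions is a hypothetical helper name, and the file names and sample rows in the usage comment are made up for illustration.

```shell
#!/bin/sh
# Show only the rows that differ between two saved error-action listings.
# Lines prefixed '<' come from the first file, '>' from the second.
# compare_error_actions is a hypothetical helper name.
compare_error_actions() {
    # diff exits with status 1 when the files differ and grep exits with 1
    # when nothing matches; neither is an error for our purposes.
    diff "$1" "$2" | grep '^[<>]' || true
}

# Usage (not run here; file names are examples):
#   compare_error_actions default_class.out allowdups_class.out
# A differing row would appear as, for example:
#   < 2601 stop_replication
#   > 2601 ignore
```

Each error number printed this way can then be looked up with rs_helperror as shown above.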
Row count mismatch for the command executed on dataserver.database. The command
impacted x rows but it should impact y rows.
rs_init_erroractions composer_repserver_error_class,rs_repserver_error_class
go
--you will see the following rows inserted into rs_erroractions
-- 5185 0x010000650100006b 3 16777317
-- 5186 0x010000650100006b 2 16777317
-- 5187 0x010000650100006b 3 16777317
-- 5193 0x010000650100006b 2 16777317
-- rs_helpdb now shows connection with new Rep Server Error Class:-
--
dsname dbname dbid controlling_prs
errorclass repserver_errorclass funcclass
status
Determine Latency
RDB
===
select PrimaryDBID = origin,
Latency_sec = datediff(ss, origin_time, dest_commit_time),
LastXactOriginTime = origin_time
from rs_lastcommit where origin > 0
go
In PDB
======
Use <pdb>
go
sp_config_rep_agent <pdb>, 'disable' --check master..syslogshold to confirm
go
In RSSD
=======
delete from rs_subscriptions where subname='<subname>'
go
delete from rs_dbreps where dbrepname='<db_repdef_name>'
go
Detecting loss
Sometimes replication stops without an error. This could happen after a restore of the
primary database. If message loss occurs, we will not always see it using admin who,
and repserver might not print a detecting loss message to the errorlog. Check
rs_oqid and rs_exceptslast in the RSSD to see if any of the queues show a status of
2, which indicates that the queue is suspended due to lost messages.
If repserver has not correctly recognised that loss has occurred, then before repserver
can be told to ignore the loss, we must get it to detect it. Restart repserver and
check the errorlog for the message:
DSI: detecting loss for database
In RS
====
Ignore loss from prim_server.prim_db
go
>go
CURRMARKER=`cat $OUTFILE | awk '{print $4}' | sed -e 's/(//g' -e 's/)//g' | awk -F',' '{print $1}'`
rm $OUTFILE
# Now work out the pages scanned between 1TP & 2TP
isql -U$USERNAME -S$TRGSRV -D$DBNAME -w1024 <<-EOF | egrep -v "Password:|return status" > $OUTFILE
$PWD
set nocount on
dbcc traceon (3604)
go
declare @dbid_num int
select @dbid_num=db_id('$DBNAME')
dbcc pglinkage(@dbid_num, $CURRMARKER, 0,2,0,1)
go
EOF
PAGESCANS=`cat $OUTFILE | grep 'pages scanned' | awk '{print $1}'`
rm $OUTFILE
sp__queueinfo
create proc sp__queueinfo
as
set nocount on
declare @total varchar(10),
@free varchar(10),
@freeperc varchar(10),
@repserver varchar(30),
@datetime varchar(20)
from <rssd_dbname>..rs_diskpartitions
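The body of sp__queueinfo is incomplete in these notes. Its variable names (total, free, free percentage) suggest a free-space calculation over the stable device partitions; as a sketch under that assumption, the arithmetic can be done in shell once the total and used segment counts have been read from the RSSD. queue_free_percent is a hypothetical helper, not part of the original procedure.

```shell
#!/bin/sh
# Print the percentage of stable queue space that is still free.
# Arguments: total segments, used segments (integers).
# queue_free_percent is a hypothetical helper; the original proc body
# is not available, so this only illustrates the arithmetic.
queue_free_percent() {
    total=$1
    used=$2
    if [ "$total" -le 0 ]; then
        echo "total must be greater than zero" >&2
        return 1
    fi
    # Integer arithmetic: result is rounded down to a whole percent
    echo $(( (total - used) * 100 / total ))
}

# Example: 1000 segments in total, 250 in use
queue_free_percent 1000 250
```

A monitoring script could alert when the value drops below a chosen threshold, which gives warning before the Stable Queue Full condition described earlier.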
Appendix B troubleshooting
Uninstall repserver program
If you want to trash your repserver and start over again, you may find that
it will not uninstall. If that is the case, follow these instructions
Server 'AGSIT_DB_REP_RS':
Can not drop logical connection to COMPOSER_DS.SIT_Composer because either subscriptions
of repdefs exist for it"
Check
select * from rs_databases
select * from rs_object
If dist_status or src_status in rs_databases is greater than 1, this indicates an issue.
The status of the connection can be any of the following:
0x1 valid
0x2 suspended
0x4 suspended by a standby-related action
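The status values above are bit flags, so a single dist_status or src_status can combine several of them. A minimal sketch for decoding a value follows; decode_status is a hypothetical helper, and only the three bits listed above are named. Any other bits are reported numerically rather than interpreted.

```shell
#!/bin/sh
# Decode a dist_status / src_status value from rs_databases into the
# flags listed in the notes. decode_status is a hypothetical helper.
decode_status() {
    status=$1
    out=""
    if [ $(( status & 1 )) -ne 0 ]; then out="$out valid"; fi
    if [ $(( status & 2 )) -ne 0 ]; then out="$out suspended"; fi
    if [ $(( status & 4 )) -ne 0 ]; then out="$out suspended-by-standby"; fi
    # Bits beyond 0x1, 0x2 and 0x4 are not documented in these notes
    other=$(( status & ~7 ))
    if [ "$other" -ne 0 ]; then
        out="$out other=$(printf '0x%x' "$other")"
    fi
    # Strip the leading space left by the concatenation above
    echo "${out# }"
}

# Example: decode the value 17
decode_status 17
```

For the value 17 this reports the valid bit plus an additional bit (0x10) whose meaning is not covered here.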
Example:-
1> select * from rs_databases
2> go
dsname dbname dbid dist_status src_status attributes errorclassid funcclassid prsid
rowtype sorto_status ltype ptype ldbid enable_seq rs_errorclassid
------------------------------ ------------------------------ ----------- ----------- ---
------- ---------- ------------------ ------------------ ----------- ------- ------------
----- ----- ----------- ----------- ------------------
AGSIT_DB_REP_ASA AGSIT_DB_REP_ASA 101 1 0 0 0x0000000001000002 0x0000000001000001
16777317 0 0 P A 101 0 0x000000000100001a
COMPOSER_DS SIT_Composer 102 17 17 0 0x0000000000000000 0x0000000000000000 16777317 1 0 L
L 102 0 0x0000000000000000
rs_drp0x0 is an internal repdef which belongs to 102. You can manually delete it, then issue
drop logical connection.