


The most powerful engine for your analytics!

EXAClusterOS 6.0.6 Reference



Table of Contents
1. Introduction ............................................................................................................................... 1
1.1. System structure .............................................................................................................. 1
1.1.1. PID namespaces .................................................................................................... 1
2. Core Daemon ............................................................................................................................. 3
2.1. Mission .......................................................................................................................... 3
2.2. Concepts ........................................................................................................................ 3
2.3. Startup ........................................................................................................................... 3
3. EXAClusterOS Core Utils ............................................................................................................ 7
3.1. cosexec .......................................................................................................................... 7
3.2. cosps ............................................................................................................................. 8
3.3. cosadd ............................................................................................................................ 8
3.4. coskill ............................................................................................................................ 9
3.5. coskillall ......................................................................................................................... 9
3.6. cosmod .......................................................................................................................... 9
3.7. cosmv .......................................................................................................................... 10
3.8. cosrm ........................................................................................................................... 10
3.9. cosstop ......................................................................................................................... 11
3.10. cos-timeout-start ........................................................................................................... 11
3.11. coswait ....................................................................................................................... 11
3.12. hddident ...................................................................................................................... 12
4. DWAd .................................................................................................................................... 13
4.1. Mission ........................................................................................................................ 13
4.2. Startup ......................................................................................................................... 13
4.3. User interface ................................................................................................................ 13
4.3.1. dwad_client ........................................................................................................ 13
4.4. Design .......................................................................................................................... 18
4.5. Recovery mechanisms ..................................................................................................... 18
4.5.1. Process failures .................................................................................................... 18
4.5.2. Node crashes ....................................................................................................... 19
4.6. Network splits ................................................................................................................ 19
4.7. Additional information .................................................................................................... 19
5. Loggingd ................................................................................................................................. 21
5.1. Mission ........................................................................................................................ 21
5.2. Startup ......................................................................................................................... 21
5.3. User interface ................................................................................................................ 21
5.3.1. logd_client .......................................................................................................... 21
5.3.2. logd_collect ........................................................................................................ 21
6. Lockd ..................................................................................................................................... 23
6.1. Mission ........................................................................................................................ 23
6.2. Startup ......................................................................................................................... 23
7. StorageD ................................................................................................................................. 25
7.1. User interface ................................................................................................................ 25
7.1.1. csinfo ................................................................................................................ 25
7.1.2. cslabel ............................................................................................................... 25
7.1.3. csvol .................................................................................................................. 26
7.1.4. csctrl ................................................................................................................. 27
7.1.5. csmd .................................................................................................................. 28
7.1.6. csmove ............................................................................................................... 29
7.1.7. csrec .................................................................................................................. 29
7.1.8. csresize .............................................................................................................. 30
7.1.9. cssetio ................................................................................................................ 31
7.1.10. cssnap .............................................................................................................. 31
7.1.11. csconf .............................................................................................................. 32
8. Management ............................................................................................................................ 33
8.1. Logging ........................................................................................................................ 33

9. EXAoperation .......................................................................................................................... 37
9.1. Components .................................................................................................................. 37
9.2. Logging ........................................................................................................................ 37
9.3. Permissions ................................................................................................................... 37
9.4. Processes ...................................................................................................................... 41
9.4.1. Installation .......................................................................................................... 41
9.4.2. Booting .............................................................................................................. 41
9.4.3. Restore .............................................................................................................. 42
9.4.4. Restore (Storage) ................................................................................................. 42
9.5. Renaming of databases .................................................................................................... 43
9.6. Update servers ............................................................................................................... 43
9.7. Interfaces ...................................................................................................................... 43
9.7.1. Storage archive volumes ........................................................................................ 44
9.8. Maintenance user ........................................................................................................... 44
9.9. Failover ........................................................................................................................ 44
9.9.1. License server failure ............................................................................................ 44
9.9.2. Power outage and checksum mismatches .................................................................. 44
9.10. Automatic node reordering for Storage databases ................................................................ 45
9.11. Volume restore delay ..................................................................................................... 45
9.12. Using the EXAoperation browser interface ........................................................................ 45
9.12.1. Form "EXASolution Instances" ............................................................................. 45
9.12.2. Form "EXASolution Instance" .............................................................................. 48
9.12.3. Form "EXAStorage" ........................................................................................... 53
9.12.4. Form "EXAStorage Volume Node Information" ....................................................... 55
9.12.5. Form "EXAStorage Node Information" .................................................................. 56
9.12.6. Form "EXAStorage Node Device Information" ........................................................ 58
9.12.7. Form "EXABucketFS Services" ............................................................................ 58
9.12.8. Form "Cluster Nodes" ......................................................................................... 60
9.12.9. Form "Backups Information" ................................................................................ 66
9.12.10. Form "Access Management" ............................................................................... 68
9.12.11. Form "Versions" ............................................................................................... 69
9.12.12. Form "UDF Libraries" ....................................................................................... 71
9.12.13. Form "JDBC Drivers" ........................................................................................ 73
9.12.14. Form "EXACluster Debug Information" ................................................................ 75
9.12.15. Form "Monitoring Services" ............................................................................... 76
9.12.16. Form "Threshold Values" ................................................................................... 77
9.12.17. Form "Network" ............................................................................................... 78
9.12.18. Form "License" ................................................................................................ 82
9.13. EXAoperation Add/Edit Forms ........................................................................................ 83
9.13.1. Form "Create EXACluster Node" .......................................................................... 83
9.13.2. Form "EXACluster Node Properties" ..................................................................... 84
9.13.3. Form "EXACluster Node Disk Properties" .............................................................. 84
9.13.4. Form "Edit EXASolution Instance" ........................................................................ 85
9.13.5. Form "Create EXASolution Instance" ..................................................................... 85
9.13.6. Form "EXACluster Logging Service" ..................................................................... 86
9.13.7. Form "Create Remote Volume Instance" ................................................................. 86
9.13.8. Form "Create Jdbc Driver" ................................................................................... 87
9.13.9. Form "EXACluster Jdbc Drivers" .......................................................................... 87
9.13.10. Form "Create EXACluster Route" ........................................................................ 87
9.13.11. Form "EXACluster Route Properties" ................................................................... 87
9.13.12. Form "Create EXACluster Vlan" .......................................................................... 87
9.13.13. Form "EXACluster Vlan Properties" ..................................................................... 87
9.13.14. Form "Create EXACluster Public Vlan" ................................................................ 88
9.13.15. Form "EXACluster Public Vlan Properties" ........................................................... 88
9.13.16. Form "Create EXACluster Ipmi Group" ................................................................ 88
9.13.17. Form "EXACluster Ipmi Group Properties" ........................................................... 88
9.13.18. Form "Create Key Store" .................................................................................... 88
9.13.19. Form "Key Store Properties" ............................................................................... 88


9.13.20. Form "EXACluster Default Disk Configuration" ..................................................... 89


9.13.21. Form "EXACluster System Properties" ................................................................. 89
9.13.22. Form "EXACluster Update Url" .......................................................................... 89
9.13.23. Form "EXACluster Remote Syslog Settings" ......................................................... 90
9.13.24. Form "EXACluster Monitor Thresholds" ............................................................... 90
9.13.25. Form "EXACluster Password Properties" .............................................................. 90
9.14. XML-RPC interface ...................................................................................................... 90
9.14.1. Fetch log messages ............................................................................................. 90
9.14.2. Fetch only new log messages ................................................................................ 91
9.14.3. Get state of a database ......................................................................................... 92
9.14.4. Get connection state of a database ......................................................................... 93
9.14.5. Get current connection string of a database .............................................................. 93
9.14.6. Get current database nodes ................................................................................... 94
9.14.7. Get current operation of a database ........................................................................ 94
9.14.8. Start a database .................................................................................................. 95
9.14.9. Stop a database .................................................................................................. 95
9.14.10. Backup a Storage database ................................................................................. 96
9.14.11. Backup a database ............................................................................................ 97
9.14.12. Upload and activate an iptables firewall configuration. ............................................. 97
9.14.13. Get current firewall configuration. ....................................................................... 98
9.14.14. Startup node .................................................................................................... 98
9.14.15. Shutdown node ................................................................................................. 99
9.14.16. Get hardware information ................................................................................... 99
9.14.17. Get list of cluster nodes .................................................................................... 100
9.14.18. Get current EXAoperation main node ................................................................. 100
9.14.19. Get current EXASuite version ........................................................................... 101
9.14.20. Get list of archive volumes ................................................................................ 101
9.14.21. Get list of volumes .......................................................................................... 101
9.14.22. Get information about volumes .......................................................................... 102
9.14.23. Get list of databases ......................................................................................... 102
9.14.24. Get information about database .......................................................................... 103
9.14.25. Get list of database backups .............................................................................. 104
9.14.26. Get information about backup ............................................................................ 104
9.14.27. Get database statistics ...................................................................................... 105
9.14.28. Start EXAStorage service ................................................................................. 106
9.14.29. Stop EXAStorage service ................................................................................. 107
9.14.30. Get state of EXAClusterOS services ................................................................... 107
9.14.31. Get IPMI sensor status of a node ........................................................................ 108
9.14.32. Get state of a node ........................................................................................... 109
9.14.33. Get disk state(s) of a node ................................................................................. 109
9.14.34. Show list of installed plugins ............................................................................. 110
9.14.35. Call plugin function ......................................................................................... 111
9.14.36. Show plugin functions ..................................................................................... 114
9.15. Libvirt interface for managing cluster nodes ..................................................................... 114
9.16. Hardware Security Modules .......................................................................................... 115
9.17. Disk wipe .................................................................................................................. 115
9.18. Increase disk space ...................................................................................................... 115
9.19. Hugepages ................................................................................................................. 115
9.20. Compatibility and known issues ..................................................................................... 116
9.20.1. Web browsers ................................................................................................... 116
9.20.2. Pulling backups ................................................................................................ 117
9.20.3. Synchronizing backups between EXASuite clusters ................................................. 117
9.20.4. SW-RAID ....................................................................................................... 117
9.20.5. NTP symmetric key exchange ............................................................................. 117
9.20.6. Python and XML-RPC ....................................................................................... 117
9.20.7. Block sizes ...................................................................................................... 117
9.20.8. Enlarging databases ........................................................................................... 118
9.20.9. Uploading multiple backups at once ..................................................................... 118

9.20.10. Remote syslog servers ...................................................................................... 118
10. Installation ........................................................................................................................... 119
10.1. Installation on a license server via installation medium ....................................................... 119
10.2. Automated installation of a license server via network ........................................................ 119
10.2.1. Configuration file settings .................................................................................. 121
10.3. Installation of EXAClusterOS on a bare CentOS server system ............................................ 122
10.4. Client Nodes .............................................................................................................. 123
10.4.1. Client boot process ............................................................................................ 123
10.5. Updates ..................................................................................................................... 124
10.5.1. Updates from version 4.x ................................................................................... 124
10.5.2. Updates and defect nodes ................................................................................... 124
10.6. Downgrades ............................................................................................................... 124
10.7. Add another license server into a cluster .......................................................................... 124
10.8. Add new disks to client nodes without re-installation .......................................................... 125
Glossary ................................................................................................................................... 127


List of Figures
1.1. PID namespaces ....................................................................................................................... 1
9.1. Show foreign database backups ................................................................................................. 42
9.2. Example view: Form "EXASolution Instances" ............................................................................ 45
9.3. Example view: Form "EXASolution Instance" ............................................................................. 48
9.4. Example view: Form "EXAStorage" .......................................................................................... 53
9.5. Example view: Form "EXAStorage Volume Node Information" ....................................................... 55
9.6. Example view: Form "EXAStorage Node Information" .................................................................. 56
9.7. Example view: Form "EXAStorage Node Device Information" ....................................................... 58
9.8. Example view: Form "EXABucketFS Services" ........................................................................... 58
9.9. Example view: Form "Cluster Nodes" ......................................................................................... 60
9.10. Example view: Form "Backups Information" .............................................................................. 66
9.11. Example view: Form "Access Management" .............................................................................. 68
9.12. Example view: Form "Versions" .............................................................................................. 69
9.13. Example view: Form "UDF Libraries" ...................................................................................... 71
9.14. Example view: Form "JDBC Drivers" ....................................................................................... 73
9.15. Example view: Form "EXACluster Debug Information" ............................................................... 75
9.16. Example view: Form "Monitoring Services" ............................................................................... 76
9.17. Example view: Form "Threshold Values" ................................................................................... 77
9.18. Example view: Form "Network" .............................................................................................. 78
9.19. Example view: Form "License" ............................................................................................... 82
9.20. DB RAM and hugepages ...................................................................................................... 116


List of Tables
4.1. Valid parameters ..................................................................................................................... 17
9.1. EXAoperation permissions ....................................................................................................... 39

Chapter 1. Introduction
1.1. System structure

1.1.1. PID namespaces


For better isolation of system processes, EXAClusterOS uses so-called PID namespaces. There are two PID
namespaces on each node: a root namespace with /sbin/init as its init process, and an EXAClusterOS sub-namespace
with cos_cored as its init process. Processes started in the EXAClusterOS namespace therefore cannot leave the
logical EXAClusterOS cluster, not even by calling daemon(). Figure 1.1, “PID namespaces” shows this structure
and some important services of each namespace.

Figure 1.1. PID namespaces
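On a Linux node, namespace membership can be observed through /proc with standard tooling (this is generic Linux behavior, not an EXAClusterOS-specific interface): every process exposes a pid: inode, and two processes report the same value exactly when they share a PID namespace. A process inside the EXAClusterOS sub-namespace would therefore show a different inode than /sbin/init.

```shell
# Print the PID-namespace identity of the current shell.
# Comparing this value with that of another process (e.g. PID 1)
# reveals whether the two processes share a PID namespace.
readlink /proc/self/ns/pid
```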


Chapter 2. Core Daemon


2.1. Mission
The Core daemon is the main cluster service. It runs on every cluster node, and its primary task is to maintain
the cluster infrastructure and provide it to applications executed in the cluster. For this purpose, the Core daemon
uses a membership protocol that detects nodes joining the cluster as well as node failures, so that every node
belonging to the cluster has the same view of the cluster and is able to respond properly to cluster events, such
as node losses.

Furthermore, the Core daemon maintains all processes executed in a cluster. All processes in an EXAClusterOS
cluster are direct or indirect child processes of the Core daemon. They are executed in so-called partitions, so that
every cluster process can be identified by a distinct partition ID and every instance of a distributed cluster process
has an appropriate node ID. Partitions are hierarchical: every partition can have an arbitrary number of sub-
partitions.

Hint: If a cluster process wants to create a new sub-partition, it has to send an appropriate request to the
Core daemon. Processes created by means of the fork()/execve() system calls are identified by the same partition
and node ID as their parent processes.

2.2. Concepts
The Core daemon applies three main concepts for process management in an EXAClusterOS cluster:

• Physical node: A physical node is a machine (physical or virtual) that runs an operating system. It can be
embedded in one or more EXAClusterOS clusters, each having a unique cluster ID. A physical node cannot appear
multiple times in the same EXAClusterOS cluster. A physical node becomes a member of a cluster once a Core
daemon of that cluster has started on this node.

• Partition: An EXAClusterOS partition is a collection of one or more processes with the same start command
on one or more physical nodes. A partition may contain a physical node multiple times. A process of this
partition is identified by a logical node ID (a logical node ID may also identify further processes; see below).

• Logical node: A logical node identifies a process in a partition. Node IDs are counted from zero up to the
partition size minus one. Furthermore, all threads and forked processes of a process share the same logical
node ID in this partition as the "original" process.

2.3. Startup
The COS cluster daemon can be executed without any command line parameters, although it may be necessary
to change some default settings. The COS kernel module must be installed before execution.

The following command line arguments are allowed:

• -a, --exit-always: Force CoreD to exit on shutdown of root process.

• -b address, --broadcast-address=address: Broadcast address for internal communication.

• --broadcast-port=port: Broadcast/multicast port for internal communication (only for testing purposes).

• --local-to-broadcast-port=port: Local UDP port to use for connecting with multicast/broadcast address (only
for testing purposes).

• -c id, --cluster-id=id: Cluster ID to use by CoreD.


• -d, --daemonize: Daemonize CoreD after startup.

• -e, --exit-on-error: Force Cored to exit after the root process exits with a nonzero exit code.

• -f, --force-broadcast-address: Force use of the given broadcast address, even if it appears to be invalid.

• -g group-id, --gid=group-id: Start Cored with specified group ID.

• -i signals, --ignore-signals=signals: Specifies a list of signals to ignore.

• --inherit-priority: Let child processes inherit the reniced priority (otherwise, when using --renice, setpriority()
is used and child processes get the default priority).

• --initial-nodes=nodes: Use specified nodes for initial cluster configuration.

• -l dir, --log-directory=dir: Use specified log directory to redirect output of EXAClusterOS processes.

• --logfile-pattern=pattern: Use appropriate standard logfile pattern for logfile redirection (default:
%l/%e.%c.%p.%n.%P.log).

• -m address, --multicast-address=address: Multicast address for internal communication.

• -n id, --node-id=id: Specify node ID to use for startup.

• -o value, --oom-adjustment=value: Specify out-of-memory adjustment used for Cored.

• -p num, --port=num: UDP port for internal communication.

• -r, --restart-root-process: Force Cored to restart the root process if it exits.

• -s address, --interface-address=address: Bind Cored to specified interface.

• -t id, --cluster-port=id: ID to use for cluster and UDP port for internal communication.

• --auth-sock-dir=directory: Specify directory in which to locate authentication socket(s) for clients (default:
/tmp).

• --master-auth=master-socket-name: Open specified UNIX socket of master authentication process.

• -u user-id, --uid=user-id: Start Cored with specified user ID.

• -v, --full-version: Print full version information.

• -w, --wait: Wait up to 60 seconds for messages from other core daemons at startup before doing anything else.

• --default-file-mode=file-mode: Specify default file mode for stdout/stderr of partition process(es).

• --renice=priority: Renice Cored to given value via setpriority(2).

• --consensus-timeout=seconds: Timeout for reaching consensus with a new cluster configuration.

• --join-timeout=seconds: Join timeout for new cluster configuration.

• --token-delay-time-ms=milliseconds: Specifies minimum delay time between two regular tokens in operational
state.

• --token-loss-timeout=seconds: Specifies timeout of token after which CoreD tries to gather a new cluster.

• --token-retransmission-timeout=seconds: Specifies timeout of token after which cored resends token into
cluster.

• --fail-to-receive-counter=number: Specifies number of membership messages a node is allowed to miss before
being excluded from a cluster.

• --max-fragment-size=number: Specifies maximum fragment size for internal (broadcast) messages.

• --membership-window-size=number: Specifies maximum number of broadcast messages per token rotation.

• --membership-max-messages=number: Specifies maximum number of broadcast messages per Core daemon
per token rotation.

• --vpid: Start Cored in a new PID namespace.

• --vipc: Start Cored in a new IPC namespace.

• --vnet: Start Cored in a new net namespace.

• --vuts: Start Cored in a new uts namespace.

Furthermore, the following arguments may/must be specified on the command line: {root process command}
[command arguments]

Example usage: cos_cored -t 12345 -l /tmp/my_dir /bin/bash start.sh
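The --logfile-pattern option suggests a simple placeholder substitution. The sketch below illustrates how such a pattern might expand; the placeholder meanings assumed here (%l = log directory, %e = executable name, %c = cluster ID, %p = partition ID, %n = logical node ID, %P = process ID) are plausible guesses for illustration, not documented behavior:

```python
# Hypothetical expansion of a CoreD logfile pattern. The placeholder
# meanings are assumptions for illustration, not documented COS behavior.
def expand_pattern(pattern, values):
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "%" and i + 1 < len(pattern):
            out.append(str(values[pattern[i + 1]]))
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)

values = {"l": "/var/log/cos", "e": "bash", "c": 12345,
          "p": 7, "n": 0, "P": 4242}
print(expand_pattern("%l/%e.%c.%p.%n.%P.log", values))
# /var/log/cos/bash.12345.7.0.4242.log
```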

Chapter 3. EXAClusterOS Core Utils


The EXAClusterOS Core Utils are a collection of command line tools for executing cluster programs, manipulating
cluster partitions and requesting current cluster status. They are described in the following sections.

3.1. cosexec
cosexec starts a new partition that executes a given command. The newly started partition includes all nodes in
the same order they are specified with -n or -N. With -c, the CoreD will choose the corresponding nodes, depending
on their current usage. If no node is specified, a new partition with only one node will be created.

The following command line arguments are allowed:

• -a, --all-nodes: Start new partition on all nodes.

• --allow-offline-nodes: Allow one or more offline nodes in started partition.

• -c number, --count=number: Number of nodes to use for partition.

• -e, --enhance-env: Enhance process environment(s) instead of creating a new one.

• -f mode, --file-mode=mode: Specify file mode for stdout/stderr of partition process(es).

• -g groupid, --gid=groupid: Start partition process(es) with specified group ID (root user only).

• -l name, --alternative-name=name: Use alternative name for partition (instead of first command line argument).

• -N nodes, --nodenames=nodes: Comma separated list of node names for partition.

• -n ids, --node-ids=ids: Comma separated list of (root) node ids.

• -o filename, --ofile=filename: Filename for output of stdout/stderr.

• --only-online-nodes: Only use online nodes when specifying --all-nodes.

• -r, --redirect-io: Redirect I/O of new partition.

• -t, --show-root-node-ids: Show root node IDs instead of logical node IDs when redirecting I/O to stdout/stderr.

• -s, --single-instance: Run executed binary only once in the cluster.

• -u userid, --uid=userid: Start partition process(es) with specified user ID (root user only).

• -v, --verbose: Produce verbose output.

• -w, --wait: Wait for partition to be finished.

• --wait-status: Wait for partition to be finished and exit with status 2 in case the partition exited with a non-zero
code.

• -x, --exclusive: (not used).

• -z node-ids, --except-nodes=node-ids: Start partition on nodes except the specified ones and do not recognize
these nodes for auto-add.

• --auto-add: Automatically add new root nodes to partition.

• --auto-restart: Restart process(es) of partition automatically after process exit.


• --env=environ: Specify process environment, e.g. --env="A=xyz B=123".

Furthermore, the following arguments may/must be specified on the command line: command [command ar-
guments]

Example usage: cosexec --redirect-io --all-nodes bash

3.2. cosps
Show general information about the cluster. The first part of the output displays known cluster nodes, their root
node ID and state (offline or online), while the second part contains the partition table. Each entry of this table is
characterized by a partition ID, a user and group ID, the parent partition ID, a list of partition nodes and the
executed command.

The following command line arguments are allowed:

• -e, --show-env: Show partition environment.

• -f, --full: Show full executable path and node configuration.

• -x, --no-maximize: Do not try to use all columns of the terminal.

• -m, --names: Show name(s) of groups/users instead of IDs.

• -n, --root-nodes: Show root nodes.

• -N, --physical-nodes: Show physical node IDs for each logical node.

• -p partition-id, --parent-partition=partition-id: Show only partitions with specified parent partition.

• -r, --offline-nodes: Show physical node ID of offline nodes.

• -s, --show-real-name: Do not show alternative name of partition.

• -u userid, --user=userid: Show only partitions with specified owner user.

• -g groupid, --group=groupid: Show only partitions with specified owner group.

3.3. cosadd
Extend an existing partition by one node.

The following command line arguments are allowed:

• -s, --single-instance: Add physical node only once to partition.

• -n id, --node-id=id: Physical node ID for new logical node.

• -N name, --node-name=name: Name of node for new logical node.

• -r id, --root-node-id=id: Node ID of the new node when adding nodes to the root partition.

Furthermore, the following arguments may/must be specified on the command line: {partition ID}

Example usage: cosadd -n 0 3


3.4. coskill
Send a signal to one or all cluster process instances (logical nodes) addressed by its partition (and logical node
ID).

The following command line arguments are allowed:

• -w, --wait: Wait for exit of logical node.

• -s signal, --signal=signal: Signal to send to node(s).

• -a, --all: Send signal to all processes of partition.

• -SIGNO: Signal number to send.

Furthermore, the following arguments may/must be specified on the command line: [-SIGNO] {partition
ID} {node ID}

Example usage: coskill -SIGTERM 3 0

3.5. coskillall
Send a signal to every cluster process instance (logical node) of one or more partitions addressed by their name.

The following command line arguments are allowed:

• -w, --wait: Wait for exit of all logical nodes.

• -a, --wait-nodes: Wait for exit of nodes in partition, but not for exit of whole partition (useful for auto-restart
partitions).

• -s signal, --signal=signal: Signal to send to node(s).

• -n nodenumber, --node=nodenumber: Signal specified node instead of all nodes.

• -SIGNO: Signal number to send to logical nodes of partition.

Furthermore, the following arguments may/must be specified on the command line: [-SIGNO] {partition
ID}

Example usage: coskillall -SIGKILL 5

3.6. cosmod
Set/unset several partition flags.

The following command line arguments are allowed:

• --force-synchronization: Force synchronization of cluster.

• -a, --set-auto-add: Set auto-add flag to partition.

• -u, --unset-auto-add: Unset auto-add flag from partition.

• -r, --set-auto-restart: Set auto-restart flag to partition.

• -s, --unset-auto-restart: Unset auto-restart flag from partition.


• -c seconds, --set-consensus-timeout=seconds: Timeout for reaching consensus with a new cluster configuration.

• -j seconds, --set-join-timeout=seconds: Join timeout for new cluster configuration.

• -d milliseconds, --set-token-delay-time=milliseconds: Specifies minimum delay time between two regular
tokens in operational state.

• -l seconds, --set-token-loss-timeout=seconds: Specifies timeout of token after which cored tries to gather a
new cluster.

• -m seconds, --set-token-retransmission-timeout=seconds: Specifies timeout of token after which cored resends
token into cluster.

• -f number, --set-fail-to-receive-counter=number: Specifies number of membership messages a node is allowed
to miss before being excluded from a cluster.

Furthermore, the following arguments may/must be specified on the command line: {partition}

Example usage: cosmod --set-auto-add 3

3.7. cosmv
Move a cluster process instance to another cluster node. The addressed logical node (that is, its corresponding
UNIX process) will be stopped if it is still running and then started on the specified root node. This is not a true
"move", because all state of the former process is lost; the new process is merely addressed with the same partition
and logical node ID.

The following command line arguments are allowed:

• -n nid, --node-id=nid: Root node ID to move the logical node to.

• -N name, --node-name=name: Name of node to move logical node to.

Furthermore, the following arguments may/must be specified on the command line: {partition} {node}

Example usage: cosmv -n 3 5 0

3.8. cosrm
Remove the last node or all nodes of a partition. In case the cluster process instance is still running, it will be
stopped. After removing the last logical node of a partition, the whole partition will be removed from the partition
table.

The following command line arguments are allowed:

• -a, --all-nodes: Remove whole partition.

• -n IDs, --node=IDs: Comma-separated list of root node IDs to remove (only when removing nodes from the
root partition).

• -f, --force: Force removal of physical node (only used if removing nodes from root partition).

Furthermore, the following arguments may/must be specified on the command line: {partition ID}

Example usage: cosrm -a 5


3.9. cosstop
Stop a single cluster process instance (logical node), addressed by its partition ID and logical node ID.

The following command line arguments are allowed:

• This program does not take any specific flags.

Furthermore, the following arguments may/must be specified on the command line: {partition ID} {node
ID}

Example usage: cosstop 5 0

3.10. cos-timeout-start
Execute a command and wait for its completion within a user-specified timeout. The command's exit status is
returned if the process finishes in time; otherwise (or on any other error) exit code 100 is returned and the started
process is killed with SIGTERM.

The following command line arguments are allowed:

• This program does not take any specific flags.

Furthermore, the following arguments may/must be specified on the command line: {timeout} {command}
[command arguments]

Example usage: cos-timeout-start 15 tar czf example.tar.gz example/
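The described semantics can be emulated with a short wrapper. This is a rough sketch of the documented behavior, not the actual implementation:

```python
import signal
import subprocess

# Rough emulation of the documented cos-timeout-start behavior:
# run a command, return its exit status if it finishes within the timeout,
# otherwise send SIGTERM to the process and return 100.
def timeout_start(timeout, *command):
    proc = subprocess.Popen(command)
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        proc.send_signal(signal.SIGTERM)
        proc.wait()
        return 100

print(timeout_start(5, "true"))          # 0: command finished in time
print(timeout_start(1, "sleep", "30"))   # 100: killed after the timeout
```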

3.11. coswait
coswait may be used at cluster startup to wait for a number of nodes (or other conditions) before executing a command.

The following command line arguments are allowed:

• -c number, --cluster-nodes=number: Number of cluster nodes to wait for.

• -o number, --online-nodes=number: Number of online cluster nodes to wait for.

• -n nids, --root-nodes=nids: List of root node IDs to wait for.

• -N names, --node-names=names: List of names of root nodes to wait for.

• -r number, --retries=number: Specifies number of retries in case of communication errors.

• -t number, --seconds=number: Number of seconds to wait for condition to become true.

• -s number, --sleep=number: Number of seconds to sleep after the condition became true and before executing
command.

Furthermore, the following arguments may/must be specified on the command line: command [command ar-
guments]

Example usage: coswait -o 5 xterm


3.12. hddident
HDD Ident is used to store and restore metadata of HDD drives.

The following command line arguments are allowed:

• -m string, --meta-file string: Meta file to use.

• -a, --show-all: Show all fields from the meta file.

• -u string, --uuid-file string: File with UUID.

• -i, --id: Get the id.

• -I, --set-id: Set id from UUID file.

• -g, --gid: Get the global UUID.

• -G, --set-gid: Generate new UUID file.

• -n, --name: Get the device name.

• -N string, --set-name string: Set the device name.

• -f, --formated: Get the formatted state.

• -F string, --set-formated string: Set the formatted state.

• -s, --storage: Get the storage state.

• -S string, --set-storage string: Set the storage state.

• -p, --device-info: Get the device info.

• -l number, --identify number: Activate drives for given number of seconds.

Example usage: hddident -m /dev/sda1 -i

Chapter 4. DWAd
4.1. Mission
The DWAd service can be used to manage one or more EXASolution DW systems and provides a user interface to
access its database administration functionality. It is able to perform automatic failure recovery in case of process
and node crashes.

4.2. Startup
This daemon has to be started as a service in its own partition. This partition must not be the root partition and
may be resized later by appropriate EXAClusterOS tools. Keep in mind to start the DWAd service on every node
dedicated to be a database node when specifying systems. It requires one partition with an active LoggingD and
one with an active LockD service.

The following command line arguments are allowed:

• --global-log-dir=dir: Directory in which to write global log managed by LoggingD.

• --backupfile=file: Start DWAd with backup data.

• --overcommit-memory: Do not care about memory resource limits on nodes and balance SLB equally across
EXASolution systems.

• --store-config=file: Name of file in which to store configuration data at the end of a DWAd process.

• --store-interval=seconds: Store DWAd configuration periodically into file provided by --store-config.

Example usage: cosexec --all-nodes --auto-add --auto-restart --single-instance -- dwad

4.3. User interface

4.3.1. dwad_client
This program is a client for the DWAd service.

The following command line arguments are allowed:

• add {name}: Add new system.

• del {name}: Delete system.

• rename {name} {new name}: Rename system.

• volume-restore: Show systems for which a volume restore may be necessary.

• start {name}: Start system.

• start-features {name} {features}: Start system with specified features.

• start-wait {name}: Start system and wait until it becomes reachable by DB client applications.

• start-create-new-db {name}: Start system with -create-new-db flag.


• start-create-new-db-features {name} {features}: Start system with specified features and -create-new-db flag.

• pdd-restore {name}: Start system in restore mode.

• pdd-restore-features {name} {features}: Start system in restore mode with specified features.

• start-maintenance {name}: Start system with -maintainance flag.

• start-failsafety {name}: (Re-)start system for failsafety.

• stop {name}: Stop system.

• stop-signal {name} {signal number} [timeout]: Stop system with specified signal and timeout.

• stop-wait {name}: Stop system and wait until all processes stopped.

• stop-force {name}: Stop system and set it immediately into setup state.

• setup {name} {setup file}: Setup system.

• setup-param {name} {setup parameter} {value}: Setup a parameter of a system.

• setup-node-groups {node group}: Group database nodes. Each group must be specified as one parameter on
the command line.

• get-node-groups: Retrieve node groups.

• extend-system {name}: Increase number of active system nodes by one.

• db-size-info {name}: Show size info of database.

• list: List all systems.

• print-params {name}: Print parameters of system.

• uptime {name}: Show uptime of system in seconds. This uptime starts with a system being connectable.

• print-original-setup {name}: Print original setup configuration of system.

• print-setup {name}: Print setup based on configuration of system.

• print-dummy-setup: Print dummy system setup.

• print-dummy-setup-gdb: Print dummy system setup with GDB usage.

• print-dummy-setup-dbram: Print dummy system setup with DB RAM usage.

• conn {name}: Print main host and port of system.

• sys-nodes {name}: List nodes of system.

• shortlist: List systems.

• show-files {name} {iproc}: List datafiles of system for specified iproc number.

• pdd-proc {name}: Print information about PDD process of system.

• pdd-proc-wait {name}: Wait for information about PDD process of system and print.

• flush-pdd-proc {name}: Delete information about PDD process of system.

• bg-restore-state {name}: Show background restore state of system.


• check-restore-ready-state {name}: Check whether PDD server is able to receive restore requests.

• space-info {name}: Print space information of system.

• volume-space-info {name}: Print space information for volumes of system.

• insert-rnode {name} {nodename}: Add reserve node to system.

• remove-rnode {name} {nodename}: Delete reserve node from system.

• mark-inactive-node {name} {nodename}: Mark node as inactive for system.

• remove-inactive-node {name} {nodename}: Mark inactive node as active/reserve for system.

• switch-nodes {name} {active node} {reserve node}: Move active node to reserve node list and vice versa.

• start-on-nodes {name} {exec}: Start command on all nodes of system.

• start-on-nodes-wait {name} {exec}: Start command on all nodes of system and wait for partition shutdown.

• storage-backup {name} {volume id} {level} {expire time}: Do backup of a system (via Storage daemon).

• abort-backup {name}: Abort current backup(s) of a system.

• shrink-db {name} {size} {on-persistent-volume (1|0)} {dry-run (1|0)}: Shrink database volume size to specified
number of MiB.

• abort-shrink {name}: Abort current shrink(s) of a system.

• shrink-status {name}: Get status of shrink operation.

• storage-restore {name} {volume id} {backup name}: Restore backup into system (via Storage daemon).

• storage-restore-nonblocking {name} {volume id} {backup name}: Restore backup into system (via Storage
daemon).

• storage-restore-virtual {name} {volume id} {backup name}: Restore backup into system (via Storage daemon).

• wait-state {name} {state} {timeout}: Wait for database system to reach defined state (running, setup).

• dump-data {1|0} {1|0}: Produce DWAd data dump. Consistent state = yes[1]/no[0]; all nodes = yes[1]/no[0].

• dump-data-xml {1|0} {1|0}: Produce DWAd data dump in XML format. Consistent state = yes[1]/no[0]; all
nodes = yes[1]/no[0].

• protect-node-mem {mem}: Save memory on nodes for other purposes (in GB).

• print-protected-node-mem: Print amount of memory saved on each node from EXASolution (in GB).

• allow-inserts {name}: Allow INSERT statements.

• disallow-inserts {name} {rawsize} {memsize}: Disallow INSERT statements (all values in MiB).

• check-dwad-interfaces: Check that all DWAd interfaces are up.

• translate-err {error code}: Translate error code.

Example usage: dwad_client add test


EXASolution system setup parameters

Before a system can be started, an appropriate configuration has to be provided. The following table specifies all
parameters that are accepted and interpreted by the DWAd. Some parameters are mandatory; the others are
optional and fall back to the default values specified below:


Table 4.1. Valid parameters

• USE_DBRAM (yes/no; optional): use -dbram for the database instead of -slb and -HeapMemory
• WRAPPER (string; mandatory): full path to EXASolution wrapper binary
• CONTROLLER (string; mandatory): full path to EXASolution controller binary
• CHECKDBFILE (string; optional): full path to EXASolution checkdbfile binary (required with CHECK_FILES=yes)
• MAIN_PORT (number; mandatory): main port of system
• NUMNODES (number; mandatory): number of system nodes
• VERSION (string; mandatory): system version identifier
• CREATE_NEW_DB (yes/no; optional): start with -create-new-db flag at first startup; default=no
• NODES_EXCLUSIVE (yes/no; optional): use nodes exclusively; default=no
• MAY_SHARE_RESERVE_NODES (yes/no; optional): if set to true, reserve nodes may be shared in an exclusive system
• DWACFG (string; optional): extra configuration file for system
• TMP_DATAFILE (string; mandatory): local temporary data file for system
• OUTPUTDIR (string; mandatory): directory for system logs
• TRANSACTION_LOGDIR (string; optional): directory for transaction logs
• DATAFILEPREFIX (string; mandatory): prefix for local data files
• SHORT_DATAFILENAMES (yes/no; optional): if set to true, names of datafiles won't include iproc numbers
• USE_COS_STORAGE (yes/no; optional): use StorageD
• PERSISTENT_VOLUME_ID (number; optional): ID of persistent volume (only valid when using StorageD)
• VOLUME_RESTORE_DELAY (number; optional): time after which a volume should be moved to online database nodes (only for information yet)
• OVERALL_SLB_SIZE (number; optional): overall SLB size of system in MB (must be given in case of USE_DBRAM=false)
• OVERALL_DBRAM (number; optional): overall DB RAM size of system in MB (must be given in case of USE_DBRAM=true)
• HEAP_SIZE_PER_NODE (number; optional): amount of memory that can be used in addition to SLB on each node
• MEM_OVERHEAD (number; optional): memory overhead in percent for system
• OPTIMIZE_NODECONF (yes/no; optional): optimize node configuration for better load; default=no
• DEBUG (yes/no; optional): debug system; default=no
• DEBUG_CHILDS (yes/no; optional): debug children of controller process(es); default=no
• DO_NOT_CHANGE_FLAGS (yes/no; optional): do not unset -create-new-db or -mode=restore flags after first system stop; default=no
• GDBTERM (string; optional): terminal used for GDB; system display will be used by default
• GDB_BINARY (string; optional): GDB binary to use
• XTERM_BINARY (string; optional): Xterm binary to use for debugging
• USE_GDBSERVER (yes/no; optional): use GDBServer instead of GDB; default=no
• GDBSERVER_BINARY (string; optional): full path to GDBServer binary
• GDBSERVER_BASEPORT (number; optional): GDBServer base port
• GDB_DEBUG_NODE (string; optional): logical node of controller to debug; use -1 to debug all system controllers
• PARAMS (string; optional): extra parameters for systems (may overwrite parameters set by DWAd)
• FIRST_TIME_PARAMS (string; optional): extra parameters for systems used at first startup
• NODENAMES (string; mandatory): names of all system nodes including reserve nodes; nodes will be inserted in given order
• IPROC_NUMBERS (numbers; optional): explicit order with which to set iproc numbers to nodes
• IPROC_AFFINITIES (numbers; optional): explicit order with which to set iproc affinities to nodes
• ENVIRONMENT (string; optional): environment of system processes
• RESTART_RETRYS (number; optional): number of restarts after system process crashes; default=0 (infinite number of retries)
• ADMIN_UIDS (numbers; optional): optional set of user IDs that are allowed to start/stop system
• OWNER_UID (number; optional): owner user ID of system partition; by default set to the user who added the system
• OWNER_GID (number; optional): owner group ID of system partition; by default set to the group of the user who added the system
• SHUTDOWN_WAIT_TIME (number; optional): time to wait for shutdown of system before sending SIGKILL; default=0 (infinite time)
• SHUTDOWN_RETRY_TIME (number; optional): time for resending retry shutdown event/signal to system partition before sending SIGKILL; default=0 (disabled)
• DEFAULT_BACKUP_DIR (string; optional): default local backup directory
• RESTORE_DB (yes/no; optional): use -mode=restore for first startup of system; default=no
• REDUNDANCY (number; optional): redundancy of system; redundancy-1 nodes may fail at once for the system to be restarted immediately
• NO_SUBPROCESS_KILL (yes/no; optional): do not kill remaining EXASolution processes after controller exit; default=no
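A minimal setup would combine the mandatory parameters with a few optional ones. The key=value layout and all concrete values shown below are purely illustrative assumptions; consult the actual setup file format of your installation:

```
WRAPPER = /path/to/exasolution/wrapper
CONTROLLER = /path/to/exasolution/controller
MAIN_PORT = 8563
NUMNODES = 4
VERSION = 6.0.6
TMP_DATAFILE = /data/tmp/mysystem.dat
OUTPUTDIR = /var/log/mysystem
DATAFILEPREFIX = /data/db/mysystem
NODENAMES = n0011,n0012,n0013,n0014
USE_COS_STORAGE = yes
REDUNDANCY = 2
```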

4.4. Design

4.5. Recovery mechanisms

4.5.1. Process failures


Unexpected exits of processes are detected by receiving appropriate events (ev_node_stopped or ev_process_exited)
from the Cored. Process failures are distinguished in several categories, each with its own recovery mechanism:


• A process exits while the system is in state "starting": a SIGABRT signal is sent to all remaining system
processes by the current DWAd master node, and the system is set to state "setup" immediately.

• A process exits with return code 0 while the system is in state "running": a SIGTERM signal is sent to all
remaining system processes by the current DWAd master node, which checks that all system processes shut
down within the time frame specified by the user at setup time.

• A process exits with return code 1 while the system is in state "running": handled as the previous case, except
that the system is restarted after all processes have exited. It is logged that a controller requested a system
restart.

• A process exits with a return code other than 0 or 1 while the system is in state "running": handled as the
previous case, except that the system log records an unexpected process exit rather than a controller-requested
restart.
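The four cases above amount to a small decision table; the sketch below restates them as code. It is an illustrative model of the documented behavior, not DWAd source:

```python
# Illustrative model of DWAd's reaction to a process exit, as described above.
def recovery_action(system_state, exit_code):
    if system_state == "starting":
        return ("SIGABRT", "set system to state 'setup'")
    if system_state == "running":
        if exit_code == 0:
            return ("SIGTERM", "wait for shutdown of all processes")
        if exit_code == 1:
            return ("SIGTERM", "restart system (controller-requested restart)")
        return ("SIGTERM", "restart system (unexpected process exit)")
    return (None, "no action")

print(recovery_action("starting", 7))
print(recovery_action("running", 1))
```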

4.5.2. Node crashes


Node crashes are detected by receiving an ev_node_config_update event from the Cored. Such an event is only
regarded as a system node crash if it is meant for the DWAd partition, since a system can only be started on nodes
included in a DWAd partition. Each element in the removed_nodes vector causes a lookup for active system
processes on the corresponding node.

4.6. Network splits


A split of the cluster network triggers the same actions as node crashes. A subcluster holding 50% or less of all
DWAd nodes is regarded as lacking the quorum required to continue making the global data updates used for
starting/stopping system partitions. If no subcluster has a quorum, no global data update can be performed until
one subcluster is large enough again to continue.
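The quorum rule can be stated compactly: a subcluster may continue only if it holds strictly more than half of all DWAd nodes. An illustrative restatement:

```python
# A subcluster has a quorum only with strictly more than 50% of all DWAd nodes.
def has_quorum(subcluster_size, total_dwad_nodes):
    return 2 * subcluster_size > total_dwad_nodes

print(has_quorum(3, 6))  # False: exactly 50% is not enough
print(has_quorum(4, 6))  # True
```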

4.7. Additional information

Chapter 5. Loggingd
5.1. Mission
The Loggingd service addresses situations in which it would be too expensive to search all local service logs for
a certain expression, such as a warning or an alert. It collects the logs of a given service across all nodes and can
show them in a time-sorted fashion with the provided client tool. Important log entries can thus be found very
quickly and almost independently of cluster size.
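Merging the per-node logs into one time-sorted stream is the core idea. Assuming each node's entries are already ordered by timestamp (as a local service log would be), it can be sketched as a k-way merge; the node names and entries below are invented for illustration:

```python
import heapq

# Sketch of a time-sorted merge over per-node logs, assuming each node's
# entries are already ordered by timestamp.
node_logs = {
    "n11": [("2008-12-03 09:00:01", "WARNING disk slow"),
            ("2008-12-03 09:00:05", "INFO recovered")],
    "n12": [("2008-12-03 09:00:03", "ALERT node unreachable")],
}

# heapq.merge performs a lazy k-way merge of the already-sorted lists.
for timestamp, message in heapq.merge(*node_logs.values(),
                                      key=lambda entry: entry[0]):
    print(timestamp, message)
# 2008-12-03 09:00:01 WARNING disk slow
# 2008-12-03 09:00:03 ALERT node unreachable
# 2008-12-03 09:00:05 INFO recovered
```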

5.2. Startup
This daemon has to be started as a service in its own partition. This partition must not be the root partition and
may be resized later by appropriate COS tools. Keep in mind to start the Loggingd service on every node dedicated
to collect global logs.

The following command line arguments are allowed:

• --default-log-dir=dir: Default directory in which to write global logs.

Example usage: cosexec --all-nodes --auto-add --auto-restart --single-instance -- logd

5.3. User interface

5.3.1. logd_client
This program shows information about a LoggingD partition and triggers some administrative tasks.

The following command line arguments are allowed:

• --trigger-log-setup=pid: Send event to processes in specified partition to open a global log.

• --trigger-log-rotation=service: Do a log rotation for the specified service.

• --reopen-logfiles: Force local LoggingD to reopen all service logfiles.

• --reopen-all-logfiles: Force all LoggingD instances to reopen all service logfiles.

• --show-services: Show registered services in cluster.

5.3.2. logd_collect
This program shows global log entries from specified services.

The following command line arguments are allowed:

• -s time, --start-time=time: Search log entries since specified time.

• -h time, --stop-time=time: Search log entries until specified time.

• -p priorities, --prio=priorities: Search log entries with specified priorities separated by comma.


• -o priorities, --not-prio=priorities: Search log entries not having specified priorities separated by comma.

• -n nids, --nodes=nids: Comma separated list of Loggingd nodes that should be asked for log entries.

• -l directory, --log-dir=directory: Directory to search for logs.

• -q, --quiet: Do not show additional information.

Furthermore, the following arguments may/must be specified on the command line: {service name 1}
[[service name 2] ...]

Example usage: logd_collect --start-time "2008-12-03 09:00:00.000000" -o LOGD_INFO DWAd

Chapter 6. Lockd
6.1. Mission
The Lockd service manages global operations for EXAClusterOS processes by providing a simple interface that
hides the implementation details of the parallel algorithms used. Its interface can be used to manage global locks
as well as global barriers. Lockd is furthermore able to detect, e.g., process crashes inside global critical sections
and communicates such problems to its clients.

6.2. Startup
This daemon has to be started as a service in its own partition. This partition must not be the root partition and
may be resized later by appropriate EXAClusterOS tools. Keep in mind to start the Lockd service on every node
dedicated to use global locks. It requires one partition with an active LoggingD service.

The following command line arguments are allowed:

• This program does not take any specific flags.

Example usage: cosexec --all-nodes --auto-add --auto-restart --single-instance -- lockd

Chapter 7. StorageD
7.1. User interface

7.1.1. csinfo
csinfo queries (and displays) information about existing volumes and nodes. The level of detail for the displayed
information can be specified using the --level option.

The following command line arguments are allowed:

• -v, --volume: print information about one (or all) volume(s).

• -V, --vol-config: print configuration for given/all volume(s).

• -n, --node: print information about one (or all) node(s).

• -N, --node-config: print configuration of given node(s).

• -r, --range: print range of bytes (or blocks) on each node of the given volume.

• -D, --hdd-info: print HDD info.

• -H, --hdd-state: print HDD state.

• -U, --hdd-usage: print HDD usage.

• -M, --md-info: get info about persistent metadata on given node(s).

• -b, --block_range: print range in blocks instead of bytes.

• -m, --masters_only: consider only master segments (regardless of their current state) when requesting block-
ranges (default: use node IDs of the deputy segment if the master is offline).

• -i id, --id=id: ID of the node or volume (default : all nodes/volumes).

• -l level, --level=level: level of detail for the information displayed (default: 0).

• -p, --include-partitions: include partitions when requesting volume or node information (may slow down the
request).

• -R, --red-dist: show the redundancy distribution of the given node (or all nodes).

• -g, --graph: get graph description of a volume.

Example usage: csinfo -v -i 3

7.1.2. cslabel
With cslabel you can perform the following actions:

• add a label to a volume

• remove a label (or all labels) from a volume

• find all volumes with a given label

Every volume can have an arbitrary number of labels. You can add or remove labels at any time (no matter
whether the volume is online or offline) if you are the owner of the volume. The labels of one volume must be
unique, i.e. no label is added more than once.

The following command line arguments are allowed:


• -v volume ID, --volume=volume ID: ID of a volume.

• -l string, --label string: the label to be added, removed or searched for.

• -F, --front: add label to the front of the list (default: end of the list).

• -a, --add: Add a label.

• -r, --remove: remove a label (or all if no label given).

• -f, --find: Find all volumes containing the given label.

Example usage: cslabel -v 0 -l dont_crash
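A typical label lifecycle can be sketched as follows; the commands are only printed (dry-run), and the label name `nightly_backup` and volume ID 0 are example values:

```shell
# Dry-run sketch: add a label, find volumes carrying it, then remove it again.
vol=0
for cmd in \
  "cslabel -v ${vol} -a -l nightly_backup" \
  "cslabel -f -l nightly_backup" \
  "cslabel -v ${vol} -r -l nightly_backup"
do
  echo "${cmd}"
done
```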

7.1.3. csvol
With csvol you can perform the following actions:

• create a volume

• delete a volume

• close a volume

• change permissions

• change owner and group

• lock a volume

• unlock a volume

• change the shared flag

• change the priority

• clear data on a volume

The following command line arguments are allowed:

• -c, --create: create a volume.

• -d, --delete: delete a volume.

• -E, --close: close a volume.

• -M, --chmod: change permissions.

• -O, --chown: change owner/group.

• -l, --lock: lock a volume.

• -L, --unlock: unlock a volume.

• -s number, --size number: number of blocks.

• -b number, --block-size number: number of sectors.

• -S number, --stripe-size number: number of blocks for a stripe.

• -r number, --redundancy number: the redundancy for the volume (max. 255).

• -h string, --hdd-type string: type-specifier for the HDD.

• -p string, --permissions string: permissions for the volume (e.g. rwxr--r--).

• -H, --shared: volume can be opened by different partitions simultaneously.

• -C, --use-crc: use checksums to ensure data integrity.

• -B VERTICAL | HORIZONTAL, --block-distribution=VERTICAL | HORIZONTAL: the type of block distribution
for the new volume.

• -P number, --prio number: priority for I/O operations.

• -m number, --num-master-nodes number: the number of master nodes.

• -n string, --nodes string: list of node IDs to be used for the volume.


• -N node names, --node_names=node names: list of node names to be used for the volume.

• -f string, --conf-file string: /path/to/the/volume/configuration/file.

• -v number, --volume number: volume ID.

• -V, --verbose: enable some additional output.

• -t string, --partition string: partition for which the volume should be closed.

• -u number, --uid number: user id.

• -g number, --gid number: group id.

• -U string, --user string: user name.

• -G string, --group string: group name.

• -a string, --set-shared string: set shared flag (on/off).

• -A number, --set-priority number: change priority of an existing volume (0-20).

• -D, --clear-data: clear data by overwriting it with zeroes.

• -Z number, --clear-bytes number: nr. of bytes per segment to be cleared.

Example usage: csvol --create --conf-file /path/to/file.conf
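Instead of a configuration file, a volume can also be created with explicit flags. The helper below only assembles and prints the command line (dry-run); the helper name and all numeric values are illustrative, not recommended settings:

```shell
# Dry-run sketch: assemble a csvol create call from explicit flags.
build_create_cmd() {
  local size=$1 bsize=$2 red=$3 nodes=$4
  echo "csvol --create --size ${size} --block-size ${bsize}" \
       "--redundancy ${red} --nodes ${nodes} --permissions rwxr--r--"
}
build_create_cmd 1048576 8 2 0,1,2
```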

7.1.4. csctrl
This program implements some control mechanisms for the storage service. It is able to: - start (or restart)
EXAStorage - shut down EXAStorage - print the current UUID-NodeID mapping - suspend (or resume) an
EXAStorage node

The following command line arguments are allowed:

• -s, --start: start the storage service.

• -d, --stop: stop the storage service.

• -u, --suspend: suspend a storage node.

• -U, --resume: resume a storage node.

• -M, --uuid_map: print the current NodeID-UUID mapping.

• -C, --clear-md: clear metadata on given nodes.

• -n node IDs, --nodes=node IDs: list of physical node IDs.

• -N node names, --node-names=node names: list of node names.

• -f, --force: force suspend action.

• -c string, --conf-file string: Storage configuration file.

• -v, --valgrind: Start EXAStorage with valgrind.

• -P, --valgrind-supp: Use default EXAStorage valgrind suppression file.

• -V string, --valgrind-options string: additional valgrind options.


• -r, --replace: Replace a running storage service.

• -t number, --timeout number: Timeout in seconds (default: infinite time).

• -A, --auto-add: Enable the 'auto-add' feature of COS (see cosexec(1)).

• -R, --auto-restart: Enable the 'auto-restart' feature of COS (see cosexec(1)).

• -S slave-mode, --slave-mode=slave-mode: Start program in slave-mode (DO NOT USE!).

• -p number, --master-pid number: Partition ID of the master process (SLAVE-MODE ONLY!).

• -m number, --master-nid number: Node ID of the master process (SLAVE-MODE ONLY!).

• -w, --write: write metadata (SLAVE-MODE ONLY!).

• -b string, --binary string: Storage service binary path (DUMMY).

Example usage: csctrl --start --binary /path/to/storage_bin --conf /path/to/storage.conf

7.1.5. csmd
With csmd you can perform the following actions:

• print information about existing metadata files

• convert metadata files to another version

• compare different metadata files

• print the history of modifications

The following command line arguments are allowed:

• -p, --print: print info (content and version) about the serialized metadata.

• -v, --print-version: print version of serialized metadata.

• -r, --print-revision: print revision of serialized metadata.

• -c, --convert: convert given metadata to current version.

• -R, --revert: revert given metadata.

• -C, --compare: Compare two metadata files.

• -H, --history: print history of modifications.

• -l, --list: list all revisions of given metadata file.

• -f string, --file string: File to print/convert.

• -F string, --compare-file string: Compare metadata to this file.

• -d string, --directory string: Directory containing files to print/convert.

• -X, --to-text: convert given metadata file to text format (XML in case of COS serialization).

• -B, --to-binary: convert given metadata file to binary format.

• -b, --benchmark: benchmark (de)serialization of given file.

Example usage: csmd -v -f metadata


7.1.6. csmove
With csmove you can:

• move one or more nodes of a given volume to another node

• move a single segment to another node

Moving a node/segment may be denied in any of the following cases:

• one or more segments on the source node have a snapshot map

• one or more segments on the source node have a redundancy segment on the destination node

• the source node is used for recovering another node and no other suitable node is available

However, one can force the movement using the --force flag (see below).

The following command line arguments are allowed:

• -m, --move-nodes: move one or more nodes.

• -M, --move-segment: move a single segment to another node.

• -v volume ID, --volume-id=volume ID: volume ID.

• -i segment ID, --segment-id=segment ID: segment ID.

• -s node IDs, --src-nodes=node IDs: list of physical node IDs that should be moved.

• -S node names, --src-names=node names: list of node names that should be moved.

• -d node IDs, --dest-nodes=node IDs: list of physical node IDs that contains the destination node for each node
that should be moved (matched by index).

• -D node names, --dest-names=node names: list of node names that contains the destination node for each node
that should be moved (matched by index).

• -f, --force: force movement.

Example usage: csmove -m -v 1 -s 0,1 -d 2,3
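Note that --src-nodes and --dest-nodes are matched by index, so moving nodes 0 and 1 to nodes 2 and 3 pairs 0 with 2 and 1 with 3 in a single call. The hypothetical helper below only prints the command it would run (dry-run):

```shell
# Dry-run sketch: move the given source nodes of a volume to the
# destination nodes; lists are paired by position.
move_nodes_cmd() {
  echo "csmove --move-nodes --volume-id=$1 --src-nodes=$2 --dest-nodes=$3"
}
move_nodes_cmd 1 0,1 2,3
```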

7.1.7. csrec
With csrec you can perform the following actions:

• list all existing recovery maps (in the cluster or for a volume)

• see the recovery completion status of a given volume

• start recovery on a given node (and volume)

• enable and disable background recovery (on selected nodes)

Only one action can be performed at a time.

The following command line arguments are allowed:

• -l, --list: List all existing recovery maps for the given volume (-v) or all volumes (default).

• -s, --show: Show completion status of all recovery maps of the given volume (in percent).

• -r, --restore-node: Restore the data on a given node from a redundancy node.

• -R, --stop-restore-node: Stop restoration of given node.

• -g, --restore-segment: Restore the data on a given segment from redundancy.

• -G, --stop-restore-segment: Stop restoration of given segment.

• -d, --restore-hdd: Restore the data on the given hdd from a redundancy node.

• -h string, --hdd string: HDD name.

• -o, --bg-off: Turn background recovery OFF on the given nodes (default: all nodes).

• -O, --bg-on: Turn background recovery ON on the given nodes (default: all nodes).


• -p number, --priority number: Set volume priority (default: 10).

• -P string, --bg-rec-profile string: Set background recovery profile ('one', 'two', 'three').

• -L number, --bg-rec-limit number: Set background recovery throughput limit.

• -C, --calibrate-limit: Restart calibration of background recovery throughput limit.

• -v number, --volume number: Volume ID.

• -n string, --nodes string: List of physical node IDs.

• -S string, --segments string: List of segment IDs.

• -N string, --node-names string: List of node names.

• -m, --master-only: Restore only master segment(s).

• -w, --wait-for-restoration: Wait for all restoration to finish.

• -t number, --timeout number: Time (in seconds) to wait for restoration (default: 300).

• -e, --offline: Also include offline nodes in --list and --show.

Example usage: csrec -l
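A common sequence is restoring a node and waiting for completion; the sketch below only prints the commands (dry-run), and the volume/node IDs and timeout are example values:

```shell
# Dry-run sketch: restore node 2 of volume 0 from redundancy, wait up to
# 600 s, then check the completion status.
restore_cmds() {
  local vol=$1 node=$2
  echo "csrec --restore-node --volume ${vol} --nodes ${node} --wait-for-restoration --timeout 600"
  echo "csrec --show --volume ${vol}"
}
restore_cmds 0 2
```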

7.1.8. csresize
With csresize one can resize an existing volume in various ways:

• append new nodes to the volume

• remove nodes from the volume

• enlarge each node/segment of a volume

• shrink each node/segment of a volume

• increase the redundancy of an existing volume

• decrease the redundancy of an existing volume

Only one action can be performed at a time. See below for explanations on each resizing method.

The following command line arguments are allowed:

• -a, --append: append nodes to the volume.

• -r, --remove: remove nodes from the given volume.

• -p, --purge: remove nodes without replacing the affected segments.

• -f, --force: force node (re)moving.

• -e, --enlarge: enlarge volume.

• -s, --shrink: shrink volume.

• -i, --inc-redundancy: increase redundancy by given level.

• -d, --dec-redundancy: decrease redundancy by given level.

• -l number, --redundancy-level number: level of redundancy (for increasing/decreasing redundancy).

• -b number of blocks, --blocks=number of blocks: number of blocks by which each segment of a volume should
be enlarged/shrunk.

• -m number of master-nodes, --master-nodes=number of master-nodes: number of master nodes that should be
appended.

• -v volume ID, --volume-id=volume ID: ID of volume to resize.


• -n node IDs, --nodes=node IDs: list of physical node IDs that should be appended/removed.

• -N node names, --nodenames=node names: list of node names that should be appended/removed.

Example usage: csresize -v 3 -n 0,1
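Since csresize performs only one action per call, appending nodes and enlarging segments are two separate invocations. The sketch below only prints the commands (dry-run); IDs and block counts are illustrative:

```shell
# Dry-run sketch: first append two nodes, then enlarge every segment of
# the volume in a second, separate call.
resize_cmds() {
  local vol=$1
  echo "csresize --append --volume-id=${vol} --nodes=4,5"
  echo "csresize --enlarge --volume-id=${vol} --blocks=131072"
}
resize_cmds 3
```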

7.1.9. cssetio
With cssetio you can enable or disable application I/O and internal I/O for a given volume.

The following command line arguments are allowed:

• -a set, --application-io=set: enable/disable application I/O.

• -i set, --internal-io=set: enable/disable internal I/O.

• -v volume, --volume-id=volume: Volume to apply changes to.

Example usage: cssetio -a on -v 0

7.1.10. cssnap
With cssnap you can perform the following actions:

• create a new snapshot

• release an existing snapshot relation

• list all existing snapshots (in the cluster or for a volume)

• see the completion status of a snapshot

• enable and disable background snapshot creation (on selected nodes)

Only one action can be performed at a time.

The following command line arguments are allowed:

• -c, --create: Create snapshot for the given volume.

• -e, --release: End/release snapshot relation of the given snapshot volume.

• -l, --list: List all existing snapshots for the given volume (-v) or all volumes (default).

• -s, --show: Show simple completion status of all snapshots of the given volume.

• -S, --show-detailed: Show detailed completion status of all snapshots of the given volume.

• -o, --bg-off: Turn background snapshot creation OFF on the given nodes (default: all nodes).

• -O, --bg-on: Turn background snapshot creation ON on the given nodes (default: all nodes).

• -p number, --priority number: Set priority for background operations (default: 100).

• -v number, --volume number: Volume ID (snapshot or source, depending on the action).

• -r number, --redundancy number: Redundancy for the snapshot volume (default: 1).

• -h string, --hdd-type string: Type of HDD for the snapshot-volume (default: same as source vol.).

• -C, --copy-labels: Inherit all labels from the source volume (default: false).

• -d, --distinct-nodes: Use only distinct nodes for auto-selection (i.e. nodes that are not used by the source volume).

• -L, --local-nodes: Use the volume's current master nodes for building the snapshot.

• -n string, --nodes string: List of physical node IDs that should be used for the snapshot (default: auto-select)
or for enabling/disabling background snapshot creation (default: all nodes).


• -N string, --node-names string: List of node names that should be used for the snapshot (default: auto-select)
or for enabling/disabling background snapshot creation (default: all nodes).

• -w, --wait: Wait for snapshot completion (if bg.copy is ON).

Example usage: cssnap -c -v 0
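Combining the flags above, a snapshot on distinct nodes that inherits labels and waits for background copying can be sketched as follows (dry-run, commands are only printed; volume ID 0 is an example):

```shell
# Dry-run sketch: create a snapshot on nodes not used by the source
# volume, inherit its labels, wait for completion, then check status.
snap_cmds() {
  local vol=$1
  echo "cssnap --create --volume ${vol} --distinct-nodes --copy-labels --wait"
  echo "cssnap --show --volume ${vol}"
}
snap_cmds 0
```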

7.1.11. csconf
With csconf one can modify the following parameters that are part of the EXAStorage configuration file:

• max_bg_mem

• max_oth_mem

• max_num_bg_ops

• max_bytes_per_bg_op

• use_group_io

• optimize_sort

• use_nw_aio

• use_nw_ooo

• clean_interval

It can also print the default values for the current installation.

The following command line arguments are allowed:

• -p, --print-defaults: print all default values.

• -m, --modify: modify values.

• -n number, --node-id number: phys node ID.

• -N string, --node-name string: hostname.

• -M number, --max-bg-mem number: max. memory usage (in bytes) for background operations.

• -S number, --max-oth-mem number: max. memory usage (in bytes) for other operations.

• -o number, --max-num-bg-ops number: max. nr. of concurrent background operations.

• -b number, --max-bytes-per-bg-op number: max. nr. of bytes per background operation.

• -g string, --group-io string: grouped I/O (on/off).

• -s string, --optimize-sort string: optimize I/O by sorting phys. operations (on/off).

• -a string, --nw-aio string: enable/disable async. nw. comm (on/off).

• -O string, --nw-ooo string: enable/disable out-of-order nw. comm (on/off).

• -c number, --clean-interval number: set interval (in sec.) for the I/O cleaner thread.

• -P number, --min-nw-perf number: the min. assumed network throughput (in bytes per sec).

• -t number, --min-timeout number: the min. timeout for any operation.

• -d number, --rec-delay number: time period (in seconds) to wait before starting background data restoration.

• -w number, --space-warn-threshold number: the space usage threshold at which a warning is generated.

Example usage: csconf --print-defaults
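A modification call takes the node and the parameter to change; the sketch below only prints the command (dry-run), and the 1 GiB limit is an illustrative value — check --print-defaults before changing anything:

```shell
# Dry-run sketch: raise the background-operation memory limit on node 0
# to 1 GiB (the value is given in bytes).
limit=$((1024 * 1024 * 1024))
echo "csconf --modify --node-id 0 --max-bg-mem ${limit}"
```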


Chapter 8. Management
8.1. Logging
The following logging types are available with EXAClusterOS:

• Syslog

This service is used to log internal Linux kernel information and should be used for debugging purposes only.
This information is locally available on the node. All syslog information is written into one file:
/var/log/all.log.

Syslog files are rotated with OS internal rotation mechanisms (e.g. logrotate).

• Loggingd

Loggingd is used for log information needed for cluster monitoring. This information is available on every
node and via EXAoperation.

This service writes its data to the /var/log/logd directory and is rotated automatically with the
cos-logdir-rotate command.

• Directories

Some services use logging in individual process files. For example, Cored uses it for writing the output of all
commands started with EXAClusterOS. This information should be used only for debugging purposes and is
available only locally on each node. The following services use this logging type:

• EXASolution: /<disk>/<database>/log

• Cored: /var/log/cored

All log directories are rotated automatically with the cos-logdir-rotate command.

• EXAoperation

This service logs its data to the following directory:

/usr/opt/EXASuite-6/EXAClusterOS-6.0.6/var/exaoperation/log

This directory is rotated automatically with the cos-logdir-rotate command.

For rotating log data, the command cos-logdir-rotate is available. Its command line parameters are as
follows:

1. <logdir> - directory for rotation.

2. <max files> - this number of logfiles will be left in the log directory.

3. <backups> - if the number of backups is larger than this parameter, older backups will be removed.

4. <signal> - the signal which should be sent to processes holding open files. If this parameter is not given, open
files will not be rotated.

5. <pattern> - if this regular expression pattern matches the name of a file and this file is open, the file will be
renamed and the process which holds the file will receive the appropriate signal.

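Putting the positional parameters together, an invocation can be sketched as follows. The command is only printed (dry-run), and the spelling of the signal argument (name vs. number) is an assumption:

```shell
# Dry-run sketch of the positional arguments, in order: log directory,
# files to keep, backups to retain, signal for open files, filename pattern.
rotate_cmd() {
  echo "cos-logdir-rotate $1 $2 $3 $4 '$5'"
}
rotate_cmd /var/log/cored 5 10 HUP '.*\.log'
```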
A log rotation is done with the following steps:


1. Create directory .logbackup for backups.

2. Read the list of files and the process id list of processes which use the files.

3. If a signal is given, then all files with a matching name pattern are renamed. Processes that used an affected file
will receive the designated signal.

4. Pack all files into a backup file in the .logbackup directory.

5. If <max files> is not equal to zero, then all files will be renamed to <filename>.<number>. If <max
files> is equal to zero, then the old files will be removed.

The following list gives an overview of logfiles on the license server.

1. /usr/opt/EXASuite-6/EXAClusterOS-6.0.6/var/exaoperation/log/access.log

Web GUI access log

2. /usr/opt/EXASuite-6/EXAClusterOS-6.0.6/var/exaoperation/log/output.log

EXAoperation log

3. /usr/opt/EXASuite-6/EXAClusterOS-6.0.6/var/exaoperation/log/zeo.log

ZOPE Enterprise Objects log

4. /usr/opt/EXASuite-6/EXAClusterOS-6.0.6/var/exaoperation/log/zope.log

ZOPE log

The following list gives an overview of logfiles in the initrd environment on client nodes (accessible with the
rssh n{number} command):

1. /var/log/hddmount.log

log of hard disk(s) partitioning on activated nodes

2. /var/log/hddinit.log

log of hard disk(s) initialization on nodes to install

3. /var/log/cos_startup.log

last log at startup before changing into CentOS base environment

The following list gives an overview of logfiles that can be found on client nodes via login with SSH on port
22.

1. /d02_data/{database name}/log/process/*

local logs of EXASolution processes

The following list gives an overview of logfiles that can be found on the license server as well as on client
nodes via SSH on port 22.

1. /var/log/cored/*

local logs of EXAClusterOS processes (Cored, DWAd, Loggingd, Lockd)

2. /var/log/logd/*


global logs of EXAClusterOS processes (Cored, DWAd, Lockd, EXASolution); can be viewed globally with
the logd_collect command


Chapter 9. EXAoperation
9.1. Components
The EXAoperation service is composed of the following components:

• Application server

This is the core component which implements the frontend and all backend processes.

• Configuration database

The database is used to store data of the application server.

• Command execution service

This service executes processes which are initiated from the frontend.

• Unix services

The following standard Unix services are used for booting and scheduling:

• DHCP Server

• XINET Service

• Cron daemon

• TFTPd

• SSH

• Syslog

9.2. Logging
To make the information of the Loggingd service visible, a periodic job is triggered with crond every minute.
This job uses the logd_collect command to collect the data and write it to a file. This file is afterwards
accessible via the EXAoperation frontend.

9.3. Permissions
To manage permissions in EXAoperation you have the following predefined roles:

1. Master

This role has all possible rights.

2. Administrator

As administrator you can manage the cluster, but you cannot change the license or the password for disk encryption,
nor assign the master role to a user.

3. Supervisor


A supervisor has the same rights as an administrator but without the possibility to change anything. It is used to
monitor the cluster.

4. User

A user has the same rights as supervisor, but can only view the basic state of nodes and databases.

In EXAoperation every user can have a different role for any object, so it is possible to have administrator rights
on one database and user or supervisor rights on all other objects.

38
Chapter 9. EXAoperation

Table 9.1. EXAoperation permissions


Description                                     Anonymous  Manager  Master  Administrator  Supervisor  User
View EXACluster password definitions - - Yes Yes - -
Use EXACluster cluster manager operations - - Yes Yes - -
Change EXACluster public network - - Yes Yes - -
View EXACluster users, passwords and grants definitions - - Yes Yes - -
Add EXACluster logging service - - Yes Yes - -
View EXASolution instance - - Yes Yes Yes Yes
Use users view on EXACluster - - Yes Yes Yes Yes
Change EXASolution instance - - Yes Yes - -
Use BucketFS view on EXACluster - - Yes Yes Yes Yes
Change EXACluster backup password - - Yes Yes - -
View EXACluster node content - - Yes Yes Yes Yes
Change remote volume instance - - Yes Yes - -
Use EXACluster node disks view - - Yes Yes Yes Yes
Add EXACluster node disk - - Yes Yes - -
Edit EXACluster monitor thresholds - - Yes Yes - -
Add EXASolution instance - - Yes Yes - -
Change User Management folder content - - Yes Yes - -
View EXACluster content - - Yes Yes Yes Yes
Add EXACluster JDBC driver - - Yes Yes - -
View remote volume - - Yes Yes Yes Yes
Use EXACluster JDBC driver view - - Yes Yes Yes Yes
Use EXACluster operation field - - Yes Yes Yes Yes
Change EXACluster route - - Yes Yes - -
Change script extension instance - - Yes Yes - -
Use network view on EXACluster - - Yes Yes Yes Yes
Change EXACluster node disk - - Yes Yes - -
Use EXASolution manager operations - - Yes Yes - -
Use EXACluster Software view - - Yes Yes Yes Yes
Add EXACluster VLAN - - Yes Yes - -
Use EXASolution backup - - Yes Yes Yes Yes
View EXAoperation node priority list - - Yes Yes Yes Yes
Explicitely delete logfiles/coredumps - - Yes Yes - -
Change EXACluster BucketFS properties - - Yes Yes - -
Change EXACluster license - - Yes Yes - -
Add BucketFS - - Yes Yes - -
Use EXACluster node manager operations - - Yes Yes - -
Use EXASolutions view on EXACluster - - Yes Yes Yes Yes
View EXACluster logging manager - - Yes Yes Yes -
View EXACluster node - - Yes Yes Yes Yes

Use EXACluster JDBC manager views - - Yes Yes Yes -
Use jobs view on EXACluster - - Yes Yes Yes Yes
View EXACluster Server Management group - - Yes Yes Yes Yes
View EXACluster versions definitions - - Yes Yes Yes Yes
Use storage view on EXACluster - - Yes Yes Yes Yes
Use EXASolution manager view - - Yes Yes Yes Yes
View EXACluster license - - Yes Yes Yes -
Call plugin functions - - Yes Yes - -
Change EXACluster logging service - - Yes Yes - -
View EXACluster node Server Management values - - Yes Yes Yes -
View EXASolution DB backup operations - - Yes Yes Yes Yes
Change EXACluster content - - Yes Yes - -
Change EXACluster default disk definitions - - Yes Yes - -
Change EXACluster network definitions - - Yes Yes - -
Add EXACluster node - - Yes Yes - -
View User Management folder content - - Yes Yes Yes -
Add EXACluster route - - Yes Yes - -
View script extension - - Yes Yes Yes Yes
Use software view on EXACluster - - Yes Yes Yes Yes
View EXACluster logging service - - Yes Yes Yes -
View EXACluster node disk - - Yes Yes Yes Yes
Change EXACluster versions definitions - - Yes Yes - -
Add EXACluster user - - Yes Yes - -
Change EXACluster storage - - Yes Yes - -
Use EXACluster cluster manager views - - Yes Yes Yes Yes
Use EXASolution DB backup operations - - Yes Yes - -
Change EXACluster Server Management group - - Yes Yes - -
Edit EXACluster remote syslog settings - - Yes Yes - -
Use EXACluster node manager views - - Yes Yes Yes Yes
Change EXACluster node content - - Yes Yes - -
Add remote volume instance - - Yes Yes - -
Download support information - - Yes Yes - -
Add EXACluster instance - - Yes Yes - -
View BucketFS - - Yes Yes Yes Yes
Change BucketFS - - Yes Yes - -
Change EXACluster node - - Yes Yes - -
Change EXACluster JDBC driver - - Yes Yes - -
Change EXACluster key store - - Yes Yes - -
View EXASolution statistics - - Yes Yes Yes -
Add EXACluster public network - - Yes Yes - -

Manage BucketFS - - Yes Yes - -
View EXACluster route - - Yes Yes Yes Yes
Add script extension instance - - Yes Yes - -
View EXACluster public network - - Yes Yes Yes Yes
View EXACluster storage - - Yes Yes Yes Yes
Use EXACluster JDBC manager operations - - Yes Yes - -
View EXACluster key store - - Yes Yes Yes Yes
View EXACluster network definitions - - Yes Yes Yes Yes
Use nodes view on EXACluster - - Yes Yes Yes Yes
Add EXACluster key store - - Yes Yes - -
Change EXACluster VLAN - - Yes Yes - -
Change EXACluster user - - Yes Yes - -
Change EXACluster disk password - - Yes - - -
Add EXACluster Server Management group - - Yes Yes - -
Manage EXACluster storage - - Yes Yes - -
Use logservices view on EXACluster - - Yes Yes Yes Yes
View EXACluster default disk definitions - - Yes Yes Yes Yes
View EXACluster VLAN - - Yes Yes Yes Yes

9.4. Processes

9.4.1. Installation
If a node is in installation mode its boot process is shown as follows:

1. Boot node over ethernet.

2. Check node parameters.

3. Initialize and format disks.

4. Transfer installation packages.

5. Install software.

6. Configure node parameters.

7. Start all required services.

9.4.2. Booting
On activation of a node its boot process is shown as follows:

1. Boot node over ethernet.

2. Check node parameters.


3. Reinitialize and check disks.

4. Configure node parameters.

5. Start all required services.

9.4.3. Restore
For restoring a database, the database must be created in EXAoperation but must not be started. The steps of the
restore process are the following:

1. Check whether enough files are available. This means having a node file for every node number and a metadata
file.

2. Copy files from archive nodes to database nodes.

3. Trigger database to read backup files, i.e. start the restore process over EXAoperation.

When using offline backups, the backup files must be moved to the archive nodes together with an empty 'dontexpire'
file. They have to be located in a directory whose name matches the backup name (which is usually a timestamp).
The files also have to be located on the nodes where they were originally created. If not, the metadata file
("backup.ini") has to be adjusted first.

9.4.4. Restore (Storage)


Storage databases provide three different restore mechanisms:

1. Blocking restore: This restore mechanism loads all data into the database before setting the database into a
mode in which it accepts connections. This is the fastest restore mechanism.

2. Non-blocking restore: This mechanism only loads the most necessary part of the data into the database and
immediately sets the database into a mode in which it accepts connections. This mechanism is useful for de-
creasing the downtime of the database, but will load data slightly slower than the blocking restore mechanism.

3. Virtual-access restore: The mechanism starts a database in a read-only mode. Thus, no write operations are
possible. It is useful for restoring only a single object of a database backup into another database via IM-
PORT/EXPORT.

Hint: Remote archive volumes can only be used for blocking restore processes. All other restore types require
further functionality that is only available in internal cluster volumes. Thus, a remote backup must be moved to a
cluster archive volume first in such a situation.

Figure 9.1. Show foreign database backups


To restore backups from other databases, use the "Show foreign database backups" button in the "EXASolution
Database Backup List" form (see screenshot above). One can restore backups from arbitrary EXASolution databases
as long as the number of nodes is the same.

9.5. Renaming of databases


The following steps have to be done to rename a database:

1. Create a backup from the database with EXAoperation. You may stop it right afterwards.

2. Create a new database with a different name (and a different communication and connection port if the old
database is still running) but with the same number of nodes.

3. Edit the backup properties and insert the new database into "Systems". Afterwards, restore the backup into
the new database.

4. Now, the old database may be deleted.

Remember the following aspects:

1. After deleting the old database, all backups of this system will be deleted.

2. Change the references from the old database to the new one, especially in the scheduler, monitor and backup
view.

3. You may have to change the database name in external tools, e.g. in monitoring tools.

9.6. Update servers


EXAoperation is able to connect to remote FTP/SFTP servers to retrieve update packages. Thus, you may deliver
such packages to several EXASuite clusters without being forced to upload them to every single EXAoperation instance.
Consider the following file/link hierarchy on a specified remote server:

4.x.2/EXAClusterOS-4.x.2_LS-Update-CentOS-6.2_x86_64.pkg
4.x.3/EXAClusterOS-4.x.2_LS-Update-CentOS-6.2_x86_64.pkg ->
../4.x.2/EXAClusterOS-4.x.2_LS-Update-CentOS-6.2_x86_64.pkg
4.x.3/EXASolution-4.x.3_x86_64.pkg

This structure enables an EXASuite cluster in version 4.x.2 to find database version 4.x.3 (see the file link) and
to show this version as an applicable database version in the appropriate EXAoperation form. It would further enable
an EXASuite cluster with a version smaller than 4.x.2 to update to EXASuite version 4.x.2.[1] A patchlevel for
version 4.x.2 would have to be located in the 4.x.2 directory.

9.7. Interfaces
The following interfaces are available with EXAoperation.

• EXAoperation frontend - HTTP on port 80 and HTTPS on port 443 on all cluster nodes

• Storage archive volumes - Ports 2021 (FTP), 2022 (SFTP), 2080 (HTTP), and 2443 (HTTPS) on all cluster
nodes

[1] Even with a wrong file hierarchy, EXAoperation will be able to detect whether an update is applicable. Every
package is signed and will be checked once the update process is in progress.


9.7.1. Storage archive volumes


As noted above, Storage archive volumes can be reached via SFTP, FTP, HTTP, and HTTPS. The username and
password are specified in EXAoperation (see 'Users' form). A backup contains the following directory/file structure,
given by the following example of a system "testdb", its first level 0 backup with id 0 and two online database
nodes:

• testdb/id_0/level_0/node_0/metadata_{timestamp}

• testdb/id_0/level_0/node_0/backup_{timestamp}

• testdb/id_0/level_0/node_1/backup_{timestamp}

To store this backup to an offline storage system, all these three files must be downloaded. Restoring this backup
from an offline storage system requires all these files to be uploaded into exactly this file structure. Alternatively,
you may choose to download this backup in compressed form. To do so, download the virtual file
testdb/id_0.tar.gz (which must be uploaded to exactly the same location when restoring from an
offline archive). Consider that these tar.gz files are generated on the fly. This may limit the download speed to
around 20 MiB/s, while the limiting factor for uncompressed files will normally be the network speed.
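The download location of the compressed backup can be sketched as a URL built from the pieces above. The host name `n11`, the `<archive-volume>` path component, and the use of the HTTP port 2080 in this exact form are assumptions; credentials come from the EXAoperation 'Users' form:

```shell
# Sketch: compose the HTTP download URL for the on-the-fly tar.gz of
# system "testdb", backup id 0 (placeholders are marked with <...>).
backup_url() {
  local host=$1 db=$2 id=$3
  echo "http://${host}:2080/<archive-volume>/${db}/id_${id}.tar.gz"
}
backup_url n11 testdb 0
```

The printed URL would then be fetched with any HTTP client using the archive volume's username and password.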

9.8. Maintenance user


An administrator may specify a dedicated maintenance user during the installation of a new EXASuite cluster. This
user has the ability to view and modify the current EXAoperation status as well as the public network interface
configuration via SSH or a login shell. The maintenance user is also able to lock and unlock the password of the
root account and to set its own password.

9.9. Failover

9.9.1. License server failure


EXAoperation is able to do a failover in case a license server fails. This has the following consequences:

• Every cluster node is able to host EXAoperation (not only license servers).

• The interfaces of EXAoperation (HTTP/HTTPS/XML-RPC) are reachable on all cluster nodes, but only one
node has a running instance of EXAoperation (e.g. is able to boot nodes) and thus is the EXAoperation master
node. All other nodes act as proxy servers and will redirect requests.

• In case of a failover, EXAoperation will not be connectable for a short period of time (up to one minute) and
a user will experience "Connection refused" messages during this process.

• The default license server is named 'n10'. This name should be used for XML-RPC functions such as callPlugin(),
as the former name 'license' only refers to the current EXAoperation master node. Additional license servers
may be named 'n1' up to 'n9'.
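
For illustration, an XML-RPC call addressed to the default license server by its node name might look like the following sketch. The cluster URL, credentials, and plugin arguments are hypothetical placeholders; only the use of the 'n10' node name is taken from the text above:

```python
import xmlrpc.client

# Hypothetical EXAoperation endpoint and credentials -- replace with
# the values of your cluster before use. ServerProxy does not connect
# until a method is actually called.
proxy = xmlrpc.client.ServerProxy("https://admin:secret@10.0.0.10/cluster1")

# Address the default license server by its stable node name 'n10'
# rather than the former 'license' alias, which only refers to the
# current EXAoperation master node. (Call commented out because it
# requires a reachable cluster; arguments are illustrative.)
# result = proxy.callPlugin("Monitoring.SNMP", "n10", "STATUS", "")
```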

9.9.2. Power outage and checksum mismatches


In case of a power outage, the database may not start because of checksum mismatch errors in EXAStorage. In
that case, there are two possible solutions (available in EXAoperation after EXAStorage has been shut down):

1. Fix checksums: All checksums on the selected node(s) are verified against the data, and any mismatching
checksum is regenerated. This operation may take hours to complete (depending on the amount of data that
has already been written).

Chapter 9. EXAoperation

2. Discard checksums: All checksums on the selected node(s) are reset to 0 (as if no data has ever been written).
This operation takes only a few minutes, but the checksums are lost and only regenerated as new data is
written.

9.10. Automatic node reordering for Storage databases


When using a Storage database, the database's reserve and active nodes will be shuffled to match the volume
master nodes (if possible and necessary). Proactively moving a database node of a running database thus
consists of the following steps:

1. Move appropriate volume node to target node and wait for completion.

2. Stop database.

3. Start database.

9.11. Volume restore delay


For Storage databases there is a so-called restore delay. This delay specifies a timeout after which a
Storage data volume moves to the nodes of its database after a failover, in case the failed node stays offline during
that period of time. The recommended value is 10 minutes.

9.12. Using the EXAoperation browser interface


This section provides a brief description of use cases that can be handled with the EXAoperation browser interface.
Each use case has a test that can be executed by an administrator. To start all tests, use the following
command: $COS_DIRECTORY/var/exaoperation/inst/bin/test -m exaoperation
--all. Note that this test requires more than one valid running client node to be configured, a logservice (logser-
vice1) that fetches all EXAClusterOS service sources, and valid EXASolution binaries. There must be no database
and no scheduler or backup service. Also note that the backup password must be set to the default backup
password.

9.12.1. Form "EXASolution Instances"


Figure 9.2. Example view: Form "EXASolution Instances"

Add a database
Precondition(s):

A database with the configured name does not exist.


Postcondition(s):

The newly configured database is shown in the "EXASolution Instances Information" form.

Steps to do:

1. Click "Add" button

2. A new form opens. Configure the database and click "Add" once again.

Expected message in logservice(s) as regular expression:

1. User \d added system.

Delete a database
Precondition(s):

The selected database(s) are not running and there is no logservice or backup that is configured with this database.

Postcondition(s):

The selected database(s) will not show up anymore in the "EXASolution Instances Information" form.

Steps to do:

1. Select database(s).

2. Click "Delete" button.

3. Answer question with "OK"

Expected message in logservice(s) as regular expression:

1. User \d deleted system.

Start a database
Precondition(s):

The selected database(s) must exist and not be running.

Postcondition(s):

The selected database(s) will be started.

Steps to do:

1. Select database(s).

2. Click "Start" button.

Expected messages in logservice(s) as regular expressions:

1. User \d requests startup of system.

2. System started successfully in partition \d.
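
These patterns can be checked mechanically. The sketch below matches sample log lines (invented for illustration) against the two documented expressions using Python's re module, where \d denotes a digit:

```python
import re

# Documented logservice patterns for a successful database start.
patterns = [
    r"User \d requests startup of system.",
    r"System started successfully in partition \d.",
]

# Sample log lines -- invented for illustration only.
lines = [
    "User 3 requests startup of system.",
    "System started successfully in partition 7.",
]

for pattern, line in zip(patterns, lines):
    assert re.search(pattern, line), f"no match: {line}"
print("all patterns matched")
```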


Restart a database
Precondition(s):

The selected database(s) must be running.

Postcondition(s):

The selected database(s) will be restarted.

Steps to do:

1. Select database(s).

2. Click "Restart" button.

Expected message in logservice(s) as regular expression:

1. Successfully restarted database \'\w+\'

Stop a database
Precondition(s):

The selected database(s) must be running.

Postcondition(s):

The selected database(s) will be stopped.

Steps to do:

1. Select database(s).

2. Click "Shutdown" button.

Expected message in logservice(s) as regular expression:

1. User \d requests shutdown of system.


9.12.2. Form "EXASolution Instance"


Figure 9.3. Example view: Form "EXASolution Instance"

Start database
Precondition(s):

The database must exist and not be running. More than 50% of all cluster nodes and all necessary database nodes
must be online.

Postcondition(s):

The database will be started.

Steps to do:

1. Select operation "Startup".

2. Click "Submit" button.

Expected messages in logservice(s) as regular expressions:


1. User \d requests startup of system.

2. System started successfully in partition \d.

Restart database
Precondition(s):

The database must be running.

Postcondition(s):

The database will be restarted.

Steps to do:

1. Select operation "Restart".

2. Click "Submit" button.

Expected message in logservice(s) as regular expression:

1. Successfully restarted database \'\w+\'

Stop database
Precondition(s):

The database must be running. More than 50% of all cluster nodes must be online.

Postcondition(s):

The database will be stopped.

Steps to do:

1. Select operation "Shutdown".

2. Click "Submit" button.

Expected message in logservice(s) as regular expression:

1. User \d requests shutdown of system.

Create database
Precondition(s):

The database must not be running or in any other operation (e.g. backup). More than 50% of all cluster nodes must
be online. Each node that is a member of the database system must be online or must have been online before.

Postcondition(s):

All database directories will have been created. The DWAd service knows about the database and will start it in
create mode next time. Within the next minute, a Samba share for this system will have been exported.

Steps to do:

1. Select operation "Create".


2. Click "Submit" button.

Expected message in logservice(s) as regular expression:

1. User \d requests new system setup.

Start database in maintenance mode


Precondition(s):

The database must exist and not be running. More than 50% of all cluster nodes must be online.

Postcondition(s):

The database will be started in maintenance mode.

Steps to do:

1. Select operation "Start maintenance".

2. Click "Submit" button.

Expected message in logservice(s) as regular expression:

1. User \d requests startup of system with -maintenance flag.

Remove database
Precondition(s):

The database must exist and not be running. More than 50% of all cluster nodes must be online.

Postcondition(s):

All files of the database (log files, data files and backups) will be removed.

Steps to do:

1. Select operation "Delete".

2. Click "Submit" button.

Expected messages in logservice(s) as regular expressions:

1. User \d deleted system.

2. User \d added system.

Change EXASolution database parameters


Precondition(s):

The database may be in any state if only changing reserve nodes. For all other parameters, the database has to be
created.

Postcondition(s):

The database has the same state (created, running) as before.


Steps to do:

1. Click "Properties" button.

2. Change database parameters appropriately and click "Apply".

Start a backup
Precondition(s):

The selected database system has to be started beforehand.

Postcondition(s):

A backup process for the selected system will have been started.

Steps to do:

1. Click "Start backup" button.

Expected messages in logservice(s) as regular expressions:

1. Start backup 20[0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9]_[0-9][0-9] on system *[a-zA-Z0-9_\.-].

2. User 1000 requests system backup.

Enlarge EXASolution database


Precondition(s):

The database has been created, is not running and has an appropriate number of reserve nodes.

Postcondition(s):

The database will have been enlarged and an appropriate number of reserve nodes are active nodes now. The
database will have been started.

Steps to do:

1. Select operation "Enlarge".

2. Click "Submit" button.

3. A new form opens. Enter the number of new active nodes and click "Apply".

Expected messages in logservice(s) as regular expressions:

1. User \d requests to increase system size by one node (*[a-zA-Z0-9_\.-]).

2. User \d requests startup of system.

After startup, you should explicitly issue the command "REORGANIZE DATABASE" in the database. This ensures
that data is redistributed in a balanced way across all database nodes.

Shrink database
Precondition(s):


The database has been started and is connectable.

Steps to do:

1. Select operation "Shrink".

2. Click "Submit" button.

3. A new form opens. Enter the target volume size and click "Apply".

Add a new scheduler job


Precondition(s):

A scheduler and an appropriate database exist.

Steps to do:

1. Click "Schedule" button.

2. Fill form fields and click "Add" button.

Get EXASolution statistics


Precondition(s):

The database must be online (connectable).

Postcondition(s):

The browser will download a ZIP file containing the database statistics of the last month, which can be sent to
EXASOL to provide useful usage graphs over its web portal.

Steps to do:

1. Select "Get Statistics".

2. A new form opens. Insert database user and password.

3. Click "Get Statistics".


9.12.3. Form "EXAStorage"


Figure 9.4. Example view: Form "EXAStorage"

Start EXAStorage
Precondition(s):

EXAStorage has not been started yet.

Steps to do:

1. Click "Startup storage service" button.

Create a new data/archive volume


Postcondition(s):

An appropriate volume with a new ID will have been created.

Steps to do:

1. Click "Create volume" button.

2. Set volume parameters.

3. Click "Apply" button.


The number of volume nodes must be a multiple of the "Master nodes" property. In the case of an archive volume,
the size will be rounded up as needed to match 4 GB boundaries per node and disk.

Enlarge an archive volume


Precondition(s):

The volume has been created and no database backup to this volume is in progress.

Postcondition(s):

The volume will be enlarged by at least the number of GiB entered in the form.

Steps to do:

1. Click on archive volume.

2. Click "Enlarge SDFS".

3. Enter number of GiB this volume should be enlarged by.

Only enlarge an archive volume while no write operation (database backup) is being made to it. Check
the backup state for all databases on the "EXASolution" form and enlarge the volume only if no backup is in
progress.

Create a new remote archive volume


Postcondition(s):

An appropriate remote volume with a new ID will have been created.

Steps to do:

1. Click "Create remote volume" button.

2. Set volume parameters.

3. Click "Apply" button.

In contrast to data/archive volumes, remote archive volumes are named r0000/r0001/... (instead of
v0000/v0001/...). Restore processes from remote archive volumes must be made in blocking mode; the non-blocking
and virtual restore modes are not available. Furthermore, EXAoperation will not delete expired backups
from remote volumes (their deletion is under external control). To simplify automatic backup deletion processes on the server
side, an expire file is created at {database name}/id_{x}/level_{y}/node_0/expire_{expiration
timestamp}, where the expiration timestamp has the format "%Y%m%d%H%M".
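
As a sketch, the expire file name for a given expiration time can be derived as follows; the database name and backup ids are illustrative placeholders:

```python
from datetime import datetime

# Build the expire file path for a remote archive volume backup.
# "testdb", id 0, and level 0 are placeholder values; the timestamp
# format "%Y%m%d%H%M" is the one documented above.
expiration = datetime(2018, 6, 30, 23, 59)
expire_file = (
    f"testdb/id_0/level_0/node_0/expire_{expiration.strftime('%Y%m%d%H%M')}"
)
print(expire_file)
```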

Remove a volume
Precondition(s):

No database refers to the selected volume as a data volume.

Steps to do:

1. Select volume(s).

2. Click "Remove volume" button.


Add unused disks for one or more nodes


Postcondition(s):

All formerly unused Storage disks of the selected node(s) will have been configured to be usable by the Storage
service. The selected node(s) must be online or suspended.

Steps to do:

1. Select node(s).

2. Click "Add unused disks" button.

Restart Storage service on one or more nodes


Postcondition(s):

The Storage service will have been restarted on the selected node(s).

Steps to do:

1. Select node(s).

2. Click "Restart service on node" button.

Shutdown Storage service on all nodes


Postcondition(s):

The Storage service will have been stopped.

Steps to do:

1. Click "Shutdown service on all nodes" button.

9.12.4. Form "EXAStorage Volume Node Information"


Figure 9.5. Example view: Form "EXAStorage Volume Node Information"


Restore volume on node


Postcondition(s):

Volume segments will be restored on this node.

Steps to do:

1. Click "Restore node" button.

Stop restore of volume on node


Postcondition(s):

Any active restore of volume segments will be stopped on this node.

Steps to do:

1. Click "Stop node restore" button.

9.12.5. Form "EXAStorage Node Information"


Figure 9.6. Example view: Form "EXAStorage Node Information"

Suspend/Resume node
Postcondition(s):

The node will be suspended/online after this action.

Steps to do:

1. Click the "Suspend node" or "Resume node" button.


Restart Storage service on one node


Steps to do:

1. Click "Restart service on node" button.

Enable Storage node background recovery


Steps to do:

1. Click "Enable background recovery" button.

Set storage node options


Steps to do:

1. Set space_warn_threshold and background recovery limit.

2. Apply to all nodes.

Disable Storage node background recovery


Steps to do:

1. Click "Disable background recovery" button.

Enable Storage devices


Steps to do:

1. Select device(s) which should be activated.

2. Click "Enable devices" button.

Disable Storage devices


Steps to do:

1. Select device(s) which should be deactivated.

2. Click "Disable devices" button.

Remove Storage devices


Steps to do:

1. Select device(s) which should be removed.

2. Click "Remove devices" button.

Add Storage devices


Steps to do:


1. Select device(s) which should be added.

2. Click "Add selected devices" button.

9.12.6. Form "EXAStorage Node Device Information"


Figure 9.7. Example view: Form "EXAStorage Node Device Information"

Disable Storage device


Steps to do:

1. Click "Disable device" button.

Enable Storage device


Steps to do:

1. Click "Enable device" button.

9.12.7. Form "EXABucketFS Services"


Figure 9.8. Example view: Form "EXABucketFS Services"


Create new EXABucketFS service


Postcondition(s):

The service will be reachable shortly afterwards on all database nodes when using HTTP and/or HTTPS.

Steps to do:

1. Click "Add" button and insert appropriate attributes.

2. Click "Add" button.


9.12.8. Form "Cluster Nodes"


Figure 9.9. Example view: Form "Cluster Nodes"

Upload node list


Precondition(s):

No node of the node list file exists.

Postcondition(s):

All nodes of the node list file will be shown in the "EXACluster Nodes Information" form.


Steps to do:

1. Click "Browse" button and select node list file.

2. Click "Submit" button.

Add a node
Precondition(s):

A node with the provided number and/or same private/public MAC address(es) does not exist.

Postcondition(s):

The newly configured node is shown in the "EXACluster Nodes Information" form and may be installed/booted
after switching power on.

Steps to do:

1. Click "Add" button.

2. A new form opens. Configure the node and click "Add" once again.

If a private and/or public failsafety network interface is specified, the network interfaces will be bonded in active-
backup mode. These interfaces may therefore be connected to two different switches and will always communicate
over one active link. If no RAID type is specified for a Storage device, each disk device contributes its full size.
Thus, a size of 50 GB on the disk devices /dev/sda and /dev/sdb will result in a total size of 100 GB of usable
Storage space.

Delete a node
Postcondition(s):

The deleted node will not show up again in the "EXACluster Nodes Information" form and is not installed/booted
after power on/reboot.

Steps to do:

1. Select node(s).

2. Click "Delete" button.

Copy node
Precondition(s):

A node with the provided number and/or same private/public MAC address(es) does not exist.

Postcondition(s):

The newly configured node is shown in the "EXACluster Nodes Information" form and may be installed/booted
after switching power on.

Steps to do:

1. Click "Add" button.

2. A new form opens. Configure the node and click "Add" once again.


3. Open the newly created node and press the "Copy" button.

If a private and/or public failsafety network interface is specified, the network interfaces will be bonded in active-
backup mode. These interfaces may therefore be connected to two different switches and will always communicate
over one active link. If no RAID type is specified for a Storage device, each disk device contributes its full size.
Thus, a size of 50 GB on the disk devices /dev/sda and /dev/sdb will result in a total size of 100 GB of usable
Storage space.

Change node properties


Precondition(s):

The node must exist.

Postcondition(s):

The node properties are changed appropriately.

Steps to do:

1. Select node.

2. Click "Properties" button.

3. Change node properties appropriately and click "Apply".

Toggle ID LED of a node


Precondition(s):

The selected node(s) has/have a LOM card.

Postcondition(s):

The front panel identify light of the selected node(s) will light up for some time.

Steps to do:

1. Select node(s).

2. Select operation "Toggle ID LED".

3. Click "Submit" button.

Start a node
Precondition(s):

The selected node(s) has/have a LOM card.

Postcondition(s):

A "power on" command will be sent to the LOM card of the selected node(s).

Steps to do:

1. Select node(s).

2. Select operation "Startup".


3. Click "Submit" button.

Reboot a node
Precondition(s):

The node has been started before and can be reached via SSH.

Postcondition(s):

The node will be rebooted.

Steps to do:

1. Select node(s).

2. Select operation "Reboot".

3. Click "Submit" button.

Stop a node
Precondition(s):

The node has been started before and can be reached via SSH.

Postcondition(s):

The node will be stopped.

Steps to do:

1. Select node(s).

2. Select operation "Shutdown".

3. Click "Submit" button.

Reset a node
Precondition(s):

The selected node(s) has/have a LOM card.

Postcondition(s):

A "power reset" command will be sent to the LOM card of the selected node(s).

Steps to do:

1. Select node(s).

2. Select operation "Reset".

3. Click "Submit" button.


Power off a node


Precondition(s):

The selected node(s) has/have a LOM card.

Postcondition(s):

A "power off" command will be sent to the LOM card of the selected node(s).

Steps to do:

1. Select node(s).

2. Select operation "Power off".

3. Click "Submit" button.

Install a node
Postcondition(s):

The selected node(s) will be installed during the next boot process. This includes deleting all former data.

Steps to do:

1. Select node(s).

2. Select operation "Install".

3. Click "Submit" button.

Activate a node
Precondition(s):

The selected node(s) has/have been installed before.

Postcondition(s):

The selected node(s) will not be installed during the next boot process. Thus, all database data will remain.

Steps to do:

1. Select node(s).

2. Select operation "Activate".

3. Click "Submit" button.

Force filesystem check on a node


Precondition(s):

The selected node(s) has/have been installed before.

Postcondition(s):

The selected node(s) will do a filesystem check during the next boot process.


Steps to do:

1. Select node(s).

2. Select operation "Force filesystem check".

3. Click "Submit" button.

Apply default disk layout


Precondition(s):

The selected node(s) must be marked to be installed.

Postcondition(s):

The selected node(s) will be configured with the default disk layout after the next installation.

Steps to do:

1. Select node(s).

2. Select operation "Apply default disk layout".

3. Click "Submit" button.

Start cluster services on a node


Postcondition(s):

The cluster node is online and can be used for starting databases. Logs of this node will be shown in monitoring
services.

Steps to do:

1. Select node(s).

2. Select "Start cluster services" as operation.

3. Click "Submit" button.

Cluster services are started automatically on each node as part of the startup procedure. This action is only necessary
after having stopped the cluster services of this node.

Stop cluster services on a node


Postcondition(s):

The cluster node is offline and cannot be used for databases. Logs of this node will not be shown in monitoring
services anymore.

Steps to do:

1. Select node(s).

2. Select "Stop cluster services" as operation.

3. Click "Submit" button.


This operation can be useful in the case of a defective node: the node can then be analyzed (e.g. rebooted)
without its logs appearing in the monitoring services.

9.12.9. Form "Backups Information"


Figure 9.10. Example view: Form "Backups Information"

Delete backup
Postcondition(s):

The backup and appropriate backup files will have been deleted.

Steps to do:

1. Select backup(s) to delete.

2. Click "Delete" button.

Edit backup expiration


Postcondition(s):

The backup(s) will be shown with the new expiration.

Steps to do:

1. Change expiration of appropriate backup(s).

2. Click "Apply" button.

Restore a backup
Precondition(s):

The selected database system must be created and not be started.

Postcondition(s):

A restore process for the selected system will have been started.


Steps to do:

1. Select appropriate backup.

2. Select restore type (Blocking, Non-blocking, Virtual access).

3. Click "Restore" button.

Expected messages in logservice(s) as regular expressions:

1. Start restore of backup 20[0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]-[0-9][0-9] [0-9][0-9] on system *[a-zA-Z0-9_\.-].

2. User \d requests startup of system in restore mode.

3. System is ready to receive a restore event.

4. User ([0-9]*) requests restore of system.

Restore offline backup


Precondition(s):

The selected database system must be created and not be started.

Postcondition(s):

A restore process for the selected system will have been started. A backup.ini.out file will have been generated
in each timestamp directory; it contains error messages in case of a failure.

Steps to do:

1. Copy backup files to appropriate archive volume or remote archive volume.

2. Refresh window. The backup should appear now.

3. Select restore type (Blocking, Non-blocking, Virtual access).

4. Click "Restore" button.

Expected messages in logservice(s) as regular expressions:

1. Start restore of backup 20[0-9][0-9]-[0-9][0-9]-[0-9][0-9] [0-9][0-9]-[0-9][0-9] [0-9][0-9] on system *[a-zA-Z0-9_\.-].

2. User \d requests startup of system in restore mode.

3. System is ready to receive a restore event.

4. User ([0-9]*) requests restore of system.

Virtual access restore processes are not possible from remote archive volumes.


9.12.10. Form "Access Management"


Figure 9.11. Example view: Form "Access Management"

Add a new user


Precondition(s):

A user with the configured name does not exist.

Postcondition(s):

The newly configured user is shown in the "EXACluster Users Information" form and may log in with {cluster
prefix}.{username} and the appropriate password.

Steps to do:

1. Click "Add" button.

2. A new form opens. Configure the user and click "Add" once again.

Delete a user
Postcondition(s):

The user is not shown anymore in the "EXACluster Users Information" and not able to login.

Steps to do:

1. Select user.


2. Click "Delete" button.

Change user properties


Postcondition(s):

The user can log in with the new credentials.

Steps to do:

1. Select user.

2. Click "Edit" button.

Change EXAoperation TLS certificate


Steps to do:

1. Select file to use as certificate

2. Select file to use as private key

3. Press "Upload key" button.

EXAoperation has to be restarted to use the new certificate.

9.12.11. Form "Versions"


Figure 9.12. Example view: Form "Versions"

List EXAClusterOS version and available EXASolution version(s)


Steps to do:

1. All versions are shown in the "EXACluster Versions Information" form.


Install new EXASolution version


Steps to do:

1. Click "Browse" button and select EXASolution update file.

2. Click "Submit update file" button.

3. Change versions appropriately for EXASolution systems.

Install new EXAClusterOS version


Precondition(s):

All client nodes have to be stopped.

Steps to do:

1. Click "Browse" button and select EXAClusterOS update file.

2. Click "Submit update file" button.

3. Wait for EXAoperation to issue a "Please shutdown databases and nodes and restart license server" message.

4. Shutdown databases, Storage and cluster nodes (including additional license servers).

5. Restart license server.

6. Wait until EXAoperation can be reached again via HTTP/HTTPS.

7. Change EXAoperation configuration appropriately, e.g. version of EXASolution instances.

8. Do a final restart of the license server.

Remove EXASolution version


Precondition(s):

No database is configured with the specified version.

Steps to do:

1. Select EXASolution version.

2. Click "Remove EXASolution version".

Upload EXAoperation patchlevel


Steps to do:

1. Click "Browse" button and select patchlevel file.

2. Click "Submit update file" button.

3. Restart EXAoperation.


Download software update from remote site.


Precondition(s):

An update URL has been defined.

Steps to do:

1. Click "List available updates" button.

2. Choose update.

3. Click "Update" button.

9.12.12. Form "UDF Libraries"


Figure 9.13. Example view: Form "UDF Libraries"

Add new UDF Library


Steps to do:

1. Click "Add" button and fill the opened form.

2. Click "Add" button to add a new library.

The following parameters should be defined (* means the parameter is required):

• *Name - name of the exported module. For Java, this defines the name, by which the module is referenced in
the script with the "%jar NAME.jar" command. Note: the extension ".jar" will be automatically added by
EXASolution, so you do not have to specify it. For Python and R, the name should be defined according to
the documentation.

• *URL - HTTP or FTP link to the .jar (Java) or tar.gz (Python, R) file. The module will be copied from this
location and installed on each configured EXASolution node.


• *Language - language of the module.

• Description - short description of the module.

• HTTP Proxy - settings for an HTTP proxy.

• HTTPS Proxy - settings for an HTTPS proxy.

Delete UDF Library


Steps to do:

1. Select the desired library in list.

2. Click "Delete" button and confirm the dialog.

EXAoperation deletes only the settings for the selected module. If a module was already installed, it remains on
the nodes and can still be used.

Install UDF Library


Steps to do:

1. Select the desired library in list.

2. Click "Install" button.

EXAoperation installs the selected module on all nodes. When installing Python or R modules, EXAoperation
checks all existing dependencies. The installation logs for all nodes can be downloaded by pressing 'Show
Installation Logs'.

Cleanup all UDF Libraries


Steps to do:

1. Click "Cleanup all" button.

2. Confirm the dialog.

EXAoperation uninstalls and deletes all installed modules from all nodes. However, the settings are not deleted
and the modules can be reinstalled.


9.12.13. Form "JDBC Drivers"


Figure 9.14. Example view: Form "JDBC Drivers"

Add new JDBC driver.


Postcondition(s):

A new JDBC driver has been uploaded and can be used by databases.

Steps to do:

1. Click "Add" button.

2. Fill in appropriate driver meta information.

3. Click "Add" button.

4. Select radio button for new driver.

5. Select file to upload.

6. Click "Upload" button.

You may upload more than one file for a JDBC driver. Just repeat steps 4-6 as often as required.

Remove JDBC driver files.


Postcondition(s):

All driver files for the selected JDBC driver will be removed.

Steps to do:


1. Select driver.

2. Click "Cleanup" button.

Change JDBC driver properties.


Postcondition(s):

All properties will have been changed properly.

Steps to do:

1. Select driver.

2. Click "Properties" button.

3. Change properties and apply with "Apply" button.

Delete JDBC driver.


Postcondition(s):

The selected driver will have been removed and is not usable anymore by any database.

Steps to do:

1. Select driver.

2. Click "Delete" button.


9.12.14. Form "EXACluster Debug Information"


Figure 9.15. Example view: Form "EXACluster Debug Information"

Download debug information


Steps to do:

1. Select debug information to download from.

2. Select nodes to download debug information from.

3. Optionally, estimate the size of uncompressed debug information selected. This will be the maximum file
size that you have to download.

4. Click "Download debug information".

Please bear in mind that in case of granting access for this form to a user, this user will be able to download database
logs regardless of having permission on the database itself.


9.12.15. Form "Monitoring Services"


Figure 9.16. Example view: Form "Monitoring Services"

Add new logservice


Postcondition(s):

The new logservice will show up in the "EXACluster Logging Information" form.

Steps to do:

1. Click "Add" button.

2. Select lowest priority that logservice should show, EXAClusterOS services and EXASolution systems that
should be shown.

3. Click "Add" button.

Currently, a user may choose between five different EXAClusterOS services:

1. EXAoperation - This service logs general information about the cluster, e.g. boot processes of client nodes.

2. DWAd - The DWAd service logs general information about EXASolution systems, like database startup/shutdown.

3. Lockd - This base service is necessary for the DWAd.

4. Load - Each node in the cluster checks its load every minute. If it is above a defined limit, a warning/error
message will be logged. Furthermore, every client node will log its load as an information message each minute.

5. Storage - Yet unused.

The message priorities have the following meaning:

1. Information - State confirmation.

2. Notice - Some state of the system changed.

3. Warning - Something unexpected happened, but this should not affect usability of the system.

4. Error - An error occurred that needs intervention from an administrator.
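The priority ordering can be expressed as a simple comparison. The sketch below is a hypothetical client-side helper (not part of EXAoperation) showing how the logservice filter could be reproduced when post-processing fetched log entries:

```python
# Message priorities in ascending order of severity, as described above.
# (This list and the helper are illustrative, not an EXAoperation API.)
PRIORITIES = ["Information", "Notice", "Warning", "Error"]

def passes_threshold(priority, minimum):
    """Return True if a message of the given priority would be shown by a
    logservice whose minimum log priority is set to `minimum`."""
    return PRIORITIES.index(priority) >= PRIORITIES.index(minimum)
```

For example, a logservice set to "Warning" shows "Warning" and "Error" messages but filters out "Notice" and "Information".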

Delete logservice
Postcondition(s):

The logservice will not show up again in the "EXACluster Logging Information"

Steps to do:

1. Select logservice.

2. Click "Delete" button.

Show logentries of logservice


Steps to do:

1. Select logservice.


2. Click "Entries" button.

3. Alternatively, just click on the link of the appropriate logservice.

Search logentries of logservice


Steps to do:

1. Select logservice.

2. Click "Entries" button.

3. Alternatively, just click on the link of the appropriate logservice.

Change properties of logservice


Postcondition(s):

The logservice will use its new properties.

Steps to do:

1. Select logservice.

2. Click "Properties" button.

3. A new form opens. Change logservice properties appropriately and click "Apply".

9.12.16. Form "Threshold Values"


Figure 9.17. Example view: Form "Threshold Values"

Change monitoring thresholds


Postcondition(s):

All thresholds will be changed to appropriate values.

Steps to do:

1. Click "Edit".

2. A new form opens. Change values appropriately and click "Apply".


Synchronize to NTP server(s) explicitly


Postcondition(s):

An ntpdate command will have been issued that synchronizes to the specified NTP server(s). This is useful in
cases where a license server has been left unsynchronized to a degree that the NTPd will no longer adjust the
system time and an explicit time synchronization is necessary.

Steps to do:

1. Click "Synchronize time now!".

9.12.17. Form "Network"


Figure 9.18. Example view: Form "Network"

Add a network/host route


Postcondition(s):

Client nodes will take over the new configuration after a reboot/startup and the specified network/host will have
been made reachable via the provided gateway.

Steps to do:


1. Click "Add route".

2. A new form opens. Insert route properties and click "Add".

Delete a network/host route


Postcondition(s):

Client nodes will take over the new configuration after a reboot/startup.

Steps to do:

1. Select route(s).

2. Click "Delete route".

Change default gateway, network address, NTP server and/or time zone
Postcondition(s):

Client nodes will take over the new configuration after a reboot/startup.

Steps to do:

1. Click "Properties".

2. A new form opens. Change properties appropriately and click "Apply".

Add a private network


Postcondition(s):

The new private network may be selected for database use. Network interfaces for this private network may be
defined for any node and can be used after the next boot.

Steps to do:

1. Click "Add Private Network".

2. Choose descriptive name for the private network and confirm.

Remove a private network


Precondition(s):

The selected private network must not be in use by any database or node.

Steps to do:

1. Select private network.

2. Click "Remove private network".

When private networks and their corresponding node interfaces are deleted, those interfaces remain accessible until
the next reboot if the nodes are online. Thus, any started database using all network interfaces will continue to work
as before.


Add a public network


Postcondition(s):

The new public network may be selected for any node and can be used after the next reboot.

Steps to do:

1. Click "Add Public Network".

2. Choose descriptive name for the public network and confirm.

Add an IPMI card group


Postcondition(s):

The new IPMI card group may be used for any existing or new node IPMI card.

Steps to do:

1. Click "Add IPMI card group".

2. Insert description, user, and password information for group.

Remove an IPMI card group


Precondition(s):

The selected IPMI card group must not be in use by any node.

Steps to do:

1. Select IPMI card group.

2. Click "Remove IPMI card group".

Restart license server


Postcondition(s):

The license server will be rebooted.

Steps to do:

1. Click "Restart license server".

2. Answer question with "OK".

Stop license server


Postcondition(s):

The license server will start a shutdown.

Steps to do:

1. Click "Shutdown license server".


2. Answer question with "OK".

Move EXAoperation to another node


Postcondition(s):

EXAoperation will move to the specified node.

Steps to do:

1. Select online node from list.

2. Click "Move EXAoperation to specified node".

3. Answer question with "OK".

Change EXAoperation node priorities


Postcondition(s):

EXAoperation will try to move to nodes with higher priority in case it fails on the current main node.

Steps to do:

1. Click "EXAoperation node priorities".

2. Move nodes up (higher priority) or down (lower priority).

3. Click "Apply".


9.12.18. Form "License"


Figure 9.19. Example view: Form "License"

Upload license
Postcondition(s):

The new license is shown, and the sum of memory sizes over all running databases must be less than or equal to the
allowed database memory.

Steps to do:

1. Click "Browse" button and select license file.

2. Click "Upload license" button.


9.13. EXAoperation Add/Edit Forms


This section explains parameters and their meaning of add/edit forms in the EXAoperation browser interface.

9.13.1. Form "Create EXACluster Node"


Disk devices: Disk devices to use on this node. If empty, the cluster-wide default disk configuration is used.

Disk RAID: Software RAID type to use for node.

RAID 10 Redundancy: Redundancy of RAID 10 (if used).

Disk Encryption: Type of encryption to use on data disks.

Number: Number of node in cluster. Each node number must be equal to or above 10. This parameter can only be
set when adding a new node.

Node Unique Identification: Unique Node Identification in this cluster.

External Number: This number defines the external IP address.

Console Redirection: Enable/disable redirection of kernel messages to a TTY instead of the monitor.

Spool Disk: Disk to use for spool data (data used for loader processes).

MAC Private LAN: MAC address of first (private) LAN interface.

MAC Public LAN: MAC address of second (public) LAN interface.

MAC SrvMgmt: MAC address of Server Management interface.

SrvMgmt Group: Group that the Server Management Card of this node belongs to (if any).

Extra Private Network Interfaces: Further private network interfaces of node.

Extra Public Network Interfaces: Further public network interfaces of node.

PXE Boot Interface: Interface to use for PXE boot.

Install Node: Enable/disable to install node on next boot.

Wipe Disks: Enable/disable wipe of disks of node on boot. Wiping may be a very time-consuming process.

Force Filesystem Check: Force filesystem check on next boot of this node.

Use 4 KiB Sectors for Disks: Use 4 KiB alignment for hard disks. This is required for hard disks without 512-byte
sector size emulation and improves I/O performance on regular disks.

Enable Virtualization: Enable usage of virtualization features on this node for startup of virtual machines.

Label: Label of node, e.g. an ID string that identifies a node in a data center.

CPU Scaling Governor: Governor to use for power saving of node.

Hugepages (GiB): Amount of hugepages in GiB to use for databases on this node. This is recommended for nodes
with large amounts of RAM (> 512 GiB) to save process memory and must be smaller than the amount of DB
RAM on this node. See "Hugepages" chapter in manual for details.


9.13.2. Form "EXACluster Node Properties"


Disk devices: Disk devices to use on this node. If empty, the cluster-wide default disk configuration is used.

Disk RAID: Software RAID type to use for node.

RAID 10 Redundancy: Redundancy of RAID 10 (if used).

Disk Encryption: Type of encryption to use on data disks.

Number: Number of node in cluster. Each node number must be equal to or above 10. This parameter can only be
set when adding a new node.

Node Unique Identification: Unique Node Identification in this cluster.

External Number: This number defines the external IP address.

Console Redirection: Enable/disable redirection of kernel messages to a TTY instead of the monitor.

Spool Disk: Disk to use for spool data (data used for loader processes).

MAC Private LAN: MAC address of first (private) LAN interface.

MAC Public LAN: MAC address of second (public) LAN interface.

MAC SrvMgmt: MAC address of Server Management interface.

SrvMgmt Group: Group that the Server Management Card of this node belongs to (if any).

Extra Private Network Interfaces: Further private network interfaces of node.

Extra Public Network Interfaces: Further public network interfaces of node.

PXE Boot Interface: Interface to use for PXE boot.

Install Node: Enable/disable to install node on next boot.

Wipe Disks: Enable/disable wipe of disks of node on boot. Wiping may be a very time-consuming process.

Force Filesystem Check: Force filesystem check on next boot of this node.

Use 4 KiB Sectors for Disks: Use 4 KiB alignment for hard disks. This is required for hard disks without 512-byte
sector size emulation and improves I/O performance on regular disks.

Enable Virtualization: Enable usage of virtualization features on this node for startup of virtual machines.

Label: Label of node, e.g. an ID string that identifies a node in a data center.

CPU Scaling Governor: Governor to use for power saving of node.

Hugepages (GiB): Amount of hugepages in GiB to use for databases on this node. This is recommended for nodes
with large amounts of RAM (> 512 GiB) to save process memory and must be smaller than the amount of DB
RAM on this node. See "Hugepages" chapter in manual for details.

9.13.3. Form "EXACluster Node Disk Properties"


Disk label: User-specified label for disk.

Size of disk (GiB): Size of disk in GiB.


Disk number: Number of disk used in the disk name.

9.13.4. Form "Edit EXASolution Instance"


Add Reserve Nodes: List of nodes that should be added to the system. Normally, these nodes will be added as reserve
nodes, but they may be added as active database nodes in case too many nodes have failed and there are not enough
active database nodes.

Remove Reserve/Failed Nodes: List of nodes that should be removed from the system. Only inactive database
nodes can be selected here.

Deactivate Database Nodes: List of nodes that should be deactivated for this system. When the database is running,
only reserve or failed nodes can be selected here.

Reactivate Database Nodes: List of deactivated nodes that should be reactivated for this system.

Comment: User comment for database

Storage data volume: Volume for EXASolution database data.

Storage volume restore delay: Automatically move failed volume nodes to reserve nodes after the given amount of
time; leave empty to disable.

Max volume size (GiB): Maximal size of database data volume (in GiB).

Network interfaces: List of network interfaces to use for database. Leave empty to use all possible network interfaces.

Version: EXASolution Version to use for this system.

LDAP Server URLs: LDAP server URL(s) to use for remote database authentication, e.g. ldap://192.168.16.10.
Multiple servers must be separated by commas.

Extra DB parameters: Further EXASolution parameters. Use with care! This can only be set by a user with role
"Master".

Connection port: Port to use for client connections.

Database RAM (GiB): Database RAM consumption of system over all nodes (memory will be shared evenly
between nodes).

Auditing: Enables/disables auditing for this database system.

9.13.5. Form "Create EXASolution Instance"


DB Name: Name to use for database system.

Comment: User comment for database

Number of online Nodes: Number of online database nodes. If specifying more nodes than this number, all further
specified nodes will be used as reserve nodes.

Node List: List of nodes to use for this database system.

Disk Name: Logical disk to store data and log files to.

Storage data volume: Volume for EXASolution database data.

Storage volume restore delay: Automatically move failed volume nodes to reserve nodes after the given amount of
time; leave empty to disable.


Max volume size (GiB): Maximal size of database data volume (in GiB).

Network interfaces: List of network interfaces to use for database. Leave empty to use all possible network interfaces.

Version: EXASolution Version to use for this system.

LDAP Server URLs: LDAP server URL(s) to use for remote database authentication, e.g. ldap://192.168.16.10.
Multiple servers must be separated by commas.

Extra DB parameters: Further EXASolution parameters. Use with care! This can only be set by a user with role
"Master".

Connection port: Port to use for client connections.

Database RAM (GiB): Database RAM consumption of system over all nodes (memory will be shared evenly
between nodes).

Auditing: Enables/disables auditing for this database system.

9.13.6. Form "EXACluster Logging Service"


Minimum Log Priority: Lowest priority of messages that this logservice will show. E.g. if set to "Warning" it will
also show "Error" but neither "Notice" nor "Information".

EXAClusterOS Services: Specifies all EXAClusterOS services that will log into this monitor.

EXASolution Systems: Specifies all EXASolution systems that will log into this monitor.

Remote Syslog Server: Specifies the IP address of the remote syslog service to which messages of this logservice
should be sent.

Remote Syslog Protocol: Specifies the protocol (TCP/UDP) to use for the remote syslog connection.

Default Time Interval: Default time interval that is used for this logservice.

Description: Description of logservice. Can be chosen arbitrarily.

9.13.7. Form "Create Remote Volume Instance"


Archive URL: Remote URL for archive volume. May be of type FTP/HTTP/HTTPS/SMB. URLs are specified
like "ftp://192.168.2.1:12345" or "smb://192.168.2.1/backupshare".

User: Name of user for accessing remote volume (if any).

Password: Password for accessing remote volume (if any).

Allowed Users: Users that can access this volume.

Read-only Users: Users that have read-only access to this volume.

Labels: Labels for remote volume.

Options: Options for remote volume: (1) cleanvolume (database backup processes delete expired backups from
all databases), (2) noverifypeer (do not check server certificate), (3) nocompression (write plain data), (4) forcessl
(use STARTTLS in FTP connection), (5) webdav (use WebDAV for http-URL), (6) webhdfs (for WebHDFS
URLs), (7) delegation_token (for WebHDFS with Kerberos) (8) s3/s3s (for servers providing S3 compatible API
- with or without server-side encryption).


9.13.8. Form "Create Jdbc Driver"


JAR Files: List of JAR files that should be used as the JDBC driver, given as a list of URLs.

Driver Name: Name of driver.

Main Class: Main class name of the JDBC Driver.

Prefix: Prefix of the JDBC name. Must begin with "jdbc:" and end with ":", as in "jdbc:mysql:".

Comment: Description of the driver (only for user).

9.13.9. Form "EXACluster Jdbc Drivers"


JAR Files: List of JAR files that should be used as the JDBC driver, given as a list of URLs.

Driver Name: Name of driver.

Main Class: Main class name of the JDBC Driver.

Prefix: Prefix of the JDBC name. Must begin with "jdbc:" and end with ":", as in "jdbc:mysql:".

Comment: Description of the driver (only for user).

9.13.10. Form "Create EXACluster Route"


Type: Type of route (network or host).

Destination: Network/host for which to define the route.

Gateway: Gateway to use for sending packets into given destination.

9.13.11. Form "EXACluster Route Properties"


Type: Type of route (network or host).

Destination: Network/host for which to define the route.

Gateway: Gateway to use for sending packets into given destination.

9.13.12. Form "Create EXACluster Vlan"


Description: Description for VLAN.

MTU: MTU to use in this VLAN (default, 1500, 9000).

9.13.13. Form "EXACluster Vlan Properties"


Description: Description for VLAN.

MTU: MTU to use in this VLAN (default, 1500, 9000).


9.13.14. Form "Create EXACluster Public Vlan"


Description: Description for public network

MTU: MTU to use in this public network (default, 1500, 9000).

Network address: Network address, e.g. 192.168.16.0/24

9.13.15. Form "EXACluster Public Vlan Properties"


Description: Description for public network

MTU: MTU to use in this public network (default, 1500, 9000).

Network address: Network address, e.g. 192.168.16.0/24

9.13.16. Form "Create EXACluster Ipmi Group"


Description: Description for IPMI group.

IPMI Type: Type of IPMI card(s) in this group.

IPMI Username: Name of user to use for accessing IPMI card(s).

IPMI Password: Password of IPMI card(s).

IPMI Multiline Password: Multiline password for IPMI card(s). May be useful if using SSH keys.

Public IP Addresses: If set, IP addresses are only reachable in the public net and no DHCP is used.

9.13.17. Form "EXACluster Ipmi Group Properties"


Description: Description for IPMI group.

IPMI Type: Type of IPMI card(s) in this group.

IPMI Username: Name of user to use for accessing IPMI card(s).

IPMI Password: Password of IPMI card(s).

IPMI Multiline Password: Multiline password for IPMI card(s). May be useful if using SSH keys.

Public IP Addresses: If set, IP addresses are only reachable in the public net and no DHCP is used.

9.13.18. Form "Create Key Store"


Description: Description for key store.

Type: Type of key store.

Attributes: Attributes for key store.

9.13.19. Form "Key Store Properties"


Description: Description for key store.


Type: Type of key store.

Attributes: Attributes for key store.

9.13.20. Form "EXACluster Default Disk Configuration"


Device: List of disk devices to use on each node per default.

Default RAID: Type of software RAID to use on disks per default.

Default RAID 10 Redundancy: Software RAID 10 redundancy on each node per default (if used).

Default Data Encryption: Default data encryption to use for data disks.

Default OS Size (GiB): Default size of OS disk on each node.

Default Swap Size (GiB): Default size of swap disk on each node.

Default Data Disk Size (GiB): Size of default disk reserved for EXAStorage service.

9.13.21. Form "EXACluster System Properties"


Cluster Name: Name of cluster. Can be chosen arbitrarily.

Public Network: Network and appropriate network mask for external network interfaces of client nodes. Example:
192.168.16.0/24

Gateway: Gateway to use for external network interfaces of client nodes.

NTP Server 1: IP address of first NTP server to use for time synchronization.

NTP Server 2: IP address of second NTP server to use for time synchronization.

NTP Key: Key for NTP server (consisting of Key ID and Key [space separated])

DNS Server 1: IP address of first DNS server.

DNS Server 2: IP address of second DNS server.

Search Domain: Search domain to use with DNS servers.

Backup Network Bandwidth per Node (MiB/s): Maximum bandwidth (MiB/s) at which a backup job may transfer
backups from one node to another.

OS Memory per Node (GiB): Memory that must not be used by EXASolution on each node. This value should be
2 for databases consuming up to 36 GiB/node, 4 for databases up to 72 GiB/node, 8 for databases up to 144 GiB/node,
and 16 otherwise.

Time Zone: Time zone to use for cluster.

Private Network MTU: MTU to use for private network

Public Network MTU: MTU to use for public network

9.13.22. Form "EXACluster Update Url"


URL: URL at which to check updates.


User: User name for accessing software update URL.

Password: Password for accessing software update URL.

9.13.23. Form "EXACluster Remote Syslog Settings"


Use TLS: Defines whether or not to use TLS for transmission of syslog messages.

Certificate of Remote Syslog Server(s): Text containing certificate of remote syslog server(s).

9.13.24. Form "EXACluster Monitor Thresholds"


Warning Level for Disk Usage (in %): Level upon which warnings will be issued about disk usage.

Error Level for Disk Usage (in %): Level upon which errors will be issued about disk usage.

Warning Level for Storage Usage (in %): Level upon which warnings will be issued about storage usage.

Error Level for Storage Usage (in %): Level upon which errors will be issued about storage usage.

Warning Level for Swap Usage (in %): Level upon which warnings will be issued about swap usage.

Error Level for Swap Usage (in %): Level upon which errors will be issued about swap usage.

Warning Level for Load: Level upon which warnings will be issued about load.

Error Level for Load: Level upon which errors will be issued about load.

Coredump Deletion Time (in days): Number of days after which coredumps will be deleted.

9.13.25. Form "EXACluster Password Properties"


Disk Password: Password for disk(s). When changed, all nodes must be reinstalled.

9.14. XML-RPC interface


EXAoperation provides an XML-RPC interface that may be used e.g. for automatically executed scripts.
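All examples in this section follow the same pattern: an XML-RPC proxy is created for a URL that addresses a particular object (the cluster itself, a logservice, a backup service, or a database), and the set of callable functions depends on that object. The sketch below merely collects the placeholder URLs used in the examples; host, credentials and object names are placeholders, not defaults:

```python
# Placeholder base URL as used throughout the examples of this section.
BASE = "https://user:password@license-server"

# Which functions are callable depends on the object the URL addresses:
CLUSTER_URL    = BASE + "/cluster1"               # e.g. startupNode, firewall functions
LOGSERVICE_URL = BASE + "/cluster1/logservice1"   # e.g. logEntries, logEntriesTagged
DATABASE_URL   = BASE + "/cluster1/db_exa_db1"    # e.g. getDatabaseState, startDatabase

# In Python 2, as in the examples of this section:
# import xmlrpclib
# s = xmlrpclib.ServerProxy(DATABASE_URL)
```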

9.14.1. Fetch log messages


Function name:

logEntries()

Parameter(s):

1. start (optional) of type (year, month, day, hour, minute, second, 0): Start time for log entries.

2. halt (optional) of type (year, month, day, hour, minute, second, 0): Stop time for log entries.

Result type:

[start, stop, [log_entry1, log_entry2, ...]]

Precondition(s):


An appropriate logservice exists in EXAoperation.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/logservice1');
my $result = $server->call('logEntries');
my $result = $server->call('logEntries', ['2009', '10', '2', '0', '0', '0', '0'], ['2009', '10', '2', '17', '0', '0', '0']);

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/logservice1")
s.logEntries()
s.logEntries([2009, 01, 01, 0, 0, 0, 0], [2009, 01, 01, 12, 0, 0, 0])

Python example output:

>>> pprint.pprint(s.logEntries([2010, 9, 29, 10, 59, 47, 0], [2010, 9, 29, 11, 00, 47, 0]))
[[2010, 9, 29, 10, 59, 47, 0],
[2010, 9, 29, 11, 0, 47, 0],
[{'message': 'n0011.c0001.exacluster.local 0.54 0.17 0.11',
'node': '1',
'priority': 'Information',
'strtime': '2010-09-29 11:00:01.780071+02:00',
'system': 'load',
'timestamp': '2010-09-29 11:00:01.780071'},
{'message': 'n0013.c0001.exacluster.local 0.08 0.09 0.12',
'node': '4',
'priority': 'Information',
'strtime': '2010-09-29 11:00:01.700149+02:00',
'system': 'load',
'timestamp': '2010-09-29 11:00:01.700149'},
{'message': 'n0014.c0001.exacluster.local 0.00 0.00 0.00',
'node': '3',
'priority': 'Information',
'strtime': '2010-09-29 11:00:01.678909+02:00',
'system': 'load',
'timestamp': '2010-09-29 11:00:01.678909'},
{'message': 'n0012.c0001.exacluster.local 0.23 0.26 0.24',
'node': '2',
'priority': 'Information',
'strtime': '2010-09-29 11:00:01.571414+02:00',
'system': 'load',
'timestamp': '2010-09-29 11:00:01.571414'}]]

9.14.2. Fetch only new log messages


Function name:

logEntriesTagged()

Parameter(s):

1. tag of type string or number: Tag for identifying source.

Result type:


[start, stop, [log_entry1, log_entry2, ...]]

Precondition(s):

An appropriate logservice exists in EXAoperation.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/logservice1');
my $result = $server->call('logEntriesTagged', 'my source id');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/logservice1")
s.logEntriesTagged(3)
s.logEntriesTagged('my source id')

Python example output:

>>> pprint.pprint(s.logEntriesTagged('my source id'))


[[2010, 9, 29, 10, 59, 47, 0],
[2010, 9, 29, 11, 0, 47, 0],
[{'message': 'n0011.c0001.exacluster.local 0.54 0.17 0.11',
'node': '1',
'priority': 'Information',
'strtime': '2010-09-29 11:00:01.780071+02:00',
'system': 'load',
'timestamp': '2010-09-29 11:00:01.780071'},
{'message': 'n0013.c0001.exacluster.local 0.08 0.09 0.12',
'node': '4',
'priority': 'Information',
'strtime': '2010-09-29 11:00:01.700149+02:00',
'system': 'load',
'timestamp': '2010-09-29 11:00:01.700149'},
{'message': 'n0014.c0001.exacluster.local 0.00 0.00 0.00',
'node': '3',
'priority': 'Information',
'strtime': '2010-09-29 11:00:01.678909+02:00',
'system': 'load',
'timestamp': '2010-09-29 11:00:01.678909'},
{'message': 'n0012.c0001.exacluster.local 0.23 0.26 0.24',
'node': '2',
'priority': 'Information',
'strtime': '2010-09-29 11:00:01.571414+02:00',
'system': 'load',
'timestamp': '2010-09-29 11:00:01.571414'}]]
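Because a call with a given tag returns only the messages logged since the previous call with that tag, logEntriesTagged lends itself to incremental polling. The loop below is a hypothetical helper, not part of EXAoperation; the endpoint URL and tag in the usage comment are placeholders:

```python
import time

def poll_entries(fetch, tag, handle, interval=60, rounds=None):
    """Repeatedly fetch new log entries for `tag` and pass each one to `handle`.

    `fetch` is any callable with the logEntriesTagged signature, so a
    ServerProxy method can be passed in directly; `rounds=None` polls forever.
    """
    done = 0
    while rounds is None or done < rounds:
        start, stop, entries = fetch(tag)
        for entry in entries:
            handle(entry)
        done += 1
        if rounds is None or done < rounds:
            time.sleep(interval)

# Typical use (Python 2, endpoint as in the example above):
# import sys, xmlrpclib
# s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/logservice1")
# poll_entries(s.logEntriesTagged, 'my source id',
#              lambda e: sys.stdout.write(e['message'] + '\n'))
```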

9.14.3. Get state of a database


Function name:

getDatabaseState()

Result type:


string in ('none', 'setup', 'starting', 'running', 'shutdown')

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/db_exa_db1');
my $result = $server->call('getDatabaseState');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_exa_db1")
s.getDatabaseState()

Python example output:

>>> s.getDatabaseState()
'running'

>>> s.getDatabaseState()
'shutdown'

9.14.4. Get connection state of a database


Function name:

getDatabaseConnectionState()

Result type:

string in ('Yes', 'No')

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/db_exa_db1');
my $result = $server->call('getDatabaseConnectionState');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_exa_db1")
s.getDatabaseConnectionState()

Python example output:

>>> s.getDatabaseConnectionState()
'No'

>>> s.getDatabaseConnectionState()
'Yes'

9.14.5. Get current connection string of a database


Function name:


getDatabaseConnectionString()

Result type:

connection string

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/db_exa_db1');
my $result = $server->call('getDatabaseConnectionString');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_exa_db1")
s.getDatabaseConnectionString()

Python example output:

>>> s.getDatabaseConnectionString()
'10.50.1.11..14:8563'
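The example output uses a range shorthand for the last IP octet: '10.50.1.11..14:8563' stands for hosts .11 through .14, all on port 8563. A small helper to expand such a string into individual host:port pairs; the parsing is inferred from the example output above, not from a documented grammar:

```python
def expand_connection_string(conn):
    """Expand the range shorthand 'A.B.C.X..Y:port' into a list of
    individual 'host:port' strings. Strings without a '..' range are
    returned as a single-element list."""
    host, port = conn.rsplit(":", 1)
    if ".." not in host:
        return [conn]
    left, high = host.split("..")          # '10.50.1.11' and '14'
    base, low = left.rsplit(".", 1)        # '10.50.1' and '11'
    return ["%s.%d:%s" % (base, i, port)
            for i in range(int(low), int(high) + 1)]
```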

9.14.6. Get current database nodes


Function name:

getDatabaseNodes()

Result type:

Lists of active, reserve and failed nodes of database

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/db_exa_db1');
my $result = $server->call('getDatabaseNodes');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_exa_db1")
s.getDatabaseNodes()

Python example output:

>>> s.getDatabaseNodes()
{'active': ['n0011', 'n0012', 'n0013'], 'failed': [], 'reserve': ['n0014']}

9.14.7. Get current operation of a database


Function name:

getDatabaseOperation()

Result type:


string in ('None', 'Create', 'Remove', 'Startup', 'Shutdown', 'Cleanup', 'Backup', 'Restore', 'Failed')

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/db_exa_db1');
my $result = $server->call('getDatabaseOperation');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_exa_db1")
s.getDatabaseOperation()

Python example output:

>>> s.getDatabaseOperation()
'None'

>>> s.getDatabaseOperation()
'Backup'

9.14.8. Start a database


Function name:

startDatabase()

Result type:

This function does not return a result. In case something goes wrong, an exception will be raised.

Precondition(s):

The database is already created and not running.

Expected messages in logservice(s) as regular expression(s):

1. User \d requests startup of system.

2. System started successfully in partition \d.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/db_exa_db1');
$server->call('startDatabase');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_exa_db1")
s.startDatabase()
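Because successful startup is reported asynchronously via the logservice messages listed above, scripts often poll getDatabaseState() afterwards until the database reaches 'running'. The generic helper below is a hypothetical sketch; the endpoint URL in the usage comment is a placeholder:

```python
import time

def wait_for_state(get_state, target, timeout=300, poll=5):
    """Poll `get_state()` until it returns `target` or `timeout` seconds pass.

    `get_state` is any zero-argument callable, e.g. s.getDatabaseState from a
    ServerProxy. Returns True on success, False on timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_state() == target:
            return True
        time.sleep(poll)
    return False

# Typical use (Python 2, endpoint as in the example above):
# import xmlrpclib
# s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_exa_db1")
# s.startDatabase()
# if not wait_for_state(s.getDatabaseState, 'running'):
#     raise RuntimeError("database did not reach 'running' in time")
```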

9.14.9. Stop a database


Function name:


stopDatabase()

Result type:

This function does not return a result. In case something goes wrong, an exception will be raised.

Precondition(s):

The database must be running.

Postcondition(s):

The database will be stopped.

Expected messages in logservice(s) as regular expression(s):

1. User \d requests shutdown of system.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/db_exa_db1');
$server->call('stopDatabase');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_exa_db1")
s.stopDatabase()

9.14.10. Backup a Storage database


Function name:

startStorageBackup()

Parameter(s):

1. volume of type string: Volume to backup data into (e.g. 'v0000' or 'r0010')

2. level of type number: Level of backup

3. expire of type string: Expire time of backup (e.g. '1w 3d')

Precondition(s):

The database must be running. The archive volume must be configured and (if level is larger than 0) a base backup
must exist in the archive.

Postcondition(s):

The backup process will be started.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/db_exa_db1');
$server->call('startStorageBackup', 'v0000', 2, '3d');

Python example:


import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_exa_db1")
s.startStorageBackup('v0000', 2, '3d')
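A backup with a level greater than 0 requires an existing base backup of a lower level (see the precondition above), so a common pattern is a weekly full backup with daily increments. The scheme below is purely illustrative; the levels, expiry values and weekday choice are assumptions, not EXAoperation defaults:

```python
def backup_plan(weekday):
    """Return a (level, expire) pair for a simple rotation: a full (level-0)
    backup on Sunday, incremental level-1 backups on all other days.

    `weekday` follows datetime.date.weekday(): Monday == 0 ... Sunday == 6.
    """
    if weekday == 6:
        return (0, "2w")  # weekly full backup, kept two weeks
    return (1, "1w")      # daily increment, kept one week

# Typical use (Python 2, endpoint and volume as in the example above):
# import datetime, xmlrpclib
# s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_exa_db1")
# level, expire = backup_plan(datetime.date.today().weekday())
# s.startStorageBackup('v0000', level, expire)
```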

9.14.11. Backup a database


Function name:

startBackup()

Parameter(s):

1. database_name of type string: Name of database to backup from.

Result type:

This function does not return a result. In case something goes wrong, an exception will be raised.

Precondition(s):

The database must be running and a backup service must be configured.

Postcondition(s):

The backup process will be started.

Expected messages in logservice(s) as regular expression(s):

1. Start backup 20[0-9][0-9]-[0-9][0-9]-[0-9][0-9]_[0-9][0-9]-[0-9][0-9]_[0-9][0-9] on system *[a-zA-Z0-9_\.-].

2. User 1000 requests system backup.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/backup1');
$server->call('startBackup', 'exa_db1');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/backup1")
s.startBackup('exa_db1')

9.14.12. Upload and activate an iptables firewall configuration.


Function name:

submitFirewallConfiguration()

Result type:

This function returns 'OK' or an error code.

Python example:

97
9.14. XML-RPC interface

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
s.submitFirewallConfiguration('n11', file('fw.conf').read())

Python example output:

>>> s.submitFirewallConfiguration('n11', file('fw.conf').read())


'OK'

9.14.13. Get current firewall configuration.


Function name:

getFirewallConfiguration()

Result type:

IP-Tables configuration file of current firewall configuration.

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
s.getFirewallConfiguration('n11')

Python example output:

>>> print s.getFirewallConfiguration('n11')


*filter
:INPUT DROP [0:0]
:FORWARD DROP [0:0]
:OUTPUT DROP [477:28620]
-A INPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A INPUT -i eth0 -m state --state NEW -j ACCEPT
-A INPUT -i eth1 -p tcp -m tcp --dport 8563 -m state --state NEW -j ACCEPT
-A INPUT -i eth1 -j DROP
-A FORWARD -m state --state RELATED,ESTABLISHED -j ACCEPT
-A FORWARD -i eth0 -m state --state NEW -j ACCEPT
-A FORWARD -o eth0 -m state --state NEW -j ACCEPT
-A FORWARD -i eth1 -p tcp -m tcp --dport 8563 -m state --state NEW -j ACCEPT
-A FORWARD -i eth1 -j DROP
-A OUTPUT -m state --state RELATED,ESTABLISHED -j ACCEPT
-A OUTPUT -o eth0 -m state --state NEW -j ACCEPT
COMMIT

9.14.14. Startup node


Function name:

startupNode()

Result type:

This function returns nothing on success and an exception on error.

Python example:


import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
s.startupNode("n0011")

9.14.15. Shutdown node


Function name:

shutdownNode()

Result type:

This function returns nothing on success and an exception on error.

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
s.shutdownNode("n0011")

9.14.16. Get hardware information


Function name:

getHardwareInformation()

Result type:

This function returns the output of dmidecode on the specified node.

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
s.getHardwareInformation("n11", 1)

Python example output:

>>> import xmlrpclib


>>> s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
>>> # Get dmidecode information of type 0 ("BIOS"). See man dmidecode for details.
>>> print s.getHardwareInformation("n11", 0)
# dmidecode 2.11
SMBIOS 2.4 present.

Handle 0x0000, DMI type 0, 24 bytes


BIOS Information
Vendor: Bochs
Version: Bochs
Release Date: 01/01/2011
Address: 0xE8000
Runtime Size: 96 kB
ROM Size: 64 kB
Characteristics:
BIOS characteristics not supported


Targeted content distribution is supported


BIOS Revision: 1.0

9.14.17. Get list of cluster nodes


Function name:

getNodeList()

Result type:

This function returns a list of all defined cluster nodes.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1');
my $result = $server->call('getNodeList');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
s.getNodeList()

Python example output:

>>> s.getNodeList()
['n0011', 'n0012', 'n0013', 'n0014']

9.14.18. Get current EXAoperation main node


Function name:

getEXAoperationMaster()

Result type:

This function returns a string with the name of the node currently serving EXAoperation.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1');
my $result = $server->call('getEXAoperationMaster');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
s.getEXAoperationMaster()

Python example output:

>>> s.getEXAoperationMaster()
'n0010'


9.14.19. Get current EXASuite version


Function name:

getEXASuiteVersion()

Result type:

This function returns the version of the currently installed EXASuite.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1');
my $result = $server->call('getEXASuiteVersion');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
s.getEXASuiteVersion()

Python example output:

>>> s.getEXASuiteVersion()
'6.0.0'

9.14.20. Get list of archive volumes


Function name:

getArchiveFilesystems()

Result type:

This function returns a list of archive volumes that can be used by the calling user.

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/storage")
s.getArchiveFilesystems()

Python example output:

>>> s.getArchiveFilesystems()
{'v0002': ['volume', 2, ['read', 'write']]}

9.14.21. Get list of volumes


Function name:

getVolumeList()

Result type:

This function returns a list of volumes that can be read by the calling user.


Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/storage")
s.getVolumeList()

Python example output:

>>> s.getVolumeList()
{'v0002': 'Archive', 'v0000': 'Data', 'v0001': 'Temporary Data', 'r0000': 'Remote Archive'}

9.14.22. Get information about volumes


Function name:

getVolumeInfo()

Result type:

This function returns a dictionary with information about the specified volume, which may be referenced by its index or its name.

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/storage")
s.getVolumeInfo(1)
s.getVolumeInfo('v0001')

Python example output:

>>> pprint.pprint(s.getVolumeInfo(0))
{'allowed users': ['admin'],
'disk': 'd03_storage',
'labels': ['exa_db1_persistent'],
'name': 'v0000',
'readonly users': [],
'redundancy': 2,
'segments': [['n0011', 'n0012', 'n0013'], ['n0012', 'n0013', 'n0011']],
'size': 8000,
'status': 'ONLINE',
'type': 'Data'}
>>> pprint.pprint(s.getVolumeInfo(1))
{'disk': 'd03_storage',
'labels': ['exa_db1_temporary'],
'name': 'v0001',
'redundancy': 1,
'segments': [['n0011', 'n0012', 'n0013']],
'size': 8000,
'status': 'ONLINE',
'type': 'Temporary Data'}
>>> s.getVolumeInfo(10000)
{'readonly users': [], 'type': 'Remote Archive', 'allowed users': ['admin'], 'name': 'r0000'}
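As a sketch of how the returned dictionary could be used client-side — assuming, as the 'redundancy' field suggests, that redundancy n means n full copies of each block — one could estimate the raw disk space a volume occupies:

```python
def raw_size_gib(volume_info):
    """Raw disk space occupied by a volume: its logical size multiplied by
    its redundancy (assuming each redundancy level stores a full copy)."""
    return volume_info['size'] * volume_info['redundancy']

# Trimmed-down version of the getVolumeInfo() output shown above.
data_volume = {'name': 'v0000', 'size': 8000, 'redundancy': 2}
print(raw_size_gib(data_volume))  # 16000
```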

9.14.23. Get list of databases


Function name:


getDatabaseList()

Result type:

This function returns a list of all defined databases.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1');
my $result = $server->call('getDatabaseList');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
s.getDatabaseList()

Python example output:

>>> s.getDatabaseList()
['exa_db1', 'testdb']

9.14.24. Get information about database


Function name:

getDatabaseInfo()

Result type:

This function returns a dictionary of key-value pairs with information about the specified database.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/db_testdb');
my $result = $server->call('getDatabaseInfo');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_testdb")
s.getDatabaseInfo()

Python example output:

>>> pprint.pprint(s.getDatabaseInfo())
{'connectible': 'No',
'connection string': '192.168.16.11..12:8563',
'name': 'exa_db1',
'nodes': {'active': ['n0011', 'n0012'], 'failed': [], 'reserve': []},
'operation': 'None',
'quota': 50,
'state': 'setup',
'usage persistent': 100,
'usage temporary': 1,
'persistent volume': 'v0000',
'temporary volume': 'v0001'}

9.14.25. Get list of database backups


Function name:

getBackupList()

Result type:

This function returns a list of backup IDs for the specified database.

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_testdb")
s.getBackupList()

Python example output:

>>> # Non-Storage example


>>> s.getBackupList()
['2010-11-02 14-35 00', '2010-11-02 14-35 01']

>>> # Storage example


>>> s.getBackupList()
[['2012-10-04 10:29', 1], ['2012-10-04 10:29', 2], ['2012-10-04 10:30', 3], ['2012-10-04 10:30', 4]]
>>> s.getBackupInfo('2012-10-04 10:29')
...
>>> s.getBackupInfo(1)
...
>>> s.getBackupInfo((1, 'v0001'))

9.14.26. Get information about backup


Function name:

getBackupInfo()

Result type:

This function returns a dictionary with key-value pairs describing the specified backup.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/db_testdb');
my $result = $server->call('getBackupInfo', '2010-11-02 14-35 00');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_testdb")
s.getBackupInfo('2010-11-02 14-35 00')


Python example output:

>>> # Non-Storage example


>>> pprint.pprint(s.getBackupInfo('2012-10-04 10-34 00'))
{'details': ['OK'],
'expire': 'n/a',
'expire date': 'n/a',
'files': {'n0011': ['/d03_data/exa_db2/BACKUPS/2012-10-04_10-34_00/backup.ini',
'/d03_data/exa_db2/BACKUPS/2012-10-04_10-34_00/node0',
'/d03_data/exa_db2/BACKUPS/2012-10-04_10-34_00/node2'],
'n0012': ['/d03_data/exa_db2/BACKUPS/2012-10-04_10-34_00/backup.ini',
'/d03_data/exa_db2/BACKUPS/2012-10-04_10-34_00/node0',
'/d03_data/exa_db2/BACKUPS/2012-10-04_10-34_00/node2'],
'n0013': ['/d03_data/exa_db2/BACKUPS/2012-10-04_10-34_00/backup.ini',
'/d03_data/exa_db2/BACKUPS/2012-10-04_10-34_00/node1'],
'n0014': ['/d03_data/exa_db2/BACKUPS/2012-10-04_10-34_00/backup.ini',
'/d03_data/exa_db2/BACKUPS/2012-10-04_10-34_00/node1']},
'note': 'Backup from system exa_db2',
'offline media': '',
'public_ips': {'n0011': '10.50.1.11',
'n0012': '10.50.1.12',
'n0013': '10.50.1.13',
'n0014': '10.50.1.14'},
'size': {'n0011.c0001.exacluster.local': 0.0087890625,
'n0012.c0001.exacluster.local': 0.0087890625,
'n0013.c0001.exacluster.local': 0.0048828125,
'n0014.c0001.exacluster.local': 0.0048828125},
'state': 100}

>>> # Storage example


>>> pprint.pprint(s.getBackupInfo('2012-10-04 10:29'))
{'comment': '',
'dependencies': [],
'expire date': '2012-10-11 10:29',
'files': ['exa_db1/id_1/level_0/node_2/backup_201210041029',
'exa_db1/id_1/level_0/node_1/backup_201210041029',
'exa_db1/id_1/level_0/node_0/backup_201210041029',
'exa_db1/id_1/level_0/node_0/metadata_201210041029'],
'id': 1,
'level': 0,
'nodes': 3,
'size': 0.015142664313316345,
'system': 'exa_db1',
'timestamp': '2012-10-04 10:29',
'usable': True,
'usage': 0.00146484375,
'volume': ['v0001', 1]}
>>> s.getBackupInfo(1)
...
>>> s.getBackupInfo((1, 'v0001'))
...
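The 'usable' and 'timestamp' fields of the Storage variant lend themselves to simple client-side filtering. A hypothetical helper, working on trimmed-down getBackupInfo() results like those shown above:

```python
def usable_backups(backup_infos):
    """Hypothetical helper: keep only backups marked usable, newest first.
    The ISO-like timestamps sort correctly as plain strings."""
    usable = [b for b in backup_infos if b.get('usable')]
    return sorted(usable, key=lambda b: b['timestamp'], reverse=True)

infos = [
    {'id': 1, 'level': 0, 'timestamp': '2012-10-04 10:29', 'usable': True},
    {'id': 2, 'level': 1, 'timestamp': '2012-10-04 10:30', 'usable': False},
]
print([b['id'] for b in usable_backups(infos)])  # [1]
```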

9.14.27. Get database statistics


Function name:

getDatabaseStatistics()


Result type:

This function returns a base64-encoded ZIP file containing database statistics. If no start and stop dates are provided, statistics for the last month are retrieved. This data can be used by customer service to provide useful usage graphs etc.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/db_testdb');
my $result = $server->call('getDatabaseStatistics', 'user', 'pass');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/db_testdb")
# Get statistics for last month (including today)
s.getDatabaseStatistics('dbuser', 'pass')
# Get statistics for January until May (including the 31st of May)
s.getDatabaseStatistics('dbuser', 'pass', '2013-01-01', '2013-05-31')

Python example output:

>>> import xmlrpclib, base64


>>> s = xmlrpclib.ServerProxy("https://user:password@localhost/cluster1/db_exa_db1")
>>> zipfile = base64.b64decode(s.getDatabaseStatistics('dbuser', 'dbpassword'))
>>> file("statistics.zip", "w").write(zipfile)

9.14.28. Start EXAStorage service


Function name:

startEXAStorage()

Result type:

This function returns 'OK' or an exception in case of a failure.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/storage');
my $result = $server->call('startEXAStorage');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/storage")
s.startEXAStorage()

Python example output:

>>> s.startEXAStorage()
'OK'


9.14.29. Stop EXAStorage service


Function name:

stopEXAStorage()

Result type:

This function returns 'OK' or an exception in case of a failure.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/storage');
my $result = $server->call('stopEXAStorage');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/storage")
s.stopEXAStorage()

Python example output:

>>> s.stopEXAStorage()
'OK'

9.14.30. Get state of EXAClusterOS services


Function name:

getServiceState()

Result type:

This function returns a list of tuples, each tuple consisting of a service name and the appropriate service state. The
service state is described with 'OK', 'not running' or (for DWAd) 'DWAd has no quorum'.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1');
my $result = $server->call('getServiceState');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
s.getServiceState()

Python example output:

>>> s.getServiceState()
[['Loggingd', 'OK'], ['Lockd', 'OK'], ['Storaged', 'OK'], ['DWAd', 'OK']]
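Since each entry is a [name, state] pair, a monitoring script can reduce the result to the services that need attention. A minimal sketch:

```python
def unhealthy_services(service_states):
    """Return the names of services whose reported state is not 'OK'."""
    return [name for name, state in service_states if state != 'OK']

states = [['Loggingd', 'OK'], ['Lockd', 'OK'],
          ['Storaged', 'not running'], ['DWAd', 'DWAd has no quorum']]
print(unhealthy_services(states))  # ['Storaged', 'DWAd']
```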


9.14.31. Get IPMI sensor status of a node


Function name:

getIPMISensorStatus()

Result type:

This function returns a list of IPMI key-value pairs with a value classification depending on the underlying IPMI
card.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/n0011');
my $result = $server->call('getIPMISensorStatus');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/n0011")
s.getIPMISensorStatus()

Python example output:

>>> pprint.pprint(s.getIPMISensorStatus())
[['CPU Temp 1', '47 degrees C', 'ok'],
['CPU Temp 2', '47 degrees C', 'ok'],
['CPU Temp 3', 'no reading', 'ns'],
['CPU Temp 4', 'no reading', 'ns'],
['Sys Temp', '29 degrees C', 'ok'],
['CPU1 Vcore', '1.14 Volts', 'ok'],
['CPU2 Vcore', '1.14 Volts', 'ok'],
['3.3V', '3.33 Volts', 'ok'],
['5V', '4.99 Volts', 'ok'],
['12V', '12 Volts', 'ok'],
['-12V', '-12.10 Volts', 'ok'],
['1.5V', '1.49 Volts', 'ok'],
['5VSB', '4.94 Volts', 'ok'],
['VBAT', '3.23 Volts', 'ok'],
['Fan1', '0 RPM', 'nr'],
['Fan2', '6300 RPM', 'ok'],
['Fan3', '6400 RPM', 'ok'],
['Fan4', '6100 RPM', 'ok'],
['Fan5', '0 RPM', 'nr'],
['Fan6', '0 RPM', 'nr'],
['Fan7/CPU1', '0 RPM', 'nr'],
['Fan8/CPU2', '0 RPM', 'nr'],
['Intrusion', '0 unspecified', 'nc'],
['Power Supply', '0 unspecified', 'ok'],
['CPU0 Internal E', '0 unspecified', 'ok'],
['CPU1 Internal E', '0 unspecified', 'ok'],
['CPU Overheat', '0 unspecified', 'ok'],
['Thermal Trip0', '0 unspecified', 'ok'],
['Thermal Trip1', '0 unspecified', 'ok']]
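The third element of each triple is the classification. Assuming the codes follow the usual ipmitool conventions ('ok' = fine, 'ns' = no sensor/no reading, other codes such as 'nr' or 'nc' indicate a threshold condition), a sketch for extracting the entries worth looking at:

```python
def sensor_alerts(readings):
    """Return sensors whose classification is neither 'ok' nor 'ns'
    ('ns' is assumed to mean the sensor is not populated)."""
    return [r for r in readings if r[2] not in ('ok', 'ns')]

readings = [['CPU Temp 1', '47 degrees C', 'ok'],
            ['CPU Temp 3', 'no reading', 'ns'],
            ['Fan1', '0 RPM', 'nr'],
            ['Intrusion', '0 unspecified', 'nc']]
print([r[0] for r in sensor_alerts(readings)])  # ['Fan1', 'Intrusion']
```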


9.14.32. Get state of a node


Function name:

getNodeState()

Result type:

This function returns a dictionary (Python)/hash (Perl) that describes the state of a node. The dictionary has 'status', 'power' and 'operation' fields. The function is defined for both Cluster and Node objects; when called on the cluster level, it takes a node name as an obligatory parameter.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/n0011');
my $result = $server->call('getNodeState');

my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1');


my $result = $server->call('getNodeState', 'n0011');

Python example:

import xmlrpclib
# Node level
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/n0011")
s.getNodeState()

# Cluster level
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
for x in s.getNodeList():
    print x, s.getNodeState(x)

Python example output:

>>> s.getNodeState()
{'status': 'Running', 'operation': 'Active', 'power': 'Power On'}

Output details:

The value of 'status' can vary between 'Running', 'Suspended', 'Offline', 'Installing', 'Installed', 'Unknown', 'Shredding', 'Shredded', and 'Booting'. The value of 'power' can be 'Power On', 'Power Off', or 'Unknown' (the latter when no IPMI service is present). The value of 'operation' can be 'Active', 'Active/Force fsck', 'Force no fsck', 'To install', and 'TO WIPE'.
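Combining the cluster-level loop above with these status values, a monitoring script can flag every node that is not running. A minimal sketch over already-fetched results:

```python
def nodes_needing_attention(states):
    """Given a map of node name -> getNodeState() result, return the
    names of nodes whose status is anything other than 'Running'."""
    return sorted(name for name, st in states.items()
                  if st['status'] != 'Running')

states = {
    'n0011': {'status': 'Running', 'operation': 'Active', 'power': 'Power On'},
    'n0012': {'status': 'Offline', 'operation': 'Active', 'power': 'Power Off'},
}
print(nodes_needing_attention(states))  # ['n0012']
```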

9.14.33. Get disk state(s) of a node


Function name:

getDiskStates()

Result type:

This function returns a list of dictionaries (Python)/hashes (Perl), where each entry describes one disk of the specified node.

Perl example:


use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1/n0011');
my $result = $server->call('getDiskStates');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1/n0011")
s.getDiskStates()

Python example output:

>>> pprint.pprint(s.getDiskStates())
[{'devices': 'Default',
'encr': 'disk-encr-aes256',
'mount_count': '2/28',
'name': 'd00_os',
'next_fsck': 'Mon Mar 28 08:51:27 2011',
'raid': 'disk-raid-none',
'size': '50',
'state': 'None',
'type': 'disk-type-os',
'free': '41.8'},
{'devices': 'Default',
'encr': 'disk-encr-none',
'mount_count': '-',
'name': 'd01_swap',
'next_fsck': '-',
'raid': 'disk-raid-none',
'size': '4',
'state': 'None',
'type': 'disk-type-swap',
'free': '3.8'},
{'devices': 'Default',
'encr': 'disk-encr-none',
'mount_count': '2/27',
'name': 'd02_data',
'next_fsck': 'Mon Mar 28 08:52:04 2011',
'raid': 'disk-raid-none',
'size': '47',
'state': 'None',
'type': 'disk-type-data',
'free': '45.2'}]

Output details:

The state of a disk can vary between 'Online', 'Offline', and 'Degraded' for software RAIDs. Hardware RAID systems may use proprietary interfaces to retrieve the current state; thus, such disks will show up with state 'None'. The size of a disk is always shown in GiB (as is its usage) or declared as 'Rest' (the partition grows with the disk size). The devices used for a disk may be set to 'Default' (all devices of a node are used) or be a list of devices, e.g.
"['/dev/sda', '/dev/sdb']".

9.14.34. Show list of installed plugins


Function name:

showPluginList()


Result type:

List of installed plugins

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1');
my $funcs = $server->call('showPluginList');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
funcs = s.showPluginList()

Python example output:

>>> s.showPluginList()
['RAID.tw_cli-10.1', 'RAID.arcconf-6.30']

9.14.35. Call plugin function


Function name:

callPlugin()

Parameter(s):

1. plugin of type string: Name of plugin to call.

2. node of type string: Name of node on which to call plugin.

3. command of type string: Command to call.

4. extra parameters (optional) of type string: Extra parameters for command.

Result type:

[return code, output]

Precondition(s):

The appropriate plugin is installed in the cluster.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1');
my $result = $server->call('callPlugin', 'RAID.tool', 'n11', 'SHOW_LOGS');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
ret, output = s.callPlugin('RAID.tool', 'n11', 'SHOW_LOGS')

Python example output:


>>> status, result = s.callPlugin('RAID.arcconf-6.30', 'license', 'SHOW', '1')


>>> print result
Controllers found: 1
----------------------------------------------------------------------
Controller information
----------------------------------------------------------------------
Controller Status : Optimal
Channel description : SAS/SATA
Controller Model : Adaptec 5805Z
Controller Serial Number : 0A111151612
Physical Slot :4
Temperature : 49 C/ 120 F (Normal)
Installed memory : 512 MB
Copyback : Disabled
Background consistency check : Disabled
Automatic Failover : Enabled
Global task priority : High
Performance Mode : Default/Dynamic
Stayawake period : Disabled
Spinup limit internal drives :0
Spinup limit external drives :0
Defunct disk drive count :0
Logical devices/Failed/Degraded : 1/0/0
SSDs assigned to MaxIQ Cache pool :0
Maximum SSDs allowed in MaxIQ Cache pool : 8
NCQ status : Enabled
--------------------------------------------------------
Controller Version Information
--------------------------------------------------------
BIOS : 5.2-0 (17544)
Firmware : 5.2-0 (17544)
Driver : 1.1-5 (2456)
Boot Flash : 5.2-0 (17544)
--------------------------------------------------------
Controller ZMM Information
--------------------------------------------------------
Status : ZMM Optimal

----------------------------------------------------------------------
Logical device information
----------------------------------------------------------------------
Logical device number 0
Logical device name : ADAP1
RAID level :0
Status of logical device : Optimal
Size : 141590 MB
Stripe-unit size : 256 KB
Read-cache mode : Enabled
MaxIQ preferred cache setting : Enabled
MaxIQ cache setting : Disabled
Write-cache mode : Enabled (write-back)
Write-cache setting : Enabled (write-back)
Partitioned : No
Protected by Hot-Spare : No
Bootable : Yes
Failed stripes : No
Power settings : Disabled


--------------------------------------------------------
Logical device segment information
--------------------------------------------------------
Segment 0 : Present (0,4) WD-WMAKE2258147
Segment 1 : Present (0,5) WD-WMAKE2257664

----------------------------------------------------------------------
Physical Device information
----------------------------------------------------------------------
Device #0
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 1.5 Gb/s
Reported Channel,Device(T:L) : 0,4(4:0)
Reported Location : Connector 1, Device 0
Vendor : WDC
Model : WD740GD-00FL
Firmware : 33.08F33
Serial number : WD-WMAKE2258147
Size : 70911 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings :0
Power State : Full rpm
Supported Power States : Full rpm,Powered off
SSD : No
MaxIQ Cache Capable : No
MaxIQ Cache Assigned : No
NCQ status : Disabled
Device #1
Device is a Hard drive
State : Online
Supported : Yes
Transfer Speed : SATA 1.5 Gb/s
Reported Channel,Device(T:L) : 0,5(5:0)
Reported Location : Connector 1, Device 1
Vendor : WDC
Model : WD740GD-00FL
Firmware : 33.08F33
Serial number : WD-WMAKE2257664
Size : 70911 MB
Write Cache : Enabled (write-back)
FRU : None
S.M.A.R.T. : No
S.M.A.R.T. warnings :0
Power State : Full rpm
Supported Power States : Full rpm,Powered off
SSD : No
MaxIQ Cache Capable : No
MaxIQ Cache Assigned : No
NCQ status : Disabled

Command completed successfully.


9.14.36. Show plugin functions


Function name:

showPluginFunctions()

Parameter(s):

1. plugin of type string: Name of plugin to call.

Result type:

Dictionary of functions and function descriptions

Precondition(s):

The appropriate plugin is installed in the cluster.

Perl example:

use Frontier::Client;
my $server = Frontier::Client->new('url' => 'https://user:password@license-server/cluster1');
my $funcs = $server->call('showPluginFunctions', 'RAID.tool');

Python example:

import xmlrpclib
s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
funcs = s.showPluginFunctions('RAID.tool')

Python example output:

>>> pprint.pprint(s.showPluginFunctions('RAID.tw_cli-10.1'))
{'SHOW': 'Show information about controllers and units. May be called with a controller/unit ID as argument, e.g. "/c0", "/c0/u0"',
'SHOW_AENS': 'Show automatic event notifications of controllers. May be called with a controller ID as argument, e.g. "/c0".',
'SHOW_ALARMS': 'Show alarms of controllers. May be called with a controller ID as argument, e.g. "/c0".',
'SHOW_DIAG': 'Show diagnostic information of controllers. May be called with a controller ID as argument, e.g. "/c0".',
'SHOW_EVENTS': 'Show events of controllers. May be called with a controller ID as argument, e.g. "/c0".',
'SHOW_REBUILD': 'Show rebuild schedules for controllers. May be called with a controller ID as argument, e.g. "/c0".',
'SHOW_SELFTEST': 'Show information about controller selftests. May be called with a controller ID as argument, e.g. "/c0".',
'SHOW_VER': 'Show API/CLI version of tw_cli.',
'SHOW_VERIFY': 'Show verify schedules. May be called with a controller ID as argument, e.g. "/c0".'}

9.15. Libvirt interface for managing cluster nodes


When using virtualization via the Libvirt daemon and allowing the "qemu+ssh" protocol, it is possible to power
on/power off those nodes via EXAoperation. This requires the following steps:

1. Name the nodes in Libvirt by {cluster_name}_{node name}, e.g. "mycluster_n0011".

2. Add a Server Management group with type "Libvirt", public IP addresses, and a valid Libvirt user (and password).

3. For each node, use the new Server Management Group and use the IP address of the physical virtualization
host as public Server Management IP.


9.16. Hardware Security Modules


To enhance security for stored disk passwords, one may use HSM (hardware security module) devices. These
devices can be used via a PKCS#11-API enabled library. To couple these devices, do the following:

1. Configure your hardware device and install the necessary PKCS#11 library in the cluster. This may include
using a command line shell in the EXASuite cluster. See your HSM manual for further details.

2. Enable the cluster node(s) to use the specified key and create an encryption key in that slot. To do this easily on the command line, we provide a tool; it can be called with pkcs11-handler
-l {PKCS#11 library} -k {keylabel} -S {slot number} -c

3. Create a key store in EXAoperation via "Access Management" -> "Key Stores" -> "Add". Provide an identifier for that key store, the label of the key, and some attributes. An example of how to use attributes is
LIB={PKCS#11 library};SLOT={slot number}. If only one slot is provided, the SLOT parameter may be skipped. Click "Apply" to save this new key store.

4. Select the newly created key store (via the radio button in the table) and choose "Unlock". Now, provide the
slot PIN and choose the time duration this key should be accessible.

5. Browse to "Access Management" -> "System Passwords" and choose this newly created key store ("Disk
Key Store").

As long as the used key store is unlocked, nodes may be booted. Otherwise, a boot process will fail.

9.17. Disk wipe


When wiping disks via EXAoperation, the following happens at boot time of a node:

1. Determine disks to wipe (as specified in EXAoperation).

2. For each disk, start the shred command. This is a standard tool on UNIX systems and is used with the command line parameters -n 7 -z. Thus, it overwrites each disk with random data seven times and then performs a final pass that overwrites the disk with zeroes.

9.18. Increase disk space


EXAoperation provides three different approaches to increase the available disk space on cluster nodes:

1. Enlarge disk devices: This is the preferred approach for VM environments. See SOL-177 in the EXASOL
support portal for more information.

2. Add additional EXAStorage disks: This is reasonable in case option (1) is not possible due to the cluster environment or its configuration. See SOL-453 in the EXASOL support portal for more information.

3. Reinstall nodes: This is an option in case the data of a node can be stored somewhere else during the process
of reinstallation.

9.19. Hugepages
Since EXAoperation version 5.0.0 it is possible to define the amount of so-called hugepages for cluster nodes. The hugepage feature was introduced into machine hardware and the Linux kernel several years ago and allows efficient management of memory, especially in situations with large amounts of physical RAM: it can shrink the kernel data structures necessary for handling process memory dramatically2. We recommend defining a reasonable amount of hugepages for cluster nodes with at least 512 GiB RAM. These hugepages can be used for "hot database data" as shown in Figure 9.20, “DB RAM and hugepages”. As only such data can be placed in hugepages, the amount of hugepages per node must be smaller than the amount of DB RAM used on it. Typically, 2-16 GiB of database RAM on any node is used for other data that must be held in memory; this amount may be larger for databases with many open client connections at one time. When using this feature, we recommend defining at least 70 percent of DB RAM as hugepages, but leaving at least 64 GiB of physical node RAM untouched.

Figure 9.20. DB RAM and hugepages

When starting multiple databases on one node, hugepages will be shared on demand between database instances. After changing the hugepage setting for a cluster node, the node has to be restarted.

Here is an example calculation for a database node with 768 GiB RAM: in such a configuration we recommend reserving at least 32 GiB of RAM for general purpose use (kernel memory, operating system processes, ...) and at least 32 GiB for database heap and hot data on demand. This results in 768-32-32 GiB = 704 GiB of hugepages that could be allocated.
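The example calculation above can be expressed as a small helper; the two 32 GiB reserves are the recommendations from the text, exposed as defaults:

```python
def hugepages_gib(node_ram_gib, general_reserve_gib=32, db_heap_gib=32):
    """Worked example from the text: allocatable hugepages equal total RAM
    minus the general-purpose reserve minus the database heap/hot-data reserve."""
    return node_ram_gib - general_reserve_gib - db_heap_gib

print(hugepages_gib(768))  # 704
```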

9.20. Compatibility and known issues

9.20.1. Web browsers


The EXAoperation web interface has been tested successfully against the following browsers:

2
Especially page table data structures can benefit. These normally reference pages with a size of 4 KB, but with hugepages (whose size is 2 MB) they can shrink by up to a factor of 500. In a real-case scenario, this can save dozens of gigabytes.


• Mozilla Firefox 45 on a GNU/Linux system

• Mozilla Firefox 45 on a Windows system

• Chrome 49 on GNU/Linux system

Other common browsers, as well as more recent Internet Explorer versions, should also work as expected. Konqueror 3.4.1 is known not to deselect checkboxes in "Unselect All" dialogs.

9.20.2. Pulling backups


EXAoperation provides database backup over ports 2021 (FTP), 2022 (SFTP), 2080 (HTTP), and 2443 (HTTPS) (see the appropriate EXAoperation chapter). The contained file hierarchy includes virtual tar.gz files that provide the content of subdirectories in compressed form and are shown with a size of zero bytes. Some client tools may have problems downloading these files, as they expect no content (e.g. WinSCP) or only a small portion of data. In this case, please use tools such as cURL or lftp, which are available for most Linux distributions and for Windows.

9.20.3. Synchronizing backups between EXASuite clusters


When using remote volumes to write backups from one cluster to another and the backups should be readable by
remote databases, the remote volume must be created with the option 'nocompression'.

9.20.4. SW-RAID
In case of choosing software RAID, CentOS runs an automated cron job that checks and synchronizes all appropriate devices once every week. You may therefore see corresponding synchronization log messages in EXAoperation.

9.20.5. NTP symmetric key exchange


When specifying NTP keys for reaching NTP servers, MD5 is always used for communication.

9.20.6. Python and XML-RPC


All Python examples for the XML-RPC interface cover Python 2.x. In Python 3, the xmlrpclib module has
been replaced by xmlrpc.client, so new code must import this module instead, as in the example below:

import xmlrpc.client
s = xmlrpc.client.ServerProxy("https://user:password@license-server/cluster1")
s.getDatabaseList()
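
For scripts that must run under both interpreter versions, a version-independent import is one option; this is a sketch, not an official recommendation:

```python
# Sketch: one import that works under both Python 2 and Python 3, so an
# existing script needs only this change at the top.
try:
    import xmlrpc.client as xmlrpclib  # Python 3
except ImportError:
    import xmlrpclib  # Python 2

# Usage is identical afterwards:
# s = xmlrpclib.ServerProxy("https://user:password@license-server/cluster1")
# s.getDatabaseList()
```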

9.20.7. Block sizes


If no block size is explicitly provided, the block size of newly created archive and data volumes is chosen
automatically: 4 KB for data volumes and 64 KB for archive volumes. The block size of data volumes should
never be set to another value unless explicitly required by EXASOL. The block size of archive volumes may be
set to a larger value and determines the maximum archive volume size: 16 TB for 64 KB blocks, 24 TB for
96 KB blocks, 32 TB for 128 KB blocks, and so on.
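
The "and so on" above is a linear relationship; a minimal sketch of the stated rule (derived only from the three values given, so treat it as illustrative):

```python
# Sketch: the maximum archive volume size scales linearly with the block
# size, anchored at 16 TB for the default 64 KB archive block size.
def max_archive_volume_tb(block_size_kb):
    return 16 * block_size_kb // 64

for kb in (64, 96, 128):
    print(kb, "KB blocks ->", max_archive_volume_tb(kb), "TB")
```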


9.20.8. Enlarging databases


To enlarge a database that runs on an EXAStorage volume, please follow these steps:

1. Shutdown the database.

2. Select the corresponding data volume on the EXAStorage page and enlarge it by X nodes.

3. On the corresponding database properties page, select the action "Enlarge" and enlarge the database by the
same number of nodes. Note: the enlargement value specifies the number of nodes to be added to the
database, not its total number of nodes.

4. Restart the database.

5. When the database is running, connect to it with EXAplus and call REORGANIZE DATABASE. This will
reorganize data layout and improve performance.

If needed, also enlarge the archive volume by the same number of nodes.

If enlarging a database does not work properly, additional database startup parameters may be required. For
example, enlarging a database that requires too much memory results in a startup error message, and later
startup attempts fail due to database file mismatches. In this case, enter "-enlargeCluster=1" into the "Extra
database parameters" field and start the database. After the first successful login, shut down the database,
remove the "-enlargeCluster=1" parameter, and start the database again.

9.20.9. Uploading multiple backups at once


When uploading multiple backups at once, these backups must use different IDs (timestamps). Otherwise, only
one of the backups will be shown in the backup form.

9.20.10. Remote syslog servers


Remote syslog servers are connected on TCP or UDP port 514 (depending on the configuration of the logservice).
A remote syslog service has to be configured to receive messages over the network, e.g. by adding the following
two lines to its configuration (TCP case):

$ModLoad imtcp
$InputTCPServerRun 514

For the UDP case, a configuration like this one has to be used:

$ModLoad imudp
$UDPServerRun 514

Note that the syslog service discards undeliverable messages silently.
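
On newer rsyslog versions, the same listeners can be declared with the RainerScript syntax instead of the legacy directives above. This is a sketch; verify it against the syntax of your rsyslog version:

```
# TCP listener on port 514
module(load="imtcp")
input(type="imtcp" port="514")

# UDP listener on port 514
module(load="imudp")
input(type="imudp" port="514")
```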

Chapter 10. Installation


Before installation you should take into account the following minimum hardware requirements for license servers:

• 4 GB RAM (16 GB recommended)

• 1 Dual Core Intel processor

• 2 × 300 GB hard disks

• 2 network adapters with at least 1 Gbit/s

• lights-out management (LOM) interface, e.g. an IPMI card

In case of using a virtualized license server, e.g. via KVM or VirtualBox, the minimum requirements are:

• 4 GB RAM

• 100 GB usable hard disk space

• 2 network adapters

10.1. Installation on a license server via installation medium

EXASOL provides DVDs for the installation of license servers. This is the preferred method and offers a
step-by-step installation process.

10.2. Automated installation of a license server via network

License servers can be installed automatically via PXE boot and a configuration file. We recommend using an
appropriate network installation environment for this task, e.g. Cobbler (www.cobblerd.org). If this is not an option,
execute the following steps to configure a PXE boot server on top of a CentOS system:

1. yum install tftp-server dhcp vsftpd xinetd syslinux

2. Create a directory in the FTP server directory (e.g. mkdir /var/ftp/pub/EXASuite)

3. Mount the installation medium into this newly created directory (e.g. mount -o loop dvd.iso
/var/ftp/pub/EXASuite)

4. Enable tftp in the xinetd configuration (sed -i 's/.*disable.*//' /etc/xinetd.d/tftp)

5. Restart xinetd (service xinetd restart) and start vsftpd (service vsftpd start)

6. Copy the necessary boot files into the tftpboot directory (e.g. cp /usr/share/syslinux/pxelinux.0
/var/lib/tftpboot/ && cp /var/ftp/pub/EXASuite/images/pxeboot/{initrd.img,vmlinuz}
/var/lib/tftpboot/)

7. Create the subdirectory pxelinux.cfg in the tftpboot base directory (mkdir
/var/lib/tftpboot/pxelinux.cfg)


8. Insert the installation URL into the kickstart file (e.g. sed 's!MAKE_URL!ftp://{IP of FTP
server}/pub/EXASuite!g' /var/ftp/pub/EXASuite/kickstart-net/install.cfg >
/var/ftp/pub/install.cfg).

9. Create the configuration file (e.g. as /var/ftp/pub/auto.cfg). This configuration could look like this (for
details of all configuration options, see below):

[Network]
Private = 00:0A:0B:0C:0D:0E
Public = 00:0B:0C:0D:0E:0F

[General]
Number = 10
Installation method = immediate
Device = /dev/sda
Maintenance password = {SHA-512 hashed password}

[Cluster Network]
IP address = 10.17.0.0
Netmask = 255.255.0.0

[Public Network]
IP address = 192.168.6.2
Netmask = 255.255.255.0
Gateway = 192.168.6.1

10. Create a network boot configuration for the license server and store it in the file
/var/lib/tftpboot/pxelinux.cfg/default. This configuration could look like this:

default linux
label linux
kernel vmlinuz
append initrd=initrd.img \
biosdevname=0 \
ks=ftp://{IP of FTP server}/pub/install.cfg \
ksdevice=eth1 \
exaconf=ftp://{IP of FTP server}/pub/auto.cfg

biosdevname=0 is a required parameter, as are ksdevice= (the device to get the installation from),
ks= (the kickstart file to install from), and exaconf=.

11. Create a DHCPd configuration for the license server. It could look like this one:

deny unknown-clients;
default-lease-time 300;
allow booting;
allow bootp;
ddns-update-style none;
filename "pxelinux.0";

subnet 192.168.6.0 netmask 255.255.255.0 {
  option broadcast-address 192.168.6.255;
  next-server 192.168.6.1;
}

host n0010 {
  hardware ethernet 00:0B:0C:0D:0E:0F;
  fixed-address 192.168.6.2;
}

The IP address should match the one specified in the license server installation process. Change the MAC
address to the real one the license server boots from.

12. Restart DHCPd (service dhcpd restart)

13. Now you can connect the license server to the PXE boot server and start it.

10.2.1. Configuration file settings


This subsection lists all parameters that can be used in a license server installation configuration file as shown
above:

1. Section General:

• Number (required) - Number of the license server. Use 10 as the default.

• Device (required) - Device on which to install the operating system, e.g. /dev/sda

• Installation method (required) - One of "immediate" (install on first startup), "delayed" (install
on first startup with the help of the maintenance user), or "additional" (install as an additional license
server; in this case, the parameter "root password" must be provided as a base-64 encoded string)

• Maintenance password (required) - Password of the maintenance user as a base-64 encoded string or
as a hashed string to use in /etc/shadow

• Encryption (optional) - Defaults to "False" (no encryption); if set to "True", the disk device is
encrypted with the "Encryption password"

• Encryption password (optional) - Base-64 encoded disk password, only used if disk encryption is
chosen

2. Section Network:

• Private (required) - MAC address of private network interface or default kernel interface name (e.g.
eth0, eth1)

• Public (required) - MAC address of public network interface or default kernel interface name (e.g. eth0,
eth1)

• Private-bonded (optional) - MAC address of private bonded network interface or default kernel in-
terface name (e.g. eth0, eth1)

• Public-bonded (optional) - MAC address of public bonded network interface or default kernel interface
name (e.g. eth0, eth1)

• Private MTU (optional) - MTU for private network interface; for 10 GBit networks, a value of 9000
for the whole network is recommended; do not mix different MTU sizes for one network


• Public MTU (optional) - MTU for the public network interface; for 10 GBit networks, a value of 9000 for
the whole network is recommended; do not mix different MTU sizes for one network

• VLAN MTU (optional) - MTU for VLAN network interfaces; for 10 GBit networks, a value of 9000 for
the whole network is recommended; do not mix different MTU sizes for one network

3. Section Public Network:

• IP address (required in case of using static public network configuration) - public IP address of license
server

• Netmask (required in case of using static public network configuration) - public network netmask of li-
cense server

• Gateway (required in case of using static public network configuration) - public gateway of license
server

• Use DHCP (required in case of using public IP address via DHCP) - "True" or "False"

4. Section Private VLANs: This section defines VLAN IDs in case of using private VLANs. An entry of 1
= 11 constitutes the first private VLAN with a VLAN ID of 11.
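
The configuration file shown in the installation steps uses INI syntax, so it can be validated before serving it via FTP. The following is a minimal sketch using Python's configparser; the set of required options is taken from this section, and the validation helper itself is an invented example, not an EXASOL tool:

```python
# Sketch: parse a license-server auto.cfg (INI syntax) and check that the
# options marked "required" in this section are present.
import base64
import configparser

AUTO_CFG = """\
[Network]
Private = 00:0A:0B:0C:0D:0E
Public = 00:0B:0C:0D:0E:0F

[General]
Number = 10
Installation method = immediate
Device = /dev/sda

[Public Network]
IP address = 192.168.6.2
Netmask = 255.255.255.0
Gateway = 192.168.6.1
"""

REQUIRED = [("General", "Number"),
            ("General", "Installation method"),
            ("General", "Device"),
            ("Network", "Private"),
            ("Network", "Public")]

def check_auto_cfg(text):
    """Return the parsed config, raising ValueError on missing options."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    for section, option in REQUIRED:
        if not cfg.has_option(section, option):
            raise ValueError("missing %s / %s" % (section, option))
    return cfg

cfg = check_auto_cfg(AUTO_CFG)
print(cfg.get("General", "Device"))  # /dev/sda

# The "Maintenance password" value may be supplied base-64 encoded, e.g.:
print(base64.b64encode(b"secret").decode("ascii"))  # c2VjcmV0
```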

10.3. Installation of EXAClusterOS on a bare CentOS server system

Delivering an EXAClusterOS package on a license server installed as mentioned above consists of the following
steps:

1. Unpack the EXAClusterOS package into the root directory (/)

2. Unpack /usr/opt/EXASuite-6/EXAClusterOS-6.0.6/var/clients/packages/EXAClusterOS-6.0.6_Linux-META_DEFAULT_KERNEL_x86_64.tar.gz
into the root directory. This will copy the required Linux kernel as well as its sources into the system.
Afterwards, execute /sbin/new-kernel-pkg --mkinitrd --depmod --install --make-default
META_DEFAULT_KERNEL, which will install the kernel.

3. Reboot the system.

4. Execute /usr/opt/EXASuite-6/EXAClusterOS-6.0.6/sbin/cos_start_exaop. This will do the
rest of the installation. After it has finished, all required services should be up and running.

5. Configure standard network and NTP settings with EXAoperation.

6. Do a final reboot of the license server. EXAClusterOS should start up automatically.

After a shutdown or a crash of the Cored process, the license server may be reintegrated into the cluster by executing
/etc/init.d/cos start.

In case VLANs have been configured for client nodes, the license server should also be configured with appropriate
network interfaces. If only one private network interface is available, it should be configured to be part of all
VLANs on the switch. Furthermore, the network configuration should be changed appropriately as shown in the
following example, which is based on two VLANs tagged with 25 (for network 27.1.0.0/16) and 53 (for
network 27.65.0.0/16): create the files /etc/sysconfig/network-scripts/ifcfg-eth0.25 and
/etc/sysconfig/network-scripts/ifcfg-eth0.53. These files must contain the entry "VLAN=yes"
and the appropriate statically configured IP address of the interface. The file
/etc/sysconfig/network-scripts/ifcfg-eth0 must not contain an IP address. In all files, the
"ONBOOT" entry must be set to "yes".
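
As a sketch, the file ifcfg-eth0.25 for the first VLAN from the example above could look as follows. The IP address 27.1.0.10 is a placeholder invented for this example and must be replaced by the server's actual address in network 27.1.0.0/16:

```
DEVICE=eth0.25
VLAN=yes
ONBOOT=yes
BOOTPROTO=static
IPADDR=27.1.0.10
NETMASK=255.255.0.0
```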

To access license servers over an IPMI console, execute the following steps:


1. Configure the SOL (Serial-over-LAN) interface of your IPMI card.

2. In /boot/grub/grub.conf, add the parameters "console=tty0 console=ttyS1,115200" to the command
line of the current kernel.1

3. Add the entry "S1:2345:respawn:/sbin/agetty -h -L ttyS1 115200 vt100" to the /etc/inittab file.

4. Add the entry "ttyS1" to the /etc/securetty file.

5. Reboot the system. Now you should be able to use ipmitool for access to the SOL interface.

10.4. Client Nodes


Client nodes must be configured with EXAoperation as shown above. The following three steps are required for
the install process:

1. Network configuration: A valid network configuration for public interfaces of client nodes must be provided.

2. Disks: EXAoperation will use the entered disks for setting up client nodes.

3. Node installation: Client nodes are identified by the MAC addresses of their private interfaces. If not set
explicitly, the default disk and network configuration will be used. New nodes should be started either with
EXAoperation (assuming an IPMI card in the node) or manually. While a node is not yet marked as activated,
its disks will be formatted on every startup. After activation, a restarted client node will not be installed
anymore.

Hint: When using console redirection (as specified in the node configuration), a TTY speed of 19200 bps will be
used.

The following two sections give a detailed description of the client boot process, followed by a description of
the main part of client node monitoring implemented in cos-sensors.

10.4.1. Client boot process


Every client node that is dedicated to be part of an EXAClusterOS cluster environment must be set up to boot
via PXE. Thus, the initial boot image will be received from a TFTP server running on the license server. After
receiving this initial boot image, a client node initiates a cos-ping command, which connects to the Xinetd on the
license server. This ping command is invoked periodically until the node has been successfully installed and
configured (this prevents a "dead" node in case something fails temporarily during this process, e.g. on the license
server). The Xinetd will invoke the appropriate initialization script, which does the following:

1. Read and check initialization message from client node.

2. Check that no installation process for this node is currently running; otherwise exit.

3. Invoke stage 2 script, which will copy and unpack all necessary packages, do the final network configuration
and start up all cluster services.

1
Replace all occurrences of 115200 in this instruction guide with the configured bit rate of the IPMI card SOL interface.


10.5. Updates

10.5.1. Updates from version 4.x


Please remember that a database update from version 4.x will lead to a reset of distribution keys and indices. Thus,
a REORGANIZE DATABASE is required after the first startup of an upgraded EXASolution instance to restore
full system performance.

10.5.2. Updates and defect nodes


Sometimes, nodes develop a hardware defect during a reboot. If this happens during an update and involves
a database node, you must also remove this node (or these nodes) from the database configuration, but only if the
number of defective nodes is lower than the configured database redundancy. To do so, select the node(s)
in the "Remove reserve/failed nodes" field in the configuration. If the number of failed nodes equals or
exceeds the redundancy, the node(s) must be repaired first.

Remember that an explicit database node reconfiguration requires a background-restore for non-Storage databases
to take place. This process could decrease your database performance for a certain time after database startup.

10.6. Downgrades
If an upgrade should be reverted, it is also possible to downgrade. The following steps are necessary:

1. Do a shutdown of all client nodes.

2. Log into the license server via SSH over port 20 and call /etc/init.d/cos stop.

3. Remove the file {previous EXAClusterOS directory}/var/exaoperation/conf/.success.

4. Remove /etc/cos.conf.

5. Execute chkconfig --del cos.

6. Reboot the license server.

7. Execute {previous EXAClusterOS directory}/sbin/cos_start_exaop.

After the downgrade, EXAoperation will use the configuration that was specified at upgrade time.

10.7. Add another license server into a cluster


EXAClusterOS supports multiple dedicated license servers in a cluster. To install a new license server, execute
the following steps:

1. Install the new server with an EXAClusterOS installation medium. You must choose the installation as an
additional license server and provide the SSH root password of the cluster nodes. 2

2. After the installation and one reboot, the synchronization is done automatically and the license server may
be used actively soon.

2
It is recommended to move EXAoperation to another (already installed) license server beforehand, because the data synchronization process
may make heavy use of network bandwidth, which can temporarily affect database performance.


10.8. Add new disks to client nodes without re-installation

This section summarizes the steps necessary to add new disks to nodes without re-installation or data loss.

1. Create the directory /usr/opt/EXASuite-6/EXAClusterOS-6.0.6/var/exaoperation/spool/27.1.0.{NODE_NUMBER}
on the license server to prevent EXAoperation from booting the node. Now the node can be (re-)booted (with
the new disk(s) built in).

2. After the node has started, log into it via rssh, execute killall crond, and issue a dmesg command to
see how the new disk(s) were integrated into the system (the formerly known disk partitions are shown at
kernel boot time).

3. If an already installed disk changed its name (e.g. from /dev/sda to /dev/sdb), rename it via EXAoperation
(-> Nodes -> Properties). Furthermore, execute mkdir /etc/cos (in rssh) and create a
/etc/cos/node_uuid file containing the "Unique ID" of that node (-> Node View), e.g. via echo -n
{UUID} >/etc/cos/node_uuid. Check the currently referred disk via
/usr/opt/EXASuite-6/EXAClusterOS-6.0.6/sbin/hddident -m /dev/sdb1 -a (this assumes
that the formerly known disk is now known as /dev/sdb). Set the right disk via
/usr/opt/EXASuite-6/EXAClusterOS-6.0.6/sbin/hddident -m /dev/sdb1 -N /dev/sdb
and re-check the success of this command via
/usr/opt/EXASuite-6/EXAClusterOS-6.0.6/sbin/hddident -m /dev/sdb1 -a.
Delete the /etc/cos directory via rm -rf /etc/cos.

4. Set the node into the "Install" state and add the new disk(s) (-> Nodes -> Node -> Disks). Set the node back
into the "Active" state.

5. Delete the spool directory on the license server.

6. Execute cos-ping in the rssh shell on the client node and wait until the boot of this node fails (see the
messages in EXAoperation).

7. Take the mkfs.ext4 command referring to the new disk from the /etc/hddinit_gpt.sh script on
the client node and execute it in the rssh shell.

8. Reboot the client node.

Glossary
D
Disk
    A partition that may span one or more physical devices. Comparable to a logical volume under Linux.
    See also: Disk device.

Disk device
    A physical device, typically named /dev/sda or /dev/sdb.
    See also: Disk.

