Using VoltDB for Developers
Abstract
This book explains how to use VoltDB to design, build, and run high performance applications.
V6.2
Copyright 2008-2016 VoltDB, Inc.
The text and illustrations in this document are licensed under the terms of the GNU Affero General Public License Version 3 as published by the
Free Software Foundation. See the GNU Affero General Public License (http://www.gnu.org/licenses/) for more details.
Many of the core VoltDB database features described herein are part of the VoltDB Community Edition, which is licensed under the GNU Affero
General Public License Version 3 as published by the Free Software Foundation. Other features are specific to the VoltDB Enterprise Edition, which is distributed
by VoltDB, Inc. under a commercial license. Your rights to access and use VoltDB features described herein are defined by the license you received
when you acquired the software.
Table of Contents
About This Book .............................................................................................................. xii
1. Overview ....................................................................................................................... 1
1.1. What is VoltDB? .................................................................................................. 1
1.2. Who Should Use VoltDB ....................................................................................... 1
1.3. How VoltDB Works .............................................................................................. 2
1.3.1. Partitioning ................................................................................................ 2
1.3.2. Serialized (Single-Threaded) Processing ......................................................... 2
1.3.3. Partitioned vs. Replicated Tables ................................................................... 3
1.3.4. Ease of Scaling to Meet Application Needs ..................................................... 4
1.4. Working with VoltDB Effectively ............................................................................ 4
2. Installing VoltDB ............................................................................................................ 5
2.1. Operating System and Software Requirements ............................................................ 5
2.2. Installing VoltDB .................................................................................................. 6
2.2.1. Upgrading From Older Versions ................................................................... 6
2.2.2. Building a New VoltDB Distribution Kit ........................................................ 6
2.3. Setting Up Your Environment ................................................................................. 7
2.4. What is Included in the VoltDB Distribution ............................................................. 7
2.5. VoltDB in Action: Running the Sample Applications .................................................. 8
3. Starting the Database ....................................................................................................... 9
3.1. Initializing a VoltDB Database ................................................................................ 9
3.2. Initializing the Database on a Cluster ....................................................................... 9
3.3. Updating Nodes on the Cluster .............................................................................. 10
3.4. Stopping a VoltDB Database ................................................................................. 11
3.5. Restarting a VoltDB Database ............................................................................... 11
3.6. Defining the Cluster Configuration ......................................................................... 12
3.6.1. Determining How Many Sites per Host ......................................................... 12
3.6.2. Configuring Paths for Runtime Features ........................................................ 13
3.6.3. Verifying your Hardware Configuration ........................................................ 14
4. Designing the Database Schema ....................................................................................... 15
4.1. How to Enter DDL Statements .............................................................................. 16
4.2. Creating Tables and Primary Keys ......................................................................... 17
4.3. Analyzing Data Volume and Workload ................................................................... 18
4.4. Partitioning Database Tables ................................................................................. 19
4.4.1. Choosing a Column on which to Partition Table Rows ..................................... 19
4.4.2. Specifying Partitioned Tables ...................................................................... 20
4.4.3. Design Rules for Partitioning Tables ............................................................ 20
4.5. Replicating Database Tables .................................................................................. 20
4.5.1. Choosing Replicated Tables ........................................................................ 21
4.5.2. Specifying Replicated Tables ...................................................................... 21
4.6. Modifying the Schema ......................................................................................... 21
4.6.1. Effects of Schema Changes on Data and Clients ............................................. 22
4.6.2. Viewing the Schema .................................................................................. 23
4.6.3. Modifying Tables ...................................................................................... 23
4.6.4. Adding and Dropping Indexes ..................................................................... 25
4.6.5. Modifying Partitioning for Tables and Stored Procedures ................................. 26
5. Designing Stored Procedures to Access the Database ........................................................... 30
5.1. How Stored Procedures Work ................................................................................ 30
5.1.1. VoltDB Stored Procedures are Transactional .................................................. 30
5.1.2. VoltDB Stored Procedures are Deterministic .................................................. 30
5.2. The Anatomy of a VoltDB Stored Procedure ............................................................ 32
5.2.1. The Structure of the Stored Procedure .......................................................... 32
C. SQL Functions
BIN() ............................................................................................................. 201
BIT_SHIFT_LEFT() ......................................................................................... 202
BIT_SHIFT_RIGHT() ....................................................................................... 203
BITAND() ....................................................................................................... 204
BITNOT() ....................................................................................................... 205
BITOR() ......................................................................................................... 206
BITXOR() ....................................................................................................... 207
CAST() .......................................................................................................... 208
CEILING() ...................................................................................................... 209
CENTROID() .................................................................................................. 210
CHAR() .......................................................................................................... 211
CHAR_LENGTH() ........................................................................................... 212
COALESCE() .................................................................................................. 213
CONCAT() ..................................................................................................... 214
CONTAINS() .................................................................................................. 215
COUNT() ....................................................................................................... 216
CURRENT_TIMESTAMP ................................................................................. 217
DATEADD() ................................................................................................... 218
DAY(), DAYOFMONTH() ................................................................................ 219
DAYOFWEEK() .............................................................................................. 220
DAYOFYEAR() ............................................................................................... 221
DECODE() ..................................................................................................... 222
DISTANCE() .................................................................................................. 223
DWITHIN() .................................................................................................... 224
EXP() ............................................................................................................ 225
EXTRACT() ................................................................................................... 226
FIELD() ......................................................................................................... 228
FLOOR() ........................................................................................................ 230
FORMAT_CURRENCY() ................................................................................. 231
FROM_UNIXTIME() ....................................................................................... 232
HEX() ............................................................................................................ 233
HOUR() ......................................................................................................... 234
ISINVALIDREASON() ..................................................................................... 235
ISVALID() ..................................................................................................... 236
LATITUDE() .................................................................................................. 238
LEFT() ........................................................................................................... 239
LN(), LOG() ................................................................................................... 240
LONGITUDE() ................................................................................................ 241
LOWER() ....................................................................................................... 242
MAX() ........................................................................................................... 243
MIN() ............................................................................................................ 244
MINUTE() ...................................................................................................... 245
MOD() ........................................................................................................... 246
MONTH() ...................................................................................................... 247
NOW ............................................................................................................. 248
NUMINTERIORRINGS() ................................................................................. 249
NUMPOINTS() ............................................................................................... 250
OCTET_LENGTH() ......................................................................................... 251
OVERLAY() ................................................................................................... 252
PI() ................................................................................................................ 253
POINTFROMTEXT() ....................................................................................... 254
POLYGONFROMTEXT() ................................................................................. 255
POSITION() ................................................................................................... 256
POWER() ....................................................................................................... 257
QUARTER() .................................................................................................... 258
REGEXP_POSITION() ..................................................................................... 259
REPEAT() ...................................................................................................... 260
REPLACE() .................................................................................................... 261
RIGHT() ......................................................................................................... 262
SECOND() ...................................................................................................... 263
SET_FIELD() .................................................................................................. 264
SINCE_EPOCH() ............................................................................................. 266
SPACE() ........................................................................................................ 267
SQRT() .......................................................................................................... 268
SUBSTRING() ................................................................................................ 269
SUM() ........................................................................................................... 270
TO_TIMESTAMP() ......................................................................................... 271
TRIM() .......................................................................................................... 272
TRUNCATE() ................................................................................................. 273
UPPER() ........................................................................................................ 274
VALIDPOLYGONFROMTEXT() ....................................................................... 275
WEEK(), WEEKOFYEAR() .............................................................................. 276
WEEKDAY() .................................................................................................. 277
YEAR() .......................................................................................................... 278
D. VoltDB CLI Commands ................................................................................ 279
csvloader ........................................................................................................ 280
jdbcloader ....................................................................................................... 284
kafkaloader ..................................................................................................... 287
sqlcmd ........................................................................................................... 290
voltadmin ....................................................................................................... 294
voltdb ............................................................................................................ 297
E. Deployment File (deployment.xml) ................................................................. 302
E.1. Understanding XML Syntax ........................................................................ 302
E.2. The Structure of the Deployment File ........................................................... 302
F. VoltDB Datatype Compatibility ...................................................................... 306
F.1. Java and VoltDB Datatype Compatibility ...................................................... 306
G. System Procedures ....................................................................................... 309
@AdHoc ......................................................................................................... 310
@Explain ........................................................................................................ 311
@ExplainProc .................................................................................................. 312
@GetPartitionKeys ........................................................................................... 313
@Pause .......................................................................................................... 315
@Promote ....................................................................................................... 316
@Quiesce ....................................................................................................... 317
@Resume ....................................................................................................... 318
@Shutdown ..................................................................................................... 319
@SnapshotDelete ............................................................................................. 320
@SnapshotRestore ............................................................................................ 322
@SnapshotSave ............................................................................................... 324
@SnapshotScan ............................................................................................... 328
@SnapshotStatus .............................................................................................. 331
@Statistics ...................................................................................................... 333
@StopNode ..................................................................................................... 348
@SystemCatalog .............................................................................................. 350
@SystemInformation ........................................................................................ 355
@UpdateApplicationCatalog .............................................................................. 357
@UpdateClasses ............................................................................................... 360
@UpdateLogging ............................................................................................. 362
List of Figures
1.1. Partitioning Tables ........................................................................................................ 2
1.2. Serialized Processing ..................................................................................................... 3
1.3. Replicating Tables ......................................................................................................... 4
4.1. Components of a Database Schema ................................................................................ 15
4.2. Partitions Distribute Table Data and Stored Procedure Processing ........................................ 16
4.3. Diagram Representing the Flight Reservation System ........................................................ 18
5.1. Array of VoltTable Structures ....................................................................................... 36
5.2. One VoltTable Structure is returned for each Queued SQL Statement .................................... 37
5.3. Stored Procedures Execute in the Appropriate Partition Based on the Partitioned Parameter
Value ............................................................................................................................... 42
8.1. The Structure of the VoltDB JSON Response ................................................................... 69
10.1. K-Safety in Action ..................................................................................................... 77
10.2. Network Partition ...................................................................................................... 81
10.3. Network Fault Protection in Action ............................................................................... 83
11.1. Passive Database Replication ....................................................................................... 84
11.2. Cross Datacenter Replication ....................................................................................... 85
11.3. Replicating an Existing Database .................................................................................. 87
11.4. Promoting the Replica ................................................................................................ 88
11.5. Read-Only Access to the Replica ................................................................................. 94
11.6. Transaction Order and Conflict Resolution ..................................................................... 98
14.1. Command Logging in Action ..................................................................................... 118
14.2. Recovery in Action .................................................................................................. 119
15.1. Overview of the Export Process ................................................................................. 124
15.2. Flight Schema with Export Streams ............................................................................. 125
E.1. Deployment XML Structure ........................................................................................ 303
List of Tables
2.1. Operating System and Software Requirements ................................................................... 5
2.2. Components Installed by VoltDB ..................................................................................... 7
4.1. Example Application Workload ..................................................................................... 18
5.1. Methods of the VoltTable Classes .................................................................................. 38
8.1. Datatypes in the JSON Interface .................................................................................... 68
11.1. Structure of the XDCR Conflict Logs .......................................................................... 103
12.1. Named Security Permissions ...................................................................................... 107
15.1. File Export Properties ............................................................................................... 130
15.2. HTTP Export Properties ............................................................................................ 132
15.3. JDBC Export Properties ............................................................................................ 136
15.4. Kafka Export Properties ............................................................................................ 138
15.5. RabbitMQ Export Properties ...................................................................................... 140
15.6. Elasticsearch Export Properties ................................................................................... 142
15.7. Kafka Import Properties ............................................................................................ 145
A.1. Supported SQL Datatypes .......................................................................................... 160
C.1. Selectable Values for the EXTRACT Function ............................................................... 226
E.1. Deployment File Elements and Attributes ...................................................................... 304
F.1. Java and VoltDB Datatype Compatibility ....................................................................... 306
G.1. @SnapshotSave Options ............................................................................................. 324
List of Examples
4.1. ............................................................................................................................. 17
5.1. ............................................................................................................................. 33
5.2. ............................................................................................................................. 36
5.3. Displaying the Contents of VoltTable Arrays .............................................................. 39
Part 1: Getting Started
Explains what VoltDB is, how it works, how to install it, and how to
start using VoltDB. The chapters in this section are:
Chapter 1, Overview
Chapter 2, Installing VoltDB
Chapter 3, Starting the Database
Part 2: Developing VoltDB Database Applications
Describes how to design and develop applications using VoltDB. The
chapters in this section are:
Chapter 4, Designing the Database Schema
Chapter 5, Designing Stored Procedures to Access the Database
Chapter 6, Designing VoltDB Client Applications
Chapter 7, Simplifying Application Development
Chapter 8, Using VoltDB with Other Programming Languages
Part 3: Running VoltDB in a Cluster
Describes additional features useful for running a database in a cluster.
The chapters in this section are:
Chapter 9, Using VoltDB in a Cluster
Chapter 10, Availability
Chapter 11, Database Replication
Chapter 12, Security
Part 4: Managing the Data
Chapter 1. Overview
1.1. What is VoltDB?
VoltDB is a revolutionary new database product. Designed from the ground up to be the best solution for
high performance business-critical applications, the VoltDB architecture is able to achieve 45 times higher
throughput than current database products. The architecture also allows VoltDB databases to scale easily
by adding processors to the cluster as the data volume and transaction requirements grow.
Current commercial database products are designed as general-purpose data management solutions. They
can be tweaked for specific application requirements. However, the one-size-fits-all architecture of traditional databases limits the extent to which they can be optimized.
Although the basic architecture of databases has not changed significantly in 30 years, computing has. As
have the demands and expectations of business applications and the corporations that depend on them.
VoltDB is designed to take full advantage of the modern computing environment:
• VoltDB uses in-memory storage to maximize throughput, avoiding costly disk access.
• Further performance gains are achieved by serializing all data access, avoiding many of the time-consuming functions of traditional databases such as locking, latching, and maintaining transaction logs.
• Scalability, reliability, and high availability are achieved through clustering and replication across multiple servers and server farms.
VoltDB is a fully ACID-compliant transactional database, relieving the application developer from having
to develop code to perform transactions and manage rollbacks within their own application. By using
ANSI standard SQL for the schema definition and data access, VoltDB also reduces the learning curve
for experienced database designers.
To aid businesses that require both exceptional transaction performance and ad hoc reporting, VoltDB
includes integration functions so that historical data can be exported to an analytic database for larger
scale data mining.
1.3.1. Partitioning
In VoltDB, each stored procedure is defined as a transaction. The stored procedure (i.e. transaction) succeeds or rolls back as a whole, ensuring database consistency.
By analyzing and precompiling the data access logic in the stored procedures, VoltDB can distribute both
the data and the processing associated with it to the individual partitions on the cluster. In this way, each
partition contains a unique "slice" of the data and the data processing. Each node in the cluster can support
multiple partitions.
As a general rule of thumb, the more processors (and therefore the more partitions) in the cluster, the more
transactions VoltDB completes per second, providing an easy, almost linear path for scaling an application's
capacity and performance.
When a procedure does require data from multiple partitions, one node acts as a coordinator and hands out
the necessary work to the other nodes, collects the results and completes the task. This coordination makes
multi-partitioned transactions slightly slower than single-partitioned transactions. However, transactional
integrity is maintained and the architecture of multiple parallel partitions ensures throughput is kept at a
maximum.
It is important to note that the VoltDB architecture is optimized for throughput over latency. The latency of
any one transaction (the time from when the transaction begins until processing ends) is similar in VoltDB
to other databases. However, the number of transactions that can be completed in a second (i.e. throughput)
is orders of magnitude higher because VoltDB reduces the amount of time that requests sit in the queue
waiting to be executed. VoltDB achieves this improved throughput by eliminating the overhead required
for locking, latching, and other administrative tasks.
Operating System: VoltDB requires a 64-bit Linux-based operating system. Kits are built and
qualified on the following platforms:
• CentOS version 6.6 or later, including 7.0
• Red Hat (RHEL) version 6.6 or later, including 7.0
• Ubuntu versions 12.04 and 14.04
Development builds are also available for Macintosh OS X 10.9 and later.
CPU: …
Memory: 4 Gbytes
Java: …
Required Software:
• NTP
• Python 2.6 or later release of 2.x
Note: NTP minimizes time differences between nodes in a database cluster, which is critical for VoltDB.
All nodes of the cluster should be configured to synchronize against the same NTP server. Using a
single local NTP server is recommended, but not required.
The VoltDB sources are designed to build and run on 64-bit Linux-based or 64-bit Macintosh platforms.
However, the build process has not been tested on all possible configurations. Attempts to build the sources
on other operating systems may require changes to the build files and possibly to the sources as well.
Once you obtain the sources, use Ant 1.7 or later to build a new distribution kit for the current platform:
$ ant dist
The resulting distribution kit is created as obj/release/volt-n.n.nn.tar.gz where n.n.nn identifies the current version and build numbers. Use this file to install VoltDB according to the instructions
in Section 2.2, Installing VoltDB.
Component                    Description
Example Applications         …
VoltDB Management Center     … voltsvr:8080/. Note that the httpd server and
                             JSON interface must be enabled on the server to be
                             able to access the Management Center.
Shell Commands               …
Documentation                …
Important
If the database you are working on has stopped, use voltdb recover to restart it. Do not rerun
the voltdb create command or your schema and data will be reinitialized to an empty database.
Later in this chapter we explain how to safely stop and restart a VoltDB database.
For example, the following command creates the database, specifying the deployment file and naming voltsvr1 as the host node. Be sure the number of nodes on which you run the
command matches the number of nodes defined in the deployment file.
$ voltdb create --deployment=deployment.xml --host=voltsvr1
You can also use the shortened forms of the argument flags:
$ voltdb create -d deployment.xml -H voltsvr1
VoltDB looks for the license file on the host as a file named license.xml in three locations, in the
following order:
1. The current working directory
2. The directory where the VoltDB image files are installed (usually in the /voltdb subfolder of the
installation directory)
3. The current user's home directory
If the license file is not in any of these locations, you must explicitly identify it when you run the voltdb
command on the host node using the --license or -l flag. For example, the command on the host
node might be:
$ voltdb create -d deployment.xml -H voltsvr1 \
-l /usr/share/voltdb-license.xml
When starting a VoltDB database on a cluster, the VoltDB server process performs the following actions:
1. If you are starting the database on the node identified as the host node, it waits for initialization messages
from the remaining nodes. The host can be any node in the cluster and plays a special role during startup
by managing the cluster initiation process. It is important that all nodes in the cluster can resolve the
hostname or IP address of the host node you specify.
2. If you are starting the database on a non-host node, it sends an initialization message to the host indicating that it is ready. The database is not operational until the correct number of nodes (as specified
in the deployment file) have connected.
3. Once all the nodes have sent initialization messages, the host sends out a message to the other nodes that
the cluster is complete. Once the startup procedure is complete, the host's role is over and it becomes a
peer like every other node in the cluster. It performs no further special functions.
Manually logging on to each node of the cluster every time you want to start the database can be tedious.
Instead, you can use secure shell (ssh) to execute shell commands remotely. By creating an ssh script (with
the appropriate permissions) you can copy files and/or start the database on each node in the cluster from
a single script.
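A minimal sketch of such a script follows; the host names, directories, and log file name are examples only, not prescribed by VoltDB:

#!/bin/bash
# Sketch: copy the deployment file to each node, then start the
# server process on each node, naming voltsvr1 as the host.
for node in voltsvr1 voltsvr2 voltsvr3; do
    scp deployment.xml ${node}:~/database/
    ssh ${node} "cd ~/database && nohup voltdb create \
        -d deployment.xml -H voltsvr1 > volt.log 2>&1 &"
done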
Use the voltdb rejoin command to restart a node that was previously part of the cluster but had stopped
running. See Section 10.3, Recovering from System Failures.
-H voltsvr1 \
-l /usr/share/voltdb-license.xml
In most cases, the default value does not need to be changed. However, for systems with a large number of available processors (16 or more) or older machines
with fewer than 8 processors and limited memory, you may wish to tune the sitesperhost attribute.
The number of sites needed per node is related to the number of processor cores each system has, the
optimal number being approximately 3/4 of the number of CPUs reported by the operating system. For
example, if you are using a cluster of dual quad-core processors (in other words, 8 cores per node), the
optimal number of partitions is likely to be 6 or 7 sites per node.
<?xml version="1.0"?>
<deployment>
<cluster . . .
sitesperhost="6"
/>
</deployment>
For systems that support hyperthreading (where each physical core supports two
threads), the operating system reports twice the number of physical cores. In other words, a dual quad-core system would report 16 virtual CPUs. However, each partition is not quite as efficient as on non-hyperthreading systems. So the optimal number of sites is more likely to be between 10 and 12 per node
in this situation.
Because there are no hard and fast rules, the optimal number of sites per node is best calculated by actually
benchmarking the application to see what combination of cores and sites produces the best results. However, it is important to remember that all nodes in the cluster use the same number of sites. So the best
performance is achieved by using a cluster with all nodes having the same physical architecture (i.e. number of cores).
If you name a specific feature path and it does not exist, VoltDB will attempt to create it for you. For
example, the <exportoverflow> path contains temporary data which can be deleted periodically.
The following excerpt from a deployment file specifies /opt/voltdb as the default root but /opt/
overflow as the directory for export overflow.
<paths>
<voltdbroot path="/opt/voltdb" />
<exportoverflow path="/opt/overflow" />
</paths>
Along with designing your database tables, an important aspect of VoltDB database design is partitioning,
which provides much more efficient access to data and processing. Partitioning distributes the rows of a
table and the processing to access the table across several independent partitions instead of one. Your
design requires coordinating the partitioning of both database tables and the stored procedures that access
the tables. At design time you choose a column on which to partition a table's rows. You also partition
stored procedures on the same column if they use the column to identify which rows to operate on in the
table.
At runtime, VoltDB decides which cluster nodes and partitions to use for the table partitions and consistently allocates rows to the appropriate partition. Figure 4.2, Partitions Distribute Table Data and Stored
Procedure Processing shows how when data is inserted into a partitioned table, VoltDB automatically
allocates the data to the correct partition. Also, when a partitioned stored procedure is invoked, VoltDB
automatically executes the stored procedure in the single partition that has the data requested.
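For example, if the Reservation table and the MakeReservation procedure from the flight reservation example are both keyed on the FlightID column, the partitioning can be declared with one DDL statement each. This is a sketch; the actual column choices are covered in the sections that follow:

PARTITION TABLE Reservation ON COLUMN FlightID;
PARTITION PROCEDURE MakeReservation ON TABLE Reservation COLUMN FlightID;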
Figure 4.2. Partitions Distribute Table Data and Stored Procedure Processing
The following sections of this chapter provide guidelines for designing VoltDB database schemas. Although gathering business requirements is a typical first step in database application design, it is outside
the scope of this guide.
7>
PRIMARY KEY(FlightID)
8> );
The following sections show how to design and create schema objects. DDL statements and techniques
for changing a schema are described later in Section 4.6, Modifying the Schema.
Use Case                        Frequency
…                               10,000/sec
…                               5,000/sec
Make a reservation              1,000/sec
Cancel a reservation            200/sec
…                               200/sec
…                               100/sec
…                               1/sec
…                               1/sec
You can make your procedures that access the database transactional by defining them as VoltDB stored
procedures. This means each stored procedure call completes or rolls back if necessary, thus maintaining
data integrity. Stored procedures are described in detail in Chapter 5, Designing Stored Procedures to
Access the Database.
In our analysis we also need to consider referential integrity, where relationships are maintained between
tables with shared columns that link tables together. For example, Figure 4.3, Diagram Representing the
Flight Reservation System shows that the Flight table links to the Reservation table where FlightID is
the shared column. Similarly, the Customer table links to the Reservation table where CustomerID is the
common column.
Since VoltDB stored procedures are transactional, you can use stored procedures to maintain referential
integrity between tables as data is added or removed. For example, if a customer record is removed from the
Customer table, all reservations for that customer need to be removed from the Reservations table as well.
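A sketch of such a procedure follows, assuming the table and column names used in this example; the procedure name itself is illustrative:

import org.voltdb.*;

public class RemoveCustomer extends VoltProcedure {
    // Delete the customer's reservations first, then the customer record,
    // all within a single transaction.
    public final SQLStmt deleteReservations = new SQLStmt(
        "DELETE FROM Reservation WHERE CustomerID = ?;");
    public final SQLStmt deleteCustomer = new SQLStmt(
        "DELETE FROM Customer WHERE CustomerID = ?;");

    public long run(int customerId) throws VoltAbortException {
        voltQueueSQL(deleteReservations, customerId);
        voltQueueSQL(deleteCustomer, customerId);
        voltExecuteSQL(true); // final batch; both deletes commit or roll back together
        return 0;
    }
}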
With VoltDB, you use all this additional information about volume and workload to configure the database
and optimize performance. Specifically, you want to partition the individual tables to ensure efficiency.
Partitioning is described next.
Moving to the Customer table, CustomerID is used for most data access. Although customers might need
to look up their record by name, the first and last names are not guaranteed to be unique. Therefore,
CustomerID is the best column to use for partitioning the Customer table.
CREATE TABLE Customer (
   CustomerID INTEGER UNIQUE NOT NULL,
   FirstName VARCHAR(15),
   LastName VARCHAR(15),
   PRIMARY KEY(CustomerID)
);
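With CustomerID chosen as the partitioning column, the table is then partitioned with a single DDL statement:

PARTITION TABLE Customer ON COLUMN CustomerID;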
The previous section describes how to partition the Reservation and Customer tables as examples, but what
about the Flight table? It is possible to partition the Flight table (for example, on the FlightID column).
However, not all tables benefit from partitioning.
Fortunately, the number of flights available for booking at any given time is limited (estimated at 2,000)
and so the size of the table is relatively small (approximately 36 megabytes). In addition, the vast majority
of the transactions involving the Flight table are read-only except when new flights are added and at takeoff (when the records are deleted). Therefore, Flight is a good candidate for replication.
Note that the Customer table is also largely read-only. However, because of the volume of data in the
Customer table (a million records), it is not a good candidate for replication, which is why it is partitioned.
VoltDB safely handles sqlcmd DDL entered by different users on different nodes of the cluster because
it manages sqlcmd commands as transactions, just like stored procedures. To demonstrate the DDL statements to modify the schema, the following sections use a new table, Airport, added to the flight reservation schema
as shown below:
CREATE TABLE Airport (
AirportID integer NOT NULL,
Name varchar(15) NOT NULL,
City varchar(25),
Country varchar(15),
PRIMARY KEY (AirportID)
);
Plan and test carefully before making schema changes to a production database. Be aware that clients may
experience connection issues during schema changes, especially for changes that take longer to complete,
such as view or index changes.
Schema changes not only affect data, but the existence of data in the database affects the time it takes to
process schema changes. For example, when there are large amounts of data, some DDL statements can
block processing, resulting in a noticeable delay for other pending transactions. Examples include adding
indexes, creating new table columns, and modifying views.
BEFORE column-name: Table columns cannot be reordered, but the BEFORE clause allows
you to place a new column in a specific position with respect to the existing columns of the table.
Drop table columns. In our example we drop the AirportID column because we are replacing it with
the AirportCode column.
You cannot remove a column that has a reference to it. You have to remove all references to the
column first. References to a column may include:
• A stored procedure
• An index
• A view
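A sketch of the corresponding DDL follows; the datatype and position of the new column are assumptions for illustration:

ALTER TABLE Airport ADD COLUMN AirportCode VARCHAR(3) BEFORE Name;
-- Drop the old column only after removing any procedures, indexes,
-- views, or constraints that reference it.
ALTER TABLE Airport DROP COLUMN AirportID;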
Use the ALTER TABLE statement to add or drop these table constraints along with their associated indexes, as shown in
Section 4.6.3, Modifying Tables.
In our example so far, we have three stored procedures that are adequate to access the Airport table, so
no additional procedures need to be partitioned:
• VoltDB automatically defined a default select stored procedure, which is partitioned on the AirportCode column. It takes an AirportCode as input and returns a table structure containing the AirportCode, Name, City, and Country.
• The FindAirportCodeByName stored procedure should remain multi-partitioned because it needs to search in all partitions.
• The FindAirportCodeByCity stored procedure should also remain multi-partitioned because it needs to search in all partitions.
One side effect of transactions being precompiled as stored procedures is that external transaction management frameworks, such as Spring or
JEE, are not supported by VoltDB.
Decimal types        BigDecimal
Timestamp types      org.voltdb.types.TimestampType,
                     java.util.Date, java.sql.Date, java.sql.Timestamp
VoltDB types         VoltTable
The arguments can be scalar objects or arrays of any of the preceding types. For example, the following
run() method defines three arguments: a scalar long and two arrays, one array of timestamps and one
array of Strings:
import org.voltdb.*;

public class LogMessagesByEvent extends VoltProcedure {

    public long run(
            long eventType,
            org.voltdb.types.TimestampType[] eventTimeStamps,
            String[] eventMessages
    ) throws VoltAbortException {
        // procedure body . . .
    }
}
The calling client application can use any of the preceding datatypes when invoking the callProcedure() method and, where necessary, VoltDB makes the appropriate type conversions.
Figure 5.2. One VoltTable Structure is returned for each Queued SQL Statement
VoltDB provides a set of convenience methods for accessing the contents of the VoltTable array. Table 5.1, Methods of the VoltTable Classes lists some of the most common methods. (See also Java Stored
Procedure API.)
Method                      Description

Methods of VoltTable:
int getRowCount()           Returns the number of rows in the table.
int getColumnCount()        Returns the number of columns in the table.

Methods of VoltTable.Row:
getDatatype(int index)      Return the value of the column at the specified index
                            in the appropriate datatype. Because the datatypes of
                            the columns vary depending on the SQL query, there
                            is no generic method for returning the value. You
                            must specify what datatype to use when fetching the
                            value.
It is also possible to retrieve the column values by name. You can invoke any of the getDatatype() methods
and pass a string argument specifying the name of the column, rather than the numeric index. Accessing
the columns by name can make code easier to read and less susceptible to errors due to changes in the
SQL schema (such as changing the order of the columns). On the other hand, accessing column values by
numeric index is potentially more efficient under heavy load conditions.
Example 5.3, Displaying the Contents of VoltTable Arrays shows a generic routine for walking
through the return results of a stored procedure. In this example, the contents of the VoltTable array
are written to standard output.
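A minimal sketch of such a routine, using the VoltTable methods described above (the output formatting here is illustrative):

public static void displayResults(VoltTable[] results) {
    for (VoltTable table : results) {
        System.out.printf("Table with %d rows:%n", table.getRowCount());
        while (table.advanceRow()) {
            for (int col = 0; col < table.getColumnCount(); col++) {
                // Fetch each column value using its declared datatype.
                System.out.printf("%s\t", table.get(col, table.getColumnType(col)));
            }
            System.out.println();
        }
    }
}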
CREATE PROCEDURE FROM CLASS fadvisor.procedures.LookupFlight;
CREATE PROCEDURE FROM CLASS fadvisor.procedures.HowManySeats;
CREATE PROCEDURE FROM CLASS fadvisor.procedures.MakeReservation;
CREATE PROCEDURE FROM CLASS fadvisor.procedures.CancelReservation;
CREATE PROCEDURE FROM CLASS fadvisor.procedures.RemoveFlight;
For some situations, you can create stored procedures directly in the schema using SQL instead of loading
Java code. See how to use the CREATE PROCEDURE AS statement in Section 7.2, Shortcut for Defining
Simple Stored Procedures.
For more about modifying a schema with DDL, see Section 4.6, Modifying the Schema.
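For example, a simple single-statement procedure can be declared entirely in DDL; the following is a sketch, with an illustrative procedure name and query:

CREATE PROCEDURE CountReservations AS
    SELECT COUNT(*) FROM Reservation;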
Figure 5.3. Stored Procedures Execute in the Appropriate Partition Based on the
Partitioned Parameter Value
Caution
It is the application developer's responsibility to ensure that the queries in a single-partitioned
stored procedure are truly single-partitioned. VoltDB does not warn you about SELECT or
DELETE statements that might return incomplete results. For example, if your single-partitioned
procedure attempts to operate on a range of values for the partitioning column, the results are incomplete, including only the subset of the table data that resides in the current partition.
VoltDB does generate a runtime error if you attempt to INSERT a row that does not belong in
the current partition.
After you partition a procedure, your stored procedure can operate on only those records in the partitioned
table that are identified by the partitioning column, in this example the RESERVATION table identified by the FlightID column.
The application can use error handling to detect and recover from broken connections, as described in
Section 6.5.2, Handling Timeouts. Or you can enable auto-reconnecting when you initialize the client
object. You set auto-reconnecting in the client configuration before creating the client object, as in the
following example:
org.voltdb.client.Client client = null;
ClientConfig config = new ClientConfig("","");
config.setReconnectOnConnectionLoss(true);
try {
client = ClientFactory.createClient(config);
client.createConnection("server1.xyz.net");
client.createConnection("server2.xyz.net");
client.createConnection("server3.xyz.net");
. . .
When setReconnectOnConnectionLoss() is set to true, the client library attempts to reestablish lost connections, retrying first every second and then backing off to once every eight seconds. As soon as the
connection is reestablished, the reconnected server begins to receive its share of the procedure calls.
Asynchronous Invocation
To invoke stored procedures asynchronously, use the callProcedure() method with an additional
first argument, a callback that will be notified when the procedure completes (or an error occurs). For example, to invoke a NewCustomer() stored procedure asynchronously, the call to callProcedure()
might look like the following:
client.callProcedure(new MyCallback(),
                     "NewCustomer",
                     firstname,
                     lastname,
                     custID);
The following are other important points to note when making asynchronous invocations of stored procedures:
• Asynchronous calls to callProcedure() return control to the calling application as soon as the procedure call is queued.
• If the database server queue is full, callProcedure() blocks until it is able to queue the procedure call. This condition is known as backpressure. It does not normally happen unless the database cluster is not scaled sufficiently for the workload or there are abnormal spikes in the workload. See Section 6.5.3, Writing a Status Listener to Interpret Other Errors for more information.
• Once the procedure is queued, any subsequent errors (such as an exception in the stored procedure itself or loss of connection to the database) are returned as error conditions to the callback procedure.
Callback Implementation
The callback procedure (MyCallback() in this example) is invoked after the stored procedure completes
on the server. The following is an example of a callback procedure implementation:
static class MyCallback implements ProcedureCallback {
    @Override
    public void clientCallback(ClientResponse clientResponse) {
        if (clientResponse.getStatus() != ClientResponse.SUCCESS) {
            System.err.println(clientResponse.getStatusString());
        } else {
            myEvaluateResultsProc(clientResponse.getResults());
        }
    }
}
The callback procedure is passed the same ClientResponse structure that is returned in a synchronous
invocation. ClientResponse contains information about the results of execution. In particular, the
methods getStatus() and getResults() let your callback procedure determine whether the stored
procedure was successful and evaluate the results of the procedure.
The VoltDB Java client is single threaded, so callback procedures are processed one at a time. Consequently, it is a good practice to keep processing in the callback to a minimum, returning control to the main
thread as soon as possible. If more complex processing is required by the callback, creating a separate
thread pool and spawning worker methods on a separate thread from within the asynchronous callback
is recommended.
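One possible approach is sketched below; the pool size and the hand-off are illustrative, and myEvaluateResultsProc() is the application routine from the earlier example:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch: keep the callback cheap by delegating result processing
// to a worker pool instead of doing it in the callback itself.
static final ExecutorService workers = Executors.newFixedThreadPool(4);

static class MyOffloadingCallback implements ProcedureCallback {
    @Override
    public void clientCallback(ClientResponse clientResponse) {
        workers.submit(() -> {
            if (clientResponse.getStatus() == ClientResponse.SUCCESS) {
                myEvaluateResultsProc(clientResponse.getResults());
            } else {
                System.err.println(clientResponse.getStatusString());
            }
        });
    }
}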
            System.err.println(clientResponse.getStatusString());
        } else {
            if (clientResponse.getAppStatus() == AppCodeFuzzy) {
                System.err.println(clientResponse.getAppStatusString());
            }
            myEvaluateResultsProc(clientResponse.getResults());
        }
    }
}
The getStatus() method tells you whether the stored procedure completed successfully and, if
not, what type of error occurred. It is good practice to always check the status of the ClientResponse before evaluating the results of a procedure call, because if the status is anything but SUCCESS, there will not be any results returned. The possible values of getStatus() are:
• CONNECTION_LOST: The network connection was lost before the stored procedure returned status information to the calling application. The stored procedure may or may not have completed successfully.
• CONNECTION_TIMEOUT: The stored procedure took too long to return to the calling application. The stored procedure may or may not have completed successfully. See Section 6.5.2, Handling Timeouts for more information about handling this condition.
• GRACEFUL_FAILURE: An error occurred and the stored procedure was gracefully rolled back.
• RESPONSE_UNKNOWN: This is a rare error that occurs if the coordinating node for the transaction fails before returning a response. The node to which your application is connected cannot determine if the transaction failed or succeeded before the coordinator was lost. The best course of action, if you receive this error, is to use a new query to determine if the transaction failed or succeeded and then take action based on that knowledge.
• SUCCESS: The stored procedure completed successfully.
• UNEXPECTED_FAILURE: An unexpected error occurred on the server and the procedure failed.
• USER_ABORT: The code of the stored procedure intentionally threw a UserAbort exception and the stored procedure was rolled back.
If a getStatus() call identifies an error status other than SUCCESS, you can use the getStatusString() method to return a text message providing more information about the specific error
that occurred.
If you want the stored procedure to provide additional information to the calling application, there are
two more methods of ClientResponse that you can use. The methods getAppStatus()
and getAppStatusString() act like getStatus() and getStatusString(), but rather
than returning information set by VoltDB, getAppStatus() and getAppStatusString()
return information set in the stored procedure code itself.
In the stored procedure, you can use the methods setAppStatusCode() and setAppStatusString() to set the values returned to the calling application by the stored procedure. For
example:
/* stored procedure code */
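// What follows is a sketch: the code value is an assumption, chosen
// to match the client-side test for AppCodeFuzzy shown earlier.
final byte AppCodeFuzzy = 1;
setAppStatusCode(AppCodeFuzzy);
setAppStatusString("I'm not sure about the results.");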
if (response.getStatus() == ClientResponse.CONNECTION_TIMEOUT) {
    System.out.println("A procedure invocation has timed out.");
    return;
}
if (response.getStatus() == ClientResponse.CONNECTION_LOST) {
    System.out.println("Connection lost before procedure response.");
    return;
}
Set a status listener to receive the results of any procedure invocations that complete after the client
interface times out. See the following Section 6.5.3, Writing a Status Listener to Interpret Other Errors
for an example of creating a status listener for delayed procedure responses.
For the sake of example, the following status listener does little more than display a message on standard
output. However, in real-world applications the listener would take appropriate action based on the circumstances.
/*
* Declare the status listener
*/
ClientStatusListenerExt mylistener = new ClientStatusListenerExt()
{
@Override
public void connectionLost(String hostname, int port,
int connectionsLeft,
DisconnectCause cause)
{
System.out.printf("A connection to the database has been lost."
+ "There are %d connections remaining.\n", connectionsLeft);
}
@Override
public void backpressure(boolean status)
{
System.out.println("Backpressure from the database "
+ "is causing a delay in processing requests.");
}
@Override
public void uncaughtException(ProcedureCallback callback,
ClientResponse r, Throwable e)
{
System.out.println("An error has occurred in a callback "
+ "procedure. Check the following stack trace for details.");
e.printStackTrace();
}
@Override
public void lateProcedureResponse(ClientResponse response,
String hostname, int port)
{
System.out.printf("A procedure that timed out on host %s:%d"
+ " has now responded.\n", hostname, port);
}
};
/*
* Declare the client configuration, specifying
* a username, a password, and the status listener
*/
ClientConfig myconfig = new ClientConfig("username",
"password",
mylistener);
/*
* Create the client using the specified configuration.
*/
Client myclient = ClientFactory.createClient(myconfig);
By performing the operations in the order described here, you ensure that all connections to the VoltDB
database cluster use the same credentials for authentication and will notify the status listener of any error
conditions outside of normal procedure execution.
1. Declare a ClientStatusListenerExt listener callback. Define the listener before you define
   the VoltDB client or open a connection. The ClientStatusListenerExt interface has four
   methods that you can implement, one for each type of error situation:
   • connectionLost()
   • backpressure()
   • uncaughtException()
   • lateProcedureResponse()
2. Define the client configuration. After you declare your ClientStatusListenerExt, you define a ClientConfig object to use for all connections, which includes
   the username, password, and status listener. This configuration is then used to define the client next.
3. Create a client with the specified configuration.
If you develop your application using one of the sample applications as a template, the run.sh file
manages this dependency for you.
HELLOWORLD.insert: The parameters are the table columns, in the same order as defined in the
schema.

VoltDB defines default update, upsert, and delete stored procedures if the table has a primary key:

HELLOWORLD.update: The parameters are the new column values, in the order defined by the schema,
followed by the primary key column values. This means the primary key column values are specified twice: once as their corresponding new column values and once as the primary key value.

HELLOWORLD.upsert: The parameters are the table columns, in the same order as defined in the
schema.

HELLOWORLD.delete: The parameters are the primary key column values, listed in the order they
appear in the primary key definition.
VoltDB defines a default select stored procedure if the table has a primary key and the table is partitioned:
HELLOWORLD.select
    The parameters are the primary key column values, listed in the order they appear in the primary key definition.
Use the sqlcmd command show procedures to list all the stored procedures available, including the number and type of parameters required. Use @SystemCatalog with the PROCEDURECOLUMNS selector to show more details about the order and meaning of each procedure's parameters.
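The same information is available programmatically. A minimal sketch, assuming an already-connected client named client (the JDBC-style column names PROCEDURE_NAME and COLUMN_NAME are assumptions about the @SystemCatalog output, so verify them against the actual results):
VoltTable paramInfo = client.callProcedure("@SystemCatalog",
        "PROCEDURECOLUMNS").getResults()[0];
while (paramInfo.advanceRow()) {
    // Print each procedure name and one of its parameter columns.
    System.out.printf("%s: parameter %s%n",
        paramInfo.getString("PROCEDURE_NAME"),
        paramInfo.getString("COLUMN_NAME"));
}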
The following code example uses the default procedures for the HELLOWORLD table to insert, retrieve
(select), update, and delete a new record with the key value "American":
VoltTable[] results;
client.callProcedure("HELLOWORLD.insert",
"American","Howdy","Earth");
results = client.callProcedure("HELLOWORLD.select",
"American").getResults();
client.callProcedure("HELLOWORLD.update",
"American","Yo","Biosphere",
"American");
client.callProcedure("HELLOWORLD.delete",
"American");
Expectation                    Description
EXPECT_EMPTY                   The query must return no rows.
EXPECT_ONE_ROW                 The query must return exactly one row.
EXPECT_ZERO_OR_ONE_ROW         The query must return no more than one row.
EXPECT_NON_EMPTY               The query must return at least one row.
EXPECT_SCALAR                  The query must return a single value (that is, one row with one column).
EXPECT_SCALAR_LONG             The query must return a single value with a datatype of long.
EXPECT_SCALAR_MATCH( long )    The query must return a single long value equal to the specified argument.
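In a Java stored procedure, the expectation is passed as the second argument to voltQueueSQL(). A minimal sketch of a lookup that must return exactly one row (the table and column names are illustrative):
import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

public class GetCustomer extends VoltProcedure {
    public final SQLStmt getCustomer = new SQLStmt(
        "SELECT firstname, lastname FROM customer WHERE customerid = ?;");

    public VoltTable[] run(long customerId) {
        // If the query returns anything other than exactly one row,
        // the expectation fails and the transaction rolls back.
        voltQueueSQL(getCustomer, EXPECT_ONE_ROW, customerId);
        return voltExecuteSQL(true);
    }
}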
As with Java stored procedures, you must declare all SQL queries as SQLStmt objects at the beginning of the Groovy procedure.
You must also define a closure called transactOn, which is invoked the same way the run()
method is invoked in a Java stored procedure. This closure performs the actual work of the procedure
and can accept any arguments that the Java run method can accept. It can also return a VoltTable,
an array of VoltTable, or a long value.
End the DDL statement with three pound signs (###) after the Groovy code.
In addition, VoltDB provides special wrappers, tuplerator() and buildTable(), that help you
access VoltTable results and construct VoltTable structures from scratch. For example, the following code fragment shows the ContestantWinningStates() stored procedure from the Voter sample application (examples/voter) written in Groovy:
transactOn = { int contestantNumber, int max ->
voltQueueSQL(resultStmt)
results = []
state = ""
tuplerator(voltExecuteSQL()[0]).eachRow {
isWinning = state != it[1]
state = it[1]
if (isWinning && it[0] == contestantNumber) {
results << [state: state, votes: it[2]]
}
}
if (max > results.size) max = results.size
buildTable(state:STRING, num_votes:BIGINT) {
results.sort { a,b -> b.votes - a.votes }[0..<max].each {
row it.state, it.votes
}
}
}
<vector>
<boost/shared_ptr.hpp>
"Client.h"
"Table.h"
"TableIterator.h"
"Row.hpp"
"WireType.h"
"Parameter.hpp"
"ParameterSet.hpp"
"ProcedureCallback.hpp"
Once you have included all of the necessary declarations, there are three steps to using the interface to
interact with VoltDB:
1. Create and open a client connection
2. Invoke stored procedures
3. Interpret the results
The following sections explain how to perform each of these functions.
http://<server>:8080/api/1.0/
Arguments
Procedure=<procedure-name>
Parameters=<procedure-parameters>
User=<username for authentication>
Password=<password for authentication>
Hashedpassword=<Hashed password for authentication>
admin=<true|false>
jsonp=<function-name>
The arguments can be passed either using the GET or the POST method. For example, the following URL
uses the GET method (where the arguments are appended to the URL) to execute the system procedure
@SystemInformation on the VoltDB database running on node voltsvr.mycompany.com:
http://voltsvr.mycompany.com:8080/api/1.0/?Procedure=@SystemInformation
Note that only the Procedure argument is required. You can authenticate using the User and Password (or Hashedpassword) arguments if security is enabled for the database. Use Password to send
the password as plain text or Hashedpassword to send the password as an encoded string. (The hashed
password must be either a 40-byte hex-encoding of the 20-byte SHA-1 hash or a 64-byte hex-encoding
of the 32-byte SHA-256 hash.)2
You can also include the parameters on the request. However, it is important to note that the parameters
and the response returned by the stored procedure are JSON encoded. The parameters are an array (even
if there is only one element to that array) and therefore must be enclosed in square brackets. Also, although
there is an upper limit of 2 megabytes for the entire length of the parameter string, large parameter sets
must be sent using POST to avoid stricter limitations on allowable URL lengths.
The admin argument specifies whether the request is submitted on the standard client port (the default)
or the admin port (when you specify admin=true). When the database is in admin mode, the client
port is read-only; so you must submit write requests with admin=true or else the request is rejected
by the server.
The jsonp argument is provided as a convenience for browser-based applications (such as Javascript)
where cross-domain browsing is disabled. When you include the jsonp argument, the entire response is
wrapped as a function call using the function name you specify. Using this technique, the response is a
complete and valid Javascript statement and can be executed to create the appropriate language-specific
object. For example, calling the @Statistics system procedure in Javascript using the jQuery library looks
like this:
$.getJSON('http://myserver:8080/api/1.0/?Procedure=@Statistics' +
'&Parameters=["MANAGEMENT",0]&jsonp=?',
{},MyCallBack);
1
You can specify an alternate port for the JSON interface when you start the VoltDB server by including the port number as an attribute of the
<httpd> tag in the deployment file. For example: <httpd port="{port-number}">.
2
Hashing the password stops the text of your password from being detectable from network traffic. However, it does not make the database access
any more secure. To secure the transmission of credentials and data between client applications and VoltDB, use an SSL proxy server in front of
the database servers.
PHP
// Construct the procedure name, parameter list, and URL.
$voltdbserver = "http://myserver:8080/api/1.0/";
$proc = "Insert";
$a = array("Croatian","Pozdrav","Svijet");
$params = json_encode($a);
$params = urlencode($params);
$querystring = "Procedure=$proc&Parameters=$params";
// create a new cURL resource and set options
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $voltdbserver);
curl_setopt($ch, CURLOPT_HEADER, 0);
curl_setopt($ch, CURLOPT_FAILONERROR, 1);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $querystring);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Execute the request
$response = curl_exec($ch);
curl_close($ch);
Python
import urllib
import urllib2
import json
# Construct the procedure name, parameter list, and URL.
url = 'http://myserver:8080/api/1.0/'
voltparams = json.dumps(["Croatian","Pozdrav","Svijet"])
httpparams = urllib.urlencode({
'Procedure': 'Insert',
'Parameters' : voltparams
})
print httpparams
# Execute the request
data = urllib2.urlopen(url, httpparams).read()
# Decode the results
result = json.loads(data)
Perl
use LWP::Simple;
my $server = 'http://myserver:8080/api/1.0/';
# Insert "Hello World" in Croatian
my $proc = 'Insert';
my $params = '["Croatian","Pozdrav","Svijet"]';
my $url = $server . "?Procedure=$proc&Parameters=$params";
my $content = get $url;
die "Couldn't get $url" unless defined $content;
C#
using System;
using System.Text;
using System.Net;
using System.IO;
namespace hellovolt
{
class Program
{
static void Main(string[] args)
{
string VoltDBServer = "http://myserver:8080/api/1.0/";
string VoltDBProc = "Insert";
string VoltDBParams = "[\"Croatian\",\"Pozdrav\",\"Svijet\"]";
string Url = VoltDBServer + "?Procedure=" + VoltDBProc
+ "&Parameters=" + VoltDBParams;
string result = "";
WebResponse response = null;
StreamReader reader = null;
try
{
    // Issue the request and read the JSON response.
    WebRequest request = WebRequest.Create(Url);
    response = request.GetResponse();
    reader = new StreamReader(response.GetResponseStream());
    result = reader.ReadToEnd();
}
catch (Exception ex)
{
// handle error
Console.WriteLine( ex.Message );
}
finally
{
if (reader != null) reader.Close();
if (response != null) response.Close();
}
}
}
}
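Java
For completeness, the same request can be issued from Java using only the standard library. A minimal sketch, assuming the same server and Insert procedure as the preceding examples:
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class HelloJson {
    public static void main(String[] args) throws Exception {
        // Construct the procedure name and JSON-encoded parameter list.
        String params = URLEncoder.encode(
            "[\"Croatian\",\"Pozdrav\",\"Svijet\"]", "UTF-8");
        String query = "Procedure=Insert&Parameters=" + params;

        // POST the request to the JSON interface.
        URL url = new URL("http://myserver:8080/api/1.0/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(query.getBytes("UTF-8"));
        }

        // Read the JSON-encoded response.
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}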
[Table: how to pass parameters of each datatype in the JSON interface. Integer values are passed as numbers (for example, 12345) and DOUBLE values as numbers (for example, 123.45); String values are passed as quoted strings (for example, "I am a string"). The table also includes rows for BIGDECIMAL and TIMESTAMP.]
{ appstatus          (integer, boolean)
  appstatusstring    (string)
  exception          (integer)
  results            (array)
  [ {                (object, VoltTable)
      data           (array)
      [ ...          (any type)
      ]
      schema         (array)
      [ name         (string)
        type         (integer, enumerated)
      ]
      status         (integer, boolean)
    }
  ]
  status             (integer)
  statusstring       (string)
}
The key components of the JSON response are the following:
appstatus
Returns additional information, provided by the application developer, about the success
or failure of the stored procedure. The values of appstatus and appstatusstring can be
set programmatically in the stored procedure. (See Section 6.5.1, Interpreting Execution
Errors for details.)
results
An array of objects representing the data returned by the stored procedure. This is an array
of VoltTable objects. If the stored procedure does not return a value (i.e. is void or null),
then results will be null.
data
An array of the row values within each VoltTable object, where each row is itself an array with one element per column.
schema
Within each VoltTable object, schema is an array of objects with two elements: the name of the field and the datatype of that field (encoded as an enumerated integer value).
status
Indicates the success or failure of the stored procedure. If status is false, statusstring contains the text of the status message.
It is possible to create a generic procedure for testing and evaluating the result values from any VoltDB
stored procedure. However, in most cases it is far more expedient to evaluate the values that you know
the individual procedures return.
For example, again using the Hello World example that is provided with the VoltDB software, it is possible to use the JSON interface to call the Select stored procedure and return the values for "Hello" and "World" in a specific language. Rather than evaluate the entire results array (including the name and type fields), we know we are only receiving one VoltTable object with two string elements. So we can simplify the code, as in the following Python example:
import urllib
import urllib2
import json
import pprint
<cluster hostcount="5" />
Choose one of the nodes as the lead or "host" node and specify that node using the --host argument on the start command.
Issue the start command on all nodes of the cluster.
For example, if you are creating a new five node cluster and choose node server3 as the host, you would
issue a command like the following on all five nodes:
$ voltdb create --host=server3 --deployment=deployment.xml
To restart a cluster using command logs or automatic snapshots, you repeat this process, replacing the create action with recover:
$ voltdb recover --host=server3 --deployment=deployment.xml
In both cases you choose one node, any node, to act as the leader for initiating the cluster. Once the database cluster is running, the leader's special role is complete and all nodes are peers.
However, if you are simply adding nodes to the cluster to add capacity or increase performance, you can
add the nodes while the database is running. Adding nodes "on the fly" is also known as elastic scaling.
You can control how quickly the rebalance operation completes versus how much rebalance work impacts
ongoing client transactions using two attributes of the <elastic> element in the deployment file:
The duration attribute sets a target value for the length of time each rebalance transaction will take,
specified in milliseconds. The default is 50 milliseconds.
The throughput attribute sets a target value for the number of megabytes per second that will be
processed by the rebalance transactions. The default is 2 megabytes.
When you change the target duration, VoltDB adjusts the amount of data that is moved in each transaction
to reach the target execution time. If you increase the duration, the volume of data moved per transaction
increases. Similarly, if you reduce the duration, the volume per transaction decreases.
When you change the target throughput, VoltDB adjusts the frequency of rebalance transactions to achieve the desired volume of data moved per second. If you increase the target throughput, the number of rebalance transactions per second increases. Similarly, if you decrease the target throughput, the number of transactions decreases.
The <elastic> element is a child of the <systemsettings> element. For example, the following deployment
file sets the target duration to 15 milliseconds and the target throughput to 1 megabyte per second before
starting the database:
<deployment>
. . .
<systemsettings>
<elastic duration="15" throughput="1"/>
</systemsettings>
</deployment>
To achieve K=1, it is necessary to duplicate all partitions. (If you don't, failure of a node that contains a
non-duplicated partition would cause the database to fail.) Similarly, K=2 requires two duplicates of every
partition, and so on.
What happens during normal operations is that any work assigned to a duplicated partition is sent to all copies (as shown in Figure 10.1, K-Safety in Action). If a node fails, the database continues to function, sending the work to the unaffected copies of the partition.
The important point to note when setting the K value is that, if you do not change the hardware configuration, you are dividing the available partitions among the duplicate copies. Therefore performance (and
capacity) will be proportionally decreased as K-safety is increased. So running K=1 on a 6-node cluster
will be approximately equivalent to running a 3-node cluster with K=0.
If you wish to increase reliability without impacting performance, you must increase the cluster size to
provide the appropriate capacity to accommodate for K-safety.
$ voltdb rejoin --host=myclusternode5 \
--deployment=mydeployment.xml
Note that the node you specify may be any active cluster node; it does not have to be the node identified as
the host when the cluster was originally started. Also, the deployment file you specify must be the currently
active deployment settings for the running database cluster.
By default, VoltDB performs live rejoins, allowing the work of the database to continue. If, for any reason,
you choose to perform a blocking rejoin, you can do this by using the --blocking flag on the command
line. For example, the following command performs a blocking rejoin to the database cluster including
the node myclusternode5:
$ voltdb rejoin --blocking --host=myclusternode5 \
--deployment mydeployment.xml
In rare cases, if the database is near capacity in terms of throughput, a live rejoin cannot keep up with the
ongoing changes made to the data. If this happens, VoltDB reports that the live rejoin cannot complete and
you must wait until database activity subsides or you can safely perform a blocking rejoin to reconnect
the server.
It is important to remember that the cluster is not fully K-safe until the restoration is complete. For example,
if the cluster was established with a K-safety value of two and one node failed, until that node rejoins and
is updated, the cluster is operating with a K-safety value of one. Once the node is up to date, the cluster
becomes fully operational and the original K-safety is restored.
execute system procedures. If not, the rejoin request will be rejected and an appropriate error message
displayed.
The problem is that you never want two separate copies of the database continuing to operate and accepting
requests thinking they are the only viable copy. If the cluster is physically on a single network switch,
the threat of a network partition is reduced. But if the cluster is on multiple switches, the risk increases
significantly and must be accounted for.
For example, in the case shown in Figure 10.2, Network Partition, if a network partition separates nodes
A and B from C, the larger segment (nodes A and B) will continue to run and node C will write a snapshot
and shutdown (as shown in Figure 10.3, Network Fault Protection in Action).
If a network partition creates two viable segments of the same size (for example, if a four node cluster is split into two two-node segments), a special case is invoked where one segment is uniquely chosen to continue, based on the internal numbering of the host nodes, thereby ensuring that only one viable segment of the partitioned database continues.
Network fault protection is a very valuable tool when running VoltDB clusters in a distributed or uncontrolled environment where network partitions may occur. The one downside is that there is no way to differentiate between network partitions and actual node failures. In the case where network fault protection
is turned on and no network partition occurs but a large number of nodes actually fail, the remaining nodes
may believe they are the smaller segment. In this case, the remaining nodes will shut themselves down
to avoid partitioning.
For example, in the previous case shown in Figure 10.3, Network Fault Protection in Action, if rather
than a network partition, nodes A and B fail, node C is the only node still running. Although node C
is viable and could continue because the cluster was started with K-safety set to 2, if fault protection is
enabled node C will shut itself down to avoid a partition.
In the worst case, if half the nodes of a cluster fail, the remaining nodes may actually shut themselves down
under the special provisions for a network partition that splits a cluster into two equal parts. For example,
consider the situation where a two node cluster with a k-safety value of one has network partition detection
enabled. If one of the nodes fails (half the cluster), there is only a 50/50 chance the remaining node is the
"blessed" node chosen to continue under these conditions. If the remaining node is not the chosen node, it
will shut itself down to avoid a conflict, taking the database out of service in the process.
Because this situation, a 50/50 split, could result in either a network partition or a viable cluster shutting down, VoltDB recommends always using network partition detection and using clusters with an odd number of nodes. By using network partition detection, you avoid the dangers of a partition. By using an odd number of servers, you avoid even the possibility of a 50/50 split, whether caused by partitioning or node failures.
Cross Datacenter Replication (XDCR), or active replication, copies changes in both directions. It is possible for client applications to perform read/write operations on either cluster and changes in one database
are then copied and applied to the other database. Figure 11.2, Cross Datacenter Replication shows how
XDCR can support client applications attached to each database instance.
Database replication (DR) provides two key business advantages. The first is protecting your business
data against catastrophic events, such as power outages or natural disasters, which could take down an
entire cluster. This is often referred to as disaster recovery. Because the two clusters can be in different
geographic locations, both passive DR and XDCR allow one of the clusters to continue unaffected when
the other becomes inoperable. Because the replica is available for read-only transactions, passive DR also
allows you to offload read-only workloads, such as reporting, from the main database instance.
The second business issue that DR addresses is the need to maintain separate, active copies of the database in two separate locations. For example, XDCR allows you to maintain copies of a product inventory
database at two separate warehouses, close to the applications that need the data. This feature makes it
possible to support massive numbers of clients that could not be supported by a single database instance
or might result in unacceptable latency when the database and the users are geographically separated. The
databases can even reside on separate continents.
It is important to note, however, that database replication is not instantaneous. The transactions are committed locally, then copied to the other database. So when using XDCR to maintain two active clusters you
must be careful to design your applications to avoid possible conflicts when transactions change the same
record in the two databases at approximately the same time. See Section 11.3.5, Understanding Conflict
Resolution for more information about conflict resolution.
The remainder of this chapter discusses the following topics:
Section 11.1, How Database Replication Works
Section 11.2, Using Passive Database Replication
Section 11.3, Using Cross Datacenter Replication
Section 11.4, Monitoring Database Replication
DR TABLE contestants;
DR TABLE votes;
DR TABLE area_code_state;
For passive DR, only the master database can have existing data before starting replication for the first
time. The replica's DR tables must be empty. For XDCR, only one of the two databases can have data in
the DR tables. If both clusters contain data, replication cannot start. Once DR has started, the databases
can stop and recover using command logging without having to restart DR from the beginning.
See Section 11.2.5.3, Promoting the Replica When the Master Becomes Unavailable for more information on promoting the replica database.
The decision whether to promote the replica or wait for the master to return (and hopefully recover all transactions from the command log) is not an easy one. Promoting the replica and using it to replace the original master may involve losing one or more transactions per partition. However, if the master cannot be recovered or cannot be recovered quickly, waiting for the master to return can result in significant business loss or interruption.
Your own business requirements and the specific situation that caused the outage will determine which choice to make: whether to wait for the failed cluster to recover or to continue operations on the remaining cluster only. The important point is that database replication makes the choice possible and significantly eases the dangers of unforeseen events.
While it is easiest to have the master and replica databases use the exact same schema, that is not necessary. The replica can have a subset or superset of the tables in the master, as long as it contains matching definitions for all of the DR tables. The replica schema can even contain additional objects not in the master schema, such as additional views, which can be useful when using the replica for read-only or reporting workloads, just as long as the DR tables match.
At this point, the replica database may not have any schema defined yet. This is normal. The replica will periodically contact the master until the schema for DR objects on the two databases match. This gives you time to load a matching schema.
As soon as the replica database has started, you can load the appropriate schema. Loading the same schema
as the master database is the easiest and recommended approach. The key point is that once a matching
schema is loaded, replication will begin automatically.
When replication starts, the following actions occur:
1. The replica and master databases verify that the DR tables match on the two clusters.
2. If data already exists in the DR tables on the master, the master sends a snapshot of the current contents
to the replica where it is restored into the appropriate tables.
3. Once the snapshot, if any, is restored, the master starts sending binary logs of changes to the DR tables
to the replica.
If any errors occur during the snapshot transmission, replication stops and must be restarted from the beginning. However, once the third step is reached, replication proceeds independently for each unique partition and, in a K-safe environment, the DR process becomes durable across node failures, rejoins, and other non-fatal events.
If either the master or the replica database crashes and needs to restart, it is possible to restart DR where
it left off, assuming the databases are using command logging for recovery. If the master fails, you can
perform a voltdb recover action to restart the master database. The replica will wait for the master to
recover. The master will then replay any DR logs on disk and resume DR where it left off.
If the replica fails, the master will queue the DR logs to disk waiting for the replica to return. If you perform
a voltdb recover action, including the --replica flag, on the replica cluster, the replica will perform the
following actions:
1. Restart the replica database, restoring both the schema and the data, and placing the database in read-only mode.
2. Contact the master cluster and attempt to re-establish DR.
3. If both clusters agree on where (that is, at what transaction) DR was interrupted, DR will resume from that point, starting with the DR logs that the master database has queued in the interim.
Note that you must use the --replica flag when recovering the replica database if you want to resume DR
where it left off. For example:
$ voltdb recover --replica --deployment=dr-deploy.xml
If you do not include the --replica flag, the database will resume as a normal, read/write database and not
attempt to contact the master database. Also, if the clusters do not agree on where DR stopped during step
#3, the replica database will generate an error and stop replication. For example, if you recover from an
asynchronous command log where the last few DR logs were ACKed to the master but not written to the
command log, the master and the replica will be in different states when the replica recovers.
If this occurs, you must restart DR from the beginning, by creating a new, empty replica database and
reloading a compatible schema. Similarly, if you are not using command logging, you cannot recover the
replica database and must start DR from scratch.
Because individual partitions are replicating data independently, if possible you want to make sure all pending transfers are completed before turning off replication.
So, under the best circumstances, you should perform the following steps to stop replication:
1. Stop write transactions on the master database by putting it in admin mode using the voltadmin pause
command.
2. Wait for all pending DR log transfers to be completed.
3. Reset DR on the master cluster using the voltadmin dr reset command.
4. Depending on your goals, either shut down the replica or promote it to a fully-functional database as
described in Section 11.2.5.3, Promoting the Replica When the Master Becomes Unavailable.
Once the master is offline and the replica is promoted, the data is no longer being replicated. As soon as
normal business operations have been re-established, it is a good idea to also re-establish replication. This
can be done using any of the following options:
If the original master database hardware can be restarted, take a snapshot of the current database (that
is, the original replica), restore the snapshot on the original master and redirect client traffic back to the
original. Replication can then be restarted using the original configuration.
An alternative, if the original database hardware can be restarted but you do not want to (or need to) redirect the clients away from the current database, is to use the original master hardware to create a replica of the newly promoted cluster, essentially switching the roles of the master and replica databases, as described in Section 11.2.5.4, Reversing the Master/Replica Roles.
If the original master hardware cannot be recovered effectively, create a new database cluster in a third
location to use as a replica of the current database.
Important
XDCR is a separately licensed feature. If your current VoltDB license does not include a key for
XDCR you will not be able to complete the tasks described in this section. See your VoltDB sales
representative for more information on licensing XDCR.
To manage the DR process VoltDB needs to uniquely identify the clusters. You provide this unique identifier as a number between 0 and 127 when you configure the clusters. For example, if we assign ID=1
to a cluster in New York and ID=2 to another in Chicago, their respective deployment files must contain
the following <dr> elements:
New York Cluster
<dr id="1" />
Chicago Cluster
<dr id="2" />
You can then load the schema on both databases and perform any other preparatory work you require. Then edit the deployment files, filling in the source attribute for each cluster to point at the other, and use the voltadmin update command to update the deployment files on the running databases. As soon as the source attribute is defined and the schemas match, the DR process will begin.
Note
Although the source attribute can be modified on a running database, the unique cluster ID cannot
be changed after the database starts. So it is important to include the <dr> element with the unique
ID in the initial deployment file when starting the databases.
For example, say clusters A and B are processing transactions as shown in Figure 11.6, Transaction
Order and Conflict Resolution. Cluster A executes a transaction that modifies a specific record and this
transaction is included in the binary log A1. By the time cluster B receives the binary log and processes
A1, cluster B has already processed its own transactions B1 and B2. Those transactions may have modified
the same record as the transaction in A1, or another record that would conflict with the change in A1, such
as a matching unique index entry.
Under these conditions, cluster B cannot simply apply the changes in A1 because doing so could violate
the uniqueness constraints of the schema and, more importantly, is likely to result in the content of the
two database clusters diverging. Instead, cluster B must decide which change takes priority. That is, what
resolution to the conflict is most likely to produce meaningful results or match the intent of the business
application. This decision making process is called conflict resolution.
No matter what the resolution, it is important that the database administrators are notified of the conflict,
why it occurred, and what action was taken. The following sections explain:
How to avoid conflicts
How VoltDB resolves conflicts when they do occur
What types of conflicts can occur
How those conflicts are reported
For example, suppose the same user record is updated on both clusters at approximately the same time. The transaction in binary log A1 changes the user's name:
Before                           After
UserID:   12345                  UserID:   12345
Name:     Joe Smith              Name:     Joseph Smith
Password: abalone                Password: abalone
Meanwhile, the transaction in binary log B1 changes the same user's password:
Before                           After
UserID:   12345                  UserID:   12345
Name:     Joe Smith              Name:     Joe Smith
Password: abalone                Password: flounder
When the binary log A1 arrives at cluster B, the DR process performs the following steps:
1. Uses the primary key (12345) to look up the current record in the database.
2. Compares the current timestamp in the database with the previous timestamp in the binary log.
3. Because the transaction in B1 has already been applied on cluster B, the time stamps do not match. A
conflict is recognized.
4. A primary key exists, so cluster B attempts to resolve the conflict by comparing the new timestamp,
10:15.00.003, to the current timestamp, 10:15.00.001.
5. Because the new timestamp is the later of the two, the new transaction "wins" and the change is applied
to the database.
6. Finally, the conflict and resolution is logged. (See Section 11.3.5.4, Reporting Conflicts for more
information about how conflicts are reported.)
Note that when the UPDATE from A1 is applied, the change to the password in B1 is overwritten and the password is reset to "abalone", which at first looks like a problem. However, when the binary log B1 arrives at cluster A, the same steps are followed. But when cluster A reaches steps #4 and #5, it finds that the new timestamp from B1 is older than the current timestamp, and so the action is rejected and the record is left unchanged. As a result both databases end up with the same value for the record. Essentially, the password change is dropped.
If the transaction on cluster B had been to delete the user record rather than change the password, then
the outcome would be different, but still consistent. In that case, when binary log A1 reaches cluster B, it
would not be able to find the matching record in step #1. This is recognized as a DELETE action having
occurred. Since DELETE always wins, the incoming UPDATE is rejected. Similarly, when binary log B1
reaches cluster A, the previous timestamps do not match but, even though the incoming action in B1 has
an older timestamp than the UPDATE action in A1, B1 "wins" because it is a delete action and the record
is deleted from cluster A. Again, the result is consistent across the two databases.
The real problem with conflicts is when there is no primary key on the database table. Primary keys
uniquely identify a record. Without a primary key, there is no way for VoltDB to tell, even if there are one
or more unique indexes on the table, whether two records are the same record modified or two different
records with the same unique key values.
As a result, if there is a conflict between two transactions without a primary key, VoltDB has no way to
resolve the conflict and simply rejects the incoming action. Going back to our example, if the user table
had a unique index on the user ID rather than a primary key, and both cluster A and cluster B update the
user record at approximately the same time, when binary log A1 arrives at cluster B, it would look for the
record based on all columns in the record and fail to find a match.
However, when it attempts to insert the record, it will encounter a constraint violation on the unique index. Again, since there is no primary key, VoltDB cannot resolve the conflict and rejects the incoming action, leaving the record with the changed password. On cluster A, the same process occurs and the password change in B1 gets rejected, leaving cluster A with a changed name column and database B with a changed password column; the databases diverge.
Action    Possible Conflict       Resolution
INSERT    Constraint violation    Rejected
UPDATE    Missing row             Rejected
          Timestamp mismatch      Last transaction wins
          Constraint violation    Rejected
DELETE    Missing row
          Timestamp mismatch
Column                     Datatype
ROW_TYPE                   3-byte string
ACTION_TYPE                1-byte string
CONFLICT_TYPE              4-byte string
CONFLICTS_ON_PRIMARY_KEY   TINYINT
DECISION                   1-byte string
CLUSTER_ID                 TINYINT
TIMESTAMP                  BIGINT
DIVERGENCE                 1-byte string
TABLE_NAME                 String
TUPLE                      JSON-encoded string
Update operations are executed as two separate statements: a delete and an insert, where only one of the two statements might result in a violation. For example, the delete may trigger a missing row violation while the insert does not generate a violation, in which case the EXT row of the conflict log reports the MISS conflict and the NEW row reports NONE.
Note
VoltDB uses hashing rather than encryption when passing the username and password between
the client and the server. The Java and C++ clients use SHA-2 hashing while the older clients
currently use SHA-1. The passwords are also hashed within the database. For an encrypted solution, you can consider implementing Kerberos security, described in Section 12.7, Integrating
Kerberos Security with VoltDB.
There are three steps to enabling security for a VoltDB application:
1. Add the <security enabled="true"/> tag to the deployment file to turn on authentication and
authorization.
2. Define the users and roles you need to authenticate.
3. Define which roles have access to each stored procedure.
The following sections describe each step of this process, plus how to enable access to system procedures
and ad hoc queries.
12.5. Assigning Access by Function (System Procedures, SQL Queries, and Default Procedures)
It is not always convenient to assign permissions one at a time. You might want a special role for access to all user-defined stored procedures. Also, there are special capabilities available within VoltDB that are not called out individually in the schema and so cannot be assigned using the CREATE PROCEDURE statement.
For these special cases VoltDB provides named permissions that you can use to assign functions as a
group. For example, the ALLPROC permission grants a role access to all user-defined stored procedures
so the role does not need to be granted access to each procedure individually.
Several of the special function permissions have two versions: a full access permission and a read-only
permission. So, for example, DEFAULTPROC assigns access to all default procedures while DEFAULTPROCREAD allows access to only the read-only default procedures; that is, the TABLE.select procedures.
Similarly, the SQL permission allows the user to execute both read and write SQL queries interactively
while SQLREAD only allows read-only (SELECT) queries to be executed.
One additional functional permission is access to the read-only system procedures, such as @Statistics and
@SystemInformation. This permission is special in that it does not have a name and does not need to be
assigned; all authenticated users are automatically assigned read-only access to these system procedures.
Table 12.1, Named Security Permissions describes the named functional permissions.
Table 12.1. Named Security Permissions
Permission       Description                                               Inherits
DEFAULTPROCREAD  Access to read-only default procedures (TABLE.select)
DEFAULTPROC      Access to all default procedures                          DEFAULTPROCREAD
SQLREAD          Access to read-only ad hoc SQL queries                    DEFAULTPROCREAD
SQL              Access to all ad hoc SQL queries and default procedures   SQLREAD, DEFAULTPROC
ALLPROC          Access to all user-defined stored procedures
ADMIN            Full access to all system procedures, all user-defined    ALLPROC, DEFAULTPROC,
                 procedures, as well as default procedures, ad hoc SQL,    SQL
                 and DDL statements.
Note: For backwards compatibility, the special permissions ADHOC and SYSPROC are still recognized.
They are interpreted as synonyms for SQL and ADMIN, respectively.
In the CREATE ROLE statement you enable access to these functions by including the permission name
in the WITH clause. (The default, if security is enabled and the keyword is not specified, is that the role
is not allowed access to the corresponding function.)
Note that the permissions are additive. So if a user is assigned one role that allows access to SQLREAD
but not DEFAULTPROC, but that user is also assigned another role that allows DEFAULTPROC, the
user has both permissions.
The following example assigns full access to members of the ops role, access to interactive SQL queries
(and default procedures by inheritance) and all user-defined procedures to members of the developer role,
and no special access beyond read-only system procedures to members of the apps role.
CREATE ROLE ops WITH admin;
CREATE ROLE developer WITH sql, allproc;
CREATE ROLE apps;
doNotPrompt=true
principal="service/voltdb@MYCOMPANY.LAN" storeKey=true;
};
On the client nodes, the JAAS login configuration defines the VoltDBClient module.
import org.voltdb.client.Client;
import org.voltdb.client.ClientConfig;
import org.voltdb.client.ClientFactory;

ClientConfig config = new ClientConfig();
// specify the JAAS login module
config.enableKerberosAuthentication("VoltDBClient");
Client client = ClientFactory.createClient(config);
client.createConnection("voltsvr");
Note that the VoltDB client automatically picks up the Kerberos cached credentials of the current process, the user's Kerberos "principal". So you do not need to, and should not, specify a username or password as part of the VoltDB client configuration.
It is also important to note that once the cluster starts using Kerberos authentication, only Java clients can
connect to the cluster and they must also use Kerberos authentication, including the CLI command sqlcmd.
To authenticate to a VoltDB server with Kerberos security enabled using sqlcmd, you must include the
--kerberos flag identifying the name of the Kerberos client service module. For example:
$ sqlcmd --kerberos=VoltDBClient
Again, if the configuration files are not in the default location, you must specify their location on the
command line:
$ sqlcmd --kerberos=VoltDBClient -J-Djava.security.krb5.conf=/etc/krb5.conf
You cannot use clients in other programming languages or CLI commands other than sqlcmd to access
a cluster with Kerberos security enabled.
The frequency with which the transactions are written to the command log is configurable (as described in
Section 14.3, Configuring Command Logging for Optimal Performance). By adjusting the frequency and
type of logging (synchronous or asynchronous) you can balance the performance needs of your application
against the level of durability desired.
In reverse, when it is time to "replay" the logs, you start the database with the recover action (as described in Section 3.5, Restarting a VoltDB Database). Once the server nodes establish a quorum, they start by restoring the most recent snapshot. Once the snapshot is restored, they then replay all of the transactions in the log since that snapshot.
are written to the log. In other words, the results for all of the transactions since the last write are held on
the server until the next write occurs.
The advantage of synchronous logging is that no transaction is "complete" and reported back to the calling application until it is guaranteed to be logged; no transactions are lost. The obvious disadvantage of synchronous logging is that the interval between writes (i.e. the frequency), during which the results are held, adds to the latency of the transactions. To reduce the penalty of synchronous logging, you need to reduce the frequency.
When using synchronous logging, it is recommended that the frequency be limited to between 1 and 4 milliseconds to avoid adding undue latency to the transaction rate. A frequency of 1 or 2 milliseconds should have little or no measurable effect on overall latency. However, low frequencies can only be achieved effectively when using appropriate hardware (as discussed in the next section, Section 14.3.4, Hardware Considerations).
To select synchronous logging, use the synchronous attribute of the <commandlog> tag. For example:
<commandlog enabled="true" synchronous="true" >
<frequency time="2"/>
</commandlog>
Write command logs to a dedicated device. Do not write logs and snapshots to the same device.
Use low (1-2 millisecond) frequencies when performing synchronous logging.
Use moderate (100 millisecond or greater) frequencies when performing asynchronous logging.
Note that you do not need to modify the schema or the client application to turn exporting of live data on
and off. The application's stored procedures insert data into the streams; but it is the deployment file that
determines whether export actually occurs at runtime.
When a stored procedure uses an SQL INSERT statement to write data into an export stream, rather than storing that data in the database, it is handed off to the connector when the stored procedure successfully commits the transaction.1 Export streams have several important characteristics:
Streams let you limit the export to only the data that is required. For example, in the preceding example,
Stream B may contain a subset of columns from Table A. Whenever a new record is written to Table
A, the corresponding columns can be written to Stream B for export to the remote database.
Streams let you combine fields from several existing tables into a single exported row. This technique
is particularly useful if your VoltDB database and the target of the export have different schema. The
stream can act as a transformation of VoltDB data to a representation of the target schema.
Streams let you control when data is exported. Again, in the previous example, Stream D might be an
exact replica of Table C. However, the records in Table C are updated frequently. The client application
can choose to copy records from Table C to Stream D only when all of the updates are completed and
the data is finalized, significantly reducing the amount of data that must pass through the connector.
Of course, there are restrictions to export streams. Since they have no storage associated with them, they
are for INSERT only. Any attempt to SELECT, UPDATE, or DELETE data from streams will result in
an error.
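For example, a stored procedure that adds a customer might insert the same values into both the normal table and its export stream. A minimal sketch (the customer table, the export_customer stream, and their columns are assumed for illustration):
import org.voltdb.SQLStmt;
import org.voltdb.VoltProcedure;
import org.voltdb.VoltTable;

public class AddCustomer extends VoltProcedure {
    public final SQLStmt insertCustomer = new SQLStmt(
        "INSERT INTO customer VALUES (?, ?, ?);");
    // A stream is written with the same INSERT syntax as a table.
    public final SQLStmt exportCustomer = new SQLStmt(
        "INSERT INTO export_customer VALUES (?, ?, ?);");

    public VoltTable[] run(long customerId, String first, String last) {
        voltQueueSQL(insertCustomer, customerId, first, last);
        voltQueueSQL(exportCustomer, customerId, first, last);
        return voltExecuteSQL(true);
    }
}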
1
There is no guarantee on the latency of export between the connector and the export target. The export function is transactionally correct; no export occurs if the stored procedure rolls back, and the export data is in the appropriate transaction order. But the flow of export data from the connector to the target is not synchronous with the completion of the transaction. There may be several seconds delay before the export data reaches the target.
It is possible to export all of the data in a VoltDB database. You would do this by creating export stream
replicas of all tables in the schema and writing to the corresponding stream whenever you insert into the
normal table. However, this means the same number of transactions and volume of data that is being
processed by VoltDB will be exported through the connector. There is a strong likelihood, given a high
transaction volume, that the target database will not be able to keep up with the load VoltDB is handling.
As a consequence you will usually want to be more selective about what data is exported when.
If you have an existing target database, the question of what data to export is likely decided for you (that is,
you need to export the data matching the target's schema). If you are defining both your VoltDB database
and your target at the same time, you will need to think about what information is needed "downstream"
and create the appropriate export streams within VoltDB.
The second consideration is when to export the data. For tables that are not updated frequently, inserting
the data to a complementary export stream whenever data is inserted into the real table is the easiest and
most practical approach. For tables that are updated frequently (hundreds or thousands of times a second)
you should consider writing a copy of the data to an export stream at an appropriate milestone.
Using the flight reservation system as an example, one aspect of the workflow not addressed by the application described in Chapter 6, Designing VoltDB Client Applications is the need to archive information
about the flights after takeoff. Changes to reservations (additions and cancellations) are important in real
time. However, once the flight takes off, all that needs to be recorded (for billing purposes, say) is what
reservations were active at the time.
In other words, the archiving database needs information about the customers, the flights, and the final
reservations. According to the workload in Table 4.1, Example Application Workload, the customer
and flight tables change infrequently. So data can be inserted into the export streams at the same time as
the "live" flight and reservation tables. (It is a good idea to give the export stream a meaningful name so
its purpose is clear. In this example we identify the streams with the export_ prefix or, in the case of the
reservation stream which is not an exact copy, the _final suffix.)
The reservation table, on the other hand, is updated frequently. So rather than export all changes to a
reservation to the reservation stream in real-time, a separate stored procedure is invoked when a flight
takes off. This procedure copies the final reservation data to the export stream and deletes the associated flight and reservation records from the VoltDB database. Figure 15.2, Flight Schema with Export Streams shows the modified database schema with the added export streams, EXPORT_FLIGHT,
EXPORT_CUSTOMER, and RESERVATION_FINAL.
This design adds a transaction to the VoltDB application, which is executed approximately once a second
(when a flight takes off). However, it reduces the number of reservation transactions being exported from
1200 a second to less than 200 a second. These are the sorts of trade-offs you need to consider when adding export functionality to your application.
The third decision is where to export the data to. As described in Section 15.4, Configuring Export in the Deployment File, you can export the data through multiple different protocols: files, HTTP, JDBC, etc. Your choice of protocol will depend on the ultimate target destination for your exported data.
You can also export to multiple destinations at once. When you declare a stream, you can assign it to a
specific export target. If you want different streams to be exported to different destinations, you declare
the streams to belong to different targets. Then in the deployment file you can configure each target to be
exported to a different destination.
If a stream does not specify an export target, it is not exported. In the preceding example, export_customer,
export_flight, and reservation_final streams are identified as the streams that will be sent to the export
target called archive. Note that, even if an export target is specified in the CREATE STREAM statement,
inserting data into these streams will have no effect until export is enabled in the deployment file for the
archive target.
If you want to export to different locations, you can assign the streams to different targets, then export
each stream separately. For example, if you want to export the reservations to a log file but the customer
and flight records to an archival database, you can assign the streams to two different targets:
CREATE STREAM export_customer
EXPORT TO TARGET archive (
. . .
);
CREATE STREAM export_flight
EXPORT TO TARGET archive (
. . .
);
CREATE STREAM reservation_final
EXPORT TO TARGET log (
. . .
);
Note that no changes are required to the client application. The configuration of streams and export targets
is all done through the schema and deployment file.
You can also specify whether the streams are partitioned or not using the PARTITION ON COLUMN
clause in the CREATE STREAM statement. For example, if an export stream is a copy of a normal
data table, it can be partitioned on the same column. However, partitioning is not necessary for export
streams. Whether they are partitioned or "replicated", since no storage is associated with the stream, you
can INSERT into the stream in either a single-partitioned or multi-partitioned stored procedure. In either
case, the export connector ensures that at least one copy of the tuple is written to the export target.
<export>
<configuration enabled="true" type="file" target="log">
. . .
</configuration>
<configuration enabled="true" type="jdbc" target="archive">
. . .
</configuration>
</export>
You must also configure each export connector by specifying properties as one or more <property>
tags within the <configuration> tag. For example, the following XML code enables export to comma-separated (CSV) text files using the file prefix "MyExport".
<export>
<configuration enabled="true" stream="log" type="file">
<property name="type">csv</property>
<property name="nonce">MyExport</property>
</configuration>
</export>
The properties that are allowed and/or required depend on the export connector you select. VoltDB comes
with six export connectors:
Export to file (type="file")
Export to HTTP, including Hadoop (type="http")
Export to JDBC (type="jdbc")
Export to Kafka (type="kafka")
Export to RabbitMQ (type="rabbitmq")
Export to Elasticsearch (type="elasticsearch")
As the name implies, the file connector writes the exported data to local files, either as comma-separated
or tab-delimited files. Similarly, the JDBC connector writes data to a variety of possible destination databases through the JDBC protocol. The Kafka connector writes export data to an Apache Kafka distributed
message queue, where one or more other processes can read the data. In all three cases you configure the
specific features of the connector using the <property> tag as described in the following sections.
handle queueing export data pending its actual transmission to the target, including ensuring durability in
case of system failures. Again, this task is handled automatically by the VoltDB server process. But it is
useful to understand how the export queuing works and its consequences.
One consequence of this durability guarantee is that VoltDB will send at least one copy of every export
record to the target. However, it is possible when recovering command logs or rejoining nodes, that certain
export records are resent. It is up to the downstream target to handle these duplicate records. For example,
using unique indexes or including a unique record ID in the export stream.
type (csv, tsv): Specifies whether the output is written as comma-separated (CSV) or tab-delimited (TSV) files.
nonce* (string): A unique prefix for the output file names.
outdir (directory path): The directory where the files are created. If you do not specify an output path, VoltDB writes the output files to the current default directory.
period (Integer): The frequency, in minutes, for "rolling" the output file. The default frequency is 60 minutes.
binaryencoding (hex, base64): Specifies whether binary data is written in hexadecimal or BASE64 encoding.
dateformat (format string): The format of the date used when constructing the output file names. You specify the date format as a Java SimpleDateFormat string. The default format is "yyyyMMddHHmmss".
timezone (string): The time zone to use when formatting the timestamp. Specify the time zone as a Java timezone identifier. The default is GMT.
delimiters (string): Specifies the delimiter characters for CSV output. The text string specifies four characters: the field delimiter, the enclosing character, the escape character, and the record delimiter. To use special or non-printing characters (including the space character), encode the character as an HTML entity. For example, "&lt;" for the "less than" symbol.
batched (true, false)
skipinternals (true, false): Specifies whether to skip the six columns of VoltDB metadata in the output.
with-schema (true, false)
(Properties marked with an asterisk are required.)
Whatever properties you choose, the order and representation of the content within the output files is the
same. The export connector writes a separate line of data for every INSERT it receives, including the
following information:
Six columns of metadata generated by the export connector. This information includes a transaction ID,
a timestamp, a sequence number, the site and partition IDs, as well as an integer indicating the query
type.
The remaining columns are the columns of the database stream, in the same order as they are listed in
the database definition (DDL) file.
There are essentially two types of HTTP export: batch mode and one record at a time. Batch mode is
appropriate for exporting large volumes of data to targets such as Hadoop. Exporting one record at a time
is less efficient for large volumes but can be very useful for writing intermittent messages to other services.
In batch mode, the data is exported using a POST or PUT method, where multiple records are combined in either comma-separated value (CSV) or Avro format in the body of the request. When writing one record at a time, you can choose whether to submit the HTTP request as a POST, PUT or GET (that is, as a querystring attached to the URL). When exporting in batch mode, the method must be either POST or PUT and the type must be either csv or avro. When exporting one record at a time, you can use the GET, POST, or PUT method, but the output type must be form.
Finally, the endpoint property specifies the target URL where data is being sent, using either the http: or https: protocol. Again, the endpoint must be compatible with the possible settings for the other properties. In particular, if the endpoint is a WebHDFS URL, batch mode must be enabled.
The URL can also contain placeholders that are filled in at runtime with metadata associated with the
export data. Each placeholder consists of a percent sign (%) and a single ASCII character. The following
are the valid placeholders for the HTTP endpoint property:
%t: The name of the VoltDB export stream. The stream name is inserted into the endpoint in all uppercase.
%p: The VoltDB partition ID for the partition where the INSERT query to the export stream is executing. The partition ID is an integer value assigned by VoltDB internally and can be used to randomly partition data. For example, when exporting to WebHDFS, the partition ID can be used to direct data to different HDFS files or directories.
%g: The export generation. The generation is an identifier assigned by VoltDB. The generation increments each time the database starts or the database schema is modified in any way.
%d: The date and hour of the current export period. Applicable to WebHDFS export only. This placeholder identifies the start of each period and the replacement value remains the same until the period ends, at which point the date and hour is reset for the new period. You can use this placeholder to "roll over" WebHDFS export destination files on a regular basis, as defined by the period property. The period property defaults to one hour.
When exporting in batch mode, the endpoint must contain at least one instance each of the %t, %p, and
%g placeholders. However, beyond that requirement, it can contain as many placeholders as desired and
in any order. When not in batch mode, use of the placeholders is optional.
Table 15.2, HTTP Export Properties describes the supported properties for the HTTP connector.
endpoint* (string)
    Specifies the target URL. The endpoint can contain placeholders for inserting the stream name (%t), the partition ID (%p), the date and hour (%d), and the export generation (%g).
avro.compress (true, false)
    Specifies whether the Avro output is compressed. The default is false. This property is ignored if the type is not Avro.
avro.schema.location (string)
    Specifies the location where the Avro schema will be written. The schema location can be either an absolute path name on the local database server or a WebHDFS URL and must include at least one instance of the placeholder for the stream name (%t). Optionally it can contain other instances of both %t and %g. The default location for the Avro schema is the file path export/avro/%t_avro_schema.json on the database server under the voltdbroot directory. This property is ignored if the type is not Avro.
batch.mode (true, false)
    Specifies whether to send multiple rows as a single request or send each export row separately. The default is true. Batch mode must be enabled for WebHDFS export.
httpfs.enable (true, false)
    Specifies that the target of the export is an Apache HttpFS (Hadoop HDFS over HTTP) server. The default is false.
kerberos.enable (true, false)
    Specifies whether Kerberos authentication is used when connecting to the endpoint. The default is false.
method (get, post, put)
    Specifies the HTTP method for transmitting the export data. The default method is POST. For WebHDFS export, this property is ignored.
period (integer)
    The frequency, in hours, for "rolling" the WebHDFS output file. The default is one hour. Applicable to WebHDFS export only.
timezone (string)
    The time zone to use when formatting the timestamp. Specify the time zone as a Java timezone identifier. The default is the local time zone.
type (csv, avro, form)
    Specifies the format of the export data: csv or avro in batch mode, or form when exporting one record at a time. The default is csv.
* Required
For example, the following configuration writes the export data as comma-separated values to WebHDFS, rolling to new output files every two hours:
<export>
<configuration target="hadoop" enabled="true" type="http">
<property name="endpoint">
http://myhadoopsvr/webhdfs/v1/%t/data%p-%g.%d.csv
</property>
<property name="batch.mode">true</property>
<property name="period">2</property>
</configuration>
</export>
Note that the HTTP connector will create any directories or files in the WebHDFS endpoint path that do
not currently exist and then append the data to those files, using the POST or PUT method as appropriate
for the WebHDFS REST API.
You also have a choice between two formats for the export data when using WebHDFS: comma-separated
values (CSV) and Apache Avro format. By default, data is written as CSV data with each record on
a separate line and batches of records attached as the contents of the HTTP request. However, you can
choose to set the output format to Avro by setting the type property, as in the following example:
<export>
<configuration target="hadoop" enabled="true" type="http">
<property name="endpoint">
http://myhadoopsvr/webhdfs/v1/%t/data%p-%g.%d.avro
</property>
<property name="type">avro</property>
<property name="avro.compress">true</property>
<property name="avro.schema.location">
http://myhadoopsvr/webhdfs/v1/%t/schema.json
</property>
</configuration>
</export>
Avro is a data serialization system that includes a binary format that is used natively by Hadoop utilities
such as Pig and Hive. Because it is a binary format, Avro data takes up less network bandwidth than text-based formats such as CSV. In addition, you can choose to compress the data even further by setting the
avro.compress property to true, as in the previous example.
When you select Avro as the output format, VoltDB writes out an accompanying schema definition as a JSON document. For compatibility purposes, the stream name and column names are converted, removing underscores and changing the resulting words to lowercase with initial capital letters (sometimes called "camelcase"). The stream name is given an initial capital letter, while column names start with a lowercase letter. For example, the stream EMPLOYEE_DATA and its column named EMPLOYEE_ID would be converted to EmployeeData and employeeId in the Avro schema.
By default, the Avro schema is written to a local file on the VoltDB database server. However, you can
specify an alternate location, including a webHDFS URL. So, for example, you can store the schema in
the same HDFS repository as the data by setting the avro.schema.location property, as shown in
the preceding example.
See the Apache Avro web site for more details on the Avro format.
If any errors do occur when the JDBC connector attempts to submit data to the remote database, VoltDB
disconnects and then retries the connection. This process is repeated until the connection succeeds. If
the connection does not succeed, VoltDB eventually reduces the retry rate to approximately every eight
seconds.
Table 15.3, JDBC Export Properties describes the supported properties for the JDBC connector.
jdbcurl* (string)
    The JDBC connection string, also known as the URL.
jdbcuser* (string)
    The username for accessing the target database.
jdbcpassword (string)
    The password for accessing the target database.
jdbcdriver (string)
    The class name of the JDBC driver. The JDBC driver class must be accessible to the VoltDB process for the JDBC export process to work. Place the driver JAR files in the lib/extension/ directory where VoltDB is installed to ensure they are accessible at runtime.
    You do not need to specify the driver as a property value for several popular databases, including MySQL, Netezza, Oracle, PostgreSQL, and Vertica. However, you still must provide the driver JAR file.
schema (string)
    The schema name for the target database. The use of the schema name is database specific. In some cases you must specify the database name as the schema. In other cases, the schema name is not needed and the connection string contains all the information necessary. See the documentation for the JDBC driver you are using for more information.
minpoolsize (integer)
    The minimum number of connections in the pool of connections to the target database.
maxpoolsize (integer)
    The maximum number of connections in the pool.
maxidletime (integer)
    The number of milliseconds a connection can be idle before it is removed from the pool. The default value is 60000 (one minute).
maxstatementcached (integer)
    The maximum number of statements cached by the connection pool.
ignoregenerations (true, false)
    Specifies whether a unique ID for the generation is included as part of the output table name(s). The default is false.
skipinternals (true, false)
    Specifies whether to include six columns of VoltDB metadata (such as transaction ID and timestamp) in the output. If true, the output contains only the exported stream data. The default is false.
* Required
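As an illustration, a JDBC export configuration might look like the following. (This is a sketch; the target name, connection URL, and credentials are hypothetical.)
<export>
<configuration target="warehouse" enabled="true" type="jdbc">
<property name="jdbcurl">jdbc:postgresql://remotesvr/corpdata</property>
<property name="jdbcuser">exportuser</property>
<property name="jdbcpassword">mypassword</property>
</configuration>
</export>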
Apache Kafka is a distributed messaging service that lets you set up message queues which are written to and read from by "producers" and "consumers", respectively. In the Apache Kafka model, VoltDB export acts as a "producer".
Before using the Kafka connector, we strongly recommend reading the Kafka documentation and becoming familiar with the software, since you will need to set up a Kafka 0.8.2 service and appropriate "consumer" clients to make use of VoltDB's Kafka export functionality. The instructions in this section assume
a working knowledge of Kafka and the Kafka operational model.
When the Kafka connector receives data from the VoltDB export streams, it establishes a connection to the
Kafka messaging service as a Kafka producer. It then writes records to Kafka topics based on the VoltDB
stream name and certain export connector properties.
The majority of the Kafka export properties are identical in both name and content to the Kafka producer properties listed in the Kafka documentation. All but one of these properties are optional for the
Kafka connector and will use the standard Kafka default value. For example, if you do not specify the
queue.buffering.max.ms property it defaults to 5000 milliseconds.
The only required property is bootstrap.servers, which lists the Kafka servers that the VoltDB
export connector should connect to. You must include this property so VoltDB knows where to send the
export data. Specify each server by its IP address (or hostname) and port; for example, myserver:7777. If
there are multiple servers in the list, separate them with commas.
In addition to the standard Kafka producer properties, there are several custom properties specific to VoltDB. The properties binaryencoding, skipinternals, and timezone affect the format of the
data. The topic.prefix and topic.key properties affect how the data is written to Kafka.
The topic.prefix property specifies the text that precedes the stream name when constructing the
Kafka topic. If you do not specify a prefix, it defaults to "voltdbexport". Alternately, you can map individual streams to topics using the topic.key property. In the topic.key property you associate a
VoltDB export stream name with the corresponding Kafka topic as a named pair separated by a period (.).
Multiple named pairs are separated by commas (,). For example:
Employee.EmpTopic,Company.CoTopic,Enterprise.EntTopic
Any stream-specific mappings in the topic.key property override the automated topic name specified
by topic.prefix.
Note that unless you configure the Kafka brokers with the auto.create.topics.enable property
set to true, you must create the topics for every export stream manually before starting the export process.
Enabling auto-creation of topics when setting up the Kafka brokers is recommended.
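Putting these properties together, a Kafka export configuration might look like the following. (This is an illustrative sketch; the target name, server addresses, and topic mappings are hypothetical.)
<export>
<configuration target="eventlog" enabled="true" type="kafka">
<property name="bootstrap.servers">kafkasvr1:9092,kafkasvr2:9092</property>
<property name="topic.prefix">voltdbexport</property>
<property name="topic.key">Employee.EmpTopic,Company.CoTopic</property>
<property name="acks">1</property>
</configuration>
</export>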
When configuring the Kafka export connector, it is important to understand the relationship between synchronous versus asynchronous processing and its effect on database latency. If the export data is sent
asynchronously, the impact of export on the database is reduced, since the export connector does not wait
for the Kafka infrastructure to respond. However, with asynchronous processing, VoltDB is not able to
resend the data if the message fails after it is sent.
If export to Kafka is done synchronously, the export connector waits for acknowledgement of each message
sent to Kafka before processing the next packet. This allows the connector to resend any packets that fail.
The drawback to synchronous processing is that on a heavily loaded database, the latency it introduces
means export may not be able to keep up with the influx of export data and have to write to overflow.
You specify the level of synchronicity and durability of the connection using the Kafka acks property.
Set acks to "0" for asynchronous processing, "1" for synchronous delivery to the Kafka broker, or "all" to
ensure durability on the Kafka broker. Use of "all" is not recommended for VoltDB export. See the Kafka
documentation for more information.
VoltDB guarantees that at least one copy of all export data is sent by the export connector. But when
operating in asynchronous mode, the Kafka connector cannot guarantee that the packet is actually received
and accepted by the Kafka broker. By operating in synchronous mode, VoltDB can catch errors returned
by the Kafka broker and resend any failed packets. However, you pay the penalty of additional latency
and possible export overflow.
Finally, the actual export data is sent to Kafka as a comma-separated values (CSV) formatted string. The
message includes six columns of metadata (such as the transaction ID and timestamp) followed by the
column values of the export stream.
Table 15.4, Kafka Export Properties lists the supported properties for the Kafka connector, including
the standard Kafka producer properties and the VoltDB unique properties.
bootstrap.servers* (string)
    A comma-separated list of Kafka brokers to connect to, each specified by its IP address (or hostname) and port.
acks (0, 1, all)
    Specifies whether export is asynchronous (0) or synchronous (1 or all). Use of "all" is not recommended for VoltDB export.
acks.retry.timeout (integer)
    Specifies how long, in milliseconds, the connector waits for acknowledgement from Kafka when acks is set to 1 or all.
partition.key ({stream}.{column}[,...])
    Specifies which stream column to use as the Kafka partitioning key for each stream. Specify the stream and column as a named pair separated by a period; separate multiple pairs with commas.
binaryencoding (hex, base64)
    Specifies whether binary data is encoded in hexadecimal or BASE64 format. The default is hexadecimal.
skipinternals (true, false)
    Specifies whether to include six columns of VoltDB metadata (such as transaction ID and timestamp) in the output. If true, the output contains only the exported stream data. The default is false.
timezone (string)
    The time zone to use when formatting the timestamp. Specify the time zone as a Java timezone identifier. The default is GMT.
topic.key (string)
    A set of named pairs, each associating a VoltDB export stream name with the corresponding Kafka topic, separated by a period. Multiple named pairs are separated by commas. Mappings in topic.key override the automated topic name specified by topic.prefix.
topic.prefix (string)
    The text that precedes the stream name when constructing the Kafka topic. The default prefix is "voltdbexport".
various
    The connector also accepts standard Kafka producer properties, which are passed through to the Kafka producer for the export connection.
* Required
You can specify a stream column whose value is appended to the stream name as a suffix for the routing key. Alternately, you can specify a different column for each stream by declaring the routing.key.suffix property as a list of stream and column name pairs, separating the stream from the column name with a period and separating the pairs with commas. For example:
<export>
<configuration target="queue" enabled="true" type="rabbitmq">
<property name="broker.host">rabbitmq.mycompany.com</property>
<property name="routing.key.suffix">
voter_export.state,contestants_export.contestant_number
</property>
</configuration>
</export>
The important point to remember is that it is your responsibility to configure a RabbitMQ exchange that
matches the name associated with the exchange.name property (or take the default exchange) and create queues and/or filters to match the routing keys generated by VoltDB. At a minimum, the exchange
must be able to handle routing keys starting with the export stream names. This can be achieved by using a filter for each export stream. For example, using the flight example in Section 15.2, Planning
your Export Strategy, you can create filters for EXPORT_FLIGHT.*, EXPORT_CUSTOMER.*, and
RESERVATION_FINAL.*.
Table 15.5, RabbitMQ Export Properties lists the supported properties for the RabbitMQ connector.
broker.host* (string)
    The host name of the RabbitMQ exchange server. Either broker.host or amqp.uri must be specified.
broker.port (integer)
    The port number of the RabbitMQ server. The default port number is 5672.
amqp.uri* (string)
    An alternate method for specifying the location of the RabbitMQ exchange server. Use of amqp.uri allows you to specify additional RabbitMQ options as part of the connection URI. Either broker.host or amqp.uri must be specified.
virtual.host (string)
    Specifies the namespace of the RabbitMQ exchange.
username (string)
    The username for authenticating to the RabbitMQ host.
password (string)
    The password for authenticating to the RabbitMQ host.
exchange.name (string)
    The name of the RabbitMQ exchange to use. If you do not specify a value, the default exchange is used.
routing.key.suffix ({stream}.{column}[,...])
    Specifies the stream column whose value is appended to the stream name as the suffix of the routing key. Specify the stream and column as a named pair separated by a period; separate multiple pairs with commas.
queue.durable (true, false)
    Whether the RabbitMQ queue is durable. That is, data in the queue will be retained and restored if the RabbitMQ server restarts. If you specify the queue as durable, the messages themselves will also be marked as durable to enable their persistence across server failure. The default is true.
binaryencoding (hex, base64)
    Specifies whether binary data is encoded in hexadecimal or BASE64 format. The default is hexadecimal.
skipinternals (true, false)
    Specifies whether to include six columns of VoltDB metadata (such as transaction ID and timestamp) in the output. If true, the output contains only the exported stream data. The default is false.
timezone (string)
    The time zone to use when formatting the timestamp. Specify the time zone as a Java timezone identifier. The default is GMT.
* Required (you must specify either broker.host or amqp.uri)
endpoint* (string)
    Specifies the target URL where export data is sent.
batch.mode (true, false)
    Specifies whether to send multiple rows as a single request or send each export row separately. The default is true.
timezone (string)
    The time zone to use when formatting timestamps. Specify the time zone as a Java timezone identifier. The default is the local time zone.
* Required
For example, to load records from a CSV file named staff.csv into the table EMPLOYEES,
the command might be the following:
$ csvloader employees --file=staff.csv
If instead you are copying the data from a JDBC-compliant database, the command might look like this:
$ jdbcloader employees \
--jdbcurl=jdbc:postgresql://remotesvr/corphr \
--jdbctable=employees \
--jdbcdriver=org.postgresql.Driver
Each utility has arguments unique to the data source (such as --jdbcurl) that allow you to properly
configure and connect to the source. See the description of each utility in Appendix D, VoltDB CLI Commands for details.
Note
For the initial release of built-in importers, Kafka is the only supported import type.
VoltDB currently provides support for only one type of import: kafka. VoltDB also provides support for
two import formats: comma-separated values (csv) and tab-separated values (tsv). Comma-separated
values are the default format. So if you are using CSV-formatted input, you can leave out the format
attribute, as in the following examples.
When the database starts, the import infrastructure starts any enabled configurations. If you are importing
multiple streams to separate tables through separate procedures, you must include multiple configurations,
even if they come from the same source. For example, the following configuration imports data from two
Kafka topics from the same Kafka servers into separate VoltDB tables.
<import>
<configuration type="kafka" enabled="true">
<property name="brokers">kafkasvr:9092</property>
<property name="topics">employees</property>
<property name="procedure">EMPLOYEE.insert</property>
</configuration>
<configuration type="kafka" enabled="true">
<property name="brokers">kafkasvr:9092</property>
<property name="topics">managers</property>
<property name="procedure">MANAGER.insert</property>
</configuration>
</import>
The following section describes the Kafka importer in more detail.
procedure (string)
    The stored procedure to invoke for each record that is imported.
topics* (string)
    The Kafka topic or topics from which to import data. Separate multiple topics with commas.
fetch.message.max.bytes (integer)
    The maximum size, in bytes, of the messages fetched from the Kafka brokers.
groupid (string)
    The name of the Kafka consumer group for the importer.
brokers* (string)
    A comma-separated list of Kafka brokers to connect to, each specified by its IP address (or hostname) and port.
* Required
VoltDB supports the following standard SQL DDL statements:
ALTER TABLE
CREATE INDEX
CREATE TABLE
CREATE VIEW
The supported VoltDB-specific extensions for declaring stored procedures, streams, and partitioning are:
CREATE PROCEDURE AS
CREATE PROCEDURE FROM CLASS
CREATE ROLE
CREATE STREAM
DR TABLE
DROP INDEX
DROP PROCEDURE
DROP ROLE
DROP STREAM
DROP TABLE
DROP VIEW
IMPORT CLASS
PARTITION PROCEDURE
PARTITION TABLE
SET DR
ALTER TABLE
ALTER TABLE Modifies an existing table definition.
Syntax
ALTER TABLE table-name DROP CONSTRAINT constraint-name
ALTER TABLE table-name DROP [COLUMN] column-name [CASCADE]
ALTER TABLE table-name DROP {PRIMARY KEY | LIMIT PARTITION ROWS}
ALTER TABLE table-name ADD {constraint-definition | column-definition [BEFORE column-name] }
ALTER TABLE table-name ALTER column-definition [CASCADE]
ALTER TABLE table-name ALTER [COLUMN] column-name SET {DEFAULT value | [NOT] NULL}
column-definition: [COLUMN] column-name datatype [DEFAULT value ] [ NOT NULL ] [index-type]
constraint-definition: [CONSTRAINT constraint-name] { index-definition | limit-definition }
index-definition: {index-type} (column-name [,...])
limit-definition: LIMIT PARTITION ROWS row-count
index-type: PRIMARY KEY | UNIQUE | ASSUMEUNIQUE
Description
The ALTER TABLE statement modifies an existing table definition by adding, removing, or modifying a column or
constraint. There are several different forms of the ALTER TABLE statement, depending on what attribute
you are altering (a column or a constraint) and how you are changing it. The key point to remember is
that you only alter one item at a time. To change two columns or a column and a constraint, you need to
issue two ALTER TABLE statements.
There are three ALTER TABLE operations:
ALTER TABLE ADD
ALTER TABLE DROP
ALTER TABLE ALTER
The syntax of each statement depends on whether you are modifying a column or a constraint. You can
ADD or DROP either a column or an index. However, you can ALTER columns only. To alter an existing
constraint you must first drop the constraint and then ADD the new definition.
There are two forms of the ALTER TABLE DROP statement. You can drop a column or constraint by
name or you can drop a PRIMARY KEY or LIMIT PARTITION ROWS constraint by identifying the
type of constraint, since there is only one such constraint for any given table.
The ALTER TABLE ADD statement uses the same syntax to define a new column or constraint as the CREATE TABLE command. When adding columns you can also specify the BEFORE clause to specify where the new column falls in the order of table columns. If you do not specify BEFORE, the column is added at the end of the list of columns.
The ALTER TABLE ALTER COLUMN statement also has two forms. You can alter the column by
providing a complete replacement definition, similar to the ALTER TABLE ADD COLUMN statement,
or you can alter a specific attribute using the ALTER TABLE ALTER COLUMN... SET syntax. Use
SET DEFAULT to add or modify an existing default. Use SET DEFAULT NULL to remove an existing
default. You can also use the SET clause to specify whether the column can be null (SET NULL) or must
not contain a null value (SET NOT NULL).
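For example, the two SET forms might look like the following. (The Employee table appears in the examples below; the Status column is illustrative.)
ALTER TABLE Employee ALTER COLUMN Status SET DEFAULT 'active';
ALTER TABLE Employee ALTER COLUMN Status SET NOT NULL;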
Handling Dependencies
You can only alter tables if there are no dependencies on the table, column, or index that would be violated
by the change. For example, you cannot drop the partitioning column from a partitioned table if there
are stored procedures partitioned on that table and column as well. You must first drop the partitioned
stored procedures before dropping the column. Note that by dropping the partitioning column, you are also
automatically changing the table into a replicated table.
The most common dependency is if the table already has data in it. You can add, delete, and (within
reasonable bounds) modify the columns of a table with existing data as long as those columns are not
named in an index, view, or PARTITION statement. If a column is referenced in a view or index, you can
specify CASCADE when you drop the column to automatically drop the referring indexes and views.
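For example, the following statement drops a column along with any indexes or views that reference it. (The Address column is hypothetical.)
ALTER TABLE Employee DROP COLUMN Address CASCADE;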
When a table has records in it, data associated with dropped columns is deleted. Added columns are interpreted as null or filled in with the specified default value. (You cannot add a column that is defined as
NOT NULL, but without a default, if the table has existing data in it.) You can even change the datatype
of the column within reason. In other words, you can increase the size of the datatype (for example, from
INTEGER to BIGINT) but you cannot decrease the size (say, from INTEGER to TINYINT) since some
of the existing data may already violate the size constraint.
You can also add non-unique indexes to tables with existing data. However, you cannot add unique constraints (such as PRIMARY KEY) if data exists.
If a table has no records in it, you can make almost any changes you like to it assuming, again, there are
no dependencies. You can add and remove unique constraints, add, remove, and modify columns, even
change column datatypes at will.
However, if there are dependencies, such as stored procedure queries that reference a dropped or modified column, you may not be allowed to make the change. If there are such dependencies, it is often easier to drop the stored procedures before making the changes and then recreate the stored procedures afterwards.
Examples
The following example uses ALTER TABLE to drop a unique constraint, add a new column, and then
recreate the constraint adding the new column.
ALTER TABLE Employee DROP CONSTRAINT UniqueNames;
ALTER TABLE Employee ADD COLUMN MiddleInitial VARCHAR(1);
ALTER TABLE Employee ADD CONSTRAINT UniqueNames
UNIQUE (FirstName, MiddleInitial, LastName);
CREATE INDEX
CREATE INDEX Creates an index for faster access to a table.
Syntax
CREATE [UNIQUE|ASSUMEUNIQUE] INDEX index-name
ON table-name ( index-column [,...])
[WHERE [NOT] boolean-expression [ {AND | OR} [NOT] boolean-expression]...]
Description
Creating an index on a table makes read access to the table faster when using the columns of the index as
a key. Note that VoltDB creates an index automatically when you specify a constraint, such as a primary
key, in the CREATE TABLE statement.
When you specify that the index is UNIQUE, VoltDB constrains the table to at most one row for each set
of index column values. If an INSERT or UPDATE statement attempts to create a row where all the index
column values match an existing indexed row, the statement fails.
Because the uniqueness constraint is enforced separately within each partition, only indexes on replicated
tables or containing the partitioning column of partitioned tables can ensure global uniqueness for partitioned tables and therefore support the UNIQUE keyword.
If you wish to create an index on a partitioned table that acts like a unique index but does not include the
partitioning column, use the keyword ASSUMEUNIQUE instead of UNIQUE. Assumed unique indexes
are treated like unique indexes (VoltDB verifies they are unique within the current partition). However,
it is your responsibility to ensure these indexes are actually globally unique. Otherwise, it is possible an
index will generate a constraint violation during an operation that modifies the partitioning of the database
(such as adding nodes on the fly or restoring a snapshot to a different cluster configuration).
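For example, an assumed unique index is declared just like a unique index, using the alternate keyword. (The table and column names here are illustrative.)
CREATE ASSUMEUNIQUE INDEX Account_Email_Idx ON Accounts ( email );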
The indexed items (index-column) are either columns of the specified table or expressions, including functions, based on the table. For example, the following statements index a table based on the calculated area
and its distance from a set location:
CREATE INDEX areaofplot ON plot (width * height);
CREATE INDEX distancefrom49 ON plot ( ABS(latitude - 49) );
You can create a partial index by including a WHERE clause in the index definition. The WHERE clause
limits the number of rows that get indexed. This is useful if certain columns in the index are not evenly
distributed. For example, if you are not interested in records where a column is null, you can use a WHERE
clause to exclude those records and optimize the size and performance of the index.
The partial index is utilized by the database when a query's WHERE clause contains the same condition
as the partial index definition. A special case is if the index condition is {column} IS NOT NULL. In
this situation, the index may be applied even if the query does not contain that exact condition, as long as
the query contains a WHERE condition that implies the column is not null, such as {column} > 0.
By default, VoltDB creates a tree index. Tree indexes provide the best general performance for a wide
range of operations, including exact value matches and queries involving a range of values, such as
SELECT ... WHERE Score > 1 AND Score < 10.
If an index is used exclusively for exact matches (such as SELECT ... WHERE MyHashColumn
= 123), it is possible to create a hash index instead. To create a hash index, include the string "hash"
as part of the index name.
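For example, the following statement creates a hash index on the FLIGHT table simply because the index name contains the string "hash". (The column choice is illustrative.)
CREATE INDEX Flight_Hash_Idx ON FLIGHT ( flightID );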
Examples
The following example creates two indexes on a single table. The first is, by default, a non-unique index based on the departure time. The second is a unique index based on the columns for the airline and flight number.
CREATE INDEX flightTimeIdx ON FLIGHT ( departtime );
CREATE UNIQUE INDEX FlightKeyIdx ON FLIGHT ( airline, flightID );
You can also use functions in the index definition. For example, the following is an index based on the
element movie within a JSON-encoded VARCHAR column named favorites and the member's ID.
CREATE INDEX FavoriteMovie ON MEMBER (
FIELD( favorites, 'movie' ), memberID
);
The following example demonstrates the use of a partial index, by including a WHERE clause, to exclude
records with a null column.
CREATE INDEX completed_tasks
ON tasks (task_id, startdate, enddate)
WHERE enddate IS NOT NULL;
CREATE PROCEDURE AS
CREATE PROCEDURE AS Defines a stored procedure composed of a SQL query.
Syntax
CREATE PROCEDURE procedure-name
[PARTITION ON TABLE table-name COLUMN column-name [PARAMETER position]]
[ALLOW role-name [,...]]
AS sql-statement
CREATE PROCEDURE procedure-name
[PARTITION ON TABLE table-name COLUMN column-name [PARAMETER position]]
[ALLOW role-name [,...]]
AS ### source-code ### LANGUAGE GROOVY
Description
You must declare stored procedures as part of the schema to make them accessible at runtime. Use CREATE PROCEDURE AS when declaring stored procedures directly within the DDL statement. There are
two forms of the CREATE PROCEDURE AS statement:
The SQL query form supports a single SQL query statement in the AS clause. The SQL statement
can contain question marks (?) as placeholders that are filled in at runtime with the arguments to the
procedure call.
The embedded program code form supports the inclusion of program code in the AS clause. The embedded program code is opened and closed by three pound signs (###) and followed by the LANGUAGE
clause specifying the programming language in use. VoltDB currently supports Groovy as an embedded
language. (Supported in compiled application catalogs only. See the appendix on Using Application
Catalogs in the VoltDB Administrator's Guide for details.)
In both cases, the procedure name must follow the naming conventions for Java class names. For example,
the name is case-sensitive and cannot contain any white space.
When creating single-partitioned procedures, you can either specify the partitioning in a separate
PARTITION PROCEDURE statement or you can include the PARTITION ON clause in the CREATE
PROCEDURE statement. Creating and partitioning stored procedures in a single statement is recommended because there are certain cases where procedures with complex queries must be partitioned and cannot
be compiled without the partitioning information. For example, queries that join two partitioned tables
must be run in a single-partitioned procedure and must join the tables on their partitioning columns.
Partitioning a stored procedure means that the procedure executes within a unique partition of the database.
The partition in which the procedure executes is chosen at runtime based on the table and column specified
by table-name and column-name. By default, VoltDB uses the first parameter to the stored procedure as
the partitioning value. However, you can use the PARAMETER clause to specify a different parameter.
The position value specifies the parameter position, counting from zero. (In other words, position 0 is the
first parameter, position 1 is the second, and so on.)
The specified table must be a partitioned table and cannot be an export-only or replicated table.
If security is enabled at runtime, only those roles named in the ALLOW clause (or with the ALLPROC or
ADMIN permissions) have permission to invoke the procedure. If security is not enabled at runtime, the
ALLOW clause is ignored and all users have access to the stored procedure.
Examples
The following example defines a stored procedure, CountUsersByCountry, as a single SQL query with a
placeholder for matching the country column:
CREATE PROCEDURE CountUsersByCountry AS
SELECT COUNT(*) FROM Users WHERE country=?;
The next example restricts access to the stored procedure to only users with the operator role. It also
partitions the stored procedure on the userID column of the Accounts table. Note that the PARAMETER
clause is used since the userID is the second parameter to the procedure:
CREATE PROCEDURE ChangeUserPassword
PARTITION ON TABLE Accounts COLUMN userID PARAMETER 1
ALLOW operator
AS UPDATE Accounts SET HashedPassword=? WHERE userID=?;
CREATE PROCEDURE FROM CLASS
CREATE PROCEDURE FROM CLASS Defines a stored procedure associated with a Java class.
Syntax
CREATE PROCEDURE
[PARTITION ON TABLE table-name COLUMN column-name [PARAMETER position]]
[ALLOW role-name [,...]]
FROM CLASS class-name
Description
You must declare stored procedures to make them accessible to client applications and the sqlcmd utility. CREATE PROCEDURE FROM CLASS lets you declare stored procedures that are written as Java classes. The class-name is the name of the Java class.
Before you declare the stored procedure, you must create, compile, and load the associated Java class. It
is usually easiest to do this by compiling all of your Java stored procedures and packaging the resulting
class files into a single JAR file that can be loaded once. For example:
$ javac -d ./obj src/procedures/*.java
$ jar cvf myprocs.jar -C obj .
$ sqlcmd
1> load classes myprocs.jar;
2> CREATE PROCEDURE FROM CLASS procedures.AddCustomer;
When creating single-partitioned procedures, you can either specify the partitioning in a separate
PARTITION PROCEDURE statement or you can include the PARTITION ON clause in the CREATE
PROCEDURE statement. Creating and partitioning stored procedures in a single statement is recommended because there are certain cases where procedures with complex queries must be partitioned and cannot
be compiled without the partitioning information. For example, queries that join two partitioned tables
must be run in a single-partitioned procedure and must join the tables on their partitioning columns.
Partitioning a stored procedure means that the procedure executes within a unique partition of the database.
The partition in which the procedure executes is chosen at runtime based on the table and column specified
by table-name and column-name. By default, VoltDB uses the first parameter to the stored procedure as
the partitioning value. However, you can use the PARAMETER clause to specify a different parameter.
The position value specifies the parameter position, counting from zero. (In other words, position 0 is the
first parameter, position 1 is the second, and so on.)
The specified table must be a partitioned table and cannot be an export-only or replicated table.
If security is enabled at runtime, only those roles named in the ALLOW clause (or with the ALLPROC or
ADMIN permissions) have permission to invoke the procedure. If security is not enabled at runtime, the
ALLOW clause is ignored and all users have access to the stored procedure.
Example
The following example declares a stored procedure matching the Java class MakeReservation. Note that
the class name includes its location within the current class path (in this case, as a child of flight and
procedures). However, the name itself, MakeReservation, must be unique within the schema because at
runtime stored procedures are invoked by name only.
CREATE PROCEDURE FROM CLASS flight.procedures.MakeReservation;
CREATE ROLE
CREATE ROLE Defines a role and the permissions associated with that role.
Syntax
CREATE ROLE role-name [WITH permission [,...]]
Description
The CREATE ROLE statement defines a named role that can be used to assign access rights to specific
procedures and functions. When security is enabled in the deployment file, the permissions assigned in the
CREATE ROLE and CREATE PROCEDURE statements specify which users can access which functions.
Use the CREATE PROCEDURE statement to assign permissions to named roles for accessing specific stored procedures. The CREATE ROLE statement lets you assign certain generic permissions. The following describes the permissions that can be assigned using the WITH clause.
DEFAULTPROC
    Access to all default procedures.
SQLREAD
    Access to read-only ad hoc SQL queries (SELECT statements) and default procedures.
SQL
    Access to all ad hoc SQL queries and default procedures. Inherits: SQLREAD, DEFAULTPROC.
ALLPROC
    Access to all user-defined stored procedures.
ADMIN
    Full access to all system procedures, all user-defined procedures, as well as default procedures, ad hoc SQL, and DDL statements. Inherits: ALLPROC, DEFAULTPROC, SQL.
Note: For backwards compatibility, the special permissions ADHOC and SYSPROC are still recognized.
They are interpreted as synonyms for SQL and ADMIN, respectively.
The generic permissions are denied by default. So you must explicitly enable them for those roles that
need them. For example, if users assigned to the "interactive" role need to run ad hoc queries, you must
explicitly assign that permission in the CREATE ROLE statement:
CREATE ROLE interactive WITH sql;
Also note that the permissions are additive. So if a user is assigned to one role that allows access to
defaultproc but not allproc, but that user also is assigned to another role that allows allproc, the user has
both permissions.
Example
The following example defines three roles, admin, developer, and batch, each with a different set
of permissions:
CREATE ROLE admin WITH admin;
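The developer and batch roles can be declared the same way; the permission sets shown here are illustrative:
CREATE ROLE developer WITH sql, allproc;
CREATE ROLE batch WITH defaultproc;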
CREATE STREAM
CREATE STREAM Creates an output stream in the database.
Syntax
CREATE STREAM stream-name
[PARTITION ON COLUMN column-name]
[EXPORT TO TARGET export-target-name] (
column-definition [,...]
);
column-definition: column-name datatype [DEFAULT value ] [ NOT NULL ]
Description
The CREATE STREAM statement defines a stream and its associated columns in the database. A stream
can be thought of as a virtual table. It has the same structure as a table, consisting of a list of columns
and supporting all the same datatypes (Table A.1, Supported SQL Datatypes) as tables. The columns
have the same rules in terms of naming and size. You can also use the INSERT statement to insert data
into the stream once it is defined.
The three differences between streams and tables are:
No data is stored in the database for a stream; it is only used as a passthrough.
Because no data is stored, you cannot SELECT, UPDATE, or DELETE the stream contents.
No indexes or constraints (such as primary keys) are allowed on a stream.
Data inserted into the stream is not stored in the database. The stream is an ephemeral container used only
for analysis and/or passing data through VoltDB to other systems via the export function.
Combining streams with views lets you perform summary analysis on data passing through VoltDB without having to store all of the underlying data. For example, you might want to know how many times
users access a website and their most recent visit. But you do not need to store a record for each visit.
In this case, you can create a stream, visits, to capture the event and a view, visit_by_user, to capture the
cumulative data:
CREATE STREAM visits PARTITION ON COLUMN user_id (
user_id BIGINT NOT NULL,
login TIMESTAMP
);
CREATE VIEW visit_by_user
( user_id, total_visits, last_visit )
AS SELECT user_id, COUNT(*), MAX(login)
FROM visits GROUP BY user_id;
When creating a view on a stream, the stream must be partitioned and the partition column must appear in
the view. Another special feature of views on streams is that, because there is no underlying data stored
for the view, VoltDB lets you modify the view's content manually by issuing UPDATE and DELETE
statements on the view. (This ability to manipulate the view is only available for views on streams. You
cannot UPDATE or DELETE a view on a table; you must modify the data in the underlying table instead.)
For example, if you only care about a daily rollup of visits, you can use DELETE with the stream name
to clear the data at midnight every night:
DELETE FROM visit_by_user;
Or if you need to adjust the cumulative analysis to, say, "undo" an entry from a specific user, you can
use UPDATE:
UPDATE visit_by_user
SET total_visits = total_visits -1, last_visit = NULL
WHERE user_id = ?;
Streams can also be used to export data out of VoltDB into other systems, such as Kafka, CSV files, and
so on. To export data into another system, you start by declaring one or more streams defining the data
that will be sent to the external system. In the CREATE STREAM statement you also specify the named
target for the export:
CREATE STREAM visits
EXPORT TO TARGET archive (
user_id BIGINT NOT NULL,
login TIMESTAMP
);
If no export targets are configured in the deployment file, inserting data into the visits stream has no effect.
However, if the export target archive is enabled in the deployment file, then any data inserted into the
stream is sent to the export connector for delivery to the configured destination. See Chapter 15, Importing
and Exporting Live Data for more information on configuring export targets.
Finally, you can combine analysis with export by creating a stream with an export target and also creating
a view on that stream. So in our earlier example, if we want to warehouse data about each visit but use
VoltDB to perform the real-time summary analysis, we would add an export definition, along with the
partitioning clause, to the CREATE STREAM statement for the visits stream:
CREATE STREAM visits
PARTITION ON COLUMN user_id
EXPORT TO TARGET warehouse (
user_id BIGINT NOT NULL,
login TIMESTAMP
);
Example
The following example defines a stream and a view on that stream. Note the use of the PARTITION ON
clause to ensure the stream is partitioned, since it is being used in a view.
CREATE STREAM flightdata
PARTITION ON COLUMN airport (
flight_id BIGINT NOT NULL,
airport VARCHAR(3) NOT NULL,
passengers INTEGER,
eta TIMESTAMP
);
CREATE VIEW all_flights
(airport, flight_count, passenger_count)
AS SELECT airport, count(*), sum(passengers)
FROM flightdata GROUP BY airport;
CREATE TABLE
CREATE TABLE Creates a table in the database.
Syntax
CREATE TABLE table-name (
column-definition [,...]
[, constraint-definition [,...]]
);
column-definition: column-name datatype [DEFAULT value ] [ NOT NULL ] [index-type]
constraint-definition: [CONSTRAINT constraint-name] { index-definition | limit-definition }
index-definition: {index-type} (column-name [,...])
limit-definition: LIMIT PARTITION ROWS row-count [EXECUTE (delete-statement)]
index-type: PRIMARY KEY | UNIQUE | ASSUMEUNIQUE
Description
The CREATE TABLE statement creates a table and its associated columns in the database. The supported
datatypes are described in Table A.1, Supported SQL Datatypes.
SQL Datatype (equivalent Java datatype)
TINYINT (byte)
SMALLINT (short)
INTEGER (int)
BIGINT (long)
FLOAT (double)
DECIMAL (BigDecimal)
GEOGRAPHY or GEOGRAPHY()
    A geospatial region. The maximum size, in bytes, can be specified in the declaration. For example: GEOGRAPHY(80000). See the section on entering geospatial data in the VoltDB Guide to Performance and Customization for details.
GEOGRAPHY_POINT
    A geospatial point.
VARCHAR() (String)
VARBINARY() (byte array)
TIMESTAMP
    Time in microseconds.
For integer and floating-point datatypes, VoltDB reserves the largest possible negative value to denote a null value. For example, -128 is interpreted as null for TINYINT, -32768 for SMALLINT, and so on.
The following limitations are important to note when using the CREATE TABLE statement in VoltDB:
CHECK and FOREIGN KEY constraints are not supported.
VoltDB does not support AUTO_INCREMENT, the automatic incrementing of column values.
Each column has a maximum size of one megabyte and the total declared size of all of the columns in a
table cannot exceed two megabytes. For VARCHAR columns where the length is specified in characters,
the declared size is calculated as four bytes per character to allow for the longest potential UTF-8 string.
If you intend to use a column to partition a table, that column cannot contain null values. You must
specify NOT NULL in the definition of the column or VoltDB issues an error when compiling the
schema.
When you specify an index constraint, by default VoltDB creates a tree index. You can explicitly create
a hash index by including the string "hash" as part of the index name. For example, the following
declaration creates a hash index, Version_Hash_Idx, of three numeric columns.
CREATE TABLE Version (
Major SMALLINT NOT NULL,
Minor SMALLINT NOT NULL,
baselevel INTEGER NOT NULL,
ReleaseDate TIMESTAMP,
CONSTRAINT Version_Hash_Idx PRIMARY KEY
(Major, Minor, Baselevel)
);
See the description of CREATE INDEX for more information on the difference between hash and tree
indexes.
To specify an index either for an individual column or as a table constraint that is globally unique
across the database, use the standard SQL keywords UNIQUE and PRIMARY KEY. However, for
partitioned tables, VoltDB can only ensure uniqueness if the index includes the partitioning column.
Otherwise, these keywords are not allowed.
It can be a performance advantage to define indexes or constraints on non-partitioning columns that you,
as the developer, know are going to contain unique values. Although VoltDB cannot ensure uniqueness
across the entire database, it does allow you to define indexes that are assumed to be unique by using
the ASSUMEUNIQUE keyword.
When you define an index on a partitioned table as ASSUMEUNIQUE, VoltDB verifies uniqueness
within the current partition when creating an index entry. However, it is your responsibility as developer
or administrator to ensure that the values are actually globally unique. If the database is repartitioned due
to adding new nodes or restoring a snapshot to a different cluster configuration, non-unique ASSUMEUNIQUE index entries may collide. When this occurs it results in a constraint violation error and the
database will not be able to complete its current action.
Therefore, ASSUMEUNIQUE should be used with caution. Also, it is not necessary and should not
be used with replicated tables or indexes that contain the partitioning column, which can be defined
as UNIQUE.
VoltDB includes a special constraint, LIMIT PARTITION ROWS, that limits the number of rows of data
that can be inserted into any one partition for the table. This constraint is useful for managing memory
usage and avoiding accidentally running out of memory due to unbalanced partitions or unexpected
data growth.
Note that the limit, specified as an integer, limits the number of rows per partition, not for the table as
a whole. In the case of replicated tables, where each partition contains all rows of the table, the limit
applies equally to the table as a whole and each partition. Also, the constraint is applied to INSERT
operations. The constraint is not enforced when restoring a snapshot, altering the table declaration, or
rebalancing the cluster as part of elastically adding nodes. In these cases, ignoring the limit allows the
operation to succeed even if, as a result, a partition ends up containing more rows than specified by the
LIMIT PARTITION ROWS constraint. But once the limit has been exceeded, any attempt to INSERT
more table rows into that partition will result in an error, until sufficient rows are deleted to reduce the
row count below the limit.
As part of the LIMIT PARTITION ROWS constraint, you can optionally include an EXECUTE clause
that specifies a DELETE statement to be executed when an INSERT statement will exceed the partition's
row limit. For example, assume the events table has the following constraint as part of the CREATE
TABLE statement:
CREATE TABLE events (
event_time TIMESTAMP NOT NULL,
event_code INTEGER NOT NULL,
event_message VARCHAR(128),
LIMIT PARTITION ROWS 1000 EXECUTE (
DELETE FROM events WHERE
SINCE_EPOCH(second,NOW) - SINCE_EPOCH(second,event_time) > 24*3600
)
);
At runtime, if an INSERT statement would result in the current partition having more than 1000
rows, the delete statement will automatically be executed in an attempt to reduce the row count before
the INSERT statement is run. In the example, any records with an event_time older than 24 hours will be
deleted. Note that it is your responsibility as the query designer to provide a DELETE statement that is
both deterministic and likely to remove sufficient rows to allow the query to succeed. Several important
points to note about the EXECUTE clause:
If the DELETE statement does not delete sufficient rows, the INSERT statement will fail. For example, in the previous example, if you attempt to insert more than 1000 rows into a single partition in
a 24 hour period, the DELETE statement will not delete enough records when you attempt to insert
the 1001st record.
The LIMIT PARTITION ROWS constraint is applied per partition. That is, the DELETE statement
is executed as a single-partitioned query in the partition where the INSERT statement triggers the
row limit constraint, even if the INSERT statement is part of a multi-partitioned stored procedure.
The length of VARCHAR columns can be specified in either characters (the default) or bytes. To specify
the length in bytes, include the BYTES keyword after the length value; for example VARCHAR(16
BYTES).
Specifying the VARCHAR length in characters is recommended because UTF-8 characters can require
a variable number of bytes to store. By specifying the length in characters you can be sure the column
has sufficient space to store any string of the specified length. Specifying the length in bytes is only
recommended when all values contain only single byte (ASCII) characters or when conserving space is
required and the strings are less than 64 bytes in length.
The VARBINARY datatype provides variable storage for arbitrary strings of binary data and operates
similarly to VARCHAR(n BYTES) strings. You assign byte arrays to a VARBINARY column when
passing in variables, or you can use a hexadecimal string for assigning literal values in the SQL statement.
The VoltDB TIMESTAMP datatype is a long integer representing the number of microseconds since
the epoch. Two important points to note about this timestamp:
The VoltDB TIMESTAMP is not the same as the Java Timestamp datatype or traditional Linux time
measurements, which are measured in milliseconds rather than microseconds. Appropriate conversion
is needed when casting values between a VoltDB TIMESTAMP and other timestamp datatypes.
The VoltDB TIMESTAMP is interpreted as a Greenwich Mean Time (GMT) value. Depending on
how time values are created, their value may or may not account for the local machine's default time
zone. Mixing timestamps from different time zones (for example, in WHERE clause comparisons)
can result in unexpected behavior.
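For example, in a Java client you can convert the current time in milliseconds to the microseconds a VoltDB TIMESTAMP expects before passing it as a procedure parameter. (This is a sketch; the procedure name and parameters are hypothetical.)
// System.currentTimeMillis() returns milliseconds; multiply by 1000
// to get the microseconds a VoltDB TIMESTAMP parameter expects.
long nowMicros = System.currentTimeMillis() * 1000;
client.callProcedure("RecordLogin", userId, nowMicros);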
For TIMESTAMP columns, you can define a default value using the NOW or CURRENT_TIMESTAMP keywords in place of a specific value; for example, lastlogin TIMESTAMP DEFAULT NOW.
Example
The following example defines a table with five columns. The first column, Company, is not allowed
to be null, which is important since it is used as the partitioning column in the following PARTITION
TABLE statement. That column is also contained in the PRIMARY KEY constraint. Again, it is important
to include the partitioning column in any fully unique indexes for partitioned tables.
CREATE TABLE Inventory (
    Company VARCHAR(32) NOT NULL,
    ProductID BIGINT NOT NULL,
    Price DECIMAL,
    Category VARCHAR(32),
    Description VARCHAR(256),
    PRIMARY KEY (Company, ProductID)
);
PARTITION TABLE Inventory ON COLUMN Company;
CREATE VIEW
CREATE VIEW Creates a view into a table, optimizing access to a summary of its contents.
Syntax
CREATE VIEW view-name ( view-column-name [,...] )
AS SELECT { column-name | selection-expression } [AS alias] [,...]
FROM table-name
[WHERE [NOT] boolean-expression [ {AND | OR} [NOT] boolean-expression]...]
[GROUP BY { column-name | selection-expression } [,...]]
Description
The CREATE VIEW statement creates a view of a table with selected columns and aggregates. VoltDB
implements views as materialized views. In other words, the view is stored as a special table in the database and is updated each time the corresponding database table is modified. This means there is a small,
incremental performance impact for any inserts or updates to the table, but selects on the view will execute
efficiently.
The following limitations are important to note when using the CREATE VIEW statement with VoltDB:
Views are allowed on individual tables only. Joins are not supported.
The SELECT statement must include a field specified as COUNT(*). Other aggregate functions
(COUNT, MAX, MIN, and SUM) are allowed following the COUNT(*).
If the SELECT statement contains a GROUP BY clause, all of the columns and expressions listed in
the GROUP BY must be listed in the same order at the start of the SELECT statement.
Examples
The following example defines a view that counts the number of records for a specific product item grouped
by its location (that is, the warehouse the item is in).
CREATE VIEW inventory_count_by_warehouse (
productID,
warehouse,
total_inventory
) AS SELECT
productID,
warehouse,
COUNT(*)
FROM inventory GROUP BY productID, warehouse;
The next example uses a WHERE clause but no GROUP BY to provide a count and minimum and maximum aggregates of all records that meet certain criteria.
CREATE VIEW small_towns ( number, minimum, maximum )
AS SELECT count(*), min(population), max(population)
FROM TOWNS WHERE population < 10000;
DR TABLE
DR TABLE Identifies a table as a participant in database replication (DR)
Syntax
DR TABLE table-name [DISABLE]
Description
The DR TABLE statement identifies a table as a participant in database replication (DR). If DR is not
enabled, the DR TABLE statement has no effect on the operation of the table or the database as a whole.
However, once DR is enabled and if the current cluster is the master database for the DR operation, any
updates to the contents of tables identified in the DR TABLE statement are copied and applied to the
replica database as well.
The DR TABLE ... DISABLE statement reverses the effect of a previous DR TABLE statement, removing
the specified table from participation in DR. Because the replica database schema must have DR TABLE
statements for any tables being replicated by the master, if DR is actively occurring you must add the
DR TABLE statements to the replica before adding them to the master. In reverse, you must issue DR
TABLE... DISABLE statements on the master before you issue the matching statements on the replica.
See Chapter 11, Database Replication for more information about how database replication works.
Examples
The following example identifies the tables Employee and Department as participants in database replication.
DR TABLE Employee;
DR TABLE Department;
DROP INDEX
DROP INDEX Removes an index.
Syntax
DROP INDEX index-name [IF EXISTS]
Description
The DROP INDEX statement deletes the specified index, and any data associated with it, from the database. The IF EXISTS clause allows the statement to succeed even if the specified index does not exist. If
the index does not exist and you do not include the IF EXISTS clause, the statement will return an error.
You must use the name of the index as specified in the original DDL when dropping the index. You cannot
drop an index if it was not explicitly named in the CREATE INDEX command. This is why you should
always name indexes and other constraints wherever possible.
Examples
The following example removes the index named employee_idx_by_lastname:
DROP INDEX Employee_idx_by_lastname;
DROP PROCEDURE
DROP PROCEDURE Removes the definition of a stored procedure.
Syntax
DROP PROCEDURE procedure-name [IF EXISTS]
Description
The DROP PROCEDURE statement deletes the definition of the named stored procedure. Note that, for
procedures declared using CREATE PROCEDURE FROM and a class file, the statement does not delete
the class that implements the procedure, it only deletes the definition and any partitioning information
associated with the procedure. To remove the associated stored procedure class, you must first drop the
procedure definition then use the sqlcmd remove classes directive to remove the class.
The IF EXISTS clause allows the statement to succeed even if the specified procedure name does not
exist. If the stored procedure does not exist and you do not include the IF EXISTS clause, the statement
will return an error.
Examples
The following example removes the definition of the FindCanceledReservations stored procedure, then
uses remove classes to remove the corresponding class.
$ sqlcmd
1> DROP PROCEDURE FindCanceledReservations;
2> remove classes "*.FindCanceledReservations";
DROP ROLE
DROP ROLE Removes a role.
Syntax
DROP ROLE role-name [IF EXISTS]
Description
The DROP ROLE statement deletes the specified role. The IF EXISTS clause allows the statement to
succeed even if the specified role does not exist. If the role does not exist and you do not include the IF
EXISTS clause, the statement will return an error.
Examples
The following example removes the role named debug:
DROP ROLE debug;
DROP STREAM
DROP STREAM Removes a stream and, optionally, any views associated with it.
Syntax
DROP STREAM stream-name [IF EXISTS] [CASCADE]
Description
The DROP STREAM statement deletes the specified stream from the database. The IF EXISTS clause
allows the statement to succeed even if the specified stream does not exist. If the stream does not exist and
you do not include the IF EXISTS clause, the statement will return an error.
If you use the CASCADE clause, VoltDB automatically drops any referencing views as well as the stream
itself.
Example
The following example uses DROP STREAM with the IF EXISTS clause to remove the MeterReadings
stream definition.
DROP STREAM MeterReadings IF EXISTS;
DROP TABLE
DROP TABLE Removes a table and any data associated with it.
Syntax
DROP TABLE table-name [IF EXISTS] [CASCADE]
Description
The DROP TABLE statement deletes the specified table, and any data associated with it, from the database.
The IF EXISTS clause allows the statement to succeed even if the specified table does not exist. If the
table does not exist and you do not include the IF EXISTS clause, the statement will return an error.
Before dropping a table, you must first remove any stored procedures that reference the table. For example, if the table EMPLOYEE is partitioned and the stored procedure AddEmployee is partitioned on the
EMPLOYEE table, you must drop the procedure first before dropping the table:
PARTITION TABLE Employee ON COLUMN EmpID;
PARTITION PROCEDURE AddEmployee
ON TABLE Employee COLUMN EmpID;
[. . . ]
DROP PROCEDURE AddEmployee;
DROP TABLE Employee;
Attempting to drop the table before dropping the procedure will result in an error. The same will normally
happen if there are any views or indexes that reference the table. However, if you use the CASCADE
clause VoltDB will automatically drop any referencing indexes and views as well as the table itself.
Examples
The following example uses DROP TABLE with the IF EXISTS clause to remove any existing UserSignin table definition and data before adding a new definition.
DROP TABLE UserSignin IF EXISTS;
CREATE TABLE UserSignin (
userID BIGINT NOT NULL,
lastlogin TIMESTAMP DEFAULT NOW
);
DROP VIEW
DROP VIEW Removes a view and any data associated with it.
Syntax
DROP VIEW view-name [IF EXISTS]
Description
The DROP VIEW statement deletes the specified view, and any data associated with it, from the database.
The IF EXISTS clause allows the statement to succeed even if the specified view does not exist. If the
view does not exist and you do not include the IF EXISTS clause, the statement will return an error.
Dropping a view has the same constraints as dropping a table, in that you cannot drop a view that is
referenced by existing stored procedure queries. Before dropping the view, you must drop any stored
procedures that reference it.
Examples
The following example removes the view named Votes_by_state:
DROP VIEW votes_by_state;
IMPORT CLASS
IMPORT CLASS Specifies additional Java classes to include in the application catalog.
Syntax
IMPORT CLASS class-name
Description
Warning: Deprecated
The IMPORT CLASS statement is only valid when precompiling a schema into an application
catalog. However, use of precompiled catalogs, and the IMPORT CLASS statement, are deprecated. When using interactive DDL to enter your schema, use the sqlcmd load classes directive
instead.
The IMPORT CLASS statement lets you specify class files to be added to the application catalog when
the schema is compiled. You can include individual class files only; the IMPORT CLASS statement does
not extract classes from JAR files. However, you can use Ant-style wildcards in the class specification to
include multiple classes. For example:
IMPORT CLASS org.mycompany.utils.*;
Use the IMPORT CLASS statement to include reusable code that is accessed by multiple stored procedures. Any classes and methods called by stored procedures must follow the same rules for deterministic behavior that stored procedures follow, as described in Section 5.1.2, VoltDB Stored Procedures are
Deterministic.
Code imported using IMPORT CLASS is included in the application catalog and, therefore, can be updated
on a running database through the @UpdateApplicationCatalog system procedure. For static libraries that
stored procedures use but that do not need to be modified often, the recommended approach is to include
the code by placing JAR files in the /lib directory where VoltDB is installed on the database servers.
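For illustration, the following is a minimal sketch of the kind of reusable class suited to IMPORT CLASS (the package, class, and method names are hypothetical). All methods are static and deterministic, with no reliance on system time, random numbers, or I/O, so every partition computes identical results:

package org.mycompany.utils;

// Hypothetical helper class; safe to call from stored procedures because
// the same inputs always produce the same output.
public class FinanceUtil {
    // Simple interest, in whole cents, for a principal, annual rate,
    // and number of days.
    public static long simpleInterestCents(long principalCents,
                                           double annualRate,
                                           int days) {
        return Math.round(principalCents * annualRate * days / 365.0);
    }
}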
Example
The following example imports a class containing common financial algorithms so they can be used by
any stored procedures in the catalog:
IMPORT CLASS org.mycompany.common.finance;
PARTITION PROCEDURE
PARTITION PROCEDURE Specifies that a stored procedure is partitioned.
Syntax
PARTITION PROCEDURE procedure-name ON TABLE table-name COLUMN column-name
[PARAMETER position ]
Description
Partitioning a stored procedure means that the procedure executes within a unique partition of the database.
The partition in which the procedure executes is chosen at runtime based on the table and column specified
by table-name and column-name and the value of the first parameter to the procedure. For example:
PARTITION TABLE Employees ON COLUMN BadgeNumber;
PARTITION PROCEDURE FindEmployee ON TABLE Employees COLUMN BadgeNumber;
The procedure FindEmployee is partitioned on the table Employees, and table Employees is in turn partitioned on the column BadgeNumber. This means that when the stored procedure FindEmployee is invoked
VoltDB determines which partition to run the stored procedure in based on the value of the first parameter
to the procedure and the corresponding partitioning value for the column BadgeNumber. So to find the
employee with badge number 145303 you would invoke the stored procedure as follows:
ClientResponse response = client.callProcedure("FindEmployee", 145303);
By default, VoltDB uses the first parameter to the stored procedure as the partitioning value. However, if
you want to use the value of a different parameter, you can use the PARAMETER clause. The PARAMETER clause specifies which procedure parameter to use as the partitioning value, with position specifying
the parameter position, counting from zero. (In other words, position 0 is the first parameter, position 1
is the second, and so on.)
The specified table must be a partitioned table and cannot be an export-only or replicated table.
You specify the procedure by its simplified class name. Do not include any other parts of the class path.
Note that the simple procedure name you specify in the PARTITION PROCEDURE statement may be different from
the class name you specify in the CREATE PROCEDURE statement, which can include a relative path. For
example, if the class for the stored procedure is mydb.procedures.FindEmployee, the procedure name in
the PARTITION PROCEDURE statement should be FindEmployee:
CREATE PROCEDURE FROM CLASS mydb.procedures.FindEmployee;
PARTITION PROCEDURE FindEmployee ON TABLE Employees COLUMN BadgeNumber;
Examples
The following example declares a stored procedure, using an inline SQL query, and then partitions the
procedure on the Customer table. Note that the PARTITION PROCEDURE statement includes the PARAMETER clause, since the partitioning column is not the first of the placeholders in the SQL query.
Also note that the PARAMETER position is zero-based, so the value "1" identifies the second placeholder.
CREATE PROCEDURE GetCustomerByName AS
   SELECT * from Customer WHERE FirstName=? AND LastName=?
   ORDER BY LastName, FirstName, CustomerID;
PARTITION PROCEDURE GetCustomerByName
   ON TABLE Customer COLUMN LastName
   PARAMETER 1;
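For illustration, a minimal sketch of invoking this procedure from a Java client (assuming client is a connected org.voltdb.client.Client instance); because of the PARAMETER 1 clause, the second argument supplies the partitioning value:

// "Public" (the last name) determines the partition; "Jane" does not.
ClientResponse response = client.callProcedure("GetCustomerByName",
                                               "Jane", "Public");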
PARTITION TABLE
PARTITION TABLE Specifies that a table is partitioned and identifies the partitioning column.
Syntax
PARTITION TABLE table-name ON COLUMN column-name
Description
Partitioning a table specifies that different records are stored in different unique partitions, based on the
value of the specified column. The table table-name and column column-name must be valid, declared
elements in the current DDL file or VoltDB generates an error when compiling the schema.
For a table to be partitioned, the partitioning column must be declared as NOT NULL. If you do not declare
a partitioning column of a table in the DDL, the table is assumed to be a replicated table.
Example
The following example partitions the table Employee on the column EmployeeID.
PARTITION TABLE Employee ON COLUMN EmployeeID;
SET DR
SET DR Enables the use of Cross Datacenter Replication (XDCR).
Syntax
SET DR= {ACTIVE | PASSIVE}
Description
The SET DR statement enables and disables Cross Datacenter Replication (XDCR). You actually turn
on database replication in the deployment file using the <dr> and <connection> elements. But to use
two-way, active replication, you must also enable it in the database schema using the SET DR=ACTIVE
statement for both databases involved in the XDCR process. See Chapter 11, Database Replication for
more information about XDCR.
By default, only passive DR is enabled in the schema. By specifying SET DR=ACTIVE you enable the use
of XDCR. When enabled, XDCR assigns an additional 8 bytes per row for every DR table in the database.
The additional space is used to store metadata about the row's most recent transaction.
For example, say your schema contains 5 tables which you declare as DR tables and those tables will store
a million rows each. This means the database will consume approximately 40 megabytes of additional
memory when XDCR is enabled, even if DR is not yet initiated in the deployment file. For this reason, the
SET DR=ACTIVE statement should only be used for databases that will be involved in active XDCR.
If use of XDCR is enabled in the schema, you can use the SET DR=PASSIVE statement to disable it.
Note, however, for both the SET DR=ACTIVE and SET DR=PASSIVE statements, any tables declared
as DR tables must be empty when the SET DR statement is executed.
Examples
The following example enables the use of XDCR and then declares three tables as DR tables. Because any
DR tables must be empty when the SET DR statement is executed, it is often easiest to place the statement
at the beginning of the schema.
SET DR=ACTIVE;
DR TABLE Employees;
DR TABLE Divisions;
DR TABLE Locations;
DELETE
INSERT
SELECT
TRUNCATE TABLE
UPDATE
UPSERT
DELETE
DELETE Deletes one or more records from the database.
Syntax
DELETE FROM table-name
[WHERE [NOT] boolean-expression [ {AND | OR} [NOT] boolean-expression]...]
[ORDER BY {column-name [ ASC | DESC ]}[,...] [LIMIT integer] [OFFSET integer]]
Description
The DELETE statement deletes rows from the specified table that meet the constraints of the WHERE
clause. The following limitations are important to note when using the DELETE statement in VoltDB:
• The DELETE statement can operate on only one table at a time (no joins or subqueries).
• The WHERE expression supports the boolean operators: equals (=), not equals (!= or <>), greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), IS NULL, AND, OR, and NOT. Note, however, although OR is supported syntactically, VoltDB does not optimize these operations and use of OR may impact the performance of your queries.
• The ORDER BY clause lets you order the selection results and then select a subset of the ordered records to delete. For example, you could delete only the five oldest records, chronologically, sorting by timestamp:
     DELETE FROM events ORDER BY event_time ASC LIMIT 5;
  Similarly, you could choose to keep only the five most recent:
     DELETE FROM events ORDER BY event_time DESC OFFSET 5;
• When using ORDER BY, the resulting sort order must be deterministic. In other words, the ORDER BY must include enough columns to uniquely identify each row. (For example, listing all columns or a primary key.)
• You cannot use ORDER BY to delete rows from a partitioned table in a multi-partitioned query. In other words, for partitioned tables DELETE... ORDER BY must be executed as part of a single-partitioned stored procedure or as an ad hoc query with a WHERE clause that uniquely identifies the partitioning column value, as in the sketch after this list.
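For illustration, a minimal sketch of such an ad hoc DELETE... ORDER BY (hypothetical table and column names, assuming events is partitioned on userid and client is a connected org.voltdb.client.Client instance):

// The WHERE clause pins the partitioning column to a single value, so the
// statement runs single-partitioned; the ORDER BY names enough columns to
// make the sort order deterministic.
client.callProcedure("@AdHoc",
    "DELETE FROM events WHERE userid = 'u1001' " +
    "ORDER BY event_time, event_id LIMIT 5;");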
Examples
The following example removes rows from the EMPLOYEE table where the EMPLOYEE_ID column
is equal to 145303.
DELETE FROM employee WHERE employee_id = 145303;
The following example removes rows from the BID table where the BIDDERID is 12345 and the BIDPRICE is less than 100.00.
DELETE FROM bid WHERE bidderid=12345 AND bidprice<100.0;
INSERT
INSERT Creates new rows in the database, using the specified values for the columns.
Syntax
INSERT INTO table-name [( column-name [,...] )] VALUES ( value-expression [,...] )
INSERT INTO table-name [( column-name [,...] )] SELECT select-expression
Description
The INSERT statement creates one or more new rows in the database. There are two forms of the INSERT
statement, INSERT INTO... VALUES and INSERT INTO... SELECT. The INSERT INTO... VALUES
statement lets you enter specific values for adding a single row to the database. The INSERT INTO...
SELECT statement lets you insert multiple rows into the database, depending upon the number of rows
returned by the select expression.
The INSERT INTO... SELECT statement is often used for copying rows from one table to another. For
example, say you want to export all of the records associated with a particular column value. The following
INSERT statement copies all of the records from the table ORDERS with a CustomerID of 25 into the
table EXPORT_ORDERS:
INSERT INTO Export_Orders SELECT * FROM Orders WHERE CustomerID=25;
However, the select expression can be more complex, including joining multiple tables. The following
limitations currently apply to the INSERT INTO... SELECT statement:
• INSERT INTO... SELECT can join partitioned tables only if they are joined on equality of the partitioning columns. Also, the resulting INSERT must apply to a partitioned table and be inserted using the same partition column value, whether the query is executed in a single-partitioned or multi-partitioned stored procedure.
• INSERT INTO... SELECT does not support UNION statements.
In addition to the preceding limitations, there are certain instances where the select expression is too complex to be processed. Cases of invalid select expressions in INSERT INTO... SELECT include:
• A LIMIT or TOP clause applied to a partitioned table in a multi-partitioned query
• A GROUP BY of a partitioned table where the partitioning column is not in the GROUP BY clause
Deterministic behavior is critical to maintaining the integrity of the data in a K-safe cluster. Because an
INSERT INTO... SELECT statement performs both a query and an insert based on the results of that query,
if the selection expression would produce non-deterministic results, the VoltDB query planner rejects the
statement and returns an error. See Section 5.1.2, VoltDB Stored Procedures are Deterministic for more
information on the importance of determinism in SQL queries.
If you specify the column names following the table name, the values will be assigned to the columns in
the order specified. If you do not specify the column names, values will be assigned to columns based on
the order specified in the schema definition. However, if you specify a subset of the columns, you must
specify values for any columns that are explicitly defined in the schema as NOT NULL and do not have
a default value assigned.
Examples
The following example inserts values into the columns (firstname, mi, lastname, and emp_id) of an EMPLOYEE table:
INSERT INTO employee VALUES ('Jane', 'Q', 'Public', 145303);
The next example performs the same operation with the same results, except this INSERT statement explicitly identifies the column names and changes the order:
INSERT INTO employee (emp_id, lastname, firstname, mi)
VALUES (145303, 'Public', 'Jane', 'Q');
The last example assigns values for the employee ID and the first and last names, but not the middle initial.
This query will only succeed if the MI column is nullable or has a default value defined in the database
schema.
INSERT INTO employee (emp_id, lastname, firstname)
VALUES (145304, 'Doe', 'John');
SELECT
SELECT Fetches the specified rows and columns from the database.
Syntax
Select-statement [{set-operator} Select-statement ] ...
Select-statement:
SELECT [ TOP integer-value ]
{ * | [ ALL | DISTINCT ] { column-name | selection-expression } [AS alias] [,...] }
FROM { table-reference } [ join-clause ]...
[WHERE [NOT] boolean-expression [ {AND | OR} [NOT] boolean-expression]...]
[clause...]
table-reference:
{ table-name [AS alias] | view-name [AS alias] | sub-query AS alias }
sub-query:
(Select-statement)
join-clause:
,table-reference
[INNER | {LEFT | RIGHT} [OUTER]] JOIN [{table-reference}] [join-condition]
join-condition:
ON conditional-expression
USING (column-reference [,...])
clause:
ORDER BY { column-name | alias } [ ASC | DESC ] [,...]
GROUP BY { column-name | alias } [,...]
HAVING boolean-expression
LIMIT integer-value [OFFSET row-count]
set-operator:
UNION [ALL]
INTERSECT [ALL]
EXCEPT
Description
The SELECT statement retrieves the specified rows and columns from the database, filtered and sorted
by any clauses that are included in the statement. In its simplest form, the SELECT statement retrieves
the values associated with individual columns. However, the selection expression can be a function such
as COUNT and SUM.
The following features and limitations are important to note when using the SELECT statement with
VoltDB:
• See Appendix C, SQL Functions for a full list of the SQL functions that VoltDB supports.
• VoltDB supports the following operators in expressions: addition (+), subtraction (-), multiplication (*), division (/), and string concatenation (||).
In the first statement, there are three parameters that replace individual values in the IN list, allowing
you to specify exactly three selection values. In the second statement the placeholder replaces the entire
list, including the parentheses. In this case the parameter to the procedure call must be an array and
allows you to change not only the values of the alternatives but the number of criteria considered.
The following Java code fragment demonstrates how these two queries can be used in a stored procedure,
resulting in equivalent SQL statements being executed:
String arg1 = "Salary";
String arg2 = "Hourly";
String arg3 = "Parttime";
voltQueueSQL( query1, arg1, arg2, arg3);
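By contrast, the second query would be queued by passing the entire list as one array argument. A minimal sketch (assuming query2 declares the whole IN list as a single placeholder):

// Casting the array to Object passes it as a single parameter instead of
// letting the compiler expand it into separate varargs parameters.
String[] statuses = { "Salary", "Hourly", "Parttime" };
voltQueueSQL(query2, (Object) statuses);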
Subqueries
The SELECT statement can include subqueries. Subqueries are separate SELECT statements, enclosed in
parentheses, where the results of the subquery are used as values, expressions, or arguments within the
surrounding SELECT statement.
Subqueries, like any SELECT statement, are extremely flexible and can return a wide array of information.
A subquery might return:
• A single row with a single column; this is sometimes known as a scalar subquery and represents a single value
• A single row with multiple columns; this is also known as a row value expression
• Multiple rows with one or more columns
In general, VoltDB supports subqueries in the FROM clause, in the selection expression, and in boolean
expressions in the WHERE clause or in CASE-WHEN-THEN-ELSE-END operations. However, different
types of subqueries are allowed in different situations, depending on the type of data returned.
• In the FROM clause, the SELECT statement supports all types of subquery as a table reference. The subquery must be enclosed in parentheses and must be assigned a table alias.
• In the selection expression, scalar subqueries can be used in place of a single column reference.
• In the WHERE clause and CASE operations, both scalar and non-scalar subqueries can be used as part of boolean expressions. Scalar subqueries can be used in place of any single-valued expression. Non-scalar subqueries can be used in the following situations:
  • Row value comparisons: Boolean expressions that compare one row value expression to another can use subqueries that resolve to one row with multiple columns. For example:
select * from t1
where (a,c) > (select a, c from t2 where b=t1.b);
  • IN and EXISTS: Subqueries that return multiple rows can be used as an argument to the IN or EXISTS predicate to determine if a value (or set of values) exists within the rows returned by the subquery. For example:
select * from t1
where a in (select a from t2);
select * from t1
where (a,c) in (select a, c from t2 where b=t1.b);
select * from t1 where c > 3 and
exists (select a, b from t2 where a=t1.a);
  • ANY and ALL: Multi-row subqueries can also be used as the target of an ANY or ALL comparison, using either a scalar or row expression comparison. For example:
select * from t1
  where a > ALL (select a from t2);
select * from t1
  where (a,c) = ANY (select a, c from t2 where b=t1.b);
Note that subqueries are only supported in the SELECT statement; they cannot be used in data manipulation statements such as UPDATE or DELETE, or in CREATE VIEW statements or index definitions. Also,
VoltDB does not support subqueries in the HAVING, ORDER BY, or GROUP BY clauses.
For the initial release of subqueries in selection and boolean expressions, only replicated tables can be
used in the subquery. Both replicated and partitioned tables can be used in subqueries in place of table
references in the FROM clause.
Set Operations
VoltDB also supports the set operations UNION, INTERSECT, and EXCEPT. These keywords let you
perform set operations on two or more SELECT statements. UNION includes the combined results sets
from the two SELECT statements, INTERSECT includes only those rows that appear in both SELECT
statement result sets, and EXCEPT includes only those rows that appear in one result set but not the other.
Normally, UNION and INTERSECT provide a set including unique rows. That is, if a row appears in
both SELECT results, it only appears once in the combined result set. However, if you include the ALL
modifier, all matching rows are included. For example, UNION ALL will result in single entries for the
rows that appear in only one of the SELECT results, but two copies of any rows that appear in both.
The UNION, INTERSECT, and EXCEPT operations obey the same rules that apply to joins:
• You cannot perform set operations on SELECT statements that reference the same table.
• All tables in the SELECT statements must either be replicated tables or partitioned tables partitioned on the same column value, using equality of the partitioning column in the WHERE clause.
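To illustrate these rules, a minimal sketch of issuing a set operation as an ad hoc query from a Java client (hypothetical replicated tables stores and suppliers, assuming client is a connected org.voltdb.client.Client instance):

// UNION ALL keeps duplicate rows; because both tables are replicated,
// the partitioning rule above is satisfied.
VoltTable cities = client.callProcedure("@AdHoc",
    "SELECT city FROM stores UNION ALL SELECT city FROM suppliers;")
    .getResults()[0];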
Examples
The following example retrieves all of the columns from the EMPLOYEE table where the last name is
"Smith":
SELECT * FROM employee WHERE lastname = 'Smith';
The following example retrieves selected columns for two tables at once, joined by the employee_id using
an implicit inner join and sorted by last name:
SELECT lastname, firstname, salary
FROM employee AS e, compensation AS c
WHERE e.employee_id = c.employee_id
ORDER BY lastname DESC;
The following example includes both a simple SQL query defined in the schema and a client application
to call the procedure repeatedly. This combination uses the LIMIT and OFFSET clauses to "page" through
a large table, 500 rows at a time.
When retrieving very large volumes of data, it is a good idea to use LIMIT and OFFSET to constrain the
amount of data in each transaction. However, to perform LIMIT OFFSET queries effectively, the database
must include a tree index that encompasses all of the columns of the ORDER BY clause (in this example,
the lastname and firstname columns).
Schema:
CREATE PROCEDURE EmpByLimit AS
SELECT lastname, firstname FROM employee
WHERE company = ?
ORDER BY lastname ASC, firstname ASC
LIMIT 500 OFFSET ?;
PARTITION PROCEDURE EmpByLimit ON TABLE Employee COLUMN Company;
Java Client Application:
long offset = 0;
String company = "ACME Explosives";
boolean alldone = false;
while ( !alldone ) {
    try {
        VoltTable results[] = client.callProcedure("EmpByLimit",
                                  company, offset).getResults();
        if (results[0].getRowCount() < 1) {
            // No more records.
            alldone = true;
        } else {
            // Do something with the results.
        }
    } catch (Exception e) {
        e.printStackTrace();
        System.exit(-1);
    }
    offset += 500;
}
TRUNCATE TABLE
TRUNCATE TABLE Deletes all records from the specified table.
Syntax
TRUNCATE TABLE table-name
Description
The TRUNCATE TABLE statement deletes all of the records from the specified table. TRUNCATE TABLE is the same as the statement DELETE FROM {table-name} with no selection clause. Both statements
are optimized to increase performance and reduce memory usage compared to an equivalent
DELETE statement containing a WHERE selection clause.
The following behavior is important to remember when using the TRUNCATE TABLE statement in VoltDB:
• Executing a TRUNCATE TABLE query on a partitioned table within a single-partitioned stored procedure will only delete the records within the current partition. Records in other partitions will be unaffected.
• You cannot execute a TRUNCATE TABLE query on a replicated table from within a single-partition stored procedure. To truncate a replicated table you must execute the query within a multi-partition stored procedure or as an ad hoc query, as in the sketch after this list.
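For illustration, a minimal sketch of truncating a replicated table as an ad hoc query from a Java client (assuming client is a connected org.voltdb.client.Client instance):

// Ad hoc queries run as multi-partition transactions, so this removes
// all rows of the replicated table in every partition.
client.callProcedure("@AdHoc", "TRUNCATE TABLE Current_standings;");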
Examples
The following example removes all data from the CURRENT_STANDINGS table:
TRUNCATE TABLE Current_standings;
UPDATE
UPDATE Updates the values within the specified columns and rows of the database.
Syntax
UPDATE table-name SET column-name = value-expression [, ...]
[WHERE [NOT] boolean-expression [ {AND | OR} [NOT] boolean-expression]...]
Description
The UPDATE statement changes the values of columns within the specified records. The following limitations are important to note when using the UPDATE statement with VoltDB:
• VoltDB supports the following arithmetic operators in expressions: addition (+), subtraction (-), multiplication (*), and division (/).
• The WHERE expression supports the boolean operators: equals (=), not equals (!= or <>), greater than (>), less than (<), greater than or equal to (>=), less than or equal to (<=), IS NULL, AND, OR, and NOT. Note, however, although OR is supported syntactically, VoltDB does not optimize these operations and use of OR may impact the performance of your queries.
Examples
The following example changes the ADDRESS column of the EMPLOYEE record with an employee ID
of 145303:
UPDATE employee
SET address = '49 Lavender Sweep'
WHERE employee_id = 145303;
The following example increases the starting price by 25% for all ITEM records with a category ID of 7:
UPDATE item SET startprice = startprice * 1.25 WHERE categoryid = 7;
UPSERT
UPSERT Either inserts new rows or updates existing rows depending on the primary key value.
Syntax
UPSERT INTO table-name [( column-name [,...] )] VALUES ( value-expression [,...] )
UPSERT INTO table-name [( column-name [,...] )] SELECT select-expression
Description
The UPSERT statement has the same syntax as the INSERT statement and will perform the same function,
assuming a record with a matching primary key does not already exist in the database. If such a record does
exist, UPSERT updates the existing record with the new column values. Note that the UPSERT statement
can only be executed on tables that have a primary key.
UPSERT has the same two forms as the INSERT statement: UPSERT INTO... VALUES and UPSERT
INTO... SELECT. The UPSERT statement also has constraints and limitations similar to those of the INSERT
statement with regard to joining partitioned tables and overly complex SELECT clauses. (See the description of the INSERT statement for details.)
However, UPSERT INTO... SELECT has an additional limitation: the SELECT statement must produce
deterministically ordered results. That is, the query must not only produce the same rows, but they must also
be in the same order to ensure the subsequent inserts and updates produce identical results.
Examples
The following examples use two tables, Employee and Manager, both of which define the column emp_id
as a primary key. In the first example, the UPSERT statement either creates a new row with the specified
values or updates an existing row with the primary key 145303.
UPSERT INTO employee (emp_id, lastname, firstname, title, department)
VALUES (145303, 'Public', 'Jane', 'Manager', 'HR');
The next example copies records from the Employee table to the Manager table, if the employee's title
is "Manager". Again, new records will be created or existing records updated depending on whether the
employee already has a record in the Manager table. Notice the use of the primary key in an ORDER BY
clause to ensure deterministic results from the SELECT statement.
UPSERT INTO Manager (emp_id, lastname, firstname, title, department)
  SELECT emp_id, lastname, firstname, title, department FROM Employee
  WHERE title='Manager' ORDER BY emp_id;
Bitwise Functions
BIT_SHIFT_LEFT()
BIT_SHIFT_RIGHT()
BITAND()
BITNOT()
BITOR()
BITXOR()
Aggregate Functions
APPROX_COUNT_DISTINCT()
AVG()
COUNT()
MAX()
MIN()
SUM()
Date and Time Functions
CURRENT_TIMESTAMP
DATEADD()
DAY(), DAYOFMONTH()
DAYOFWEEK()
DAYOFYEAR()
EXTRACT()
FROM_UNIXTIME()
HOUR()
MINUTE()
MONTH()
NOW
QUARTER()
SECOND()
SINCE_EPOCH()
TO_TIMESTAMP()
TRUNCATE()
WEEK(), WEEKOFYEAR()
WEEKDAY()
YEAR()
Geospatial Functions
AREA()
ASTEXT()
CENTROID()
CONTAINS()
DISTANCE()
DWITHIN()
ISINVALIDREASON()
ISVALID()
LATITUDE()
LONGITUDE()
NUMINTERIORRINGS()
NUMPOINTS()
POINTFROMTEXT()
POLYGONFROMTEXT()
VALIDPOLYGONFROMTEXT()
JSON Functions
ARRAY_ELEMENT()
ARRAY_LENGTH()
FIELD()
SET_FIELD()
Math Functions
ABS()
CEILING()
EXP()
FLOOR()
LN(), LOG()
MOD()
PI()
POWER()
SQRT()
String Functions
BIN()
CHAR()
CHAR_LENGTH()
CONCAT()
FORMAT_CURRENCY()
HEX()
LEFT()
LOWER()
OCTET_LENGTH()
OVERLAY()
POSITION()
REGEXP_POSITION()
REPEAT()
REPLACE()
RIGHT()
SPACE()
SUBSTRING()
TRIM()
UPPER()
ABS()
ABS() Returns the absolute value of a numeric expression.
Syntax
ABS( numeric-expression )
Description
The ABS() function returns the absolute value of the specified numeric expression.
Example
The following example sorts the results of a SELECT expression by its proximity to a target value (specified by a placeholder), using the ABS() function to normalize values both above and below the intended
target.
SELECT price, product_name FROM product_list
ORDER BY ABS(price - ?) ASC;
APPROX_COUNT_DISTINCT()
APPROX_COUNT_DISTINCT() Returns an approximate count of the number of distinct values for
the specified column expression.
Syntax
APPROX_COUNT_DISTINCT( column-expression )
Description
The APPROX_COUNT_DISTINCT() function returns an approximation of the number of distinct values
for the specified column expression. APPROX_COUNT_DISTINCT(column-expression) is an alternative
to the SQL expression "COUNT(DISTINCT column-expression)".
The reason for using APPROX_COUNT_DISTINCT() is that it can be significantly faster and use
less temporary memory than performing a precise COUNT DISTINCT operation. This is particularly true
when calculating a distinct count of a partitioned table across all of the partitions. The approximation
usually falls within 1% of the actual count.
You can use the APPROX_COUNT_DISTINCT() function on column expressions of decimal, timestamp,
or any size integer datatype. You cannot use the function on floating point (FLOAT) or variable length
(VARCHAR and VARBINARY) columns.
Example
The following example returns an approximation of the number of distinct products available in each store.
SELECT store, APPROX_COUNT_DISTINCT(product_id) FROM catalog
GROUP BY store ORDER BY store;
AREA()
AREA() Returns the area of a polygon in square meters.
Syntax
AREA( polygon )
Description
The AREA() function returns the area of a GEOGRAPHY value in square meters. The area is the total area
of the outer ring minus the area of any inner rings within the polygon. The area is returned as a FLOAT
value.
Example
The following example calculates the sum of the areas of multiple polygons representing fields on a farm.
SELECT farmer, SUM(AREA(field)) FROM farm
WHERE farmer = 'Old MacDonald' GROUP BY farmer;
ARRAY_ELEMENT()
ARRAY_ELEMENT() Returns the element at the specified location in a JSON array.
Syntax
ARRAY_ELEMENT( JSON-array, element-position )
Description
The ARRAY_ELEMENT() function extracts a single element from a JSON array. The array position is
zero-based. In other words, the first element in the array is in position "0". The function returns the element
as a string. For example, the following function invocation returns the string "two":
ARRAY_ELEMENT('["zero","one","two","three"]',2)
Note that the array element is always returned as a string. So in the following example, the function returns
"2" as a string rather than an integer:
ARRAY_ELEMENT('[0,1,2,3]',2)
Finally, the element may itself be a valid JSON-encoded object. For example, the following function
returns the string "[0,1,2,3]":
ARRAY_ELEMENT('[[0,1,2,3],["zero","one","two","three"]]',0)
The ARRAY_ELEMENT() function can be combined with other functions, such as FIELD(), to traverse
more complex JSON structures. The function returns a NULL value if any of the following conditions
are true:
• The position argument is less than zero
• The position argument is greater than or equal to the length of the array
• The JSON string does not represent an array (that is, the string is a valid JSON scalar value or object)
The function returns an error if the first argument is not a valid JSON string.
Example
The following example uses the ARRAY_ELEMENT() function along with FIELD() to extract specific
array elements from one field in a JSON-encoded VARCHAR column:
SELECT language,
ARRAY_ELEMENT(FIELD(words,'colors'),1) AS color,
ARRAY_ELEMENT(FIELD(words,'numbers'),2) AS number
FROM world_languages WHERE language = 'French';
Assuming the column words has the following structure, the query returns the strings "French", "vert",
and "trois".
{"colors":["rouge","vert","bleu"],
"numbers":["un","deux","trois"]}
ARRAY_LENGTH()
ARRAY_LENGTH() Returns the number of elements in a JSON array.
Syntax
ARRAY_LENGTH( JSON-array )
Description
The ARRAY_LENGTH() function returns the length of a JSON array; that is, the number of elements the array
contains. The length is returned as an integer.
The ARRAY_LENGTH() function can be combined with other functions, such as FIELD(), to traverse
more complex JSON structures.
The function returns NULL if the argument is a valid JSON string but does not represent an array. The
function returns an error if the argument is not a valid JSON string.
Example
The following example uses the ARRAY_LENGTH(), ARRAY_ELEMENT(), and FIELD() functions to
return the last element of an array in a larger JSON string. The functions perform the following actions:
• Innermost, the FIELD() function extracts the JSON field "alerts", which is assumed to be an array, from the column messages.
• ARRAY_LENGTH() determines the number of elements in the array.
• ARRAY_ELEMENT() returns the last element based on the value of ARRAY_LENGTH() minus one (because the array positions are zero-based).
SELECT ARRAY_ELEMENT(FIELD(messages,'alerts'),
ARRAY_LENGTH(FIELD(messages,'alerts'))-1) AS last_alert,
station FROM reportlog
WHERE station=?;
ASTEXT()
ASTEXT() Returns the Well Known Text (WKT) representation of a GEOGRAPHY or
GEOGRAPHY_POINT value.
Syntax
ASTEXT( polygon | point )
Description
The ASTEXT() function returns a text string containing a Well Known Text (WKT) representation of a
GEOGRAPHY or GEOGRAPHY_POINT value. ASTEXT( value ) produces the same results as calling
CAST( value AS VARCHAR).
Note that ASTEXT() does not return the identical text string that was originally input using POINTFROMTEXT() or POLYGONFROMTEXT(). When geospatial data is converted from WKT to its internal representation, the string representations of longitude and latitude are converted to double floating point values.
Rounding and differing levels of precision may result in small differences in the stored values. The use
of spaces and capitalization may also vary between the original input strings and the computed output of
the ASTEXT() function.
Examples
The following SELECT statement uses the ASTEXT() function to return the WKT representation of a
GEOGRAPHY_POINT value in the column location.
SELECT name, ASTEXT(location) FROM city
WHERE state = 'NY' ORDER BY name;
AVG()
AVG() Returns the average of a range of numeric column values.
Syntax
AVG( column-expression )
Description
The AVG() function returns the average of a range of numeric column values. The values being averaged
depend on the constraints defined by the WHERE and GROUP BY clauses.
Example
The following example returns the average price for each product category.
SELECT AVG(price), category FROM product_list
GROUP BY category ORDER BY category;
BIN()
BIN() Returns the binary representation of a BIGINT value as a string.
Syntax
BIN( value )
Description
The BIN() function returns the binary representation of a BIGINT value as a string. The function
returns the shortest valid string representation, truncating any leading zeros (except in the case of the
value zero, which is returned as the string "0").
Example
The following example uses the BIN() and BITAND() functions to return the binary representations of two
BIGINT values and their binary intersection.
$ sqlcmd
1> create table bits (a bigint, b bigint);
2> insert into bits values(55,99);
3> select bin(a) as int1, bin(b) as int2,
4>        bin(bitand(a,b)) as intersection from bits;
INT1       INT2       INTERSECTION
---------- ---------- ------------
110111     1100011    100011
BIT_SHIFT_LEFT()
BIT_SHIFT_LEFT() Shifts the bits of a BIGINT value to the left a specified number of places.
Syntax
BIT_SHIFT_LEFT( value, offset )
Description
The BIT_SHIFT_LEFT() function shifts the bit values of a BIGINT value to the left the number of places
specified by offset. The offset must be a positive integer value. The unspecified bits to the right are padded
with zeros. So, for example, if the offset is 5, the left-most 5 bits are dropped, the remaining bits are shifted
5 places to the left, and the right-most 5 bits are set to zero. The result is returned as a new BIGINT value;
the arguments to the function are not modified.
The left-most bit of an integer number is the sign bit, but has no special meaning for bitwise operations.
However, the left-most bit set to 1 followed by all zeros is reserved as the NULL value. If you use a
NULL value as an argument, you will receive a NULL response. But in all other circumstances (using
non-NULL BIGINT arguments), the bitwise functions should never return a NULL result. Consequently,
any bitwise operation that would result in only the left-most bit being set will generate an error at runtime.
Examples
The following example shifts the bits in a BIGINT value three places to the left and displays the hexadecimal representation of both the initial value and the resulting value.
$ sqlcmd
1> create table bits (a bigint);
2> insert into bits values (112);
3> select hex(a), hex(bit_shift_left(a,3)) from bits;
C1        C2
--------- ---------
70        380
BIT_SHIFT_RIGHT()
BIT_SHIFT_RIGHT() Shifts the bits of a BIGINT value to the right a specified number of places.
Syntax
BIT_SHIFT_RIGHT( value, offset )
Description
The BIT_SHIFT_RIGHT() function shifts the bit values of a BIGINT value to the right the number of
places specified by offset. The offset must be a positive integer value. The unspecified bits to the left are
padded with zeros. So, for example, if the offset is 5, the right-most 5 bits are dropped, the remaining bits
are shifted 5 places to the right, and the left-most 5 bits are set to zero. The result is returned as a new
BIGINT value; the arguments to the function are not modified.
The left-most bit of an integer number is the sign bit, but has no special meaning for bitwise operations.
However, the left-most bit set to 1 followed by all zeros is reserved as the NULL value. If you use a
NULL value as an argument, you will receive a NULL response. But in all other circumstances (using
non-NULL BIGINT arguments), the bitwise functions should never return a NULL result. Consequently,
any bitwise operation that would result in only the left-most bit being set will generate an error at runtime.
Examples
The following example shifts the bits in a BIGINT value three places to the right and displays the hexadecimal representation of both the initial value and the resulting value.
$ sqlcmd
1> create table bits (a bigint);
2> insert into bits values (112);
3> select hex(a), hex(bit_shift_right(a,3)) from bits;
C1        C2
--------- ---------
70        E
BITAND()
BITAND() Returns the mask of bits set in both of two BIGINT values
Syntax
BITAND( value, value )
Description
The BITAND() function returns the mask of bits set in both of two BIGINT integers. In other words, it
performs a bitwise AND operation on the two arguments. The result is returned as a new BIGINT value;
the arguments to the function are not modified.
The left-most bit of an integer number is the sign bit, but has no special meaning for bitwise operations.
However, the left-most bit set to 1 followed by all zeros is reserved as the NULL value. If you use a
NULL value as an argument, you will receive a NULL response. But in all other circumstances (using
non-NULL BIGINT arguments), the bitwise functions should never return a NULL result. Consequently,
any bitwise operation that would result in only the left-most bit being set will generate an error at runtime.
Examples
The following example writes values into two BIGINT columns of the table bits and then returns the
bitwise AND of the columns:
$ sqlcmd
1> create table bits (a bigint, b bigint);
2> insert into bits (a,b) values (7,13);
3> select bitand(a,b) from bits;
C1
---
5
BITNOT()
BITNOT() Returns the mask reversing every bit of a BIGINT value.
Syntax
BITNOT( value )
Description
The BITNOT() function returns the mask reversing every bit in a BIGINT value. In other words, it performs
a bitwise NOT operation, returning the complement of the argument. The result is returned as a new
BIGINT value; the argument to the function is not modified.
The left-most bit of an integer number is the sign bit, but has no special meaning for bitwise operations.
However, the left-most bit set to 1 followed by all zeros is reserved as the NULL value. If you use a
NULL value as an argument, you will receive a NULL response. But in all other circumstances (using
non-NULL BIGINT arguments), the bitwise functions should never return a NULL result. Consequently,
any bitwise operation that would result in only the left-most bit being set will generate an error at runtime.
Examples
The following example writes a value into a BIGINT column of the table bits and then returns the bitwise
NOT of the column:
$ sqlcmd
1> create table bits (a bigint);
2> insert into bits (a) values (1234567890);
3> select bitnot(a) from bits;
C1
-----------
-1234567891
BITOR()
BITOR() Returns the mask of bits set in either of two BIGINT values
Syntax
BITOR( value, value )
Description
The BITOR() function returns the mask of bits set in either of two BIGINT integers. In other words, it
performs a bitwise OR operation on the two arguments. The result is returned as a new BIGINT value;
the arguments to the function are not modified.
The left-most bit of an integer number is the sign bit, but has no special meaning for bitwise operations.
However, the left-most bit set to 1 followed by all zeros is reserved as the NULL value. If you use a
NULL value as an argument, you will receive a NULL response. But in all other circumstances (using
non-NULL BIGINT arguments), the bitwise functions should never return a NULL result. Consequently,
any bitwise operation that would result in only the left-most bit being set will generate an error at runtime.
Examples
The following example writes values into two BIGINT columns of the table bits and then returns the
bitwise OR of the columns:
$ sqlcmd
1> create table bits (a bigint, b bigint);
2> insert into bits (a,b) values (7,13);
3> select bitor(a,b) from bits;
C1
---
15
BITXOR()
BITXOR() Returns the mask of bits set in one but not both of two BIGINT values
Syntax
BITXOR( value, value )
Description
The BITXOR() function returns the mask of bits set in one but not both of two BIGINT integers. In other
words, it performs a bitwise XOR operation on the two arguments. The result is returned as a new BIGINT
value; the arguments to the function are not modified.
The left-most bit of an integer number is the sign bit, but has no special meaning for bitwise operations.
However, the left-most bit set to 1 followed by all zeros is reserved as the NULL value. If you use a
NULL value as an argument, you will receive a NULL response. But in all other circumstances (using
non-NULL BIGINT arguments), the bitwise functions should never return a NULL result. Consequently,
any bitwise operation that would result in only the left-most bit being set will generate an error at runtime.
Examples
The following example writes values into two BIGINT columns of the table bits and then returns the
bitwise XOR of the columns:
$ sqlcmd
1> create table bits (a bigint, b bigint);
2> insert into bits (a,b) values (7,13);
3> select bitxor(a,b) from bits;
C1
---
10
CAST()
CAST() Explicitly converts an expression to the specified datatype.
Syntax
CAST( expression AS datatype )
Description
The CAST() function converts an expression to a specified datatype. Cases where casting is beneficial
include when converting between numeric types (such as integer and float) or when converting a numeric
value to a string.
All numeric datatypes can be used as the source and numeric or string datatypes can be the target. When
converting from decimal values to integers, values are truncated. You can also cast from a TIMESTAMP
to a VARCHAR or from a VARCHAR to a TIMESTAMP, assuming the text string is formatted as YYYY-MM-DD or YYYY-MM-DD HH:MM:SS.nnnnnnn. If the runtime value cannot be converted (for example, the value exceeds the maximum allowable value of the target datatype), an error is thrown.
You cannot use VARBINARY as either the target or the source datatype. To convert between numeric and
TIMESTAMP values, use the TO_TIMESTAMP(), FROM_UNIXTIME(), and EXTRACT() functions.
The result of the CAST() function of a null value is the corresponding null in the target datatype.
Example
The following example uses the CAST() function to ensure the result of an expression is also a floating
point number and does not truncate the decimal portion.
SELECT contestant, CAST( (votes * 100) as FLOAT) / ? as percentage
FROM contest ORDER BY votes, contestant
CEILING()
CEILING() Returns the smallest integer value greater than or equal to a numeric expression.
Syntax
CEILING( numeric-expression )
Description
The CEILING() function returns the next integer greater than or equal to the specified numeric expression.
In other words, the CEILING() function "rounds up" numeric values. For example:
CEILING(3.1415) = 4
CEILING(2.0) = 2
CEILING(-5.32) = -5
Example
The following example uses the CEILING function to calculate the shipping costs for a product based on
its weight in the next whole number of pounds.
SELECT shipping.cost_per_lb * CEILING(product.weight),
product.prod_id FROM product, shipping
ORDER BY product.prod_id;
CENTROID()
CENTROID() Returns the central point of a polygon.
Syntax
CENTROID( polygon )
Description
The CENTROID() function returns the central point of a GEOGRAPHY polygon. The centroid is the point where
any line passing through the centroid divides the polygon into two segments of equal area. The return value
of the CENTROID() function is a GEOGRAPHY_POINT value.
Note that the centroid may fall outside of the polygon itself; for example, if the polygon is a ring (that is,
a circle with an inner circle removed) or a horseshoe shape.
Example
The following example uses the CENTROID() and LATITUDE() functions to return a list of countries
where the majority of the land mass falls above the equator.
SELECT name, capital FROM country
WHERE LATITUDE(CENTROID(outline)) > 0
ORDER BY name, capital;
CHAR()
CHAR() Returns a string with a single UTF-8 character associated with the specified character code.
Syntax
CHAR( integer )
Description
The CHAR() function returns a string containing a single UTF-8 character that matches the specified
UNICODE character code. One use of the CHAR() function is to insert non-printing and other hard to
enter characters into string expressions.
Example
The following example uses CHAR() to add a copyright symbol into a VARCHAR field.
UPDATE book SET copyright_notice= CHAR(169) || CAST(? AS VARCHAR)
WHERE isbn=?;
CHAR_LENGTH()
CHAR_LENGTH() Returns the number of characters in a string.
Syntax
CHAR_LENGTH( string-expression )
Description
The CHAR_LENGTH() function returns the number of text characters in a string.
Note that the number of characters and the amount of physical space required to store those characters can
differ. To measure the length of the string, in bytes, use the OCTET_LENGTH() function.
Example
The following example returns the string in the column LastName as well as the number of characters and
length in bytes of that string.
SELECT LastName, CHAR_LENGTH(LastName), OCTET_LENGTH(LastName)
FROM Customers ORDER BY LastName, FirstName;
COALESCE()
COALESCE() Returns the first non-null argument, or null.
Syntax
COALESCE( expression [, ... ] )
Description
The COALESCE() function takes multiple arguments and returns the value of the first argument that is
not null; if all arguments are null, the function returns null.
Examples
The following example uses COALESCE to perform two functions:
• Replace possibly null column values with placeholder text
• Return one of several column values
In the second usage, the SELECT statement returns the value of the column State, Province, or Territory,
whichever is the first to contain a non-null value, or returns null if none of the columns is non-null.
SELECT lastname, firstname,
COALESCE(address,'[address unknown]'),
COALESCE(state, province, territory),
country FROM users ORDER BY lastname;
CONCAT()
CONCAT() Concatenates two or more strings and returns the result.
Syntax
CONCAT( string-expression { , ... } )
Description
The CONCAT() function concatenates two or more strings and returns the resulting string. The string
concatenation operator || performs the same function as CONCAT().
Example
The following example concatenates the contents of two columns as part of a SELECT expression.
SELECT price, CONCAT(category,part_name) AS full_part_name
FROM product_list ORDER BY price;
The next example does something similar but uses the || operator as a shorthand to concatenate three strings
(two columns and a string constant) as part of a SELECT expression.
SELECT lastname || ', ' || firstname AS full_name
FROM customers ORDER BY lastname, firstname;
CONTAINS()
CONTAINS() Returns true or false depending on whether a point falls within the specified polygon.
Syntax
CONTAINS( polygon, point )
Description
The CONTAINS() function determines if a given point falls within the specified GEOGRAPHY polygon.
If so, the function returns a boolean value of true. If not, it returns false.
Example
The following example uses the CONTAINS() function to see if a specific user is within the boundaries of a
city by evaluating whether the user.location GEOGRAPHY_POINT column value falls within the polygon
defined by the city.boundary GEOGRAPHY column.
SELECT user.name, user.id, city.name FROM user, city
WHERE user.id = ? AND CONTAINS(city.boundary,user.location);
COUNT()
COUNT() Returns the number of rows selected containing the specified column.
Syntax
COUNT( column-expression )
Description
The COUNT() function returns the number of rows selected for the specified column. Since the actual
value of the column is not used to calculate the count, you can use the asterisk (*) as a wildcard for any
column. For example, the query SELECT COUNT(*) FROM widgets returns the number of rows in
the table widgets, without needing to know what columns the table contains.
The one case where the column name is significant is if you use the DISTINCT clause to constrain the
selection expression. For example, SELECT COUNT(DISTINCT last_name) FROM customer
returns the count of unique last names in the customer table.
Examples
The following example returns the number of rows where the product name starts with the capital letter A.
SELECT COUNT(*) FROM product_list
WHERE product_name LIKE 'A%';
The next example returns the total number of unique product categories in the product list.
SELECT COUNT(DISTINCT category) FROM product_list;
CURRENT_TIMESTAMP
CURRENT_TIMESTAMP Returns the current time as a timestamp value.
Syntax
CURRENT_TIMESTAMP
Description
The CURRENT_TIMESTAMP function returns the current time as a VoltDB timestamp. The value of the
timestamp is determined when the query or stored procedure is invoked. Several important aspects of how
the CURRENT_TIMESTAMP function operates are:
• The value returned is guaranteed to be identical for all partitions that execute the query.
• The value returned is measured in milliseconds then padded to create a timestamp value in microseconds.
• During command logging, the returned value is stored as part of the log, so when the command log is replayed, the same value is used during the replay of the query.
• Similarly, for database replication (DR) the value returned is passed and reused by the replica database when replaying the query.
• You can specify CURRENT_TIMESTAMP as a default value in the CREATE TABLE statement when defining the schema of a VoltDB database.
• The CURRENT_TIMESTAMP function cannot be used in the CREATE INDEX or CREATE VIEW statements.
The NOW and CURRENT_TIMESTAMP functions are synonyms and perform an identical function.
Example
The following example uses CURRENT_TIMESTAMP in the WHERE clause to delete alert events that
occurred in the past:
DELETE FROM Alert_event WHERE event_timestamp < CURRENT_TIMESTAMP;
DATEADD()
DATEADD() Returns a new timestamp value by adding a specified time interval to an existing timestamp value.
Syntax
DATEADD( time-unit, interval, timestamp )
Description
The DATEADD() function creates a new TIMESTAMP value by adding (or subtracting for negative values) the specified time interval from another TIMESTAMP value. The first argument specifies the time
unit of the interval. The valid time unit keywords are YEAR, QUARTER, MONTH, DAY, HOUR, MINUTE, SECOND, MILLISECOND (or MILLIS), and MICROSECOND (or MICROS).
The second argument is an integer value specifying the interval to add to the TIMESTAMP value. A
positive interval moves the time ahead. A negative interval moves the time value backwards. The third
argument specifies the TIMESTAMP value to which the interval is applied.
The DATEADD function takes into account leap years and the variable number of days in a month. Therefore, if the year of either the specified timestamp or the resulting timestamp is a leap year, the day is adjusted to its correct value. For example, DATEADD(YEAR, 1, 2008-02-29) returns 2009-02-28. Similarly, if the original timestamp is the last day of a month, then the resulting timestamp will be adjusted as
necessary. For example, DATEADD(MONTH, 1, 2008-03-31) returns 2008-04-30.
Example
The following example uses the DATEADD() function to find all records where the TIMESTAMP column,
incident, occurs within one day before a specified timestamp (entered as a POSIX time value).
SELECT incident, description FROM securityLog
WHERE DATEADD(DAY, 1, incident) > FROM_UNIXTIME(?)
AND incident < FROM_UNIXTIME(?)
ORDER BY incident, description;
DAY(), DAYOFMONTH()
DAY(), DAYOFMONTH() Returns the day of the month as an integer value.
Syntax
DAY( timestamp-value )
DAYOFMONTH( timestamp-value )
Description
The DAY() function returns an integer value between 1 and 31 representing the timestamp's day of the
month. The DAY() and DAYOFMONTH() functions are synonyms. These functions produce the same
result as using the DAY or DAY_OF_MONTH keywords with the EXTRACT() function.
Examples
The following example uses the DAY(), MONTH(), and YEAR() functions to return a timestamp column
as a formatted date string.
SELECT eventtime,
  CAST(MONTH(eventtime) AS VARCHAR) || '/' ||
  CAST(DAY(eventtime) AS VARCHAR) || '/' ||
  CAST(YEAR(eventtime) AS VARCHAR) AS eventdate
FROM event ORDER BY eventtime;
DAYOFWEEK()
DAYOFWEEK() Returns the day of the week as an integer between 1 and 7.
Syntax
DAYOFWEEK( timestamp-value )
Description
The DAYOFWEEK() function returns an integer value between 1 and 7 representing the day of the week
in a timestamp value. For the DAYOFWEEK() function, the week starts (1) on Sunday and ends (7)
on Saturday.
This function produces the same result as using the DAY_OF_WEEK keyword with the EXTRACT()
function.
Examples
The following example uses DAYOFWEEK() and the DECODE() function to return a string value representing the day of the week for the specified TIMESTAMP value.
SELECT eventtime,
DECODE(DAYOFWEEK(eventtime),
1, 'Sunday',
2, 'Monday',
3, 'Tuesday',
4, 'Wednesday',
5, 'Thursday',
6, 'Friday',
7, 'Saturday') AS eventday
FROM event ORDER BY eventtime;
DAYOFYEAR()
DAYOFYEAR() Returns the day of the year as an integer between 1 and 366.
Syntax
DAYOFYEAR( timestamp-value )
Description
The DAYOFYEAR() function returns an integer value between 1 and 366 representing the day of the year
of a timestamp value. This function produces the same result as using the DAY_OF_YEAR keyword with
the EXTRACT() function.
Examples
The following example uses the DAYOFYEAR() function to determine the number of days until an event
occurs.
SELECT DECODE(YEAR(NOW), YEAR(starttime),
CAST(DAYOFYEAR(starttime) - DAYOFYEAR(NOW) AS VARCHAR)
|| ' days remaining',
CAST(YEAR(starttime) - YEAR(NOW) AS VARCHAR)
|| ' years remaining'),
eventname FROM event;
DECODE()
DECODE() Evaluates an expression against one or more alternatives and returns the matching response.
Syntax
DECODE( expression, { comparison-value, result } [,...] [,default-result] )
Description
The DECODE() function compares an expression against one or more possible comparison values. If the
expression matches the comparison-value, the associated result is returned. If the expression does not
match any of the comparison values, the default-result is returned. If the expression does not match any
comparison value and no default result is specified, the function returns NULL.
The DECODE() function operates the same way an IF-THEN-ELSE or CASE statement does in other
languages.
Example
The following example uses the DECODE() function to interpret a coded data column and replace it with
the appropriate meaning for each code.
SELECT title, industry, DECODE(salary_range,
'A', 'under $25,000',
'B', '$25,000 - $34,999',
'C', '$35,000 - $49,999',
'D', '$50,000 - $74,999',
'E', '$75,000 - $99,999',
'F', '$100,000 and over',
'unspecified') from survey_results
order by industry, title;
The next example tests a value against three columns and returns the name of the column when a match
is found, or a message indicating no match if none is found.
SELECT product_name, DECODE(?,product_name,'PRODUCT NAME',
part_name, 'PART NAME',
category, 'CATEGORY',
'NO MATCH FOUND')
FROM product_list ORDER BY product_name;
DISTANCE()
DISTANCE() Returns the distance between two points or a point and a polygon.
Syntax
DISTANCE( point-or-polygon, point-or-polygon )
Description
The DISTANCE() function returns the distance, measured in meters, between two points or a point
and a polygon. The arguments to the function can be either two GEOGRAPHY_POINT values or a
GEOGRAPHY_POINT and GEOGRAPHY value.
Examples
The following example finds the closest city to a specified user, using the GEOGRAPHY_POINT column
user.location and the GEOGRAPHY column city.boundary.
SELECT TOP 1 user.name, city.name,
DISTANCE(user.location, city.boundary)
FROM user, city WHERE user.id = ?
ORDER BY DISTANCE(user.location, city.boundary) ASC;
The next example finds the distance in kilometers from a truck to stores, listed in order with closest first,
using the two GEOGRAPHY_POINT columns truck.loc and store.loc.
SELECT store.address,
DISTANCE(store.loc,truck.loc) / 1000 AS distance
FROM store, truck WHERE truck.id = ?
ORDER BY DISTANCE(store.loc,truck.loc)/1000 ASC;
DWITHIN()
DWITHIN() Returns true or false depending on whether two geospatial entities are within a specified
distance of each other.
Syntax
DWITHIN( polygon-or-point, polygon-or-point, distance )
Description
The DWITHIN() function determines if two geospatial values are within the specified distance of each
other. The values can be two points (GEOGRAPHY_POINT) or a point and a polygon (GEOGRAPHY).
The maximum distance is specified as a numeric value measured in meters. If the distance between the
two geospatial values is less than or equal to the specified distance, the function returns true. If not, it
returns false.
Examples
The following example finds all the cities within five kilometers of a given user, by evaluating the distance
between the GEOGRAPHY_POINT column user.loc and the GEOGRAPHY column city.boundary.
SELECT user.name, city.name, DISTANCE(user.loc, city.boundary)
FROM user, city WHERE user.id=?
AND DWITHIN(user.loc, city.boundary, 5000)
ORDER BY DISTANCE(user.loc, city.boundary) ASC;
The next is a more generalized example, where the query returns all delivery trucks within a specified
distance of a store, where both the distance and the store ID are parameterized and can be input at runtime.
SELECT store.address, truck.license_number,
DISTANCE(store.loc, truck.loc)/1000 AS distance_in_km
FROM store, truck WHERE DWITHIN(store.loc, truck.loc, ?) and store.id=?
ORDER BY DISTANCE(store.loc,truck.loc)/1000 ASC;
EXP()
EXP() Returns the exponential of the specified numeric expression.
Syntax
EXP( numeric-expression )
Description
The EXP() function returns the exponential of the specified numeric expression. In other words, EXP(x)
is the equivalent of the mathematical expression e^x.
Example
The following example uses the EXP function to calculate the potential population of certain species of
animal projecting out ten years.
SELECT species, population AS current,
(population/2) * EXP(10*(gestation/365)*litter) AS future
FROM animals
WHERE species = 'rabbit'
ORDER BY population;
EXTRACT()
EXTRACT() Returns the value of a selected portion of a timestamp.
Syntax
EXTRACT( selection-keyword FROM timestamp-expression )
EXTRACT( selection-keyword, timestamp-expression )
Description
The EXTRACT() function returns the value of the selected portion of a timestamp. Table C.1, Selectable
Values for the EXTRACT Function lists the supported keywords, the datatype of the value returned by
the function, and a description of its contents.
Keyword          Datatype   Description
YEAR             INTEGER    The year as a numeric value.
QUARTER          TINYINT    The quarter of the year as a numeric value between 1 and 4.
MONTH            TINYINT    The month of the year as a numeric value between 1 and 12.
DAY              TINYINT    The day of the month as a numeric value between 1 and 31.
DAY_OF_MONTH     TINYINT    The day of the month as a numeric value between 1 and 31 (same as DAY).
DAY_OF_WEEK      TINYINT    The day of the week as a numeric value between 1 and 7, starting with Sunday.
DAY_OF_YEAR      SMALLINT   The day of the year as a numeric value between 1 and 366.
WEEK             TINYINT    The week of the year as a numeric value between 1 and 52.
WEEK_OF_YEAR     TINYINT    The week of the year as a numeric value between 1 and 52 (same as WEEK).
WEEKDAY          TINYINT    The day of the week as a numeric value between 0 and 6, starting with Monday.
HOUR             TINYINT    The hour of the day as a numeric value between 0 and 23.
MINUTE           TINYINT    The minute of the hour as a numeric value between 0 and 59.
SECOND           DECIMAL    The whole and fractional number of seconds in the minute as a value between 0 and 60.
The timestamp expression is interpreted as a VoltDB timestamp; that is, time measured in microseconds.
Example
The following example lists all the contacts by name and birthday, listing the birthday as three separate
fields for month, day, and year.
SELECT Last_name, first_name, EXTRACT(MONTH FROM dateofbirth),
    EXTRACT(DAY FROM dateofbirth), EXTRACT(YEAR FROM dateofbirth)
FROM contacts ORDER BY last_name, first_name;
FIELD()
FIELD() Extracts a field value from a JSON-encoded string column.
Syntax
FIELD( column, field-name-path )
Description
The FIELD() function extracts a field value from a JSON-encoded string. For example, assume the VARCHAR column Profile contains the following JSON string:
{"first":"Charles","last":"Dickens","birth":1812,
"description":{"genre":"fiction",
"period":"Victorian",
"output":"prolific",
"children":["Charles","Mary","Kate","Walter","Francis",
"Alfred","Sydney","Henry","Dora","Edward"]
}
}
It is possible to extract individual field values using the FIELD() function, as in the following SELECT
statement:
SELECT FIELD(profile,'first') AS firstname,
FIELD(profile,'last') AS lastname FROM Authors;
It is also possible to find records based on individual JSON fields by using the FIELD() function in the
WHERE clause. For example, the following query retrieves all records from the Authors table where the
JSON field birth is 1812. Note that the FIELD() function always returns a string, even if the JSON type is
numeric. The comparison must match the string datatype, so the constant '1812' is in quotation marks:
SELECT * FROM Authors WHERE FIELD(profile,'birth') = '1812';
The second argument to the FIELD() function can be a simple field name, as in the previous examples,
in which case the function returns a first-level field matching the specified name. Alternately, you can
specify a path representing a hierarchy of names separated by periods. For example, you can specify the
genre element of the description field by specifying "description.genre" as the second argument, like so:
SELECT * FROM Authors WHERE
FIELD(profile,'description.genre') = 'fiction';
You can also use array notation with square brackets and an integer value to identify array elements
by their position. So, for example, the function can return "Kate", the third child, by using the path specifier "description.children[2]", where "[2]" identifies the third array element because JSON arrays are zero-based.
Two important points to note concerning input to the FIELD() function:
- If the requested field name does not exist, the function returns a null value.
- The first argument to the FIELD() function must be a valid JSON-encoded string. However, the content
  is not evaluated until the function is invoked at runtime. Therefore, it is the responsibility of the
  database application to ensure the validity of the content. If the FIELD() function encounters invalid
  content, the query will fail.
Example
The following example uses the FIELD() function to both return specific JSON fields within a VARCHAR
column and filter the results based on the value of a third JSON field:
SELECT product_name, sku,
FIELD(specification,'color') AS color,
FIELD(specification,'weight') AS weight FROM Inventory
WHERE FIELD(specification, 'category') = 'housewares'
ORDER BY product_name, sku;
FLOOR()
FLOOR() Returns the largest integer value less than or equal to a numeric expression.
Syntax
FLOOR( numeric-expression )
Description
The FLOOR() function returns the largest integer less than or equal to the specified numeric expression.
In other words, the FLOOR() function truncates fractional numeric values. For example:
FLOOR(3.1415) = 3
FLOOR(2.0) = 2
FLOOR(-5.32) = -6
Example
The following example uses the FLOOR function to calculate the whole number of stocks owned by a
specific shareholder.
SELECT customer, company,
FLOOR(num_of_stocks) AS stocks_available_for_sale
FROM shareholders WHERE customer_id = ?
ORDER BY company;
FORMAT_CURRENCY()
FORMAT_CURRENCY() Converts a DECIMAL to a text string as a monetary value.
Syntax
FORMAT_CURRENCY( decimal-value, rounding-position )
Description
The FORMAT_CURRENCY() function converts a DECIMAL value to its string representation, rounding
to the specified position. The resulting string is formatted with commas separating every three digits of
the whole portion of the number (indicating thousands, millions, and so on) and a decimal point before
the fractional portion, as needed.
The rounding-position argument must be an integer between 12 and -25 and indicates the place to which the
numeric value should be rounded. Positive values indicate a decimal place; for example 2 means round to
2 decimal places. Negative values indicate rounding to a whole number position; for example, -2 indicates
the number should be rounded to the nearest hundred. A zero indicates that the value should be rounded
to the nearest whole number.
Rounding is performed using "banker's rounding", in that any fractional half is rounded to the nearest even
number. So, for example, if the rounding-position is 2, the value 22.225 is rounded to 22.22, but the value
33.335 is rounded to 33.34. The following list demonstrates some sample results.
FORMAT_CURRENCY( .123456789, 4) = 0.1235
FORMAT_CURRENCY( 123456789.123, 2 ) = 123,456,789.12
FORMAT_CURRENCY( 123456789.123, 0 ) = 123,456,789
FORMAT_CURRENCY( 123456789.123, -2 ) = 123,456,800
FORMAT_CURRENCY( 123456789.123, -6 ) = 123,000,000
FORMAT_CURRENCY( 123456789.123, 6 ) = 123,456,789.123000
Example
The following example uses the FORMAT_CURRENCY() function to return a DECIMAL column as a
string representation of its monetary value, rounding to two decimal places and appending the appropriate
currency symbol from a VARCHAR column.
SELECT country,
currency_symbol || format_currency(budget,2) AS annual_budget
FROM world_economy ORDER BY country;
FROM_UNIXTIME()
FROM_UNIXTIME() Converts a UNIX time value to a VoltDB timestamp.
Syntax
FROM_UNIXTIME( integer-expression )
Description
The FROM_UNIXTIME() function converts an integer expression to a VoltDB timestamp, interpreting
the integer value as a POSIX time value; that is, the number of seconds since the epoch (00:00:00 on
January 1, 1970, Coordinated Universal Time). This function is a synonym for TO_TIMESTAMP(second,
integer-expression).
Example
The following example inserts a record using FROM_UNIXTIME to convert the first argument, a POSIX
time value, into a VoltDB timestamp:
INSERT INTO event (e_when, e_what, e_where)
VALUES (FROM_UNIXTIME(?),?,?);
HEX()
HEX() Returns the hexadecimal representation of a BIGINT value as a string.
Syntax
HEX( value )
Description
The HEX() function returns the hexadecimal representation of a BIGINT value as a string. The function
will return the shortest valid string representation, truncating any preceding zeros (except in the case of
the value zero, which is returned as the string "0").
Examples
The following example uses the HEX and BITAND functions to return the hexadecimal representations of
two BIGINT values and their binary intersection.
$ sqlcmd
1> create table bits (a bigint, b bigint);
2> insert into bits values(555,999);
3> select hex(a) as int1, hex(b) as int2,
4>        hex(bitand(a,b)) as intersection from bits;
INT1      INT2      INTERSECTION
--------- --------- -------------
22B       3E7       223
HOUR()
HOUR() Returns the hour of the day as an integer value.
Syntax
HOUR( timestamp-value )
Description
The HOUR() function returns an integer value between 0 and 23 representing the hour of the day in a timestamp value. This function produces the same result as using the HOUR keyword with the EXTRACT()
function.
Examples
The following example uses the HOUR(), MINUTE(), and SECOND() functions to return the time portion
of a TIMESTAMP value in a formatted string.
SELECT eventname,
CAST(HOUR(starttime) AS VARCHAR) || ' hours, ' ||
CAST(MINUTE(starttime) AS VARCHAR) || ' minutes, and ' ||
CAST(SECOND(starttime) AS VARCHAR) || ' seconds.'
AS timestring FROM event;
ISINVALIDREASON()
ISINVALIDREASON() Explains why a GEOGRAPHY polygon is invalid.
Syntax
ISINVALIDREASON( polygon )
Description
The ISINVALIDREASON() function returns a text string explaining if the specified GEOGRAPHY value
is valid or not and, if not, why not. The argument to the ISINVALIDREASON() function must be a GEOGRAPHY value describing a polygon. This function is especially useful when validating geospatial data.
Example
The following example uses the ISVALID() and ISINVALIDREASON() functions to report on any invalid
polygons in the border column of the country table.
SELECT country_name, ISINVALIDREASON(border)
FROM Country WHERE NOT ISVALID(border);
ISVALID()
ISVALID() Determines if the specified GEOGRAPHY value is a valid polygon.
Syntax
ISVALID( polygon )
Description
The ISVALID() function returns true or false depending on whether the specified GEOGRAPHY value
is a valid polygon or not. Polygons must follow rules defined by the Open Geospatial Consortium (OGC)
standard for Well Known Text (WKT). Specifically:
- A GEOGRAPHY polygon consists of one or more rings, where a ring is a closed boundary described
  by a sequence of vertices and the lines, or edges, between those vertices.
- The first ring must be the outer ring and the vertices must be listed in counterclockwise order.
- All subsequent rings represent "holes" in the outer ring. The inner rings must be wholly contained
  within the outer ring and their vertices must be listed in clockwise order.
- Rings cannot intersect or have adjacent edges.
- The edges of an individual ring cannot cross (for example, a figure "8" is invalid).
- For each ring, the first vertex is listed twice: as both the first and last vertex.
If the specified GEOGRAPHY value is a valid polygon, the function returns true. If not, it returns false.
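For example, the following WKT string (an illustrative boundary, not from the source) satisfies these rules: the outer ring is listed in counterclockwise order, the single inner ring in clockwise order, and both repeat their first vertex as the last.

POLYGON( (0 0, 10 0, 10 10, 0 10, 0 0),
         (2 2, 2 4, 4 4, 4 2, 2 2) )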
To maximize performance, VoltDB does not validate the GEOGRAPHY values when they are inserted.
However, if you are not sure the WKT strings are valid, you can use ISVALID() to validate the resulting
GEOGRAPHY values before inserting them or after they are inserted into the database.
Examples
The first example shows an UPDATE statement that uses the ISVALID() function to remove the contents
of a GEOGRAPHY column (by setting it to NULL), if the current contents are invalid.
UPDATE REGION SET border = NULL WHERE NOT ISVALID(border);
The next example shows part of a stored procedure that uses ISVALID() to conditionally set the value of a
column, mustbevalid, that is defined as NOT NULL. By setting the column mustbevalid to NULL, the procedure
ensures that the INSERT statement fails and the stored procedure rolls back if the WKT input is invalid.
public class ValidateBorders extends VoltProcedure {
    public final SQLStmt insertrec = new SQLStmt(
        "INSERT INTO REGION (name, border, mustbevalid)" +
        " SELECT ?, ?, CASE WHEN ISVALID(?) THEN 1 ELSE NULL END" +
        " FROM anothertable LIMIT 1;"
    );
    // The border value is passed twice: once as the column value and once as the ISVALID() argument.
    public VoltTable[] run(String name, GeographyValue border) throws VoltAbortException {
        voltQueueSQL(insertrec, name, border, border);
        return voltExecuteSQL(true);
    }
}
LATITUDE()
LATITUDE() Returns the latitude of a GEOGRAPHY_POINT value.
Syntax
LATITUDE( point )
Description
The LATITUDE() function returns the latitude, as a floating point value, from a GEOGRAPHY_POINT
expression.
Example
The following example returns all ships that are located in the northern hemisphere by examining the
latitude of their current location.
SELECT ship.number, ship.country FROM ship
WHERE LATITUDE(ship.location) > 0;
LEFT()
LEFT() Returns a substring from the beginning of a string.
Syntax
LEFT( string-expression, numeric-expression )
Description
The LEFT() function returns the first n characters from a string expression, where n is the second argument
to the function.
Example
The following example uses the LEFT function to return an abbreviation (the first three characters) of the
product category as part of the SELECT expression.
SELECT LEFT(category,3), product_name, price FROM product_list
ORDER BY category, product_name;
LN(), LOG()
LN(), LOG() Returns the natural logarithm of a numeric value.
Syntax
LN( numeric-value )
LOG( numeric-value )
Description
The LN() function returns the natural logarithm of the specified input value. The log is returned as a
floating point (FLOAT) value. LN() and LOG() are synonyms and perform the same function.
Example
The following example uses the LN() function to calculate the rate of population growth from census data.
SELECT city, current_population,
( ( LN(current_population) - LN(base_population) )
/ (current_year - base_year)
) * 100.0 AS percent_growth
FROM census ORDER BY city;
LONGITUDE()
LONGITUDE() Returns the longitude of a GEOGRAPHY_POINT value.
Syntax
LONGITUDE( point )
Description
The LONGITUDE() function returns the longitude, as a floating point value, from a
GEOGRAPHY_POINT expression.
Example
The following example returns all ships that are located in the western hemisphere by examining the
longitude of their current location.
SELECT ship.number, ship.country FROM ship
WHERE LONGITUDE(ship.location) < 0
AND LONGITUDE(ship.location) > -180;
LOWER()
LOWER() Returns a string converted to all lowercase characters.
Syntax
LOWER( string-expression )
Description
The LOWER() function returns a copy of the input string converted to all lowercase characters.
Example
The following example uses the LOWER function to perform a case-insensitive search of a VARCHAR
field.
SELECT product_name, product_id FROM product_list
WHERE LOWER(product_name) LIKE 'acme%'
ORDER BY product_name, product_id;
MAX()
MAX() Returns the maximum value from a range of column values.
Syntax
MAX( column-expression )
Description
The MAX() function returns the highest value from a range of column values. The range of values depends
on the constraints defined by the WHERE and GROUP BY clauses.
Example
The following example returns the highest price in the product list.
SELECT MAX(price) FROM product_list;
The next example returns the highest price for each product category.
SELECT category, MAX(price) FROM product_list
GROUP BY category
ORDER BY category;
MIN()
MIN() Returns the minimum value from a range of column values.
Syntax
MIN( column-expression )
Description
The MIN() function returns the lowest value from a range of column values. The range of values depends
on the constraints defined by the WHERE and GROUP BY clauses.
Example
The following example returns the lowest price in the product list.
SELECT MIN(price) FROM product_list;
The next example returns the lowest price for each product category.
SELECT category, MIN(price) FROM product_list
GROUP BY category
ORDER BY category;
MINUTE()
MINUTE() Returns the minute of the hour as an integer value.
Syntax
MINUTE( timestamp-value )
Description
The MINUTE() function returns an integer value between 0 and 59 representing the minute of the hour
in a timestamp value. This function produces the same result as using the MINUTE keyword with the
EXTRACT() function.
Examples
The following example uses the HOUR(), MINUTE(), and SECOND() functions to return the time portion
of a TIMESTAMP value in a formatted string.
SELECT eventname,
CAST(HOUR(starttime) AS VARCHAR) || ' hours, ' ||
CAST(MINUTE(starttime) AS VARCHAR) || ' minutes, and ' ||
CAST(SECOND(starttime) AS VARCHAR) || ' seconds.'
AS timestring FROM event;
MOD()
MOD() Returns the result of a modulo operation.
Syntax
MOD( dividend, divisor )
Description
The MOD() function performs a modulo operation. That is, it divides one integer value, the dividend, by
another integer value, the divisor, and returns the remainder of the division operation as a new integer
value. Both the dividend and the divisor must be integer values and the divisor must not be zero. Use of
non-integer datatypes or a divisor of zero will result in a runtime error.
Example
The following example uses the HOUR() and MOD() functions to return the hour of a timestamp in
12-hour format.
SELECT event,
MOD(HOUR(eventtime)+11, 12)+1,
CASE WHEN HOUR(eventtime)/12 < 1
THEN 'AM'
ELSE 'PM'
END
FROM schedule ORDER BY 3, 2;
MONTH()
MONTH() Returns the month of the year as an integer value.
Syntax
MONTH( timestamp-value )
Description
The MONTH() function returns an integer value between 1 and 12 representing the timestamp's month
of the year. The MONTH() function produces the same result as using the MONTH keyword with the
EXTRACT() function.
Examples
The following example uses the DAY(), MONTH(), and YEAR() functions to return a timestamp column
as a formatted date string.
SELECT eventname,
   CAST(MONTH(starttime) AS VARCHAR) || '/' ||
   CAST(DAY(starttime) AS VARCHAR) || '/' ||
   CAST(YEAR(starttime) AS VARCHAR) AS eventdate
FROM event ORDER BY starttime;
NOW
NOW Returns the current time as a timestamp value.
Syntax
NOW
Description
The NOW function returns the current time as a VoltDB timestamp. The value of the timestamp is
determined when the query or stored procedure is invoked. Several important aspects of how the NOW
function operates are:
- The value returned is guaranteed to be identical for all partitions that execute the query.
- The value returned is measured in milliseconds then padded to create a timestamp value in microseconds.
- During command logging, the returned value is stored as part of the log, so when the command log is
  replayed, the same value is used during the replay of the query.
- Similarly, for database replication (DR) the value returned is passed and reused by the replica database
  when replaying the query.
- You can specify NOW as a default value in the CREATE TABLE statement when defining the schema
  of a VoltDB database (see the sketch after this list).
- The NOW function cannot be used in the CREATE INDEX or CREATE VIEW statements.
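A minimal sketch of using NOW as a column default (the table and column names are illustrative, not from the source):

CREATE TABLE alert_event (
   event_id BIGINT NOT NULL,
   message VARCHAR(256),
   -- NOW supplies the insertion time when no explicit value is given
   event_timestamp TIMESTAMP DEFAULT NOW NOT NULL
);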
The NOW and CURRENT_TIMESTAMP functions are synonyms and perform an identical function.
Example
The following example uses NOW in the WHERE clause to delete alert events that occurred in the past:
DELETE FROM Alert_event WHERE event_timestamp < NOW;
NUMINTERIORRINGS()
NUMINTERIORRINGS() Returns the number of interior rings within a polygon GEOGRAPHY value.
Syntax
NUMINTERIORRINGS( polygon )
Description
The NUMINTERIORRINGS() function returns the number of interior rings within a polygon GEOGRAPHY value. Polygon GEOGRAPHY values can contain multiple polygons: one and only one outer polygon
and one or more optional inner polygons that define "holes" in the outer polygon. The NUMINTERIORRINGS() function counts the number of inner polygons and returns the result as an integer value.
Example
The following example lists the countries of the world based on the number of interior polygons within
the outline GEOGRAPHY column.
SELECT NUMINTERIORRINGS(outline), name, capital FROM country
ORDER BY NUMINTERIORRINGS(outline);
NUMPOINTS()
NUMPOINTS() Returns the number of points within a polygon GEOGRAPHY value.
Syntax
NUMPOINTS( polygon )
Description
The NUMPOINTS() function returns the total number of points that comprise a polygon GEOGRAPHY
value. The number of points includes the points from both the outer polygon and any inner polygons. It
also includes all of the points defining each ring, which means the starting point for each polygon is
counted twice (once as the starting point and once as the ending point) because this is required in
the WKT representation of a polygon.
Example
The following example lists the countries of the world based on the number of points in their outlines.
SELECT NUMPOINTS(outline), name, capital FROM country
ORDER BY NUMPOINTS(outline);
OCTET_LENGTH()
OCTET_LENGTH() Returns the number of bytes in a string.
Syntax
OCTET_LENGTH( string-expression )
Description
The OCTET_LENGTH() function returns the number of bytes of data in a string.
Note that the number of bytes required to store a string and the number of characters that make up the
string can differ. To count the number of characters in the string, use the CHAR_LENGTH() function.
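The difference appears with multi-byte UTF-8 characters. For instance (an illustrative value, not from the source), the accented character "à" occupies two bytes in UTF-8:

CHAR_LENGTH('voilà') = 5
OCTET_LENGTH('voilà') = 6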
Example
The following example returns the string in the column LastName as well as the number of characters and
length in bytes of that string.
SELECT LastName, CHAR_LENGTH(LastName), OCTET_LENGTH(LastName)
FROM Customers ORDER BY LastName, FirstName;
OVERLAY()
OVERLAY() Returns a string overwriting a portion of the original string with the specified replacement.
Syntax
OVERLAY( string PLACING replacement-string FROM position [FOR length] )
Description
The OVERLAY() function overwrites a portion of the original string with the replacement string and
returns the result. The replacement starts at the specified position in the original string and either replaces
the characters one-for-one for the length of the replacement string or, if a FOR length is specified, replaces
the specified number of characters.
For example, if the original string is 12 characters in length, the replacement string is 3 characters in length
and starts at position 4, and the FOR clause is left off, the resulting string consists of the first 3 characters
of the original string, the replacement string, and the last 6 characters of the original string:
OVERLAY('abcdefghijkl' PLACING 'XYZ' FROM 4) = 'abcXYZghijkl'
If the FOR clause is included specifying that the replacement string replaces 6 characters, the result is the
first 3 characters of the original string, the replacement string, and the last 3 characters of the original string:
OVERLAY('abcdefghijkl' PLACING 'XYZ' FROM 4 FOR 6) = 'abcXYZjkl'
If the combination of the starting position and the replacement length exceed the length of the original
string, the resulting output is extended as necessary to include all of the replacement string:
OVERLAY('abcdef' PLACING 'XYZ' FROM 5) = 'abcdXYZ'
If the starting position is greater than the length of the original string, the replacement string is appended
to the original string:
OVERLAY('abcdef' PLACING 'XYZ' FROM 20) = 'abcdefXYZ'
Similarly, if the combination of the starting position and the FOR length is greater than the length of the
original string, the replacement string simply overwrites the remainder of the original string:
OVERLAY('abcdef' PLACING 'XYZ' FROM 2 FOR 20) = 'aXYZ'
The starting position and length must be specified as non-negative integers. The starting position must be
greater than zero and the length can be zero or greater.
Example
The following example uses the OVERLAY function to redact part of a name.
SELECT OVERLAY( fullname PLACING '****' FROM 2
FOR CHAR_LENGTH(fullname)-2
) FROM users ORDER BY fullname;
PI()
PI() Returns the value of the mathematical constant pi (π) as a FLOAT value.
Syntax
PI()
Description
The PI() function returns the value of the mathematical constant pi (π) as a double floating point (FLOAT)
value.
Example
The following example uses the PI() function to return the surface area of a sphere.
SELECT radius, 4*PI()*POWER(radius, 2) FROM Sphere ORDER BY radius;
POINTFROMTEXT()
POINTFROMTEXT() Returns a GEOGRAPHY_POINT value from the corresponding WKT.
Syntax
POINTFROMTEXT( string )
Description
The POINTFROMTEXT() function generates a GEOGRAPHY_POINT value from a string containing
a well known text (WKT) representation of a geographic point. The WKT string must be in the form
'POINT( longitude latitude )' where longitude and latitude are floating point values.
If the argument is not a valid WKT representation of a point, the function generates an error.
Example
The following example uses the POINTFROMTEXT() function to update a record containing a
GEOGRAPHY_POINT column using two floating point input values (representing longitude and latitude).
UPDATE user SET location =
POINTFROMTEXT( CONCAT('POINT(',CAST(? AS VARCHAR),' ',CAST(? AS VARCHAR),')') )
WHERE id = ?;
POLYGONFROMTEXT()
POLYGONFROMTEXT() Returns a GEOGRAPHY value from the corresponding WKT.
Syntax
POLYGONFROMTEXT( text )
Description
The POLYGONFROMTEXT() function generates a GEOGRAPHY value from a string containing a well
known text (WKT) representation of a geographic polygon. The WKT string must be a valid representation
of a polygon with only one outer polygon and, optionally, one or more inner polygons.
If the argument is not a valid WKT representation of a polygon, the function generates an error.
Example
The following example uses the POLYGONFROMTEXT() function to insert a record containing a GEOGRAPHY column using a text input value containing the WKT representation of a geographic polygon.
INSERT INTO city (name, state, boundary) VALUES(?, ?, POLYGONFROMTEXT(?));
POSITION()
POSITION() Returns the starting position of a substring in another string.
Syntax
POSITION( substring-expression IN string-expression )
Description
The POSITION() function returns the starting position of a substring in another string. The position, if a
match is found, is an integer number between one and the length of the string being searched. If no match
is found, the function returns zero.
Example
The following example selects all books where the title contains the word "poodle" and returns the book's
title and the position of the substring "poodle" in the title.
SELECT Title, POSITION('poodle' IN Title) FROM Books
WHERE Title LIKE '%poodle%' ORDER BY Title;
POWER()
POWER() Returns the value of the first argument raised to the power of the second argument.
Syntax
POWER( numeric-expression, numeric-expression )
Description
The POWER() function takes two numeric expressions and returns the value of the first raised to the power
of the second. In other words, POWER(x,y) is the equivalent of the mathematical expression xy.
Example
The following example uses the POWER function to return the surface area and volume of a cube.
SELECT length, 6 * POWER(length,2) AS surface,
POWER(length,3) AS volume FROM Cube
ORDER BY length;
QUARTER()
QUARTER() Returns the quarter of the year as an integer value
Syntax
QUARTER( timestamp-value )
Description
The QUARTER() function returns an integer value between 1 and 4 representing the quarter of the year
in a TIMESTAMP value. The QUARTER() function produces the same result as using the QUARTER
keyword with the EXTRACT() function.
Examples
The following example uses the QUARTER() and YEAR() functions to group and sort records containing
a timestamp.
SELECT year(starttime), quarter(starttime),
count(*) as eventsperquarter
FROM event
GROUP BY year(starttime), quarter(starttime)
ORDER BY year(starttime), quarter(starttime);
REGEXP_POSITION()
REGEXP_POSITION() Returns the starting position of a regular expression within a text string.
Syntax
REGEXP_POSITION( string, pattern [, flag] )
Description
The REGEXP_POSITION() function returns the starting position of the first instance of the specified
regular expression within a text string. The position value starts at one (1) for the first position in the string
and the function returns zero (0) if the regular expression is not found.
The first argument to the function is the VARCHAR character string to be searched. The second argument
is the regular expression pattern to look for. The third argument is an optional flag that specifies whether
the search is case sensitive or not. The flag must be single character VARCHAR with one of the following
values:
Flag   Description
i      Case-insensitive matching
There are several different formats for regular expressions. The REGEXP_POSITION() uses the revised
Perl compatible regular expression (PCRE2) syntax, which is described in detail on the PCRE website.
Examples
The following example uses the REGEXP_POSITION() function to filter all records where the column
description matches a specific pattern. The example uses the optional flag argument to make the pattern
match text regardless of case.
SELECT incident, description FROM securityLog
WHERE REGEXP_POSITION(description,
'host:\s*10\.186\.[0-9]+\.[0-9]+',
'i') > 0
ORDER BY incident;
REPEAT()
REPEAT() Returns a string composed of a substring repeated the specified number of times.
Syntax
REPEAT( string-expression, numeric-expression )
Description
The REPEAT() function returns a string composed of the substring string-expression repeated n times
where n is defined by the second argument to the function.
Example
The following example uses the REPEAT and the CHAR_LENGTH functions to replace a column's actual
contents with a mask composed of the letter "X" the same length as the original column value.
SELECT username, REPEAT('X', CHAR_LENGTH(password)) as Password
FROM accounts ORDER BY username;
REPLACE()
REPLACE() Returns a string replacing the specified substring of the original string with new text.
Syntax
REPLACE( string, substring, replacement-string )
Description
The REPLACE() function returns a copy of the first argument, replacing all instances of the substring
identified by the second argument with the third argument. If the substring is not found, no changes are
made and a copy of the original string is returned.
Example
The following example uses the REPLACE function to update the Address column, replacing the string
"Ceylon" with "Sri Lanka".
UPDATE Customers SET address=REPLACE( address,'Ceylon', 'Sri Lanka')
WHERE address LIKE '%Ceylon%';
RIGHT()
RIGHT() Returns a substring from the end of a string.
Syntax
RIGHT( string-expression, numeric-expression )
Description
The RIGHT() function returns the last n characters from a string expression, where n is the second argument
to the function.
Example
The following example uses the LEFT() and RIGHT() functions to return an abbreviated summary of the
Description column, ensuring the result fits within 20 characters.
SELECT product_name,
LEFT(description,10) || '...' || RIGHT(description,7)
FROM product_list ORDER BY product_name;
SECOND()
SECOND() Returns the seconds of the minute as a floating point value.
Syntax
SECOND( timestamp-value )
Description
The SECOND() function returns a floating point value between 0 and 60 representing the whole and
fractional part of the number of seconds in the minute of a timestamp value. This function produces the
same result as using the SECOND keyword with the EXTRACT() function.
Examples
The following example uses the HOUR(), MINUTE(), and SECOND() functions to return the time portion
of a TIMESTAMP value in a formatted string.
SELECT eventname,
CAST(HOUR(starttime) AS VARCHAR) || ' hours, ' ||
CAST(MINUTE(starttime) AS VARCHAR) || ' minutes, and ' ||
CAST(SECOND(starttime) AS VARCHAR) || ' seconds.'
AS timestring FROM event;
SET_FIELD()
SET_FIELD() Returns a copy of a JSON-encoded string, replacing the specified field value.
Syntax
SET_FIELD( column, field-name-path, string-value )
Description
The SET_FIELD() function finds the specified field within a JSON-encoded string and returns a copy of
the string with the new value replacing that field's previous value. Note that the SET_FIELD() function
returns an altered copy of the JSON-encoded string; it does not change any column values in place. So
to change existing database columns, you must use SET_FIELD() with an UPDATE statement.
For example, assume the Product table contains a VARCHAR column Productinfo which for one row
contains the following JSON string:
{"product":"Acme widget",
"availability":"plenty",
"info": { "description": "A fancy widget.",
"sku":"ABCXYZ",
"part_number":1234},
"warehouse":[{"location":"Dallas","units":25},
{"location":"Chicago","units":14},
{"location":"Troy","units":67}]
}
It is possible to change the value of the availability field using the SET_FIELD function, like so:
UPDATE Product SET Productinfo =
SET_FIELD(Productinfo,'availability','"limited"')
WHERE FIELD(Productinfo,'product') = 'Acme widget';
The second argument to the SET_FIELD() function can be a simple field name, as in the previous example,
in which case the function replaces the value of the top field matching the specified name. Alternately, you
can specify a path representing a hierarchy of names separated by periods. For example, you can replace
the SKU number by specifying "info.sku" as the second argument, or you can replace the number of units
in the second warehouse by specifying the field path "warehouse[1].units". For example, the following
UPDATE statement does both by nesting SET_FIELD commands:
UPDATE Product SET Productinfo =
SET_FIELD(
SET_FIELD(Productinfo,'info.sku','"DEFGHI"'),
'warehouse[1].units', '128')
WHERE FIELD(Productinfo,'product') = 'Acme widget';
Note that the third argument is the string value that will be inserted into the JSON-encoded string. To insert
a numeric value, you enclose the value in single quotation marks, as in the preceding example where '128'
is used as the replacement value for the warehouse[1].units field. To insert a string value, you must
include the string quotation marks within the replacement string itself. For example, the preceding code
uses the SQL string constant '"DEFGHI"' to specify the replacement value for the text field info.sku.
Finally, the replacement string value can be any valid JSON value, including another JSON-encoded object
or array. It does not have to be a scalar string or numeric value.
Example
The following example uses the SET_FIELD() function to add a new array element to the warehouse field.
UPDATE Product SET Productinfo =
SET_FIELD(Productinfo,'warehouse',
'[{"location":"Dallas","units":25},
{"location":"Chicago","units":14},
{"location":"Troy","units":67},
{"location":"Phoenix","units":23}]')
WHERE FIELD(Productinfo,'product') = 'Acme widget';
SINCE_EPOCH()
SINCE_EPOCH() Converts a VoltDB timestamp to an integer number of time units since the POSIX
epoch.
Syntax
SINCE_EPOCH( time-unit, timestamp-expression )
Description
The SINCE_EPOCH() function converts a VoltDB timestamp into a 64-bit integer value (BIGINT)
representing the equivalent number of time units since the POSIX epoch. POSIX time is usually
represented as the number of seconds since the epoch; that is, since 00:00:00 on January 1, 1970,
Coordinated Universal Time (UTC). So the function SINCE_EPOCH(SECOND, timestamp) returns the
POSIX time equivalent for the value of timestamp. However, you can also request the number of
milliseconds or microseconds since the epoch. The valid keywords for specifying the time units are:
SECOND Seconds since the epoch
MILLISECOND, MILLIS Milliseconds since the epoch
MICROSECOND, MICROS Microseconds since the epoch
You cannot perform arithmetic on timestamps directly. So SINCE_EPOCH() is useful for performing
calculations on timestamp values in SQL expressions. For example, the following SQL statement looks for
events that are less than a minute in length, based on the timestamp columns STARTTIME and ENDTIME:
SELECT * FROM Event
WHERE ( SINCE_EPOCH(Second, endtime)
- SINCE_EPOCH(Second, starttime) ) < 60;
The TO_TIMESTAMP() function performs the inverse of SINCE_EPOCH(), by converting an integer
value to a VoltDB timestamp based on the specified time units.
Example
The following example returns a timestamp column as the equivalent POSIX time value.
SELECT event_id, event_name,
SINCE_EPOCH(Second, starttime) as posix_time FROM Event
ORDER BY event_id;
The next example uses SINCE_EPOCH() to return the length of an event, in microseconds, by calculating
the difference between two timestamp columns.
SELECT event_id, event_type,
SINCE_EPOCH(Microsecond, endtime)
-SINCE_EPOCH(Microsecond, starttime) AS delta
FROM Event ORDER BY event_id;
SPACE()
SPACE() Returns a string of spaces of the specified length.
Syntax
SPACE( numeric-expression )
Description
The SPACE() function returns a string composed of n spaces where the string length n is specified by the
function's argument. SPACE(n) is a synonym for REPEAT(' ', n).
Example
The following example uses the SPACE and CHAR_LENGTH functions to ensure the result is a fixed
length, padded with blank spaces.
SELECT product_name || SPACE(80 - CHAR_LENGTH(product_name))
FROM product_list ORDER BY product_name;
SQRT()
SQRT() Returns the square root of a numeric expression.
Syntax
SQRT( numeric-expression )
Description
The SQRT() function returns the square root of the specified numeric expression.
Example
The following example uses the SQRT and POWER functions to return the distance of a graph point from
the origin.
SELECT location, x, y,
SQRT(POWER(x,2) + POWER(y,2)) AS distance
FROM points ORDER BY location;
SUBSTRING()
SUBSTRING() Returns the specified portion of a string expression.
Syntax
SUBSTRING( string-expression FROM position [FOR length] )
SUBSTRING( string-expression, position [, length] )
Description
The SUBSTRING() function returns a specified portion of the string expression, where position specifies
the starting position of the substring (starting at position 1) and length specifies the maximum length of
the substring. The length of the returned substring is the lower of the remaining characters in the string
expression or the value specified by length.
For example, if the string expression is "ABCDEF" and position is specified as 3, the substring starts with
the character "C". If length is also specified as 3, the return value is "CDE". If, however, the length is
specified as 5, only the remaining four characters "CDEF" are returned.
If length is not specified, the remainder of the string, starting from the position specified, is returned.
For example, SUBSTRING("ABCDEF",3) and SUBSTRING("ABCDEF",3,4) return the same value.
Example
The following example uses the SUBSTRING function to return the month of the year, which is a VARCHAR column, as a three letter abbreviation.
SELECT event, SUBSTRING(month,1,3), day, year FROM calendar
ORDER BY event ASC;
SUM()
SUM() Returns the sum of a range of numeric column values.
Syntax
SUM( column-expression )
Description
The SUM() function returns the sum of a range of numeric column values. The values being added together
depend on the constraints defined by the WHERE and GROUP BY clauses.
Example
The following example uses the SUM() function to determine how much inventory exists for each product
type in the catalog.
SELECT category, SUM(quantity) AS inventory FROM product_list
GROUP BY category ORDER BY category;
TO_TIMESTAMP()
TO_TIMESTAMP() Converts an integer value to a VoltDB timestamp based on the time unit specified.
Syntax
TO_TIMESTAMP( time-unit, integer-expression )
Description
The TO_TIMESTAMP() function converts an integer expression to a VoltDB timestamp, interpreting the
integer value as the number of specified time units since the POSIX epoch. POSIX time is usually
represented as the number of seconds since the epoch; that is, since 00:00:00 on January 1, 1970,
Coordinated Universal Time (UTC). So the function TO_TIMESTAMP(Second, timeinsecs) returns the
VoltDB TIMESTAMP equivalent of timeinsecs as a POSIX time value. However, you can also request the
integer value be interpreted as milliseconds or microseconds since the epoch. The valid keywords for
specifying the time units are:
SECOND Seconds since the epoch
MILLISECOND, MILLIS Milliseconds since the epoch
MICROSECOND, MICROS Microseconds since the epoch
You cannot perform arithmetic on timestamps directly. So TO_TIMESTAMP() is useful for converting the
results of arithmetic expressions to VoltDB TIMESTAMP values. For example, the following SQL statement uses TO_TIMESTAMP to convert a POSIX time value before inserting it into a VoltDB TIMESTAMP column:
INSERT INTO Event
(event_id,event_name,event_type, starttime)
VALUES(?,?,?,TO_TIMESTAMP(Second, ?));
The SINCE_EPOCH() function performs the inverse of TO_TIMESTAMP(), by converting a VoltDB
TIMESTAMP to an integer value based on the specified time units.
Example
The following example updates a TIMESTAMP column, adding one hour (in seconds) to the current value
using SINCE_EPOCH() and TO_TIMESTAMP() to perform the conversion and arithmetic:
UPDATE Contest
SET deadline=TO_TIMESTAMP(Second, SINCE_EPOCH(Second,deadline) + 3600)
WHERE expired=1;
TRIM()
TRIM() Returns a string with leading and/or trailing spaces removed.
Syntax
TRIM( [[ LEADING | TRAILING | BOTH ] [character] FROM] string-expression )
Description
The TRIM() function returns a string with leading and/or trailing spaces removed. By default, the TRIM
function removes spaces from both the beginning and end of the string. If you specify the LEADING or
TRAILING clause, spaces are removed from either the beginning or end of the string only.
You can also specify an alternate character to remove. By default only spaces (UTF-8 character code 32)
are removed. If you specify a different character, only that character will be removed. For example, the
following INSERT statement uses the TRIM function to remove any TAB characters from the beginning
of the string input for the ADDRESS column:
INSERT INTO Customers (first, last, address)
VALUES(?, ?,
TRIM( LEADING CHAR(9) FROM CAST(? AS VARCHAR) )
);
Example
The following example uses TRIM() to remove extraneous leading and trailing spaces from the output for
three VARCHAR columns:
SELECT TRIM(first), TRIM(last), TRIM(address) FROM Customer
ORDER BY last, first;
TRUNCATE()
TRUNCATE() Truncates a VoltDB timestamp to the specified time unit.
Syntax
TRUNCATE( time-unit, timestamp )
Description
The TRUNCATE() function truncates a timestamp value to the specified time unit. For example,
if the timestamp column Apollo has the value July 20, 1969 4:17:40 P.M., then using the function
TRUNCATE(HOUR, apollo) would return the value July 20, 1969 4:00:00 P.M. Allowable time units for
truncation include the following:
YEAR
QUARTER
MONTH
DAY
HOUR
MINUTE
SECOND
MILLISECOND, MILLIS
Example
The following example uses the TRUNCATE function to find records where the timestamp column, incident, falls within a specific day, entered as a POSIX time value.
SELECT incident, description FROM securitylog
WHERE TRUNCATE(DAY, incident) = TRUNCATE(DAY,FROM_UNIXTIME(?))
ORDER BY incident, description;
UPPER()
UPPER() Returns a string converted to all uppercase characters.
Syntax
UPPER( string-expression )
Description
The UPPER() function returns a copy of the input string converted to all uppercase characters.
Example
The following example uses the UPPER function to return results alphabetically regardless of case.
SELECT UPPER(product_name), product_id FROM product_list
ORDER BY UPPER(product_name);
VALIDPOLYGONFROMTEXT()
VALIDPOLYGONFROMTEXT() Returns a validated GEOGRAPHY value from the corresponding
WKT
Syntax
VALIDPOLYGONFROMTEXT( text )
Description
The VALIDPOLYGONFROMTEXT() function generates a valid GEOGRAPHY value from a string containing a well known text (WKT) representation of a geographic polygon. If the GEOGRAPHY value
resulting from the WKT string is not a valid representation of a polygon, the function returns an error. The
error message includes an explanation of why the WKT is not valid.
The difference between the POLYGONFROMTEXT() function and the VALIDPOLYGONFROMTEXT() function is that the VALIDPOLYGONFROMTEXT() verifies that the resulting polygon meets
all of the requirements for use by VoltDB. If not, the function returns an error. The POLYGONFROMTEXT() function simply constructs a GEOGRAPHY value without validating all of the requirements of a
VoltDB polygon and may need separate validation (using the ISVALID() function) before it can be used
effectively with other geospatial functions. See the description of the ISVALID() function for a description
of the requirements for a valid polygon.
Example
The following example uses the VALIDPOLYGONFROMTEXT() function to insert a record containing
a GEOGRAPHY column using a text input value containing the WKT representation of a geographic
polygon. Note that if the WKT string is not a valid representation of a polygon, the function returns an
error and the INSERT statement fails.
INSERT INTO city (name, state, boundary) VALUES(?, ?, VALIDPOLYGONFROMTEXT(?));
WEEK(), WEEKOFYEAR()
WEEK(), WEEKOFYEAR() Returns the week of the year as an integer value.
Syntax
WEEK( timestamp-value )
WEEKOFYEAR( timestamp-value )
Description
The WEEK() and WEEKOFYEAR() functions are synonyms and return an integer value between 1 and
52 representing the timestamp's week of the year. These functions produce the same result as using the
WEEK_OF_YEAR keyword with the EXTRACT() function.
Examples
The following example uses the WEEK() function to group and sort records containing a timestamp.
SELECT week(starttime), count(*) as eventsperweek
FROM event GROUP BY week(starttime) ORDER BY week(starttime);
WEEKDAY()
WEEKDAY() Returns the day of the week as an integer between 0 and 6.
Syntax
WEEKDAY( timestamp-value )
Description
The WEEKDAY() function returns an integer value between 0 and 6 representing the day of the week in a
timestamp value. For the WEEKDAY() function, the week starts (0) on Monday and ends (6) on Sunday.
This function is provided for compatibility with MySQL and produces the same result as using the WEEKDAY keyword with the EXTRACT() function.
Examples
The following example uses WEEKDAY() and the DECODE() function to return a string value representing the day of the week for the specified TIMESTAMP value.
SELECT eventtime,
DECODE(WEEKDAY(eventtime),
0, 'Monday',
1, 'Tuesday',
2, 'Wednesday',
3, 'Thursday',
4, 'Friday',
5, 'Saturday',
6, 'Sunday') AS eventday
FROM event ORDER BY eventtime;
YEAR()
YEAR() Returns the year as an integer value.
Syntax
YEAR( timestamp-value )
Description
The YEAR() function returns an integer value representing the year of a TIMESTAMP value. The YEAR()
function produces the same result as using the YEAR keyword with the EXTRACT() function.
Examples
The following example uses the DAY(), MONTH(), and YEAR() functions to return a timestamp column
as a formatted date string.
SELECT eventname,
   CAST(MONTH(starttime) AS VARCHAR) || '/' ||
   CAST(DAY(starttime) AS VARCHAR) || '/' ||
   CAST(YEAR(starttime) AS VARCHAR) AS eventdate
FROM event ORDER BY starttime;
csvloader
jdbcloader
kafkaloader
sqlcmd
voltadmin
voltdb
csvloader
csvloader Imports the contents of a CSV file and inserts it into a VoltDB table.
Syntax
csvloader table-name [arguments]
csvloader -p procedure-name [arguments]
Description
The csvloader command reads comma-separated values and inserts each valid line of data into the specified
table in a VoltDB database. The most common way to use csvloader is to specify the database table to be
loaded and a CSV file containing the data, like so:
$ csvloader employees -f acme_employees.csv
Alternately, you can use standard input as the source of the data:
$ csvloader employees < acme_employees.csv
In addition to inserting all valid content into the specified database table, csvloader creates three output
files:
- Error log: The error log provides details concerning any errors that occur while processing the input
  file. This includes errors in the format of the input as well as errors that occur attempting the insert into
  VoltDB. For example, if two rows contain the same value for a column that is declared as unique, the
  error log indicates that the second insert fails due to a constraint violation.
- Failed input: A separate file contains the contents of each line that failed to load. This file is useful
  because it allows you to correct any formatting issues and retry just the failed content, rather than having
  to restart and reload the entire table.
- Summary report: Once all input lines are processed, csvloader generates a summary report listing
  how many lines were read, how many were successfully loaded and how long the operation took.
All three files are created, by default, in the current working directory using "csvloader" and the table
name as prefixes. For example, using csvloader to insert contestants into the sample voter database creates
the following files:
csvloader_contestants_insert_log.log
csvloader_contestants_invalidrows.csv
csvloader_contestants_insert_report.log
It is possible to use csvloader to load text files other than CSV files, using the --separator, --quotechar, and --escape flags. Note that csvloader uses Python to process the command line arguments. So to enter certain non-alphanumeric characters, you must use the appropriate escaping mechanism
for Python command lines. For example, to use a tab-delimited file as input, you need to use the --separator flag, escaping the tab character like so:
$ csvloader --separator=$'\t' \
-f employees.tab employees
Arguments
--batch {integer}
Specifies the number of rows to submit in a batch. If you do not specify an insert procedure, rows of
input are sent in batches to maximize overall throughput. You can specify how many rows are sent
in each batch using the --batch flag. The default batch size is 200. If you use the --procedure
flag, no batching occurs and each row is sent separately.
--blank {error | null | empty}
Specifies what to do with missing values in the input. By default, if a line contains a missing value,
it is interpreted as a null value in the appropriate datatype. If you do not want missing values to
be interpreted as nulls, you can use the --blank argument to specify other behaviors. Specifying
--blank error results in an error if a line contains any missing values and the line is not inserted.
Specifying --blank empty returns the corresponding "empty" value in the appropriate datatype.
An empty value is interpreted as the following (see the sample command after this list):
- Zero for all numeric columns
- Zero, or the Unix epoch value, for timestamp columns
- An empty or zero-length string for VARCHAR and VARBINARY columns
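For instance, a hypothetical invocation (the file and table names are illustrative) that rejects any input line containing missing values:

$ csvloader employees -f employees.csv --blank error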
--columnsizelimit {integer}
Specifies the maximum size of quoted column input, in bytes. Mismatched quotation marks in the
input can cause csvloader to read all subsequent input including line breaks as part of the column.
To avoid excessive memory use in this situation, the flag sets a limit on the maximum number of bytes
that will be accepted as input for a column that is enclosed in quotation marks and spans multiple
lines. The default is 16777216 (that is, 16MB).
--escape {character}
Specifies the escape character that must precede a separator or quotation character that is supposed to
be interpreted as a literal character in the CSV input. The default escape character is the backslash (\).
-f, --file {file-specification}
Specifies the location of a CSV file to read as input. If you do not specify an input file, csvloader
reads input from standard input.
--limitrows {integer}
Specifies the maximum number of rows to be read from the input stream. This argument (along with
--skip) lets you load a subset of a larger CSV file.
-m, --maxerrors {integer}
Specifies the target number of errors before csvloader stops processing input. Once csvloader encounters the specified number of errors while trying to insert rows, it will stop reading input and end the
process. Note that, since csvloader performs inserts asynchronously, it often attempts more inserts
before the target number of exceptions are returned from the database. So it is possible more errors
could be returned after the target is met. This argument lets you conditionally stop a large loading
process if more than an acceptable number of errors occur.
--noquotechar
Disables the interpretation of quotation characters in the CSV input. All input other than the separator
character and line break will be treated as literal input characters.
--nowhitespace
Specifies that the CSV input must not contain any whitespace between data values and separators. By
default, csvloader ignores extra space between values, quotation marks, and the value separators. If
you use this argument, any input lines containing whitespace will generate an error and not be inserted
into the database.
--password {text}
Specifies the password to use when connecting to the database. You must specify a username and
password if security is enabled for the database. If you specify a username with the --user argument
but not the --password argument, VoltDB prompts for the password. This is useful when writing shell
scripts because it avoids having to hardcode passwords as plain text in the script.
--port {port-number}
Specifies the network port to use when connecting to the database. If you do not specify a port,
csvloader uses the default client port 21212.
-p, --procedure {procedure-name}
Specifies a stored procedure to use for loading each record from the data file. The named procedure
must exist in the database schema and must accept the fields of the data record as input parameters.
By default, csvloader uses a custom procedure to batch multiple rows into a single insert operation.
If you explicitly name a procedure, batching does not occur.
--quotechar {character}
Specifies the quotation character that is used to enclose values. By default, the quotation character is
the double quotation mark (").
-r, --reportdir {directory}
Specifies the directory where csvloader writes the three output files. By default, csvloader writes
output files to the current working directory. This argument lets you redirect output to an alternative
location.
-s, --servers=server-id[,...]
Specifies the network address of one or more nodes of a database cluster. By default, csvloader attempts to insert the CSV data into a database on the local system (localhost). To load data into a remote
database, use the --servers argument to specify the database nodes the loader should connect to.
--separator {character}
Specifies the character used to separate individual values in the input. By default, the separator character is the comma (,).
--skip {integer}
Specifies the number of lines from the input stream to skip before inserting rows into the database.
This argument (along with --limitrows) lets you load a subset of a larger CSV file.
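For instance, a hypothetical invocation (file and table names illustrative) that skips the first 1,000 lines of the input and then loads at most the next 500 rows:

$ csvloader orders -f big_orders.csv --skip 1000 --limitrows 500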
--strictquotes
Specifies that all values in the CSV input must be enclosed in quotation marks. If you use this argument, any input lines containing unquoted values will generate an error and not be inserted into the
database.
--update
Specifies that existing records with a matching primary key are updated, rather than being rejected. By
default, csvloader attempts to create new records. The --update flag lets you load updates to existing
records and create new records where the primary key does not already exist. To use --update, the
table must have a primary key.
--user {text}
Specifies the username to use when connecting to the database. You must specify a username and
password if security is enabled for the database.
Examples
The following example loads the data from a CSV file, languages.csv, into the helloworld table from
the Hello World example database and redirects the output files to the ./logs subfolder.
$ csvloader helloworld -f languages.csv -r ./logs
The following example performs the same function, providing the input interactively.
$ csvloader helloworld -r ./logs
"Hello", "World", "English"
"Bonjour", "Monde", "French"
"Hola", "Mundo", "Spanish"
"Hej", "Verden", "Danish"
"Ciao", "Mondo", "Italian"
CTRL-D
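The following sketch shows how --skip and --limitrows combine to load only part of a larger file; here csvloader skips the first two lines of languages.csv and then loads at most two rows (the line counts are arbitrary):
$ csvloader helloworld -f languages.csv --skip 2 --limitrows 2 -r ./logs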
jdbcloader
jdbcloader Extracts a table from another database via JDBC and inserts it into a VoltDB table.
Syntax
jdbcloader table-name [arguments]
jdbcloader -p procedure-name [arguments]
Description
The jdbcloader command uses the JDBC interface to fetch all records from the specified table in a remote
database and then insert those records into a matching table in VoltDB. The most common way to use
jdbcloader is to copy matching tables from another database to VoltDB. In this case, you specify the name
of the table, plus any JDBC-specific arguments that are needed. Usually, the required arguments are the
JDBC connection URL, the source table, the username, password, and local JDBC driver. For example:
$ jdbcloader employees \
--jdbcurl=jdbc:postgresql://remotesvr/corphr \
--jdbctable=employees \
--jdbcuser=charlesdickens \
--jdbcpassword=bleakhouse \
--jdbcdriver=org.postgresql.Driver
In addition to inserting all valid content into the specified database table, jdbcloader creates three output
files:
Error log The error log provides details concerning any errors that occur while processing the input
file. This includes errors that occur attempting the insert into VoltDB. For example, if two rows contain
the same value for a column that is declared as unique, the error log indicates that the second insert fails
due to a constraint violation.
Failed input A separate file contains the contents of each record that failed to load. The records are
stored in CSV (comma-separated value) format. This file is useful because it allows you to correct any
formatting issues and retry just the failed content using the csvloader.
Summary report Once all input records are processed, jdbcloader generates a summary report listing
how many records were read, how many were successfully loaded and how long the operation took.
All three files are created, by default, in the current working directory using "jdbcloader" and the table
name as prefixes. For example, using jdbcloader to insert contestants into the sample voter database creates
the following files:
jdbcloader_contestants_insert_log.log
jdbcloader_contestants_insert_invalidrows.csv
jdbcloader_contestants_insert_report.log
It is possible to use jdbcloader to perform other input operations. For example, if the source table does
not have the same structure as the target table, you can use a custom stored procedure to perform the
necessary translation from one to the other by specifying the procedure name on the command line with
the --procedure flag:
$ jdbcloader --procedure translateEmpRecords \
--jdbcurl=jdbc:postgresql://remotesvr/corphr \
--jdbctable=employees \
--jdbcuser=charlesdickens \
--jdbcpassword=bleakhouse \
--jdbcdriver=org.postgresql.Driver
Arguments
--batch {integer}
Specifies the number of rows to submit in a batch to the target VoltDB database. If you do not specify
an insert procedure, rows of input are sent in batches to maximize overall throughput. You can specify
how many rows are sent in each batch using the --batch flag. The default batch size is 200. If you
use the --procedure flag, no batching occurs and each row is sent separately.
--fetchsize {integer}
Specifies the number of records to fetch in each JDBC call to the source database. The default fetch
size is 100 records.
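For example, the following sketch (reusing the connection settings from the earlier example; the values themselves are arbitrary) raises both the JDBC fetch size and the VoltDB insert batch size:
$ jdbcloader employees \
   --jdbcurl=jdbc:postgresql://remotesvr/corphr \
   --jdbcuser=charlesdickens \
   --jdbcpassword=bleakhouse \
   --jdbcdriver=org.postgresql.Driver \
   --fetchsize=500 --batch=500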
--jdbcdriver {class-name}
Specifies the class name of the JDBC driver to invoke. The driver must exist locally and be accessible
either from the CLASSPATH environment variable or in the lib/extension directory where
VoltDB is installed.
--jdbcpassword {text}
Specifies the password to use when connecting to the source database via JDBC. You must specify a
username and password if security is enabled on the source database.
--jdbctable {table-name}
Specifies the name of the source table on the remote database. By default, jdbcloader assumes the source
table has the same name as the target VoltDB table.
--jdbcurl {connection-URL}
Specifies the JDBC connection URL for the source database. This argument is required.
--jdbcuser {text}
Specifies the username to use when connecting to the source database via JDBC. You must specify a
username and password if security is enabled on the source database.
--limitrows {integer}
Specifies the maximum number of rows to be read from the input stream. This argument lets you load
a subset of a remote database table.
-m, --maxerrors {integer}
Specifies the target number of errors before jdbcloader stops processing input. Once jdbcloader encounters the specified number of errors while trying to insert rows, it will stop reading input and end
the process. Note that, since jdbcloader performs inserts asynchronously, it often attempts more inserts
before the target number of exceptions are returned from the database. So it is possible more errors
could be returned after the target is met. This argument lets you conditionally stop a large loading
process if more than an acceptable number of errors occur.
--password {text}
Specifies the password to use when connecting to the database. You must specify a username and
password if security is enabled for the database. If you specify a username with the --user argument
but not the --password argument, VoltDB prompts for the password. This is useful when writing shell
scripts because it avoids having to hardcode passwords as plain text in the script.
--port {port-number}
Specifies the network port to use when connecting to the VoltDB database. If you do not specify a
port, jdbcloader uses the default client port 21212.
-p, --procedure {procedure-name}
Specifies a stored procedure to use for loading each record from the input source. The named procedure
must exist in the VoltDB database schema and must accept the fields of the data record as input
parameters. By default, jdbcloader uses a custom procedure to batch multiple rows into a single insert
operation. If you explicitly name a procedure, batching does not occur.
-r, --reportdir {directory}
Specifies the directory where jdbcloader writes the three output files. By default, jdbcloader writes
output files to the current working directory. This argument lets you redirect output to an alternative
location.
-s, --servers=server-id[,...]
Specifies the network address of one or more nodes of a VoltDB cluster. By default, jdbcloader attempts to insert the data into a VoltDB database on the local system (localhost). To load data into a
remote database, use the --servers argument to specify the VoltDB database nodes the loader should
connect to.
--user {text}
Specifies the username to use when connecting to the VoltDB database. You must specify a username
and password if security is enabled on the target database.
Example
The following example loads records from the Products table of the Warehouse database on server
hq.mycompany.com and writes the records into the Products table of the VoltDB database on servers svrA,
svrB, and svrC, using the MySQL JDBC driver to access the source database. Note that the --jdbctable flag
is not needed since the source and target tables have the same name.
$ jdbcloader Products --servers="svrA,svrB,svrC" \
--jdbcurl="jdbc:mysql://hq.mycompany.com/warehouse" \
--jdbcdriver="com.mysql.jdbc.Driver" \
--jdbcuser="ceo" \
--jdbcpassword="headhoncho"
kafkaloader
kafkaloader Imports data from a Kafka message queue into the specified database table.
Syntax
kafkaloader table-name [arguments]
Description
The kafkaloader utility loads data from a Kafka message queue and inserts each message as a separate
record into the specified database table. Apache Kafka is a distributed messaging service that lets you set
up message queues which are written to and read from by "producers" and "consumers", respectively. In
the Apache Kafka model, the kafkaloader acts as a "consumer".
When you start the kafkaloader, you must specify at least three arguments:
The database table
The Kafka server to read messages from, specified using the --zookeeper flag
The Kafka "topic" where the messages are stored, specified using the --topic flag
For example:
$ kafkaloader --zookeeper=quesvr:2181 --topic=voltdb_customer customer
Note that Kafka does not impose any specific format on the messages it manages. The format of the
messages is application-specific. In the case of kafkaloader, VoltDB assumes the messages are encoded
as standard comma-separated value (CSV) strings, with the values representing the columns of the table
in the order listed in the schema definition. Each Kafka message contains a single row to be inserted into
the database table.
It is also important to note that, unlike the csvloader which reads a static file, the kafkaloader is reading
from a queue where messages can be written at any time, on an ongoing basis. Therefore, the kafkaloader
process does not stop when it reads the last message on the queue; instead it continues to monitor the queue
and process any new messages it receives. The kafkaloader process will continue to read from the queue
until one of the following events occurs:
The connection to all of the VoltDB servers is broken and so kafkaloader can no longer access the
VoltDB database.
The maximum number of errors (specified by --maxerrors) is reached.
The user explicitly stops the process.
Note that the kafkaloader does not terminate if it loses its connection to the Kafka Zookeeper; however, it cannot receive any new messages once that connection is lost. Therefore, it is important to monitor the Kafka service and restart the kafkaloader if and when the Kafka service is interrupted.
Finally, kafkaloader acks, or acknowledges, receipt of the messages from Kafka as soon as they are read
from the queue. The messages are then batched for insert into the VoltDB database. This means that the
queue messages are acked regardless of whether they are successfully inserted into the database or not. It
is also possible messages may be lost if the loader process stops between when the messages are read and
the insert transaction is sent to the VoltDB database.
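For illustration, assume a hypothetical customer table declared as (CUSTOMER_ID INTEGER, FIRST_NAME VARCHAR(32), LAST_NAME VARCHAR(32)). Two messages on the topic (each a separate Kafka message, and therefore a separate inserted row) might then look like:
10023,Jane,Smith
10024,John,Doe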
Arguments
--batch {integer}
Specifies the number of rows to submit in a batch. By default, rows of input are sent in batches to
maximize overall throughput. You can specify how many rows are sent in each batch using the --batch flag. The default batch size is 200.
Note that --batch and --flush work together. Whichever limit is reached first triggers an insert to the
database.
--flush {integer}
Specifies the maximum number of seconds before pending data is written to the database. The default
flush period is 10 seconds.
If data is inserted into the kafka queue intermittently, there could be a long delay between when data
is read from the queue and when enough records have been read to meet the --batch limit. The
flush value avoids unnecessary delays in this situation by periodically writing all pending data. If the
flush limit is reached, all pending records are written to the database, even if the --batch limit has
not been satisfied.
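For example, the following sketch (the values are arbitrary) sends a batch whenever 500 rows accumulate or five seconds pass, whichever comes first:
$ kafkaloader customer --zookeeper=quesvr:2181 \
   --topic=voltdb_customer --batch=500 --flush=5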
-m, --maxerrors {integer}
Specifies the target number of input errors before kafkaloader stops processing input. Once
kafkaloader encounters the specified number of errors while trying to insert rows, it will stop reading
input and end the process.
The default maximum error count is 100. Since Kafka import can be a persistent process, you can
avoid having input errors cancel ongoing import by setting the maximum error count to zero, which
means that the loader will continue to run no matter how many input errors are generated.
--password {text}
Specifies the password to use when connecting to the database. You must specify a username and
password if security is enabled for the database. If you specify a username with the --user argument
but not the --password argument, VoltDB prompts for the password. This is useful when writing shell
scripts because it avoids having to hardcode passwords as plain text in the script.
--port {port-number}
Specifies the network port to use when connecting to the database. If you do not specify a port,
kafkaloader uses the default client port 21212.
-p, --procedure {procedure-name}
Specifies a stored procedure to use for loading each record from the data file. The named procedure
must exist in the database schema and must accept the fields of the data record as input parameters.
By default, kafkaloader uses a custom procedure to batch multiple rows into a single insert operation.
If you explicitly name a procedure, batching does not occur.
-s, --servers=server-id[,...]
Specifies the network address of one or more nodes of a database cluster. By default, kafkaloader
attempts to insert the data into a database on the local system (localhost). To load data into a remote
database, use the --servers argument to specify the database nodes the loader should connect to.
--update
Specifies that existing records with a matching primary key are updated, rather than being rejected.
By default, kafkaloader attempts to create new records. The --update flag lets you load updates to
existing records and create new records where the primary key does not already exist. To use --update, the table must have a primary key.
--user {text}
Specifies the username to use when connecting to the database. You must specify a username and
password if security is enabled for the database.
--zookeeper {kafka-server[:port]}
Specifies the network address of the Kafka Zookeeper instance to connect to. The Kafka service must
be running Kafka 0.8.
Examples
The following example starts the kafkaloader to read messages from the voltdb_customer topic on the
Kafka server quesvr:2181, inserting the resulting records into the CUSTOMER table in the VoltDB cluster
that includes the servers dbsvr1, dbsvr2, and dbsvr3. The process will continue, regardless of errors, until
the connection to the VoltDB database is lost or the user explicitly ends the process.
$ kafkaloader --maxerrors=0 customer \
--zookeeper=quesvr:2181 --topic=voltdb_customer \
--servers=dbsvr1,dbsvr2,dbsvr3
sqlcmd
sqlcmd Starts an interactive command prompt for issuing SQL queries to a running VoltDB database.
Syntax
sqlcmd [args...]
Description
The sqlcmd command lets you query a VoltDB database interactively. You can execute SQL statements,
invoke stored procedures, or use directives to examine the structure of the database. When sqlcmd starts
it provides its own command line prompt until you exit the session. When you start the session, you can
optionally specify one or more database servers to access. By default, sqlcmd accesses the database on
the local system via localhost.
At the sqlcmd prompt, you have several options:
SQL queries You can enter ad hoc SQL queries that are run against the database and the results
displayed. You must terminate the query with a semi-colon and carriage return.
Procedure calls You can have sqlcmd execute a stored procedure. You identify a procedure call
with the exec directive, followed by the procedure class name, the procedure parameters, and a closing
semi-colon. For example, the following sqlcmd directive executes the @SystemCatalog system procedure requesting information about the stored procedures.
$ sqlcmd
1> exec @SystemCatalog procedures;
Note that string values can be entered as plain text or enclosed in single quotation marks. Also, the exec
directive must be terminated by a semi-colon.
Show and Explain directives The show and explain directives let you examine the structure of the
schema and user-defined stored procedures. Valid directives are:
SHOW CLASSES Lists the user-defined classes in the database. Classes are grouped into procedure classes (those that can be invoked as a stored procedure) and non-procedure classes (shared
classes that cannot themselves be called as stored procedures but can be invoked from within stored
procedures).
SHOW PROCEDURES Lists the user-defined, default, and system procedures for the current
database, including the type and number of arguments for each.
SHOW TABLES Lists the tables in the schema.
EXPLAIN {sql-query} Displays the execution plan for the specified SQL statement.
EXPLAINPROC {procedure-name} Displays the execution plan for the specified stored procedure.
Class management directives The load classes and remove classes directives let you add and remove
Java classes from the database:
LOAD CLASSES Loads any classes in the specified JAR file. If a class already exists in the database, it is replaced by the new class definition in the JAR file.
REMOVE CLASSES Removes any classes that match the specified class name string. The class
specification can include wildcards.
Command recall You can recall previous commands using the up and down arrow keys. Or you
can recall a specific command by line number (the command prompt shows the line number) using the
recall command. For example:
$ sqlcmd
1> select * from votes;
2> show procedures;
3> recall 1
select * from votes;
Once recalled, you can edit the command before reissuing it using typical editing keys, such as the left
and right arrow keys and backspace and delete.
Script files You can run multiple queries or stored procedures in a single command using the file
directive. The file directive takes a text file as an argument and executes all of the SQL queries and exec
directives in the file as if they were entered interactively. Any show, explain, recall, or exit directives
are ignored. For example, the following command processes all of the SQL queries and procedure invocations in the file myscript.sql:
$ sqlcmd
1> file myscript.sql;
If the file contains only data definition language (DDL) statements, you can also have the entire file
processed as a batch by including the -batch argument:
$ sqlcmd
1> file -batch myscript.sql;
If a file or set of statements includes both DDL and DML statements, you can still batch process a
group of DDL statements by enclosing the statements in a file -inlinebatch directive and the
specified end marker. For example, in the following code the three CREATE PROCEDURE statements
are processed as a batch:
load classes myprocs.jar;
file -inlinebatch END_OF_BATCH
CREATE PROCEDURE FROM CLASS procs.AddEmployee;
CREATE PROCEDURE FROM CLASS procs.ChangeDept;
CREATE PROCEDURE FROM CLASS procs.PromoteEmployee;
END_OF_BATCH
Batch processing the DDL statements has two effects:
Batch processing can significantly improve performance since all of the schema changes are
processed and distributed to the cluster nodes at one time, rather than individually for each statement.
The batch operates as a transaction, succeeding or failing as a unit. If any statement fails, all of the
schema changes are rolled back.
Exit When you are done with your interactive session, enter the exit directive to end the session and
return to the shell prompt.
To run a sqlcmd command without starting the interactive prompt, you can pipe the command through
standard input to the sqlcmd command. For example:
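$ echo "select * from contestants;" | sqlcmd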
Arguments
--help
Displays the sqlcmd help text then returns to the shell prompt.
--servers=server-id[,...]
Specifies the network address of one or more nodes in the database cluster. By default, sqlcmd attempts
to connect to a database on localhost.
--port=port-num
Specifies the port number to use when connecting to the database servers. All servers must be using
the same port number. By default, sqlcmd connects to the standard client port (21212).
--user=user-id
Specifies the username to use for authenticating to the database. The username is required if the
database has security enabled.
--password={text}
Specifies the password to use when connecting to the database. You must specify a username and
password if security is enabled for the database. If you specify a username with the --user argument
but not the --password argument, VoltDB prompts for the password. This is useful when writing shell
scripts because it avoids having to hardcode passwords as plain text in the script.
--output-format={csv | fixed | tab}
Specifies the format of the output of query results. Output can be formatted as comma-separated values
(csv), fixed monospaced text (fixed), or tab-separated text fields (tab). By default, the output is in
fixed monospaced text.
--output-skip-metadata
Specifies that the column headings and other metadata associated with query results are not displayed.
By default, the output includes such metadata. However, you can use this argument, along with the
--output-format argument, to write just the data itself to an output file.
--query-timeout=time-limit
Specifies a time limit for read-only queries. Any read-only queries that exceed the time limit are
canceled and control returned to the user. Specify the time out as an integer number of milliseconds.
The default timeout is set in the cluster deployment file (or set to 10 seconds by default, if not set by
the deployment file). Only users with ADMIN privileges can set a sqlcmd timeout longer than the
cluster-wide setting.
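For example, the following sketch starts sqlcmd with a 20-second (20,000 millisecond) limit on read-only queries:
$ sqlcmd --query-timeout=20000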
Example
The following example demonstrates an sqlcmd session, accessing the voter sample database running on
node zeus.
$ sqlcmd --servers=zeus
SQL Command :: zeus:21212
1> select * from contestants;
1 Edwina Burnam
2 Tabatha Gehling
3 Kelly Clauss
4 Jessie Alloway
5 Alana Bregman
6 Jessie Eichman
(6 row(s) affected)
2> select sum(num_votes) as total, contestant_number from
   v_votes_by_contestant_number_State group by contestant_number
   order by total desc;
 TOTAL   CONTESTANT_NUMBER
-------  -----------------
 757240                  1
 630429                  6
 442962                  5
 390353                  4
 384743                  2
 375260                  3
(6 row(s) affected)
3> exit
$
voltadmin
voltadmin Performs administrative functions on a VoltDB database.
Syntax
voltadmin {command} [args...]
Description
The voltadmin command allows you to perform administrative tasks on a VoltDB database. You specify
the database server to access and, optionally, authentication credentials using arguments to the voltadmin
command. Individual administrative commands may have their own unique arguments as well.
Arguments
The following global arguments are available for all voltadmin commands.
-h, --help
Displays information about how to use a command. The --help flag and the help command perform
the same function.
-H, --host=server-id[:port]
Specifies which database server to connect to. You can specify the server as a network address or
hostname. By default, voltadmin attempts to connect to a database on localhost. You can optionally
specify the port number. If you do not specify a port, voltadmin uses the default admin port.
-p, --password={text}
Specifies the password to use when connecting to the database. You must specify a username and
password if security is enabled for the database. If you specify a username with the --user argument
but not the --password argument, VoltDB prompts for the password. This is useful when writing shell
scripts because it avoids having to hardcode passwords as plain text in the script.
-u, --user=user-id
Specifies the username to use for authenticating to the database. The username is required if the
database has security enabled.
-v, --verbose
Displays additional information about the specific commands being executed.
Commands
The following are the administrative functions that you can invoke using voltadmin.
help [command]
Displays information about the usage of individual commands or, if you do not specify a command,
summarizes usage information for all commands. The help command and --help qualifier are synonymous.
dr reset
Resets the database replication (DR) connection on a master database. Performing a reset breaks the
existing DR connection, deletes pending binary logs and stops the queuing of DR data. This command
is useful for eliminating unnecessary resource usage on a master database after the replica stops or is
promoted. Note, however, after a reset DR must start over from scratch; it cannot be restarted where
it left off.
pause [--wait]
Pauses the database, stopping any additional activity on the client port. Normally, pause returns immediately. However, you can use the --wait flag to have the command wait until all pending transactions are processed and all database replication (DR) and export queues are flushed. Use of --wait is
recommended if you are shutting down the database and do not intend to restart with recover, since
--wait ensures all associated DR or export data is delivered prior to shutdown.
promote
Promotes a replica database, stopping replication and enabling read/write queries on the client port.
resume
Resumes normal database operation after a pause.
save {directory} {unique-ID}
Creates a snapshot containing the current database contents. The contents are saved to disk on the
server(s) using the unique ID as a file prefix and the directory specification as the file path. Additional
arguments for the save command are:
--format={ csv | native }
Specifies the format of the snapshot files. The allowable formats are CSV (comma-separated
value) and native formats. Native format snapshots can be used for restoring the database. CSV
files can be used by other utilities (such as spreadsheets or the VoltDB CSV loader) but cannot
be restored using the voltadmin restore command.
--blocking
Specifies that the snapshot will block all other transactions until the snapshot is complete. The
advantage of blocking snapshots is that once the command completes you know the snapshot is
finished. The disadvantage is that the snapshot blocks ongoing use of the database.
By default, voltadmin performs non-blocking snapshots so as not to interfere with ongoing database operation. However, note that the non-blocking save command only starts the snapshot. You
must use show snapshots to determine when the snapshot process is finished if you want to know
when it is safe, for example, to shutdown the database.
--skiptables={ table-name [,...] }
Specifies one or more tables to leave out of the snapshot. Separate multiple table names with
commas.
--tables={ table-name [,...] }
Specifies what table(s) to include in the snapshot. Only the specified tables will be included.
Separate multiple table names with commas.
restore {directory} {unique-ID}
Restores the data from a snapshot to the database. The data is read from a snapshot using the same
unique ID and directory path that were used when the snapshot was created. If no tables exist in the
database (that is, no schema has been defined) the restore command will also restore the original
schema, including stored procedure classes, before restoring the data.
show snapshots
Displays information about up to ten previous snapshots. This command is useful for determining the
success or failure of snapshots started with the save command.
Example
The following example illustrates one way to perform an orderly shutdown of a VoltDB cluster, including
pausing and saving the database contents.
$ voltadmin pause
$ voltadmin save --blocking ./ mydb
$ voltadmin shutdown
voltdb
voltdb Performs management tasks on the current server, such as starting and recovering the database.
Syntax
voltdb collect [args] voltdbroot-directory
voltdb mask [args] source-deployment-file [new-deployment-file]
voltdb create [args]
voltdb recover [args]
voltdb add [args]
voltdb rejoin [args]
Description
The voltdb command performs local management functions on the current system, including:
Starting the database process
Adding or rejoining a node to a running database cluster
Collecting log files into a single compressed file
Hiding passwords in the deployment file
The action that is performed depends on which start action you specify to the voltdb command:
collect the collect option collects system and process logs related to the VoltDB database process
on the current system and compresses them into a single file. This command is helpful when reporting
problems to VoltDB support. The only required argument to the collect command is the path to the
voltdbroot directory where the database was run. By default, the root directory is a subfolder, voltdbroot, in the current working directory where the database was started.
mask the mask option disguises the passwords associated with user accounts in the security section
of the deployment file. The output of the voltdb mask command is either a new deployment file with
hashed passwords or, if you do not specify an output file, the original input file is modified in place.
create the create option starts a new, empty database. This option is useful when starting a database
for the first time or if you are updating the cluster configuration by performing a save, shutdown, startup,
and restore. (See Chapter 9, Using VoltDB in a Cluster for information on updating the cluster.)
recover the recover option starts the database and restores a previous state from the last known
snapshot or from command logs. VoltDB uses the snapshot and command log paths specified in the
deployment file when looking for content to restore. If you specify recover as the startup action and no
snapshots or command logs can be found, startup will fail.
add the add option adds the current node to an existing cluster. See Section 9.2, Updating the Cluster
Configuration for details on elastic scaling.
297
rejoin If a node on a K-safe cluster fails, you can use the rejoin start action to have the node (or a
replacement node) rejoin the cluster. The host-id you specify with the host argument can be any node
still present in the database cluster; it does not have to be the host node specified when the cluster was
started. You can also request a blocking rejoin by including the --blocking flag.
Finally, when creating a new database (create) or recovering an existing database (recover) you can include the --replica flag to create a recipient for database replication.
When starting the database, the voltdb command uses Java to instantiate the process. It is possible to
customize the Java environment, if necessary, by passing command line arguments to Java through the
following environment variables:
LOG4J_CONFIG_PATH Specifies an alternate Log4J configuration file.
VOLTDB_HEAPMAX Specifies the maximum heap size for the Java process. Specify the value
as an integer number of megabytes. By default, the maximum heap size is set to 2048.
VOLTDB_OPTS Specifies all other Java command line arguments. You must include both the
command line flag and argument. For example, this environment variable can be used to specify system
properties using the -D flag:
export VOLTDB_OPTS="-DmyApp.DebugFlag=true"
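Similarly, you can adjust the maximum Java heap size. For example, to raise the maximum heap to 3,072 megabytes:
export VOLTDB_HEAPMAX="3072"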
Arguments
--ignore=thp
For Linux systems, allows the database to start even if the server is configured to use Transparent
Huge Pages (THP). THP is a known problem for memory-intense applications like VoltDB. So under
normal conditions VoltDB will not start if the use of THP is enabled. This flag allows you to ignore
that restriction for test purposes. Do not use this flag on production systems.
--blocking
For the rejoin operation only, specifies that the database should block client transactions for the affected partitions until the rejoin is complete.
-r, --replica
For the create and recover operations only, specifies that the database starts in read-only mode as
a replica for database replication (DR). To create or recover a replica database, the deployment file
must configure DR appropriately, including a <connection> tag identifying the source, or master
database, for replication. See Chapter 11, Database Replication for more information.
--internal=[ip-address:]{port-number}
Specifies the internal port used to communicate between cluster nodes.
--replication=[ip-address:]{port-number}
Specifies the replication port used for database replication. The --replication flag overrides the replication port setting in the deployment file.
--zookeeper=[ip-address:]{port-number}
Specifies the zookeeper port. By default, the zookeeper port is bound to the server's internal interface
(127.0.0.1).
Examples
The first example shows the command for creating a database using a custom configuration file,
2nodedeploy.xml, and the node zeus as the host.
$ voltdb create --deployment=2nodedeploy.xml \
--host=zeus
The second example takes advantage of the defaults for the host and deployment arguments to start a
single-node database.
$ voltdb create
Table E.1, Deployment File Elements and Attributes provides further detail on the elements, including
their relationships (as child or parent) and the allowable attributes for each.
Element | Child of | Parent of | Attributes
deployment | (root element) | admin-mode, cluster, commandlog, dr, export, heartbeat, httpd, import, partition-detection, paths, security, snapshot, systemsettings, users |
cluster* | deployment | | hostcount={int}*, sitesperhost={int}, kfactor={int}
admin-mode | deployment | | port={int}, adminstartup={true|false}
heartbeat | deployment | | timeout={int}*
partition-detection | deployment | snapshot | enabled={true|false}
snapshot | partition-detection | | prefix={text}*
commandlog | deployment | frequency | enabled={true|false}, synchronous={true|false}, logsize={int}
frequency | commandlog | | time={int}, transactions={int}
dr | deployment | connection | id={int}*, listen={true|false}, port={int}
connection | dr | | source={server[,..]}*
export | deployment | configuration |
configuration | export | property | enabled={true|false}, target={text}*, type={file|http|jdbc|kafka|rabbitmq|custom}, exportconnectorclass={class-name}
property | configuration | | name={text}*
import | deployment | configuration |
configuration | import | property | enabled={true|false}, module={text}, format={csv|tsv}, type={kafka|custom}*
property | configuration | | name={text}
httpd | deployment | jsonapi | port={int}, enabled={true|false}
jsonapi | httpd | | enabled={true|false}
paths | deployment | commandlog, commandlogsnapshot, droverflow, exportoverflow, snapshots, voltdbroot |
commandlog | paths | | path={directory-path}*
commandlogsnapshot | paths | | path={directory-path}*
droverflow | paths | | path={directory-path}*
exportoverflow | paths | | path={directory-path}*
snapshots | paths | | path={directory-path}*
voltdbroot | paths | | path={directory-path}*
security | deployment | | enabled={true|false}, provider={hash|kerberos}
snapshot | deployment | | frequency={int}{s|m|h}, prefix={text}, retain={int}, enabled={true|false}
systemsettings | deployment | elastic, query, resourcemonitor, snapshot, temptables |
elastic | systemsettings | | duration={int}, throughput={int}
query | systemsettings | | timeout={int}*
resourcemonitor | systemsettings | disklimit, memorylimit | frequency={int}
disklimit | resourcemonitor | feature |
feature | disklimit | | name={text}*, size={int[%]}*
memorylimit | resourcemonitor | | size={int[%]}*
users | deployment | user |
user | users | | name={text}*, password={text}*, roles={role-name[,..]}
* Required
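To see how these elements fit together, the following is a minimal sketch of a deployment file for a hypothetical two-node, K=1 cluster that enables command logging and automated snapshots; the paths and values are illustrative only:
<?xml version="1.0"?>
<deployment>
   <cluster hostcount="2" sitesperhost="4" kfactor="1" />
   <commandlog enabled="true" synchronous="false" logsize="1024">
      <frequency time="200" transactions="10000" />
   </commandlog>
   <paths>
      <voltdbroot path="/opt/voltdb/root" />
      <snapshots path="/opt/voltdb/snapshots" />
   </paths>
   <snapshot frequency="24h" prefix="auto" retain="3" enabled="true" />
   <security enabled="false" />
</deployment>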
SQL Datatype | Compatible Java Types | Notes
TINYINT | byte, short, int, long, String | Larger datatypes (short, int, and long) are valid input types. However, VoltDB throws a runtime error if the value exceeds the allowable range of a TINYINT. String input must be a properly formatted text representation of an integer value in the correct range.
SMALLINT | byte, short, int, long, String | String input must be a properly formatted text representation of an integer value in the correct range.
INTEGER | byte, short, int, long, String | String input must be a properly formatted text representation of an integer value in the correct range.
BIGINT | byte, short, int, long, String | String input must be a properly formatted text representation of an integer value in the correct range.
FLOAT | double, float, byte, short, int, long, String | String input must be a properly formatted text representation of a floating point value.
DECIMAL | BigDecimal, double, float, byte, short, int, long, String |
GEOGRAPHY | (none) |
GEOGRAPHY_POINT | (none) |
VARCHAR() | String, byte[], byte, short, int, long, float, double, BigDecimal, VoltDB TimestampType |
VARBINARY() | String, byte[] |
TIMESTAMP | VoltDB TimestampType, int, long, String | For String variables, the text must be formatted as either YYYY-MM-DD hh.mm.ss.nnnnnn or just the date portion YYYY-MM-DD.
@AdHoc
@Explain
@ExplainProc
@GetPartitionKeys
@Pause
@Promote
@Quiesce
@Resume
@Shutdown
@SnapshotDelete
@SnapshotRestore
@SnapshotSave
@SnapshotScan
@SnapshotStatus
@Statistics
@StopNode
@SystemCatalog
@SystemInformation
@UpdateApplicationCatalog
@UpdateClasses
@UpdateLogging
@AdHoc
@AdHoc Executes an SQL statement specified at runtime.
Syntax
@AdHoc String SQL-statement
Description
The @AdHoc system procedure lets you perform arbitrary SQL statements on a running VoltDB database.
You can execute multiple SQL statements, either queries or data definition language (DDL) statements,
in a single call to @AdHoc by separating the individual statements with semicolons. When you do this,
the statements are performed as a single transaction. That is, the statements all succeed as a group or they
all roll back if any of them fail. You cannot mix SQL queries and DDL in a single @AdHoc call.
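For example, the following sketch (assuming a hypothetical TOWNS table with an integer and a VARCHAR column) executes two INSERT statements as a single transaction; if either insert fails, both roll back:
// Both inserts succeed or fail together as one transaction.
client.callProcedure("@AdHoc",
    "INSERT INTO TOWNS VALUES (1,'Billerica');" +
    "INSERT INTO TOWNS VALUES (2,'Bedford');");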
Performance of ad hoc queries is optimized, where possible. However, it is important to note that ad hoc
queries are not pre-compiled, like queries in stored procedures. Therefore, use of stored procedures is
recommended over @AdHoc for frequent, repetitive, or performance-sensitive queries.
Return Values
Returns one VoltTable for each statement, with as many rows as there are records returned by the statement.
The column names and datatypes match the names and datatypes of the fields returned by the query.
Examples
The following program example uses @AdHoc to execute an SQL SELECT statement and display the
number of reservations for a specific customer in the flight reservation database.
try {
VoltTable[] results = client.callProcedure("@AdHoc",
"SELECT COUNT(*) FROM RESERVATION " +
"WHERE CUSTOMERID=" + custid).getResults();
System.out.printf("%d reservations found.\n",
results[0].fetchRow(0).getLong(0));
}
catch (Exception e) {
e.printStackTrace();
}
Note that you do not need to explicitly invoke @AdHoc when using sqlcmd. You can type your statement
directly into the sqlcmd prompt, like so:
$ sqlcmd
1> SELECT COUNT(*) FROM RESERVATION WHERE CUSTOMERID=12345;
@Explain
@Explain Returns the execution plan for the specified SQL query.
Syntax
@Explain String SQL-statement
Description
The @Explain system procedure evaluates the specified SQL query and returns the resulting execution
plan. Execution, or explain, plans describe how VoltDB expects to execute the query at runtime, including
what indexes are used, the order the tables are joined, and so on. Execution plans are useful for identifying
performance issues in query design. See the chapter on execution plans in the VoltDB Guide to Performance and Customization for information on how to interpret the plans.
Return Values
Returns one VoltTable with one row and one column.
Name | Datatype
EXECUTION_PLAN | VARCHAR
Examples
The following program example uses @Explain to evaluate an ad hoc SQL SELECT statement against
the voter sample application.
try {
String query = "SELECT COUNT(*) FROM CONTESTANTS;";
VoltTable[] results = client.callProcedure("@Explain",
query ).getResults();
System.out.printf("Query: %d\nPlan:\n%d",
query, results[0].fetchRow(0).getString(0));
}
catch (Exception e) {
e.printStackTrace();
}
In the sqlcmd utility, the "explain" command is a shortcut for "exec @Explain". So the following two
commands are equivalent:
$ sqlcmd
1> exec @Explain 'SELECT COUNT(*) FROM CONTESTANTS';
2> explain SELECT COUNT(*) FROM CONTESTANTS;
@ExplainProc
@ExplainProc Returns the execution plans for all SQL queries in the specified stored procedure.
Syntax
@ExplainProc String procedure-name
Description
The @ExplainProc system procedure returns the execution plans for all of the SQL queries within the specified stored procedure. Execution, or explain, plans describe how VoltDB expects to execute the queries
at runtime, including what indexes are used, the order the tables are joined, and so on. Execution plans
are useful for identifying performance issues in query and stored procedure design. See the chapter on
execution plans in the VoltDB Guide to Performance and Customization for information on how to interpret the plans.
Return Values
Returns one VoltTable with one row for each query in the stored procedure.
Name | Datatype
SQL_STATEMENT | VARCHAR
EXECUTION_PLAN | VARCHAR
Examples
The following example uses @ExplainProc to evaluate the execution plans associated with the ContestantWinningStates stored procedure in the voter sample application.
try {
VoltTable[] results = client.callProcedure("@ExplainProc",
"ContestantWinningStates" ).getResults();
results[0].resetRowPosition();
while (results[0].advanceRow()) {
System.out.printf("Query: %d\nPlan:\n%d",
results[0].getString(0),results[0].getString(1));
}
}
catch (Exception e) {
e.printStackTrace();
}
In the sqlcmd utility, the "explainproc" command is a shortcut for "exec @ExplainProc". So the following
two commands are equivalent:
$ sqlcmd
1> exec @ExplainProc 'ContestantWinningStates';
2> explainproc ContestantWinningStates;
@GetPartitionKeys
@GetPartitionKeys Returns a list of partition values, one for every partition in the database.
Syntax
@GetPartitionKeys String datatype
Description
The @GetPartitionKeys system procedure returns a set of partition values that you can use to reach every
partition in the database. This procedure is useful when you want to run a stored procedure in every partition
but you do not want to use a multi-partition procedure. By running multiple single-partition procedures,
you avoid the impact on latency and throughput that can result from a multi-partition procedure. This
is particularly true for longer running procedures. Using multiple, smaller procedures can also help for
queries that modify large volumes of data, such as large deletes.
When you call @GetPartitionKeys, you specify the datatype of the keys to return as the parameter to the procedure.
You specify the datatype as a case-insensitive string. Valid options are "INTEGER", "STRING", and
"VARCHAR" (where "STRING" and "VARCHAR" are synonyms).
Note that the results of the system procedure are valid at the time they are generated. If the cluster is static
(that is, no nodes are being added and any rebalancing is complete), the results remain valid until the next
elastic event. However, during rebalancing, the distribution of partitions is likely to change. So it is a good
idea to call @GetPartitionKeys once to get the keys, act on them, then call the system procedure again to
verify that the partitions have not changed.
Return Values
Returns one VoltTable with a row for every unique partition in the cluster.
Name | Datatype | Description
PARTITION_ID | INTEGER |
PARTITION_KEY | INTEGER or STRING | A valid partition key for the partition. The datatype of the key matches the type requested in the procedure call.
Examples
The following example shows the use of sqlcmd to get integer key values from @GetPartitionKeys:
$ sqlcmd
1> exec @GetPartitionKeys integer;
The next example shows a Java program using @GetPartitionKeys to execute a stored procedure to clear
out old records, one partition at a time.
VoltTable[] results = client.callProcedure("@GetPartitionKeys",
"INTEGER").getResults();
VoltTable keys = results[0];
for (int k=0;k<keys.getRowCount();k++) {
long key = keys.fetchRow(k).getLong(1);
313
System Procedures
client.callProcedure("PurgeOldData", key);
}
@Pause
@Pause Initiates read-only mode on the cluster.
Syntax
@Pause
Description
The @Pause system procedure initiates admin mode on the cluster. Admin mode puts the database into
read-only mode and ensures no further changes to the database can be made through the client port when
performing sensitive administrative operations, such as taking a snapshot before shutting down.
While in admin mode, any write transactions on the client port are rejected and return an error status. Read-only transactions, including system procedures, are allowed. However, write transactions such as inserts,
deletes, or schema changes are only allowed through the admin port.
Several important points to consider concerning @Pause are:
@Pause must be called through the admin port, not the standard client port.
Although write transactions on the client port are rejected in admin mode, existing connections from
client applications are not removed.
To return to normal database operation, you must call the system procedure @Resume on the admin port.
Return Values
Returns one VoltTable with one row.
Name | Datatype
STATUS | BIGINT
Examples
It is possible to call @Pause using the sqlcmd utility. However, you must explicitly connect to the admin
port when starting sqlcmd to do this. Also, it is often easier to use the voltadmin utility, which connects
to the admin port by default. For example, the following commands demonstrate pausing and resuming
the database using both sqlcmd and voltadmin:
$ sqlcmd --port=21211
1> exec @Pause;
2> exec @Resume;
$ voltadmin pause
$ voltadmin resume
The following program example, if called through the admin port, initiates admin mode on the database
cluster.
client.callProcedure("@Pause");
@Promote
@Promote Promotes a replica database to normal operation.
Syntax
@Promote
Description
The @Promote system procedure promotes a replica database to normal operation. During database replication, the replica database only accepts input from the master database. If, for any reason, the master
database fails and replication stops, you can use @Promote to change the replica database from a replica
to a normal database. When you invoke the @Promote system procedure, the replica exits read-only mode
and becomes a fully operational VoltDB database that can receive and execute both read-only and read/write queries.
Note that once a database is promoted, it cannot return to its original role as the receiving end of database
replication without first stopping and reinitializing the database as a replica. If the database is not a replica,
invoking @Promote returns an error.
Return Values
Returns one VoltTable with one row.
Name | Datatype
STATUS | BIGINT
Examples
The following programming example promotes a database cluster.
client.callProcedure("@Promote");
It is also possible to promote a replica database using sqlcmd or the voltadmin promote command. The
following commands are equivalent:
$ sqlcmd
1> exec @Promote;
$ voltadmin promote
@Quiesce
@Quiesce Waits for all queued export data to be written to the connector.
Syntax
@Quiesce
Description
The @Quiesce system procedure waits for any queued export data to be written to the export connector
before returning to the calling application. @Quiesce also does an fsync to ensure any pending export
overflow is written to disk. This system procedure should be called after stopping client applications and
before calling @Shutdown to ensure that all export activity is concluded before shutting down the database.
If export is not enabled, the procedure returns immediately.
Return Values
Returns one VoltTable with one row.
Name | Datatype
STATUS | BIGINT
Examples
The following example calls @Quiesce using sqlcmd:
$ sqlcmd
1> exec @Quiesce;
The following program example uses drain and @Quiesce to complete any asynchronous transactions and
clear the export queues before shutting down the database.
// Complete all outstanding activities
try {
client.drain();
client.callProcedure("@Quiesce");
}
catch (Exception e) {
e.printStackTrace();
}
// Shutdown the database.
try {
client.callProcedure("@Shutdown");
}
// We expect an exception when the connection drops.
// Report any other exception.
catch (org.voltdb.client.ProcCallException e) { }
catch (Exception e) { e.printStackTrace(); }
@Resume
@Resume Returns a paused database to normal operating mode.
Syntax
@Resume
Description
The @Resume system procedure switches all nodes in a database cluster from admin mode to normal
operating mode. In other words, @Resume is the opposite of @Pause.
After calling this procedure, the cluster returns to accepting read/write ad hoc queries and stored procedure
invocations from clients connected to the standard client port.
@Resume must be invoked from a connection to the admin port.
Return Values
Returns one VoltTable with one row.
Name | Datatype
STATUS | BIGINT
Examples
You can call @Resume using the sqlcmd utility. However, you must explicitly connect to the admin port
when starting sqlcmd to do this. It is often easier to use the voltadmin resume command, which connects
to the admin port by default. For example, the following commands are equivalent:
$ sqlcmd --port=21211
1> exec @Resume;
$ voltadmin resume
The following program example uses @Resume to return the cluster to normal operation.
client.callProcedure("@Resume");
@Shutdown
@Shutdown Shuts down the database.
Syntax
@Shutdown
Description
The @Shutdown system procedure performs an orderly shut down of a VoltDB database on all nodes of
the cluster.
VoltDB is an in-memory database. By default, data is not saved when you shut down the database. If
you want to save the data between sessions, you can enable command logging or save a snapshot (either
manually or using automated snapshots) before the shutdown. See Chapter 14, Command Logging and
Recovery and Chapter 13, Saving & Restoring a VoltDB Database for more information.
Note that once the database shuts down, the client connection is lost and the calling program cannot make
any further requests to the server.
Examples
The following examples show calling @Shutdown from sqlcmd and using the voltadmin shutdown command. These two commands are equivalent:
$ sqlcmd
1> exec @Shutdown;
$ voltadmin shutdown
The following program example uses @Shutdown to stop the database cluster. Note the use of catch to
separate out a VoltDB call procedure exception (which is expected) from any other exception.
try {
client.callProcedure("@Shutdown");
}
// we expect an exception when the connection drops.
catch (org.voltdb.client.ProcCallException e) {
System.out.println("Database shutdown initiated.");
}
// report any other exception.
catch (Exception e) {
e.printStackTrace();
}
@SnapshotDelete
@SnapshotDelete Deletes one or more native snapshots.
Syntax
@SnapshotDelete String[] directory-paths, String[] Unique-IDs
Description
The @SnapshotDelete system procedure deletes native snapshots from the database cluster. This is a cluster-wide operation and a single invocation will remove the snapshot files from all of the nodes.
The procedure takes two parameters: a String array of directory paths and a String array of unique IDs
(prefixes).
The two arrays are read as a series of value pairs, so that the first element of the directory path array and
the first element of the unique ID array will be used to identify the first snapshot to delete. The second
element of each array will identify the second snapshot to delete. And so on.
@SnapshotDelete can delete native format snapshots only. The procedure cannot delete CSV format snapshots.
Return Values
Returns one VoltTable with a row for every snapshot file affected by the operation.
Name | Datatype
HOST_ID | INTEGER
HOSTNAME | STRING
PATH | STRING
NONCE | STRING
NAME | STRING
SIZE | BIGINT
DELETED | STRING
RESULT | STRING
ERR_MSG | STRING
Example
The following example uses @SnapshotScan to identify all of the snapshots in the directory /tmp/voltdb/backup/. This information is then used by @SnapshotDelete to delete those snapshots.
VoltTable[] results = null;
try {
    results = client.callProcedure("@SnapshotScan",
        "/tmp/voltdb/backup/").getResults();
}
catch (Exception e) { e.printStackTrace(); }
VoltTable table = results[0];
int numofsnapshots = table.getRowCount();
int i = 0;
if (numofsnapshots > 0) {
    String[] paths = new String[numofsnapshots];
    String[] nonces = new String[numofsnapshots];
    for (i=0;i<numofsnapshots;i++) { paths[i] = "/tmp/voltdb/backup/"; }
    table.resetRowPosition();
    i = 0;
    while (table.advanceRow()) {
        nonces[i] = table.getString("NONCE");
        i++;
    }
    try {
        client.callProcedure("@SnapshotDelete",paths,nonces);
    }
    catch (Exception e) { e.printStackTrace(); }
}
@SnapshotRestore
@SnapshotRestore Restores a database from disk using a native format snapshot.
Syntax
@SnapshotRestore String directory-path, String unique-ID
Description
The @SnapshotRestore system procedure restores a previously saved database from disk to memory. The
snapshot must be in native format. (You cannot restore a CSV format snapshot using @SnapshotRestore.)
The restore request is propagated to all nodes of the cluster, so a single call to @SnapshotRestore will
restore the entire database cluster.
The first parameter, directory-path, specifies where VoltDB looks for the snapshot files.
The second parameter, unique-ID, is a unique identifier that is used as a filename prefix to distinguish
between multiple snapshots.
You can perform only one restore operation on a running VoltDB database. Subsequent attempts to call
@SnapshotRestore result in an error. Note that this limitation applies to both manual and automated restores. Since command logging often includes snapshots, you should never perform a manual @SnapshotRestore after recovering a database using command logs.
See Chapter 13, Saving & Restoring a VoltDB Database for more information about saving and restoring
VoltDB databases.
Return Values
Returns one VoltTable with a row for every table restored at each execution site.
Name | Datatype | Description
HOST_ID | INTEGER |
HOSTNAME | STRING |
SITE_ID | INTEGER |
TABLE | STRING |
PARTITION_ID | INTEGER | The numeric ID for the logical partition that this site represents. When using a K value greater than zero, there are multiple copies of each logical partition.
RESULT | STRING |
ERR_MSG | STRING |
Examples
The following example uses @SnapshotRestore to restore previously saved database content from the path
/tmp/voltdb/backup/ using the unique identifier flight.
$ sqlcmd
1> exec @SnapshotRestore '/tmp/voltdb/backup/', 'flight';
Alternately, you can use the voltadmin restore command to perform the same function:
$ voltadmin restore /tmp/voltdb/backup/ flight
Since there are a number of situations that impact what data is restored, it is a good idea to review the return
values to see what tables and partitions were affected. In the following program example, the contents of
the VoltTable array is written to standard output so the operator can confirm that the restore completed
as expected.
VoltTable[] results = null;
try {
results = client.callProcedure("@SnapshotRestore",
"/tmp/voltdb/backup/",
"flight").getResults();
}
catch (Exception e) {
e.printStackTrace();
}
for (int t=0; t<results.length; t++) {
VoltTable table = results[t];
for (int r=0;r<table.getRowCount();r++) {
VoltTableRow row = table.fetchRow(r);
System.out.printf("Node %d Site %d restoring " +
"table %s partition %d.\n",
row.getLong("HOST_ID"), row.getLong("SITE_ID"),
row.getString("TABLE"),row.getLong("PARTITION"));
}
}
@SnapshotSave
@SnapshotSave Saves the current database contents to disk.
Syntax
@SnapshotSave String directory-path, String unique-ID, Integer blocking-flag
@SnapshotSave String json-encoded-options
Description
The @SnapshotSave system procedure saves the contents of the current in-memory database to disk. Each
node of the database cluster saves its portion of the database locally.
There are two forms of the @SnapshotSave stored procedure: a procedure call with individual argument
parameters and a procedure call with all arguments in a single JSON-encoded string. When you specify the
arguments as individual parameters, VoltDB creates a native mode snapshot that can be used to recover
or restore the database. When you specify the arguments as a JSON-encoded string, you can request a
different format for the snapshot, including CSV (comma-separated value) files that can be used for import
into other databases or utilities.
Individual Arguments
When you specify the arguments as individual parameters, you must specify three arguments:
1. The directory path where the snapshot files are stored
2. An identifier that is included in the file names to uniquely identify the files that make up a single
snapshot
3. A flag value indicating whether the snapshot should block other transactions until it is complete or not
The resulting snapshot consists of multiple files saved to the directory specified by directory-path using
unique-ID as a filename prefix. The third argument, blocking-flag, specifies whether the save is performed
synchronously (thereby blocking any following transactions until the save completes) or asynchronously.
If this parameter is set to any non-zero value, the save operation will block any following transactions. If
it is zero, other transactions will be executed in parallel.
The files created using this invocation are in native VoltDB snapshot format and can be used to restore
or recover the database at some later time. This is the same format used for automatic snapshots. See
Chapter 13, Saving & Restoring a VoltDB Database for more information about saving and restoring
VoltDB databases.
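For example, the following sketch requests a native snapshot without blocking other transactions by passing a blocking-flag of zero, using the asynchronous form of callProcedure so the client does not wait for the snapshot. (This is an illustrative sketch, not part of the procedure definition; client is assumed to be a connected org.voltdb.client.Client instance.)

// A minimal sketch: request a non-blocking native snapshot (blocking-flag
// of 0) and examine the outcome in a callback. "client" is assumed to be
// a connected org.voltdb.client.Client instance.
try {
    client.callProcedure(new ProcedureCallback() {
            @Override
            public void clientCallback(ClientResponse response) {
                // Report the overall status of the snapshot request.
                System.out.println("Snapshot status: " +
                                   response.getStatusString());
            }
        },
        "@SnapshotSave", "/tmp/voltdb/backup/", "flight", 0);
}
catch (Exception e) { e.printStackTrace(); }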
JSON-Encoded Arguments
When you specify the arguments as a JSON-encoded string, you can specify what snapshot format you
want to create. Table G.1, @SnapshotSave Options describes all possible options when creating a snapshot using JSON-encoded arguments.
uripath
Specifies the path where the snapshot files are created. Note that, as a JSON-encoded
argument, the path must be specified as a URI, not just a system directory path.
Therefore, a local directory must be specified using the file:// identifier, such
as "file:///tmp", and the path must exist on all nodes of the cluster.
nonce
Specifies the unique identifier for the snapshot, which is used as a prefix in
the resulting filenames.
block
Specifies whether the snapshot should be synchronous (true) and block other transactions or asynchronous (false).
format
Specifies the format of the snapshot. Valid formats are "csv" and "native".
When you save a snapshot in CSV format, the resulting files are in standard comma-separated value format, with only one file for each table. In other words, duplicates (from replicated tables or duplicate partitions due to K-safety) are eliminated.
CSV formatted snapshots are useful for import or reuse by other databases or utilities. However, they cannot be used to restore or recover a VoltDB database.
When you save a snapshot in native format, each node and partition saves its contents
to separate files. These files can then be used to restore or recover the database. It
is also possible to later convert native format snapshots to CSV using the snapshot
utilities described in the VoltDB Administrator's Guide.
skiptables
Specifies tables to leave out of the snapshot. Use of tables or skiptables allows you
to create a partial snapshot of the larger database. Specify the list of tables as a
JSON array. For example, the following JSON argument excludes the Areacode and
Country tables from the snapshot:
"skiptables":["areacode","country"]
tables
Specifies tables to include in the snapshot. Use of tables or skiptables allows you to
create a partial snapshot of the larger database. Specify the list of tables as a JSON
array. For example, the following JSON argument includes only the Employee and
Company tables in the snapshot:
"tables":["employee","company"]
For example, the JSON-encoded arguments to synchronously save a CSV formatted snapshot to /tmp using
the unique identifier "mydb" are the following:
{uripath:"file:///tmp",nonce:"mydb",block:true,format:"csv"}
The block and format arguments are optional. If you do not specify them they default to block:false
and format:"native". The arguments uripath and nonce are required. The tables and skiptables
arguments are mutually exclusive.
Because the unique identifier is used in the resulting filenames, the identifier can contain only characters
that are valid for Linux file names. In addition, hyphens ("-") and commas (",") are not permitted.
Note that it is normal to perform manual saves synchronously, to ensure the snapshot represents a known
state of the database. However, automatic snapshots are performed asynchronously to reduce the impact
on ongoing database activity.
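To issue such a request from a Java client, the entire JSON-encoded string is passed as the single procedure parameter. The following is a minimal sketch (client is assumed to be a connected org.voltdb.client.Client instance):

// A minimal sketch: pass all @SnapshotSave arguments as a single
// JSON-encoded string to request a blocking, CSV-format snapshot.
// "client" is assumed to be a connected org.voltdb.client.Client.
String options =
    "{uripath:\"file:///tmp\",nonce:\"mydb\",block:true,format:\"csv\"}";
VoltTable[] results = null;
try {
    results = client.callProcedure("@SnapshotSave", options).getResults();
}
catch (Exception e) { e.printStackTrace(); }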
Return Values
The @SnapshotSave system procedure returns two different VoltTables, depending on the outcome of
the request.
325
System Procedures
Option #1: one VoltTable with a row for every execution site (that is, the number of hosts multiplied
by the number of sites per host).
Name      Datatype  Description
HOST_ID   INTEGER
HOSTNAME  STRING
SITE_ID   INTEGER
RESULT    STRING
ERR_MSG   STRING
Option #2: one VoltTable with a row for every database table being saved.

Name      Datatype  Description
HOST_ID   INTEGER
HOSTNAME  STRING
TABLE     STRING
RESULT    STRING
ERR_MSG   STRING
Examples
The following example uses @SnapshotSave to save the current database content in native snapshot format
to the path /tmp/voltdb/backup/ using the unique identifier flight on each node of the cluster.
$ sqlcmd
1> exec @SnapshotSave '/tmp/voltdb/backup/', 'flight', 1;
Alternately, you can use the voltadmin save command to perform the same function. When using the
voltadmin save command, you use the --blocking flag instead of a third parameter to request a blocking save:
$ voltadmin save --blocking /tmp/voltdb/backup/ flight
Note that the procedure call will return successfully even if the save was not entirely successful. The
information returned in the VoltTable array tells you what parts of the operation were successful or not.
For example, save may succeed on one node but not on another.
The following code sample performs the same function, but also checks the return values and notifies the
operator when portions of the save operation are not successful.
VoltTable[] results = null;
try {
    results = client.callProcedure("@SnapshotSave",
                                   "/tmp/voltdb/backup/",
                                   "flight", 1).getResults();
}
catch (Exception e) {
    e.printStackTrace();
}
for (VoltTable table : results) {
    while (table.advanceRow()) {
        // Notify the operator of any part of the save that failed.
        if (!table.getString("RESULT").equals("SUCCESS")) {
            System.out.printf("Host %s failed to save: %s\n",
                              table.getString("HOSTNAME"),
                              table.getString("ERR_MSG"));
        }
    }
}
@SnapshotScan
@SnapshotScan Lists information about existing native snapshots in a given directory path.
Syntax
@SnapshotScan String directory-path
Description
The @SnapshotScan system procedure provides information about any native snapshots that exist within
the specified directory path for all nodes on the cluster. The procedure reports the name (prefix) of the
snapshot, when it was created, how long it took to create, and the size of the individual files that make
up the snapshot(s).
@SnapshotScan does not include CSV format snapshots in its output. Only native format snapshots are
listed.
Return Values
On successful completion, this system procedure returns three VoltTables providing the following information:
1. A summary of the snapshots found
2. Available space in the directories scanned
3. Details concerning the individual files that make up the snapshots
The first table contains one row for every snapshot found.
Name               Datatype  Description
PATH               STRING
NONCE              STRING
TXNID              BIGINT
CREATED            BIGINT
SIZE               BIGINT
TABLES_REQUIRED    STRING
TABLES_MISSING     STRING
TABLES_INCOMPLETE  STRING
COMPLETE           STRING    A string value, either "TRUE" or "FALSE", indicating whether the snapshot as a whole is complete. If this column is "FALSE", the preceding two columns provide additional information concerning what is missing.
The second table contains one row for every host and directory scanned, describing the space available.

Name      Datatype  Description
HOST_ID   INTEGER
HOSTNAME  STRING
PATH      STRING
TOTAL     BIGINT
FREE      BIGINT
USED      BIGINT
RESULT    STRING
ERR_MSG   STRING
The third table contains one row for every file in the snapshot collection.
Name              Datatype  Description
HOST_ID           INTEGER
HOSTNAME          STRING
PATH              STRING
NAME              STRING
TXNID             BIGINT
CREATED           BIGINT
TABLE             STRING
COMPLETED         STRING
SIZE              BIGINT
IS_REPLICATED     STRING    A string indicating whether the table in question is replicated ("TRUE") or partitioned ("FALSE").
PARTITIONS        STRING
TOTAL_PARTITIONS  BIGINT
READABLE          STRING
RESULT            STRING
ERR_MSG           STRING
If the system procedure fails because it cannot access the specified path, it returns a single VoltTable with
one row and one column.
Name     Datatype  Description
ERR_MSG  STRING
Examples
The following example uses @SnapshotScan to list information about the snapshots in the directory
/tmp/voltdb/backup/.
$ sqlcmd
1> exec @SnapshotScan /tmp/voltdb/backup/;
The following program example performs the same function, using the VoltTable toString() method
to display the results of the procedure call:
VoltTable[] results = null;
try {
    results = client.callProcedure("@SnapshotScan",
                                   "/tmp/voltdb/backup/").getResults();
}
catch (Exception e) { e.printStackTrace(); }
for (VoltTable t : results) {
    System.out.println(t.toString());
}
In the return value, the first VoltTable in the array lists the snapshots and certain status information. The
second element of the array provides information about the directory itself (such as used, free, and total disk
space). The third element of the array lists specific information about the individual files in the snapshot(s).
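Rather than printing the raw tables, a client can also inspect the summary table directly, for example to warn the operator about incomplete snapshots. The following is an illustrative sketch that relies on the COMPLETE and NONCE columns of the first return table (client is assumed to be a connected org.voltdb.client.Client instance):

// A minimal sketch: scan a directory and flag incomplete snapshots
// using the COMPLETE column of the first VoltTable returned.
VoltTable summary = null;
try {
    summary = client.callProcedure("@SnapshotScan",
                                   "/tmp/voltdb/backup/").getResults()[0];
}
catch (Exception e) { e.printStackTrace(); }
while (summary.advanceRow()) {
    if (!summary.getString("COMPLETE").equals("TRUE")) {
        System.out.println("Incomplete snapshot: " +
                           summary.getString("NONCE"));
    }
}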
@SnapshotStatus
@SnapshotStatus Lists information about the most recent snapshots created from the current database.
Syntax
@SnapshotStatus
Description
Warning
The @SnapshotStatus system procedure is being deprecated and may be removed in future versions. Please use the @Statistics system procedure with the "SNAPSHOTSTATUS" selector, which returns the same results, to retrieve information about recent snapshots.
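For example, the equivalent request using @Statistics is a one-line change in client code. (A minimal sketch; client is assumed to be a connected org.voltdb.client.Client instance.)

// A minimal sketch: retrieve the same snapshot status information
// using the @Statistics "SNAPSHOTSTATUS" selector instead.
VoltTable[] results = null;
try {
    results = client.callProcedure("@Statistics",
                                   "SNAPSHOTSTATUS", 0).getResults();
    for (VoltTable t : results) System.out.println(t.toString());
}
catch (Exception e) { e.printStackTrace(); }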
The @SnapshotStatus system procedure provides information about up to ten of the most recent snapshots
performed on the current database. The information provided includes the directory path and prefix for
the snapshot, when it occurred and how long it took, as well as whether the snapshot was completed
successfully or not.
@SnapshotStatus provides status of any snapshots, including both native and CSV snapshots, as well as
manual, automated, and command log snapshots.
Note that @SnapshotStatus does not tell you whether the snapshot files still exist, only that the snapshot
was performed. You can use the procedure @SnapshotScan to determine what snapshots are available.
Also, the status information is reset each time the database is restarted. In other words, @SnapshotStatus
only provides information about the most recent snapshots since the current database instance was started.
Return Values
Returns one VoltTable with a row for every snapshot file in the recent snapshots performed on the cluster.
Name        Datatype  Description
TIMESTAMP   BIGINT
HOST_ID     INTEGER
HOSTNAME    STRING
TABLE       STRING    The name of the database table whose data the file contains.
PATH        STRING
FILENAME    STRING
NONCE       STRING
TXNID       BIGINT
START_TIME  BIGINT
END_TIME    BIGINT
SIZE        BIGINT
DURATION    BIGINT
THROUGHPUT  FLOAT
RESULT      STRING
Examples
The following example uses @SnapshotStatus to display information about the most recent snapshots
performed on the current database:
$ sqlcmd
1> exec @SnapshotStatus;
The following code example demonstrates how to perform the same function programmatically:
VoltTable[] results = null;
try {
    results = client.callProcedure("@SnapshotStatus").getResults();
}
catch (Exception e) { e.printStackTrace(); }
for (VoltTable t : results) {
    System.out.println(t.toString());
}
@Statistics
@Statistics Returns statistics about the usage of the VoltDB database.
Syntax
@Statistics String component, Integer delta-flag
Description
The @Statistics system procedure returns information about the VoltDB database. The first argument,
component, specifies what aspect of VoltDB to return statistics about. The second argument, delta-flag,
specifies whether statistics are reported from when the database started or since the last call to @Statistics
where the flag was set.
If the delta-flag is set to zero, the system procedure returns statistics since the database started. If the
delta-flag is non-zero, the system procedure returns statistics for the interval since the last time @Statistics
was called with a non-zero flag. (If @Statistics has not been called with a non-zero flag before, the first call
with the flag set returns statistics since startup.)
Note that in a cluster with K-safety, if a node fails, the statistics reported by this procedure are reset to
zero for the node when it rejoins the cluster.
The following are the allowable values of component:
"COMMANDLOG"
"CPU"
"DRCONSUMER"
"DRPRODUCER"
"IMPORTER"
"INDEX"
"INITIATOR [339]"
"IOSTATS [340]"
"LIVECLIENTS [340]"
"MANAGEMENT"
Returns the same information as INDEX [339], INITIATOR [339], IOSTATS [340], MEMORY [341], PROCEDURE [343], and TABLE [346], except all in a single procedure
call.
"MEMORY [341]"
Returns statistics on the use of memory for each node in the cluster.
MEMORY statistics include the current resident set size (RSS) of the
VoltDB server process; the amount of memory used for Java temporary
storage, database tables, indexes, and string (including varbinary) storage; as well as other information.
"PARTITIONCOUNT [342]"
"PLANNER [342]"
"PROCEDURE [343]"
Returns information on the usage of stored procedures for each site within the database cluster sorted by partition. The information includes the
name of the procedure, the number of invocations (for each site), and
selected performance information on minimum, maximum, and average
execution time.
"PROCEDUREINPUT [343]"
"PROCEDUREOUTPUT [344]"
"PROCEDUREPROFILE [344]"
Returns summary information on the usage of stored procedures averaged across all partitions in the cluster. The information from PROCEDUREPROFILE is similar to the information from PROCEDURE, except it focuses on the performance of the individual procedures rather
than on procedures by partition. The weighted average across partitions
is helpful for determining which stored procedures the application is
spending most of its time in.
"REBALANCE [345]"
Returns information on the current progress of rebalancing on the cluster. Rebalancing occurs when one or more nodes are added "on the fly"
to an elastic cluster. If no rebalancing is occurring, no data is returned.
During a rebalance, this selector returns information about the speed of
migration of the data, the latency of rebalance tasks, and the estimated
time until completion.
For rebalance, the delta flag to the system procedure is ignored. All rebalance statistics are cumulative for the current rebalance activity.
"SNAPSHOTSTATUS [345]"
Returns information about up to ten of the most recent snapshots performed by the database. The results include the directory path and prefix for the snapshot, when it occurred, how long it took, and whether
the snapshot was completed successfully or not. The results report on
both native and CSV snapshots, as well as manual, automated, and command log snapshots. Note that this selector does not tell you whether the
snapshot files still exist, only that the snapshot was performed. Use the
@SnapshotScan procedure to determine what snapshots are available.
"TABLE [346]"
Note that INITIATOR and PROCEDURE report information on both user-declared stored procedures and
system procedures. These include certain system procedures that are used internally by VoltDB and are
not intended to be called by client applications. Only the system procedures documented in this appendix
are intended for client invocation.
Return Values
Returns different VoltTables depending on which component is requested. The following tables identify
the structure of the return values for each component. (Note that the MANAGEMENT component returns
seven VoltTables.)
COMMANDLOG Returns a row for every server in the cluster.
Name                  Datatype  Description
TIMESTAMP             BIGINT
HOST_ID               INTEGER
HOSTNAME              STRING
OUTSTANDING_BYTES     BIGINT    The size, in bytes, of command log data that has yet to be written to disk. For synchronous logging, this value is always zero.
OUTSTANDING_TXNS      BIGINT
IN_USE_SEGMENT_COUNT  INTEGER   The total number of segment files currently in use for command logging.
SEGMENT_COUNT         INTEGER
FSYNC_INTERVAL        INTEGER
CPU Returns a row for every server in the cluster.

Name          Datatype  Description
TIMESTAMP     BIGINT
HOST_ID       INTEGER
HOSTNAME      STRING
PERCENT_USED  BIGINT
DRCONSUMER Returns two VoltTables. The first table returns a row for every host in the cluster,
showing whether a replication snapshot is in progress and if it is, the status of transmission to the replica.
Name                 Datatype  Description
TIMESTAMP            BIGINT
HOST_ID              INTEGER
HOSTNAME             STRING
CLUSTER_ID           INTEGER
STATE                STRING    A text string indicating the current state of replication. Possible values are:
                                 UNINITIALIZED: DR has not begun yet or has stopped
                                 INITIALIZE: DR is enabled and the replica is attempting to contact the master
                                 SYNC: DR has started and the replica is synchronizing by receiving snapshots of existing data from the master
                                 RECEIVE: DR is underway and the replica is receiving binary logs from the master
                                 END: DR has been canceled for some reason and the replica is stopping DR
REPLICATION_RATE_1M  BIGINT    The average rate of replication over the past minute. The data rate is measured in bytes per second.
REPLICATION_RATE_5M  BIGINT    The average rate of replication over the past five minutes. The data rate is measured in bytes per second.
The second table contains information about the replication streams, which consist of a row per partition
for each server. The data shows the current state of replication and how much data has been received by
the replica.
Name                     Datatype   Description
TIMESTAMP                BIGINT
HOST_ID                  INTEGER
HOSTNAME                 STRING
CLUSTER_ID               INTEGER
PARTITION_ID             INTEGER
IS_COVERED               STRING     A text string of "true" or "false" indicating whether this partition is currently connected to and receiving data from a matching partition on the master cluster.
COVERING_HOST            STRING     The host name of the server in the master cluster that is providing DR data to this partition. If IS_COVERED is "false", this field is empty.
LAST_RECEIVED_TIMESTAMP  TIMESTAMP
LAST_APPLIED_TIMESTAMP   TIMESTAMP
DRPRODUCER Returns two VoltTables. The first table contains information about the replication
streams, which consist of a row per partition for each server. The data shows the current state of replication
and how much data is currently queued for the replica.
Name                 Datatype   Description
TIMESTAMP            BIGINT
HOST_ID              INTEGER
HOSTNAME             STRING
PARTITION_ID         INTEGER
STREAMTYPE           STRING
TOTALBYTES           BIGINT
TOTALBYTESINMEMORY   BIGINT
TOTALBUFFERS         BIGINT     The total number of buffers in this partition currently waiting for acknowledgement from the replica. The partitions buffer the binary logs to reduce overhead and optimize network transfers.
LASTQUEUEDDRID       BIGINT
LASTACKDRID          BIGINT
LASTQUEUEDTIMESTAMP  TIMESTAMP  The timestamp of the last transaction queued for transmission to the replica.
LASTACKTIMESTAMP     TIMESTAMP
ISSYNCED             STRING     A text string indicating whether the database is currently being replicated. If replication has not started, or the overflow capacity has been exceeded (that is, replication has failed), the value of ISSYNCED is "false". If replication is currently in progress, the value is "true".
MODE                 STRING
The second table returns a row for every host in the cluster, showing whether a replication snapshot is in
progress and if it is, the status of transmission to the replica.
Name                      Datatype  Description
TIMESTAMP                 BIGINT
HOST_ID                   INTEGER
HOSTNAME                  STRING
STATE                     STRING
SYNCSNAPSHOTSTATE         STRING    A text string indicating the current state of the synchronization snapshot that begins replication. During normal operation, this value is "NONE", indicating either that replication is not active or that transactions are actively being replicated. If a synchronization snapshot is in progress, this value provides additional information about the specific activity underway.
ROWSINSYNCSNAPSHOT        BIGINT
ROWSACKEDFORSYNCSNAPSHOT  BIGINT
IMPORTER Returns a separate row for each import stream and each server.
Name                  Datatype  Description
TIMESTAMP             BIGINT
HOST_ID               INTEGER
HOSTNAME              STRING
SITE_ID               INTEGER
IMPORTER_NAME         STRING
PROCEDURE_NAME        STRING
SUCCESSES             BIGINT
FAILURES              BIGINT
OUTSTANDING_REQUESTS  BIGINT
RETRIES               BIGINT
INDEX Returns a row for every index in every execution site.

Name             Datatype  Description
TIMESTAMP        BIGINT
HOST_ID          BIGINT
HOSTNAME         STRING
SITE_ID          BIGINT
PARTITION_ID     BIGINT    The numeric ID for the logical partition that this site represents. When using a K value greater than zero, there are multiple copies of each logical partition.
INDEX_NAME       STRING
TABLE_NAME       STRING
INDEX_TYPE       STRING
IS_UNIQUE        TINYINT
IS_COUNTABLE     TINYINT
ENTRY_COUNT      BIGINT
MEMORY_ESTIMATE  BIGINT
INITIATOR Returns a separate row for each connection and the stored procedures initiated by that
connection.
Name                 Datatype  Description
TIMESTAMP            BIGINT
HOST_ID              INTEGER
HOSTNAME             STRING
SITE_ID              INTEGER
CONNECTION_ID        BIGINT
CONNECTION_HOSTNAME  STRING    The server name of the node from which the client connection originates. In the case of import procedures, the name of the importer is reported here.
PROCEDURE_NAME       STRING    The name of the stored procedure. If import is enabled, import procedures are included as well.
INVOCATIONS          BIGINT
AVG_EXECUTION_TIME   INTEGER   The average length of time (in milliseconds) it took to execute the stored procedure.
MIN_EXECUTION_TIME   INTEGER   The minimum length of time (in milliseconds) it took to execute the stored procedure.
MAX_EXECUTION_TIME   INTEGER
ABORTS               BIGINT
FAILURES             BIGINT
IOSTATS Returns one row for every client connection on the cluster.
Name                 Datatype  Description
TIMESTAMP            BIGINT
HOST_ID              INTEGER
HOSTNAME             STRING
CONNECTION_ID        BIGINT
CONNECTION_HOSTNAME  STRING    The server name of the node from which the client connection originates.
BYTES_READ           BIGINT    The number of bytes of data sent from the client to the host.
MESSAGES_READ        BIGINT
BYTES_WRITTEN        BIGINT    The number of bytes of data sent from the host to the client.
MESSAGES_WRITTEN     BIGINT
LIVECLIENTS Returns a row for every client connection currently active on the cluster.
Name                           Datatype  Description
TIMESTAMP                      BIGINT
HOST_ID                        INTEGER
HOSTNAME                       STRING
CONNECTION_ID                  BIGINT
CLIENT_HOSTNAME                STRING    The server name of the node from which the client connection originates.
ADMIN                          TINYINT
OUTSTANDING_REQUEST_BYTES      BIGINT
OUTSTANDING_RESPONSE_MESSAGES  BIGINT
OUTSTANDING_TRANSACTIONS       BIGINT    The number of transactions (that is, stored procedures) initiated on behalf of the client that have yet to be completed.
MEMORY Returns a row for every server in the cluster.

Name            Datatype  Description
TIMESTAMP       BIGINT
HOST_ID         INTEGER
HOSTNAME        STRING
RSS             INTEGER   The current resident set size. That is, the total amount of memory allocated to the VoltDB processes on the server.
JAVAUSED        INTEGER
JAVAUNUSED      INTEGER
TUPLEDATA       BIGINT
TUPLEALLOCATED  BIGINT    The amount of memory (in kilobytes) allocated for the storage of database records (including free space).
INDEXMEMORY     BIGINT
STRINGMEMORY    BIGINT
TUPLECOUNT      BIGINT
POOLEDMEMORY    BIGINT
PHYSICALMEMORY  BIGINT
JAVAMAXHEAP     INTEGER
PARTITIONCOUNT Returns one row identifying the total number of partitions and the host that
provided that information.
Name             Datatype  Description
TIMESTAMP        BIGINT
HOST_ID          INTEGER
HOSTNAME         STRING
PARTITION_COUNT  INTEGER
PLANNER Returns a row for every planner cache. That is, one cache per execution site, plus one
global cache per server. (The global cache is identified by a site and partition ID of minus one.)
Name           Datatype  Description
TIMESTAMP      BIGINT
HOST_ID        INTEGER
HOSTNAME       STRING
SITE_ID        INTEGER
PARTITION_ID   INTEGER   The numeric ID for the logical partition that this site represents. When using a K value greater than zero, there are multiple copies of each logical partition.
CACHE1_LEVEL   INTEGER
CACHE2_LEVEL   INTEGER
CACHE1_HITS    INTEGER
CACHE2_HITS    INTEGER
CACHE_MISSES   INTEGER
PLAN_TIME_MIN  BIGINT
PLAN_TIME_MAX  BIGINT
PLAN_TIME_AVG  BIGINT    The average length of time (in nanoseconds) it took to complete the planning of ad hoc queries.
FAILURES       BIGINT
PROCEDURE Returns a row for every stored procedure that has been executed on the cluster, grouped
by execution site.
Name                    Datatype  Description
TIMESTAMP               BIGINT
HOST_ID                 INTEGER
HOSTNAME                STRING
SITE_ID                 INTEGER
PARTITION_ID            INTEGER   The numeric ID for the logical partition that this site represents. When using a K value greater than zero, there are multiple copies of each logical partition.
PROCEDURE               STRING
INVOCATIONS             BIGINT
TIMED_INVOCATIONS       BIGINT
MIN_EXECUTION_TIME      BIGINT
MAX_EXECUTION_TIME      BIGINT
AVG_EXECUTION_TIME      BIGINT    The average length of time (in nanoseconds) it took to execute the stored procedure.
MIN_RESULT_SIZE         INTEGER
MAX_RESULT_SIZE         INTEGER
AVG_RESULT_SIZE         INTEGER
MIN_PARAMETER_SET_SIZE  INTEGER
MAX_PARAMETER_SET_SIZE  INTEGER
AVG_PARAMETER_SET_SIZE  INTEGER
ABORTS                  BIGINT
FAILURES                BIGINT
PROCEDUREINPUT Returns a row for every stored procedure that has been executed on the cluster,
summarized across the cluster.
Name                         Datatype  Description
TIMESTAMP                    BIGINT
PROCEDURE                    STRING
WEIGHTED_PERC                BIGINT    A weighted average expressed as a percentage of the parameter set size for invocations of this stored procedure compared to all stored procedure invocations.
INVOCATIONS                  BIGINT
MIN_PARAMETER_SET_SIZE       BIGINT
MAX_PARAMETER_SET_SIZE       BIGINT
AVG_PARAMETER_SET_SIZE       BIGINT
TOTAL_PARAMETER_SET_SIZE_MB  BIGINT
PROCEDUREOUTPUT Returns a row for every stored procedure that has been executed on the
cluster, summarized across the cluster.
Name                  Datatype  Description
TIMESTAMP             BIGINT
PROCEDURE             STRING
WEIGHTED_PERC         BIGINT    A weighted average expressed as a percentage of the result set size returned by invocations of this stored procedure compared to all stored procedure invocations.
INVOCATIONS           BIGINT
MIN_RESULT_SIZE       BIGINT
MAX_RESULT_SIZE       BIGINT
AVG_RESULT_SIZE       BIGINT
TOTAL_RESULT_SIZE_MB  BIGINT
PROCEDUREPROFILE Returns a row for every stored procedure that has been executed on the
cluster, summarized across the cluster.
Name           Datatype  Description
TIMESTAMP      BIGINT
PROCEDURE      STRING
WEIGHTED_PERC  BIGINT    A weighted average expressed as a percentage of the execution time for this stored procedure compared to all stored procedure invocations.
INVOCATIONS    BIGINT
AVG            BIGINT    The average length of time (in nanoseconds) it took to execute the stored procedure.
MIN            BIGINT
MAX            BIGINT
ABORTS         BIGINT
FAILURES       BIGINT
REBALANCE Returns one row if the cluster is rebalancing. No data is returned if the cluster is not
rebalancing.
Warning
The rebalance selector is still under development. The return values are likely to change in upcoming releases.
Name                  Datatype  Description
TOTAL_RANGES          BIGINT
PERCENTAGE_MOVED      FLOAT
MOVED_ROWS            BIGINT
ROWS_PER_SECOND       FLOAT
ESTIMATED_REMAINING   BIGINT
MEGABYTES_PER_SECOND  FLOAT
CALLS_PER_SECOND      FLOAT     The average number of rebalance work units, or transactions, executed per second.
CALLS_LATENCY         FLOAT
SNAPSHOTSTATUS Returns a row for every snapshot file in the recent snapshots performed on the
cluster.
Name        Datatype  Description
TIMESTAMP   BIGINT
HOST_ID     INTEGER
HOSTNAME    STRING
TABLE       STRING    The name of the database table whose data the file contains.
PATH        STRING
FILENAME    STRING
NONCE       STRING
TXNID       BIGINT
START_TIME  BIGINT
END_TIME    BIGINT
SIZE        BIGINT
DURATION    BIGINT
THROUGHPUT  FLOAT
RESULT      STRING
TYPE        STRING    String value indicating how the snapshot was initiated. Possible values are:
                        AUTO: an automated snapshot as defined by the deployment file
                        COMMANDLOG: a command log snapshot
                        MANUAL: a manual snapshot initiated by a user
TABLE Returns a row for every table, per partition. In other words, the number of tables, multiplied
by the number of sites per host and the number of hosts.
Name                    Datatype  Description
TIMESTAMP               BIGINT
HOST_ID                 BIGINT
HOSTNAME                STRING
SITE_ID                 BIGINT
PARTITION_ID            BIGINT    The numeric ID for the logical partition that this site represents. When using a K value greater than zero, there are multiple copies of each logical partition.
TABLE_NAME              STRING
TABLE_TYPE              STRING    The type of the table. Values returned include "PersistentTable" for normal data tables and views, and "StreamedTable" for export-only tables.
TUPLE_COUNT             BIGINT
TUPLE_ALLOCATED_MEMORY  BIGINT
TUPLE_DATA_MEMORY       BIGINT
STRING_DATA_MEMORY      BIGINT
TUPLE_LIMIT             INTEGER   The row limit for this table. Row limits are optional and are defined in the schema as a maximum number of rows that any partition can contain. If no row limit is set, this value is null.
PERCENT_FULL            INTEGER
Examples
The following example uses @Statistics to gather information about the distribution of table rows within
the cluster:
$ sqlcmd
1> exec @Statistics TABLE, 0;
The next program example shows a procedure that collects and displays the number of transactions (i.e.
stored procedures) during a given interval, by setting the delta-flag to a non-zero value. By calling this
procedure iteratively (for example, every five minutes), it is possible to identify fluctuations in the database
workload over time (as measured by the number of transactions processed).
void measureWorkload() {
    VoltTable[] results = null;
    String procName;
    int procCount = 0;
    int sysprocCount = 0;
    try {
        results = client.callProcedure("@Statistics",
                                       "INITIATOR", 1).getResults();
    }
    catch (Exception e) { e.printStackTrace(); }
    for (VoltTable t : results) {
        for (int r = 0; r < t.getRowCount(); r++) {
            VoltTableRow row = t.fetchRow(r);
            procName = row.getString("PROCEDURE_NAME");
            /* Count system procedures separately */
            if (procName.startsWith("@"))
                { sysprocCount += row.getLong("INVOCATIONS"); }
            else
                { procCount += row.getLong("INVOCATIONS"); }
        }
    }
    System.out.printf("System procedures: %d\n" +
                      "User-defined procedures: %d\n",
                      sysprocCount, procCount);
}
@StopNode
@StopNode Stops a VoltDB server process, removing the node from the cluster.
Syntax
@StopNode Integer host-ID
Description
The @StopNode system procedure lets you stop a specific server in a K-safe cluster. You specify which
node to stop using the host ID, which is the unique identifier for the node assigned by VoltDB when the
server joins the cluster.
Note that if you call the @StopNode procedure on a node other than the node being stopped, you receive
a return status indicating the success or failure of the call. If you call the procedure on the node that
you are requesting to stop, the return status can only indicate that the call was interrupted (by the VoltDB
process on that node stopping), not whether it was successfully completed or not.
If you call @StopNode on a node or cluster that is not K-safe (either because it was started with a K-safety
value of zero, or because one or more nodes have failed so that any further failure could crash the database), the
@StopNode procedure will not be executed. You can only stop nodes on a cluster that will remain viable
after the node stops. To stop the entire cluster, please use the @Shutdown system procedure.
Return Values
Returns one VoltTable with one row.
Name    Datatype  Description
STATUS  BIGINT
Examples
The following program example uses grep, sqlcmd, and the @SystemInformation stored procedure to
identify the host ID for a specific node (doodah) of the cluster. The example then uses that host ID (2) to
call @StopNode and stop the desired node.
$ echo "exec @SystemInformation overview;" | sqlcmd | grep "doodah"
2 HOSTNAME    doodah
$ sqlcmd
1> exec @StopNode 2;
The following Java code fragment performs the same function.
VoltTable[] results = null;
try {
    results = client.callProcedure("@SystemInformation",
                                   "overview").getResults();
}
catch (Exception e) { e.printStackTrace(); }
VoltTable table = results[0];
table.resetRowPosition();
int targetHostId = -1;
while (table.advanceRow() && targetHostId < 0) {
    if ( table.getString("KEY").equals("HOSTNAME") &&
         table.getString("VALUE").equals(targetHostName) ) {
        targetHostId = (int) table.getLong("HOST_ID");
    }
}
try {
    client.callProcedure("@StopNode",
                         targetHostId).getResults();
}
catch (Exception e) { e.printStackTrace(); }
@SystemCatalog
@SystemCatalog Returns metadata about the database schema.
Syntax
@SystemCatalog String component
Description
The @SystemCatalog system procedure returns information about the schema of the VoltDB database, depending upon the component keyword you specify. The following are the allowable values of component:
"TABLES"
"COLUMNS"
"INDEXINFO"
Returns information about the indexes in the database schema. Note that the
procedure returns information for each column in the index. In other words,
if an index is composed of three columns, the result set will include three
separate entries for the index, one for each column.
"PRIMARYKEYS"
Returns information about the primary keys in the database schema. Note
that the procedure returns information for each column in the primary key.
If an primary key is composed of three columns, the result set will include
three separate entries.
"PROCEDURES"
Returns information about the stored procedures defined in the database, including system procedures.
"PROCEDURECOLUMNS"
Return Values
Returns a different VoltTable for each component. The layout of the VoltTables is designed to match the
corresponding JDBC data structures. Columns are provided for all JDBC properties, but where VoltDB
has no corresponding element the column is unused and a null value is returned.
For the TABLES component, the VoltTable has the following columns:
Name                       Datatype  Description
TABLE_CAT                  STRING    Unused.
TABLE_SCHEM                STRING    Unused.
TABLE_NAME                 STRING
TABLE_TYPE                 STRING
REMARKS                    STRING    Unused.
TYPE_CAT                   STRING    Unused.
TYPE_SCHEM                 STRING    Unused.
TYPE_NAME                  STRING    Unused.
SELF_REFERENCING_COL_NAME  STRING    Unused.
REF_GENERATION             STRING    Unused.
For the COLUMNS component, the VoltTable has the following columns:
Name               Datatype  Description
TABLE_CAT          STRING    Unused.
TABLE_SCHEM        STRING    Unused.
TABLE_NAME         STRING
COLUMN_NAME        STRING
DATA_TYPE          INTEGER
TYPE_NAME          STRING
COLUMN_SIZE        INTEGER   The length of the column in bits, characters, or digits, depending on the datatype.
BUFFER_LENGTH      INTEGER   Unused.
DECIMAL_DIGITS     INTEGER
NUM_PREC_RADIX     INTEGER   Specifies the radix, or numeric base, for calculating the column size. A radix of 2 indicates the column size is measured in bits while a radix of 10 indicates a measurement in bytes or digits.
NULLABLE           INTEGER
REMARKS            STRING    Contains the string "PARTITION_COLUMN" if the column is the partitioning key for a partitioned table. Otherwise null.
COLUMN_DEF         STRING
SQL_DATA_TYPE      INTEGER   Unused.
SQL_DATETIME_SUB   INTEGER   Unused.
CHAR_OCTET_LENGTH  INTEGER   For variable length columns (VARCHAR and VARBINARY), the maximum length of the column. Null for all other datatypes.
ORDINAL_POSITION   INTEGER
IS_NULLABLE        STRING
SCOPE_CATALOG      STRING    Unused.
SCOPE_SCHEMA       STRING    Unused.
SCOPE_TABLE        STRING    Unused.
SOURCE_DATE_TYPE   SMALLINT  Unused.
IS_AUTOINCREMENT   STRING
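As an illustration of how this metadata can be used, the following sketch lists the partitioning column of each partitioned table by checking the REMARKS field described above. (This is an illustrative sketch; client is assumed to be a connected org.voltdb.client.Client instance.)

// A minimal sketch: find partitioning columns by scanning the COLUMNS
// metadata for rows whose REMARKS value is "PARTITION_COLUMN".
VoltTable columns = null;
try {
    columns = client.callProcedure("@SystemCatalog",
                                   "COLUMNS").getResults()[0];
}
catch (Exception e) { e.printStackTrace(); }
while (columns.advanceRow()) {
    String remarks = columns.getString("REMARKS");
    if (remarks != null && remarks.equals("PARTITION_COLUMN")) {
        System.out.printf("Table %s is partitioned on column %s.\n",
                          columns.getString("TABLE_NAME"),
                          columns.getString("COLUMN_NAME"));
    }
}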
For the INDEXINFO component, the VoltTable has the following columns:
Name              Datatype  Description
TABLE_CAT         STRING    Unused.
TABLE_SCHEM       STRING    Unused.
TABLE_NAME        STRING
NON_UNIQUE        TINYINT
INDEX_QUALIFIER   STRING    Unused.
INDEX_NAME        STRING
TYPE              SMALLINT
ORDINAL_POSITION  SMALLINT
COLUMN_NAME       STRING
ASC_OR_DESC       STRING    A string value specifying the sort order of the index. Possible values are "A" for ascending or null for unsorted indexes.
CARDINALITY       INTEGER   Unused.
PAGES             INTEGER   Unused.
FILTER_CONDITION  STRING    Unused.
For the PRIMARYKEYS component, the VoltTable has the following columns:
Name         Datatype  Description
TABLE_CAT    STRING    Unused.
TABLE_SCHEM  STRING    Unused.
TABLE_NAME   STRING
COLUMN_NAME  STRING
KEY_SEQ      SMALLINT  An index specifying the position of the column in the primary key, starting at 1.
PK_NAME      STRING
For the PROCEDURES component, the VoltTable has the following columns:
Name             Datatype  Description
PROCEDURE_CAT    STRING    Unused.
PROCEDURE_SCHEM  STRING    Unused.
PROCEDURE_NAME   STRING
RESERVED1        STRING    Unused.
RESERVED2        STRING    Unused.
RESERVED3        STRING    Unused.
REMARKS          STRING    Unused.
PROCEDURE_TYPE   SMALLINT
SPECIFIC_NAME    STRING    Same as PROCEDURE_NAME.
For the PROCEDURECOLUMNS component, the VoltTable has the following columns:
Name               Datatype  Description
PROCEDURE_CAT      STRING    Unused.
PROCEDURE_SCHEM    STRING    Unused.
PROCEDURE_NAME     STRING
COLUMN_NAME        STRING
COLUMN_TYPE        SMALLINT  An enumerated value specifying the parameter type. Always returns 1, corresponding to procedureColumnIn.
DATA_TYPE          INTEGER
TYPE_NAME          STRING
PRECISION          INTEGER
LENGTH             INTEGER
SCALE              SMALLINT
RADIX              SMALLINT  Specifies the radix, or numeric base, for calculating the precision. A radix of 2 indicates the precision is measured in bits while a radix of 10 indicates a measurement in bytes or digits.
NULLABLE           SMALLINT  Unused.
REMARKS            STRING    If this column contains the string "PARTITION_PARAMETER", the parameter is the partitioning key for a single-partitioned procedure. If the column contains the string "ARRAY_PARAMETER", the parameter is a native Java array. Otherwise this column is null.
COLUMN_DEF         STRING    Unused.
SQL_DATA_TYPE      INTEGER   Unused.
SQL_DATETIME_SUB   INTEGER   Unused.
CHAR_OCTET_LENGTH  INTEGER   For variable length columns (VARCHAR and VARBINARY), the maximum length of the column. Null for all other datatypes.
ORDINAL_POSITION   INTEGER
IS_NULLABLE        STRING    Unused.
SPECIFIC_NAME      STRING    Same as COLUMN_NAME.
Examples
The following example calls @SystemCatalog to list the stored procedures in the active database schema:
$ sqlcmd
1> exec @SystemCatalog procedures;
The next program example uses @SystemCatalog to display information about the tables in the database
schema.
VoltTable[] results = null;
try {
    results = client.callProcedure("@SystemCatalog",
                                   "TABLES").getResults();
    System.out.println("Information about the database schema:");
    for (VoltTable node : results) System.out.println(node.toString());
}
catch (Exception e) {
    e.printStackTrace();
}
@SystemInformation
@SystemInformation Returns configuration information about VoltDB and the individual nodes of the
database cluster.
Syntax
@SystemInformation
@SystemInformation String component
Description
The @SystemInformation system procedure returns information about the configuration of the VoltDB
database or the individual nodes of the database cluster, depending upon the component keyword you
specify. The following are the allowable values of component:
"DEPLOYMENT"
Returns information about the configuration of the database. In particular, this keyword returns information about the various features and settings enabled through the
deployment file, such as export, snapshots, K-safety, and so on. These properties are
returned in a single VoltTable of name/value pairs.
"OVERVIEW"
Returns information about the individual servers in the database cluster, including the
host name, the IP address, the version of VoltDB running on the server, as well as
the path to the deployment file in use. The overview also includes entries for the start
time of the server and length of time the server has been running.
If you do not specify a component, @SystemInformation returns the results of the OVERVIEW component
(to provide compatibility with previous versions of the procedure).
Return Values
Returns one of two VoltTables depending upon which component is requested.
For the DEPLOYMENT component, the VoltTable has the columns specified in the following table.
Name      Datatype  Description
PROPERTY  STRING
VALUE     STRING
For the OVERVIEW component, information is reported for each server in the cluster, so an additional
column is provided identifying the host node.
Name     Datatype  Description
HOST_ID  INTEGER
KEY      STRING
VALUE    STRING
Examples
The first example displays information about the individual servers in the database cluster:
$ sqlcmd
1> exec @SystemInformation overview;
The following program example uses @SystemInformation to display information about the nodes in the
cluster and then about the database itself.
VoltTable[] results = null;
try {
    results = client.callProcedure("@SystemInformation",
                                   "OVERVIEW").getResults();
    System.out.println("Information about the database cluster:");
    for (VoltTable node : results) System.out.println(node.toString());
    results = client.callProcedure("@SystemInformation",
                                   "DEPLOYMENT").getResults();
    System.out.println("Information about the database deployment:");
    for (VoltTable node : results) System.out.println(node.toString());
}
catch (Exception e) {
    e.printStackTrace();
}
@UpdateApplicationCatalog
@UpdateApplicationCatalog Reconfigures the database by replacing the application catalog and/or
deployment configuration.
Syntax
@UpdateApplicationCatalog byte[] catalog, String deployment
Description
The @UpdateApplicationCatalog system procedure lets you modify the configuration of a running database without having to shut down and restart.
Note
The @UpdateApplicationCatalog system procedure is primarily for updating the deployment
configuration. Updating an application catalog is only supported for databases that were started
from a catalog and with the deployment setting schema="catalog". In general, updating the
database schema interactively is recommended and use of application catalogs is being phased
out.
@UpdateApplicationCatalog supports the following changes to the database:
Add, remove, or modify stored procedures
Add, remove, or modify database tables and columns
Add, remove, or modify indexes (except where new constraints are introduced)
Add or remove views and export-only tables
Modify the security permissions in the database schema
@UpdateApplicationCatalog supports the following changes to the deployment file:
Modify the security settings in the database configuration
Modify the settings for automated snapshots (whether they are enabled or not, their frequency, location,
prefix, and number retained)
Modify the export settings
In general, you can make any changes to the database schema as long as there is no data in the tables.
However, if there is data in a table, the following changes are not allowed:
Changing the partitioning of the table
Changing columns to NOT NULL
Reducing the datatype size of a column (for example, from INTEGER to SMALLINT) or changing to
an incompatible datatype (for example, from VARCHAR to INTEGER)
Adding or broadening constraints, such as indexes and primary keys, that could conflict with existing
data in the table
The arguments to the system procedure are a byte array containing the contents of the new catalog jar and
a string containing the contents of the deployment file. That is, you pass the actual contents of the catalog
and deployment files, using a byte array for the binary catalog and a string for the text deployment file.
You can use null for either argument to change just the catalog or the deployment.
The new catalog and the deployment file must not contain any changes other than the allowed modifications
listed above. Currently, if there are any other changes from the original catalog and deployment file (such
as changes to the export configuration or to the configuration of the cluster), the procedure returns an error
indicating that an incompatible change has been found.
If you call @UpdateApplicationCatalog on a master database while database replication (DR) is active, the
DR process automatically communicates any changes to the application catalog to the replica database to
keep the two databases in sync. However, any changes to the deployment file apply to the master database
only. To change the deployment settings on a replica database, you must stop and restart the replica (and
database replication) using an updated deployment file.
To simplify the process of encoding the catalog contents, the Java client interface includes two helper
methods (one synchronous and one asynchronous) to encode the files and issue the stored procedure request:
ClientResponse client.updateApplicationCatalog( File catalog-file, File deployment-file)
ClientResponse client.updateApplicationCatalog( clientCallback callback, File catalog-file, File deployment-file)
Similarly, the sqlcmd utility interprets both arguments as filenames.
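For example, the following sketch uses the synchronous helper method to apply a new deployment file without changing the catalog, passing null for the catalog file (the system procedure accepts null for either argument; the helper method is assumed to pass its arguments through). The file path is hypothetical, and client is assumed to be a connected org.voltdb.client.Client instance.

// A minimal sketch: update only the deployment configuration by passing
// null for the catalog file. The file path here is hypothetical.
try {
    ClientResponse response = client.updateApplicationCatalog(
        null, new File("project/production.xml"));
    System.out.println("Update status: " + response.getStatusString());
}
catch (Exception e) {
    e.printStackTrace();
}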
Examples
The following example uses sqlcmd to update the application catalog using the files mycatalog.jar
and mydeploy.xml:
$ sqlcmd
1> exec @UpdateApplicationCatalog mycatalog.jar, mydeploy.xml;
An alternative is to use the voltadmin update command, in which case the following command performs
the same function as the preceding sqlcmd example:
$ voltadmin update mycatalog.jar mydeploy.xml
The following program example uses the @UpdateApplicationCatalog procedure to update the current database catalog, using the catalog at project/newcatalog.jar and configuration file at
project/production.xml.
String newcat = "project/newcatalog.jar";
String newdeploy = "project/production.xml";

try {
    File file = new File(newcat);
    FileInputStream fin = new FileInputStream(file);
    byte[] catalog = new byte[(int)file.length()];
    fin.read(catalog);
    fin.close();
    file = new File(newdeploy);
    fin = new FileInputStream(file);
    byte[] deploybytes = new byte[(int)file.length()];
    fin.read(deploybytes);
    fin.close();
    String deployment = new String(deploybytes, "UTF-8");
    client.callProcedure("@UpdateApplicationCatalog",
                         catalog, deployment);
}
catch (Exception e) {
    e.printStackTrace();
}
@UpdateClasses
@UpdateClasses Adds and removes Java classes from the database.
Syntax
@UpdateClasses byte[] JAR-file, String class-selector
Description
The @UpdateClasses system procedure performs two functions:
Loads into the database any Java classes in the JAR file passed as the first parameter
Removes any classes matching the class selector string passed as the second parameter
You need to compile and pack your stored procedure classes into a JAR file and load them into the database using @UpdateClasses before entering the CREATE PROCEDURE and PARTITION PROCEDURE
statements that define those classes as VoltDB stored procedures. Note that, for interactive use, the sqlcmd
utility has two directives, load classes and remove classes, that perform these actions in separate steps.
To remove classes, you specify the class names in the second parameter, the class selector. You can include
multiple class selectors using a comma-separated list. You can also use Ant-style wildcards in the class
specification to identify multiple classes. For example, the following command deletes all classes that are
children of org.mycompany.utils as well as *.DebugHandler:
$ sqlcmd
1> exec @UpdateClasses NULL "org.mycompany.utils.*,*.DebugHandler";
You can also use the @UpdateClasses system procedure to include reusable code that is accessed by
multiple stored procedures. Any classes and methods called by stored procedures must follow the same
rules for deterministic behavior that stored procedures follow, as described in Section 5.1.2, VoltDB
Stored Procedures are Deterministic.
However, use of @UpdateClasses is not recommended for large, established libraries of classes used by
stored procedures. For larger, static libraries that do not need to be modified on the fly, the preferred
approach is to include the code by placing JAR files in the /lib directory where VoltDB is installed on
the database servers.
Examples
The following example compiles and packs Java stored procedures into the file myapp.jar. The example
then uses @UpdateClasses to load the classes from the JAR file, then defines and partitions a stored
procedure based on the uploaded classes.
$ javac -cp "/opt/voltdb/voltdb/*" -d obj src/myapp/*.java
$ jar cvf myapp.jar -C obj .
$ sqlcmd
1> exec @UpdateClasses myapp.jar "";
2> CREATE PROCEDURE FROM CLASS myapp.procedures.AddCustomer;
3> PARTITION PROCEDURE AddCustomer ON TABLE Customer COLUMN CustomerID;
The second example removes the class added and defined in the preceding example. Note that you must
drop the procedure definition first; you cannot delete classes that are referenced by defined stored procedures.
$ sqlcmd
1> DROP PROCEDURE AddCustomer;
2> exec @UpdateClasses NULL "myapp.procedures.AddCustomer";
As an alternative, the loading and removing of classes can be performed using native sqlcmd directives
load classes and remove classes. So the previous tasks can be performed using the following commands:
$ sqlcmd
1> load classes myapp.jar "";
2> CREATE PROCEDURE FROM CLASS myapp.procedures.AddCustomer;
3> PARTITION PROCEDURE AddCustomer ON TABLE Customer COLUMN CustomerID;
1> DROP PROCEDURE AddCustomer;
2> remove classes "myapp.procedures.AddCustomer";
@UpdateLogging
@UpdateLogging Changes the logging configuration for a running database.
Syntax
@UpdateLogging String configuration
Description
The @UpdateLogging system procedure lets you change the logging configuration for VoltDB. The
argument, configuration, is a text string containing the Log4J XML configuration definition.
Return Values
Returns one VoltTable with one row.
Name    Datatype  Description
STATUS  BIGINT
Examples
It is possible to use sqlcmd to update the logging configuration. However, the argument is interpreted as
raw XML content rather than as a file specification. Consequently, it can be difficult to use interactively.
But you can build the full exec statement, including the contents of a Log4J configuration file, in an input
file and then pipe that file to sqlcmd (the file names here are examples), like so:
$ echo "exec @UpdateLogging '" > sqlinput.txt
$ cat mylog4j.xml >> sqlinput.txt
$ echo "';" >> sqlinput.txt
$ sqlcmd < sqlinput.txt
The following program example demonstrates another way to update the logging, using the contents of an
XML file (identified by the string xmlfilename).
try {
    Scanner scan = new Scanner(new File(xmlfilename));
    scan.useDelimiter("\\Z");
    String content = scan.next();
    client.callProcedure("@UpdateLogging", content);
}
catch (Exception e) {
    e.printStackTrace();
}