MonetDB Server Reference Manual
Version 5.0
Table of Contents
1 General Introduction
1.1 Intended Audience
1.2 How to read this manual
1.3 Features and Limitations
1.3.1 When to consider MonetDB?
1.3.2 When not to consider MonetDB?
1.3.3 What are key features of MonetDB
1.3.4 Size Limitations for MonetDB
1.4 A Brief History of MonetDB
1.5 Manual Generation
1.5.1 Conventions and Notation
1.5.2 Additional Resources
1.6 Downloads and Installation
1.6.1 Developers Distribution
1.6.2 Experts
1.7 How To Start with MonetDB
1.8 The Suite
1.9 Prerequisites
1.10 Space Requirements
1.11 Getting the Software
1.12 CVS checkout
1.13 Bootstrap, Configure and Make
1.14 Bootstrap
1.15 Configure
1.16 Configure defaults and recommendations
1.17 Make
1.18 Testing the Build
1.19 Install
1.20 Testing the Installation
1.21 Usage
1.22 Troubleshooting
1.23 Reporting Problems
1.24 Building MonetDB On Windows
1.25 Introduction
1.26 buildtools
1.27 MonetDB
1.28 clients
1.29 MonetDB4
1.30 MonetDB5
1.31 sql
1.32 pathfinder
1.33 java
1.34 geom
1.35 testing
1.36 Prerequisites
1.37 CVS (Concurrent Version System)
1.38 Compiler
1.39 Python
1.40 Bison
1.41 Flex
1.42 Pthreads
1.43 Diff
1.44 PsKill
1.45 PCRE (Perl Compatible Regular Expressions)
1.46 OpenSSL
1.47 libxml2
1.48 geos (Geometry Engine Open Source)
1.49 Optional Packages
1.50 iconv
1.51 zlib
1.52 Perl
1.53 PHP
1.54 SWIG (Simplified Wrapper and Interface Generator)
1.55 Java
1.56 Apache Ant
1.57 Build Environment
1.58 Placement of Sources
1.59 Build Process
1.60 Environment Variables
1.61 Compiler
1.62 Internal Variables
1.63 PATH and PYTHONPATH
1.64 Compilation
1.65 Building and Installing Buildtools
1.66 Building and Installing the Other Components
1.67 Building Installers
1.67.1 Daily Builds
1.67.1.1 Stability
1.67.1.2 Portability
1.68 Development Roadmap
1.68.1 Server Roadmap
1.68.2 SQL Roadmap
1.68.3 Embedded MonetDB Roadmap
1.69 MonetDB Version 5
1.70 Design Considerations
1.71 Architecture Overview
1.72 MonetDB Assembly Language (MAL)
1.73 Execution Engine
1.74 Session Scenarios
1.75 Scenario management
1.76 Server Management
2 Client Interfaces
2.1 The Mapi Client Utility
2.1.1 Online help
2.2 Jdbc Client
1 General Introduction
The MonetDB reference manual serves as the primary entry point for locating information on
its functionality, system architecture, services and best practices for using its components.
The manual is produced from a Texinfo framework file, which collects and organizes
bits and pieces of information scattered around the many source components comprising
the MonetDB software family. The Texinfo file is turned into a browsable HTML version
using the makeinfo program. The PDF version can be produced using pdflatex. Alternative
formats, e.g., XML and DocBook, can be readily obtained from the Texinfo file.
The copyright (2008) on the MonetDB software, documentation and logo is owned by
CWI. Other trademarks and copyrights referred to in this manual are the property of their
respective owners.
A subscription to the mailing list helps the developer team to justify their hours put into
MonetDB’s development and maintenance.
target was to aim for better support of scientific databases with their then archaic file
structures.
The Data Distilleries Era [1996-2003] The data mining projects running as of 1993 called
for better database support. This culminated in the spin-off Data Distilleries, which based
its analytical customer relationship suite on the power provided by the early MonetDB
implementations. In the years following, many technical innovations were paired with strong
industrial maturing of the software base. Data Distilleries became a subsidiary of SPSS in
2003 and its development activity was shifted to Chicago in 2007.
The Open-Source Challenge [2003-2007] Moving MonetDB Version 4 into the open-source
field required a large number of extensions to the code base. It became of the utmost
importance to support a mature implementation of the SQL-99 standard and the bulk of
application programming interfaces (PHP, JDBC, Perl, ODBC). The result of this activity
was the first official release in 2004 and the release of the XQuery front-end in 2005. The
XQuery code generator grew out of a student summer project ("milprint summer") and
proved that scalable, high-performance XQuery processing on a relational DBMS is possible.
The Road Ahead [2008-] This manual describes the MonetDB Version 5 release, the result
of a multi-year activity to clean up the software stack and to better support both simple
and complex database requests.
The Future New versions in the MonetDB software family are under development.
Extensions and renovation of the kernel are studied in the X100 project. Its Volcano-style
interpreter aims to provide performance in I/O-dominant and streaming settings using
vectorized processing and Just-In-Time (de)compression.
The area of distributed databases is (again) addressed in the Armada project, but without
the traditional focus on centralized administration. Instead, the Armada project explores the
frontiers of autonomous database systems that still provide a coherent functional view to
their users. In its approach it challenges many dogmas in distributed database technology, such
as the perspective on global consistency, the role of the client in managing the distributed
world, and the way resources are spread.
The MonetDB software framework provides a rich setting to pursue these avenues of
database research. We hope that many may benefit from our investments, both in research
and in business.
The Experts distribution is meant for MonetDB kernel software developers only. They
should have a clear understanding of Linux development tools, e.g. automake, config,
CVS, and team-based software development and the interdependencies of the MonetDB
components.
If you encounter errors during the installation, please have a look at the MonetDB
mailing list for common errors and some advice on how to proceed.
1.6.2 Experts
Experts may want more control than the developer distribution provides.
Setting up a fully functional system requires downloading and installing the latest
packages from SourceForge. The compatibility table below shows which packages in the CVS
repository each product needs.
            MonetDB/SQL  MonetDB/XQuery  Clients  JDBC & XRPC wrapper
java                                              x
buildtools  x            x               x
clients     x            x               x
MonetDB     x            x
MonetDB4                 x
MonetDB5    x
SQL         x
Pathfinder               x
Thanks to the GNU autoconf and automake tools, the MonetDB software runs on a
wide variety of hardware/software platforms. The MonetDB development team uses many
of the platforms available to perform automated nightly regression testing. For more details
see The Test Web.
The MonetDB code base evolves quickly, with daily builds available for users who prefer
living on the edge. Application developers, however, may tune in to the MonetDB mailing
list to be warned when a major release has become available, or when detected errors require
a patch.
Chapter 1: General Introduction 8
1.9 Prerequisites
CVS
You only need this if you are building from CVS. If you start with the source
distribution from SourceForge you don’t need CVS.
You need to have a working CVS. For instructions, see the SourceForge
documentation and look under the heading CVS Instructions.
Python
MonetDB uses Python (version 2.0.0 or better) during configuration of the software.
See http://www.python.org/ for more information. (Admittedly, version 2.0.0 is ancient
and has not been tested recently; we currently use 2.4 and newer.)
autoconf/automake/libtool
MonetDB uses GNU autoconf (>= 2.57) and automake (>= 1.5) during the
Bootstrap phase, and libtool (>= 1.4) during the Make phase. autoconf and
automake are not needed when you start with the source distribution.
The following are not needed when you start with the source distribution:
• a C++ compiler (e.g. GNU’s g++);
• a lexical analyzer generator (e.g., lex or flex);
• a parser generator (e.g., yacc or bison).
The following are optional. They are checked for during configuration; if one is
missing, the feature that depends on it is simply not built:
• swig
• perl
• php
libxml2
The XML parsing library libxml2 is only used by XML/XQuery (pathfinder).
The library is used for:
1. the XML Schema import feature of the Pathfinder compiler, and
2. the XML document loader (runtime/shredder.mx).
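The minimum versions listed in this section can be checked mechanically before a build is started. The following is a minimal Python sketch, not part of the MonetDB build tools: the function names and the sample installed versions are hypothetical, and only the minimum versions are taken from the text above. Note that autoconf and automake only matter when building from CVS.

```python
# Hypothetical helper, not part of MonetDB: compare dotted version strings
# against the minimum versions listed in the Prerequisites section.

# Minimum versions as stated in this manual.
MINIMUM_VERSIONS = {
    "python": "2.0.0",
    "autoconf": "2.57",
    "automake": "1.5",
    "libtool": "1.4",
}


def parse_version(text):
    """Turn a dotted version string such as '2.57' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_minimum(tool, installed):
    """Return True if the installed version is at least the listed minimum."""
    have = parse_version(installed)
    need = parse_version(MINIMUM_VERSIONS[tool])
    # Pad the shorter tuple with zeros so that '2.0' equals '2.0.0'.
    width = max(len(have), len(need))
    have += (0,) * (width - len(have))
    need += (0,) * (width - len(need))
    return have >= need


def outdated_tools(installed_versions):
    """Return the sorted names of tools whose version is below the minimum."""
    return sorted(
        tool
        for tool, version in installed_versions.items()
        if not meets_minimum(tool, version)
    )
```

With the made-up input {"autoconf": "2.13", "automake": "1.10"}, outdated_tools reports only autoconf, since each version component is compared as a number (1.10 is newer than 1.5).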
The following instructions first describe how to check out the source code from the CVS
repository on SourceForge; in case you downloaded the pre-packaged source distribution,
you can skip this section and proceed to Configure and Make.
In case you checked out the CVS version, you have to run bootstrap first; in case you
downloaded the pre-packaged source distribution, you should skip bootstrap and start with
configure (see Configure).
For each of the packages do all the following steps (bootstrap, configure, make, make
install) before proceeding to the next package.
1.14 Bootstrap
This step is only needed when building from CVS.
In the top-level directory of the package, type the command below. Note that it uses
autogen.py, which is part of the buildtools package; make sure it can be found in your
$PATH:
./bootstrap
1.15 Configure
Then in any directory (preferably a new, empty directory and not in the MonetDB top-level
directory) give the command:
.../configure [<options>]
where ... is replaced with the (absolute or relative) path to the MonetDB top-level
directory.
The directory where you execute configure is the place where all intermediate source
and object files are generated during compilation via make.
By default, MonetDB is installed in /usr/local. To choose another target directory,
you need to call
.../configure --prefix=<prefixdir> [<options>]
Some other useful configure options are:
--enable-debug
enable full debugging default=[see Configure defaults and recommendations below]
--enable-optimize
enable extra optimization default=[see Configure defaults and recommendations below]
--enable-assert
enable assertions in the code default=[see Configure defaults and recommendations below]
--enable-strict
enable strict compiler flags default=[see Configure defaults and recommendations below]
--enable-warning
enable extended compiler warnings default=off
--enable-profile
enable profiling default=off
--enable-instrument
enable instrument default=off
--with-mx=<Mx>
which Mx binary to use (default: whichever Mx is found in your PATH)
--with-mel=<mel>
which mel binary to use (default: whichever mel is found in your PATH)
--enable-bits=<#bits>
specify number of bits (32 or 64) default is compiler default
--enable-oid32
use 32-bit OIDs on 64-bit systems default=off
You can also add options such as CC=<compiler> to specify the compiler and compiler
flags to use.
Use configure --help to find out more about configure options.
The --with-mx and --with-mel options are only used when configuring the sources as
retrieved through CVS.
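The options above combine into a single configure invocation. The following Python sketch only assembles such a command line as a list of arguments and does not run it. The helper name, its parameters, and the example paths are hypothetical; the option spellings (--prefix, --enable-debug, --enable-bits, CC=) are the ones described in this section.

```python
# Hypothetical helper: assemble a configure command line from the options
# described in this section. It builds the argument list only; it does not
# execute configure.
import posixpath


def configure_command(source_dir, prefix=None, flags=(), settings=None,
                      variables=None):
    """Return the configure invocation as a list of arguments.

    source_dir -- path to the MonetDB top-level source directory
    prefix     -- target directory (configure itself defaults to /usr/local)
    flags      -- feature names passed as --enable-<name>, e.g. "debug"
    settings   -- mapping of option to value, e.g. {"enable-bits": "64"}
    variables  -- mapping such as {"CC": "gcc"}, passed as NAME=value
    """
    command = [posixpath.join(source_dir, "configure")]
    if prefix is not None:
        command.append("--prefix=" + prefix)
    for name in flags:
        command.append("--enable-" + name)
    for name, value in sorted((settings or {}).items()):
        command.append("--%s=%s" % (name, value))
    for name, value in sorted((variables or {}).items()):
        command.append("%s=%s" % (name, value))
    return command
```

The resulting list can be handed to a process launcher from within an empty build directory, in line with the recommendation not to configure inside the source tree.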
1.17 Make
In the same directory (where you called configure) give the command
make
to compile the source code. Please note that parallel make runs (e.g. make -j2) are
currently known to fail.
1.19 Install
Give the command
make install
By default (if no --prefix option was given to configure above), this will install in
/usr/local. Make sure you have appropriate privileges.
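The per-package sequence described above (bootstrap, configure, make, make install) can be summarized as a list of commands together with the directory each one runs in. This is a sketch under stated assumptions rather than a supported tool: the function name and the example directories are hypothetical, and source_dir is assumed to be an absolute path so that configure can be called from the separate build directory.

```python
# Hypothetical helper: list the commands for building one package in the
# order described above, paired with the directory each command runs in.
import posixpath


def build_steps(source_dir, build_dir, prefix="/usr/local", from_cvs=True):
    """Return (working_directory, command) pairs for one package.

    source_dir is assumed to be an absolute path in this sketch.
    """
    steps = []
    if from_cvs:
        # bootstrap is only needed for sources checked out from CVS and
        # runs in the top-level source directory.
        steps.append((source_dir, ["./bootstrap"]))
    configure = posixpath.join(source_dir, "configure")
    steps.append((build_dir, [configure, "--prefix=" + prefix]))
    # Plain make without -j, since parallel make runs are known to fail.
    steps.append((build_dir, ["make"]))
    steps.append((build_dir, ["make", "install"]))
    return steps
```

All steps for one package should complete before the next package is started, because each package expects the previous ones to be installed.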
1.21 Usage
The MonetDB4 and MonetDB5 engines can be used interactively or as a server. The XQuery
and SQL back-ends can only be used as servers.
To run MonetDB4 interactively, just run:
Mserver
To run MonetDB5 interactively, just run:
mserver5
The disadvantage of running the systems interactively is that you don’t get readline
support (if available on your system). A more pleasant environment can be had by using
the system as a server and using mclient to interact with the system. For MonetDB4 use:
1.22 Troubleshooting
bootstrap fails if any of the requisite programs cannot be found or is an incompatible
version.
bootstrap adds files to the source directory, so it must have write permissions.
During bootstrap, warnings like
Remember to add ‘AC_PROG_LIBTOOL’ to ‘configure.in’.
You should add the contents of ‘/usr/share/aclocal/libtool.m4’ to ‘aclocal.m4’.
configure.in:37: warning: do not use m4_patsubst: use patsubst or m4_bpatsubst
configure.in:104: warning: AC_PROG_LEX invoked multiple times
configure.in:334: warning: do not use m4_regexp: use regexp or m4_bregexp
and type y when asked whether to remove the listed files. This will remove all the files
that were created during bootstrap. Only do this with sources obtained through CVS.
4. In the top-level source directory, re-run:
./bootstrap
Only do this with sources obtained through CVS.
5. In the build-directory, re-run:
configure
as described above.
6. In the build-directory, re-run:
make
make install
as described above.
1.25 Introduction
The MonetDB suite of programs consists of a number of components which we will describe
briefly here. The general rule is that the components should be compiled and installed in
the order given here, although some components can be compiled and installed in a different
order. Unless you know the inter-component dependencies, it is better to stick to this order.
Also note that before the next component is built, the previous ones need to be installed.
The section names are the names of the CVS modules on SourceForge.
1.26 buildtools
The buildtools component is required in order to build the sources from the CVS repository.
If you get the pre-packaged sources (i.e. the ones distributed as tarballs), you don’t need the
buildtools component (although this has not been tested on Windows).
1.27 MonetDB
Also known as MonetDB Common, this component contains the database kernel, i.e. the
heart of MonetDB, and some generally useful libraries. This component is required.
1.28 clients
Also known as MonetDB Client, this component contains a library which forms the basis for
communicating with the MonetDB server components, and some interface programs that
use this library to communicate with the server. This component is required.
1.29 MonetDB4
The deprecated (but still used) database server MonetDB4 Server. This component is still
required for the MonetDB XQuery (pathfinder) component. This is the old server which uses
MIL (the MonetDB Interface Language) as programming interface. This component is only
required if you need MIL or if you need the MonetDB XQuery component. This component
also works with the MonetDB SQL component, but that is not officially supported anymore
(it does work, however).
1.30 MonetDB5
The MonetDB5 Server component is the new database server. It uses MAL (the MonetDB
Assembly Language) as programming interface. This component is required if you need MAL
or if you need the MonetDB SQL component.
1.31 sql
Also known as MonetDB SQL, this component provides an SQL frontend to MonetDB4
and MonetDB5 (the former is deprecated). This component is required if you need SQL
support.
1.32 pathfinder
Also known as MonetDB XQuery, this component provides an XQuery query engine on top
of a relational database. You can store XML documents in the database and query these
documents using XQuery. This component is required if you need XML/XQuery support.
1.33 java
Also known as MonetDB Java, this component provides both the MonetDB JDBC driver
and the XRPC wrapper. This component is optional.
1.34 geom
The geom component provides a module for the MonetDB SQL frontend. This component
is optional.
1.35 testing
The testing component contains some files and programs we use for testing the MonetDB
suite. This component is optional.
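The component descriptions above imply both a build order and a few hard requirements. The following Python sketch selects the components to build for a chosen front end and returns them in the order in which they are described. The helper and constant names are hypothetical. For sql the sketch assumes the MonetDB5 back end, since MonetDB4 is deprecated for that purpose, and it encodes only the requirements stated explicitly in the text.

```python
# Hypothetical helper: pick the components to build for the wanted front
# ends, keeping the order in which the components are described above.

BUILD_ORDER = [
    "buildtools", "MonetDB", "clients", "MonetDB4", "MonetDB5",
    "sql", "pathfinder", "java", "geom", "testing",
]

# Requirements as stated in the component descriptions. For sql this sketch
# picks MonetDB5 (an assumption; MonetDB4 still works but is deprecated).
REQUIRES = {
    "sql": ["MonetDB5"],
    "pathfinder": ["MonetDB4"],
}

# MonetDB (common) and clients are described as required components.
ALWAYS_REQUIRED = ["MonetDB", "clients"]


def components_to_build(wanted, from_cvs=True):
    """Return the components to build, in build order."""
    unknown = [name for name in wanted if name not in BUILD_ORDER]
    if unknown:
        raise ValueError("unknown component(s): " + ", ".join(unknown))
    selected = set(ALWAYS_REQUIRED)
    if from_cvs:
        # buildtools is needed to build sources from the CVS repository.
        selected.add("buildtools")
    for component in wanted:
        selected.add(component)
        selected.update(REQUIRES.get(component, []))
    return [name for name in BUILD_ORDER if name in selected]
```

Each component in the returned list should be installed before the next one is built.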
1.36 Prerequisites
In order to compile the MonetDB suite of programs, several other programs and libraries
need to be installed. Some further programs and libraries can be installed to
enable optional features. The required programs and libraries are listed in this section; the
following section lists the optional programs and libraries.
1.38 Compiler
The suite can be compiled using one of the following compilers:
• Microsoft Visual Studio .NET 2003 (also known as Microsoft Visual Studio 7);
• Microsoft Visual Studio 2005 (also known as Microsoft Visual Studio 8);
• Intel(R) C++ Compiler 9.1 (which actually needs one of the above);
• Intel(R) C++ Compiler 10.1 (which also needs one of the Microsoft compilers).
Note that the pathfinder component currently cannot be compiled with any of the
Microsoft compilers. It can be compiled with the Intel compiler.
No longer supported (but probably still possible) is the GNU C Compiler gcc under
Cygwin. With it, it is (probably still) possible to build a version that runs using the
Cygwin DLLs, as well as a version that uses the MinGW (Minimalist GNU for Windows)
package. This is not supported and not further described here.
1.39 Python
Python is needed for creating the configuration files that the compiler uses to determine
which files to compile. Python can be downloaded from http://www.python.org/. Just
download and install the Windows binary distribution.
On Windows64 you can use either the 32-bit or 64-bit version of Python.
1.40 Bison
Bison is a reimplementation of YACC (Yet Another Compiler Compiler), a program to
convert a grammar into working code.
A version of Bison for Windows can be gotten from the GnuWin32 project at
http://gnuwin32.sourceforge.net/. Click on the Download Packages link on the left and
then on Bison, and get the Setup file and install it.
1.41 Flex
Flex is a fast lexical analyzer generator.
A version of Flex for Windows can be gotten from the GnuWin32 project at
http://gnuwin32.sourceforge.net/. Click on the Download Packages link on the left and
then on Flex, and get the Setup file and install it.
1.42 Pthreads
Get a Windows port of pthreads from ftp://sources.redhat.com/pub/pthreads-win32/. You
can download the latest pthreads-*-release.exe which is a self-extracting archive. Extract
it, and move or copy the contents of the Pre-built.2 folder to C:\Pthreads (so that you end
up with folders C:\Pthreads\lib and C:\Pthreads\include).
On Windows64, in a command interpreter, run nmake clean VC in the extracted
pthreads.2 folder with the Visual Studio environment set to the appropriate values, e.g.
by executing the command Open Visual Studio 2005 x64 Win64 Command Prompt. Then
copy the files pthreadVC2.dll and pthreadVC2.lib to C:\Pthreads\lib.
1.43 Diff
Diff is a program to compare two versions of a file and list the differences. This program
is not used during the build process, but only during testing. As such it is not a strict
prerequisite.
A version of Diff for Windows can be gotten from the GnuWin32 project at
http://gnuwin32.sourceforge.net/. Click on the Download Packages link on the left and
then on DiffUtils (note the name), and get the Setup file and install it.
1.44 PsKill
PsKill is a program to kill (terminate) processes. This program is only used during testing
to terminate tests that take too long.
PsKill is part of the Windows Sysinternals. Go to the Process Utilities, and get the
PsKill package. PsKill is also part of the PsTools package and the Sysinternals Suite, so
you can get those instead. Extract the archive, and make sure that the folder is in your
Path variable when you run the tests.
1.46 OpenSSL
The OpenSSL library is used during authentication of a MonetDB client program with the
MonetDB server. The OpenSSL library is required for the MonetDB5 component.
Download the source from http://www.openssl.org/. We used the latest stable version
(0.9.8k). Follow the instructions in the file INSTALL.W32 or INSTALL.W64.
1.47 libxml2
Libxml2 is the XML C parser and toolkit of Gnome.
This library is only a prerequisite for the pathfinder component.
The home of the library is http://xmlsoft.org/. But Windows binaries can be gotten
from http://www.zlatkovic.com/libxml.en.html. Click on Win32 Binaries on the right, and
download libxml2, iconv, and zlib. Install these in e.g. C:\.
Note that we hit a bug in version 2.6.31 of libxml2. See the bugreport. Use version
2.6.30 or 2.6.32.
On Windows64 you will have to compile libxml2 yourself (with its optional prerequisites
iconv and zlib, for which see below).
Edit the file win32\Makefile.msvc and change the one occurrence of zdll.lib to
zlib1.lib, and then run the following commands in the win32 subdirectory,
substituting the correct locations for the iconv and zlib libraries:
cscript configure.js compiler=msvc prefix=C:\libxml2-2.6.30.win64 ^
include=C:\iconv-1.11.win64\include;C:\zlib-1.2.3.win64\include ^
lib=C:\iconv-1.11.win64\lib;C:\zlib-1.2.3.win64\lib iconv=yes zlib=yes
nmake /f Makefile.msvc
nmake /f Makefile.msvc install
After this, you may want to move the file libxml2.dll from the lib directory to the
bin directory.
mkdir C:\geos-3.0.win32
mkdir C:\geos-3.0.win32\lib
mkdir C:\geos-3.0.win32\bin
mkdir C:\geos-3.0.win32\include
mkdir C:\geos-3.0.win32\include\geos
copy geos_c_i.lib C:\geos-3.0.win32\lib
copy geos_c.dll C:\geos-3.0.win32\bin
copy headers C:\geos-3.0.win32\include
copy headers\geos C:\geos-3.0.win32\include\geos
copy ..\capi\geos_c.h C:\geos-3.0.win32\include
1.50 iconv
Iconv is a program and library to convert between different character encodings. We only
use the library.
The home of the program and library is http://www.gnu.org/software/libiconv/,
but Windows binaries can be gotten from the same site as the libxml2 library:
http://www.zlatkovic.com/libxml.en.html. Click on Win32 Binaries on the right, and
download iconv. Install in e.g. C:\.
On Windows64 you will have to compile iconv yourself. Get the source from the iconv
website and extract somewhere. Edit the file config.h.msvc and add the line:
#define EXEEXT ".exe"
Edit the file srclib\Makefile.msvc and add width.obj to the OBJECTS variable and
add:
width.obj: width.c; $(CC) $(INCLUDES) $(CFLAGS) -c width.c
to the file. Create a file windows\stdint.h with the contents:
typedef unsigned char uint8_t;
typedef unsigned short uint16_t;
typedef unsigned long uint32_t;
Create an empty file windows\unistd.h. Then build using the commands:
nmake -f Makefile.msvc NO_NLS=1 DLL=1 MFLAGS=-MD PREFIX=C:\iconv-1.11.win64
nmake -f Makefile.msvc NO_NLS=1 DLL=1 MFLAGS=-MD PREFIX=C:\iconv-1.11.win64 install
Fix the ICONV definitions in MonetDB\NT\winrules.msc so that they refer to the location
where you installed the library and call nmake with the extra parameter HAVE_ICONV=1.
1.51 zlib
Zlib is a compression library which is optionally used by both MonetDB and the iconv
library. The home of zlib is http://www.zlib.net/, but Windows binaries can be gotten
from the same site as the libxml2 library: http://www.zlatkovic.com/libxml.en.html. Click
on Win32 Binaries on the right, and download zlib. Install in e.g. C:\.
On Windows64 you will have to compile zlib yourself. Get the source from
the zlib website and extract somewhere. Open the Visual Studio 6 project file
1.52 Perl
Perl is only needed to create an interface that can be used from a Perl program to
communicate with a MonetDB server.
We have used ActiveState’s ActivePerl distribution (release 5.10.0.1003). Just install
the 32 or 64 bit version and compile the clients component with the additional nmake flags
HAVE_PERL=1 HAVE_PERL_DEVEL=1 HAVE_PERL_SWIG=1 (the latter flag only if SWIG is also
installed).
1.53 PHP
PHP is only needed to create an interface that can be used from a PHP program to com-
municate with a MonetDB server.
Download the Windows installer and source package of PHP 5 from
http://www.php.net/. Install the binary package and extract the sources some-
where (e.g. as a subdirectory of the binary installation).
In order to get MonetDB to compile with these sources a few changes had to be made
to the sources:
• In the file Zend\zend.h, move the line
#include <stdio.h>
down until just after the block where zend_config.h is included.
• In the file main\php_network.h, delete the line
#include "arpa/inet.h"
We have no support yet for Windows64.
1.55 Java
If you want to build the java component of the MonetDB suite, you need Java. Get Java
from http://java.sun.com/, but make sure you do not get the latest version. Get the Java
Development Kit 1.5. Our current JDBC driver is not compatible with Java 1.6 yet, and
the XRPC wrapper is not compatible with Java 1.4 or older.
In addition to the Java Development Kit, you will also need Apache Ant which is re-
sponsible for the actual building of the driver.
Optionally:
• sql (requires MonetDB4 or MonetDB5; MonetDB5 is recommended)
• pathfinder (requires MonetDB4)
Apart from buildtools, all packages contain a subfolder NT which contains a few Windows-
specific source files, and which is the directory in which the Windows version is built. (On
Unix/Linux we recommend building in a new directory which is not part of the source tree,
but on Windows we haven’t made this separation.)
1.61 Compiler
Make sure that the environment variables that your chosen compiler needs are set. A
convenient way of doing that is to use the batch files that are provided by the compilers:
• Microsoft Visual Studio .NET 2003 (also known as Microsoft Visual Studio 7):
call "%ProgramFiles%\Microsoft Visual Studio .NET 2003\Common7\Tools\vsvars32.bat"
• Microsoft Visual Studio 2005 (also known as Microsoft Visual Studio 8):
call "%ProgramFiles%\Microsoft Visual Studio 8\Common7\Tools\vsvars32.bat"
• Intel(R) C++ Compiler 10.1.013:
call "%ProgramFiles%\Intel\Compiler\C++\10.1.013\IA32\Bin\iclvars.bat"
When using the Intel compiler, you also need to set the CC and CXX variables:
set CC=icl -Qstd=c99 -GR- -Qsafeseh-
set CXX=icl -Qstd=c99 -GR- -Qsafeseh-
(These are the values for the 10.1 version, for 9.1 replace -Qstd=c99 with -Qc99.)
set PYTHONPATH=%CLIENTS_PREFIX%\share\MonetDB\python;%PYTHONPATH%
set PYTHONPATH=%MONETDB_PREFIX%\share\MonetDB\python;%PYTHONPATH%
set PYTHONPATH=%SQL_PREFIX%\share\MonetDB\python;%PYTHONPATH%
1.64 Compilation
section of the registry. The code can be fixed by editing the generated installer (.msi file)
using e.g. the program orca from Microsoft. Open the installer in orca and locate the
table RegLocator. In the Type column, change the value from 2 to 18 and save the file.
Alternatively, use the following Python script to fix the .msi file:
# Fix a .msi (Windows Installer) file for a 64-bit registry search.
# Microsoft refuses to fix a bug in Visual Studio so that for a 64-bit
# build, the registry search will look in the 32-bit part of the
# registry instead of the 64-bit part of the registry. This script
# fixes the .msi to look in the correct part.
import msilib
import sys
import glob

def fixmsi(f):
    db = msilib.OpenDatabase(f, msilib.MSIDBOPEN_DIRECT)
    v = db.OpenView('UPDATE RegLocator SET Type = 18 WHERE Type = 2')
    v.Execute(None)
    v.Close()
    db.Commit()

if __name__ == '__main__':
    for f in sys.argv[1:]:
        for g in glob.glob(f):
            fixmsi(g)
1.67.1.1 Stability
With a (code-wise) complex system like MonetDB, modifying the source code — be it for
fixing bugs or for adding new features — always bears the risk of breaking or at least
altering some existing functionality. To facilitate the task of detecting such changes, small
test scripts together with their respective correct/expected ("stable") output are collected
within the CVS repository of MonetDB. Given the complexity of MonetDB, there is no
way to do anything close to "exhaustive" testing, hence, the idea is to continuously extend
the test collection. E.g., each developer should add some tests as soon as she/he adds
new functionality. Likewise, a test script should be added for each bug report to monitor
whether/when the bug is fixed, and to prevent (or at least detect) future occurrences of
the same bug. The collection consists of hundreds of test scripts, each covering many
micro-functionality tests.
To run all the tests and compare their current output to their stable output, a tool
called Mtest is included in the MonetDB code base. Mtest recursively walks through the
source tree, runs tests, and checks for differences between the stable and the current output.
As a result, Mtest creates a web interface that allows convenient access to the differences
encountered during testing. Each developer is supposed to run "Mtest" (respectively "make
check") on his/her favorite development platform and check the results before checking in
her/his changes. During the automatic daily tests, "make check" and "Mtest" are run on all
testing platforms and the TestWeb is generated to provide convenient access to the results.
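The comparison at the heart of this workflow can be pictured with a small sketch. The Python below is not Mtest itself: the function name diff_against_stable and the sample outputs are assumptions made for illustration. It only shows the idea of comparing a test's current output with its stored stable output and reporting the differences.

```python
import difflib


def diff_against_stable(stable: str, current: str) -> list:
    """Return unified-diff lines between stable and current test output.

    An empty list means the current output still matches the stable output.
    """
    return list(difflib.unified_diff(
        stable.splitlines(),
        current.splitlines(),
        fromfile="stable",
        tofile="current",
        lineterm="",
    ))


# Hypothetical outputs of a single test script.
stable_out = "1\n2\n3\n"
current_out = "1\n2\n4\n"

differences = diff_against_stable(stable_out, current_out)
for line in differences:
    print(line)
```

A test whose diff is empty is considered unchanged; any non-empty diff is what a developer would inspect before checking in.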
1.67.1.2 Portability
Though Fedora Linux on AMD Athlon PCs is our main development platform at CWI, we
do not limit our attention to this single platform. Supporting a broad range of hardware
and software platforms is an important concern.
Using standard configuration tools like automake, autoconf, and libtool, we have the
same code base compiling not only on various flavors of Unix (e.g., Linux, Cygwin, AIX,
IRIX, Solaris, MacOS X) but also on native Windows. Furthermore, the very same code base
compiles with a wide spectrum of (C-) compilers, ranging from GNU’s gcc over several
native Unix compilers (IBM, SGI, Sun, Intel, Portland Group) to Microsoft’s Visual Studio
and Visual Studio .NET on Windows.
On the hardware side, we have (had) MonetDB running on "almost anything" from an
Intel StrongARM-based Linux PDA with 64 MB of flash memory to an SGI Origin2000
with 32 MIPS R12k CPUs and a total of 64 GB of (shared) main memory.
depends too much on the available resources and urgency (= pressure) by our research needs
and clients.
• Cursor based processing, because the execution engine is not based on the iterator
model deployed in other engines. A simulation of the cursor based scheme would be
utterly expensive from a performance point of view.
• Multi-level transaction isolation levels. Coarse grain isolation is provided using table
level locks.
Such a simple characterization ignores the wide-spread differences that can be experi-
enced at each level. To illustrate, in D) and R) it makes a big difference whether the data
is already in the cache or still on disk. With E) it makes a big difference whether you are
comparing two integers, evaluation of a mathematical function, e.g., Gaussian, or a regular
expression evaluation on a string. As a result, intense optimization in one area may become
completely invisible due to being overshadowed by other cost factors.
The Version 5 infrastructure is designed to ease addressing each of these cost factors in
a well-defined way, while retaining the flexibility to combine the components needed for a
particular situation. It results in an architecture where you assemble the components for a
particular application domain and hardware platform.
The primary interface to the database kernel is still based on the exchange of text in the
form of queries and simply formatted results. This interface is designed for ease of
interpretation and versatility, and it is flexible enough to accommodate system debugging and
application tool development. Although a textual interface potentially leads to a performance degradation,
our experience with earlier system versions showed that the overhead can be kept within
acceptable bounds. Moreover, a textual interface reduces the programming effort otherwise
needed to develop test and application programs. The XML trend as the language for tool
interaction supports our decision.
The top layer consists of applications written in your favorite language. They provide
both specific functionality for a particular product, e.g., Proximity, and generic functionality,
e.g., the Aquabrowser or Dbvisualizer. The applications communicate with the server
using de facto standard interface packages, e.g., JDBC, ODBC, Perl, PHP, etc.
The middle layer consists of query language processors such as SQL and XQuery. The
former supports the core functionality of SQL’99 and extends into SQL’03. The latter
is based on the W3C standard and includes the XUpdate functionality. The query language
processors each manage their own private catalog structure. Software bridges, e.g.,
import/export routines, are used to share data between language paradigms.
Figure 2.1 (diagram not reproduced; labels shown: mguardian, monetdb, JDBC-PHP-PERL-PYTHON-ODBC-MAPI interfaces, SQL compiler, XQuery compiler, RDF, MAL interpreter, GDK layer, mserver5)
_28 := bat.setWriteMode(_19);
bat.append(_28,_27,true);
...
MAL supports the full breadth of computational paradigms deployed in a database set-
ting. It is a language framework where the execution semantics is determined by the code
transformations and the final engine chosen.
The design and implementation of MAL takes the functionality offered previously a
significant step further. To name a few:
• All instructions are strongly typed before being executed.
• It supports polymorphic functions. They act as templates that produce strongly typed
instantiations when needed.
• Function style expressions where each assignment instruction can receive multiple tar-
get results; it forms a point in the dataflow graph.
• It supports co-routines (Factories) to build streaming applications.
• Properties are associated with the program code for ease of optimization and scheduling.
• It can be readily extended with user defined types and function modules.
The building blocks of scenarios are routines obeying a strict name signature. They
require exclusive access to the client record. Any specific information should be accessible
from there, e.g., access to a scenario specific state descriptor. The client scenario initializa-
tion and finalization brackets are xyzinitClient() and xyzexitClient().
The xyzparser(Client c) contains the parser for language XYZ and should fill the
MAL program block associated with the client record. The latter may have been initial-
ized with variables. Each language parser may require a catalog with information on the
translation of language specific data structures into their BAT equivalents.
The xyzoptimizer(Client c) contains language specific optimizations using the MAL
intermediate code as a starting point.
The xyztactics(Client c) synchronizes the program execution with the state of the
machine, e.g., claiming resources, the history of the client or alignment of the request with
concurrent actions (e.g., transaction coordination).
The xyzengine(Client c) contains the applicable back-end engine. The default is the
MAL interpreter, which provides a good balance between speed and the ability to analyze its
behavior.
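The sequence of scenario phases can be sketched as a simple pipeline. The Python below is only an illustration of the order described above (initialize, parse, optimize, tactics, engine, exit); the Client class, its fields, and the toy behavior of each phase are assumptions and do not reflect the actual C structures.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Client:
    """Stand-in for the client record; all fields are illustrative."""
    source: str
    program: List[str] = field(default_factory=list)
    log: List[str] = field(default_factory=list)


def xyz_init_client(c: Client) -> None:
    c.log.append("init")


def xyz_parser(c: Client) -> None:
    # Fill the program block from the client's input, one statement per line.
    c.program = [s.strip() for s in c.source.splitlines() if s.strip()]
    c.log.append("parser")


def xyz_optimizer(c: Client) -> None:
    # Toy optimization: drop comment-only statements.
    c.program = [s for s in c.program if not s.startswith("#")]
    c.log.append("optimizer")


def xyz_tactics(c: Client) -> None:
    # A real implementation would claim resources here; this one only logs.
    c.log.append("tactics")


def xyz_engine(c: Client) -> None:
    c.log.append("engine ran %d statements" % len(c.program))


def xyz_exit_client(c: Client) -> None:
    c.log.append("exit")


PHASES: List[Callable[[Client], None]] = [
    xyz_parser, xyz_optimizer, xyz_tactics, xyz_engine,
]


def run_scenario(c: Client) -> Client:
    """Run the phases in order, bracketed by initialization and finalization."""
    xyz_init_client(c)
    try:
        for phase in PHASES:
            phase(c)
    finally:
        xyz_exit_client(c)
    return c


client = run_scenario(Client(source="# comment\nt := 2+2;\nio.print(t);\n"))
print(client.log)
```

Every phase receives the same client record, which mirrors the requirement that scenario routines find all their state there.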
This report is helpful to determine possible instabilities and heavily loaded servers. In
this case, it indicates that our database exists, but that no server is running yet.
shell> monetdb start demo
starting database ’demo’... done
shell> monetdb status demo
name state uptime health last crash
demo running 1m 18s 100%, 0s -
You (the database administrator) can now establish a connection using any of the user
interfaces. The most common one is mclient, which provides a light-weight textual inter-
face. For example, the statements below illustrate a short session. The session is closed
using the mclient console command \q.
shell> mclient -lsql --database=demo
sql>CREATE USER "voc" WITH PASSWORD 'voc' NAME 'VOC Explorer' SCHEMA "sys";
sql>CREATE SCHEMA "voc" AUTHORIZATION "voc";
sql>ALTER USER "voc" SET SCHEMA "voc";
sql>\q
For a more complete session, see the VOC demo.
Once in a while a database should be closed for maintenance. This operation should be
issued with care, because it affects running applications. The first step is to block clients from
establishing new connections using the command:
shell> monetdb lock demo
The effect is that only the system administrator can gain access to the server. All other
users are warned using the message ’Database temporarily unavailable for maintenance’
upon an attempt to connect.
Step two is to connect as system administrator and inspect the state of all client
connections.
shell> mclient -lsql --database=demo
sql>select * from clients;
If all connections seem dormant, the server can be shut down. More details are given in the next
section.
After maintenance has been completed, the database server can be opened for connec-
tions using ’monetdb release demo’.
For more details on merovingian and monetdb inspect their manual pages.
physically separated from the database store itself, e.g. on different disks. The second line
of defense is to regularly create a database dump or full checkpoint. This is a consolidated
snapshot and should be stored away at a failure independent location, e.g. a vault. Since
a dump is a rather expensive operation, the third line of defense is to keep differential lists
from the last dump based on the update logs. It forms a basis to rollback to a known correct
state.
We are working on this topic.
At the moment the best way to make a checkpoint is to make a database dump while
the database is under maintenance. Use the monetdb utility to lock the database before
dumping its contents.
possible. It is highly advised to stick to the Mapi interaction protocol. It gives a little more
protection against malicious behavior or unintended side-effects.
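#include <embeddedclient.h>
#include <stdio.h>
#include <stdlib.h>

/* Print the most specific error available, then abort. */
#define die(dbh, hdl) ((hdl) ? mapi_explain_result(hdl, stderr) : \
                       (dbh) ? mapi_explain(dbh, stderr) : \
                       fprintf(stderr, "command failed\n"), \
                       exit(-1))

int
main()
{
    Mapi dbh;
    MapiHdl hdl = NULL;
    int i;

    /* Assumption: the embedded SQL server is started with default settings. */
    dbh = embedded_sql(NULL, 0);
    if (dbh == NULL || mapi_error(dbh))
        die(dbh, hdl);
    if ((hdl = mapi_query(dbh, "create table emp (name varchar(20), age int)")) == NULL ||
        mapi_error(dbh))
        die(dbh, hdl);
    if (mapi_close_handle(hdl) != MOK)
        die(dbh, hdl);
    /* Assumption: the rows summed below come from a query such as this one. */
    if ((hdl = mapi_query(dbh, "select * from emp")) == NULL || mapi_error(dbh))
        die(dbh, hdl);
    i = 0;
    while (mapi_fetch_row(hdl)) {
        char *age = mapi_fetch_field(hdl, 1);   /* second column: age */
        if (age != NULL)
            i = i + atoi(age);
    }
    if (mapi_error(dbh))
        die(dbh, hdl);
    if (mapi_close_handle(hdl) != MOK)
        die(dbh, hdl);
    printf("The footprint is %d Mb \n", i);
    mapi_disconnect(dbh);
    return 0;
}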
The embedded MonetDB engine is available as the library libembedded_sql.a (and
libembedded_mal.a) to be linked with a C program. Provided the programming environ-
ment has been initialized properly, it suffices to prepare the embedded application using
gcc -g myprog.c -o myprog \
`monetdb-sql-config --cflags --libs` \
`monetdb-clients-config --cflags --libs` \
`monetdb-config --cflags --libs` \
`monetdb5-config --cflags --libs` \
-lMapi -lembeddedsql5
The configuration parameters for the server are read from its default location in the
file system. In an embedded setting this location may not be accessible. It requires calls
to mo_add_option() before you ask for the instantiation of the server code itself. The
code snippet below illustrates how our example is given hardwired knowledge of the desired
settings:
main(){
opt *set = NULL;
int setlen = 0;
...
if (!(setlen = mo_builtin_settings(&set)))
usage(prog);
...
/* needed to prevent the MonetDB config file from being used */
setlen = mo_add_option(&set, setlen, opt_config, "dbfarm", ".");
setlen = mo_add_option(&set, setlen, opt_config, "dbname", "demo");
...
setlen = mo_system_config(&set, setlen);
mid = embedded_mal(set, setlen);
For a complete picture see the sample program in the distribution.
in use. Therefore it makes sense to experiment with a minimal, but functionally complete
application to decide if the resource limitations are obeyed.
The minimal static footprint of MonetDB is about 16 MB (plus about 4 MB for SQL). After
module loading the space quickly grows to about 60 MB. This footprint should be reduced.
The embedded application world calls for many, highly specialized enhancements. It is
often well worth the effort to carve out the functionality needed from the MonetDB software
packages. The easiest solution to limit the functionality and reduce resource consumption
is to reduce the modules loaded. This requires patches to the startup scripts.
The benefit of an embedded database application also comes with limitations. The first
and foremost limitation of embedded MonetDB is that the first application accessing the
database effectively locks out any other concurrent use, even in those situations where
concurrent applications merely read the database or create privately held tables.
2 Client Interfaces
Clients gain access to the Monet server through an Internet connection or through its server
console. Access through the internet requires a client program at the source, which addresses
the default port of a running server. The functionality of the server console is limited. It is
a textual interface for expert use.
At the server side, each client is represented by a session record with the current sta-
tus, such as name, file descriptors, namespace, and local stack. Each client session has
a dedicated thread of control, which limits the number of concurrent users to the thread
management facilities of the underlying operating system. A large client base should be
supported using a single server-side client thread, geared at providing a particular service.
The number of clients permitted concurrent access is a compile time option. The console
is the first and is always present. It reads from standard input and writes to standard output.
Client sessions remain in existence until the corresponding communication channels
break or the retention timer expires. The administrator and the owner of a session can
manipulate the timeout with a system call.
Options are:
-d database | --database=database database to connect to
-e | --echo echo the query
-f kind | --format=kind specify output format {dm,xml} for XQuery, or {csv,ta
-H | --history load/save cmdline history (default off)
-h hostname | --host=hostname host to connect to
-i | --interactive read stdin after command line args
-l language | --language=lang {sql,xquery,mal,mil}
-L logfile | --log=logfile save client/server interaction
-P passwd | --passwd=passwd password
The default TCP port (mapi_port) is 50000. If this port happens to be in use on
the server machine (which generally is only the case if you run two MonetDB servers on
it), you will have to use -p port to define the port on which the mserver is listening.
Otherwise, it may be omitted. If more than one mserver is running, you must
also specify the database name with -d database. In this case, if your port points to the wrong
database, the connection will always be redirected to the correct one. Note that the default
port (and other default options) can be set in the server configuration file.
Within the context of each query language there are more options. They can be shown
using the command \? or using the command line.
For SQL there are several knobs to tune for a better rendering of result tables (\w).
program is started. Options given on the command line override the preferences file. The
.monetdb file syntax is <option>=<value> where option is one of the options host, port,
file, mode debug, or password. Note that the last one is perilous and therefore not available
as command line option. If no input file is given using the -f flag, an interactive session is
started on the terminal.
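The precedence rule described here (command line over preferences file) can be sketched as follows. This is a minimal illustration in Python, not the client's actual implementation: the function names are hypothetical, and skipping blank lines and lines starting with '#' is an assumption about the file format that the text does not confirm.

```python
def parse_preferences(text: str) -> dict:
    """Parse lines of the form <option>=<value> into a dictionary.

    Assumption: blank lines, '#' lines and lines without '=' are ignored.
    """
    prefs = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        prefs[key.strip()] = value.strip()
    return prefs


def effective_options(prefs: dict, cmdline: dict) -> dict:
    """Merge both sources; options given on the command line win."""
    merged = dict(prefs)
    merged.update({k: v for k, v in cmdline.items() if v is not None})
    return merged


# Hypothetical preferences file content and command line values.
prefs = parse_preferences("host=localhost\nport=50000\n")
options = effective_options(prefs, {"port": "50001", "host": None})
print(options)
```

Here the port from the command line replaces the one from the file, while the host falls back to the file because no command line value was given.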
NOTE The JDBC protocol does not support the SQL DEBUG <query>, PROFILE
<query>, and TRACE <query> options. Use the mclient tool instead.
OPTIONS
-h --host The hostname of the host that runs the MonetDB database. A port number
can be supplied by use of a colon, i.e. -h somehost:12345.
-p --port The port number to connect to.
-f --file A file name to use either for reading or writing. The file will be used for writing
when dump mode is used (-D --dump). In read mode, the file can also be a
URL pointing to a plain text file that is optionally gzip compressed.
-u --user The username to use when connecting to the database.
-d --database
Try to connect to the given database (only makes sense if connecting to a
DatabasePool, M5 or equivalent process).
-l --language
Use the given language, for example ’xquery’.
--help This screen.
--version
Display driver version and exit.
-e --echo Also outputs the contents of the input file, if any.
-q --quiet
Suppress printing the welcome header.
-D --dump Dumps the given table(s), or the complete database if none given.
EXTRA OPTIONS
-Xdebug Writes a transmission log to disk for debugging purposes. If a file name is given,
it is used, otherwise a file called monet<timestamp>.log is created. A given file
will never be overwritten; instead a unique variation of the file is used.
-Xembedded
Uses an "embedded" server instance. The argument to this option should be
in the form of path/to/mserver:dbname[:dbfarm[:dbinit]].
-Xhash Use the given hash algorithm during challenge response. Supported algorithm
names: SHA1, MD5, plain.
-Xoutput The output mode when dumping. Default is sql, xml may be used for an
experimental XML output.
-Xbatching
Indicates that a batch should be used instead of direct communication with
the server for each statement. If a number is given, it is used as batch size.
For example, 8000 would execute the contents of the batch after every 8000 rows read.
Batching can greatly speed up the process of restoring a database dump.
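The batching idea can be illustrated with a short sketch. The Python below is not the JDBC client's code: run_in_batches and the execute_batch callback are hypothetical names, and the sketch only shows how statements are collected and handed over once the batch size is reached, with a final partial batch at the end.

```python
from typing import Callable, Iterable, List


def run_in_batches(statements: Iterable[str], batch_size: int,
                   execute_batch: Callable[[List[str]], None]) -> int:
    """Hand statements to execute_batch in groups of batch_size.

    Returns the number of batches sent, including a final partial one.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[str] = []
    sent = 0
    for stmt in statements:
        batch.append(stmt)
        if len(batch) == batch_size:
            execute_batch(batch)
            sent += 1
            batch = []          # start a fresh batch
    if batch:                   # flush whatever is left over
        execute_batch(batch)
        sent += 1
    return sent


# Record the size of each batch instead of talking to a server.
sizes: List[int] = []
count = run_in_batches(
    ("INSERT INTO t VALUES (%d);" % i for i in range(10)),
    4,
    lambda b: sizes.append(len(b)),
)
print(count, sizes)
```

The speed-up comes from replacing one server round trip per statement with one per batch.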
Chapter 3: MonetDB Assembly Language (MAL) 48
Variables are organized into two classes: those whose names start with an underscore and
those whose names do not. The former are reserved as MAL parser temporaries, whose name
aligns with an entry in the symbol table. In general they cannot be used in MAL programs,
but they may become visible in MAL program listings or during debugging.
3.3 Instructions
A MAL instruction purposely has a simple format. It is syntactically represented by an
assignment, where an expression (function call) delivers results to multiple target variables.
The assignment patterns recognized are illustrated below.
(t1,..,t32) := module.fcn(a1,..,a32);
t1 := module.fcn(a1,..,a32);
t1 := v1 operator v2;
t1 := literal;
(t1,..,tn) := (a1,..,an);
Operators are grouped into user defined modules. Omission of the module name is
interpreted as a reference to the user module.
Simple binary arithmetic operations are merely provided as a short-hand, e.g. the ex-
pression t:=2+2 is converted directly into t:= calc.+(2,2).
Target variables are optional. The compiler introduces temporary variables to hold the
result of the expression when needed. A temporary won’t show up when you list the MAL program
unless it is used elsewhere.
For parsing simplicity, each instruction fits on a single line. Comments start with a
sharp ’#’ and continue to the end of the line. They are retained in the internal code
representation to ease debugging of compiler generated MAL programs.
The data structure to represent a MAL block is kept simple. It contains a sequence of
MAL statements and a symbol table. The MAL instruction record is a code byte string
overlaid with the instruction pattern, which contains references into the symbol tables and
administrative data for the interpreter.
This method leads to a large allocated block, which can be easily freed. Variable- and
statement- block together describe the static part of a MAL procedure. It carries enough
information to produce a listing and to aid symbolic debugging.
taken. Built-in controls exist for booleans and numeric values. The barrier block is opened
when the control variable holds true, when its numeric value >= 0, or when it is a non-empty
string. The nil value blocks entry in all cases.
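The entry rule can be written down as a small predicate. The Python below is only a model of the rule as stated in this paragraph; representing nil as None and the function name barrier_opens are assumptions made for the sketch.

```python
def barrier_opens(control) -> bool:
    """Decide whether a barrier block is entered for a control value.

    nil is modelled as None and always blocks entry.
    """
    if control is None:
        return False
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(control, bool):
        return control
    if isinstance(control, (int, float)):
        return control >= 0
    if isinstance(control, str):
        return control != ""
    raise TypeError("unsupported control type: %r" % type(control))


for value in (True, False, 0, -1, "x", "", None):
    print(repr(value), barrier_opens(value))
```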
Once inside the barrier you have an option to prematurely leave it at the exit statement
or to redo interpretation just after the corresponding barrier statement, much like the ’break’
and ’continue’ statements in the programming language C. The action is taken when the
condition is met.
The exit marks the exit for a block. Its optional assignment can be used to re-initialize
the barrier control variables or wrap-up any related administration.
The barrier blocks can be properly nested to form a hierarchy of basic blocks. The
control flow within and between blocks is simple enough to deal with during an optimizer
stage. The redo and leave statements mark the partial end of a block. Statements within
these blocks can be re-arranged according to the data-flow dependencies. The order of
partial blocks can not be changed that easily. It depends on the mutual exclusion of the
data flows within each partial block.
Common guarded blocks in imperative languages are the for-loop and if-then-else con-
structs. They can be simulated as follows.
Consider the statement for(i=1;i<10;i++) print(i). The (optimized) MAL block to
implement this becomes:
i:= 1;
barrier B:= i<10;
io.print(i);
i:= i+1;
redo B:= i<10;
exit B;
Translation of the statement if(i<1) print("ok"); else print("wrong"); becomes:
i:=1;
barrier ifpart:= i<1;
io.print("ok");
exit ifpart;
barrier elsepart:= i>=1;
io.print("wrong");
exit elsepart;
Note that both guarded blocks can be interchanged without affecting the outcome. More-
over, neither block would have been entered if the variable happens to be assigned nil.
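For readers more familiar with conventional control flow, the two translations can be mirrored in Python. This is a sketch of the behavior described above, not of the MAL interpreter: nil is modelled as None, and the assumption that a comparison with a nil operand yields nil is ours.

```python
def nil_less(a, b):
    """a < b, except that a nil (None) operand gives a nil result (assumption)."""
    return None if a is None or b is None else a < b


def for_loop():
    """Mirror of: i:=1; barrier B:=i<10; io.print(i); i:=i+1; redo B:=i<10; exit B;"""
    printed = []
    i = 1
    b = nil_less(i, 10)          # barrier B := i<10
    while b is True:             # the block is (re-)entered only while B holds true
        printed.append(i)        # io.print(i)
        i = i + 1
        b = nil_less(i, 10)      # redo B := i<10
    return printed               # exit B


def if_then_else(i):
    """Mirror of the two guarded blocks ifpart and elsepart."""
    out = []
    ifpart = nil_less(i, 1)      # barrier ifpart := i<1
    if ifpart is True:
        out.append("ok")
    # barrier elsepart := i>=1, which is nil when i is nil
    elsepart = None if ifpart is None else not ifpart
    if elsepart is True:
        out.append("wrong")
    return out


print(for_loop())
print(if_then_else(1), if_then_else(0), if_then_else(None))
```

The last call shows the nil case: neither guarded block is entered, so nothing is printed for it.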
The primitives are sufficient to model a wide variety of iterators, whose pattern looks
like:
barrier i:= M.newIterator(T);
elm:= M.getElement(T,i);
...
leave i:= M.noMoreElements(T);
...
redo i:= M.hasMoreElements(T);
exit i:= M.exitIterator(T);
The semantics obeyed by the iterator implementations is as follows. The redo expression
updates the target variable i and control proceeds at the first statement after the barrier
when the barrier is opened by i. If the barrier could not be re-opened, execution proceeds
with the first statement after the redo. Likewise, the leave control statement skips to the
exit when the control variable i shows a closed barrier block. Otherwise, it continues with
the next instruction. Note, in both failed cases the control variable is possibly changed.
A recurring situation is to iterate over the elements in a BAT. This is supported by an
iterator implementation for BATs as follows:
barrier (idx,hd,tl):= bat.newIterator(B);
...
redo (idx,hd,tl):= bat.hasMoreElements(B);
exit (idx,hd,tl);
Where idx is an integer to denote the row in the BAT, hd and tl denote values of the
current element.
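The shape of this iteration can be compared with a Python generator. The sketch below models a BAT as a plain list of (head, tail) pairs, which is a simplification, and it assumes that row numbering starts at 0; neither detail is taken from the MonetDB implementation.

```python
def bat_iterator(bat):
    """Yield (idx, hd, tl) for each element of a BAT modelled as a list of pairs."""
    for idx, (hd, tl) in enumerate(bat):
        yield idx, hd, tl


# A toy BAT with three (head, tail) pairs.
B = [(10, "a"), (11, "b"), (12, "c")]

for idx, hd, tl in bat_iterator(B):
    # Corresponds to the statements between the barrier and the redo.
    print(idx, hd, tl)
```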
io.write("Welcome");
...
catch IOerror:str;
print("input error on reading password");
raise FATALerror:= "Can’t handle it";
exit IOerror;
Since catch is a flow control modifier it can be attached to any assignment statement.
This statement is executed whenever there is no exception outstanding, but will be ignored
when control is moved to the block otherwise.
3.6 Functions
MAL comes with a standard functional abstraction scheme. Functions are represented
by MAL instruction lists, enclosed by a function signature and end statement. The
function signature lists the arguments and their types. The end statement marks the end
of this sequence. Its argument is the function name.
An illustrative example is:
function user.helloWorld(msg:str):str;
io.print(msg);
msg:= "done";
return msg;
end user.helloWorld;
The module name ’user’ designates the collection to which this function belongs. A
missing module name is considered a reference to the current module, i.e. the last module
or atom context opened. All user defined functions are assembled in the module user by
default.
The functional abstraction scheme comes with several variations: commands, pat-
terns, and factories. They are discussed shortly.
3.6.2 C functions
The MAL function body can also be implemented with a C function. Such functions are introduced
to the MAL type checker by providing their signature and an address qualifier for linkage.
We distinguish both command and pattern C-function blocks. They differ in the
information accessible at run time. The command variant calls the underlying C-function,
passing pointers to the arguments on the MAL runtime stack. The pattern command is
passed pointers to the MAL definition block, the runtime stack, and the instruction itself.
It can be used to analyse the types of the arguments directly.
For example, the definitions below link the kernel routine BKCinsert_bun with the
function bat.insert(). It does not fully specify the result type. The io.print() pattern
applies to any BAT argument list, provided they match on the head column type. Such a
polymorphic type list may only be used in the context of a pattern.
command bat.insert(b:bat[:any_1,:any_2], ht:any_1, tt:any_2)
:bat[:any_1,:any_2]
address BKCinsert_bun;
pattern io.print(b1:bat[:any_1,:any]...):int
address IOtable;
3.7 Factories
A convenient programming construct is the co-routine, which is specified as an ordinary
function, but maintains its own state between calls, and permits re-entry other than by the
first statement.
The random generator example is used to illustrate its definition and use.
factory random(seed:int,limit:int):int;
rnd:=seed;
lim:= limit;
barrier lim;
leave lim:= lim-1;
rnd:= rnd*125;
yield rnd:= rnd % 32676;
redo lim;
exit lim;
end random;
The first time this factory is called, a plant is created in the local system to handle the
requests. The plant contains the stack frame and synchronizes access.
In this case it initializes the generator. The random number is generated and yielded
as a result of the call. The factory plant is then put to sleep. The second call received
by the factory wakes it up at the point where it went to sleep. In this case it will find a
redo statement and produces the next random number. Note that also in this case a seed
and limit value are expected, but they are ignored in the body. This factory can be called
upon to generate at most ’limit’ random numbers using the ’seed’ to initialize the generator.
Thereafter it is removed, i.e. reset to the original state.
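The behavior described for this factory closely resembles a generator in other languages. The Python sketch below follows the prose description (at most 'limit' numbers, starting from 'seed') and the arithmetic of the MAL body; it does not mirror the barrier, leave and redo instructions one by one, and the name random_factory is ours.

```python
def random_factory(seed: int, limit: int):
    """Yield at most `limit` pseudo-random numbers, keeping state between calls."""
    rnd = seed
    for _ in range(limit):
        rnd = (rnd * 125) % 32676   # rnd := rnd*125; yield rnd := rnd % 32676
        yield rnd                   # execution resumes here on the next call


plant = random_factory(1, 3)
first = next(plant)                 # the first call initializes and yields
rest = list(plant)                  # later calls resume where the plant went to sleep
print(first, rest)
```

Once the limit is exhausted the generator ends; a new call to random_factory then starts again from the seed, much like a factory that has been reset.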
A cooperative group of factories can be readily constructed. For example, assume we
would like the random factories to respond to both random(seed,limit) and random().
This can be defined as follows:
factory random(seed:int,limit:int):int;
rnd:=seed;
lim:= limit;
barrier lim;
leave lim:= lim-1;
rnd:= rnd*125;
yield rnd:= rnd % 32676;
redo lim;
exit lim;
end random;
factory random():int;
barrier forever:=true;
yield random(0,0);
redo forever;
exit forever;
end random;
The co-routine concept researched in Monet 5 is the notion of a ’factory’, which consists of
’factory plants’ at possibly different locations and with different policies to handle requests.
Factory management is limited to its owner, which is derived from the module in which it
is placed. By default Admin is the owner of all modules.
The factory produces elements for multiple clients. Sharing the factory state or even
remote processing is up to the factory owner. They are set through properties for the factory
plant.
The default policy is to instantiate one shared plant for each factory. If necessary, the
factory can keep track of a client list to differentiate the states. A possible implementation
would be:
factory random(seed:int,clientid:int):int;
clt:= bat.new(:int,:int);
bat.insert(clt,clientid,seed);
barrier always:=true;
rnd:= algebra.find(clt,clientid);
catch rnd; #failed to find client
bat.insert(clt,clientid,seed);
rnd:= algebra.find(clt,clientid);
exit rnd;
rnd:= rnd * 125;
rnd:= rnd % 32676;
algebra.replace(clt,clientid,rnd);
yield rnd;
redo always;
exit always;
end random;
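The client list idea can be illustrated outside MAL as well. The Python sketch below is an analogy under stated assumptions: a dictionary plays the role of the BAT clt, a missing key plays the role of the failed algebra.find() handled in the catch block, and the class and method names are hypothetical.

```python
class ClientAwareRandom:
    """One shared plant that keeps a separate generator state per client."""

    def __init__(self):
        self.clt = {}                         # stands in for the BAT 'clt'

    def next_value(self, seed, clientid):
        # an unknown client starts from its seed (the catch block in the MAL code)
        rnd = self.clt.get(clientid, seed)
        rnd = (rnd * 125) % 32676
        self.clt[clientid] = rnd              # corresponds to algebra.replace(clt,clientid,rnd)
        return rnd                            # corresponds to yield rnd

plant = ClientAwareRandom()
a1 = plant.next_value(1, clientid=10)
b1 = plant.next_value(2, clientid=20)
a2 = plant.next_value(1, clientid=10)         # continues the sequence of client 10
```

The seed is only consulted on the first request of a client; later requests continue from the stored state.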
The operators to build client-aware factories are factories.getCaller(), which returns
a client index, and factories.getModule() and factories.getFunction(), which
return the identity of the enclosing scope.
To illustrate, the client specific random generator can be shielded using the factory:
factory random(seed:int):int;
barrier always:=true;
clientid:= factories.getCaller();
yield user.random(seed, clientid);
redo always;
exit always;
end random;
exit outer;
# send last portion
chunk:= algebra.slice(L,i,cnt);
yield chunk;
return nil;
end chunkStep;
So far we haven’t re-used the pattern that both legs are identical. This could be modeled
by a generic chunk factory. Choosing a new factory for each query step reduces the
administrative overhead.
The code should be extended to also check the validity of the BATs. This requires a check
against the last transaction identifier known.
The Factory concept is still rather experimental and many questions remain open, e.g.
What is the lifetime of a factory? Does it persist after all clients have disappeared?
What additional control do you need? Can you throw an exception to a Factory?
A local copy of an object can be obtained using the pattern ’take(name,[param])’, where
name denotes the variable of interest. The type of the receiving variable should match the
one known for the object. Whether an actual copy is produced or a reference to a shared
object is returned is defined by the box manager.
The object is given back to the box manager by calling ’release(name)’. The manager may
update the content of the repository accordingly, release locks, and move the value to
persistent store, whatever the semantics of the box requires. [The default implementation is a no-op]
Finally, the object manager can be requested to ’discard(name)’ a variable completely.
The default implementation is to reclaim the space in the box.
Concurrency control, replication services, as well as access to remote stores may be delegated
to a box manager. Depending on the intended semantics, the box manager may keep
track of the clients holding links to its members, provide a traditional 2-phase locking
scheme, optimistic control, or a check-out/check-in scheme. In all cases, these management
issues are transparent to the main thread (=client) of control, which operates on a temporary
snapshot. For the time being we realize the managers as critical code sections, i.e. one
client is permitted access to the box space at a time.
For example, consider the client function:
function myfcn():void;
b:bat[:oid,:int] := bbp.take("mytable");
c:bat[:int,:str] := sql.take("person","age");
d:= intersect(b,c);
io.print(d);
u:str:= client.take(user);
io.print(u);
client.release(user);
end myfcn;
The function binds to a copy from the local persistent BAT space, much like bat-names
are resolved in earlier MonetDB versions. The second statement uses an implementation of
take that searches a variable of interest using two string properties. It illustrates that a box
manager is free to extend/overload the predefined scheme, which is geared towards storing
MAL variables.
The result bat c is temporary and disappears upon garbage collection. The variable u
is looked up as the string object user.
Note that BATs b and c need to be released at some point. In general this point in time
does not coincide with a computational boundary like a function return. During a session,
several bats may be taken out of the box and processed, and only released at the end of
the session. In this example, it means that the references to b and c are lost at the end of
the function (due to garbage collection) and that subsequent use requires another take()
call. The box manager bbp is notified of the implicit release and can take garbage collection
actions.
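The take/release/discard protocol described above can be summarized with a toy box manager. The Python sketch below is illustrative only: it assumes a box that hands out private copies and serializes all access with a single lock (the 'critical code section'). The class Box and the method deposit are hypothetical names introduced to populate the box; they do not describe the MonetDB implementation.

```python
import threading

class Box:
    """Toy box manager: one client at a time may touch the box space."""

    def __init__(self):
        self._objects = {}
        self._lock = threading.Lock()          # the critical code section
        self.taken = set()                     # names currently held by clients

    def deposit(self, name, value):            # hypothetical helper to fill the box
        with self._lock:
            self._objects[name] = value

    def take(self, name):
        with self._lock:
            value = self._objects[name]        # raises KeyError for unknown names
            self.taken.add(name)
            # this box chooses to return a snapshot instead of a shared reference
            return list(value) if isinstance(value, list) else value

    def release(self, name):
        with self._lock:                       # default: bookkeeping only
            self.taken.discard(name)

    def discard(self, name):
        with self._lock:                       # reclaim the space in the box
            self.taken.discard(name)
            del self._objects[name]

box = Box()
box.deposit("mytable", [1, 2, 3])
b = box.take("mytable")
b.append(4)                                    # the client works on its own copy
box.release("mytable")
```

A different box could return shared references or write changes back on release; that choice is exactly what the text leaves to the box manager.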
The box may be inspected at several times during a scenario run. The first time is when
the MAL program is type-checked for the box operations. Typechecking a take() function
is tricky. If the argument is a string literal, the box can be queried directly for the object’s
type. If found, its type is matched against the lhs variable. This strategy fails in the
situation when at runtime the object is subsequently replaced by another typed-instance in
the box. We assume this does not happen, and regard the exception it raises as valuable advice to
reconsider the programming style.
The type indicator for the destination variable should be provided to proceed with proper
type checking. It can resolve overloaded function selection.
Inspection of the Box can be encoded using an iterator at the MAL layer and relying on
the functionality of the box. However, to improve introspection, we assume that all box im-
plementations provide a few rudimentary functions, called objects(arglist) and dir(arglist).
The function objects() produces a BAT with the object names, possibly limited to those
identified by the arglist.
The world of boxes has not been explored deeply yet. It is envisioned that it could play
a role in importing/exporting different objects, e.g., introduce xml.take() which converts an XML
document to a BAT, and jpeg.take() which does similarly for an image.
Nesting boxes is possible. It provides a simple containment scheme between boxes, but
in general will interfere with the semantics of each box.
Each box has (or should have) an access control list, which names the users having permission
to read/write its content. The first one to create the box becomes the owner. The owner may
grant/revoke access to the box to users on a selective basis.
with properties that ’should be obeyed, or implied’ by the actual arguments. It extends the
typing scheme used during compilation/optimization. Likewise, the return values can be
tagged with properties that ’at least’ exist upon function return.
function test(b:bat[:oid,:int]{count<1000}):bat[:oid,:int]{sorted}
#code block
end test
These properties are informative to optimizers. They can be enforced at runtime using
the operation optimizer.enforceRules() which injects calls into the program to check
them. An assertion error is raised if the property does not hold. The code snippet
z:= user.test(b);
is translated into the following code block:
mal.assert(b,"count","<",1000);
z:= user.test(b);
mal.assert(z,"sorted");
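The effect of enforcing properties around a call can be mimicked in a few lines of Python. This is a sketch under assumptions: plain lists stand in for BATs, only the two properties from the example (count<1000 and sorted) are checked, and the names mal_assert, user_function and enforced_call are hypothetical.

```python
def mal_assert(value, prop, op=None, bound=None):
    """Raise AssertionError if the property does not hold (toy checks only)."""
    if prop == "count" and op == "<":
        ok = len(value) < bound
    elif prop == "sorted":
        ok = all(x <= y for x, y in zip(value, value[1:]))
    else:
        raise ValueError("unknown property: " + prop)
    if not ok:
        raise AssertionError("property violated: " + prop)

def user_function(b):
    return sorted(b)                       # stand-in for the '#code block'

def enforced_call(b):
    mal_assert(b, "count", "<", 1000)      # property required of the argument
    z = user_function(b)
    mal_assert(z, "sorted")                # property promised for the result
    return z

result = enforced_call([3, 1, 2])
```

The argument check guards the function body, while the result check validates what downstream optimizers are allowed to rely on.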
How to propagate properties? Property inspection and manipulation is strongly linked
with the operators of interest. Optimizers continuously inspect and update the properties,
while kernel operators should not be bothered with their existence. Property propagation
is strongly linked with the actual operator implementation. We examine a few recurring
cases.
V:=W; Both V and W should be type compatible, otherwise the compiler will already
complain. (Actually, it requires V.type()==W.type() and ~V.isaConstant().) But what happens
with all other properties? What is the property propagation rule for the assignment? Several
cases can be distinguished:
I) W has a property P, unknown to V.
II) V has a property P, unknown to W.
III) V has property P, and W has property Q, and P and Q are incompatible.
IV) V and W have a property P, but its value disagrees.
Case I) If the variable V was not initialized, we can simply copy or share the properties.
Copying might be too expensive, while sharing leads to managing the dependencies.
Case II) It means that V is re-assigned a value, and depending on its type and properties we may
have to ’garbage collect/finalize’ it first. Alternatively, it could be interpreted as a property
that will hold after assignment which is not part of the right-hand side expression.
Case III) If P and Q are type compatible, it means an update of the P value. Otherwise, it should
generate an exception.
Case IV) This calls for an update of V.P using the value of W.P.
How this should be done is property specific.
Overall, the policy would be to ’discard’ all knowledge from V first and then copy the
properties from W.
[Try 1] V:= fcn(A,B,C) and signature fcn(A:int,B:int,C:int):int. The signature provides
several handles to attach properties. Each formal parameter could come with a list of
’desirable/necessary’ properties. Likewise, the return values have a property set. This leads
to the extended signature function fcn(A:T{P1},...,B:T{Pn}):(C:T{Pk}...D:T{Pm}) where each Pi denotes a
property set. Properties P1..Pn can be used to select the proper function variant. At its
worst, several signatures of fcn() should be inspected at runtime to find one with matching
properties. To enable analysis and optimization, however, it should be clear that once the
function is finished, the properties Pk..Pm exist.
properties.set(B,,2315);
barrier properties.has(B,);
exit;
These examples illustrate that the property manipulations are executed through patterns,
which also accept a stack frame.
Sample problem with dropping properties:
B := bbp.new(int,int);
barrier tst:= randomChoice();
I := properties.drop(B,);
exit tst;
The variable names and types are kept in the stack to ease debugging. The underlying
string value need not be garbage collected. Runtime storage for variables is allocated on
the stack of the interpreter thread. The physical stack is often limited in size, which calls
for safeguarding their value and garbage collection before returning. A malicious procedure
or implementation will lead to memory leakage.
A system command (linked C-routine) may be interested in extending the stack. This
is precluded, because it could interfere with the recursive calling sequence of procedures.
To accommodate the (rare) case, the routine should issue an exception to be handled by
the interpreter before retrying. All other errors are turned into an exception, followed by
continuing at the exception handling block of the MAL procedure.
Chapter 5: The MAL Optimizer 70
in replacing the right-hand side expression with a result variable. This pollutes the code
block with simple assignments e.g. V:=T. Within the descendant flow the occurrence of
V could be replaced by T, provided V is never assigned a new value. Approach: literal
constants within a MAL block are already recognized and replaced by a single variable.
Impact: medium.
Common Term Optimizer Goal: to reduce the amount of work by avoiding calculation of
the same operation twice. Rationale: to simplify code generation for front-ends, so that they do not
have to remember the subexpressions already evaluated. It is much easier to detect at the
MAL level. Approach: simply walk through the instruction sequence and locate identical
patterns. (Enhance it with semantically equivalent instructions.) Impact: high. Prereq: Alias
Removal.
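The common term approach, walking the instruction sequence and locating identical patterns, can be sketched as follows. The Python code is an illustration under assumptions, not the optimizer's implementation: instructions are modelled as (target, operator, arguments) tuples, every target is assumed to be assigned once, all operators are assumed to be free of side effects, and the function name common_terms is hypothetical.

```python
def common_terms(block):
    """Replace repeated (operator, arguments) patterns by an alias assignment."""
    seen = {}       # (op, args) -> variable that already holds the result
    alias = {}      # variable -> earlier variable with the same value
    out = []
    for target, op, args in block:
        # use the surviving name of every argument before comparing patterns
        args = tuple(alias.get(a, a) for a in args)
        key = (op, args)
        if key in seen:
            alias[target] = seen[key]
            out.append((target, ":=", (seen[key],)))   # candidate for dead code removal
        else:
            seen[key] = target
            out.append((target, op, args))
    return out

plan = [
    ("t1", "algebra.select", ("b", 10, 100)),
    ("t2", "algebra.select", ("b", 10, 100)),
    ("t3", "aggr.count", ("t2",)),
]
optimized = common_terms(plan)
```

The simple assignment that remains for t2 is no longer referenced, so a later dead code pass can drop it.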
Dead Code Removal Goal: to remove all instructions whose result is not used. Rationale:
due to sloppy coding or alternative execution paths dead code may appear. Also, XML
Pathfinder is expected to produce a large number of simple assignments. Approach: Every
instruction should produce a value used somewhere else. Impact: low
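The rule that every instruction should produce a value used somewhere else suggests a single backward pass over the block. The Python sketch below illustrates that idea under assumptions: instructions are (target, operator, arguments) tuples, the code is straight-line, and the set of operators with side effects is supplied by the caller. The function name and the default side-effect list are hypothetical.

```python
def remove_dead_code(block, side_effects=("io.print", "bat.append")):
    """Keep an instruction only if it has a side effect or its target is used later."""
    used = set()
    kept = []
    for target, op, args in reversed(block):
        if op in side_effects or target in used:
            kept.append((target, op, args))
            # string arguments are treated as variable names in this toy model
            used.update(a for a in args if isinstance(a, str))
    kept.reverse()
    return kept

plan = [
    ("a", "algebra.select", ("b", 0, 5)),
    ("x", "algebra.select", ("b", 6, 9)),   # result never used
    ("y", ":=", ("x",)),                    # result never used
    ("s", "aggr.sum", ("a",)),
    (None, "io.print", ("s",)),
]
live = remove_dead_code(plan)
```

Because the pass runs backwards, dropping y also makes x dead within the same sweep.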
Heuristic Rule Rewrites Goal: to reduce the volume as quickly as possible. Rationale:
most queries are focused on a small part of the database. To avoid carrying too many
intermediates, the selection should be performed as early as possible in the process. This
assumes that selectivity factors are known upfront, which in turn depends on a histogram of
the value distribution. Approach: locate selections and push them back/forth through the
flow graph. Impact: high
Join Path Optimizer Goal: to reduce the volume produced by a join sequence Rationale:
join paths are potentially expensive operations. Ideally the join path is evaluated starting
at the smallest component, so as to reduce the size of the intermediate results. Approach:
to successfully reduce the volume we need to estimate their processing cost. This calls for
statistics over the value distribution, in particular, correlation histograms. If statistics are
not available upfront, we have to resort to an incremental algorithm, which decides on the
steps using the size of the relations. Impact: high
Operator Sort Goal: to sort the dataflow graph in such a way as to reduce the cost,
or to assure locality of access for operands. Rationale: A simple optimizer is to order the
instructions for execution by permutation of the query components Approach: Impact:
Singleton Set Goal: to replace sets that are known to produce precisely one tuple.
Rationale: Singleton sets can be represented by value pairs in the MAL program, which
reduces to a scalar expression. Approach: Identify a set variable for replacement. Impact:
Range Propagation Goal: look for constant ranges in select statements and propagate
them through the code. Rationale: partitioned tables and views may give rise to expressions
that contain multiple selections over the same BAT. If their arguments are constant, the
result of such selects can sometimes be predicted, or the multiple selections can be cascaded
into a single operation. Impact: high, should be followed by alias removal and dead code
removal
Result Cacher Goal: to reduce the processing cost by keeping track of expensive to
compute intermediate results Rationale: Approach: result caching becomes active after an
instruction has been evaluated. The result can be cached as long as its underlying operands
remain unchanged. Result caching can be made transparent to the user, but affects the
other query optimizers. Impact: high
Iterator Strength Reduction Goal: to reduce the cost of iterator execution by moving
instructions out of the loop. Rationale: although iteration at the MAL level should be
avoided due to the inherent low performance compared to built-in operators, it is not
forbidden. In that case we should confine the iterator block to the minimal work needed.
Approach: inspect the flowgraph for each iterator and move instructions around. Impact:
low
Accumulator Evaluation Goal: to replace operators with cheaper ones. Rationale: based
on the actual state of the computation and the richness of the supporting libraries there
may exist alternative routes to solve a query. Approach: Operator rewriting depends on
properties. No general technique. The first implementation looks at calculator expressions
such as they appear frequently in the RAM compiler. Impact: high Prerequisite: should be
called after common term optimizer to avoid clashes. Status: Used in the SQL optimizer.
Code Inliner Goal: to reduce the calling depth of the interpreter and to obtain a
better starting point for code squeezing Rationale: substitution of code blocks (or macro
expansion) leads to longer linear code sequences. This provides opportunities for squeezing.
Moreover, at runtime building and managing a stackframe is rather expensive. This should
be avoided for functions called repeatedly. Impact: medium Status: Used in the SQL
optimizer to handle SQL functions.
Code Outliner Goal: to reduce the program size by replacing a group with a single
instruction Rationale: inverse macro expansion leads to shorter linear code sequences. This
provides opportunities for less interpreter overhead, and to optimize complex but repetitive
instruction sequences with a single hardwired call. Approach: called explicitly to outline a
module (or symbol) Impact: medium
Garbage Collector Goal: to release resources as quickly as possible Rationale: BATs
referenced from a MAL program keep resources locked. Approach: In cooperation with a
resource scheduler we should identify those that can be released quickly. It requires a forced
garbage collection call at the end of the BAT’s lifespan. Impact: large Status: Implemented.
Algorithm based on end-of-life-span analysis.
Foreign Key replacements Goal: to improve multi-attribute joins over foreign key con-
straints Rationale: the code produced by the SQL frontend involves foreign key constraints,
which provide many opportunities for speedy code using a join index. Impact: large Status:
Implemented in the SQL strategic optimizer.
of MAL program blocks. These trails can be inspected for a posteriori analysis, at least in
terms of some statistics on the properties of the MAL program structures automatically.
Alternatively, the trail may be pruned and re-optimized when appropriate from changes in
the environment.
The rule applied for all optimizers is to not-return before checking the state of the MAL
program, and to assure the dataflow and variable scopes are properly set. It costs some
performance, but the difficulties that arise from optimizer interference are very hard to
debug. One of the easiest pitfalls is to derive an optimized version of a MAL function while
it is already referenced by or when polymorphic typechecking is required afterwards.
The optimizer routines have access to the client context, the MAL block, and the program
counter where the optimizer call was found. Each optimizer should remove itself from the
MAL block.
The optimizer repeatedly runs through the program until no optimizer call is found.
Note, all optimizer instructions are executed only once. This means that the instruction
can be removed from further consideration. However, in the case that a designated function
is selected for optimization (e.g., commonTerms(user,qry)) the pc is assumed 0. The first
instruction always denotes the signature and can not be removed.
To safeguard against incomplete optimizer implementations it is advisable to perform
an optimizerCheck at the end. It takes as arguments the number of optimizer actions taken
and the total cpu time spent. The body performs a full flow and type check and re-initializes
the lifespan administration. In debugging mode also a copy of the new block is retained for
inspection.
optimizer.accumulators();
If variable t2 is a temporary variable and not used any further in the program block, we
can re-use its storage space and propagate its alias through the remainder of the code.
batcalc.*(t2,64,t2);
t4:= batcalc.+(t2,t1,t2);
The implementation is straightforward. It only deals with the arithmetic operations
available in batcalc right now. This set will gradually be extended. The key decision
is to determine whether we may overwrite any of the arguments. This is hard to detect
at compile time, e.g. the argument may be the result of a binding operation or represent
a view over a persistent BAT. Therefore, the compiler injects the call algebra.reuse(),
which avoids overwriting persistent BATs by taking a copy.
_54 := nil;
_75 := _67;
_67 := nil;
_83 := _75;
_75 := nil;
few heuristic cost estimators. However, it ensures that empty results are only tagged with
rows=0 if the estimate is accurate, otherwise it assumes at least one result row. This
property makes it possible to safely pass the result of the cost estimation to the emptySet
optimizer for code reduction.
may conclude that variable V31 becomes empty and simply injects a ’dead’ variable by
dropping the assignment statement. This makes other code dead as well.
V30 := algebra.select( V7, 10,100);
V31 := algebra.select(V30,-1,5);
V32 := aggr.sum(V31);
io.print(V32);
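Why V31 can be predicted to be empty follows from intersecting the two constant ranges. The Python sketch below shows that arithmetic only. It assumes both selections use inclusive bounds, and the function name cascade_selects is hypothetical.

```python
def cascade_selects(first, second):
    """Combine two constant (low, high) ranges that are applied in sequence.

    Returns the single equivalent range, or None if no value can qualify.
    """
    low = max(first[0], second[0])
    high = min(first[1], second[1])
    return (low, high) if low <= high else None

# select(V7,10,100) followed by select(V30,-1,5)
v31_range = cascade_selects((10, 100), (-1, 5))

# overlapping ranges collapse into one selection instead
merged = cascade_selects((10, 100), (50, 200))
```

A result of None corresponds to the case where the second selection, and everything depending on it, can be treated as an empty set.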
[implementation pending]
This block can be further optimized using alias propagation and dead code removal. The
final block becomes:
V1 := bat.new(:oid,:int);
V7 := bat.new(:oid,:int);
V16 := algebra.markH(V7);
V17 := algebra.join(V16,V7);
bat.append(V1,V17);
During empty set propagation, new candidates may appear. For example, taking the
intersection with an empty set creates a target variable that is empty too. It becomes an
immediate target for optimization. The current implementation is conservative. A limited
set of instructions is considered. Any addition to the MonetDB instruction set would call
for an assessment of its effect.
The current algorithm is straightforward. After each instruction, we check whether its
BAT arguments are needed in the future. If not, we inject a garbage collection statement to
release them, provided there are no other reasons to retain them. This should be done carefully,
because the instruction may be part of a loop. If the variable is defined inside the loop, we
can safely remove it.
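The end-of-lifespan idea can be sketched for straight-line code as follows. This Python illustration rests on several assumptions: instructions are (target, operator, arguments) tuples, the set of BAT-typed variables is given, releasing a variable is written as an assignment of nil, and loops are ignored entirely. The function name inject_gc is hypothetical.

```python
def inject_gc(block, bat_vars):
    """Inject 'v := nil' right after the last instruction that mentions BAT variable v."""
    last_use = {}
    for pc, (target, op, args) in enumerate(block):
        for v in (target,) + tuple(args):
            if v in bat_vars:
                last_use[v] = pc               # later occurrences overwrite earlier ones
    out = []
    for pc, instr in enumerate(block):
        out.append(instr)
        for v in sorted(v for v, end in last_use.items() if end == pc):
            out.append((v, ":=", ("nil",)))
    return out

plan = [
    ("t1", "algebra.join", ("a", "b")),
    ("t2", "algebra.join", ("t1", "c")),
    (None, "io.print", ("t2",)),
]
with_gc = inject_gc(plan, {"t1", "t2"})
```

Handling loops would require checking whether a variable is defined inside the loop body before releasing it there, as the text cautions.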
...
t2:= algebra.join(b,d);
z2:= algebra.join(a,t2);
The joinpath would merge them into
z1:= algebra.joinPath(a,b,c);
...
z2:= algebra.joinPath(a,b,d);
which are handled by a heuristic that looks at the first two arguments and re-uses a
materialized join.
_13:= algebra.join(a,b);
z1:= algebra.join(_13,c);
...
z2:= algebra.join(_13,d);
An alternative is to make recognition of the common re-useable paths an integral part
of the joinPath body.
x3:= algebra.join(a,b);
r3:= bat.reverse(x3);
j1:= join(c,r3);
rb:= bat.reverse(b);
ra:= bat.reverse(a);
j1:= algebra.joinpath(c,rb,ra);
As a final step in the speed up of the joinpath we consider clustering large operands if
that is expected to improve IO behavior.
pattern optimizer.orcam(targetmod:str,targetfcn:str):void
address OPTorcam
comment "Inverse macro processor for current function";
pattern optimizer.orcam(mod:str,fcn:str,targetmod:str,targetfcn:str):void
address OPTorcam
comment "Inverse macro, find pattern and replace with a function call.";
T3:= algebra.join(C,D);
scheduler.choice("getVolume",T1,T2,T3);
T4:= algebra.join(T1,C);
T5:= algebra.join(A,T2);
T6:= algebra.join(T2,D);
T7:= algebra.join(B,T3);
T8:= algebra.join(C,D);
scheduler.choice("getVolume",T4,T5,T6,T7,T8);
T9:= algebra.join(T4,D);
T10:= algebra.join(T5,D);
T11:= algebra.join(A,T6);
T12:= algebra.join(A,T7);
T13:= algebra.join(T1,T8);
scheduler.choice("getVolume",T9,T10,T11,T12,T13);
answer:= scheduler.pick(T9, T10, T11, T12, T13);
The scheduler.choice() operator calls a builtin getVolume for each target variable
and expects an integer-valued cost. In this case it returns the total number of bytes used by the
arguments.
The target variable with the lowest cost is chosen for execution and the remaining variables
are turned into a temporary NOOP operation. (You may want to re-use the memo.) They
are skipped by the interpreter, but also in subsequent calls to the scheduler. It reduces the
alternatives as we proceed in the plan.
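The selection step can be pictured as picking the cheapest candidate and marking the rest as skipped. The Python sketch below is an analogy under assumptions: each candidate maps a target variable to the arguments of its pending join, the cost function simply sums made-up operand volumes, and the names choice, get_volume and sizes are hypothetical.

```python
def choice(candidates, cost):
    """Return the cheapest target and the list of targets turned into NOOPs."""
    winner = min(candidates, key=lambda name: cost(name, candidates[name]))
    noops = [name for name in candidates if name != winner]
    return winner, noops

sizes = {"A": 1000, "B": 10, "C": 20, "D": 5000}   # hypothetical operand volumes

def get_volume(name, args):
    # naive cost: total volume of the arguments of the candidate instruction
    return sum(sizes[a] for a in args)

winner, skipped = choice(
    {"T1": ("A", "B"), "T2": ("B", "C"), "T3": ("C", "D")},
    get_volume,
)
```

Passing the cost function as a parameter mirrors the wish expressed in the text that users could supply a private cost function.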
A built-in naive cost function is used. It would be nice if the user could provide a private
cost function defined as a pattern with a polymorphic argument for the target and a :lng
result. Its implementation can use the complete context information to make a decision.
For example, it can trace the potential use of the target variable in subsequent statements
to determine a total cost when this step is taken towards the final result.
A complete plan likely includes other expressions to prepare or use the target variables
before reaching the next choice point. It is the task of the choice operator to avoid any
superfluous operation.
The MAL block should be privately owned by the caller, which can be assured with
scheduler.isolation().
A refinement of the scheme is to make cost analysis part of the plan as well. Then you
don’t have to include a hardwired cost function.
Acost:= aggr.count(A);
Bcost:= aggr.count(B);
Ccost:= aggr.count(C);
Dcost:= aggr.count(D);
T1cost:= Acost+Bcost;
T2cost:= Bcost+Ccost;
T3cost:= Ccost+Dcost;
scheduler.choice(T1cost,T1, T2cost,T2, T3cost,T3);
T1:= algebra.join(A,B);
T2:= algebra.join(B,C);
T3:= algebra.join(C,D);
...
s := mat.pack(_33,_34,_35);
io.print(s);
For the join we have to generate all possible combinations, not knowing anything about
the properties of the components. The current heuristic is to limit expansion to a single
argument. This leads to
b := mat.pack(m0,m1,m2);
_39 := algebra.join(b,c0);
_40 := algebra.join(b,c1);
j := mat.new(_39,_40);
The drawback of the scheme is the potential explosion in MAL statements. A challenge
of the optimizer is to find the minimum by inspection of the properties of the MAT elements.
For example, it might attempt to partially pack elements before proceeding. This would be
a runtime scheduling decision.
Alternatively, the system could use MAT iterators to avoid packing at the cost of more
complex program analysis afterwards.
j:= bat.new(:oid,:int);
barrier b:= mat.newIterator(m0,m1,m2);
barrier c:= mat.newIterator(c0,c1);
ji := algebra.join(b,c);
bat.insert(j,ji);
redo c:= mat.newIterator(c0,c1);
exit c;
redo b:= mat.newIterator(m0,m1,m2);
exit b;
z:= algebra.markT(r,o);
rr:= bat.reverse(z);
s := bat.reverse(r);
t := bat.reverse(s);
io.print(t);
optimizer.peephole();
which is translated by the peephole optimizer into:
r:bat[:int,:int] := bat.new(:int,:int);
rr := algebra.markH(r);
io.print(r);
Another example is the combination of a BAT partition operation followed by a re-
construction without using the partitions individually.
T2 := bat.new(:int,:int);
d := algebra.select(T4,0,5);
T4 := bat.new(:int,:int);
Any valid MAL routine can be overlaid with a tree (graph) view based on the flow
dependencies, but not all MAL programs can be derived from a simple tree. For example,
the code snippet above, when interpreted as a linear sequence, cannot be represented unless
the execution order itself becomes an operator node.
However, since we haven’t added to or changed the original MAL program, the routine
qep.propagate produces the original program, where the linear order has priority. If,
however, we had entered new instructions into the tree, they would have been placed in
close proximity of the other tree nodes.
Special care is given to the flow-of-control blocks, because they produce a query plan section
that cannot easily be moved around. [give dot examples]
times. Re-use of (partial) results is used in those cases where a zooming-in or navigational
application is at stake.
The Recycler optimizer and module extend this with a middle-out approach. They
exploit the materialize-all-intermediates approach of MonetDB by deciding to keep a hold
on them as long as deemed beneficial.
The approach taken is to mark the instructions in a MAL program using the recycler
optimizer call, such that their result is retained in a global recycle cache hardwired in
the MAL interpreter. An instruction becomes subject to the Recycler if at least one of its
arguments is a BAT and all others are either constants or variables already known in the
Recycler.
Upon execution, the recycler is called from the inner loop of the MAL interpreter to
first check for an up-to-date result to be picked up at no cost. Otherwise, it evaluates the
instruction and calls upon policy functions to decide if it is worthwhile to keep.
The Recycler comes with a few policy controlling operators to experiment with its effect
in concrete settings. The retain policy controls when to keep results around, the reuse policy
looks after exact duplicate instructions or uses semantic knowledge on MAL instructions
to detect potential reuse gain (e.g. reuse select results). And finally, the cache policy looks
after the storage space for the intermediate result pool. The details are described in the
recycle module.
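The interplay of the lookup in the interpreter loop with the reuse and cache policies can be illustrated with a small memo table. The Python sketch below is an analogy only: it assumes exact-match reuse, a pool bounded by entry count, and tuples standing in for BATs. The class Recycler and its methods are hypothetical names and do not mirror the recycle module.

```python
class Recycler:
    """Toy recycle cache keyed on the instruction and its argument values."""

    def __init__(self, capacity=2):
        self.cache = {}
        self.capacity = capacity
        self.hits = 0

    def execute(self, op, args, evaluate):
        key = (op, args)
        if key in self.cache:                  # reuse policy: exact duplicates only
            self.hits += 1
            return self.cache[key]
        result = evaluate(*args)
        if len(self.cache) < self.capacity:    # cache policy: bounded result pool
            self.cache[key] = result
        return result

    def reset(self):
        # an update invalidates intermediates; this toy version drops everything
        self.cache.clear()

def select_range(values, low, high):
    return tuple(v for v in values if low <= v <= high)

rec = Recycler()
col = (1, 5, 9, 12)
r1 = rec.execute("algebra.select", (col, 0, 9), select_range)
r2 = rec.execute("algebra.select", (col, 0, 9), select_range)   # served from the cache
```

A retain policy would add a further test before storing the result, for example on the cost of recomputing it.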
pattern optimizer.recycle():str
address OPTrecycle;
pattern optimizer.recycle(mod:str, fcn:str):str
address OPTrecycle
comment "Replicator code injection";
/* #define DEBUG_OPT_RECYCLER */
The variables are all checked for being eligible as a variable subject to recycling control.
A variable may only be assigned a value once. The function is a sql.bind(-,-,-,0) or all
arguments are already recycle enabled or constant.
The arguments of the function cannot be recycled. They change with each call. This
does not mean that the instructions using them can not be a target of recycling.
Just looking at a kept result target is not good enough. You have to be sure that the
arguments are also the same. This rules out function arguments.
The recycler is targeted towards a query-only database. The best effect is obtained in
single-user mode (sql debug=32) when the delta-bats are not processed, which allows
longer instruction chains to be recycled. Update statements are not recycled. They trigger
cleaning of the recycle cache at the end of the query. Only intermediates derived from
the updated columns are invalidated. Separate update instructions in queries, such as
bat.append implementing ’OR’, are monitored and also trigger cleaning the cache.
#include "mal_config.h"
#include "opt_recycler.h"
#include "mal_instruction.h"
static int
OPTrecycleImplementation(Client cntxt, MalBlkPtr mb, MalStkPtr stk, InstrPtr p)
{
int i, j, cnt, actions = 0;
Lifespan span;
InstrPtr *old, q;
int limit, updstmt = 0;
char *recycled;
short app_sc = -1,app_tbl = -1;
(void) cntxt;
(void) stk;
/* watch out, instructions may introduce new variables */
limit= mb->stop;
old = mb->stmt;
span = setLifespan(mb);
if ( span == NULL)
return 0;
recycled= GDKzalloc(sizeof(char)*mb->vtop*2);
if ( recycled == NULL)
return 0;
newMalBlkStmt(mb, mb->ssize);
pushInstruction(mb,old[0]);
for (i = 1; i < limit; i++) {
p = old[i];
if (hasSideEffects(p,TRUE) || isUnsafeFunction(p)){
if( getModuleId(p)== recycleRef ){ /*don’t inline recycle instr. */
freeInstruction(p);
continue;
}
pushInstruction(mb,p);
/* update instructions are not recycled but monitored*/
if( isUpdateInstruction(p)){
if (getModuleId(p) == batRef &&
(getArgType(mb,p,1)==TYPE_bat
|| isaBatType(getArgType(mb, p,1)))){
recycled[getArg(p,1)]= 0;
q= newFcnCall(mb,"recycle","reset");
pushArgument(mb,q, getArg(p,1));
actions++;
}
if (getModuleId(p) == sqlRef){
if (getFunctionId(p) == appendRef){
app_sc = getArg(p,1);
app_tbl = getArg(p,2);
} else {
q= newFcnCall(mb,"recycle","reset");
pushArgument(mb,q, getArg(p,1));
pushArgument(mb,q, getArg(p,2));
if (getFunctionId(p) == updateRef)
pushArgument(mb,q, getArg(p,3));
}
actions++;
}
}
continue;
}
if (p->barrier && p->token != CMDcall){
/* never save a barrier unless it is a command and side-effect free */
pushInstruction(mb,p);
continue;
}
continue;
}
if(getModuleId(p) == pcreRef) {
if (( getFunctionId(p)== selectRef && recycled[getArg(p,2)]) ||
( getFunctionId(p)== uselectRef && recycled[getArg(p,2)])){
p->recycle = REC_MAX_INTEREST;
actions ++;
if (getLastUpdate(span, getArg(p,0)) == i)
recycled[getArg(p,0)] = 1;
}
else if( getFunctionId(p)== likeuselectRef && recycled[getArg(p,1)]) {
q = copyInstruction(p);
getArg(q,0)= newTmpVariable(mb,TYPE_any);
setFunctionId(q, likeselectRef);
q->recycle = REC_MAX_INTEREST;
recycled[getArg(q,0)] = 1;
pushInstruction(mb,q);
getArg(p,1) = getArg(q,0);
setFunctionId(p,markTRef);
setModuleId(p,algebraRef);
p->argc = 2;
p->recycle = REC_MAX_INTEREST;
actions ++;
if (getLastUpdate(span, getArg(p,0)) == i)
recycled[getArg(p,0)] = 1;
}
}
The sql.bind instructions should be handled carefully. The delete and update BATs should
not be recycled, because they may lead to view dependencies that later interfere with the
transaction commits.
if (getModuleId(p)== sqlRef &&
(((getFunctionId(p)==bindRef || getFunctionId(p) == putName("bind_idxbat",11)) &&
getVarConstant(mb, getArg(p,4)).val.ival != 0) ||
getFunctionId(p)== binddbatRef) ) {
recycled[getArg(p,0)]=0;
p->recycle = REC_NO_INTEREST; /* this instruction is not monitored */
}
pushInstruction(mb,p);
}
GDKfree(span);
GDKfree(old);
GDKfree(recycled);
mb->recycle = actions > 0;
return actions;
}
b:= bat.new(:int,:int);
bat.insert(b,1,2);
c{singleton}:= algebra.select(b,0,4);
d:= algebra.markH(c);
io.print(d);
optimizer.singleton();
is translated into the code block
b := bat.new(:int,:int);
bat.insert(b,1,2);
c{singleton} := algebra.select(b,0,4);
(_15,_16):= bat.unpack(c{singleton});
d := bat.pack(nil,_16);
io.print(d);
barrier go := true;
j := "not moved";
k := j;
io.print(i);
redo go:= false;
exit go;
z:= j;
This optimization is only applicable to loops and not to guarded blocks in general, because
execution of a statement outside the guarded block consumes processing resources which
may have been prohibited by the block condition.
For example, it doesn’t make sense to move creation of objects outside the barrier.
Chapter 6: The MAL Debugger 103
The debugger mode is left with a <return>. Any subsequent MAL instruction re-activates
the debugger to await commands. The default operation is to step through the execution
using the ’next’ (’n’) or ’step’ (’s’) commands, as shown below.
mal>user.test(1);
# user.test(1);
mdb>n
# io.print(i);
mdb>
[ 1 ]
# i := calc.*(i,2);
mdb>
# b := bat.new(:int,:int);
mdb>
The last instruction shown is the next one to be executed. The result can be shown using a
print statement, which reports the location of the variable on the stack frame, its name,
its value and type. The complete stack frame becomes visible with the ’values’ (’v’) command:
# bat.insert(b,1,i);
mdb>
# io.print(b);
mdb>v
#Stack for ’test’ size=32 top=11
#[0] test = nil:str
#[1] i = 4:int
#[2] _2 = 0:int unused
#[3] _3 = 2:int constant
#[4] b = <tmp_1226>:bat[:int,:int] count=1 lrefs=1 refs=0
#[5] _5 = 0:int type variable
#[6] _6 = nil:bat[:int,:int] unused
#[7] _7 = 1:int constant
#[8] _8 = 0:int unused
#[9] _9 = "ok":str constant
The variables marked ’unused’ were introduced as temporary variables but are not
referenced in the remainder of the program. The listing also illustrates basic BAT properties,
a complete description of which can be obtained using the ’info’ (’i’) command. A sample
of the BAT content can be printed by passing tuple indices, e.g. ’print b 10 10’ prints the
second batch of ten tuples.
mdb>c
[ 3 ]
# 26 usec# 0 0# io.print(i=3)
# 6 usec# 0 0# i := calc.*(i=6, _3=2)
# 10 usec# 0 0# b := bat.new(_5=0, _6=0)
# 7 usec# 0 8# bat.insert(b=<tmp_167>bat[:int,:int]{1}, _8=1, i=6)
#-----------------#
# h t # name
# int int # type
#-----------------#
[ 1, 6 ]
# 41 usec# 0 8# io.print(b=<tmp_167>bat[:int,:int]{1})
# 7 usec# 0 0# return test := "ok";
# 211 usec# 0 0# user.test(_2=3)
mal> io.print(b);
mal> mdb.setTimer(false);
mal> return test:= "ok";
mal>end test;
mal>user.test(1);
# 6 usec# mdb.setTimer(_3=true)
[ 1 ]
# 43 usec# io.print(i=1)
# 5 usec# i := calc.*(i=2, _5=2)
# 24 usec# b := bat.new(_7=0, _8=0)
# 10 usec# bat.insert(b=<tmp_1226>, _10=1, i=2)
#-----------------#
# h t # name
# int int # type
#-----------------#
[ 1, 2 ]
# 172 usec# io.print(b=<tmp_1226>)
# 261 usec# user.test(_2=1)
It is also possible to activate the debugger from within a program using mdb.start().
It remains in this mode until you either issue a quit command or an mdb.stop()
instruction is encountered. The debugger is only activated when the user can direct its
execution from the client interface. Otherwise, there is no proper input channel and the
debugger will run in trace mode.
The program listing functionality of the debugger is also captured in the MAL
debugger module. The current code block can be listed using mdb.list() and
mdb.List(). An arbitrary code block can be shown with mdb.list(module,function) and
mdb.List(module,function). A BAT representation of the current function is returned by
mdb.getDefinition().
The symbol table and stack content, if available, can be shown with the operations
mdb.var() and mdb.var(module,function). Access to the stack frames may be helpful in
the context of exception handling. The operation mdb.getStackDepth() gives the depth,
and individual elements can be accessed as BATs using mdb.getStackFrame(n). The
top stack frame is accessed using mdb.getStackFrame().
#-------------------------------------------------#
[ 0, "Thu Feb 7 15:57:08 2008", "0" ]
[ 1, "Thu Feb 7 15:57:11 2008", "0" ]
Locate the process you are interested in and obtain its identifier, say N (the first column
in the list above). The next step is to gracefully put the running process into debugging
mode without jeopardizing the running application.
mal> mdb.setTrap(1);
#process 1 put to sleep
mal> mdb.grab();
As soon as the next MAL instruction of process N starts, the target process is put to
sleep and you can access its context for debugging. Control ends when you leave the
debugger with a ’quit’ command.
Chapter 7: The MAL Profiler 111
profiler.openStream("/tmp/MonetDBevents");
profiler.start();
b:= bbp.new(:int,:int);
bat.insert(b,1,15);
bat.insert(b,2,4);
bat.insert(b,3,9);
io.print(b);
profiler.stop();
profiler.closeStream();
In this example, we are interested in all functions named insert and print. A wildcard
can be used to signify any name, e.g. to put no constraints on the module in which the
operations are defined. Several profiler components are ignored, shown by commenting out
the code line.
Execution of the sample leads to the creation of a file with the following content. The
ticks are measured in microseconds.
# time, ticks, stmt # name
[ "15:17:56", 12, "_27 := bat.insert(<tmp_15>{3},1,15);" ]
[ "15:17:56", 2, "_30 := bat.insert(<tmp_15>{3},2,4);" ]
[ "15:17:56", 2, "_33 := bat.insert(<tmp_15>{3},3,9);" ]
[ "15:17:56", 245, "_36 := io.print(<tmp_15>{3});", ]
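An event file of this shape lends itself to simple post-processing. The Python sketch below sums the ticks per operation for the four sample lines shown above. It is only an illustration: the regular expressions are assumptions derived from this one sample (including the stray comma on the last line), and the helper name ticks_per_operation is hypothetical.

```python
import re
from collections import defaultdict

# One event per line: [ "time", ticks, "statement" ], optionally with a
# trailing comma before the closing bracket (as in the last sample line).
EVENT = re.compile(r'^\[\s*"([^"]*)",\s*(\d+),\s*"(.*)"\s*,?\s*\]$')
# module.function as it appears after ":=" in the statement column.
CALL = re.compile(r':=\s*([A-Za-z_]\w*\.[A-Za-z_]\w*)\(')


def ticks_per_operation(lines):
    """Return a dict mapping module.function to its accumulated ticks."""
    totals = defaultdict(int)
    for line in lines:
        event = EVENT.match(line.strip())
        if event is None:
            continue                      # header or unrelated line
        _time, ticks, stmt = event.groups()
        call = CALL.search(stmt)
        name = call.group(1) if call else "<unknown>"
        totals[name] += int(ticks)
    return dict(totals)


sample = [
    '# time, ticks, stmt # name',
    '[ "15:17:56", 12, "_27 := bat.insert(<tmp_15>{3},1,15);" ]',
    '[ "15:17:56", 2, "_30 := bat.insert(<tmp_15>{3},2,4);" ]',
    '[ "15:17:56", 2, "_33 := bat.insert(<tmp_15>{3},3,9);" ]',
    '[ "15:17:56", 245, "_36 := io.print(<tmp_15>{3});", ]',
]

if __name__ == "__main__":
    for name, ticks in sorted(ticks_per_operation(sample).items()):
        print(name, ticks)
```

For the sample this attributes 16 microseconds to bat.insert and 245 to io.print, which shows at a glance that printing dominates the run.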
Event selector:
a =aggregates
e =event
f =function
o =operation called
T =time
t =ticks
c =cpu statistics
m =memory resources
i =io resources
b =bytes read/written
d =diskspace needed
s =statement
p =pgfaults,cntxtswitches
Ideally, the stream of events should be piped into a 2D graphical tool, like xosview
(Linux). A short-term solution is to generate a gnuplot script to display the numerics
organized as time lines. Together with a backup of the event lists, this gives you all the
information needed for a decent post-mortem analysis.
A convenient way to watch most of the SQL interaction is the command:
stethoscope +tis algebra.* bat.* group.* sql.* aggr.*
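The letters following the plus sign can be read against the event selector table. The Python sketch below decodes such an option string. It assumes, based only on the example command, that the selector is introduced by a plus sign and that letters may be combined freely; the function name decode_selector is hypothetical and not part of stethoscope.

```python
# Selector letters and their meaning, copied from the event selector table.
EVENT_SELECTORS = {
    "a": "aggregates",
    "e": "event",
    "f": "function",
    "o": "operation called",
    "T": "time",
    "t": "ticks",
    "c": "cpu statistics",
    "m": "memory resources",
    "i": "io resources",
    "b": "bytes read/written",
    "d": "diskspace needed",
    "s": "statement",
    "p": "pgfaults,cntxtswitches",
}


def decode_selector(option):
    """Translate a selector such as '+tis' into the list of selected events."""
    letters = option.lstrip("+")
    unknown = [ch for ch in letters if ch not in EVENT_SELECTORS]
    if unknown:
        raise ValueError("unknown selector letter(s): " + "".join(unknown))
    return [EVENT_SELECTORS[ch] for ch in letters]


if __name__ == "__main__":
    print(decode_selector("+tis"))
```

Under these assumptions the command above asks for ticks, io resources and the statement text of every matching instruction.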
Chapter 8: The MAL Modules 115
software package being distributed. It merely requires a different direction for the mal_init
property. The scheme also isolates the functionality embedded in modules from inadvertent
use on non-compliant databases.
Unlike previous versions of MonetDB, modules cannot be unloaded. Dynamic libraries
are always global and, therefore, it is best to load them as part of the server initialization
phase.
command close():void
address CMDbbpclose comment "Close the bbp box.";
command destroy():void
address CMDbbpdestroy comment "Destroy the box";
pattern take(name:str) :bat[:any_1,:any_2]
address CMDbbptake comment "Load a particular bat.";
pattern deposit(name:str,v:bat[:any_1,:any_2]) :void
address CMDbbpdeposit comment "Enter a new bat into the bbp box.";
pattern deposit(name:str,loc:str) :bat[:any_1,:any_2]
address CMDbbpbindDefinition comment "Relate a logical name to a physical
BAT in the buffer pool.";
pattern commit():void
address CMDbbpReleaseAll comment "Commit updates for this client.";
pattern releaseAll():void
address CMDbbpReleaseAll comment "Commit updates for this client.";
pattern release(name:str,val:bat[:any_1,:any_2]) :void
address CMDbbprelease comment "Commit updates and release this BAT.";
pattern release(b:bat[:any_1,:any_2]):void
address CMDbbpreleaseBAT comment "Remove the BAT from further
consideration";
pattern destroy(b:bat[:any_1,:any_2]):void
address CMDbbpdestroyBAT1 comment "Schedule a BAT for removal at
session end.";
pattern destroy(b:bat[:any_1,:any_2],immediate:bit)
address CMDbbpdestroyBAT comment "Schedule a BAT for removal at session
end or immediately.";
pattern toString(name:str):str
address CMDbbptoStr comment "Get the string representation of an element
in the box.";
pattern discard(name:str):void
address CMDbbpdiscard comment "Remove the BAT from the box.";
pattern iterator(nme:str):lng
address CMDbbpiterator comment "Locate the next element in the box.";
pattern prelude():void
address CMDbbpprelude comment "Initialize the bbp box.";
pattern bind(name:str):bat[:any_1,:any_2]
address CMDbbpbind comment "Locate the BAT using its logical name";
pattern bind(head:str,tail:str):bat[:any_1,:any_2]
address CMDbbpbind2 comment "Locate the BAT using the head and tail
names in the BAT buffer pool";
pattern bind(idx:int):bat[:any_1,:any_2]
address CMDbbpbindindex comment "Locate the BAT using its BBP index in
the BAT buffer pool";
pattern getObjects():bat[:int,:str]
address CMDbbpGetObjects comment "View of the box content.";
command getHeadType() :bat[:int,:str]
address CMDbbpHeadType comment "Map a BAT into its head type";
command getTailType() :bat[:int,:str]
address CMDbbpTailType comment "Map a BAT into its tail type";
command getNames() :bat[:int,:str]
address CMDbbpNames comment "Map a BAT into its bbp name";
command getRNames() :bat[:int,:str]
address CMDbbpRNames comment "Map a BAT into its bbp physical name";
command getName( b:bat[:any_1,:any_2]):str
address CMDbbpName comment "Map a BAT into its internal name";
command getCount() :bat[:int,:lng]
address CMDbbpCount comment "Create a BAT with the cardinalities of all
known BATs";
command getRefCount() :bat[:int,:int]
address CMDbbpRefCount comment "Create a BAT with the (hard) reference
counts";
command getLRefCount() :bat[:int,:int]
address CMDbbpLRefCount comment "Create a BAT with the logical reference
counts";
command getLocation() :bat[:int,:str]
address CMDbbpLocation comment "Create a BAT with their disk locations";
command getHeat() :bat[:int,:int]
address CMDbbpHeat comment "Create a BAT with the heat values";
command getDirty() :bat[:int,:str]
address CMDbbpDirty comment "Create a BAT with the dirty/diffs/clean
status";
command getStatus() :bat[:int,:str]
address CMDbbpStatus comment "Create a BAT with the disk/load status";
command getKind():bat[:int,:str]
address CMDbbpKind comment "Create a BAT with the persistency status";
command getRefCount(b:bat[:any_1,:any_2]) :int
address CMDgetBATrefcnt comment "Utility for debugging MAL interpreter";
command getLRefCount(b:bat[:any_1,:any_2]) :int
address CMDgetBATlrefcnt comment "Utility for debugging MAL interpreter";
8.5 Constants
The const module provides a box abstraction store for global constants. Between sessions,
the value of the constants is saved on disk in the form of a simple MAL program, which is
scanned and made available by opening the box. A future implementation should provide
transaction support over the box, which would permit multiple clients to exchange (scalar)
information easily.
The default constant box is initialized with session variables, such as ’user’, ’dbname’,
’dbfarm’, and ’dbdir’. These actions are encapsulated in the prelude routine.
A box should be opened before being used. It is typically used to set-up the list of
current users and to perform authorization. The constant box is protected with a simple
authorization scheme, prohibiting all updates unless issued by the system administrator.
module const;
pattern open():void
address CSTopen comment "Locate and open the constant box.";
pattern close():void
address CSTclose comment "Close the constant box.";
pattern destroy():void
address CSTdestroy comment "Destroy the box.";
pattern take(name:str):any_1
address CSTtake comment "Take a variable out of the box.";
pattern deposit(name:str,val:any_1) :void
address CSTdeposit comment "Add a variable to the box.";
pattern releaseAll():void
address CSTreleaseAll comment "Release all variables in the box.";
pattern release(name:str) :void
address CSTrelease comment "Release a constant value.";
pattern release(name:any_1):void
address CSTrelease comment "Release a constant value.";
pattern toString(name:any_1):str
address CSTtoString comment "Get the string representation of an element in
the box.";
pattern discard(name:any_1) :void
address CSTdiscard comment "Release the const from the box.";
pattern newIterator()(:lng,:str)
address CSTnewIterator comment "Locate next element in the box.";
pattern hasMoreElements()(:lng,:str)
address CSThasMoreElements comment "Locate next element in the box.";
pattern bat.getTail(b:bat[:any_2,:any_1],i:lng):any_1
address CHPgetTail comment "return the BUN tail value using the cursor.";
pattern setListing(flag:int):int
address CLTsetListing comment "Turn on/off echo of MAL instructions: 2 -
show mal instruction, 4 - show details of type resolution, 8 - show binding
information.";
pattern setHistory(s:str)
address CLTsetHistory comment "Designate console history file for readline.";
pattern getId():int
address CLTgetClientId comment "Return a number that uniquely represents
the current client.";
pattern getInfo( ):bat[:str,:str]
address CLTInfo comment "Pseudo bat with client attributes.";
pattern getScenario():str
address CLTgetScenario comment "Retrieve current scenario name.";
pattern setScenario(msg:str):str
address CLTsetScenario comment "Switch to other scenario handler, return
previous one.";
pattern quit():void
address CLTquit comment "Terminate the client session.";
pattern quit(idx:int):void
address CLTquit comment "Terminate the session for a single client using a
soft error. It is the privilege of the console user.";
Administrator operations
command getLogins( ):bat[:int,:str]
address CLTLogin comment "Pseudo bat of client login time.";
command getLastCommand( ):bat[:int,:str]
address CLTLastCommand comment "Pseudo bat of client’s last command
time.";
command getActions( ):bat[:int,:int]
address CLTActions comment "Pseudo bat of client’s command counts.";
command getTime( ):bat[:int,:lng]
address CLTTime comment "Pseudo bat of client’s total time usage(in usec).";
command getUsers( ):bat[:int,:str]
address CLTusers comment "Pseudo bat of users logged in.";
pattern stop(id:int)
address CLTstop comment "Stop the query execution at the next eligible
statement.";
pattern suspend(id:int):void
address CLTsuspend comment "Put a client process to sleep for some time. It
will simply sleep for a second at a time, until the awake bit has been set in its
descriptor";
command wakeup(id:int):void
address CLTwakeup comment "Wakeup a client process";
command shutdown(forced:bit):void
address CLTshutdown comment "Close all client connections. If forced=false
the clients are moved into FINISHING mode, which means that the process
stops at the next cycle of the scenario. If forced=true all client processes are
immediately killed";
8.10 Inspection
This module introduces a series of commands that provide access to information stored
within the interpreter data structures. Its primary use is debugging. In all cases, a
pseudo BAT is returned that should be garbage collected after being used.
The main performance drain would be to use a pseudo BAT operation directly to successively
access its components. This can be avoided by first assigning the pseudo BAT to a variable.
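The difference between the two access styles can be seen by counting how often the pseudo BAT is materialised. The Python sketch below is a stand-in model only: the class Inspector, its method get_atom_names and the three atom names are hypothetical and do not reproduce the inspect module.

```python
class Inspector:
    """Toy stand-in for a module whose operations build a fresh pseudo BAT."""

    def __init__(self):
        self.builds = 0                  # number of pseudo BATs constructed

    def get_atom_names(self):
        self.builds += 1                 # every call materialises a new result
        return ["bit", "int", "str"]     # illustrative content only


def direct_access(insp):
    # A new pseudo BAT is built for every element that is looked up.
    return [insp.get_atom_names()[i] for i in range(3)]


def via_variable(insp):
    # The pseudo BAT is built once and then reused through the variable.
    names = insp.get_atom_names()
    return [names[i] for i in range(3)]


if __name__ == "__main__":
    a, b = Inspector(), Inspector()
    direct_access(a)
    via_variable(b)
    print("direct:", a.builds, "via variable:", b.builds)
```

Both styles return the same elements; only the number of constructions, and hence the amount of garbage to collect, differs.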
module inspect;
command getWelcome():str
address INSPECTgetWelcome comment "Return the server message of the day
string";
pattern getDefinition(mod:str,fcn:str) :bat[:str,:str]
address INSPECTgetDefinition comment "Returns a string representation of a
specific function.";
pattern getTypeIndex(v:any_1):int
address INSPECTtypeIndex comment "Return the type index of a variable.
For BATs, return the type index for its tail.";
pattern equalType(l:any, r:any):bit
address INSPECTequalType comment "Return true if both operands are of the
same type";
command getAtomNames():bat[:int,:str]
address INSPECTatom_names comment "Collect a BAT with the atom
names.";
command getAtomSuper():bat[:int,:str]
address INSPECTatom_sup_names comment "Collect a BAT with the atom
names.";
command getAtomSizes():bat[:int,:int]
address INSPECTatom_sizes comment "Collect a BAT with the atom sizes.";
command getEnvironment():bat[:str,:str]
address INSPECTgetEnvironment comment "Collect the environment
variables.";
command assert(v:bit,term:str):void
address MALassertBit;
command assert(v:sht,term:str):void
address MALassertSht;
command assert(v:int,term:str):void
address MALassertInt;
command assert(v:lng,term:str):void
address MALassertLng;
command assert(v:str,term:str):void
address MALassertStr;
command assert(v:oid,term:str):void
address MALassertOid;
pattern assert(v:any_1,pname:str,oper:str,val:any_2):void
address MALassertTriple comment "Assertion test.";
pattern assertSpace(depth:int)
address safeguardStack comment "Ensures that the current call does not
consume more than depth*vtop elements on the stack.";
pattern dataflow():int
address MALstartDataflow comment "The current guarded block is executed
using dataflow control.";
pattern register(m:str,f:str,code:str,help:str):void
address CMDregisterFunction comment "Compile the code string and register
it as a MAL function.";
pattern setMemoryTrace(flg:bit):void
address CMDsetMemoryTrace comment "Set the flag to trace the memory
footprint";
pattern setThreadTrace(flg:bit):void
address CMDsetThreadTrace comment "Set the flag to trace the interpreter
threads";
pattern setTimerTrace(flg:bit):void
address CMDsetTimerTrace comment "Set the flag to trace the execution
time";
pattern setIOTrace(flg:bit):void
address CMDsetIOTrace comment "Set the flag to trace the IO";
pattern call(s:str):void
address CMDcallString comment "Evaluate a MAL string program.";
pattern call(s:bat[:oid,:str]):void
address CMDcallBAT comment "Evaluate a program stored in a BAT.";
pattern source(f:str):void
address CMDevalFile comment "Merge the instructions stored in the file with
the current program.";
pattern setCatch(b:bit):void
address MDBsetCatch comment "Turn on/off catching exceptions";
pattern setThread(b:bit):void
address MDBsetThread comment "Turn on/off thread identity for debugger";
pattern setTimer(b:bit):void
address MDBsetTimer comment "Turn on/off performance timer for debugger";
pattern setMemoryTrace(b:bit):void
address MDBsetBigfoot comment "Turn on/off memory footprint tracer for
debugger";
pattern setFlow(b:bit):void
address MDBsetFlow comment "Turn on/off memory flow debugger";
pattern setMemory(b:bit):void
address MDBsetMemory comment "Turn on/off memory statistics tracing.";
pattern setIO(b:bit):void
address MDBsetIO comment "Turn on/off io statistics tracing";
pattern setCount(b:bit):void
address MDBsetCount comment "Turn on/off bat count statistics tracing";
command getDebug():int
address MDBgetDebug comment "Get the kernel debugging bit-set. See the
MonetDB configuration file for details";
command setDebug(flg:str):int
address MDBsetDebugStr comment "Set the kernel debugging bit-set and
return its previous value. The recognized options are: threads, memory,
properties, io, transactions, modules, algorithms, estimates, xproperties";
command setDebug(flg:int):int
address MDBsetDebug comment "Set the kernel debugging bit-set and return
its previous value.";
command getException(s:str):str
address MDBgetExceptionVariable comment "Extract the variable name from
the exception message";
command getReason(s:str):str
address MDBgetExceptionReason comment "Extract the reason from the
exception message";
command getContext(s:str):str
address MDBgetExceptionContext comment "Extract the context string from
the exception message";
pattern list():void
address MDBlist comment "Dump the current routine on standard out.";
pattern listMapi():void
address MDBlistMapi comment "Dump the current routine on standard out
with Mapi prefix.";
pattern list(M:str,F:str):void
address MDBlist3 comment "Dump the routine M.F on standard out.";
pattern List():void
address MDBlistDetail comment "Dump the current routine on standard out.";
pattern List(M:str,F:str):void
address MDBlist3Detail comment "Dump the routine M.F on standard out.";
pattern var():void
address MDBvar comment "Dump the symbol table of current routine on
standard out.";
pattern var(M:str,F:str):void
address MDBvar3 comment "Dump the symbol table of routine M.F on standard
out.";
pattern lifespan(M:str,F:str):void
address MDBlifespan comment "Dump the current routine lifespan information
on standard out.";
pattern grab():void
address mdbGrab comment "Call debugger for a suspended process.";
pattern trap():void
address mdbTrap comment "A suspended process for debugging.";
pattern dot(M:str,F:str,s:str):void
address MDBshowFlowGraph comment "Dump the data flow of the function
M.F in a format recognizable by the command ’dot’ on the file s";
pattern getStackDepth():int
address MDBStkDepth comment "Return the depth of the calling stack.";
pattern getStackFrame(i:int):bat[:str,:str]
address MDBgetStackFrameN;
pattern getStackFrame():bat[:str,:str]
address MDBgetStackFrame comment "Collect variable binding of current
(n-th) stack frame.";
pattern getStackTrace():bat[:void,:str]
address MDBStkTrace;
pattern dump()
address MDBdump comment "Dump instruction, stacktrace, and stack";
pattern getDefinition():bat[:void,:str]
address MDBgetDefinition comment "Returns a string representation of the
current function with typing information attached";
A cleaner and simpler interface for distributed processing is available in the module
remote.
module mapi;
command listen():int
address SERVERlisten_default comment "Start a Mapi server with the default
settings.";
command listen(port:int):int
address SERVERlisten_port comment "Start a Mapi listener on the port
given.";
command listen(port:int, maxusers:int):int
address SERVERlisten2 comment "Start a Mapi listener.";
command listen(port:int, maxusers:int, cmd:str):int
address SERVERlisten3 comment "Start the Mapi listener on <port> for
<maxusers>. For a new client connection MAL procedure <cmd>(Stream s_in,
Stream s_out) is called. If no <cmd> is specified a new client thread is forked.";
command stop():void
address SERVERstop comment "Terminate connection listeners.";
command suspend():void
address SERVERsuspend comment "Suspend accepting connections.";
command resume():void
address SERVERresume comment "Resume connection listeners.";
command malclient(in:streams, out:streams):void
address SERVERclient comment "Start a Mapi client for a particular stream
pair.";
command trace(mid:int,flag:int):void
address SERVERtrace comment "Toggle the Mapi library debug tracer.";
pattern reconnect(host:str, port:int, usr:str, passwd:str,lang:str):int
address SERVERreconnectWithoutAlias comment "Re-establish connection
with a remote mserver.";
pattern reconnect(host:str, port:int, db_alias:str, usr:str,
passwd:str,lang:str):int
address SERVERreconnectAlias comment "Re-establish connection with a
remote mserver.";
command reconnect(mid:int):void
address SERVERreconnect comment "Re-establish a connection.";
pattern connect(host:str, port:int, usr:str, passwd:str,lang:str):int
address SERVERconnect comment "Establish connection with a remote
mserver.";
command disconnect(dbalias:str):int
address SERVERdisconnectWithAlias comment "Close connection with a
remote Mserver.";
command disconnect():int
address SERVERdisconnectALL comment "Close connections with all remote
Mserver.";
command setAlias(dbalias:str)
address SERVERsetAlias comment "Give the channel a logical name.";
command lookup(dbalias:str):int
address SERVERlookup comment "Retrieve the connection identifier.";
command disconnect(mid:int):void
address SERVERdisconnect comment "Terminate the session.";
command destroy(mid:int):void
address SERVERdestroy comment "Destroy the handle for an Mserver.";
command ping(mid:int):int
address SERVERping comment "Test availability of an Mserver.";
command query(mid:int, qry:str):int
address SERVERquery comment "Send the query for execution";
command query_handle(mid:int, qry:str):int
address SERVERquery_handle comment "Send the query for execution.";
pattern query_array(mid:int, qry:str, arg:str...):int
address SERVERquery_array comment "Send the query for execution replacing
’?’ by arguments.";
command prepare(mid:int, qry:str):int
address SERVERprepare comment "Prepare a query for execution.";
command finish(hdl:int):int
address SERVERfinish comment "Remove all remaining answers.";
command get_field_count(hdl:int):int
address SERVERget_field_count comment "Return number of fields.";
command get_row_count(hdl:int):int
address SERVERget_row_count comment "Return number of rows.";
command fetch_row(hdl:int):int
address SERVERrows_affected comment "Return number of affected rows.";
command fetch_row(hdl:int):int
address SERVERfetch_row comment "Retrieve the next row for analysis.";
command fetch_all_rows(hdl:int):int
address SERVERfetch_all_rows comment "Retrieve all rows into the cache.";
command fetch_field(hdl:int,fnr:int):str
address SERVERfetch_field_str comment "Retrieve a single field.";
command fetch_field(hdl:int,fnr:int):int
address SERVERfetch_field_int comment "Retrieve a single int field.";
command fetch_field(hdl:int,fnr:int):lng
address SERVERfetch_field_lng comment "Retrieve a single lng field.";
command fetch_field(hdl:int,fnr:int):sht
address SERVERfetch_field_sht comment "Retrieve a single sht field.";
command fetch_field(hdl:int,fnr:int):void
address SERVERfetch_field_void comment "Retrieve a single void field.";
command fetch_field(hdl:int,fnr:int):oid
address SERVERfetch_field_oid comment "Retrieve a single oid field.";
command fetch_field(hdl:int,fnr:int):chr
address SERVERfetch_field_chr comment "Retrieve a single chr field.";
command fetch_field_array(hdl:int):bat[:int,:str]
address SERVERfetch_field_bat comment "Retrieve all fields for a row.";
command fetch_line(hdl:int):str
address SERVERfetch_line comment "Retrieve a complete line.";
command fetch_reset(hdl:int):int
address SERVERfetch_reset comment "Reset the cache read line.";
command next_result(hdl:int):int
address SERVERnext_result comment "Go to next result set.";
command error(mid:int):int
address SERVERerror comment "Check for an error in the communication.";
command getError(mid:int):str
address SERVERgetError comment "Get error message.";
command explain(mid:int):str
address SERVERexplain comment "Turn the error seen into a string.";
pattern put(mid:int, nme:str, val:any_1):void
address SERVERput comment "Send a value to a remote site.";
pattern put(nme:str, val:any_1):str
address SERVERputLocal comment "Prepare sending a value to a remote site.";
pattern rpc(key:int,qry:str...):any
address SERVERmapi_rpc_single_row comment "Send a simple query for
execution and fetch result.";
pattern rpc(key:int,qry:str):bat[:any_1,:any_2]
address SERVERmapi_rpc_bat;
command rpc(key:int,qry:str):void
address SERVERquery comment "Send a simple query for execution.";
pattern bind(key:int,rschema:str,rtable:str,rcolumn:str,i:int):bat[:any_1,:any_2]
address SERVERbindBAT comment "Bind a remote variable to a local one.";
pattern bind(key:int,rschema:str,rtable:str,i:int):bat[:any_1,:any_2]
address SERVERbindBAT comment "Bind a remote variable to a local one.";
pattern bind(key:int,remoteName:str):bat[:any_1,:any_2]
address SERVERbindBAT comment "Bind a remote variable to a local one.";
mapi.listen();
The partition manager also supports hash-based partitioning. Its argument is the num-
ber of hash bucket bits.
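With n bucket bits the data is spread over 2^n partitions. The Python sketch below shows the idea for integer values. It is an illustration under stated assumptions: the bucket is taken from the low bits of Python's built-in hash, which is not the hash function used by the partition manager, and the name hash_partition is hypothetical.

```python
def hash_partition(values, bits):
    """Distribute values over 2**bits buckets using the low bits of a hash."""
    if bits < 0:
        raise ValueError("bits must be non-negative")
    mask = (1 << bits) - 1                     # e.g. bits=2 gives mask 0b11
    buckets = [[] for _ in range(1 << bits)]
    for v in values:
        buckets[hash(v) & mask].append(v)      # assumed hash, see lead-in
    return buckets


if __name__ == "__main__":
    for number, bucket in enumerate(hash_partition(range(8), 2)):
        print(number, bucket)
```

Specifying bits rather than a bucket count keeps the number of partitions a power of two, so the bucket can be selected with a mask instead of a division.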
bpm.derivePartition(B,A);
The properties of the partitioned BATs are particularly useful during query optimization.
However, it only works if the BAT identifier can be determined at compile time. For SQL
it can be simply looked up in the catalog as part of a preparatory optimizer step.
To illustrate, the same problem handled by an optimizer that produces the plan based
on a known number of partitions:
In this translation Ri also gets the properties of the BATs. It is now up to the mat
optimizer to decide about further plan expansion or an iterator approach.
The replace operator works on the assumption that the head of Rold and Rnew is
unique.
It remains possible to retrieve a partition and directly insert elements, but then it is up
to the compiler to ensure that the boundary conditions are met.
Note that a symbolic optimizer can reduce this plan to a small snippet.
The rationale for the update approach is that re-distribution of temporary results is
hidden behind the bpm.insert() interface. The only decision that should be taken by the
optimizer concerns the fragmentation criteria for the temporary results.
For temporary results the range bounds need not be stored in the BPM catalog. Instead,
the mat approach could be used to reduce the plan size.
pattern clrFilter(mod:str,fcn:str):void
address CMDclrFilterProfiler comment "Clear the performance trace bit of the
selected functions.";
pattern clrFilter(v:any):void
address CMDsetFilterVariable comment "Stop tracing the variable" ;
pattern setStartPoint(mod:str,fcn:str):void
address CMDstartPointProfiler comment "Start performance tracing at
mod.fcn";
pattern setEndPoint(mod:str,fcn:str)
address CMDendPointProfiler comment "End performance tracing after
mod.fcn";
pattern start():void
address CMDstartProfiler comment "Start performance tracing";
command noop():void
address CMDnoopProfiler comment "Fetch any pending performance events";
pattern stop():void
address CMDstopProfiler comment "Stop performance tracing";
command reset():void
address CMDclearTrace comment "Clear the profiler traces";
command dumpTrace():void
address CMDdumpTrace comment "List the events collected";
command getTrace(e:str):bat[:int,:any_1]
address CMDgetTrace comment "Get the trace details of a specific event";
pattern getEvent()(:lng,:lng,:lng)
address CMDgetEvent comment "Retrieve the performance indicators of the
previous instruction";
command cleanup():void
address CMDcleanup comment "Remove the temporary tables for profiling";
command getDiskReads():lng
address CMDgetDiskReads comment "Obtain the number of physical reads";
command getDiskWrites():lng
address CMDgetDiskWrites comment "Obtain the number of physical writes";
command getUserTime():lng
address CMDgetUserTime comment "Obtain the user timing information.";
command getSystemTime():lng
address CMDgetSystemTime comment "Obtain the system timing information.";
pattern getFootprint():lng
address CMDgetFootprint comment "Get the memory footprint and reset it";
pattern getMemory():lng
address CMDgetMemory comment "Get the amount of memory claimed and
reset it";
using the create() function, which also appears in the output of list(). Connections added
using create() are currently not persistent over server restarts, but can be removed using
destroy(). They are marked "user".
Connections are activated using a call to connect(), which returns a handle that the
user has to use in successive calls to remote functions. A call to disconnect() closes the
connection for the given handle.
The first argument to the primary functions is the name of the connection the operation
has to be performed on. A connection has to be created before it can be used; otherwise a
MALexception is raised. During creation of such a connection, details like credentials,
hostname and database can be given. Currently, connections are stored in memory and not
made persistent. This could be changed in the future to allow created connections to be
remembered over server restarts.
module remote;
# module loading and unloading funcs
command prelude():void
address RMTprelude comment "Initialise the remote module.";
command epilogue():void
address RMTepilogue comment "Release the resources held by the remote
module.";
# global connection management functions
command create(dbname:str, host:str, port:int):str
address RMTcreate comment "Create a user-defined connection to a server.";
command destroy(dbname:str):void
address RMTdestroy comment "Destroy a previously user-defined connection
to a server.";
command getList()(list:bat[:oid,:str], kind:bat[:oid,:str])
address RMTgetList comment "List available databases for use with connect()
and their kind (self, user, local, remote).";
# session local connection instantiation functions
command connect(dbname:str, user:str, passwd:str):str
address RMTconnect comment "Returns a newly created connection for
dbname, user name and password.";
command connect(dbname:str, user:str, passwd:str, scen:str):str
address RMTconnectScen comment "Returns a newly created connection for
dbname, user name, password and scenario.";
command disconnect(dbname:str):void
address RMTdisconnect comment "Disconnects the connection for dbname.";
# core transfer functions
pattern get(conn:str, ident:str):any
address RMTget comment "Retrieves a copy of remote object ident.";
8.20.1 Implementation
typedef struct connection {
    MT_Lock lock;    /* lock to avoid interference */
    str name;        /* the handle for this connection */
    Mapi mconn;      /* the Mapi handle
connect of '%s': %s", conn, mapi_error_str(c->mconn));
        /* TODO: throw away connection? */
    }
    *ret = c;
    mal_unset_lock(c->lock, "remote.<findconn>");
#endif
/**
 * Helper function to return a connection matching a given string, or an
 * error if it does not exist.  Since this function is internal, it
 * doesn't check the argument conn, as it should have been checked
 * already.
 * NOTE: this function acquires the mal_remoteLock before accessing conns
 */
static INLINE str
RMTfindconn(connection *ret, str conn) {
    connection c;

    /* just make sure the return isn't garbage */
    *ret = NULL;
    mal_set_lock(mal_remoteLock, "remote.<findconn>"); /* protect c */
    c = conns;
    while (c != NULL) {
        if (strcmp(c->name, conn) == 0) {
            *ret = c;
            mal_unset_lock(mal_remoteLock, "remote.<findconn>");
            return(MAL_SUCCEED);
        }
        c = c->next;
    }
    mal_unset_lock(mal_remoteLock, "remote.<findconn>");
    throw(MAL, "remote.<findconn>", OPERATION_FAILED
          " No such active connection '%s'", conn);
}
/**
 * Little helper function that returns a GDKmalloced string containing a
 * valid identifier that is supposed to be unique in the connection's
 * remote context.  The generated string depends on the module and
 * function the caller is in.  It is off-hand predictable what id it
 * will generate, in the form of rmt_<mod>_<func>_<retvar>_<type>.  This
 * aligns the remote variable stack with the local one, and allows for
 * reassigning with a different value.
 * The encoding of the type allows for ease of type checking later on.
 */
static INLINE str
RMTgetId(str *ret, MalBlkPtr mb, InstrPtr p, int arg) {
    char buf[BUFSIZ];
    InstrPtr f;
    char *mod;
    char *func;
    char *var;
    str rt;

    assert (p->retc);

    var = getArgName(mb, p, arg);
    f = getInstrPtr(mb, 0); /* top level function */
    mod = getModuleId(f);
    if (mod == NULL)
        mod = "user";
    func = getFunctionId(f);
    rt = getTypeIdentifier(getArgType(mb,p,arg));

    snprintf(buf, BUFSIZ, "rmt_%s_%s_%s_%s", mod, func, var, rt);

    GDKfree(rt);
    *ret = GDKstrdup(buf);
    return(MAL_SUCCEED);
}
/**
 * Helper function to execute a query over the given connection,
 * returning the result handle.  If communication fails in one way or
 * another, an error is returned.  Since this function is internal, it
 * doesn't check the input arguments func, conn and query, as they
 * should have been checked already.
 * NOTE: this function assumes a lock for conn is set
 */
static INLINE str
RMTquery(MapiHdl *ret, str func, Mapi conn, str query) {
    MapiHdl mhdl;

    *ret = NULL;
    mhdl = mapi_query(conn, query);
    if (mhdl) {
        if (mapi_result_error(mhdl) != NULL) {
            str err = createException(
                getExceptionType(mapi_result_error(mhdl)),
                func,
                "%s",
                getExceptionMessage(mapi_result_error(mhdl)));
            mapi_close_handle(mhdl);
            return(err);
        }
    } else {
        if (mapi_error(conn) != MOK) {
            throw(IO, func, OPERATION_FAILED
                  ": an error occurred on connection: %s",
                  mapi_error_str(conn));
        } else {
            throw(MAL, func, OPERATION_FAILED
                  ": remote function invocation didn't return a result");
        }
    }

    *ret = mhdl;
    return(MAL_SUCCEED);
}
    /* get a free, typed identifier for the remote host */
    RMTgetId(&tmp, mb, pci, 2);
    /* allocate on the stack as not to leak when we error lateron */
    ident = alloca(sizeof(char) * (strlen(tmp) + 1));
    memcpy(ident, tmp, strlen(tmp) + 1);
    GDKfree(tmp); /* FIXME, this is inefficient... */

    /* depending on the input object generate actions to store the
     * object remotely*/
    if (type == TYPE_any || isAnyExpression(type)) {
        mal_unset_lock(c->lock, "remote.put");
        throw(MAL, "remote.put", OPERATION_FAILED
              " cannot deal with '%s' type", getTypeName(type));
    } else if (isaBatType(type)) {
        BATiter bi;
        /* naive approach using bat.new() and bat.insert() calls */
        char *head, *tail;
        char qbuf[BUFSIZ + 1]; /* FIXME: this should be dynamic */
        int bid;
        BAT *b = NULL;
        BUN p, q;
        str headv, tailv;

        head = ATOMname(getHeadType(type));
        tail = ATOMname(getTailType(type));

        bid = *(int *)value;
        if (bid != 0 && (b = BATdescriptor(bid)) == NULL){
            mal_unset_lock(c->lock, "remote.put");
            throw(MAL, "remote.put", RUNTIME_OBJECT_MISSING);
        }

        qbuf[BUFSIZ] = '\0';
        snprintf(qbuf, BUFSIZ, "%s := bat.new(:%s, :%s, " BUNFMT ");",
                 ident, head, tail, (bid == 0 ? 0 : BATcount(b)));
#ifdef _DEBUG_REMOTE_
        stream_printf(cntxt->fdout, "#remote.put:%s:%s\n", c->name, qbuf);
#endif
        if ((tmp = RMTquery(&mhdl, "remote.put", c->mconn, qbuf))
                != MAL_SUCCEED) {
            mal_unset_lock(c->lock, "remote.put");
            return tmp;
        }
        mapi_close_handle(mhdl);

        /* b can be NULL if bid == 0 (only type given, ugh) */
        if (b) {
            headv = tailv = NULL;
            bi = bat_iterator(b);
            BATloop(b, p, q) {
                ATOMformat(getHeadType(type), BUNhead(bi, p), &headv);
                ATOMformat(getTailType(type), BUNtail(bi, p), &tailv);
                snprintf(qbuf, BUFSIZ, "bat.insert(%s, %s:%s, %s:%s);",
                         ident, headv, head, tailv, tail);
#ifdef _DEBUG_REMOTE_
                stream_printf(cntxt->fdout, "#remote.put:%s:%s\n",
                              c->name, qbuf);
#endif
                if ((tmp = RMTquery(&mhdl, "remote.put", c->mconn, qbuf))
                        != MAL_SUCCEED) {
                    mal_unset_lock(c->lock, "remote.put");
                    return tmp;
                }
                /* we leak headv and tailv here if an exception is thrown */
                mapi_close_handle(mhdl);
            }
            GDKfree(headv);
            GDKfree(tailv);
            BBPunfix(b->batCacheid);
        }
    } else {
        str val = NULL;
        char qbuf[BUFSIZ + 1]; /* FIXME: this should be dynamic */
        if (ATOMvarsized(type)) {
            ATOMformat(type, *(str *)value, &val);
        } else {
            ATOMformat(type, value, &val);
        }
        snprintf(qbuf, BUFSIZ, "%s := %s:%s;\n", ident, val, ATOMname(type));
        qbuf[BUFSIZ] = '\0';
        GDKfree(val);
#ifdef _DEBUG_REMOTE_
        stream_printf(cntxt->fdout, "#remote.put:%s:%s\n", c->name, qbuf);
#endif
        if ((tmp = RMTquery(&mhdl, "remote.put", c->mconn, qbuf))
                != MAL_SUCCEED) {
            mal_unset_lock(c->lock, "remote.put");
            return tmp;
        }
        mapi_close_handle(mhdl);
    }
    mal_unset_lock(c->lock, "remote.put");

    /* return the identifier */
    v = getArgReference(stk, pci, 0);
    v->vtype = TYPE_str;
    v->val.sval = GDKstrdup(ident);
    return(MAL_SUCCEED);
}
remote_export str RMTregisterInternal(Client cntxt, str conn, str mod, str fcn);
/**
 * stores the given <mod>.<fcn> on the remote host.
 * An error is returned if the function is already known at the remote site.
 * The implementation is based on serialisation of the block into a string
 * followed by remote parsing.
 */
str RMTregisterInternal(Client cntxt, str conn, str mod, str fcn) {
    str tmp, qry, msg;
    connection c;
    char buf[BUFSIZ];
    MapiHdl mhdl = NULL;
    Symbol sym;

    (void)cntxt;

    if (conn == NULL || strcmp(conn, (str)str_nil) == 0)
        throw(ILLARG, "remote.register", ILLEGAL_ARGUMENT
              " Connection name is NULL or nil");

    /* find local definition */
    sym = findSymbol(cntxt->nspace, putName(mod, strlen(mod)),
                     putName(fcn, strlen(fcn)));
    if (sym == NULL)
        throw(MAL, "remote.register", ILLEGAL_ARGUMENT
              " Function '%s.%s' not found", mod, fcn);

    /* lookup conn */
    rethrow("remote.put", tmp, RMTfindconn(&c, conn));

    /* this call should be a single transaction over the channel*/
    mal_set_lock(c->lock, "remote.register");

    /* check remote definition */
    snprintf(buf, BUFSIZ, "inspect.getSignature(\"%s\", \"%s\");", mod, fcn);
#ifdef _DEBUG_REMOTE_
    stream_printf(cntxt->fdout, "#remote.register:%s:%s\n", c->name, buf);
#endif
    msg = RMTquery(&mhdl, "remote.register", c->mconn, buf);
    if (msg == MAL_SUCCEED) {
        mal_unset_lock(c->lock, "remote.register");
        throw(MAL, "remote.register", OPERATION_FAILED
              " Function '%s.%s' already defined on the remote site",
              mod, fcn);
    }
    if (mhdl)
        mapi_close_handle(mhdl);

    qry = function2str(sym->def, LIST_MAL_STMT);
#ifdef _DEBUG_REMOTE_
    stream_printf(cntxt->fdout, "#remote.register:%s:%s\n", c->name, qry);
#endif
    msg = RMTquery(&mhdl, "remote.register", c->mconn, qry);
    if (mhdl)
        mapi_close_handle(mhdl);

    mal_unset_lock(c->lock, "remote.register");
    return msg;
}
remote_export str RMTregister(Client cntxt, MalBlkPtr mb, MalStkPtr stk,
                              InstrPtr pci);

str RMTregister(Client cntxt, MalBlkPtr mb, MalStkPtr stk, InstrPtr pci) {
    str conn = *(str*) getArgReference(stk, pci, 1);
    str mod = *(str*) getArgReference(stk, pci, 2);
    str fcn = *(str*) getArgReference(stk, pci, 3);
    (void)mb;
    return RMTregisterInternal(cntxt, conn, mod, fcn);
}

remote_export str RMTexec(Client cntxt, MalBlkPtr mb, MalStkPtr stk,
                          InstrPtr pci);
/**
 * exec executes the function with its given arguments on the remote
 * host, returning the function's return value.  exec is purposely kept
 * very spartan.  All arguments need to be handles to previously put()
 * values.  It calls the function with the given arguments at the remote
 * site, and returns the handle which stores the return value of the
 * remotely executed function.  This return value can be retrieved using
 * a get call.  It does not (yet) handle multiple return arguments.
 */
str RMTexec(Client cntxt, MalBlkPtr mb, MalStkPtr stk, InstrPtr pci) {
    str conn, mod, func, tmp;
    int i, len;
    connection c= NULL;
    char qbuf[BUFSIZ+1]; /* FIXME: make this dynamic */
    MapiHdl mhdl;

    (void)cntxt;
    (void)mb;

    for (i = 0; i < pci->retc; i++) {
        tmp = *(str *)getArgReference(stk, pci, i);
        if (tmp == NULL || strcmp(tmp, (str)str_nil) == 0)
            throw(ILLARG, "remote.exec", ILLEGAL_ARGUMENT
                  ": return value %d is NULL or nil", i);
    }
    conn = *(str*) getArgReference(stk, pci, i++);
    if (conn == NULL || strcmp(conn, (str)str_nil) == 0)
        throw(ILLARG, "remote.exec", ILLEGAL_ARGUMENT
              ": connection name is NULL or nil");
    mod = *(str*) getArgReference(stk, pci, i++);
    if (mod == NULL || strcmp(mod, (str)str_nil) == 0)
        throw(ILLARG, "remote.exec", ILLEGAL_ARGUMENT
              ": module name is NULL or nil");
    func = *(str*) getArgReference(stk, pci, i++);
    if (func == NULL || strcmp(func, (str)str_nil) == 0)
        throw(ILLARG, "remote.exec", ILLEGAL_ARGUMENT
              ": function name is NULL or nil");

    /* lookup conn */
    rethrow("remote.exec", tmp, RMTfindconn(&c, conn));

    /* this call should be a single transaction over the channel*/
    mal_set_lock(c->lock,"remote.exec");

    len = 0;

    /* use previous defined remote objects to keep result */
    if (pci->retc > 1)
        qbuf[len++] = '(';
    for (i = 0; i < pci->retc; i++)
        len += snprintf(&qbuf[len], BUFSIZ - len, "%s%s",
                        (i > 0 ? ", " : ""),
                        *(str *) getArgReference(stk, pci, i));

    if (pci->retc > 1 && len < BUFSIZ)
        qbuf[len++] = ')';

    /* build the function invocation string in qbuf */
    len += snprintf(&qbuf[len], BUFSIZ - len, " := %s.%s(", mod, func);

    /* handle the arguments to the function */
    assert(pci->argc - pci->retc >= 3); /* conn, mod, func, ... */

    /* put the arguments one by one, and dynamically build the
     * invocation string */
    for (i = 3; i < pci->argc - pci->retc; i++) {
        len += snprintf(&qbuf[len], BUFSIZ - len, "%s%s",
                        (i > 3 ? ", " : ""),
                        *((str *)getArgReference(stk, pci, pci->retc + i)));
    }

    /* finish and execute the invocation string */
    len += snprintf(&qbuf[len], BUFSIZ - len, ");");
#ifdef _DEBUG_REMOTE_
    stream_printf(cntxt->fdout,"#remote.exec:%s:%s\n",c->name,qbuf);
#endif
    tmp = RMTquery(&mhdl, "remote.exec", c->mconn, qbuf);
    if( mhdl)
        mapi_close_handle(mhdl);
    mal_unset_lock(c->lock,"remote.exec");
    return tmp;
}
#endif
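The invocation string that RMTexec assembles has the shape `(r0, r1) := mod.func(a0, a1);`, with the parentheses on the left only when there is more than one return handle. As an illustration only, the sketch below builds the same shape with an explicit overflow check on every append; standard `snprintf` returns the length it would have written, so an accumulated length can otherwise run past the buffer. The helpers `append` and `build_call` are hypothetical, and the handle names in the usage note are arbitrary.

```c
#include <assert.h>
#include <string.h>

/* Append s at offset *len; 0 on success, -1 when it would not fit. */
static int append(char *buf, size_t cap, size_t *len, const char *s)
{
    size_t n = strlen(s);
    if (*len + n + 1 > cap)
        return -1;
    memcpy(buf + *len, s, n + 1);   /* copies the terminating NUL too */
    *len += n;
    return 0;
}

/* Build "(r0, r1) := mod.func(a0, a1);" in buf; 0 on success, -1 on overflow. */
static int build_call(char *buf, size_t cap,
                      const char **rets, int nret,
                      const char *mod, const char *func,
                      const char **args, int narg)
{
    size_t len = 0;
    int i;

    assert(cap > 0 && nret >= 1);
    buf[0] = '\0';
    if (nret > 1 && append(buf, cap, &len, "(") < 0) return -1;
    for (i = 0; i < nret; i++) {
        if (i > 0 && append(buf, cap, &len, ", ") < 0) return -1;
        if (append(buf, cap, &len, rets[i]) < 0) return -1;
    }
    if (nret > 1 && append(buf, cap, &len, ")") < 0) return -1;
    if (append(buf, cap, &len, " := ") < 0) return -1;
    if (append(buf, cap, &len, mod) < 0) return -1;
    if (append(buf, cap, &len, ".") < 0) return -1;
    if (append(buf, cap, &len, func) < 0) return -1;
    if (append(buf, cap, &len, "(") < 0) return -1;
    for (i = 0; i < narg; i++) {
        if (i > 0 && append(buf, cap, &len, ", ") < 0) return -1;
        if (append(buf, cap, &len, args[i]) < 0) return -1;
    }
    return append(buf, cap, &len, ");");
}
```

With one return handle `r1`, module `algebra`, function `join` and argument handles `a1` and `a2`, the buffer holds `r1 := algebra.join(a1, a2);`.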
The statistics are managed by a Box, which gives a controlled environment to manage
a collection of BATs and system variables.
BATs have to be deposited into the statistics box separately, because the costs attached
to maintaining them are high. The consistency of the statistics box is partly the responsibility
of the upper layers. There is no automatic triggering when the BATs referenced are heavily
modified or are being destroyed. They disappear from the statistics box the first time an
invalid access is attempted or during system reboot.
The staleness of the information can be controlled in several ways. The easiest, and
most expensive, is to ensure that the statistics are updated when you start the server.
Alternatively, you can set an expiration interval, which will update the information only when
it is considered expired. This test is triggered either at server restart or by your explicit
call to update the statistics tables. The statistics table is committed each time you change
it.
A forced update can be called upon when the front-end expects the situation to be
changed drastically.
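The interplay between the expiration interval and a forced update can be sketched as follows. This is a minimal illustration, not the module's implementation; `stat_entry`, `stat_expired` and `stat_update` are hypothetical names, and the convention that a non-positive interval means "always refresh" is an assumption.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical bookkeeping for one entry in the statistics box. */
typedef struct {
    time_t last_update;  /* when the entry was last refreshed */
    double interval;     /* expiration interval in seconds; <= 0: always stale */
} stat_entry;

/* Returns 1 when the entry counts as expired at time `now`. */
static int stat_expired(const stat_entry *e, time_t now)
{
    assert(e != NULL);
    if (e->interval <= 0)
        return 1;
    return difftime(now, e->last_update) >= e->interval;
}

/* Refresh when expired or forced; returns 1 if a refresh took place. */
static int stat_update(stat_entry *e, time_t now, int force)
{
    if (!force && !stat_expired(e, now))
        return 0;
    e->last_update = now;  /* a real refresh would recompute the statistics */
    return 1;
}
```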
The statistics table is mostly used internally, but once in a while you need a dump for
closer inspection in your MAL program. Just use the BBP bind operation
to locate them in the buffer pool.
module statistics;
pattern open():void
address STATopen comment "Locate and open the statistics box";
pattern close():void
address STATclose comment "Close the statistics box ";
pattern destroy():void
address STATdestroy comment "Destroy the statistics box";
pattern take(name:any_1):any_2
address STATtake comment "Take a variable out of the statistics box";
pattern deposit(name:str) :void
address STATdepositStr comment "Enter a new BAT into the statistics box";
pattern deposit(name:bat[:any_1,:any_2]) :void
address STATdeposit comment "Enter a new BAT into the statistics box";
pattern releaseAll():void
address STATreleaseAll comment "Release all variables in the box";
pattern release(name:str) :void
address STATreleaseStr comment "Release a single BAT from the box";
pattern release(name:bat[:any_1,:any_2]):void
address STATrelease comment "Release a single BAT from the box";
pattern toString(name:any_1):str
address STATtoString comment "Get the string representation of an element
in the box";
The output operation is for ordered output. A bat (possibly from the collection) gives
the order. For each element in the order bat, the values in the bats are searched; if all are
found, they are output to the datafile with the given separators.
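The ordered output rule can be sketched in a few lines. The code below is an illustration under simplifying assumptions: each bat is modelled as a pair of parallel arrays, lookups are linear, and the result goes to a character buffer instead of a datafile. The names `column`, `column_find` and `ordered_output` are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_COLS 16

/* Hypothetical stand-in for a bat: parallel arrays of keys and values. */
typedef struct {
    const unsigned long *keys;
    const char *const *vals;
    size_t count;
} column;

/* Linear search; returns the value for key or NULL when absent. */
static const char *column_find(const column *c, unsigned long key)
{
    size_t i;
    for (i = 0; i < c->count; i++)
        if (c->keys[i] == key)
            return c->vals[i];
    return NULL;
}

/* Emit one row per key in `order` for which every column has a value.
 * seps[j] follows column j. Returns the number of complete rows written;
 * stops early when the buffer is full. */
static size_t ordered_output(char *out, size_t cap,
                             const unsigned long *order, size_t norder,
                             const column *cols, const char *const *seps,
                             size_t ncols)
{
    const char *found[MAX_COLS];
    size_t i, j, len = 0, rows = 0;

    assert(cap > 0 && ncols <= MAX_COLS);
    out[0] = '\0';
    for (i = 0; i < norder; i++) {
        for (j = 0; j < ncols; j++)
            if ((found[j] = column_find(&cols[j], order[i])) == NULL)
                break;
        if (j < ncols)
            continue;  /* some column lacks this key: skip the row */
        for (j = 0; j < ncols; j++) {
            int n = snprintf(out + len, cap - len, "%s%s", found[j], seps[j]);
            if (n < 0 || (size_t)n >= cap - len)
                return rows;
            len += (size_t)n;
        }
        rows++;
    }
    return rows;
}
```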
The scripts from the tablet.mil file are all there too for backward compatibility with the
old Mload format files.
The load_format script loads the format file; since the old format file was in a table format, it
can be loaded with the load command.
The result from load_format can be used with load_data to load the data into a set of
new bats.
These bats can be made persistent with the make_persistent script or merged with existing
bats with the merge_data script.
The dump_format scripts dump a format file for a given set of bats to be dumped. These
bats can be dumped with dump_data.
module tablet;
command load( names:bat[:oid,:str], seps:bat[:oid,:str],
types:bat[:oid,:str], datafile:str, nr:int ) :bat[:str,:bat]
address CMDtablet_load comment "Load a bat using specific format.";
command input( names:bat[:oid,:str], seps:bat[:oid,:str],
types:bat[:oid,:str], s:streams, nr:int ) :bat[:str,:bat]
address CMDtablet_input comment "Load a bat using specific format.";
command dump(names:bat[:oid,:str], seps:bat[:oid,:str],
bats:bat[:oid,:bat], datafile:str, nr:int) :void
address CMDtablet_dump comment "Dump the bat in ASCII format";
command output(order:bat[:any_1,:any_2], seps:bat[:oid,:str],
bats:bat[:oid,:bat], s:streams) :void
address CMDtablet_output comment "Send the bat to an output stream.";
pattern display(v:any...):int
address TABdisplayRow comment "Display a formatted row";
pattern display(v:bat[:any_1,:any]...):int
address TABdisplayTable comment "Display a formatted table";
pattern page(b:bat[:any_1,:any]...):int
address TABpage comment "Display all pages at once without header";
pattern header(b:any...):int
address TABheader comment "Display the minimal header for the table";
pattern setProperties(prop:str):int
address TABsetProperties comment "Define the set of properties";
pattern dump(s:streams,b:bat[:any,:any]...):int
address TABdump comment "Print all pages with header to a stream";
pattern setFormat(b:any...):void
address TABsetFormat comment "Initialize a new reporting structure.";
pattern finish():void
address TABfinishReport comment "Free the storage space of the report
descriptor";
pattern setStream(s:streams):void
address TABsetStream comment "Redirect the output to a stream.";
pattern setPivot(b:bat[:void,:oid]) :void
address TABsetPivot comment "The pivot bat identifies the tuples of interest.
The only requirement is that all keys mentioned in the pivot tail exist in all
BAT parameters of the print comment. The pivot also provides control over
the order in which the tuples are produced.";
pattern setDelimiter(sep:str):void
address TABsetDelimiter comment "Set the column separator.";
pattern setTableBracket(lbrk:str,rbrk:str)
address TABsetTableBracket comment "Format the brackets around a table";
pattern setRowBracket(lbrk:str,rbrk:str)
address TABsetRowBracket comment "Format the brackets around a row";
Set the column properties
pattern setColumn(idx:int, v:any_1)
address TABsetColumn comment "Bind i-th output column to a variable";
pattern setName(idx:int, nme:str)
address TABsetColumnName comment "Set the display name for a given
column";
pattern setBracket(idx:int,lbrk:str,rbrk:str)
address TABsetColumnBracket comment "Format the brackets around a field";
pattern setNull(idx:int, fmt:str)
address TABsetColumnNull comment "Set the display format for a null value
for a given column";
pattern setWidth(idx:int, maxwidth:int)
address TABsetColumnWidth comment "Set the maximal display width for a
given column. All values exceeding the length are simply shortened without
any notice.";
pattern setPosition(idx:int,f:int,i:int)
address TABsetColumnPosition comment "Set the character position to use for
this field when loading according to fixed (punch-card) layout.";
pattern setDecimal(idx:int,s:int,p:int)
address TABsetColumnDecimal comment "Set the scale and precision for
numeric values";
pattern setTryAll()
address TABsetTryAll comment "Skip error lines and assemble an error report";
pattern commit(c:any...)
address TRNtrans_commit comment "Commit changes in certain BATs.";
pattern abort(c:any...)
address TRNtrans_abort comment "Abort changes in certain BATs.";
pattern clean(c:any...)
address TRNtrans_clean comment "Declare a BAT clean without flushing to
disk.";
command prev(b:bat[:any_1,:any_2]):bat[:any_1,:any_2]
address TRNtrans_prev comment "The previous state of this BAT";
command alpha(b:bat[:any_1,:any_2]) :bat[:any_1,:any_2]
address TRNtrans_alpha comment "List insertions since last commit.";
command delta(b:bat[:any_1,:any_2]) :bat[:any_1,:any_2]
address TRNtrans_delta comment "List deletions since last commit.";
• HASH routines for manipulating GDK’s built-in linear-chained hash tables, for accel-
erating lookup searches on BATs.
• TM routines that provide basic transaction management primitives.
• TRG routines that provided active database support. [DEPRECATED]
• ALIGN routines that implement BAT alignment management.
The Binary Association Table (BAT) is the lowest level of storage considered in the
Goblin runtime system [Goblin] . A BAT is a self-descriptive main-memory structure that
represents the binary relationship between two atomic types. The association can be defined
over:
void: virtual-OIDs: a densely ascending column of OIDs (takes zero-storage).
bit: Booleans, implemented as one byte values.
chr: A single character (8-bit integer). DEPRECATED for storing text (Unicode
not supported).
bte: Tiny (1-byte) integers (8-bit integers).
sht: Short integers (16-bit integers).
int: This is the C int type (32-bit).
oid: Unique long int values used as object identifiers. The highest bit is always cleared.
Thus, oids are 31-bit numbers on 32-bit systems, and 63-bit numbers on 64-bit
systems.
wrd: Machine-word sized integers (32-bit on 32-bit systems, 64-bit on 64-bit systems).
ptr: Memory pointer values. DEPRECATED. Can only be stored in transient BATs.
flt: The IEEE float type.
dbl: The IEEE double type.
lng: Longs: the C long long type (64-bit integers).
str: UTF-8 strings (Unicode). A zero-terminated byte sequence.
bat: Bat descriptor. This allows for recursively administered tables, but severely
complicates transaction management. Therefore, they CAN ONLY BE STORED
IN TRANSIENT BATs.
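The zero-storage property of the void type follows from the column being densely ascending: only the starting value and the length need to be known, and every entry can be computed. A minimal sketch, assuming a hypothetical `void_column` descriptor with a `seqbase` field for the first value:

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned long obj_id;  /* hypothetical stand-in for an oid */

/* A void column stores no values, only where the dense sequence starts. */
typedef struct {
    obj_id seqbase;  /* value at position 0 */
    size_t count;    /* number of (virtual) entries */
} void_column;

/* The value at position pos is computed, not fetched. 0 on success. */
static int void_get(const void_column *c, size_t pos, obj_id *out)
{
    assert(c != NULL && out != NULL);
    if (pos >= c->count)
        return -1;
    *out = c->seqbase + (obj_id)pos;
    return 0;
}

/* The reverse lookup is plain arithmetic as well. 0 on success. */
static int void_find(const void_column *c, obj_id v, size_t *pos)
{
    if (v < c->seqbase || v - c->seqbase >= c->count)
        return -1;
    *pos = (size_t)(v - c->seqbase);
    return 0;
}
```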
This model can be used as a back-end model underlying other -higher level- models,
in order to achieve better performance and data independence in one go. The relational
model and the object-oriented model can be mapped on BATs by vertically splitting every
table (or class) for each attribute. Each such column is then stored in a BAT with type
bat[oid,attribute], where the unique object identifiers link tuples in the different BATs.
Relationship attributes in the object-oriented model hence are mapped to bat[oid,oid] tables,
being equivalent to the concept of join indexes [Valduriez87] .
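The vertical split can be illustrated with a two-attribute table. The sketch below is not GDK code: `person`, `name_bun`, `age_bun`, `decompose` and `reconstruct` are hypothetical, and lookups are linear for brevity. It shows how a shared object identifier links the per-attribute binary tables back into a tuple.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef unsigned long obj_id;  /* hypothetical stand-in for an oid */

/* A conventional row of an N-ary table (here N = 2). */
typedef struct { const char *name; int age; } person;

/* One binary table per attribute: bat[oid,name] and bat[oid,age]. */
typedef struct { obj_id head; const char *tail; } name_bun;
typedef struct { obj_id head; int tail; } age_bun;

/* Split rows into two columns, assigning oids base, base+1, ... */
static void decompose(const person *rows, size_t n, obj_id base,
                      name_bun *names, age_bun *ages)
{
    size_t i;
    for (i = 0; i < n; i++) {
        names[i].head = ages[i].head = base + i;
        names[i].tail = rows[i].name;
        ages[i].tail = rows[i].age;
    }
}

/* Rebuild one tuple by finding the same oid in both columns; 1 if found. */
static int reconstruct(const name_bun *names, const age_bun *ages, size_t n,
                       obj_id id, person *out)
{
    size_t i, j;
    for (i = 0; i < n; i++) {
        if (names[i].head != id)
            continue;
        for (j = 0; j < n; j++) {
            if (ages[j].head != id)
                continue;
            out->name = names[i].tail;
            out->age = ages[j].tail;
            return 1;
        }
    }
    return 0;
}
```

A query that touches only the age attribute never reads the name column, which is the benefit the decomposition aims for.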
The set of built-in types can be extended with user-defined types through an ADT
interface. They are linked with the kernel to obtain an enhanced library, or they are
dynamically loaded upon request.
Types can be derived from other types. They represent something different from the type
from which they are derived, but their internal storage management is identical. This feature
facilitates the work of extension programmers by enabling reuse of implementation code,
but is also used to keep the GDK code portable from 32-bit to 64-bit machines: the oid
and ptr types are derived from int on 32-bit machines, but from lng on 64-bit
machines. This requires changes in only two lines of code each.
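A derivation of this kind can be expressed as a single conditional type definition. The sketch below is only an analogy in standard C, not the GDK source: `oid_store`, `OID_BITS` and `oid_max` are hypothetical, and the machine width is inferred from `UINTPTR_MAX`. It also shows the largest oid when the highest bit is kept cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical derivation: pick the storage type by pointer width. */
#if UINTPTR_MAX > 0xFFFFFFFFu
typedef int64_t oid_store;  /* derived from the 64-bit type */
#define OID_BITS 64
#else
typedef int32_t oid_store;  /* derived from the 32-bit type */
#define OID_BITS 32
#endif

/* Largest oid for a given width: every bit set except the highest. */
static uint64_t oid_max(int bits)
{
    assert(bits == 32 || bits == 64);
    return (UINT64_C(1) << (bits - 1)) - 1;
}
```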
To accelerate lookup and search in BATs, GDK supports one built-in search accelerator:
hash tables. We choose an implementation efficient for main-memory: bucket chained hash
[LehCar86,Analyti92] . Alternatively, when the table is sorted, it will resort to merge-scan
operations or binary lookups.
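A bucket-chained hash over a column can be kept in two integer arrays: one holding the first position per bucket and one linking positions that fall into the same bucket. The sketch below illustrates that idea only; the names, the fixed bucket count and the masking hash function are assumptions and do not reflect the GDK implementation.

```c
#include <assert.h>
#include <stddef.h>

#define NBUCKETS 8        /* power of two, so the mask below works */
#define NIL ((size_t)-1)  /* end-of-chain marker */

typedef struct {
    const int *vals;          /* the indexed column */
    size_t bucket[NBUCKETS];  /* first position per bucket, or NIL */
    size_t *next;             /* next[i]: next position in the same bucket */
} hash_index;

static size_t hash_int(int v)
{
    return (size_t)(unsigned)v & (NBUCKETS - 1);
}

/* Build the index over vals[0..n-1]; `next` must hold n entries. */
static void hash_build(hash_index *h, const int *vals, size_t n, size_t *next)
{
    size_t i;
    h->vals = vals;
    h->next = next;
    for (i = 0; i < NBUCKETS; i++)
        h->bucket[i] = NIL;
    for (i = 0; i < n; i++) {
        size_t b = hash_int(vals[i]);
        next[i] = h->bucket[b];  /* link to the previous chain head */
        h->bucket[b] = i;
    }
}

/* Return a position holding v, or NIL when v is absent. */
static size_t hash_find(const hash_index *h, int v)
{
    size_t i;
    for (i = h->bucket[hash_int(v)]; i != NIL; i = h->next[i])
        if (h->vals[i] == v)
            return i;
    return NIL;
}
```

Because the chains are array offsets rather than pointers, the index needs no per-entry allocation.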
BATs are built on the concept of heaps, which are large pieces of main memory. They can
also consist of virtual memory, in case the working set exceeds main-memory. In this case,
GDK supports operations that cluster the heaps of a BAT, in order to improve performance
of its main-memory.
8.25.1 Rationale
The rationale for choosing a BAT as the building block for both relational and object-
oriented systems is based on the following observations:
• Given that CPU speed and main-memory size in workstation hardware have for years
been increasing faster than IO access speed, traditional disk-page oriented algorithms
no longer take best advantage of the hardware in most database operations.
Instead of having a disk-block oriented kernel with a large memory cache, we choose
to build a main-memory kernel that only under large data volumes slowly degrades to
IO-bound performance, comparable to traditional systems [boncz95,boncz96].
• Traditional (disk-based) relational systems move too much data around to save on
(main-memory) join operations.
The fully decomposed store (DSM [Copeland85]) assures that only those attributes of
a relation that are needed will have to be accessed.
• The data management issues for a binary association are much easier to deal with than
the traditional struct-based approaches encountered in relational systems.
• Object-oriented systems often maintain a double cache, one with the disk-based
representation and one with a C pointer-based main-memory structure. This causes expensive
conversions and replicated storage management. GDK does not do such ‘pointer swizzling’.
It uses virtual-memory (mmap()) and buffer management advice (madvise()) OS
primitives to cache only once. Tables take the same form in memory as on disk, making the
use of this technique transparent [oo7].
A RDBMS or OODBMS based on BATs strongly depends on our ability to efficiently
support tuples and to handle small joins, respectively.
The remainder of this document describes the Goblin Database kernel implementation
in greater detail. It is organized as follows:
GDK Interface:
It describes the global interface with which GDK sessions can be started and
ended, and environment variables used.
GDK Extensibility:
Atoms can be defined using a unified ADT interface. There is also an interface
to extend the GDK library with dynamically linked object code.
GDK Utilities:
Memory allocation and error handling primitives are provided. Layers built on
top of GDK should use them, for proper system monitoring. Thread manage-
ment is also included here.
Transaction Management:
For the time being, we just provide BAT-grained concurrency and global trans-
actions. Work is needed here.
BAT Alignment:
Due to the mapping of multi-ary datamodels onto the BAT model, we expect
many correspondences among BATs, e.g. bat(oid,attr1),.. bat(oid,attrN) ver-
tical decompositions. Frequent activities will be to jump from one attribute
to the other (‘bunhopping’). If the head columns are equal lists in two BATs,
merge or even array lookups can be used instead of hash lookups. The alignment
interface makes these relations explicitly manageable.
In GDK, complex data models are mapped with DSM on binary tables. Usually,
one decomposes N-ary relations into N BATs with an oid in the head column,
and the attribute in the tail column. There may well be groups of tables that
have the same sets of oids, equally ordered. The alignment interface is intended
to make this explicit. Implementations can use this interface to detect this
situation, and use cheaper algorithms (like merge-join, or even array lookup)
instead.
BAT Iterators:
Iterators are C macros that generally encapsulate a complex for-loop. They
would be the equivalent of cursors in the SQL model. The macro interface
(instead of a function call interface) is chosen to achieve speed when iterating
main-memory tables.
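The macro-based iterator idea can be sketched as follows. `MINI_BATLOOP`, `bun` and `mini_bat` are hypothetical simplifications introduced for this example and are not the GDK definitions; the point is only that the macro expands to an inline for-loop, so visiting an element costs no function call.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { unsigned long head; int tail; } bun;   /* one association */
typedef struct { bun *first; size_t count; } mini_bat;  /* hypothetical BAT */

/* Hypothetical loop macro: p visits every BUN, q marks the end. */
#define MINI_BATLOOP(b, p, q) \
    for ((p) = (b)->first, (q) = (b)->first + (b)->count; (p) < (q); (p)++)

/* Sum all tail values by iterating with the macro. */
static long sum_tails(const mini_bat *b)
{
    const bun *p, *q;
    long sum = 0;

    assert(b != NULL);
    MINI_BATLOOP(b, p, q)
        sum += p->tail;
    return sum;
}
```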
Passing values between the library routines and the enclosing C program is primarily
through value pointers of type ptr. Pointers into the BAT storage area should only be used
for retrieval. Direct updates of data stored in a BAT are forbidden. The user should adhere
to the interface conventions to guarantee the integrity rules and to maintain the (hidden)
auxiliary search structures.
The update operations come in three flavors. Element-wise updates can use BUNins,
BUNappend, BUNreplace, BUNdel, and BUNdelHead. The batch update operations are
BATins, BATappend and BATdel.
Only experts interested in speed may use BUNfastins, since it skips most consistency
checks, does not update search accelerators, and does not maintain properties such as the
hsorted and tsorted flags. Beware!
The routine BUNfnd provides fast access to a single BUN providing a value for the head
of the binary association. A very fast shortcut for BUNfnd if the selection type is known
to be integer or OID, is provided in the form of the macro BUNfndOID.
To select on a tail, one should use the reverse view obtained by BATmirror.
The routines BUNhead and BUNtail return a pointer to the first and second value in
an association, respectively. To guard against side effects on the BAT, one should normally
copy this value into a scratch variable for further processing.
Behind the interface we use several macros to access the BUN fixed part and the variable
part. The BUN operators always require a BAT pointer and BUN identifier.
• BAThtype(b) and BATttype(b) find out the head and tail type of a BAT.
• BUNfirst(b) returns a BUN pointer to the first BUN of a BAT.
• BUNlast(b) returns the BUN pointer directly after the last BUN in the BAT.
• BUNsize(b) gives the size in bytes of each BUN.
• BUNhead(b, p) and BUNtail(b, p) return pointers to the head-value and tail-value in
a given BUN.
• BUNhloc(b, p) and BUNtloc(b, p) do the same thing, but knowing in advance that the
head-atom resp. tail-atom of a BAT is fixed size.
• BUNhvar(b, p) and BUNtvar(b, p) do the same thing, but knowing in advance that
the head-atom resp. tail-atom of a BAT is variable sized.
The integrity properties to be maintained for the BAT are controlled separately. A key
property indicates that duplicates in the association dimension are not permitted. The BAT
is turned into a set of associations using BATset. Key and set properties are orthogonal
integrity constraints. The strongest reduction is obtained by making the BAT a set with
key restrictions on both dimensions.
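The difference between the key and set properties can be made concrete with two checks. This is a sketch with hypothetical names and a quadratic scan, chosen for clarity and not for speed; it does not show how GDK verifies these properties.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int head; int tail; } pair;  /* one association */

/* Key property on the head: no two associations share a head value. */
static int is_key_on_head(const pair *p, size_t n)
{
    size_t i, j;
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (p[i].head == p[j].head)
                return 0;
    return 1;
}

/* Set property: no two associations are equal in both columns. */
static int is_set(const pair *p, size_t n)
{
    size_t i, j;
    for (i = 0; i < n; i++)
        for (j = i + 1; j < n; j++)
            if (p[i].head == p[j].head && p[i].tail == p[j].tail)
                return 0;
    return 1;
}
```

A table such as (1,10), (1,20), (2,10) is a set but is not keyed on the head, which illustrates why the two constraints are treated as orthogonal.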
The persistency indicator tells the retention period of BATs. The system supports three
modes: PERSISTENT, TRANSIENT, and SESSION. The PERSISTENT BATs are au-
tomatically saved upon session boundary or transaction commit. TRANSIENT BATs are
removed upon transaction boundary. SESSION BATs are removed at the end of a session.
They are normally used to maintain temporary results. All BATs are initially TRANSIENT
unless their mode is changed using the routine BATmode.
The BAT properties may be changed at any time using BATkey, BATset, and BATmode.
Valid BAT access properties can be set with BATsetaccess and BATgetaccess:
BAT_READ, BAT_APPEND, and BAT_WRITE. BATs can be designated to be read-only.
In this case some memory optimizations may be made (slice and fragment bats can point
to stable subsets of a parent bat). A special mode is append-only. It is then allowed to
insert BUNs at the end of the BAT, but not to modify anything that already was in there.
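The three access modes amount to a small permission table. The sketch below uses hypothetical enum and function names and only restates the rules given above: lookups are always allowed, appending needs append or write access, and changing existing content needs write access.

```c
#include <assert.h>

/* Hypothetical names mirroring the read, append and write modes. */
typedef enum { ACC_READ, ACC_APPEND, ACC_WRITE } access_mode;
typedef enum { OP_LOOKUP, OP_APPEND, OP_REPLACE, OP_DELETE } bat_op;

/* Returns 1 when operation op is permitted under access mode m. */
static int op_allowed(access_mode m, bat_op op)
{
    switch (op) {
    case OP_LOOKUP:
        return 1;
    case OP_APPEND:
        return m == ACC_APPEND || m == ACC_WRITE;
    case OP_REPLACE:
    case OP_DELETE:
        return m == ACC_WRITE;
    }
    return 0;  /* unknown operation: refuse */
}
```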
A BAT created by BATnew is considered temporary until one calls the routine BATsave
or BATmode. This routine reserves disk space and checks for name clashes in the BAT
directory. It also makes the BAT persistent. The empty BAT is initially marked as ordered
on both columns. Failure to read or write the BAT results in a NULL, otherwise it returns
the BAT pointer.
MonetDB now has a mmap trim thread that takes care of flushing the memory
mapped regions when MonetDB starts to consume too much main memory. Heaps
(that are randomly accessed) can be excluded from this mechanism, by pinning them.
BATmmap_pin/unpin do this for all heaps of a BAT.
8.27.11 Printing
int BATprintf (stream *f, BAT *b)
int BATmultiprintf (stream *f, int argc, BAT *b[], int printoid,
int order, int printorderby)
The functions to convert BATs into ASCII and the reverse use internally defined for-
mats. They are primarily meant for ease of debugging and to a lesser extent for output
processing. Printing a BAT is done essentially by looping through its components, printing
each association. If an index is available, it will be used. The BATmultiprintf command
assumes a set of BATs with corresponding oid-s in the head columns. It performs the
multijoin over them, and prints the multi-column result on the file.
BATsort, but sorts the BAT itself, rather than returning a copy (BEWARE: this operation
destroys the delta information. TODO:fix). The BATrevert puts all the live BUNs of a
BAT in reverse order. It just reverses the sequence, so this does not necessarily mean that
they are sorted in reverse order!
• The ATOMcmp() operation compares two atomic values. Its parameters are pointers
to atomic values.
• The ATOMlen() operation computes the byte length for a value. ‘val’ is a direct pointer
to the atom value. Its return value should be an integer between 0 and ’mask’.
• The ATOMdel() operation deletes a var-sized atom from its heap ‘hp’. The integer
byte-index of this value in the heap is pointed to by ‘val src’.
• The ATOMput() operation inserts an atom ‘src val’ in a BUN at ‘dst pos’. This
involves copying the fixed sized part in the BUN. In case of a var-sized atom, this fixed
sized part is an integer byte-index into a heap of var-sized atoms. The atom is then
also copied into that heap ‘hp’.
• The ATOMfix() and ATOMunfix() operations do bookkeeping on the number of ref-
erences that a GDK application maintains to the atom. In MonetDB, we use this to
count the number of references directly, or through BATs that have columns of these
atoms. The only operator for which this is currently relevant is BAT. The operators
return the POST reference count to the atom. BATs with fixable atoms may not be
stored persistently.
• The ATOMfromstr() parses an atom value from string ‘s’. The memory allocation pol-
icy is the same as in ATOMget(). The return value is the number of parsed characters.
• The ATOMprint() prints an ASCII description of the atom value pointed to by ‘val’
on file descriptor ‘fd’. The return value is the number of printed characters.
• The ATOMformat() is similar to ATOMprint(). It prints an atom on a newly allocated
string. It must later be freed with GDKfree. The number of characters written is
returned. This is minimally the size of the allocated buffer.
• The ATOMdup() makes a copy of the given atom. The storage needed for this is
allocated and should be removed by the user.
These wrapper functions correspond closely to the interface functions one has to provide
for a user-defined atom. They basically (with exception of ATOMput(), ATOMprint() and
ATOMformat()) just have the atom id parameter prepended to them.
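As a rough illustration of how such wrappers can be organized, the sketch below dispatches on an atom id through a table of function pointers. The names atom_desc, atom_table, my_atom_cmp and my_atom_len, and the two example types, are hypothetical and are not the GDK definitions. Only the idea of prepending the atom id to each call is taken from the text above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical atom ids; GDK defines its own type numbers. */
enum { MY_TYPE_INT = 0, MY_TYPE_STR = 1, MY_TYPE_COUNT };

/* Per-atom interface: one entry per user-supplied function. */
typedef struct {
    const char *name;
    int    (*cmp)(const void *l, const void *r);  /* <0, 0 or >0 */
    size_t (*len)(const void *val);               /* byte length of a value */
} atom_desc;

static int int_cmp(const void *l, const void *r)
{
    int a = *(const int *) l, b = *(const int *) r;
    return (a > b) - (a < b);
}

static size_t int_len(const void *val)
{
    (void) val;
    return sizeof(int);
}

static int str_cmp(const void *l, const void *r)
{
    return strcmp((const char *) l, (const char *) r);
}

static size_t str_len(const void *val)
{
    return strlen((const char *) val) + 1;   /* include the terminator */
}

static const atom_desc atom_table[MY_TYPE_COUNT] = {
    { "int", int_cmp, int_len },
    { "str", str_cmp, str_len },
};

/* Wrappers: the per-atom parameters with the atom id prepended. */
int my_atom_cmp(int id, const void *l, const void *r)
{
    return atom_table[id].cmp(l, r);
}

size_t my_atom_len(int id, const void *val)
{
    return atom_table[id].len(val);
}
```

A caller would write, for example, my_atom_cmp(MY_TYPE_STR, "abc", "abd") and receive a negative value. Adding a new atom in this sketch amounts to adding one row to the table.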
The hash data structures are currently maintained during update operations.
A BAT can be redistributed over n buckets using a hash function with BAThashsplit.
The return value is a list of BAT pointers. Similarly, range-based partitioning is supported.
Error messages can also be collected in a user-provided buffer, instead of being echoed
to a stream. The choice of error mechanism is made on a thread-specific basis. This effect
is established with GDKsetbuf. The buffer must be at least 1024 chars long, and its memory
(de)allocation is entirely the responsibility of the user. A pointer to
this buffer is kept in the pseudo-variable GDKerrbuf. Normally, this is a NULL pointer.
The GDKembedded variable is a property set in the configuration file to indicate that
the kernel is only allowed to run as a single process. This can be used to remove all locking
overhead. The actual state of affairs is maintained in GDKprotected, which is set when
locking is required, e.g. when multiple threads become active.
The kernel maintains a central table of all active threads. They are indexed by their tid.
The structure contains information on the input/output file descriptors, which should be
set before a database operation is started. It ensures that output is delivered to the proper
client. The Thread structure should ideally be made directly accessible to each thread.
This speeds up access to tid and file descriptors.
Its parameter is a BAT-of-BATs (in the tail); the persistence status of that BAT is
committed. We assume here that the calling thread has exclusive access to these bats. An
error is reported if you try to partially commit an already committed persistent BAT (it
needs the rollback mechanism).
HASHlooploc (BAT *b; Hash *h, size_t idx; ptr value, BUN w)
HASHloopvar (BAT *b; Hash *h, size_t idx; ptr value, BUN w)
SORTloop (BAT *b,p,q,tl,th,s)
The BATloop() looks like a function call, but is actually a macro. The following example
gives an indication of how they are to be used:
void
print_a_bat(BAT *b)
{
    BATiter bi = bat_iterator(b);
    BUN p, q;

    BATloop(b, p, q)
        printf("Element %3d has value %d\n",
               *(int*) BUNhead(bi, p), *(int*) BUNtail(bi, p));
}
    printf("%s\n==================\n", author);
    HASHloop(b, (b)->H->hash, i, author)
        printf("%s\n", (str) BUNtail(b, i));
}
Note that for optimization purposes, we could have used a HASHloop_str instead, and
also a BUNtvar instead of a BUNtail (since we know the tail-type of author_books is string,
hence variable-sized). However, this would make the code less general.
to atom corresponding to the minimum (included) and maximum (included) bound in the
selected range of BUNs. A nil-value means that there is no bound. The ’s’ finally is an
integer denoting the bunsize, used for speed.
#define SORTloop(b,p,q,tl,th)                                           \
    if (!(BATtordered(b)&1)) GDKerror("SORTloop: BAT not sorted.\n");   \
    else for (p = (ATOMcmp((b)->ttype,tl,ATOMnilptr((b)->ttype))?       \
                   SORTfndfirst(b,tl):BUNfirst(b)),                     \
              q = (ATOMcmp((b)->ttype,th,ATOMnilptr((b)->ttype))?       \
                   SORTfndlast(b,th):BUNlast(b)); p < q; p++)
/* OIDDEPEND */
#if SIZEOF_OID == SIZEOF_INT
#define SORTfnd_oid(b,v) SORTfnd_int(b,v)
#define SORTfndfirst_oid(b,v) SORTfndfirst_int(b,v)
#define SORTfndlast_oid(b,v) SORTfndlast_int(b,v)
sortloop[?.10](oid,int,oid,simple,&oid_nil)
#else
#define SORTfnd_oid(b,v) SORTfnd_lng(b,v)
#define SORTfndfirst_oid(b,v) SORTfndfirst_lng(b,v)
#define SORTfndlast_oid(b,v) SORTfndlast_lng(b,v)
sortloop[?.10](oid,lng,oid,simple,&oid_nil)
#endif
#if SIZEOF_WRD == SIZEOF_INT
#define SORTfnd_wrd(b,v) SORTfnd_int(b,v)
#define SORTfndfirst_wrd(b,v) SORTfndfirst_int(b,v)
#define SORTfndlast_wrd(b,v) SORTfndlast_int(b,v)
sortloop[?.10](wrd,int,wrd,simple,&wrd_nil)
#else
#define SORTfnd_wrd(b,v) SORTfnd_lng(b,v)
#define SORTfndfirst_wrd(b,v) SORTfndfirst_lng(b,v)
#define SORTfndlast_wrd(b,v) SORTfndlast_lng(b,v)
sortloop[?.10](wrd,lng,wrd,simple,&wrd_nil)
#endif
#define SORTloop_bit(b,p,q,tl,th) SORTloop_chr(b,p,q,tl,th)
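The range iteration performed by SORTloop can be sketched on a plain sorted array of integers. The code below is an illustration under assumptions, not the GDK implementation: find_first, find_last and range_sum are hypothetical names, and a NULL bound stands in for the nil value that means "no bound". Both bounds are inclusive, as described above.

```c
#include <stddef.h>

/* First position whose value is >= key (lower bound). */
static size_t find_first(const int *v, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (v[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* First position whose value is > key (one past the last match). */
static size_t find_last(const int *v, size_t n, int key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (v[mid] <= key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Sum all values in [*tl, *th] of a sorted array; NULL means "no bound". */
long range_sum(const int *v, size_t n, const int *tl, const int *th)
{
    size_t p = tl ? find_first(v, n, *tl) : 0;
    size_t q = th ? find_last(v, n, *th) : n;
    long sum = 0;

    for (; p < q; p++)      /* same p < q loop shape as the macro */
        sum += v[p];
    return sum;
}
```

When the lower bound exceeds the upper bound, p is not smaller than q and the loop body is never entered, so no separate check for an empty range is needed.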
For each BAT we maintain its dimensions as separately accessible properties. They can
be used to improve query processing at higher levels.
The routine BATsunique considers both dimensions in the duplicate elimination it performs;
it produces a set. The routine BATtunique considers only the head column, and
produces a unique head column.
BATs that satisfy the set property can be further processed with the set operations
BATsunion, BATsintersect, and BATsdiff. The same operations are also available in
versions that only look at the head column: BATkunion, BATkdiff, and BATkintersect (which
shares its implementation with BATsemijoin).
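The difference between eliminating duplicates on both columns and on the head column only can be shown on an array of (head, tail) pairs. This is a minimal sketch: the pair type and the functions pairs_sunique and pairs_kunique are hypothetical stand-ins, and the quadratic scan is chosen for brevity rather than speed.

```c
#include <stddef.h>

typedef struct { int head; int tail; } pair;   /* stand-in for a BUN */

/* Keep the first occurrence of each (head, tail) pair; return the new count. */
size_t pairs_sunique(pair *p, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < out; j++)
            if (p[j].head == p[i].head && p[j].tail == p[i].tail)
                break;
        if (j == out)
            p[out++] = p[i];
    }
    return out;
}

/* Keep the first pair seen for each head value; return the new count. */
size_t pairs_kunique(pair *p, size_t n)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        size_t j;
        for (j = 0; j < out; j++)
            if (p[j].head == p[i].head)
                break;
        if (j == out)
            p[out++] = p[i];
    }
    return out;
}
```

For the input (1,10), (1,10), (1,20), (2,10) the first function keeps three pairs, while the second keeps only (1,10) and (2,10).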
The kernel code modules are encapsulated with MAL wrappers. A synopsis of their
functionality is described below. The signature details can be found in the appendix.
8.39.1 Wrapping
The remainder contains the wrapper code over the version 4
8.40 InformationFunctions
In most cases we pass a BAT identifier, which should be unified with a BAT descriptor.
Upon failure we can simply abort the function.
The logical head type :oid is mapped to a TYPE void with sequenceBase. It represents
the old fashioned :vid
str
BKCnewBAT(int *res, int *ht, int *tt, BUN *cap)
{
BAT *b;
return MAL_SUCCEED;
}
throw(MAL, "bat.new", GDK_EXCEPTION);
}
str
BKCattach(int *ret, int *tt, str *heapfile)
{
BAT *b;
str
BKCdensebat(int *ret, wrd *size)
{
BAT *b;
str
BKCreverse(int *ret, int *bid)
{
BAT *b, *bn = NULL;
CMDreverse(&bn, b);
BBPreleaseref(b->batCacheid);
if (bn) {
*ret = bn->batCacheid;
BBPkeepref(bn->batCacheid);
return MAL_SUCCEED;
}
Chapter 8: The MAL Modules 191
str
BKCmirror(int *ret, int *bid)
{
BAT *b, *bn = NULL;
str
BKCrevert(int *ret, int *bid)
{
BAT *b, *bn;
str
BKCorder(int *ret, int *bid)
{
BAT *b,*bn;
bn= BATorder(b);
if(bn==NULL ){
BBPkeepref(*ret= b->batCacheid);
throw(MAL, "bat.order", GDK_EXCEPTION);
}
BBPkeepref(*ret= b->batCacheid);
return MAL_SUCCEED;
}
str
BKCorder_rev(int *ret, int *bid)
{
BAT *b,*bn;
(void) ret;
if ((b = BATdescriptor(*bid)) == NULL) {
throw(MAL, "bat.order_rev", RUNTIME_OBJECT_MISSING);
}
bn= BATorder_rev(b);
if(bn==NULL ){
BBPkeepref(*ret= b->batCacheid);
throw(MAL, "bat.order_rev", GDK_EXCEPTION);
}
BBPkeepref(*ret= b->batCacheid);
return MAL_SUCCEED;
}
Insertions into the BAT may involve void types (= no storage required). These cases
should actually be captured during BUNins, because they may emerge internally as well.
void_insertbun ::=
if (b->@1type == TYPE_void && *(oid*) @1 != oid_nil &&
*(oid*) @1 != (b->@1seqbase + BUNgetpos(b, BUNlast(b))))
{
printf("val " OIDFMT " seqbase " OIDFMT " pos " BUNFMT "\n", *(oid*)@1,
b->@1seqbase, BUNgetpos(b, BUNlast(b)) );
throw(MAL, "bat.insert", OPERATION_FAILED " Insert non-nil values in a void colum
}
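The condition tested by the macro above can be restated on a simplified dense column: a non-nil value is only acceptable if it equals the sequence base plus the current position. The sketch below assumes a hypothetical void_column type and nil marker MY_OID_NIL; it does not use the GDK structures.

```c
#include <stddef.h>

#define MY_OID_NIL ((unsigned long) -1)   /* hypothetical nil marker */

typedef struct {
    unsigned long seqbase;  /* value at position 0 */
    size_t count;           /* number of entries so far */
} void_column;

/* Return 1 and grow the column if v is acceptable, 0 otherwise. */
int void_column_append(void_column *c, unsigned long v)
{
    /* A void column stores nothing, so only the next value of the
     * dense sequence (or nil) can be accepted. */
    if (v != MY_OID_NIL && v != c->seqbase + c->count)
        return 0;
    c->count++;
    return 1;
}
```

With a sequence base of 100, appending 100 and then 101 succeeds, while appending 105 as the third value is rejected.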
char *
BKCinsert_bun(int *r, int *bid, ptr h, ptr t)
{
BAT *i,*b;
int param=0;
(void) r;
char *
BKCinsert_bun_force(int *r, int *bid, ptr h, ptr t, bit *force)
{
BAT *i,*b;
int param=0;
(void) r;
str
BKCinsert_bat(int *r, int *bid, int *sid)
{
BAT *i,*b, *s;
int param=0;
BBPreleaseref(i->batCacheid);
throw(MAL, "bat.insert", GDK_EXCEPTION);
}
BBPreleaseref(s->batCacheid);
BBPkeepref(*r=b->batCacheid);
BBPreleaseref(i->batCacheid);
return MAL_SUCCEED;
}
str
BKCinsert_bat_force(int *r, int *bid, int *sid, bit *force)
{
BAT *i,*b, *s;
int param=0;
str
BKCreplace_bun(int *r, int *bid, ptr h, ptr t)
{
BAT *i,*b;
int param=0;
derefStr[?.1](b,t,t)
if (BUNreplace(b, h, t, 0) == NULL) {
BBPreleaseref(b->batCacheid);
throw(MAL, "bat.replace", GDK_EXCEPTION);
}
BBPkeepref(*r=b->batCacheid);
BBPreleaseref(i->batCacheid);
return MAL_SUCCEED;
}
str
BKCreplace_bat(int *r, int *bid, int *sid)
{
BAT *i, *b, *bn, *s;
int param=0;
str
BKCreplace_bun_force(int *r, int *bid, ptr h, ptr t, bit *force)
{
BAT *b, *bn;
}
derefStr[?.1](b,h,h)
derefStr[?.1](b,t,t)
bn= BUNreplace(b, h, t, *force);
BBPreleaseref(b->batCacheid);
if(bn && bn->batCacheid != b->batCacheid)
throw(MAL, "bat.replace", OPERATION_FAILED "Different BAT returned");
BBPkeepref(*r=bn->batCacheid);
return MAL_SUCCEED;
}
str
BKCreplace_bat_force(int *r, int *bid, int *sid, bit *force)
{
BAT *b, *bn, *s;
char *
BKCdelete_bun(int *r, int *bid, ptr h, ptr t)
{
BAT *b, *bn;
return MAL_SUCCEED;
}
char *
BKCdelete(int *r, int *bid, ptr h)
{
BAT *b, *bn;
str
BKCdelete_all(int *r, int *bid)
{
BAT *b, *bn;
str
BKCdelete_bat_bun(int *r, int *bid, int *sid)
{
BAT *b, *bn, *s;
bn=BATdel(b, s,FALSE);
BBPreleaseref(s->batCacheid);
if(bn && bn->batCacheid != b->batCacheid)
throw(MAL, "bat.delete_bat_buns", OPERATION_FAILED "Different BAT returned");
BBPkeepref(*r=bn->batCacheid);
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
str
BKCdelete_bat(int *r, int *bid, int *sid)
{
BAT *i,*b, *s;
int param=0;
str
BKCdestroy_bat(bit *r, str *input)
{
CMDdestroy(r, *input);
return MAL_SUCCEED;
}
char *
BKCdestroy(signed char *r, int *bid)
{
BAT *b;
(void) r;
if ((b = BATdescriptor(*bid)) == NULL) {
throw(MAL, "bat.destroy", RUNTIME_OBJECT_MISSING);
}
*bid = 0;
BATmode(b, TRANSIENT);
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
if (delta) {
for (r = d->batInserted; r < BUNlast(d); r++) {
oid delid = *(oid *) BUNtail(di, r);
BUN
void_insert_delta(BAT *b, BAT *u)
{
BATiter ui = bat_iterator(u);
BUN nr = 0;
BUN r;
BUN
void_replace_delta(BAT *b, BAT *u)
{
BATiter ui = bat_iterator(u);
BUN nr = 0;
BUN r;
char *
BKCappend_wrap(int *r, int *bid, int *uid)
{
BAT *b, *i, *u;
int param=0;
str
BKCappend_val_wrap(int *r, int *bid, ptr u)
{
BAT *i,*b;
int param=0;
derefStr[?.1](b,t,u)
CMDsetaccess(&i,b,&param);
BUNappend(i, u,FALSE);
BBPkeepref(*r=i->batCacheid);
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
str
BKCappend_reverse_val_wrap(int *r, int *bid, ptr u)
{
BAT *i,*b;
int param=0;
CMDsetaccess(&i,b,&param);
derefStr[?.1](i,t,u)
BUNappend(BATmirror(i), u,FALSE);
BBPkeepref(*r=i->batCacheid);
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
char *
BKCappend_force_wrap(int *r, int *bid, int *uid, bit *force)
{
BAT *b,*i, *u;
int param=0;
str
BKCappend_val_force_wrap(int *r, int *bid, ptr u, bit *force)
{
BAT *b,*i;
int param=0;
CMDsetaccess(&i,b,&param);
derefStr[?.1](i,t,u)
BUNappend(i, u, *force);
BBPkeepref(*r=i->batCacheid);
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
str
BKCbun_inplace(int *r, int *bid, oid *id, ptr t)
{
BAT *o;
(void) r;
str
BKCbun_inplace_force(int *r, int *bid, oid *id, ptr t, bit *force)
{
BAT *o;
(void) r;
if ((o = BATdescriptor(*bid)) == NULL) {
throw(MAL, "bat.inplace", RUNTIME_OBJECT_MISSING);
}
void_inplace5(o, *id, t, *force);
BBPreleaseref(o->batCacheid);
return MAL_SUCCEED;
}
str
BKCbat_inplace(int *r, int *bid, int *rid)
{
BAT *o, *d;
(void) r;
if ((o = BATdescriptor(*bid)) == NULL) {
throw(MAL, "bat.inplace", RUNTIME_OBJECT_MISSING);
}
if ((d = BATdescriptor(*rid)) == NULL) {
BBPreleaseref(o->batCacheid);
throw(MAL, "bat.inplace", RUNTIME_OBJECT_MISSING);
}
void_replace_bat5(o, d,FALSE);
BBPreleaseref(o->batCacheid);
BBPreleaseref(d->batCacheid);
return MAL_SUCCEED;
}
str
BKCbat_inplace_force(int *r, int *bid, int *rid, bit *force)
{
BAT *o, *d;
(void) r;
char *
BKCgetAlpha(int *r, int *bid)
{
BAT *b, *c;
char *
BKCgetDelta(int *r, int *bid)
{
BAT *b, *c;
str
BKCgetCapacity(lng *res, int *bid)
{
CMDcapacity(res, bid);
return MAL_SUCCEED;
}
str
BKCgetHeadType(str *res, int *bid)
{
CMDhead(res, bid);
return MAL_SUCCEED;
}
str
BKCgetTailType(str *res, int *bid)
{
CMDtail(res, bid);
return MAL_SUCCEED;
}
str
BKCgetRole(str *res, int *bid)
{
BAT *b;
str
BKCsetkey(int *res, int *bid, bit *param)
{
BAT *b;
str
BKCisaSet(int *res, int *bid)
{
BAT *b;
str
BKCsetSorted(bit *res, int *bid)
{
BAT *b;
str
BKCisSorted(bit *res, int *bid)
{
BAT *b;
}
*res = BATordered(b) ? 1 : 0;
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
str
BKCisSortedReverse(bit *res, int *bid)
{
BAT *b;
We must take care of the special case of a nil column (TYPE void, seqbase=nil). Such nil
columns never set hkey (and BUNins will never invalidate it if set), yet a nil column of a
BAT with <= 1 entries does not contain duplicates, so we return TRUE.
str
BKCgetKey(bit *ret, int *bid)
{
BAT *b;
str
BKCpersists(int *r, int *bid, bit *flg)
{
BAT *b;
*r = 0;
return MAL_SUCCEED;
}
str
BKCsetPersistent(int *r, int *bid)
{
bit flag= TRUE;
return BKCpersists(r,bid, &flag);
}
str
BKCisPersistent(bit *res, int *bid)
{
BAT *b;
str
BKCsetTransient(int *r, int *bid)
{
BAT *b;
str
BKCisTransient(bit *res, int *bid)
{
BAT *b;
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
accessMode_export ::=
bat5_export str BKCset@1(int *res, int *bid) ;
bat5_export str BKChas@1(bit *res, int *bid);
accessMode ::=
str BKCset@1(int *res, int *bid) {
BAT *b, *bn = NULL;
int param=@2;
if( (b= BATdescriptor(*bid)) == NULL ){
throw(MAL, "bat.set@1", RUNTIME_OBJECT_MISSING);
}
CMDsetaccess(&bn,b,&param);
BBPkeepref(*res=bn->batCacheid);
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
str BKChas@1(bit *res, int *bid) {
BAT *b;
if( (b= BATdescriptor(*bid)) == NULL ){
throw(MAL, "bat.set@1", RUNTIME_OBJECT_MISSING);
}
*res = BATgetaccess(b)=='@3';
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
accessMode_export[?.2](WriteMode,0,w)
accessMode_export[?.2](ReadMode,1,r)
accessMode_export[?.2](AppendMode,2,a)
accessMode[?.2](WriteMode,0,w)
accessMode[?.2](ReadMode,1,r)
accessMode[?.2](AppendMode,2,a)
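The macro calls above generate one set/has pair per access mode from the three parameters (name suffix, numeric parameter, mode character). A similar effect can be obtained with the plain C preprocessor. The sketch below is only an analogy: my_bat, ACCESS_MODES, DEFINE_ACCESS and the generated my_set.../my_has... functions are hypothetical and do not touch the real BAT descriptor.

```c
/* One row per access mode: name suffix, numeric parameter, mode character. */
#define ACCESS_MODES(X)      \
    X(WriteMode,  0, 'w')    \
    X(ReadMode,   1, 'r')    \
    X(AppendMode, 2, 'a')

typedef struct { char access; } my_bat;   /* hypothetical stand-in for a BAT */

/* Maps the numeric parameter onto the mode character. */
static const char mode_chars[] = { 'w', 'r', 'a' };

/* Expands to my_setWriteMode()/my_hasWriteMode(), and likewise per row. */
#define DEFINE_ACCESS(name, param, ch)                               \
    void my_set##name(my_bat *b) { b->access = mode_chars[param]; }  \
    int  my_has##name(const my_bat *b) { return b->access == ch; }

ACCESS_MODES(DEFINE_ACCESS)
```

After expansion, my_setReadMode(&b) followed by my_hasReadMode(&b) yields a non-zero result, and a new mode is added by appending one row to ACCESS_MODES.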
str
BKCaccess(int *res, int *bid, int *m)
{
BAT *b, *bn = NULL;
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
str
BKCsetAccess(int *res, int *bid, str *param)
{
BAT *b, *bn = NULL;
int m;
int oldid;
oldid= b->batCacheid;
bn = BATsetaccess(b, m);
if ((bn)->batCacheid == b->batCacheid) {
BBPkeepref(bn->batCacheid);
} else {
BBPreleaseref(oldid);
BBPfix(bn->batCacheid);
BBPkeepref(bn->batCacheid);
}
*res = bn->batCacheid;
return MAL_SUCCEED;
}
str
BKCgetAccess(str *res, int *bid)
{
BAT *b;
str
BKCbatdisksize(lng *tot, int *bid){
BAT *b;
if ((b = BATdescriptor(*bid)) == NULL) {
str
BKCbatvmsize(lng *tot, int *bid){
BAT *b;
if ((b = BATdescriptor(*bid)) == NULL) {
throw(MAL, "bat.getDiskSize", RUNTIME_OBJECT_MISSING);
}
CMDbatvmsize(tot,b);
BBPreleaseref(*bid);
return MAL_SUCCEED;
}
str
BKCbatsize(lng *tot, int *bid){
BAT *b;
if ((b = BATdescriptor(*bid)) == NULL) {
throw(MAL, "bat.getDiskSize", RUNTIME_OBJECT_MISSING);
}
CMDbatsize(tot,b, FALSE);
BBPreleaseref(*bid);
return MAL_SUCCEED;
}
str
BKCgetStorageSize(lng *tot, int *bid)
{
BAT *b;
if (!isVIEW(b)) {
BUN cnt = BATcount(b);
str
BKCgetStorageSize_str(lng *tot, str batname)
{
int bid = BBPindex(batname);
if (bid == 0)
throw(MAL, "bat.getStorageSize", RUNTIME_OBJECT_MISSING);
return BKCgetStorageSize(tot, &bid);
}
str
BKCsetColumn(int *r, int *bid, str *tname)
{
BAT *b;
str dummy;
GDKfree(dummy);
BBPreleaseref(b->batCacheid);
*r =0;
return MAL_SUCCEED;
}
str
BKCsetColumns(int *r, int *bid, str *hname, str *tname)
{
BAT *b;
str
BKCsetName(int *r, int *bid, str *s)
{
BAT *b;
bit res, *rp = &res;
str
BKCgetBBPname(str *ret, int *bid)
{
BAT *b;
str
BKCunload(bit *res, str *input)
{
CMDunload(res, *input);
return MAL_SUCCEED;
}
str
BKCisCached(int *res, int *bid)
{
BAT *b;
str
BKCload(int *res, str *input)
{
bat bid = BBPindex(*input);
*res = bid;
if (bid) {
BBPincref(bid,TRUE);
return MAL_SUCCEED;
}
throw(MAL, "bat.unload", ILLEGAL_ARGUMENT " File name missing");
}
str
BKChot(int *res, str *input)
{
(void) res; /* fool compiler */
BBPhot(BBPindex(*input));
return MAL_SUCCEED;
}
str
BKCcold(int *res, str *input)
{
(void) res; /* fool compiler */
BBPcold(BBPindex(*input));
return MAL_SUCCEED;
}
str
BKCcoldBAT(int *res, int *bid)
{
BAT *b;
(void) res;
(void) bid; /* fool compiler */
if ((b = BATdescriptor(*bid)) == NULL) {
throw(MAL, "bat.isCached", RUNTIME_OBJECT_MISSING);
}
BBPcold(b->batCacheid);
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
str
BKCheat(int *res, str *input)
{
int bid = BBPindex(*input);
if (bid) {
*res = BBP_lastused(bid) & 0x7fffffff;
}
throw(MAL, "bat", PROGRAM_NYI);
}
str
BKChotBAT(int *res, int *bid)
{
BAT *b;
(void) res;
(void) bid; /* fool compiler */
if ((b = BATdescriptor(*bid)) == NULL) {
throw(MAL, "bat.isCached", RUNTIME_OBJECT_MISSING);
}
BBPhot(b->batCacheid);
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
str
BKCsave(bit *res, str *input)
{
CMDsave(res, *input);
return MAL_SUCCEED;
}
str
BKCsave2(int *r, int *bid)
{
BAT *b;
if (b && BATdirty(b))
BBPsave(b);
BBPreleaseref(b->batCacheid);
*r = 0;
return MAL_SUCCEED;
}
str
BKCmmap(int *res, int *bid, int *hbns, int *tbns, int *hhp, int *thp)
{
BAT *b, *bn = NULL;
str
BKCmmap2(int *res, int *bid, int *mode)
{
return BKCmmap(res, bid, mode, mode, mode, mode);
}
str
BKCmadvise(int *res, int *bid, int *hbns, int *tbns, int *hhp, int *thp)
{
BAT *b;
str
BKCmadvise2(int *res, int *bid, int *mode)
{
return BKCmadvise(res, bid, mode, mode, mode, mode);
}
str
BKCaccbuild_std(int *ret, int *bid, int *acc)
{
(void) bid;
(void) acc;
*ret = TRUE;
throw(MAL, "Accelerator", PROGRAM_NYI);
}
str
BKCsetHash(bit *ret, int *bid, bit *prop)
{
BAT *b;
(void) ret;
(void) prop; /* fool compiler */
if ((b = BATdescriptor(*bid)) == NULL) {
throw(MAL, "bat.setHash", RUNTIME_OBJECT_MISSING);
}
BAThash(b, 0);
BBPreleaseref(b->batCacheid);
return MAL_SUCCEED;
}
str
BKCsetSequenceBase(int *r, int *bid, oid *o)
{
BAT *b;
str
BKCsetSequenceBaseNil(int *r, int *bid, oid *o)
{
oid ov = oid_nil;
(void) o;
return BKCsetSequenceBase(r, bid, &ov);
}
str
BKCgetSequenceBase(oid *r, int *bid)
{
BAT *b;
The module also implements the operators +, -, * and /. The rule for the
return type of these operators is as follows. If one of the input types is a floating
point, the result will be a floating point. The largest of the input types is taken.
The max and min functions return the maximum and minimum of the two
input parameters.
[unary operators]
This module also implements the unary abs() function, which calculates the
absolute value of the given input parameter, as well as the - unary operator.
The inv unary operation calculates the inverse of the input value. An error
message is given when the input value is zero.
[bitwise operators]
For integers there are some additional operations. The % operator implements
the congruent modulo operation. The << and >> are the left and right bit
shift. The or, and, xor and not for integers are implemented as bitwise boolean
operations.
[boolean operators]
The or, and, xor and not for the bit atomic type in MIL (this corresponds to
what is normally called boolean) are implemented as the logic operations.
[random numbers]
This module also contains the rand and srand functions. The srand() function
initializes the random number generator using a seed value. Subsequent
calls to rand() return pseudo-random numbers (with the same seed the sequence
can be repeated).
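The repeatability mentioned for seeded generators can be demonstrated with the srand() and rand() functions of the C standard library, which behave analogously. The helper seeded_sequence below is a hypothetical name introduced for this sketch; it says nothing about the generator used inside the module itself.

```c
#include <stdlib.h>

/* Fill out[0..n-1] with the first n values produced after seeding. */
void seeded_sequence(unsigned int seed, int *out, int n)
{
    srand(seed);                 /* restart the generator at a known state */
    for (int i = 0; i < n; i++)
        out[i] = rand();
}
```

Calling seeded_sequence() twice with the same seed fills both arrays with identical values, which is useful when an experiment has to be rerun on the same pseudo-random input.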
The general interpretation of the NIL value is "unknown". This semantics means that
any operation that receives at least one NIL value will produce a NIL value in the
output.
The only exceptions to this rule are the "==" and "!=" equality test routines (it would
otherwise become rather difficult to test whether a value is nil).
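The NIL rule and its exception can be sketched for integers as follows. The nil encoding MY_INT_NIL and the functions nil_add and nil_eq are assumptions made for this illustration only; overflow handling is left out.

```c
#include <limits.h>

#define MY_INT_NIL INT_MIN   /* hypothetical nil encoding for this sketch */

/* Arithmetic: any nil operand makes the result nil. */
int nil_add(int a, int b)
{
    if (a == MY_INT_NIL || b == MY_INT_NIL)
        return MY_INT_NIL;
    return a + b;
}

/* Equality is the exception: it always returns a plain truth value,
 * so comparing against nil can be used to test for nil. */
int nil_eq(int a, int b)
{
    return a == b;
}
```

Here nil_add(MY_INT_NIL, 2) yields nil, whereas nil_eq(x, MY_INT_NIL) yields 1 or 0 and can therefore be used to detect a nil value of x.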
8.55.1 Algorithms
There are several approaches to build a cross table. The one chosen here is aimed at
incremental construction, such that re-use of intermediates becomes possible. Starting with
the first dimension, a BAT is derived to represent the various groups, called a GRP BAT
or cross-table BAT.
histogram operation can be used to obtain the counts of each data cube. Other aggregation
operations using the MIL set aggregate construct (bat) can be used as well; note for
instance that histogram == (b.reverse()).
The Monet interface module specification is shown below. Ideally we should define
stronger type constraints, e.g. command group.new(attr:bat[,:any_1]
The group macro is split along three dimensions:
[type:] Type-specific implementation for selecting the right hash function and
data size etc.;
[clustered:] To select the appropriate algorithm, i.e., with or without taking
advantage of an order of values in the parent groups;
[physical properties:] Values, choosing between a fixed predefined and a custom
hashmask. Custom allows the user to determine the size of the hashmask (and
indirectly the estimated size of the result). The hashmask is 2^n - 1
where n is given by the user, or 1023 otherwise, and the derived result
size is 4 ... 2^n.
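The hashmask arithmetic can be written out in a few lines. The functions group_hashmask and group_bucket are hypothetical names, and treating n == 0 as "no value given by the user" is an assumption of this sketch.

```c
#include <stddef.h>

/* Mask for 2^n buckets; n == 0 selects the predefined mask of 1023. */
size_t group_hashmask(unsigned n)
{
    return n ? (((size_t) 1 << n) - 1) : 1023;
}

/* Map a hash value onto a bucket by masking off the high bits. */
size_t group_bucket(size_t hash, size_t mask)
{
    return hash & mask;
}
```

With n = 4 the mask is 15, so a hash value of 37 lands in bucket 5. A mask of the form 2^n - 1 keeps the low n bits, which is why a bitwise AND can replace a modulo operation here.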
Further research should point out whether fitting a simple statistical model (possibly a
simple mixture model) can help choose these parameters automatically; the current idea
is that the user (which could be a domain-specific extension of the higher-level language)
knows the properties of the data, especially for IR in which the standard grouping settings
differ significantly from the original datamining application.
This file system may reside on the same hardware as the database server and therefore
the writes are done to the same disk, but could also reside on another system and then the
changes are flushed through the network. The logger works under the assumption that it
is called to safeguard updates on the database when it has an exclusive lock on the latest
version. This lock should be guaranteed by the calling transaction manager first.
Finding the updates applied to a BAT is relatively easy, because each BAT contains a
delta structure. On commit these changes are written to the log file and the delta manage-
ment is reset. Since each commit is written to the same log file, the beginning and end are
marked by a log identifier.
A server restart should only (re)process blocks which are completely written to disk. A
log replay therefore ends in a commit or abort on the changed bats. Once all logs have been
read, the changes to the bats are made persistent, i.e. a bbp sub-commit is done.
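The rule that a restart only reprocesses completely written blocks can be sketched with a simplified in-memory log. The record layout, the names log_rec and log_replay, and the assumption that blocks are not interleaved are all hypothetical; the real log format is not shown in this section.

```c
#include <stddef.h>

enum log_kind { LOG_START, LOG_UPDATE, LOG_END };

typedef struct {
    enum log_kind kind;
    int id;      /* log identifier for START/END, slot index for UPDATE */
    int value;   /* new value for UPDATE */
} log_rec;

/* Replay into table[]; return the number of complete blocks applied. */
int log_replay(const log_rec *log, size_t n, int *table, size_t table_len)
{
    int applied = 0;
    size_t start = n;                      /* n means "no open block" */

    for (size_t i = 0; i < n; i++) {
        if (log[i].kind == LOG_START) {
            start = i;
        } else if (log[i].kind == LOG_END && start < n &&
                   log[start].id == log[i].id) {
            /* The block is complete: apply its updates in order. */
            for (size_t j = start + 1; j < i; j++)
                if (log[j].kind == LOG_UPDATE &&
                    (size_t) log[j].id < table_len)
                    table[log[j].id] = log[j].value;
            applied++;
            start = n;
        }
    }
    return applied;    /* updates after an unmatched START are ignored */
}
```

Updates are buffered implicitly by leaving them in the log until the matching end marker is seen, so a block that was cut off by a crash never reaches the table.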
if (mapi_close_handle(hdl) != MOK)
die(dbh, hdl);
if ((hdl = mapi_query(dbh, "insert into emp values('John', 23)")) == NULL
|| mapi_error(dbh) != MOK)
die(dbh, hdl);
mapi_close_handle(hdl);
if (mapi_error(dbh) != MOK)
die(dbh, hdl);
if ((hdl = mapi_query(dbh, "insert into emp values('Mary', 22)")) == NULL
|| mapi_error(dbh) != MOK)
die(dbh, hdl);
mapi_close_handle(hdl);
if (mapi_error(dbh) != MOK)
die(dbh, hdl);
if ((hdl = mapi_query(dbh, "select * from emp")) == NULL
|| mapi_error(dbh) != MOK)
die(dbh, hdl);
while (mapi_fetch_row(hdl)) {
char *nme = mapi_fetch_field(hdl, 0);
char *age = mapi_fetch_field(hdl, 1);
printf("%s is %s\n", nme, age);
}
if (mapi_error(dbh) != MOK)
die(dbh, hdl);
mapi_close_handle(hdl);
if (mapi_error(dbh) != MOK)
die(dbh, hdl);
mapi_destroy(dbh);
return 0;
}
The mapi_connect() operation establishes a communication channel with a running
server. The query language interface is either "sql", "mil" or "xquery".
Errors on the interaction can be captured using mapi_error(), possibly followed by a
request to dump a short error message explanation on a standard file location. It has been
abstracted away in a macro.
Provided we can establish a connection, the interaction proceeds as in many similar
application development packages. Queries are shipped for execution using mapi_query()
and an answer table can be consumed one row at a time. In many cases these functions
suffice.
The Mapi interface provides caching of rows at the client side. mapi_query() will load
tuples into the cache, after which they can be read repeatedly using mapi_fetch_row() or
directly accessed (mapi_seek_row()). This facility is particularly handy when small, but
stable query results are repeatedly used in the client program.
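The idea of a client-side row cache with sequential and positioned access can be sketched independently of the Mapi library. The type row_cache and the functions cache_fetch_row and cache_seek_row below are hypothetical; they only mimic the roles that the text assigns to mapi_fetch_row() and mapi_seek_row().

```c
#include <stddef.h>

typedef struct {
    const char **rows;   /* cached rows, owned by the caller in this sketch */
    size_t count;        /* number of cached rows */
    size_t next;         /* index of the row the next fetch returns */
} row_cache;

/* Return the next cached row, or NULL once the cache is exhausted. */
const char *cache_fetch_row(row_cache *c)
{
    return c->next < c->count ? c->rows[c->next++] : NULL;
}

/* Reposition the cursor; return 0 on success, -1 if out of range. */
int cache_seek_row(row_cache *c, size_t rownr)
{
    if (rownr >= c->count)
        return -1;
    c->next = rownr;
    return 0;
}
```

Because the rows stay in the cache, seeking back to row 0 and fetching again returns the same data without another round trip to the server.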
Chapter 9: Application Programming Interfaces 229
To ease communication between application code and the cache entries, the user can bind
C-variables, both for input and output, to the query parameters and output columns,
respectively. The query parameters are indicated by '?' and may appear anywhere in the
query template.
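How a template with '?' placeholders turns into a query string can be sketched with a plain string substitution. The function fill_template is a hypothetical helper, not part of the Mapi library; it performs no quoting and replaces every '?', including one that happens to occur inside a string literal.

```c
#include <stdlib.h>
#include <string.h>

/* Replace each '?' in tmpl by the next string in argv (argc entries).
 * Returns a malloc'ed string, or NULL on allocation failure or when the
 * template has more placeholders than arguments. */
char *fill_template(const char *tmpl, const char **argv, int argc)
{
    size_t len = 1;                        /* room for the terminator */
    int used = 0;

    /* First pass: compute the size of the result. */
    for (const char *s = tmpl; *s; s++) {
        if (*s == '?') {
            if (used >= argc)
                return NULL;
            len += strlen(argv[used++]);
        } else {
            len++;
        }
    }

    char *out = malloc(len);
    if (out == NULL)
        return NULL;

    /* Second pass: copy text and substitute the arguments in order. */
    char *d = out;
    used = 0;
    for (const char *s = tmpl; *s; s++) {
        if (*s == '?') {
            size_t l = strlen(argv[used]);
            memcpy(d, argv[used++], l);
            d += l;
        } else {
            *d++ = *s;
        }
    }
    *d = '\0';
    return out;
}
```

For the template "insert into emp values(?, ?)" and the arguments 'John' (with its quotes) and 23, the result is the insert statement used in the sample program; the caller releases it with free().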
The Mapi library expects complete lines from the server as answers to query actions.
Incomplete lines lead to Mapi waiting forever on the server. Thus formatted printing is
discouraged in favor of tabular printing as offered by the table.print() commands.
The following steps are needed to get a working program. Compilation of the application
relies on the monetdb-clients-config program shipped with the distribution. It locates the
include files and library directories. Once properly installed, the application can be
compiled and linked as follows:
cc sample.c `monetdb-clients-config --cflags --libs` -lMapi -o sample
./sample
It assumes that the dynamic loadable libraries are in public places. If, however, the
system is installed in your private environment then the following option can be used on
most ELF platforms.
cc sample.c `monetdb-clients-config --cflags --libs` -lMapi -o sample \
`monetdb-clients-config --libs | sed -e 's:-L:-R:g'`
./sample
The compilation on Windows is slightly more complicated. It requires more attention
towards the location of the include files and libraries.
• MapiHdl mapi_stream_query(Mapi mid, const char *Command, int windowsize)
Send the request for processing and fetch a limited number of tuples (determined by
the window size) to assess any erroneous situation. Thereafter, prepare for continual
reading of tuples from the stream, until an error occurs. Each time a tuple arrives, the
cache is shifted by one.
• MapiHdl mapi_prepare(Mapi mid, const char *Command)
Move the query to a newly allocated query handle (which is returned). Possibly interact
with the back-end to prepare the query for execution.
• MapiMsg mapi_execute(MapiHdl hdl)
Ship a previously prepared command to the backend for execution. A single answer is
pre-fetched to detect any runtime error. MOK is returned upon success.
• MapiMsg mapi_execute_array(MapiHdl hdl, char **argv)
Similar to mapi_execute(), but replacing the placeholders with the string values provided.
• MapiMsg mapi_finish(MapiHdl hdl)
Terminate a query. This routine is used in the rare cases that consumption of the tuple
stream produced should be prematurely terminated. It is automatically called when a
new query using the same query handle is shipped to the database and when the query
handle is closed with mapi_close_handle().
• MapiMsg mapi_virtual_result(MapiHdl hdl, int columns, const char **columnnames,
const char **columntypes, const int *columnlengths, int tuplecount,
const char ***tuples)
Submit a table of results to the library that can then subsequently be accessed as if
it came from the server. columns is the number of columns of the result set and must
be greater than zero. columnnames is a list of pointers to strings giving the names of
the individual columns. Each pointer may be NULL and columnnames may be NULL
if there are no names. tuplecount is the length (number of rows) of the result set. If
tuplecount is less than zero, the number of rows is determined by a NULL pointer in
the list of tuples pointers. tuples is a list of pointers to row values. Each row value is a
list of pointers to strings giving the individual results. If one of these pointers is NULL
it indicates a NULL/nil value.
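The convention for the row count of a virtual result (an explicit count, or a NULL-terminated list when the count is negative) can be sketched as follows. The function count_rows is a hypothetical helper that only restates the rule given above; it is not a Mapi call.

```c
#include <stddef.h>

/* Number of rows in tuples: tuplecount if it is non-negative, otherwise
 * the number of entries before the first NULL row pointer. */
int count_rows(const char ***tuples, int tuplecount)
{
    if (tuplecount >= 0)
        return tuplecount;

    int n = 0;
    while (tuples[n] != NULL)
        n++;
    return n;
}
```

A NULL inside a row, such as the second field of a row { "Mary", NULL }, denotes a nil value and does not end the table; only a NULL in the list of row pointers does.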
9.1.9 Errors
• MapiMsg mapi_error(Mapi mid)
Return the last error code or 0 if there is no error.
• char *mapi_error_str(Mapi mid)
Return a pointer to the last error message.
• char *mapi_result_error(MapiHdl hdl)
Return a pointer to the last error message from the server.
• MapiMsg mapi_explain(Mapi mid, FILE *fd)
Write the error message obtained from mserver to a file.
9.1.10 Parameters
• MapiMsg mapi_bind(MapiHdl hdl, int fldnr, char **val)
Bind a string variable with a field in the return table. Upon a successful subsequent
mapi_fetch_row() the indicated field is stored in the space pointed to by val. Returns
an error if the field identified does not exist.
• MapiMsg mapi_bind_var(MapiHdl hdl, int fldnr, int type, void *val)
Bind a variable to a field in the return table. Upon a successful subsequent
mapi_fetch_row(), the indicated field is converted to the given type and stored in
the space pointed to by val. The types recognized are { MAPI_TINY, MAPI_UTINY,
MAPI_SHORT, MAPI_USHORT, MAPI_INT, MAPI_UINT, MAPI_LONG, MAPI_ULONG,
MAPI_LONGLONG, MAPI_ULONGLONG, MAPI_CHAR, MAPI_VARCHAR, MAPI_FLOAT,
MAPI_DOUBLE, MAPI_DATE, MAPI_TIME, MAPI_DATETIME }. The binding operations
should be performed after the mapi_execute() command. Subsequently all rows being
fetched also involve delivery of the field values in the C-variables using proper
conversion. For variable length strings a pointer is set into the cache.
• MapiMsg mapi_bind_numeric(MapiHdl hdl, int fldnr, int scale, int precision, void *val)
Bind to a numeric variable, internally represented by MAPI_INT. Describe the location
of a numeric parameter in a query template.
• MapiMsg mapi_clear_bindings(MapiHdl hdl)
Clear all field bindings.
• MapiMsg mapi_param(MapiHdl hdl, int fldnr, char **val)
Bind a string variable with the n-th placeholder in the query template. No conversion
takes place.
• MapiMsg mapi_param_type(MapiHdl hdl, int fldnr, int ctype, int sqltype, void *val)
Bind a variable whose type is described by ctype to a parameter whose type is described
by sqltype.
• MapiMsg mapi_param_numeric(MapiHdl hdl, int fldnr, int scale, int precision,
void *val)
Bind to a numeric variable, internally represented by MAPI_INT.
• MapiMsg mapi_param_string(MapiHdl hdl, int fldnr, int sqltype, char *val, int *sizeptr)
Bind a string variable, internally represented by MAPI_VARCHAR, to a parameter.
The sizeptr parameter points to the length of the string pointed to by val. If sizeptr
== NULL or *sizeptr == -1, the string is NULL-terminated.
• MapiMsg mapi_clear_params(MapiHdl hdl)
Clear all parameter bindings.
9.1.11 Miscellaneous
• MapiMsg mapi_setAutocommit(Mapi mid, int autocommit)
Set the autocommit flag (default is on). This only has an effect when the language is
SQL. In that case, the server commits after each statement sent to the server.
• MapiMsg mapi_setAlgebra(Mapi mid, int algebra)
Tell the backend to use or stop using the algebra-based compiler.
• MapiMsg mapi_cache_limit(Mapi mid, int maxrows)
A limited number of tuples are pre-fetched after each execute(). If maxrows is negative,
all rows will be fetched before the application is permitted to continue. Once
the cache is filled, a number of tuples are shuffled out to make room for new ones,
taking into account elements that have not been read yet. Filling the cache faster than
it is read leads to an error.
• MapiMsg mapi_cache_shuffle(MapiHdl hdl, int percentage)
Make room in the cache by shuffling percentage tuples out of the cache. It is sometimes
handy to do so, for example, when your application is stream-based and you process
each tuple as it arrives and still need a limited look-back. This percentage can be set
between 0 and 100. Setting shuffle to 100% (the default) leads to paging behavior, while
setting it to 1 leads to a sliding window over a tuple stream with 1% refreshing.
• MapiMsg mapi_cache_freeup(MapiHdl hdl, int percentage)
Forcefully shuffle the cache, making room for new rows. It ignores the read counter, so
rows may be lost.
• char * mapi_quote(const char *str, int size)
Escape special characters such as \n, \t in str with backslashes. The returned value is
a newly allocated string which should be freed by the caller.
• char * mapi_unquote(const char *name)
The reverse action of mapi_quote(), turning the database representation into a
C-representation. The storage space is dynamically created and should be freed after
use.
• MapiMsg mapi_output(Mapi mid, char *output)
Set the output format for results sent by the server.
• MapiMsg mapi_stream_into(Mapi mid, char *docname, char *colname, FILE *fp)
Stream a document into the server. The name of the document is specified in docname,
the collection is optionally specified in colname (if NULL, it defaults to docname), and
the content of the document comes from fp.
• MapiMsg mapi_profile(Mapi mid, int flag)
Set the profile flag to time commands sent to the server.
• MapiMsg mapi_trace(Mapi mid, int flag)
Set the trace flag to monitor interaction of the client with the library. It is primarily
used for debugging Mapi applications.
• int mapi_get_trace(Mapi mid)
Return the current value of the trace flag.
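The difference between paging and a sliding window in the cache shuffle description
above can be seen in a small Python model. This is a toy sketch, not the Mapi cache:
the class RowCache is hypothetical, it evicts the oldest rows, it ignores the read
counter, and the rounding of the percentage (at least one row) is an assumption.

```python
class RowCache:
    """Toy model of a bounded row cache with percentage-based shuffling."""

    def __init__(self, limit, shuffle_pct=100):
        self.limit = limit              # plays the role of maxrows
        self.shuffle_pct = shuffle_pct  # plays the role of the shuffle percentage
        self.rows = []

    def add(self, row):
        if len(self.rows) >= self.limit:
            # Cache is full: evict the oldest shuffle_pct percent, at least one row.
            n = max(1, self.limit * self.shuffle_pct // 100)
            del self.rows[:n]
        self.rows.append(row)


paging = RowCache(limit=4, shuffle_pct=100)   # whole cache is replaced when full
window = RowCache(limit=4, shuffle_pct=25)    # one row at a time slides out
for r in range(6):
    paging.add(r)
    window.add(r)
print(paging.rows)
print(window.rows)
```

With a shuffle of 100 the cache restarts from empty each time it fills up, while a small
percentage keeps most of the recent rows available for look-back.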
}
}
}
{
    # get values of the first column from each row:
    my $row = $dbh->selectcol_arrayref('print(b);');
    print "head[$_]: $row->[$_]\n" for 0 .. 1;
}
{
    my @row = $dbh->selectrow_array('print(b);');
    print "field[0]: $row[0]\n";
    print "field[1]: $row[1]\n";
}
{
    my $row = $dbh->selectrow_arrayref('print(b);');
    print "field[0]: $row->[0]\n";
    print "field[1]: $row->[1]\n";
}
$dbh->disconnect;
print "\nFinished\n";
<html>
<head>
<title>MonetDB Query</title>
</head>
<body>
<?php
if ( isset($_POST['query']) )
{
    $db = monetdb_connect('sql', 'localhost', 50000, 'monetdb',
                          'monetdb')
        or die('Failed to connect to MonetDB<br>');
    $sql = stripslashes($_POST['query']);
    $res = monetdb_query($sql);
    while ( $row = monetdb_fetch_assoc($res) )
    {
        print "<pre>\n";
        print_r($row);
        print "</pre>\n";
    }
}
?>
</body>
</html>
More examples can be found in the sources.
The PHP module is aligned with the PostgreSQL implementation. A synopsis of the
operations provided:
• proto resource monetdb_connect([string host [, string port [, string username [, string
password [, string language]]]]]) Open a MonetDB connection
• proto resource monetdb_pconnect([string host [, string port [, string username [, string
password [, string language]]]]]) Open a persistent MonetDB connection
• proto bool monetdb_close([resource connection]) Close a MonetDB connection
• proto string monetdb_dbname([resource connection]) Get the database name
• proto string monetdb_last_error([resource connection]) Get the error message string
• proto string monetdb_host([resource connection]) Returns the host name associated
with the connection
• proto array monetdb_version([resource connection]) Returns an array with client,
protocol and server version (when available)
• proto bool monetdb_ping([resource connection]) Ping database. If connection is bad,
try to reconnect.
• proto resource monetdb_query([resource connection,] string query) Execute a query
• proto resource monetdb_query_params([resource connection,] string query, array
params) Execute a query
• proto resource monetdb_prepare([resource connection,] string stmtname, string query)
Prepare a query for future execution
• proto resource monetdb_execute([resource connection,] string stmtname, array params)
Execute a prepared query
• proto int monetdb_num_rows(resource result) Return the number of rows in the result
• proto int monetdb_num_fields(resource result) Return the number of fields in the result
• proto int monetdb_affected_rows(resource result) Returns the number of affected tuples
• proto string pg_last_notice(resource connection) Returns the last notice set by the
back-end
• proto string monetdb_field_name(resource result, int field_number) Returns the name
of the field
• proto string monetdb_field_table(resource result, int field_number) Returns the name
of the table field belongs to
• proto string monetdb_field_type(resource result, int field_number) Returns the type of
the field
• proto int monetdb_field_num(resource result, string field_name) Returns the field
number of the named field
• proto mixed monetdb_fetch_result(resource result, [int row_number,] mixed field_name)
Returns values from a result identifier
• proto array monetdb_fetch_row(resource result [, int row [, int result_type]]) Get a row
as an enumerated array
• proto array monetdb_fetch_assoc(resource result [, int row]) Fetch a row as an assoc
array
• proto array monetdb_fetch_array(resource result [, int row [, int result_type]]) Fetch a
row as an array
• proto object monetdb_fetch_object(resource result [, int row [, string class_name [,
NULL|array ctor_params]]]) Fetch a row as an object
• proto bool monetdb_result_seek(resource result, int offset) Set internal row offset
• proto int monetdb_field_prtlen(resource result, [int row,] mixed field_name_or_number)
Returns the printed length
• proto int monetdb_field_is_null(resource result, [int row,] mixed field_name_or_number)
Test if a field is NULL
• proto bool monetdb_free_result(resource result) Free result memory
• proto string monetdb_escape_string(string data) Escape string for text/char type
• proto int monetdb_connection_status(resource connection) Get connection status
9.4.1 Installation
The Unix configure process normally tries to detect whether you have Python, including
its developer packages, installed and builds the Python module only if you do. With the
--with-python option you can tell ’configure’ where to find the Python installation.
When the build process is complete you should have a Python library directory
under your MonetDB prefix directory. Usually this is something like
prefix/lib(64)/python2.4/site-packages. The exact location is revealed by executing
monetdb-clients-config --pythonlibdir. Now add this directory to your
PYTHONPATH.
For Windows setups the story is a little more complex. TODO
c = x.cursor()
c.execute('select * from tables')
print c.fetchone()
# print c.fetchall()
x.close()
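The cursor calls in the fragment above have the shape of the Python DB-API. Because
the statement that creates the connection x is not shown, the sketch below uses the
standard library module sqlite3 as a stand-in for the MonetDB module, purely to show
how cursor(), execute(), fetchone() and fetchall() interact. The table name demo and
its contents are made up for the illustration.

```python
import sqlite3

# sqlite3 stands in for the MonetDB connection object called x above.
x = sqlite3.connect(":memory:")
c = x.cursor()
c.execute("create table demo (name text)")
c.execute("insert into demo values ('a')")
c.execute("insert into demo values ('b')")

c.execute("select * from demo order by name")
first = c.fetchone()   # one row, as a tuple
rest = c.fetchall()    # the remaining rows, as a list of tuples
print(first)
print(rest)
x.close()
```

fetchone() consumes one row from the result, so a following fetchall() only returns
what is left.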
import java.sql.*;

/**
 * This example assumes there exist tables a and b filled with some data.
 * On these tables some queries are executed and the JDBC driver is tested
 * on its accuracy and robustness against 'users'.
 *
 * @author Fabian Groffen
 */
public class MJDBCTest {
    public static void main(String[] args) throws Exception {
        // make sure the driver is loaded
        Class.forName("nl.cwi.monetdb.jdbc.MonetDriver");
        Connection con = DriverManager.getConnection("jdbc:monetdb://localhost/database", "m
        Statement st = con.createStatement();
        ResultSet rs;
        st.setMaxRows(5);
        // we ask the database for 22 rows, while we set the JDBC driver to
        // 5 rows, this shouldn't be a problem at all...
        rs = st.executeQuery("select * from a limit 22");
        // read till the driver says there are no rows left
        for (int i = 0; rs.next(); i++) {
            System.out.print("[" + rs.getString("var1") + "]");
            System.out.print("[" + rs.getString("var2") + "]");
            System.out.print("[" + rs.getInt("var3") + "]");
            System.out.println("[" + rs.getString("var4") + "]");
        }
        // this close is not strictly needed, as the Statement closes its last
        // ResultSet when the Statement itself is closed;
        // still, if that can take some time, it's nicer to close immediately.
        // The closes are commented out here to test whether
        // the driver really cleans up its mess like it should
        //rs.close();
The ODBC driver for MonetDB is included in the Windows installer and Linux RPMs.
The source can be found in the SQL CVS tree.
To help you set up your system to use the ODBC driver with MonetDB, two how-tos are
available, one for Windows users and one for Linux/UNIX users.
In Excel, select from the drop down menu, first Data, then Get External Data, and finally
New Database Query...
If MonetDB was installed correctly, there should be an entry MonetDB in the dialog
box that opens. Select it and click on OK.
In the wizard that opens, scroll down in the list on the left hand side and select voyages.
Then click on the button labeled > and then on Next >.
A new dialog window opens. Click on OK to insert the data into the current Excel
worksheet.
That’s all.
As Superuser, start the unixODBC configuration program ODBCConfig and select the
Drivers tab.
On this tab, click on the button labeled Add... and fill in the fields as follows.
Name         MonetDB
Description  ODBC Driver for MonetDB SQL Server
Driver       <path-to-MonetDB>/lib(64)/libMonetODBC.so
Setup        <path-to-MonetDB>/lib(64)/libMonetODBCs.so
Don’t change the other fields. When done, click on the check mark in the top left corner
of the window. The first window should now contain an entry for MonetDB. Click on OK.
On the User DSN tab click on the Add... button. A new window pops up in which you
have to select the ODBC driver. Click on the entry for MonetDB and click on OK.
Name         MonetDB
Description  Default MonetDB Data Source
Host         localhost
Port         50000
User         monetdb
Password     monetdb
Don’t change the other fields. When done, click on the check mark in the top left corner
of the window. The first window should now contain an entry for MonetDB. Click on OK.
Appendix A: Instruction Summary 260
crackers.deleteAVL crackers.fmcreateMap crackers.insertionsPartiallyForget
crackers.deletionsOnNeed crackers.joinselect crackers.pmtselect
crackers.deletionsOnNeedGradually crackers.joinuselect crackers.positionproject
crackers.deletionsOnNeedGraduallyRipple crackers.mapCount crackers.printAVLTree_int
crackers.djoinselect crackers.fmremoveMap crackers.markedproject
crackers.dproject crackers.fullAlignment crackers.printCrackerBAT
date.!= date.<= date.> date.date
date.< date.== date.>= date.isnil
daytime.!= daytime.<= daytime.> daytime.isnil
daytime.< daytime.== daytime.>=
factories.getArrival factories.getDeparture factories.shutdown
factories.getCaller factories.getOwners factories.getPlants
group.avg group.max group.prelude group.size
group.count group.min group.refine group.sum
group.derive group.new group.refine_reverse group.variance
identifier.identifier identifier.prelude
inet.!= inet.> inet.host inet.new
inet.< inet.>= inet.hostmask inet.setmasklen
inet.<< inet.>> inet.isnil inet.text
inet.<<= inet.>>= inet.masklen
inet.<= inet.abbrev inet.netmask
inet.= inet.broadcast inet.network
inspect.equalType inspect.getComment inspect.getSignature inspect.getStatistics
inspect.getAddress inspect.getDefinition inspect.getType
inspect.getAddresses inspect.getEnvironment inspect.getTypeIndex
inspect.getAtomNames inspect.getFunction inspect.getSignatures
inspect.getAtomSizes inspect.getKind inspect.getSize inspect.getTypeName
inspect.getAtomSuper inspect.getModule inspect.getSource inspect.getWelcome
io.data io.import io.prompt io.stdout
io.export io.print io.stderr io.table
io.ftable io.printf io.stdin
language.assert language.newRange language.setIOTrace language.source
language.assertSpace language.nextElement language.setMemoryTrace
language.call language.raise language.setThreadTrace
language.dataflow language.register language.setTimerTrace
lock.create lock.set lock.try
lock.destroy lock.tostr lock.unset
mal.multiplex
manual.completion manual.help manual.search manual.summary
manual.createXML manual.index manual.section
mapi.bind mapi.fetch_field_array mapi.query_array
mapi.connect mapi.fetch_line mapi.listen_ssl mapi.query_handle
mapi.connect_ssl mapi.fetch_reset mapi.lookup mapi.reconnect
mapi.destroy mapi.fetch_row mapi.malclient mapi.resume
mapi.disconnect mapi.finish mapi.next_result mapi.rpc
mapi.error mapi.getError mapi.ping mapi.setAlias
mapi.explain mapi.get_field_count mapi.prepare mapi.stop
The table below summarizes the commentary lines encountered in the system that are
associated with the MAL kernel modules.
algebra.hashsplit Split a BAT on the tail column according to (hash-value MOD buckets).
Returns a recursive BAT, containing the fragments in the tail and their bucket
number in the head.
algebra.indexjoin Hook directly into the index implementation of the join.
algebra.intersect
algebra.join Returns all BUNs, consisting of a head-value from ’left’ and a tail-value
from ’right’ for which there are BUNs in ’left’ and ’right’ with equal
tail- resp. head-value (i.e. the join columns are projected out).
algebra.joinPath internal routine to handle join paths. The type analysis is rather tricky.
algebra.kdifference Returns the difference taken over only the *head* columns of
two BATs. Results in all BUNs of ’left’ that are *not* in
’right’. It does *not* do double-elimination over the ’left’ BUNs.
If you want this, use: ’kdifference(left.kunique,right.kunique)’ or:
’kdifference(left,right).kunique’.
algebra.kintersect Returns the intersection taken over only the *head* columns of
two BATs. Results in all BUNs of ’left’ that are also in
’right’. Does *not* do double-elimination over the ’left’ BUNs.
If you want this, use: ’kintersect(kunique(left),kunique(right))’ or:
’kunique(kintersect(left,right))’.
algebra.kunion Returns the union of two BATs, looking at head-columns only.
Results in all BUNs of ’left’ that are not in ’right’, plus all BUNs of
’right’. *No* double-elimination is done. If you want this, do:
’kunion(left.kunique,right.kunique)’ or: ’sunion(left,right).kunique’.
algebra.kunique Select unique tuples from the input BAT. Double elimination is done
only looking at the head column. The result is a BAT with property
hkeyed() == true.
algebra.leftfetchjoin
Hook directly into the left fetch join implementation.
algebra.leftjoin
algebra.like Selects all elements that have ’substr’ as in the tail.
algebra.markH Produces a new BAT with a fresh, unique, dense sequence of OIDs in the
head that starts at base (i.e. [base,..base+b.count()-1] ).
algebra.markT Produces a BAT with fresh unique OIDs in the tail starting at 0@0.
algebra.mark_grp "grouped mark": Produces a new BAT with, per group, a locally unique
dense ascending sequence of OIDs in the tail. The tail of the first BAT
(b) identifies the group that each BUN of b belongs to. The second
BAT (g) represents the group extent, i.e., the head is the unique list
of group IDs from b’s tail. The third argument (s) gives the base value
for the new OID sequence of each group.
algebra.materialize Materialize the void column
algebra.merge Merge head and tail into a single value
algebra.mergejoin Hook directly into the merge implementation of the join.
algebra.number Produces a new BAT with identical head column, and consecutively
increasing integers (start at 0) in the tail column.
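The head-column set operations described above (algebra.kdifference, algebra.kintersect,
algebra.kunion and algebra.kunique) can be tried out with a small Python model in which
a BAT is simply a list of (head, tail) pairs. This is only a sketch of the documented
semantics, not the kernel implementation; the order of the result and the choice of which
duplicate kunique keeps (the first one) are assumptions.

```python
def kunique(bat):
    """Keep one BUN per distinct head value (here: the first one seen)."""
    seen, out = set(), []
    for head, tail in bat:
        if head not in seen:
            seen.add(head)
            out.append((head, tail))
    return out


def kdifference(left, right):
    """BUNs of left whose head does not occur in right; duplicates are kept."""
    heads = {h for h, _ in right}
    return [(h, t) for h, t in left if h not in heads]


def kintersect(left, right):
    """BUNs of left whose head also occurs in right; duplicates are kept."""
    heads = {h for h, _ in right}
    return [(h, t) for h, t in left if h in heads]


def kunion(left, right):
    """BUNs of left not in right (by head), plus all BUNs of right."""
    return kdifference(left, right) + list(right)


left = [(1, "a"), (1, "b"), (2, "c"), (3, "d")]
right = [(3, "x"), (4, "y")]
print(kdifference(left, right))
print(kintersect(left, right))
print(kunion(left, right))
print(kunique(kdifference(left, right)))
```

The last call shows the pattern recommended in the table: apply kunique to the result
when double-elimination is wanted.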
Appendix B: Instruction Help 269
algebra.outerjoin Returns all results of a join, plus the BUNs formed by NIL in the
tail and the head-values of ’outer’ whose tail-value does not match a
head-value in ’inner’.
algebra.position Returns the position of the value pair. It returns an error if ’val’ does
not exist.
algebra.project Fill the tail column with a constant taken from the aligned BAT.
algebra.rangesplit Split a BAT on tail column in ’ranges’ equally sized consecutive ranges.
Returns a recursive BAT, containing the fragments in the tail, the
higher-bound of the range in the head. The higher bound of the last
range is ’nil’.
algebra.reuse Reuse a temporary BAT if you can. Otherwise, allocate enough storage
to accept result of an operation (not involving the heap)
algebra.revert Returns a BAT copy with buns in reverse order
algebra.sample Produce a random selection of size ’num’ from the input BAT.
algebra.sdifference Returns the difference taken over *both* columns of two BATs.
Results in all BUNs of ’left’ that are *not* in ’right’. Does *not* do
double-elimination over the ’left’ BUNs. If you want this, use:
’sdifference(left.sunique,right.sunique)’ or: ’sdifference(left,right).sunique’.
algebra.select Select all BUNs of a BAT with a certain tail value. Selection on NIL
is also possible (it should be properly cast, e.g. int(nil)).
algebra.selectH
algebra.selectNotNil Select all not-nil values
algebra.semijoin Returns the intersection taken over only the *head* columns of
two BATs. Results in all BUNs of ’left’ that are also in
’right’. Does *not* do double-elimination over the ’left’ BUNs.
If you want this, use: ’kintersect(kunique(left),kunique(right))’ or:
’kunique(kintersect(left,right))’.
algebra.sintersect Returns the intersection taken over *both* columns of two BATs.
Results in all BUNs of ’left’ that are also in ’right’. Does *not* do
double-elimination over the ’left’ BUNs. If you want this, use:
’sintersect(sunique(left),sunique(right))’ or: ’sunique(sintersect(left,right))’.
algebra.slice Return the slice with the BUNs at position x till y.
algebra.sort Returns a BAT copy sorted on the head column.
algebra.sortHT Returns a lexicographically sorted copy on head,tail.
algebra.sortReverse Returns a BAT copy reversely sorted on the tail column.
algebra.sortReverseTail
Returns a BAT copy reversely sorted on the tail column.
algebra.sortTH Returns a lexicographically sorted copy on tail,head.
algebra.sortTail Returns a BAT copy sorted on the tail column.
algebra.split Split head into two values
algebra.ssort Returns copy of a BAT with the BUNs sorted on ascending head values.
This is a stable sort.
algebra.ssort_rev Returns copy of a BAT with the BUNs sorted on descending head
values. This is a stable sort.
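The entries for algebra.join, algebra.outerjoin and algebra.semijoin can be read side by
side in the following Python model, where a BAT is again a list of (head, tail) pairs.
It is a sketch of the documented semantics only: None stands in for nil, the nested
loop is chosen for clarity rather than speed, and the order of the result is an
assumption.

```python
NIL = None  # stand-in for MonetDB's nil value


def join(left, right):
    """Pair left heads with right tails where left tail equals right head."""
    return [(lh, rt) for lh, lt in left for rh, rt in right if lt == rh]


def outerjoin(outer, inner):
    """Join result, plus (head, NIL) for outer BUNs without a match in inner."""
    inner_heads = {h for h, _ in inner}
    unmatched = [(oh, NIL) for oh, ot in outer if ot not in inner_heads]
    return join(outer, inner) + unmatched


def semijoin(left, right):
    """BUNs of left whose head also occurs as a head in right."""
    heads = {h for h, _ in right}
    return [(h, t) for h, t in left if h in heads]


orders = [(10, "nl"), (11, "de"), (12, "fr")]
names = [("nl", "Netherlands"), ("de", "Germany")]
print(join(orders, names))
print(outerjoin(orders, names))
print(semijoin(orders, [(10, 0), (12, 0)]))
```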
algebra.sunion Returns the union of two BATs; looking at both columns of both
BATs. Results in all BUNs of ’left’ that are not in ’right’, plus all
BUNs of ’right’. *no* double-elimination is done. If you want this, do:
’sunion(left.sunique,right.sunique)’ or: ’sunion(left,right).sunique’.
algebra.sunique Select unique tuples from the input BAT. Double elimination is done
over BUNs as a whole (head and tail). Result is a BAT with real set()
semantics.
algebra.thetajoin Theta join for ’mode’ in { LE, LT, EQ, GT, GE }. JOIN_EQ is
just the same as join(). All other options use merge algorithms, either
using the fact that the inputs are already ordered() (left on tail, right on
head), or by using/creating binary search trees on the join columns.
algebra.thetaselect The theta (<=,<,=,>,>=) select()
algebra.thetauselect The theta (<=,<,=,>,>=) select() limited to head values
algebra.topN Trim all but the top N tuples.
algebra.tunique Select unique tuples from the input BAT. Double elimination is done
over the BUNs tail. The result is a BAT with property tkeyed() == true.
algebra.uhashsplit Same as hashsplit, but only collect the head values in the fragments
algebra.union
algebra.unique
algebra.urangesplit Same as rangesplit, but only collect the head values in the fragments
algebra.uselect Value select, but returning only the head values. SEE
ALSO:select(bat,val)
array.grid Fills an index BAT, (grpcount,grpsize,clustersize,offset) and shifts all
elements by a factor s
array.product Produce an array product
array.project Fill an array representation with constants
bat.append append the value u to i
bat.attach Returns a new BAT with dense head and tail of the given type and
uses the given file to initialize the tail. The file will be owned by the
server.
bat.delete Delete from the first BAT all BUNs with a corresponding BUN in the
second.
bat.densebat Creates a new [void,void] BAT of size ’size’.
bat.flush Designate a BAT as not needed anymore.
bat.getAccess return the access mode attached to this BAT as a character.
bat.getAlpha Obtain the list of BUNs added
bat.getCapacity Returns the current allocation size (in max number of elements) of a
BAT.
bat.getDelta Obtain the list of BUNs deleted
bat.getDiskSize Approximate size of the (persistent) BAT heaps as stored on disk in
pages of 512 bytes. Indices are not included, as they only live tem-
porarily in virtual memory.
bat.getHead return the BUN head value using the cursor.
bat.getHeadType Returns the type of the head column of a BAT, as an integer type
number.
bat.getHeat Return the current BBP heat (LRU stamp)
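To make the modes of algebra.thetajoin, listed earlier in this table, more concrete, the
sketch below evaluates a theta join over two small BATs modelled as lists of
(head, tail) pairs. It is a hedged illustration: the comparison direction (left tail
compared with right head) is an assumption carried over from the description of join(),
and a plain nested loop replaces the merge algorithms and search trees mentioned in the
entry.

```python
import operator

# Mode names follow the table entry; the mapping to operators is an assumption.
MODES = {
    "LE": operator.le,
    "LT": operator.lt,
    "EQ": operator.eq,
    "GT": operator.gt,
    "GE": operator.ge,
}


def thetajoin(left, right, mode):
    """Pair left heads with right tails where (left tail) <mode> (right head)."""
    compare = MODES[mode]
    return [(lh, rt) for lh, lt in left for rh, rt in right if compare(lt, rh)]


left = [("a", 1), ("b", 5)]
right = [(3, "x"), (5, "y")]
print(thetajoin(left, right, "EQ"))
print(thetajoin(left, right, "LT"))
print(thetajoin(left, right, "GE"))
```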
bat.getMemorySize Calculate the size of the BAT heaps and indices in bytes rounded to
the memory page size (see bbp.getPageSize()).
bat.getName Gives back the logical name of a BAT.
bat.getRole Returns the rolename of the head column of a BAT.
bat.getSequenceBase Get the sequence base for the void column of a BAT.
bat.getSize Calculate the size of the BAT descriptor, heaps and indices in bytes.
bat.getSpaceUsed Determine the total space (in bytes) occupied by a BAT.
bat.getStorageSize Determine the total space (in bytes) reserved for a BAT.
bat.getTail return the BUN tail value using the cursor.
bat.getTailType Returns the type of the tail column of a BAT, as an integer type
number.
bat.hasAppendMode return true if this BAT is append only.
bat.hasMoreElements Produce the next bun for processing.
bat.hasReadMode return true if this BAT is read only.
bat.hasWriteMode return true if this BAT is read and write.
bat.info Produce a BAT containing info about a BAT in [attribute,value] for-
mat. It contains all properties of the BAT record. See the BAT docu-
mentation in GDK for more information.
bat.inplace inplace replace values on the given locations
bat.insert Insert one BUN[h,t] in a BAT.
bat.isCached Bat is stored in main memory.
bat.isPersistent
bat.isSorted Returns whether a BAT is ordered on head or not.
bat.isSortedReverse Returns whether a BAT is ordered on head or not.
bat.isSynced Tests whether two BATs are synced or not.
bat.isTransient
bat.isaKey return whether the head column of a BAT is unique (key).
bat.isaSet return whether the BAT mode is set to unique.
bat.load Load a particular BAT from disk
bat.mirror Returns the head-mirror image of a BAT (two head columns).
bat.new Localize a bat by name and produce a clone.
bat.newIterator Process the buns one by one extracted from a void table.
bat.order Sorts the BAT itself on the head, in place.
bat.orderReverse Reverse sorts the BAT itself on the head, in place.
bat.pack Pack a pair of values into a BAT.
bat.partition Create a series of cheap slices over the first argument
bat.reduce Drop auxiliary BAT structures.
bat.replace Replace the tail value of one BUN that has some head value.
bat.reverse Returns the reverse view of a BAT (head is tail and tail is head).
BEWARE no copying is involved; input and output refer to the same
object!
bat.revert Puts all BUNs in a BAT in reverse order. (Belongs to the BAT sequence
module)
bat.save Save a BAT to storage, if it was loaded and dirty. Returns whether
IO was necessary. Please realize that calling this function violates the
atomic commit protocol!!
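The warning attached to bat.reverse above, that no copying is involved and that input
and output refer to the same object, can be illustrated with a toy Python class. The
class Bat and its methods are hypothetical and only model a BAT as two parallel lists;
bat.mirror is included for comparison.

```python
class Bat:
    """Toy BAT: two parallel lists, head and tail."""

    def __init__(self, head, tail):
        self.head = head
        self.tail = tail

    def reverse(self):
        # Swap the roles of the two columns without copying the lists.
        return Bat(self.tail, self.head)

    def mirror(self):
        # Both columns of the result are the head column.
        return Bat(self.head, self.head)

    def buns(self):
        return list(zip(self.head, self.tail))


b = Bat([1, 2, 3], ["a", "b", "c"])
r = b.reverse()
before = r.buns()
b.tail[0] = "z"        # an update through the original ...
after = r.buns()       # ... is visible through the reverse view
print(before)
print(after)
print(b.mirror().buns())
```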
bat.setAccess Try to change the update access privileges to this BAT. Mode:
r[ead-only] - allow only read access. a[ppend-only] - allow reads and update.
w[riteable] - allow all operations. BATs are updatable by default. On
making a BAT read-only, all subsequent updates fail with an error
message. Returns the BAT itself.
bat.setAppendMode Change access privilege of BAT to append only
bat.setBase Give the non-empty BATs consecutive oid bases.
bat.setCold Makes a BAT very cold for the BBP. The chance of being chosen for
swapout is big, afterwards.
bat.setColumn Give both columns of a BAT a new name.
bat.setGarbage Designate a BAT as garbage.
bat.setHash
bat.setHot Makes a BAT very hot for the BBP. The chance of being chosen for
swapout is small, afterwards.
bat.setKey Sets the ’key’ property of the head column to ’mode’. In ’key’ mode, the
kernel will silently block insertions that cause duplicate entries in the
head column. KNOWN BUG: when ’key’ is set to TRUE, this function
does not automatically eliminate duplicates. Use b := b.kunique;
bat.setMemoryAdvise Alias for madvise(b, mode, mode, mode, mode)
bat.setMemoryMap Alias for mmap(b, mode, mode, mode, mode)
bat.setName Give a logical name to a BAT.
bat.setPersistent Make the BAT persistent. Returns boolean which indicates if the BAT
administration has indeed changed.
bat.setReadMode Change access privilege of BAT to read only
bat.setRole Give a logical name to the columns of a BAT.
bat.setSet Sets the ’set’ property on this BAT to ’mode’. In ’set’ mode, the
kernel will silently block insertions that cause duplicate BUN [head,tail]
entries in the BAT. KNOWN BUG: when ’set’ is set to TRUE, this
function does not automatically eliminate duplicates. Use b := b.sunique;
Returns the BAT itself.
bat.setSorted Assure BAT is ordered on the head.
bat.setTransient Make the BAT transient. Returns boolean which indicates if the BAT
administration has indeed changed.
bat.setWriteMode Change access privilege of BAT to read and write
bat.unload Swapout a BAT to disk. Transient BATs can also be swapped out.
Returns whether the unload indeed happened.
bat.unpack Extract the first tuple from a BAT.
batcalc.!= Equate a bat of strings against a singleton
batcalc.% Binary BAT calculator function with new BAT result
batcalc.* Binary BAT calculator function with new BAT result
batcalc.+ Concatenate two strings.
batcalc.++ Unary minus over the tail of the bat
batcalc.- Unary minus over the tail of the bat
batcalc.-- Unary minus over the tail of the bat
batcalc./ Binary BAT calculator function with new BAT result
batcalc.< Compare a bat of timestamp against a singleton
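The behaviour described for bat.setKey earlier in this table, including its known bug,
can be modelled in a few lines of Python. The class KeyedBat is hypothetical and only
mirrors the documented behaviour: once the key property is set, insertions with a
duplicate head are silently blocked, but duplicates that were already present stay in
place.

```python
class KeyedBat:
    """Toy BAT with a 'key' property on the head column."""

    def __init__(self, buns):
        self.buns = list(buns)
        self.key = False

    def set_key(self, mode):
        # Mirrors the documented known bug: existing duplicates are not removed.
        self.key = mode
        return self

    def insert(self, head, tail):
        if self.key and any(h == head for h, _ in self.buns):
            return False              # silently blocked, no error raised
        self.buns.append((head, tail))
        return True


b = KeyedBat([(1, "a"), (1, "b")]).set_key(True)
blocked = b.insert(1, "c")    # head 1 already exists
accepted = b.insert(2, "d")   # new head value
print(blocked, accepted)
print(b.buns)
```

The duplicate head 1 survives in the output, which is why the table recommends applying
kunique explicitly.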
batmmath.log
batmmath.log10
batmmath.pow
batmmath.sin
batmmath.sinh
batmmath.sqrt
batmmath.tan
batmmath.tanh
batmtime.day
batmtime.hours
batmtime.milliseconds
batmtime.month
batmtime.seconds
batmtime.year
batstr.chrAt String array lookup operation.
batstr.endsWith Suffix check.
batstr.length Return the length of a string.
batstr.like
batstr.like_uselect Perform SQL like operation against a string bat
batstr.ltrim Strip whitespaces from start of a string.
batstr.nbytes Return the string length in bytes.
batstr.r_search Reverse search for a substring. Returns position, -1 if not found.
batstr.replace Insert a string into another
batstr.rtrim Strip whitespaces from end of a string.
batstr.search Search for a substring. Returns position, -1 if not found.
batstr.startsWith Prefix check.
batstr.string Return the tail s[offset..n] of a string s[0..n].
batstr.substitute Substitute first occurrence of ’src’ by ’dst’. Iff repeated = true this
is repeated while ’src’ can be found in the result string. In order to
prevent recursion and result strings of unlimited size, repeating is only
done iff src is not a substring of dst.
batstr.substring Substring extraction using [start,start+length]
batstr.toLower Convert a string to lower case.
batstr.toUpper Convert a string to upper case.
batstr.trim Strip whitespaces around a string.
batstr.unicodeAt get a unicode character (as an int) from a string position.
bbp.bind Locate the BAT using its BBP index in the BAT buffer pool
bbp.close Close the bbp box.
bbp.commit Commit updates for this client.
bbp.deposit Relate a logical name to a physical BAT in the buffer pool.
bbp.destroy Schedule a BAT for removal at session end or immediately.
bbp.discard Remove the BAT from the box.
bbp.getCount Create a BAT with the cardinalities of all known BATs
bbp.getDirty Create a BAT with the dirty/ diffs/clean status
bbp.getDiskSpace Estimate the amount of disk space occupied by dbfarm
bbp.getHeadType Map a BAT into its head type
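The rule given for batstr.substitute above (replace the first occurrence, and repeat
only when ’src’ is not a substring of ’dst’) is easier to follow in code. The Python
function below is a sketch of that description applied to a single string, not the
kernel routine. The max_rounds cap is an extra safeguard added for this sketch and is
not part of the documented behaviour.

```python
def substitute(s, src, dst, repeated, max_rounds=1000):
    """Replace the first occurrence of src by dst; optionally keep repeating."""
    if src == "" or src not in s:
        return s
    result = s.replace(src, dst, 1)
    # Repeat only when dst cannot directly reintroduce src.
    if repeated and src not in dst:
        for _ in range(max_rounds):   # safety cap: an assumption of this sketch
            if src not in result:
                break
            result = result.replace(src, dst, 1)
    return result


print(substitute("a-b-c", "-", "+", False))
print(substitute("a-b-c", "-", "+", True))
print(substitute("a-b-c", "-", "--", True))
```

In the last call ’src’ is a substring of ’dst’, so only the first occurrence is replaced
even though repetition was requested.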
calc.sqlblob
calc.str coercion dbl to str
calc.timestamp
calc.void
calc.wrd coercion dbl to wrd
calc.xor
clients.addScenario add the given scenario to the allowed scenarios for the given user
clients.addUser Allow user with password access to the given scenarios
clients.changePassword
Change the password for the current user
clients.changeUsername
Change the username of the user into the new string
clients.checkPermission
Check permission for a user
clients.exit Terminate the session for a single client using a soft error.
clients.getActions Pseudo bat of client’s command counts.
clients.getId Return a number that uniquely represents the current client.
clients.getInfo Pseudo bat with client attributes.
clients.getLastCommand
Pseudo bat of client’s last command time.
clients.getLogins Pseudo bat of client login time.
clients.getScenario Retrieve current scenario name.
clients.getTime Pseudo bat of client’s total time usage(in usec).
clients.getUsername Return the username of the currently logged in user
clients.getUsers return a BAT with user id and name available in the system with access
to the given scenario(s)
clients.quit Terminate the server. This command can only be initiated from the
console.
clients.removeScenario
remove the given scenario from the allowed scenarios for the given user
clients.removeUser Remove the given user from the system
clients.setHistory Designate console history file for readline.
clients.setListing Turn on/off echo of MAL instructions: 2 - show mal instruction, 4 -
show details of type resolution, 8 - show binding information.
clients.setPassword Set the password for the given user
clients.setScenario Switch to other scenario handler, return previous one.
clients.shutdown Close all client connections. If forced=false the clients are moved into
FINISHING mode, which means that the process stops at the next
cycle of the scenario. If forced=true all client processes are immediately
killed
clients.stop Stop the query execution at the next eligible statement.
clients.suspend Put a client process to sleep for some time. It will simply sleep for a
second at a time, until the awake bit has been set in its descriptor
clients.wakeup Wakeup a client process
cluster.column Reorder tail of the BAT using the cluster map
cluster.key Create the hash key list
crackers.buildAVLIndex
Create an AVL tree index for this BAT
crackers.cacheConsciousCrackHashJoin
Join two maps based on head values. Align the maps to avoid overlap-
ping pieces. Reuse hash tables
crackers.cacheConsciousCrackHashJoinAlignOnly
Join two maps based on head values. Align the maps to avoid overlap-
ping pieces. Reuse hash tables
crackers.crackHashJoin
Join two maps based on head values. Align the maps to avoid overlap-
ping pieces. Reuse hash tables
crackers.crackJoin Join two maps based on head values. Align the maps to avoid overlap-
ping pieces
crackers.crackOrdered
Break a BAT into three pieces with tail<mid, tail==mid, tail>mid,
respectively; maintaining the head-oid order within each piece.
crackers.crackOrdered_validate
Validate whether a BAT is correctly broken into five pieces with
tail<low, tail==low, low<tail<hgh, tail==hgh, tail>hgh, respectively;
maintaining the head-oid order within each piece.
crackers.crackUnordered_validate
Validate whether a BAT is correctly broken into five pieces with
tail<low, tail==low, low<tail<hgh, tail==hgh, tail>hgh, respectively.
crackers.deleteAVL Delete a collection of values from the index
crackers.deletionsOnNeed
Keep the deletions BAT separately and do a complete merge only if a
relevant query arrives in the future
crackers.deletionsOnNeedGradually
Keep the deletions BAT separately and merge only what is needed if
a relevant query arrives in the future
crackers.deletionsOnNeedGraduallyRipple
Keep the deletions BAT separately and merge only what is needed
using ripple if a relevant query arrives in the future
crackers.djoinselect Use the pivot. For each tuple in pivot with a 0, check if the respective
tuple (in the same position) in the tail of cpair satisfies the range
restriction. If so, mark the pivot BUN as 1.
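The disjunctive update of the pivot described for crackers.djoinselect can be pictured with a short Python sketch. It assumes the pivot is a list of 0/1 bits aligned by position with the tail values, and that the range restriction is inclusive on both ends; neither detail is confirmed by this manual.

```python
def djoinselect(pivot, tail, low, hgh):
    """OR semantics: a bit that is 0 becomes 1 when the aligned
    tail value lies in [low, hgh] (inclusive bounds are an assumption)."""
    return [1 if bit == 1 or low <= value <= hgh else 0
            for bit, value in zip(pivot, tail)]


pivot = [1, 0, 0, 0]
tail = [50, 12, 99, 20]
print(djoinselect(pivot, tail, 10, 20))
```

Bits that are already 1 are never reset, which is what makes the operation suitable for disjunctive (OR) predicates.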
crackers.dproject Sync the cracking pair and project the tail. Use for disjunctive queries
that require a larger bit vector
crackers.dselect Crack based on dbl and evaluate the dbl disjunctive predicate outside
the cracked area. Return a bit vector.
crackers.extendCrackerBAT
Extend the cracker column by P positions
crackers.extendCrackerMap
Extend the cracker map by P positions
crackers.fmaddReference
crackers.markedproject
Sync the cracking pair and project the tail. The result bat has a marked
head
crackers.materializeHead
Materialize the head of BAT b
crackers.pmaddReference
add bp reference to map set of b
crackers.pmclearReferences
clear all references to b
crackers.pmjoinselect
Use the pivot. For each tuple in pivot with a 1, check if the respective
tuple (in the same position) in the tail of cpair (collection of pieces)
satisfies the range restriction. If not, mark the pivot BUN as 0.
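The conjunctive counterpart described for crackers.pmjoinselect clears bits instead of setting them. The Python sketch below makes the same assumptions as any toy model of this instruction would need: a positional 0/1 pivot list and an inclusive range, both chosen for illustration only.

```python
def pmjoinselect(pivot, tail, low, hgh):
    """AND semantics: a bit stays 1 only if the aligned tail value
    lies in [low, hgh] (inclusive bounds are an assumption)."""
    return [1 if bit == 1 and low <= value <= hgh else 0
            for bit, value in zip(pivot, tail)]


pivot = [1, 1, 0, 1]
tail = [15, 99, 12, 20]
print(pmjoinselect(pivot, tail, 10, 20))
```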
crackers.pmmaxTail Sync/crack the map and get the max of the tail
crackers.pmproject Sync the map and project the tail based on the pivot
crackers.pmselect Crack based on dbl and evaluate the dbl conjunctive predicate. Return
a bit vector.
crackers.pmtselect Crack based on dbl and project the dbl tail.
crackers.positionproject
Sync the cracking pair and project the tail. The pivot holds the posi-
tions to be projected
crackers.printAVLTree_int
Print the AVL Tree of the cracker index (for debugging purposes)
crackers.printCrackerBAT
Print the cracker BAT of b
crackers.printCrackerDeletions
Print the pending deletions of the cracker BAT of b
crackers.printCrackerIndexBATpart
Print the cracker index of b
crackers.printCrackerInsertions
Print the pending insertions of the cracker BAT of b
crackers.printPendingInsertions
Print the pending insertions
crackers.project Sync the cracking pair and project the tail
crackers.projectH Sync the cracking pair and project the head
crackers.select Retrieve the subset using a cracker index producing preferably a
BATview.
crackers.select2 Similar to select but always make sure that we do not create a large
piece i.e., bigger than half the size of the cracked piece
crackers.selectAVL Retrieve the subset using the AVL index
crackers.setStorageThreshold
set the maximum number of total tuples that can be stored in sideways
maps
crackers.simpleJoin Join two maps based on head values by exploiting the already existing
partitioning information
crackers.singlePassJoin
First partition on separate pieces the left input based on the right
index. Then join matching pieces
crackers.sizeCrackerDeletions
Get the size of the pending deletions of the cracker BAT of b
crackers.sizeCrackerInsertions
Get the size of the pending insertions of the cracker BAT of b
crackers.sizePendingInsertions
Get the size of the pending insertions for this map
crackers.sortBandJoin
Band Join two maps based on head values. First sort the right BAT
and then continuously binary search the right BAT for each tuple of
the left one
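The sort-then-binary-search strategy named in the crackers.sortBandJoin entry can be sketched in Python with the standard bisect module. The band parameters c1 and c2 and the use of plain value lists instead of maps are assumptions for this example; the actual instruction signature is not given in this manual.

```python
from bisect import bisect_left, bisect_right


def sort_band_join(left, right, c1, c2):
    """Return (l, r) pairs with l - c1 <= r <= l + c2."""
    right_sorted = sorted(right)          # sort the right input once
    result = []
    for l in left:
        # Two binary searches locate the band of matching right values.
        lo = bisect_left(right_sorted, l - c1)
        hi = bisect_right(right_sorted, l + c2)
        result.extend((l, r) for r in right_sorted[lo:hi])
    return result


print(sort_band_join([10, 20], [25, 9, 12, 18, 30], 2, 2))
```

Sorting costs O(n log n) once, after which every left tuple needs only two O(log n) searches plus the time to emit its matches.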
crackers.tselect Retrieve the subset tail using a cracker index producing preferably a
BATview.
crackers.uselect Retrieve the subset using a cracker index producing preferably a
BATview.
crackers.verifyCrackerIndex
Check the cracker index and column, whether each value is in the
correct chunk
crackers.zcrackOrdered
Break a BAT into three pieces with tail<=low, low<tail<=hgh,
tail>hgh, respectively; maintaining the head-oid order within each
piece.
crackers.zcrackOrdered_validate
Validate whether a BAT is correctly broken into three pieces with
tail<=low, low<tail<=hgh, tail>hgh, respectively; maintaining the
head-oid order within each piece.
crackers.zcrackUnordered
Break a BAT into three pieces with tail<=low, low<tail<=hgh,
tail>hgh, respectively.
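When the head-oid order inside a piece does not need to be kept, a three-piece split such as the one described for crackers.zcrackUnordered can be done in place with swaps. The Python sketch below uses the classic three-way partition scheme as a stand-in; it is not taken from the MonetDB sources, and the function name is hypothetical.

```python
def zcrack_unordered(tails, low, hgh):
    """In-place split into tail<=low, low<tail<=hgh, tail>hgh.

    Returns the start positions of the middle and the upper piece.
    """
    lo, i, hi = 0, 0, len(tails) - 1
    while i <= hi:
        if tails[i] <= low:
            tails[lo], tails[i] = tails[i], tails[lo]
            lo += 1
            i += 1
        elif tails[i] > hgh:
            tails[i], tails[hi] = tails[hi], tails[i]
            hi -= 1          # the swapped-in value still has to be examined
        else:
            i += 1
    return lo, hi + 1


tails = [8, 2, 5, 9, 1, 4, 7]
mid_start, upper_start = zcrack_unordered(tails, 3, 6)
print(tails[:mid_start], tails[mid_start:upper_start], tails[upper_start:])
```

The swaps avoid extra memory but scramble the relative order of values inside each piece, which is the trade-off between the unordered and ordered variants.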
crackers.zcrackUnordered_validate
Validate whether a BAT is correctly broken into three pieces with
tail<=low, low<tail<=hgh, tail>hgh, respectively.
date.!= Inequality of two dates
date.< Less-than comparison of two dates
date.<= Less-than-or-equal comparison of two dates
date.== Equality of two dates
date.> Greater-than comparison of two dates
date.>= Greater-than-or-equal comparison of two dates
date.date Noop routine.
date.isnil Nil test for date value
daytime.!= Inequality of two daytimes
daytime.< Less-than comparison of two daytimes
daytime.<= Less-than-or-equal comparison of two daytimes
daytime.== Equality of two daytimes
daytime.> Greater-than comparison of two daytimes
mdb.dot Dump the data flow of the function M.F in a format recognizable by
the command ’dot’ on the file s
mdb.dump Dump instruction, stacktrace, and stack
mdb.getContext Extract the context string from the exception message
mdb.getDebug Get the kernel debugging bit-set. See the MonetDB configuration file
for details
mdb.getDefinition Returns a string representation of the current function with typing
information attached
mdb.getException Extract the variable name from the exception message
mdb.getReason Extract the reason from the exception message
mdb.getStackDepth Return the depth of the calling stack.
mdb.getStackFrame Collect variable binding of current (n-th) stack frame.
mdb.getStackTrace
mdb.grab Stop and debug another client process.
mdb.inspect Run the debugger on a specific function
mdb.lifespan Dump the current routine lifespan information on standard out.
mdb.list Dump the routine M.F on standard out.
mdb.listMapi Dump the current routine on standard out with Mapi prefix.
mdb.modules List available modules
mdb.setCatch Turn on/off catching exceptions
mdb.setCount Turn on/off bat count statistics tracing
mdb.setDebug Set the kernel debugging bit-set and return its previous value.
mdb.setFlow Turn on/off memory flow debugger
mdb.setIO Turn on/off io statistics tracing
mdb.setMemory Turn on/off memory statistics tracing.
mdb.setMemoryTrace Turn on/off memory footprint tracer for debugger
mdb.setThread Turn on/off thread identity for debugger
mdb.setTimer Turn on/off performance timer for debugger
mdb.setTrace Turn on/off tracing of a variable
mdb.start Start interactive debugger on a running factory
mdb.stop Stop the interactive debugger
mdb.var Dump the symboltable of routine M.F on standard out.
mkey.bulk_rotate_xor_hash
pre: h and b should be synced on head
post: [:xor=]([:rotate=](h, nbits), [hash](b))
mkey.hash compute a hash int number from any value
mkey.rotate left-rotate an int by nbits
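The postcondition of the bulk rotate-xor-hash entry combines the two primitives listed here: rotate the running hash left by nbits, then xor it with the hash of the next value. A minimal Python sketch follows. The 32-bit width is an assumption, and the per-value hashes are passed in as plain integers because the hash function used by mkey.hash is not specified in this manual.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit int width


def rotate(h, nbits):
    """Left-rotate a 32-bit value by nbits."""
    nbits %= 32
    h &= MASK32
    return ((h << nbits) | (h >> (32 - nbits))) & MASK32


def rotate_xor_hash(h, nbits, value_hash):
    """Combine a running hash h with the hash of one more value."""
    return rotate(h, nbits) ^ (value_hash & MASK32)


def bulk_rotate_xor_hash(hs, nbits, value_hashes):
    """Positional (bulk) version over two aligned lists."""
    return [rotate_xor_hash(h, nbits, v) for h, v in zip(hs, value_hashes)]


print(hex(rotate(0x80000001, 1)))
print(bulk_rotate_xor_hash([1, 2], 4, [0xFF, 0x0F]))
```

Rotating before the xor makes the combined key depend on column order, so the pairs (a, b) and (b, a) generally hash differently.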
mmath.acos The acos(x) function calculates the arc cosine of x, that is the value
whose cosine is x. The value is returned in radians and is mathemati-
cally defined to be between 0 and PI (inclusive).
mmath.asin The asin(x) function calculates the arc sine of x, that is the value whose
sine is x. The value is returned in radians and is mathematically defined
to be between -PI/2 and PI/2 (inclusive).
mmath.atan The atan(x) function calculates the arc tangent of x, that is the value
whose tangent is x. The value is returned in radians and is mathemat-
ically defined to be between -PI/2 and PI/2 (inclusive).
mmath.atan2 The atan2(x,y) function calculates the arc tangent of the two variables
x and y. It is similar to calculating the arc tangent of y / x, except
that the signs of both arguments are used to determine the quadrant
of the result. The value is returned in radians and is mathematically
defined to be between -PI and PI (inclusive).
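The role of the argument signs is easiest to see with two points that share the same ratio but lie in opposite quadrants. The sketch below uses Python's math.atan2, whose signature is atan2(y, x); whether the MonetDB instruction takes its arguments in the same order is not verified here.

```python
import math

# Python's signature is math.atan2(y, x); the signs of both
# arguments select the quadrant of the result.
first_quadrant = math.atan2(1.0, 1.0)      # x > 0, y > 0
third_quadrant = math.atan2(-1.0, -1.0)    # x < 0, y < 0

# Both points have y / x == 1, so plain atan cannot tell them apart.
same_ratio = math.atan(1.0)

print(first_quadrant, third_quadrant, same_ratio)
```

The first call yields PI/4 and the second yields -3*PI/4, while atan of the shared ratio yields PI/4 for both points.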
mmath.ceil The ceil(x) function rounds x upwards to the nearest integer.
mmath.cos The cos(x) function returns the cosine of x, where x is given in radians.
The return value is between -1 and 1.
mmath.cosh The cosh(x) function returns the hyperbolic cosine of x, which is defined
mathematically as (exp(x) + exp(-x)) / 2.
mmath.cot The cot(x) function returns the cotangent of x, where x is given in
radians.
mmath.exp The exp(x) function returns the value of e (the base of natural loga-
rithms) raised to the power of x.
mmath.fabs The fabs(x) function returns the absolute value of the floating-point
number x.
mmath.finite The finite(x) function returns true if x is neither infinite nor a ’not-a-
number’ (NaN) value, and false otherwise.
mmath.floor The floor(x) function rounds x downwards to the nearest integer.
mmath.fmod The fmod(x,y) function computes the remainder of dividing x by y.
The return value is x - n * y, where n is the quotient of x / y, rounded
towards zero to an integer.
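Because the quotient is rounded towards zero, the result of fmod takes the sign of x. The Python sketch below compares math.fmod with a direct transcription of the definition above and with Python's % operator, which rounds the quotient towards negative infinity instead. The helper fmod_by_definition is hypothetical and only suitable for small quotients.

```python
import math


def fmod_by_definition(x, y):
    """x - n * y with n = x / y truncated towards zero (naive version)."""
    n = math.trunc(x / y)
    return x - n * y


print(math.fmod(-7.0, 3.0))            # truncated quotient
print(fmod_by_definition(-7.0, 3.0))   # same rule, written out
print(-7.0 % 3.0)                      # floored quotient, for contrast
```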
mmath.isinf The isinf(x) function returns -1 if x represents negative infinity, 1 if x
represents positive infinity, and 0 otherwise.
mmath.isnan The isnan(x) function returns true if x is ’not-a-number’ (NaN), and
false otherwise.
mmath.log The log(x) function returns the natural logarithm of x.
mmath.log10 The log10(x) function returns the base-10 logarithm of x.
mmath.pi Return the value of PI.
mmath.pow The pow(x,y) function returns the value of x raised to the power of y.
mmath.rand return a random number
mmath.round The round(n, m) function returns n rounded to m places to the right of the
decimal point; if m is omitted, to 0 places. m can be negative to round
off digits left of the decimal point. m must be an integer.
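Python's built-in round follows the same (n, m) convention, so it can serve as a quick illustration of positive, omitted, and negative m. How MonetDB breaks exact ties (for example 2.5) is not stated here and may differ from Python, so the inputs below avoid ties.

```python
value = 1234.567

print(round(value, 1))    # one place to the right of the decimal point
print(round(value))       # m omitted: round to 0 places
print(round(value, -2))   # negative m: round to the nearest hundred
```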
mmath.sin The sin(x) function returns the sine of x, where x is given in radians.
The return value is between -1 and 1.
mmath.sinh The sinh(x) function returns the hyperbolic sine of x, which is defined
mathematically as (exp(x) - exp(-x)) / 2.
mmath.sqrt The sqrt(x) function returns the non-negative square root of x.
mmath.srand initialize the rand() function with a seed
mmath.tan The tan(x) function returns the tangent of x, where x is given in radians.
mmath.tanh The tanh(x) function returns the hyperbolic tangent of x, which is
defined mathematically as sinh(x) / cosh(x).
mtime.add returns the timestamp that comes ’msecs’ (possibly negative) after
’value’.
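The effect of adding a signed millisecond offset to a timestamp can be illustrated with Python's datetime module. This is only an analogy for the behaviour described above; the helper name add_msecs is hypothetical and no MonetDB types are involved.

```python
from datetime import datetime, timedelta


def add_msecs(value, msecs):
    """Return the timestamp msecs milliseconds after value (msecs may be negative)."""
    return value + timedelta(milliseconds=msecs)


start = datetime(2009, 1, 1, 0, 0, 0)
print(add_msecs(start, 90_000))   # 90 seconds later
print(add_msecs(start, -1_500))   # 1.5 seconds earlier, crossing the year boundary
```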
recycle.monitor start/stop the monitoring (printing) of the recycler info (storage size
used and number of statements retained)
recycle.prelude Called at the start of a recycle controlled function
recycle.reset Reset all recycled variables
recycle.setCachePolicy
Set recycler cache policy with alpha parameter
recycle.setRetainPolicy
Set recycler retainment policy:
0 - RETAIN_NONE: baseline, keeps stat, no retain, no reuse
1 - RETAIN_ALL: infinite case, retain all
2 - RETAIN_CAT: time-based semantics, retain if beneficial
3 - RETAIN_ADAPT: adaptive temporal
recycle.setReusePolicy
Set recycler reuse policy
recycle.shutdown Clear the recycle cache
recycle.start Initialize recycler for the current block
recycle.stop Cleans recycler bookkeeping
remote.connect Returns a newly created connection for dbname, user name and
password.
remote.create Create a user-defined connection to a server.
remote.destroy Destroy a previously user-defined connection to a server.
remote.disconnect Disconnects the connection for dbname.
remote.epilogue Release the resources held by the remote module.
remote.exec Remotely executes <mod>.<func> using the argument list of remote
objects and returns the handle to its result
remote.get Retrieves a copy of remote object ident.
remote.getList List available databases with their property for use with connect().
remote.prelude Initialise the remote module.
remote.put Copies object to the remote site and returns its identifier.
remote.register Register <mod>.<fcn> at the remote site.
replicator.bind Create a named persistent BAT if it was not known
replicator.bind_dbat Create a named persistent BAT if it was not known
replicator.setMaster Mark the source of this database
replicator.setVersion
Keep the latest version in the symbol table as a constant
sabaoth.epilogue Release the resources held by the sabaoth module
sabaoth.getLocalConnectionHost
Returns the hostname this server can be connected to, or nil if none
sabaoth.getLocalConnectionPort
Returns the port this server can be connected to, or 0 if none
sabaoth.marchConnection
Publishes the given host/port as available for connecting to this server
sabaoth.marchScenario
Publishes the given language as available for this server
sabaoth.prelude Initialise the sabaoth module
sabaoth.retreatScenario
Unpublishes the given language as available for this server
str.substring Extract a substring from str starting at start, for length len
str.suffix Extract the suffix of a given length
str.toLower Convert a string to lower case.
str.toUpper Convert a string to upper case.
str.trim Strip whitespaces around a string.
str.unicode convert a unicode to a character.
str.unicodeAt get a unicode character (as an int) from a string position.
streams.blocked open a block based stream
streams.close close and destroy the stream s
streams.flush flush the stream
streams.openRead convert an ascii stream to binary
streams.openReadBytes
open a file stream for reading
streams.openWrite convert an ascii stream to binary
streams.openWriteBytes
open a file stream for writing
streams.readInt read integer data from the stream
streams.readStr read string data from the stream
streams.socketRead open ascii socket stream for reading
streams.socketReadBytes
open a socket stream for reading
streams.socketWrite open ascii socket stream for writing
streams.socketWriteBytes
open a socket stream for writing
streams.writeInt write data on the stream
streams.writeStr write data on the stream
tablet.display Display a formatted table
tablet.dump Print all pages with header to a stream
tablet.finish Free the storage space of the report descriptor
tablet.firstPage Produce the first page of output
tablet.getPage Produce the i-th page of output
tablet.getPageCnt Return the size in number of pages
tablet.header Display the minimal header for the table
tablet.input Load a bat using specific format.
tablet.lastPage Produce the last page of output
tablet.load Load a bat using specific format.
tablet.nextPage Produce the next page of output
tablet.output Send the bat to an output stream.
tablet.page Display all pages at once without header
tablet.prevPage Produce the prev page of output
tablet.setBracket Format the brackets around a field
tablet.setColumn Bind i-th output column to a variable
tablet.setComplaints The complaints bat identifies all erroneous lines encountered
tablet.setDecimal Set the scale and precision for numeric values
tablet.setDelimiter Set the column separator.
tablet.setFormat Initialize a new reporting structure.
tablet.setName Set the display name for a given column
tablet.setNull Set the display format for a null value for a given column
tablet.setPivot The pivot bat identifies the tuples of interest. The only requirement is
that all keys mentioned in the pivot tail exist in all BAT parameters
of the print command. The pivot also provides control over the order
in which the tuples are produced.
tablet.setPosition Set the character position to use for this field when loading according
to fixed (punch-card) layout.
tablet.setProperties Define the set of properties
tablet.setRowBracket Format the brackets around a row
tablet.setStream Redirect the output to a stream.
tablet.setTableBracket
Format the brackets around a table
tablet.setTryAll Skip error lines and assemble an error report
tablet.setWidth Set the maximal display width for a given column. All values exceeding
the width are simply shortened without any notice.
timestamp.!= Inequality of two timestamps
timestamp.< Less-than comparison of two timestamps
timestamp.<= Less-than-or-equal comparison of two timestamps
timestamp.== Equality of two timestamps
timestamp.> Greater-than comparison of two timestamps
timestamp.>= Greater-than-or-equal comparison of two timestamps
timestamp.epoch convert seconds since epoch into a timestamp
timestamp.isnil Nil test for timestamp value
timestamp.unix_epoch The Unix epoch time (00:00:00 UTC on January 1, 1970)
timezone.str
timezone.timestamp Utility function to create a timestamp from a number of seconds since
the Unix epoch
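Converting a count of seconds since the Unix epoch into a timestamp amounts to adding that many seconds to 00:00:00 UTC on January 1, 1970. The Python sketch below shows the arithmetic; the names UNIX_EPOCH and timestamp_from_seconds are chosen for this example and do not correspond to MonetDB identifiers.

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch as a timezone-aware timestamp.
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_from_seconds(seconds):
    """Timestamp that lies the given number of seconds after the Unix epoch."""
    return UNIX_EPOCH + timedelta(seconds=seconds)


print(timestamp_from_seconds(86_400))          # exactly one day after the epoch
print(timestamp_from_seconds(1_000_000_000))
```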
transaction.abort Abort changes in certain BATs.
transaction.alpha List insertions since last commit.
transaction.clean Declare a BAT clean without flushing to disk.
transaction.commit Commit changes in certain BATs.
transaction.delta List deletions since last commit.
transaction.prev The previous state of this BAT
transaction.subcommit
commit only a set of BATnames, passed in the tail (to which you must
have exclusive access!)
transaction.sync Save all persistent BATs
txtsim.editdistance Alias for Levenshtein(str,str)
txtsim.editdistance2 Calculates Levenshtein distance (edit distance) between two strings.
Cost of transposition is 1 instead of 2
txtsim.levenshtein Calculates Levenshtein distance (edit distance) between two strings
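The difference between the plain Levenshtein distance and the editdistance2 variant can be sketched with one dynamic-programming routine. In plain Levenshtein, swapping two adjacent characters costs 2 (two substitutions); the variant charges 1 for it. The Python code below is a textbook formulation with an optional adjacent-transposition step, not the txtsim implementation, and the function name edit_distance is hypothetical.

```python
def edit_distance(a, b, transposition_cost=None):
    """Levenshtein distance; if transposition_cost is given, swapping two
    adjacent characters is allowed at that cost."""
    prev2 = None
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = min(prev[j] + 1,                 # deletion
                       cur[j - 1] + 1,              # insertion
                       prev[j - 1] + (ca != cb))    # substitution or match
            if (transposition_cost is not None and i > 1 and j > 1
                    and ca == b[j - 2] and a[i - 2] == cb):
                cost = min(cost, prev2[j - 2] + transposition_cost)
            cur.append(cost)
        prev2, prev = prev, cur
    return prev[-1]


print(edit_distance("kitten", "sitting"))
print(edit_distance("ab", "ba"))
print(edit_distance("ab", "ba", transposition_cost=1))
```

Only three rows of the table are kept at a time, so memory use is proportional to the length of the second string.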
txtsim.qgramnormalize
’Normalizes’ strings (e.g., toUpper and replaces non-alphanumerics with
one space)
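A normalization of this kind can be sketched in Python with a single regular expression. The sketch assumes that a run of consecutive non-alphanumeric characters collapses into one space and that only ASCII letters and digits count as alphanumeric; the manual does not spell out either point.

```python
import re


def qgram_normalize(text):
    """Upper-case the text and replace each run of characters outside
    A-Z and 0-9 with a single space (both rules are assumptions)."""
    return re.sub(r"[^A-Z0-9]+", " ", text.upper())


print(repr(qgram_normalize("Hello, world!")))
print(repr(qgram_normalize("q-gram_join")))
```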
txtsim.qgramselfjoin QGram self-join on ordered(!) qgram tables and sub-ordered q-gram
positions