Cloudant API Reference PDF
Release 1.0.2
CONTENTS
Getting Started
1.1 Introduction to Cloudant
1.2 Prerequisites and Basics
1.3 Create Read Update Delete (CRUD)
1.4 Introduction to Querying
API Reference
2.1 API Basics
2.2 Authentication Methods
2.3 Authorization Settings
2.4 Databases
2.5 Documents
2.6 Design Documents
2.7 Miscellaneous
2.8 Local (non-replicating) Documents
Guides
4.1 The CAP Theorem
4.2 MapReduce
4.3 Document Versioning and MVCC
4.4 CouchApps and Tiers of Application Architecture
4.5 Replication
4.6 Back up your data
4.7 How to monitor indexing and replication tasks
4.8 Data that Moves: Switching Clusters
4.9 Transactions in Cloudant
CHAPTER ONE
GETTING STARTED
The Getting Started guide is a collection of newly written tutorials, links to currently available documentation for
Cloudant and Apache CouchDB, and links to relevant API references. In addition, we will use material from
the Cloudant Blog, stackoverflow.com and other external sites. It is organized to help you learn to use
Cloudant from the ground up with no prior knowledge. The tutorials lead you from the basics of HTTP and JSON
through to our most advanced features, including examples and discussions on application design.
database will be restricted to making HTTP requests to add data, defining MapReduce Views, search index functions, and other special server-side functions that are tailored to your application, and then retrieving those results.
Document Store
For most novices, when they hear the word database a number of things come to mind. People are informed about
databases through academic course work and pop culture (movies, TV, country music, etc.). In addition, databases
had been relatively static for the past 30 years or so (before Google's MapReduce and Amazon's Dynamo paradigm
shifts). So, when novices hear the word database they often think of tables, which are the typical representations
of relations in a relational database system. However, this is not at all what a Cloudant database looks like.
A Cloudant database is a collection of JSON documents. In a sense, a JSON document store is almost completely
orthogonal to a relational table. A JSON document is a set of key-value pairs, whereas a relation is a set of tuples.
Key-Value
A key-value pair is simply the name of something (key) and its value. In a JSON document it looks something
like this
{
"name":"Adam"
}
JSON documents
Imagine that you have a relation with a schema that has three columns name, age, and gender. Each entry in
the relation would be a tuple that looks something like (Adam, 26, M) or (Sue, 32, F). A JSON document
would contain both the column name, as the key, and the value. These tuples would be stored in a JSON document
like this.
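A minimal sketch of that mapping, written in Python 3 because Python is the language of the later examples. The helper name tuple_to_document is hypothetical; the column names and tuples are the ones used in the paragraph above. Each tuple is paired with the column names of the schema to give one JSON document per tuple:

```python
import json

# Column names of the example relation described above.
SCHEMA = ("name", "age", "gender")


def tuple_to_document(row, schema=SCHEMA):
    """Pair each column name (the key) with the matching value from the tuple."""
    return dict(zip(schema, row))


if __name__ == "__main__":
    for row in [("Adam", 26, "M"), ("Sue", 32, "F")]:
        # json.dumps renders the dictionary as a JSON document.
        print(json.dumps(tuple_to_document(row)))
```

Unlike the tuple, the resulting document carries its own column names, which is what later allows each document to have a different set of keys.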
A value can be a string, a number, an array or another object (a set of key-value pairs). Here's an example JSON
document
{
"name": "adam",
"age": 26,
"car": {
"make": "ford",
"model": "mustang",
"year": 1965
},
"numbers": [
7,
21,
"goats"
],
"colors": [
"blue",
"green",
{"cheese":"caprifeuille"}
],
"friends": [
{
"name": "sean",
"phone": 5551212
},
{
"name": "kara",
"email": "kara@internet.com"
}
]
}
A document store provides a number of advantages over the relational model. JSON documents support a nested,
richer data format that allows for flexibility and maps better to many of today's applications. You can find more
information about JSON in our documentation, in Apache CouchDB docs, and at http://json.org/, which includes
libraries and tools for various languages for handling JSON.
JSON documents in Cloudant
For each JSON document in a Cloudant database, there are two special key-value pairs that are required. These
are the "_id" and the "_rev". The value of the "_id" key is how the database system identifies each document and must be unique in each database. The "_rev" is used to implement multi-version concurrency control
(MVCC). Each time you upload a document to the database for a specific "_id", the "_rev" value is incremented. This lets the database detect conflicts if a particular document is simultaneously updated by multiple clients on
different nodes in the cluster. MVCC is not a version control system, so don't try to use it as one.
A document in Cloudant will look something like this
{
"_id":"bbc9e6125aca5cffb1cf65aefeb105ec",
"_rev":"1-4c6114c65e295552ab1019e2b046b10e",
"name":"Adam",
"age":26,
"gender":"M"
}
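The revision check can be pictured with a small in-memory model. This is a toy sketch and not Cloudant's implementation: the class ToyStore, its put method, the ConflictError exception, and the way the revision suffix is computed are all assumptions made for illustration. Only the rule being demonstrated comes from the text: an update must carry the latest "_rev", and each accepted write increments the revision number.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when an update carries a stale or missing "_rev"."""


class ToyStore(object):
    """Hypothetical in-memory stand-in for a database; for illustration only."""

    def __init__(self):
        self.docs = {}

    def put(self, doc):
        current = self.docs.get(doc["_id"])
        # An existing document may only be replaced by a write that names its latest "_rev".
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError("revision mismatch for %s" % doc["_id"])
        # The revision number is the integer before the dash.
        generation = 1 if current is None else int(current["_rev"].split("-")[0]) + 1
        body = dict((k, v) for k, v in doc.items() if k != "_rev")
        # Assumption: any stable hash of the body is good enough for this sketch.
        encoded = json.dumps(body, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(encoded).hexdigest()[:32]
        stored = dict(body, _rev="%d-%s" % (generation, digest))
        self.docs[doc["_id"]] = stored
        return stored["_rev"]
```

A second write that reuses the first "_rev" succeeds once and then fails with ConflictError, which is the situation a client has to handle when two clients update the same document.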
Additionally, there are other special keys for each document: _attachments, _conflicts and _deleted,
which will be discussed in detail later. In general, however, you may not give any top-level key a name that begins
with an underscore, such as "_foo":"bar". Attempts to do so will return an error.
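A client can catch this mistake before sending the document. The following is a minimal sketch in Python 3; the function name invalid_top_level_keys is hypothetical, and the set of allowed keys contains only the five special keys named in this section, so treat it as an assumption rather than a complete list of what the server accepts.

```python
# Special keys named in the text above (assumed, not an exhaustive server-side list).
RESERVED_KEYS = {"_id", "_rev", "_attachments", "_conflicts", "_deleted"}


def invalid_top_level_keys(doc):
    """Return the top-level keys that begin with an underscore but are not reserved."""
    return sorted(k for k in doc if k.startswith("_") and k not in RESERVED_KEYS)


if __name__ == "__main__":
    print(invalid_top_level_keys({"_id": "a", "_foo": "bar", "name": "Adam"}))
```

Only top-level keys are checked, matching the rule as stated; keys of nested objects are not inspected.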
Schemaless
Schemaless means that the database management system does not enforce any schema on your data and it does
not depend on any schema, other than the special keys that start with an underscore. Each document in a database
can have a completely unique schema or they can all have the same schema. It's up to your design. Additionally,
the schema of your documents can be changed at any time and for any subset of your documents as needed.
For example, I can later change the document above to
{
"_id":"bbc9e6125aca5cffb1cf65aefeb105ec",
"_rev":"2-a561c4119b2c2f34aa2219ff3504710c",
"phone":"12062959632",
"name":"Adam",
"age":26,
"height":185,
"mass":77.1
}
without having to notify the system of the change or modify any other documents.
Schemaless does not refer to the schema of your application's data. Whenever you build an application you
should probably create (and enforce, if you wish) your own schema for the data you wish to store.
Cloudant Search
With Cloudant Search, built on Lucene, you can index the values of any set of keys found in your documents and
then query that index for exact word matches, numerical matches or fuzzy matches. Cloudant Search allows you to
do more ad-hoc queries over your data than can be done with the primary and secondary indexes. We recommend
using Cloudant Search if your particular use case is to find documents based on multidimensional queries.
Geo-Spatial
Geo-Spatial queries allow you to query your database documents based upon a restricted geographical area. By
adding a GeoJSON object to your documents you can query the database to retrieve documents that fall within
an arbitrarily shaped geographic region. This could be used for querying data associated with physical locations
on Earth, or you could use this to create heat-maps for events in a 3D game world. This feature is currently in
beta, however. Please contact us if you're interested in joining our early-adoption program.
Replication
Replication is very similar to replaying the activities on one database onto another database. One can replicate
any Cloudant database to another Cloudant database or to other Apache CouchDB-like databases that support
the replication protocol, including mobile platform libraries. Replication can be unidirectional or bidirectional to
create a multi-master system.
Global Data Distribution
Cloudant has partnerships with a number of providers that own data centers around the world: Rackspace, Amazon, IBM/Softlayer, and Microsoft Azure. Cloudant can put your data in any location these companies have data
centers, which lets you put your data closer to your users. Additionally, your data can be distributed to multiple
data centers throughout the world as needed via replication.
Other Features
In addition to the features mentioned above, Cloudant's database management system provides
• arbitrary data attachment to individual JSON documents
• document validation functions to enforce schema, user requirements and some types of business logic
• list and show functions to manipulate documents and secondary index queries server-side before they are returned to the client
• chained MapReduce to send results to another database to run other MapReduce jobs
• the _changes feed to observe all changes made to a database
• the _db_updates feed to observe updates to all of your account's databases (early-adoption beta only)
Once the virtual environment is activated, you may then install subsequent libraries in isolation from the rest of
your system. When you return later to continue learning, from the learn_cloudant directory you'll need to source
venv/bin/activate to access any libraries installed specifically for this environment.
Alternatively, one could use the virtualenvwrapper.
The Python library we use throughout the Getting Started guides is requests.
requests: http://docs.python-requests.org/en/latest/index.html
pip install requests
cURL
cURL is a shell-based communication tool that supports various networking transfer protocols, including HTTP.
It is usually installed by default on Mac OS X and most Linux distributions or may be installed through a package
manager.
http://curl.haxx.se
./jq
We use jq to pretty-print, slice and transform the JSON returned by cURL HTTP calls to the database.
http://stedolan.github.io/jq/
where the headers, data, and parameters are optional, depending on the HTTP verb and API endpoint. In an HTTP
request, the parameters are typically formed as param=value and are separated by an ampersand (&). The
header will almost always be Content-Type: application/json and the data will almost always be in
JSON format. One exception to this is when you upload attachments to a document in your database. In that case,
your header should match the MIME type of the data.
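As a small illustration of how the parameters end up in the request, here is a sketch using only the Python 3 standard library. The helper name build_url is hypothetical; the URL and parameter names are taken from examples used elsewhere in this guide.

```python
from urllib.parse import urlencode


def build_url(base, params=None):
    """Append params to base in the form ?param=value&param=value."""
    if not params:
        return base
    # urlencode joins the pairs with "&" and percent-encodes unsafe characters.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_url(
        "https://username.cloudant.com/users/_all_docs",
        {"limit": 10, "include_docs": "true"},
    ))
```

The requests library performs the same step internally when it is given a params dictionary, so in practice you rarely need to build the query string by hand.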
Some HTTP request examples are shown below using cURL and Python. Additionally, the link to the API reference page for each of these commands is provided, which will help the novice to understand how to read the
API.
Example HTTP Requests
GET Your Cluster Information
import requests

# "username" and "password" are placeholders for your own credentials
response = requests.get(
    "http://username.cloudant.com/_all_dbs",
    auth=("username", "password")
)
print(response.json())
DELETE a Document
params={"rev":<doc._rev>}
)
print response.json()
We've left out the specific values here because often times the _id looks something like
3a4d992b78c7a7361b0a50ef963c1a1e and the _rev like 1-4c6114c65e295552ab1019e2b046b10e,
which would make that slightly less readable.
Also note that the Python requests library will form the proper URI for you with the params object.
The Response object returned by the requests module contains the HTTP response code (r.status_code) and
data (r.text), which can be decoded as JSON via r.json().
Additionally, one can use the PUT /db/doc to create a new document. The difference is that you must specify the
"_id" of the document in the URI (gotcha: when using PUT /db/doc, if the "_id" key is specified in the
document, it will be ignored and the "_id" in the URI will be used).
Read
Documents can be retrieved by knowing the value of the "_id" and constructing an HTTP GET request to the
proper URI.
API Reference: GET /db/doc
In the example above, the value of "_id" assigned to the document was
0562df6ffcc4301b1e2c5d1061214489. Yours, of course, will be different. The URI for this document is
https://username.cloudant.com/users/0562df6ffcc4301b1e2c5d1061214489
If you used "bob456" for the document "_id", then it would be
https://username.cloudant.com/users/bob456
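The pattern behind both URIs can be sketched as a small helper. This is Python 3 with the standard library only; the function name document_url is hypothetical, and percent-encoding every reserved character (including "/") is an assumption that suits ordinary documents like the ones in this example.

```python
from urllib.parse import quote


def document_url(account, db, doc_id):
    """Build the URI of a document from the account name, database name and "_id"."""
    # safe="" also encodes "/" so that the "_id" stays a single path segment.
    return "https://{0}.cloudant.com/{1}/{2}".format(account, db, quote(doc_id, safe=""))


if __name__ == "__main__":
    print(document_url("username", "users", "bob456"))
```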
Here is an example curl command to retrieve this document
curl -X GET -u username https://username.cloudant.com/users/0562df6ffcc4301b1e2c5d1061214489 | jq .
These examples should print the same document to screen. You will now notice that your document contains the
new keys, "_id" and "_rev".
You will often read from the database things other than an entire single document. For example, you'll very often
want the results of one of your MapReduce Views. As mentioned in the section on making HTTP requests, you
will still make an HTTP GET request but with a different API endpoint that corresponds to your MapReduce View
result. But the code required to make the HTTP GET, of course, remains exactly the same. Using MapReduce
will be further explained in the Querying section.
Update
Update an existing document on the database.
API Reference: POST /db and PUT /db/doc
At Cloudant, we encourage you to build systems based on immutable documents. That is, instead of updating a
particular document, you POST to the database new documents. Then, you use an incrementally built MapReduce
View to reconstruct the state of your application. See this guide on the subject.
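To make the idea concrete, here is a client-side sketch in Python 3. In Cloudant the reconstruction would be done by a MapReduce View; the function current_state below is only a stand-in for that step, and the field names user, score, level and timestamp are hypothetical. Every game result is stored as its own document and never modified; the current state is derived by folding over all of them.

```python
def current_state(events):
    """Fold immutable event documents into the latest state per user."""
    state = {}
    # Timestamps in one fixed ISO 8601 format and time zone sort chronologically as strings.
    for event in sorted(events, key=lambda e: e["timestamp"]):
        user = state.setdefault(event["user"], {"high_score": 0, "level": 0, "games": 0})
        user["high_score"] = max(user["high_score"], event["score"])
        user["level"] = event["level"]  # the most recent event wins
        user["games"] += 1
    return state


if __name__ == "__main__":
    events = [
        {"user": "bob", "score": 500, "level": 4, "timestamp": "2013-01-02T10:00+00:00"},
        {"user": "bob", "score": 623, "level": 5, "timestamp": "2013-01-03T10:00+00:00"},
    ]
    print(current_state(events))
```

Because no document is ever updated, two clients can never disagree about a "_rev", which is the main appeal of this design.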
However, sometimes your application design really does need to make updates to existing documents. And that is
perfectly okay.
Making updates to a document is the same as creating that document on the database except that you will need
to include the "_id" and "_rev". In fact, due to the schemaless nature of the Cloudant database, those are the
only key-value pairs that you need. The "_rev" value will have to match the latest value in the document found
on the database. A mismatched "_rev" will result in an error.
For example
# update_user.py
import requests
import json

# "username" and "password" are placeholders for your own credentials
auth = ("username", "password")
headers = {"Content-type": "application/json"}
get_url = "http://{0}.cloudant.com/users/0562df6ffcc4301b1e2c5d1061214489".format(auth[0])
r = requests.get(get_url, auth=auth)

# update the document
doc = r.json()
doc["high_score"] = 623
doc["level"] = 5
# note: the doc already carries the "_rev" value returned by the GET above
print(doc)

# with the doc updated with a new high_score and level, we POST it back to the database
post_url = "http://{0}.cloudant.com/users".format(auth[0])
r = requests.post(
    post_url,
    auth=auth,
    headers=headers,
    data=json.dumps(doc)
)
print(json.dumps(r.json(), indent=1))
You can also update documents with PUT /db/doc. Again, the URI should end with the "_id", and the "_rev"
in the document must match the latest value on the server.
Another way to update documents is to write a specific update handler function in a _design document. This
lets you update only specific keys in a document in your database via a POST request to the URI for that update
handler (a kind of in-place update). This eliminates the need to upload the entire document to change a single
key. Using an update function can even be used to create new documents or modify incoming new documents.
For example, you could use an update handler to add a time-stamp on the server-side. We plan to cover this in
the future, but for now we refer you to the CouchDB documentation on update handlers. However, with update
handlers you can only update one document at a time. We'll show you below how to create/update multiple
documents in a single HTTP request.
Delete
Delete a single document on the database.
API Reference: DELETE /db/doc
There are two ways to delete a document.
You can make an HTTP DELETE request to the
/db/doc?rev=doc_rev endpoint, or you can add a "_deleted":True key-value pair to the document
and POST that document to the database. Either method can be used for a single document, but the second
method is the only way to delete multiple documents in a single HTTP request. The single document delete call
is shown here and the multiple document delete request is shown below.
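The two forms can be sketched side by side in Python 3. Both helper names (delete_request and deletion_stub) are hypothetical, and the users database name is borrowed from the earlier examples. Neither function sends anything; they only show what each method needs from the document.

```python
import json


def delete_request(doc, db="users"):
    """First method: the path and parameters for DELETE /db/doc?rev=doc_rev."""
    return "/{0}/{1}".format(db, doc["_id"]), {"rev": doc["_rev"]}


def deletion_stub(doc):
    """Second method: a minimal document to POST with "_deleted" set."""
    return {"_id": doc["_id"], "_rev": doc["_rev"], "_deleted": True}


if __name__ == "__main__":
    doc = {"_id": "bob456", "_rev": "1-abc", "level": 3}
    print(delete_request(doc))
    # Python's True is serialized as JSON true.
    print(json.dumps(deletion_stub(doc), sort_keys=True))
```

In both cases the current "_rev" is required, for the same reason it is required for an update.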
To please the gods of verbosity, here is an example curl command
To facilitate the bulk operations below, the following Python script will generate a
number of new documents in your users database. We use these documents in subsequent examples.
import requests
import json
import random

usernames = ("alice", "bob", "cartman", "mario", "zelda", "sawyer", "daniel", "sabine", "luigi")
bulkdocs = {"docs": []}
for aname in usernames:
    # one document per user, with the user name as the "_id"
    bulkdocs["docs"].append(
        {
            "_id": aname,
            "first_name": aname,
            "high_score": 500 + 100*random.random(),
            "level": 4 + int(3*random.random())
        }
    )

# "username" and "password" are placeholders for your own credentials
auth = ("username", "password")
headers = {"Content-type": "application/json"}
post_url = "http://{0}.cloudant.com/users/_bulk_docs".format(auth[0])
r = requests.post(
    post_url,
    auth=auth,
    headers=headers,
    data=json.dumps(bulkdocs)
)
print(json.dumps(r.json(), indent=1))
Read
Read multiple documents in a single HTTP request.
API Reference: GET /db/_all_docs
To read multiple documents, one queries either the primary index via /db/_all_docs or a secondary index
created by a MapReduce View. Since we haven't introduced MapReduce Views and secondary indexes yet, we'll
just show how to use the /db/_all_docs endpoint along with a few options.
A first example is GET /db/_all_docs?limit=10. We use the limit=10 option to keep the output to just
ten keys (in case you decided to modify the script above and added a zillion docs).
curl -u username https://username.cloudant.com/users/_all_docs?limit=10 | jq .
and in Python
import requests
import json

# "username" and "password" are placeholders for your own credentials
auth = ("username", "password")
get_url = "http://{0}.cloudant.com/users/_all_docs?limit=10".format(auth[0])
r = requests.get(get_url, auth=auth)
print(json.dumps(r.json(), indent=1))
This API endpoint returns the following JSON structure
{
"total_rows": N,
"offset": M,
"rows": [
{"key": <doc._id>, "value": {"rev":<doc._rev>}, "id":<doc._id>},
...
]
}
where N and M are integers. The total_rows value is the total number of rows in the index (or MapReduce View
result) and the offset tells you which row the rows array starts on relative to all rows in the index. Each element
of "rows" is an object that contains at least three key-value pairs with keys of "key", "value", and "id".
The values of the "key" and "value" depend on the index that was queried. In the case of /db/_all_docs,
the value of "key" is always the value of the document "_id", and the value of "value" is always a JSON
object with a "rev":<doc._rev> pair. The value of "id" is the value of "_id" of the document from which
the "key" and "value" were derived. Yes, the "_id" of each document is found twice in each row, but this
is particular to the /db/_all_docs endpoint. This can seem confusing, so take a look at the data returned by
your requests to make sure it is clear to you.
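The structure is easy to walk in code. Here is a minimal Python 3 sketch; the function name ids_and_revs is hypothetical, and the sample response uses made-up values that only follow the shape described above.

```python
def ids_and_revs(response):
    """Map each document "_id" to its current "_rev" from an _all_docs response."""
    return dict((row["id"], row["value"]["rev"]) for row in response["rows"])


# Made-up sample in the shape returned by /db/_all_docs.
SAMPLE = {
    "total_rows": 9,
    "offset": 0,
    "rows": [
        {"key": "alice", "value": {"rev": "1-aaa"}, "id": "alice"},
        {"key": "bob", "value": {"rev": "2-bbb"}, "id": "bob"},
    ],
}

if __name__ == "__main__":
    print(ids_and_revs(SAMPLE))
```

With a live database you would pass r.json() from the request above instead of SAMPLE.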
There are a number of parameters that can be used with the /db/_all_docs endpoint. However, we'll just
cover two of them here and save the rest for later.
We can ask to get all of the documents in the query response with the parameter include_docs=true.
curl -u username 'https://username.cloudant.com/users/_all_docs?limit=10&include_docs=true' | jq .
and in Python (note that the query parameters can be passed through the params argument to requests.get)
import requests
import json

# "username" and "password" are placeholders for your own credentials
auth = ("username", "password")
get_url = "http://{0}.cloudant.com/users/_all_docs".format(auth[0])
params = {"include_docs": "true", "limit": 10}  # make sure to use "true" not True
r = requests.get(get_url, auth=auth, params=params)
print(json.dumps(r.json(), indent=1))
With include_docs=true, each element of "rows" will have a fourth key-value pair; the key is "doc" and
its value will be the entire JSON document.
{
"total_rows": N,
"offset": M,
"rows": [
{"key": <doc._id>, "value": {"rev":<doc._rev>}, "id":<doc._id>, "doc":<doc>},
...
]
}
With the startkey and endkey options we can slice out a subset of the results. That is, the "rows" array
will only contain the rows where the values of "key" are in the inclusive range [startkey, endkey],
unless, of course, you use the inclusive_end=false option. In this example, we get all documents with
an _id that begins with the letter s or S. Notice that this cURL command requires single quotes around the
URL because of the double quotes around the values of startkey and endkey.
and in Python
import requests
import json

# "username" and "password" are placeholders for your own credentials
auth = ("username", "password")
get_url = "http://{0}.cloudant.com/users/_all_docs".format(auth[0])
params = {
    "include_docs": "true",  # this is a boolean in the URI string
    "limit": 10,
    "startkey": "\"s\"",  # you need to escape the quotes
    "endkey": "\"t\"",
    "inclusive_end": "false"
}
r = requests.get(get_url, auth=auth, params=params)
print(json.dumps(r.json(), indent=1))
Feel free to play around with the startkey and endkey values to get a feel for this feature. These will be useful
for future queries to MapReduce View results; you'll find that many of the options for /db/_all_docs are
the same for queries to secondary indexes.
Update
Update multiple documents in a single HTTP request.
API Reference: POST /db/_bulk_docs
As previously mentioned, updating documents already on the database is nearly the same as creating new documents, except that you need to specify the "_id" and current "_rev" key-value pairs for each document. The
following example grabs a set of documents from the primary index, makes some changes to the documents, then
uses /db/_bulk_docs to insert them into the database.
import requests
import json
import random

# "username" and "password" are placeholders for your own credentials
auth = ("username", "password")
headers = {"Content-type": "application/json"}
get_url = "http://{0}.cloudant.com/users/_all_docs".format(auth[0])
params = {
    "include_docs": "true",
    "limit": 20
}
r = requests.get(get_url, auth=auth, params=params)

bulkdocs = {"docs": []}
for row in r.json()["rows"]:
    # each doc already carries its "_id" and current "_rev"
    doc = row["doc"]
    doc["high_score"] = doc.get("high_score", 0) + 100*random.random()
    doc["level"] = doc.get("level", 0) + int(2*random.random())
    bulkdocs["docs"].append(doc)

post_url = "http://{0}.cloudant.com/users/_bulk_docs".format(auth[0])
r = requests.post(
    post_url,
    auth=auth,
    headers=headers,
    data=json.dumps(bulkdocs)
)
print(json.dumps(r.json(), indent=1))
Delete
Delete multiple documents with a single HTTP request.
API Reference: POST /db/_bulk_docs
Are you surprised by the API Reference? In order to delete multiple documents with a single HTTP call, you'll need
to add a "_deleted":True key-value pair to each document and then update those documents to the database
in a single HTTP POST. You'll notice in the example script below that each document is not downloaded in full.
Instead, we use the "_id" and current "_rev" to essentially build a new document with a "_deleted":True
import requests
import json
and
"_id" : "content:12345"
The results from the request to /db/_all_docs are then already sorted by type. The request
GET /db/_all_docs?startkey="user:"&endkey="user:\ufff0"&include_docs=true
will return all the user documents. (The \ufff0 is a very high Unicode character that is useful for setting the upper end of a
range. See the String Ranges section here.)
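The same prefix query can be prepared in Python 3. The helper name prefix_range_params is hypothetical; it only builds the params dictionary, which can then be handed to requests.get as in the earlier examples.

```python
import json


def prefix_range_params(prefix, include_docs=True):
    """Query parameters selecting every "_id" that starts with prefix."""
    return {
        # startkey and endkey must be JSON-encoded strings, hence json.dumps.
        "startkey": json.dumps(prefix),
        # json.dumps writes the high character as the escape sequence \ufff0.
        "endkey": json.dumps(prefix + "\ufff0"),
        "include_docs": "true" if include_docs else "false",
    }


if __name__ == "__main__":
    print(prefix_range_params("user:"))
```

For example, requests.get(get_url, auth=auth, params=prefix_range_params("user:")) would issue the request shown above, with get_url and auth defined as in the earlier scripts.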
Notice that the startkey and endkey specify a range of keys that is inclusive. That is, if startkey="a"
and endkey="b" is specified, documents with "_id":"a" and "_id":"b" will be included in the results.
You can make this request exclusive, however, if you use the inclusive_end=false option to exclude an
"_id" equal to the endkey value and skip=1 to exclude an "_id" equal to the startkey value. (Note that
this skip=1 trick will not necessarily work with secondary indexes since those keys are not required to be
unique.)
Reversing the Order of Results
A mistake is often made by new developers when using both the descending=true option and one or both
of the startkey="<value>", endkey="<value>" options. The results found in the "rows" array are
alphanumerically ordered by the value of the "key". In the case of /db/_all_docs, they are ordered alphanumerically by the document "_id". When you use descending=true this reverses the order. However,
the startkey and endkey parameters are applied after the descending option is applied. So, when using
descending=true you must swap the values of startkey and endkey in order to get the same results
as with descending=false.
For example, let's say you would like all of the documents with values of their "_id" between a123 and a456,
inclusively. You could make the request
GET /db/_all_docs?startkey="a123"&endkey="a456"
You can get the same results but returned in the reverse order by using the descending=true option. The
request should be
GET /db/_all_docs?endkey="a123"&startkey="a456"&descending=true
This reversed order mistake can also be made when querying the secondary index since the startkey, endkey
and descending options work in the same way.
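The effect is easy to reproduce with a toy model in Python 3. The function toy_all_docs below is hypothetical and runs entirely in memory; it uses Python's plain string ordering rather than the database's collation, so it only illustrates the order in which the options are applied.

```python
def toy_all_docs(keys, startkey=None, endkey=None, descending=False, inclusive_end=True):
    """Toy model of key-range selection: order first, then apply startkey and endkey."""
    ordered = sorted(keys, reverse=descending)
    rows = []
    for key in ordered:
        if startkey is not None:
            # In descending order, keys larger than startkey come before the start.
            before_start = key > startkey if descending else key < startkey
            if before_start:
                continue
        if endkey is not None:
            past_end = key < endkey if descending else key > endkey
            if past_end or (key == endkey and not inclusive_end):
                continue
        rows.append(key)
    return rows


if __name__ == "__main__":
    ids = ["a100", "a123", "a200", "a456", "a500"]
    print(toy_all_docs(ids, startkey="a123", endkey="a456"))
    print(toy_all_docs(ids, startkey="a456", endkey="a123", descending=True))
    # Forgetting to swap the keys selects nothing:
    print(toy_all_docs(ids, startkey="a123", endkey="a456", descending=True))
```

The last call is the mistake described above: with descending=true and unswapped keys, the range starts after it ends, so the result is empty.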
Design Choice regarding "_id"
For time-series data, one trick is to insert the timestamp of the data into the document "_id", assuming that
you can guarantee that the "_id" will be unique. This lets you sort your documents by time with the
primary index. Also, using a combination of the descending, limit=1, and startkey=<T> options, you
can obtain the document before or after some particular time, T.
For example, let's suppose we are inserting documents to the database with an "_id" that looks like timedata:ISO format. Here are a few examples
"_id":"timedata:2010-09-30T01:33+00:00",
"_id":"timedata:2010-10-05T14:18+00:00",
"_id":"timedata:2011-01-16T14:26+00:00",
"_id":"timedata:2011-04-04T08:20+00:00",
"_id":"timedata:2011-05-15T21:57+00:00",
"_id":"timedata:2011-07-21T08:49+00:00",
A query to retrieve the last document inserted to the database before a particular date (let's say April 1, 2011),
would be
GET /db/_all_docs?descending=true&limit=1&include_docs=true&startkey="timedata:2011-04-01T00:00+00
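What that query selects can be checked locally with a short Python 3 sketch. The function name last_at_or_before is hypothetical, and the lookup runs over an in-memory sorted list rather than the database; it mirrors descending=true, limit=1 and an inclusive startkey, using the example "_id" values listed above.

```python
import bisect

# The example "_id" values from above, in primary index (sorted) order.
IDS = [
    "timedata:2010-09-30T01:33+00:00",
    "timedata:2010-10-05T14:18+00:00",
    "timedata:2011-01-16T14:26+00:00",
    "timedata:2011-04-04T08:20+00:00",
    "timedata:2011-05-15T21:57+00:00",
    "timedata:2011-07-21T08:49+00:00",
]


def last_at_or_before(sorted_ids, timestamp):
    """Return the last "_id" that sorts at or before "timedata:" + timestamp, or None."""
    index = bisect.bisect_right(sorted_ids, "timedata:" + timestamp)
    return sorted_ids[index - 1] if index > 0 else None


if __name__ == "__main__":
    print(last_at_or_before(IDS, "2011-04-01T00:00+00:00"))
```

The trick relies on every timestamp using the same format and time zone offset, so that string order and chronological order agree.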
documents programmatically. However, in production, you will really want to use a _design document management tool that lets you write your MapReduce, Search and other functions on a local machine and upload them
to the database for you. Additionally, this will let you use a version control system to manage your _design
document code.
1. Simulate Game Results
We're going to create a new database (called gameresults) and store documents in that database that simulate
recording the results of each game played by our users. In this case, we'll only simulate having 11 unique users,
to keep it simple. Each user plays the game a random number of times with a randomly generated score. The
following Python code will generate this simulation (hopefully it's not too difficult for you to translate this into the
language that you're using to learn Cloudant).
import requests
import json
import random
import time

# "username" and "password" are placeholders for your own credentials
username = "username"
gameresultsdb = "gameresults"
auth = (username, "password")
headers = {"Content-type": "application/json"}

def randomDate(start, end):
    # pick a random time between start and end, in the same string format
    prop = random.random()
    format = "%Y-%m-%dT%H:%M+00:00"
    stime = time.mktime(time.strptime(start, format))
    etime = time.mktime(time.strptime(end, format))
    ptime = stime + prop * (etime - stime)
    return time.strftime(format, time.localtime(ptime))

def generateDoc():
    # the remaining names are cut off in the source
    usernames = ("alice", "bob", "cartman", "daniel", "sawyer", "zelda", "sabine", "luigi", "mario",
                 )
    dateplayed = randomDate("2010-1-1T00:00+00:00", "2013-11-01T00:00+00:00")
    randomlevel = int(10*random.random() + 1)
    randomscore = int(sum([100*random.random() for i in range(randomlevel)]))
    return {
        "playername": random.choice(usernames),
        "score": randomscore,
        "level": randomlevel,
        "date_played": dateplayed
    }

def bulk_insert(ddocs):
    post_url = "https://{0}.cloudant.com/{1}/_bulk_docs".format(username, gameresultsdb)
    r = requests.post(
        post_url,
        auth=auth,
        headers=headers,
        data=json.dumps(ddocs)
    )
    if r.status_code != 201:
        raise Exception("Bulk Insert. bad status code: %d. %s" % (r.status_code, r.text))
As you can see, the documents in our database look something like
{
"_id":<db generated id>,
"_rev":"1-abc...",
"playername":"sabine",
"score":623,
"level":6,
"date_played":"2010-09-30T01:33+00:00"
}
We'll now create a few MapReduce functions that will let us sort and analyze the documents using the different
keys.
2. Count Number of Plays
This first example might not be the most useful, but is, at least, instructional. This will simply count the number of times each user played the game. We're going to use a web browser to write and save our MapReduce
function into a _design document. Sign in to your Cloudant account, navigate to your Dashboard, click
on your gameresults database, then click on the View in Futon link near the top. The URL should be
https://cloudant.com/futon/database.html?<username>%2Fgameresults.
Next, from the View pull-down menu, select Temporary View and enter the following code into the Map
function.
function(doc){
if(doc.playername)
emit(doc.playername, 1);
}
First, we want to emphasize the use of duck-typing. In the Map function, we check that each document
passed in has a key called playername before we emit its value as the key. This Map function emits the key-value pair of doc.playername and 1. Click on the Save As button and set the name of the _design
document to be playstats and the View to be byplayername.
In the browser you should now see the last ten results of this Map function. The key will be zelda since this is
the last name in alphanumeric order of all possible names. If you click on the key in the browser, it will take you
to the document from which that key originated.
Now, let's use the built-in _sum Reduce function to count. (We could also use _count, but since we emitted 1
for each value, the results will be the same.) Type in _sum for the Reduce function and click Save. After you
click on the reduce checkbox, you should see in your browser a list of the playername values and the number
of times that player played a game (since each document represents the result of a played game).
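If it helps to see what the View computes, the Map step and the grouped _sum can be emulated locally in Python 3. This is only a sketch: map_doc and sum_by_key are hypothetical names, and the real functions run on the server in JavaScript as shown above.

```python
from collections import defaultdict


def map_doc(doc):
    """Python rendering of the Map function above: emit (playername, 1)."""
    if doc.get("playername"):
        yield doc["playername"], 1


def sum_by_key(docs):
    """Emulate the built-in _sum reduce with one result row per distinct key."""
    totals = defaultdict(int)
    for doc in docs:
        for key, value in map_doc(doc):
            totals[key] += value
    # View rows come back ordered by key.
    return [{"key": key, "value": totals[key]} for key in sorted(totals)]


if __name__ == "__main__":
    docs = [{"playername": "zelda"}, {"playername": "alice"}, {"playername": "zelda"}]
    print(sum_by_key(docs))
```

Documents without a playername are skipped by the duck-typing check, exactly as in the JavaScript version.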
To get these same results with an HTTP call from curl (we'll leave out the Python request since you should know
how to do this) we need to set group=true when querying the View.
3. Total Score
This next MapReduce View will let you simultaneously determine the number of games played and give you the
total number of points accumulated by each user. In addition, it will let you find the number of points scored for
each level the user reached. Save the following Map and Reduce functions in a new View in the same _design
document and call it byplayername_level.
//map.js
function(doc){
if(doc.playername && !isNaN(doc.level) && !isNaN(doc.score))
emit([doc.playername, doc.level], doc.score);
}
//reduce.js
_stats
In this example we have emitted a complex key, which lets us use different group_level options in the query.
We are also using the _stats reduce function, which calculates the sum, count, min, max and sumsqr
of the values.
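The interaction between a complex key and group_level can be emulated locally. The Python 3 sketch below is an assumption-laden stand-in for the server-side behaviour: stats_by_group is a hypothetical name, and it takes (key, value) pairs in the shape the Map function above emits.

```python
def stats_by_group(rows, group_level):
    """Emulate _stats over (key, value) rows, grouped on the first group_level key elements."""
    groups = {}
    for key, value in rows:
        group = tuple(key[:group_level])
        stats = groups.setdefault(
            group, {"sum": 0, "count": 0, "min": value, "max": value, "sumsqr": 0}
        )
        stats["sum"] += value
        stats["count"] += 1
        stats["min"] = min(stats["min"], value)
        stats["max"] = max(stats["max"], value)
        stats["sumsqr"] += value * value
    return [{"key": list(group), "value": groups[group]} for group in sorted(groups)]


if __name__ == "__main__":
    rows = [(["zelda", 1], 50), (["zelda", 2], 120), (["alice", 1], 30), (["zelda", 1], 70)]
    for row in stats_by_group(rows, group_level=1):
        print(row)
```

With group_level=1 the rows are grouped by playername alone; with group_level=2 they are grouped by playername and level.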
Count Number of Plays
First, let's find the number of times each player has played the game. First with curl
with Python
import requests
import json
auth = (username, password)
get_url = "https://{0}.cloudant.com/gameresults/_design/playstats/_view/byplayername_level?group_l
r = requests.get(get_url, auth=auth)
for row in r.json()["rows"]:
    print("{0} {1}".format(row["key"][0], row["value"]["count"]))
Get Total Score To get the total score, just look at the sum calculation in the value. In the code above change
count to sum.
Get Average Score To get the average score, change the jq filter to .value.sum/.value.count (or print
out the similar ratio in your Python script). Similarly, you can estimate the standard deviation of the scores as well
using the sumsqr, which is the sum of the square of each doc.score.
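The average and the standard deviation can be derived from a single _stats value. The sketch below assumes the population form of the standard deviation, sqrt(sumsqr/count - mean^2); the input value is hypothetical.

```python
import math

def mean_and_stddev(stats):
    """Derive the mean and population standard deviation from a _stats value."""
    mean = stats["sum"] / stats["count"]
    variance = stats["sumsqr"] / stats["count"] - mean ** 2
    # Guard against tiny negative values caused by floating-point rounding.
    return mean, math.sqrt(max(variance, 0.0))

value = {"sum": 60, "count": 3, "min": 10, "max": 30, "sumsqr": 1400}  # hypothetical
print(mean_and_stddev(value))
```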
4. Total Score for Each Level
Using the group_level=2 option, we can gain some more granularity in the top scores by grouping by both
the playername and the level. Here's the curl command
For the Python script, you just need to change group_level=2 and modify the for-loop to display the results
since you'll now get two elements in the key.
for row in r.json()["rows"]:
    print(row["key"][0], row["key"][1], row["value"]["sum"])
Instead of just a single number, one can emit an array of numbers in the Map function value and still use the
built-in _sum and _stats reduce functions. This feature, so far, is not well documented, so let's fix that.
The _sum and _stats reduce functions work on numbers. Of course, emitting a string in the value will cause
problems when trying to calculate the sum! The reduce function will fail on the server-side. However, when
the value emitted by the Map function is an array of numbers, _sum and _stats will make those same
calculations for each element of that array.
The modification to the Map function is to emit [doc.score, doc.level] as the value.
//map.js
function(doc){
  if(doc.playername && !isNaN(doc.level) && !isNaN(doc.score))
    emit([doc.playername, doc.level], [doc.score, doc.level]);
}
//reduce.js
_stats
Save this MapReduce View in the playstats design document as byplayername_level_withscore_level.
Now, you get the _stats Reduce calculations made on both elements of the value array, with which you can
calculate the average level attained by each player. For example,
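A minimal local sketch of the element-wise behaviour: each position of the emitted array is summed independently, so the second position gives the total of the levels, from which an average level follows. The emitted values are invented for illustration.

```python
def sum_arrays(values):
    """Sum a list of equal-length numeric arrays element by element."""
    return [sum(column) for column in zip(*values)]

# Hypothetical [score, level] values emitted for one player.
emitted = [[120, 1], [300, 2], [450, 3]]
totals = sum_arrays(emitted)
average_level = totals[1] / len(emitted)
print(totals, average_level)
```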
As a final step in this tutorial, which leads into the next section in the Getting Started series, we'll take a look at
the actual JSON _design document that was created above. Since it's an ordinary document in the database, we
can view it with the usual tools. You can also navigate to that document in the Futon interface. Under the Views
pulldown menu in Futon, you can select Design Documents and then click on the document. It should look
similar to this
As you can see, the Views are stored in the key views. The views value contains a key for each MapReduce
View, with the name of the key being the name of the View. For each MapReduce View key, there are two
key-value pairs, one for the map and one for the reduce.
In addition to MapReduce Views, search index functions, list functions, show functions, update functions and a
single validate_doc_update function are stored in the _design document in a similar fashion.
MapReduce View Names
One thing to think about when creating MapReduce Views in your _design documents is to give each one a proper
name. The name shouldn't be so long as to be unreadable, but it can be helpful if the name contains some
description of the results. In the tutorial above we roughly used the naming pattern by<key>_with<value>.
We used by because the View lets us sort by the emitted keys. This naming pattern tells us the keys and
values emitted by the Map function. One could add some Reduce function information to the naming
convention: by<key>_with<value>_reduce<function>, where function could be _sum, _stats, _count
or maybe even custom in the case you write your own Reduce. A final naming suggestion, which is a little more
verbose, but hopefully not too cumbersome, is
by_<key[0]-key[1]-key[2]...>_with_<value[0]-value[1]-value[2]...>_reduce_<function>
With this pattern, the View functions in our tutorial would have been
by_playername_with_1_reduce_sum
by_playername-level_with_score_reduce_stats
by_playername-level_with_score-level_reduce_stats
We often see developers using MapReduce Views just for their map functions in order to retrieve full documents.
This pattern has been documented, especially by the Apache CouchDB community, since Apache CouchDB does
not contain any other way of indexing your documents. However, with Cloudant, you can build search indexes,
which are often the more appropriate tool when you want to retrieve documents based on a multidimensional query
(a query involving multiple keys) and are not interested in the sorted order or a reduce calculation.
Query defaults: reduce=true and group=false
By default, the reduce=true and group=false options are set when you query the results of a MapReduce
View. If you query the first View we built above (playstats/byplayername) without options with curl
curl -u username https://username.cloudant.com/gameresults/_design/playstats/_view/byplayername
you'll get the following results, which may not be what you expect
{
"rows":[
{"key":null,"value":1000}
]
}
To get the results that you saw in the browser, set group=true.
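The difference between the two settings can be emulated locally. This sketch applies a _sum-style reduce to a few made-up (key, value) rows, once over everything (group=false) and once per distinct key (group=true); it is an illustration, not how the server computes the result.

```python
from itertools import groupby

def reduce_sum(rows, group=False):
    """Emulate _sum over sorted (key, value) rows with group=false or group=true."""
    if not group:
        # One row for the whole index, with a null key.
        return [{"key": None, "value": sum(value for _, value in rows)}]
    # One row per distinct key; rows must already be sorted by key.
    return [{"key": key, "value": sum(value for _, value in grouped)}
            for key, grouped in groupby(rows, key=lambda row: row[0])]

rows = [("alice", 1), ("alice", 1), ("bob", 1)]  # hypothetical map output
print(reduce_sum(rows))
print(reduce_sum(rows, group=True))
```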
View Collation - the order of your results
The order of the keys returned by a query to a Cloudant MapReduce index is the same as that of Apache CouchDB.
A very good explanation of the order is found on the Apache CouchDB View Collation document.
Oftentimes, new users are confused about this order, which can cause further confusion when a particular query doesn't
return any results. So, when you're developing and testing new Views and you get an unexpected empty query,
check to make sure the key-related query options aren't excluding your expected results.
For convenience, we've cribbed two key figures from the Apache CouchDB View Collation page. This first one
gives a high-level overview of the order that keys will be sorted by when querying an index.
// special values sort before all other types
null
false
true
// then numbers
1
2
3.0
4
// then text, case sensitive
"a"
"A"
"aa"
"b"
"B"
"ba"
"bb"
// then arrays. compared element by element until different.
// Longer arrays sort after their prefixes
["a"]
["b"]
["b","c"]
["b","c", "a"]
["b","d"]
["b","d", "e"]
// then object, compares each key value in the list until different.
// larger objects sort after their subset objects.
{a:1}
{a:2}
{b:1}
{b:2}
{b:2, a:1} // Member order does matter for collation.
           // CouchDB preserves member order
           // but doesn't require that clients will.
           // this test might fail if used with a js engine
           // that doesn't preserve order
{b:2, c:2}
The second figure is the collation sequence for 7-bit ASCII characters, which can come in handy.
^ _ - , ; : ! ? . " ( ) [ ] { } @ * / \ & # % + < = > | ~ $
0 1 2 3 4 5 6 7 8 9 a A b B c C d D e E f F g G h H i I j J k K
l L m M n N o O p P q Q r R s S t T u U v V w W x X y Y z Z
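When debugging an unexpectedly empty range query, it can help to sort candidate keys locally in roughly the same order. The sketch below is an approximation only: it ranks types as in the first figure and compares arrays element by element, but the string comparison merely imitates the lowercase-before-uppercase ordering for ASCII letters and does not reproduce the server's full collation rules. The function name is hypothetical.

```python
def collation_key(value):
    """Return a sort key that approximates the View collation order."""
    if value is None:
        return (0,)
    if value is False:
        return (1,)
    if value is True:
        return (2,)
    if isinstance(value, (int, float)):
        return (3, value)
    if isinstance(value, str):
        # Case-insensitive first, then lowercase before uppercase (ASCII only).
        return (4, value.lower(), value.swapcase())
    if isinstance(value, list):
        return (5, [collation_key(item) for item in value])
    if isinstance(value, dict):
        return (6, [(collation_key(k), collation_key(v)) for k, v in value.items()])
    raise TypeError("unsupported key type: {0!r}".format(value))

keys = ["b", ["a"], 3.0, None, True, "A", 1, False, "a", ["b", "c"], ["b"]]
print(sorted(keys, key=collation_key))
```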
The _design documents are ordinary database JSON documents with two exceptions: the "_id" of each
document begins with _design/ and there is a particular schema that must be used to hold function definitions.
One can construct a _design document by hand by writing a JSON document and uploading it to the database
in the normal way. However, the organized structure of a _design document and the awkwardness of writing an
entire javascript function inside a string has led to the construction of tools to help build _design documents.
Furthermore, by developing the _design documents locally, you can then use a proper version control system
to manage your code development.
There are a handful of applications out there that help you to do this, such as Couchapp, Erica, couchapp.js and
others.
You'll notice that the names of the tools, or the documentation for these tools, mention that they will help you build
a Couchapp. We may eventually cover Couchapps, but they are nothing more than HTML/CSS/JavaScript files
served directly from the _attachments key of a _design document. So, we can use these tools to simply
create _design documents on your database even if we have no files in the _attachments key to serve.
For example, the Couchapp and Erica tools basically work in the same way. They map files and folders on your
local machine into a hierarchical JSON document and upload that JSON document to your database. Each file
and folder name becomes a key of the document. For files, the value of the key is the content of the
file represented as a string. For folders, the value is another JSON object that represents the content inside the
folder. Some tools designate special file names that are not uploaded in the JSON document to the database but
are instead used for configuration purposes.
The basic steps to use these tools are
1. Create a folder on your local machine to hold your design document.
2. Inside that folder create the necessary top-level files (such as _id and .couchapprc).
3. Give your design document a name by setting the value in the _id file.
4. Create a folder called views to hold your MapReduce Views.
5. Inside views, create folders for each MapReduce View name.
6. Inside each MapReduce View folder, create two files, called map.js and reduce.js.
7. The contents of map.js and reduce.js are, of course, the javascript functions.
8. A similar pattern follows for the Search index function where the top-level directory would be called indexes (the sub-directories map to the function name, and the index.js file defines the function).
9. Within the top-level folder, execute the tool's command to push the content of the folder into a _design
document on your database.
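The folder-to-document mapping these tools perform can be sketched in a few lines. This is not the code of Couchapp or Erica; it assumes that file extensions are dropped from key names (so map.js becomes the key map) and that dotfiles such as .couchapprc are configuration and are skipped. The function name folder_to_doc is hypothetical.

```python
import os

def folder_to_doc(path):
    """Map a folder tree to a nested dict: folders become objects, files become strings."""
    doc = {}
    for name in sorted(os.listdir(path)):
        if name.startswith("."):
            continue  # assumed to be tool configuration, not document content
        full = os.path.join(path, name)
        if os.path.isdir(full):
            doc[name] = folder_to_doc(full)
        else:
            key = os.path.splitext(name)[0]  # "map.js" -> "map", "_id" -> "_id"
            with open(full) as handle:
                doc[key] = handle.read().strip()
    return doc
```

Serializing the result with json.dumps gives a body that could be uploaded as the _design document.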
The schema of a _design document is roughly
{
  "_id": "<name of design doc>",
  "language": "javascript",
  "views": {
    "<view name>": {
      "map": "<function definition>",
      "reduce": "<function definition>"
    },
    "<another view name>": {
      ...
    },
    ...
    "<optional commonjs module group name for map functions>": {
      "<module name.js>": "<module definition>"
    }
  },
  "indexes": {
    "<search index name>": {
      "index": "<function definition>",
      "analyzer": "<analyzer definition>" // optional
    },
    "<search index name>": {
      ...
    }
  },
  "lists": {
    "<list function name>": "<function definition>",
    ...
  },
  "shows": {
    "<show function name>": "<function definition>",
    ...
  },
  "updates": {
    "<update handler function name>": "<function definition>",
    ...
  },
  "validate_doc_update": "<function definition>",
  "<optional commonjs module group name>": {
    "<module name.js>": "<module definition>"
  }
}
In this rough schema definition, the things in brackets (< >) are, of course, defined by you. Also, all keys/functions
in a _design document are optional: you don't need to define search index functions, list functions, etc. if you
don't need them.
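As an illustration of the schema, the following sketch assembles a small _design document as a Python dictionary and serializes it to JSON. It reuses the playstats and byplayername_level names from the tutorial; only the views key is filled in, since every other key is optional.

```python
import json

design_doc = {
    "_id": "_design/playstats",
    "language": "javascript",
    "views": {
        "byplayername_level": {
            # Function bodies are stored as plain strings.
            "map": ("function(doc){"
                    " if(doc.playername && !isNaN(doc.level) && !isNaN(doc.score))"
                    " emit([doc.playername, doc.level], doc.score); }"),
            "reduce": "_stats",
        }
    },
}
body = json.dumps(design_doc, indent=2)
print(body)
```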
You can find more information about CommonJS usage on this Apache CouchDB Wiki page.
We highly recommend that you use one of the Couchapp tools to define and deploy your _design documents.
For example, there is no search index function equivalent to writing MapReduce Views in Futon as shown in the
previous tutorial. The only other way would be to write the function inside the value of a key in a JSON document,
which will probably break your eyes. Also, don't forget that you can review your _design documents in the
database via Futon, which will give you that warm fuzzy feeling when you see that your local function definitions
were uploaded successfully, follow the schema outlined above, and are emitting results.
CHAPTER
TWO
API REFERENCE
Cloudant's database API is based on HTTP. If you know CouchDB, you should feel right at home, as Cloudant's
API is very similar. To access your data on Cloudant, you connect to username.cloudant.com via HTTP or HTTPS.
For most requests, you will need to supply your user name or an API key and a password. See Authentication
Methods for details. Cloudant uses the JSON format for all documents in the database as well as for any metadata.
Thus, the request or response body of any HTTP request, unless specified otherwise, has to be a valid JSON
document. A good place to start reading about the API and its basic building blocks is the API Basics section.
This documentation is forked from the Apache CouchDB API Reference, due to the capabilities Cloudant adds
to the API. If you notice any problems with these docs, please let us know at support@cloudant.com.
POST
Upload data. Within Cloudant's API, POST is used to set values, including uploading documents, setting
document values, and starting certain administration commands.
PUT
Used to put a specified resource. In Cloudant's API, PUT is used to create new objects, including databases,
documents, views and design documents.
DELETE
Deletes the specified resource, including documents, views, and design documents.
COPY
A special method that can be used to copy documents and objects.
If you use an HTTP request type that a URL does not support, a 405 error will
be returned, listing the supported HTTP methods. For example:
{
"error":"method_not_allowed",
"reason":"Only GET,HEAD allowed"
}
If the client (such as some web browsers) does not support using these HTTP methods, POST can be used instead
with the X-HTTP-Method-Override request header set to the actual HTTP method.
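A minimal sketch of the override, using only the Python standard library: it constructs (but does not send) a POST request that carries the real method in the X-HTTP-Method-Override header. The document URL and revision in the example are placeholders.

```python
from urllib.request import Request

def override_request(url, actual_method):
    """Wrap an HTTP method in a POST request for clients limited to GET and POST."""
    return Request(url, data=b"", method="POST",
                   headers={"X-HTTP-Method-Override": actual_method})

# Hypothetical document URL; nothing is sent until the request is opened.
request = override_request("https://username.cloudant.com/db/doc_id?rev=1-abc", "DELETE")
print(request.get_method(), request.header_items())
```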
Note that the returned content type is text/plain even though the information returned by the request is
in JSON format.
Explicitly specifying the Accept header:
GET /recipes HTTP/1.1
Host: username.cloudant.com
Accept: application/json
If-None-Match
This header can optionally be sent to find out whether a document has been modified since it was last read
or updated. The value of the If-None-Match header should match the last Etag value received. If the
value matches the current revision of the document, the server sends a 304 Not Modified status code
and the response will not have a body. If not, you should get a normal 200 response, provided the document
still exists and no other errors occur.
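The client-side logic for such a conditional read might look like the sketch below. Both helper names are hypothetical, and the actual HTTP call is left out: one function produces the request header from the last Etag seen, the other decides which body to use based on the status code.

```python
def conditional_headers(last_etag):
    """Headers for a GET that asks the server to skip the body if nothing changed."""
    return {"If-None-Match": last_etag} if last_etag else {}

def resolve_body(status, cached_body, response_body):
    """Pick the document to use: the cached copy on 304, the fresh body on 200."""
    if status == 304:
        return cached_body
    if status == 200:
        return response_body
    raise ValueError("unexpected status: {0}".format(status))
```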
Response Headers
Response headers are returned by the server when sending back content and include a number of different header
fields, many of which are standard HTTP response headers and have no significance to how Cloudant operates.
The response headers important to Cloudant are listed below.
The Cloudant design document API and the functions when returning HTML (for example as part of a show or
list) enable you to include custom HTTP headers through the headers field of the return object.
Content-type
Specifies the MIME type of the returned data. For most requests, the returned MIME type is text/plain.
All text is encoded in Unicode (UTF-8), and this is explicitly stated in the returned Content-type, as
text/plain;charset=utf-8.
Cache-control
The cache control HTTP response header provides a suggestion for client caching mechanisms on how to
treat the returned information. Cloudant typically returns the must-revalidate, which indicates that
the information should be revalidated if possible. This is used to ensure that the dynamic nature of the
content is correctly updated.
Content-length
The length (in bytes) of the returned content.
Etag
The Etag HTTP header field is used to show the revision for a document or the response from a show
function. For documents, the value is identical to the revision of the document. The value can be used
with an If-None-Match request header to get a 304 Not Modified response if the revision is still
current.
ETags cannot currently be used with views or lists, since the ETags returned from those requests are just
random numbers that change on every request.
Boolean - a true or false value. You can use these strings directly. For example:
{ "value": true}
Object - a set of key/value pairs (i.e. an associative array, or hash). The key must be a string, but the value
can be any of the supported JSON values. For example:
{
"servings" : 4,
"subtitle" : "Easy to make in advance, and then cook when ready",
"cooktime" : 60,
"title" : "Chicken Coriander"
}
In Cloudant databases, the JSON object is used to represent a variety of structures, including all documents
in a database.
Parsing JSON into a JavaScript object is supported through the JSON.parse() function in JavaScript, or
through various libraries that will perform the parsing of the content into a JavaScript object for you. Libraries for
parsing and generating JSON are available in all major programming languages.
Warning: Care should be taken to ensure that your JSON structures are valid; invalid structures will cause
Cloudant to return an HTTP status code of 400 (bad request).
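One way to catch invalid structures before they reach the server is to parse the text locally first. A minimal sketch in Python (the helper name is hypothetical):

```python
import json

def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True

print(is_valid_json('{"value": true}'))   # double-quoted keys: valid
print(is_valid_json("{'value': true}"))   # single quotes: not valid JSON
```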
Method   Path        Description                                   Headers                                           Form Parameters
GET      /_session   Returns cookie based login user information
POST     /_session   Do cookie based user login                    Content-Type: application/x-www-form-urlencoded   name, password
DELETE   /_session   Logout cookie based user
200 OK
Cache-Control: must-revalidate
Content-Length: 42
Content-Type: text/plain; charset=UTF-8
Date: Mon, 04 Mar 2013 14:06:11 GMT
server: CouchDB/1.0.2 (Erlang OTP/R14B)
Set-Cookie: AuthSession="a2ltc3RlYmVsOjUxMzRBQTUzOtiY2_IDUIdsTJEVNEjObAbyhrgz"; Expires=Tue, 05 Ma
x-couch-request-id: a638431d
{
"ok": true,
"name": "kimstebel",
"roles": []
}
Once you have obtained the cookie, you can make a GET request to obtain the username and its roles:
{
"ok": true,
"info": {
"authentication_db": "_users",
"authentication_handlers": ["cookie", "default"]
},
"userCtx": {
"name": null,
"roles": []
}
}
To log out, you have to send a DELETE request to the same URL and submit the Cookie in the request.
200 OK
Cache-Control: must-revalidate
Content-Length: 12
Content-Type: application/json
Date: Mon, 04 Mar 2013 14:06:12 GMT
server: CouchDB/1.0.2 (Erlang OTP/R14B)
Set-Cookie: AuthSession=""; Expires=Fri, 02 Jan 1970 00:00:00 GMT; Max-Age=0; Path=/; HttpOnly; Ve
x-couch-request-id: e02e0333
{
"ok": true
}
Path                                        Description                               Parameters
https://cloudant.com/api/set_permissions    Set permissions for a user and database   database, username, roles[]
https://cloudant.com/api/generate_api_key   Generate a random API key
Query Arguments
Argument   Description
database   The database for which permissions are set. This has to be a string of the form accountname/databasename.
username   The user name or API key for which permissions are set
roles      The roles the user can have. This parameter can be passed multiple times for each role.
string
yes
string
no
Example Request
POST /api/set_permissions HTTP/1.1
Host: cloudant.com
Content-Length: 83
Content-Type: application/x-www-form-urlencoded
username=aUserNameOrApiKey&database=accountName/db&roles=_reader&roles=_writer
Example Response
{
"ok": true
}
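The form body of the example request can be produced programmatically. This sketch repeats the roles parameter once per role, as described above; the helper name is hypothetical, and the slash is left unescaped only so that the output matches the example body.

```python
from urllib.parse import urlencode

def permissions_body(username, database, roles):
    """Build the form-encoded body for set_permissions, one roles entry per role."""
    pairs = [("username", username), ("database", database)]
    pairs.extend(("roles", role) for role in roles)
    return urlencode(pairs, safe="/")

print(permissions_body("aUserNameOrApiKey", "accountName/db", ["_reader", "_writer"]))
```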
Example Response
{
"password": "generatedPassword",
"ok": true,
"key": "generatedKey"
}
2.4 Databases
The database level endpoints provide an interface to entire databases within Cloudant. These are database level
rather than document level requests.
A list of the available methods and URL paths is provided below:
Method   Path                Description
GET      /_all_dbs           Returns a list of all databases
GET      /db                 Returns database information
PUT      /db                 Create a new database
DELETE   /db                 Delete an existing database
GET      /db/_all_docs       Returns a built-in view of all documents in this database
POST     /db/_all_docs       Returns certain rows from the built-in view of all documents
POST     /db/_bulk_docs      Insert multiple documents in to the database in a single request
GET      /_db_updates        Returns information about databases that have been updated
GET      /db/_changes        Returns information about documents that have been updated in a database
GET      /db/_shards         Returns information about the shards in a database or the shard a document belongs to
POST     /db/_missing_revs   Given a list of document revisions, returns the document revisions that do not exist in the database
POST     /db/_revs_diff      Given a list of document revisions, returns differences between the given revisions and ones that are in the database
GET      /db/_revs_limit     Gets the limit of historical revisions to store for a single document in the database
PUT      /db/_revs_limit     Sets the limit of historical revisions to store for a single document in the database
GET      /db/_security       Returns the special security object for the database
PUT      /db/_security       Sets the special security object for the database
POST     /db/_view_cleanup   Removes view files that are not used by any design document
Description
Request completed successfully
GET http://username.cloudant.com/_all_dbs
Accept: application/json
Code   Description
200    The database exists and information about it is returned.
404    The database could not be found. If further information is available, it will be returned as a JSON object.
Gets information about the specified database. For example, to retrieve the information for the database recipe:
GET /db HTTP/1.1
Accept: application/json
The JSON response contains meta information about the database. A sample of the JSON returned for an empty
database is provided below:
{
"update_seq": "0-g1AAAADneJzLYWBgYMlgTmFQSElKzi9KdUhJMtbLTS3KLElMT9VLzskvTUnMK9HLSy3JAapkSmRIsv_
"db_name": "db",
"purge_seq": 0,
"other": {
"data_size": 0
},
"doc_del_count": 0,
"doc_count": 0,
"disk_size": 316,
"disk_format_version": 5,
"compact_running": false,
"instance_start_time": "0"
}
The elements of the returned structure are shown in the table below:
Field                 Description
compact_running       Set to true if the database compaction routine is operating on this database.
db_name               The name of the database.
disk_format_version   The version of the physical format used for the data when it is stored on disk.
disk_size             Size in bytes of the data as stored on the disk. View indexes are not included in the calculation.
doc_count             A count of the documents in the specified database.
doc_del_count         Number of deleted documents
instance_start_time   Always 0.
purge_seq             The number of purge operations on the database.
update_seq            An opaque string describing the state of the database. It should not be relied on for counting the number of updates.
other                 JSON object containing a data_size field.
Creating a database
Method: PUT /db
Request: None
Response: JSON success statement
roles permitted: _admin
Code   Description
201    Database created successfully
202    The database has been successfully created on some nodes, but the number of nodes is less than the write quorum.
403    Invalid database name.
412    Database already exists.
Creates a new database. The database name must be composed of one or more of the following characters:
Lowercase characters (a-z)
Name must begin with a lowercase letter
Digits (0-9)
Any of the characters _, $, (, ), +, -, and /.
Trying to create a database that does not meet these requirements will return an error quoting these restrictions.
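These naming rules can be checked on the client before issuing the PUT. The regular expression below is a sketch derived directly from the list above; the server remains the authority on what it accepts, and the helper name is hypothetical.

```python
import re

# A lowercase letter first, then lowercase letters, digits, or _ $ ( ) + - /
DB_NAME = re.compile(r"[a-z][a-z0-9_$()+/-]*")

def is_valid_db_name(name):
    """Check a database name against the documented character rules."""
    return DB_NAME.fullmatch(name) is not None

for candidate in ["recipes", "Recipes", "1db", "my_db-2"]:
    print(candidate, is_valid_db_name(candidate))
```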
To create the database recipes:
PUT /db HTTP/1.1
Accept: application/json
Anything else should be treated as an error, and the problem should be taken from the HTTP response code.
Deleting a database
Method: DELETE /db
Request: None
Response: JSON success statement
roles permitted: _admin
Return Codes
Code   Description
200    Database has been deleted
404    The database could not be found. If further information is available, it will be returned as a JSON object.
Deletes the specified database, and all the documents and attachments contained within it.
To delete the database recipes you would send the request:
DELETE /db HTTP/1.1
Accept: application/json
Argument        Description                                                          Optional   Type      Default
descending      Return the documents in descending by key order                      yes        boolean   false
endkey          Stop returning records when the specified key is reached             yes        string
include_docs    Include the full content of the documents in the return              yes        boolean   false
inclusive_end   Include rows whose key equals the endkey                             yes        boolean   true
key             Return only documents that match the specified key                   yes        string
limit           Limit the number of the returned documents to the specified number   yes        numeric
skip            Skip this number of records before starting to return the results    yes        numeric   0
startkey        Return records starting with the specified key                       yes        string
Returns a JSON structure of all of the documents in a given database. The information is returned as a JSON
structure containing meta information about the return structure, and the list of documents and basic contents, consisting of
the ID, revision and key. The key is generated from the document ID.
Field        Description                                Type
offset       Offset where the document list started     numeric
rows         Array of document object                   array
total_rows   Number of documents in the database/view   numeric
update_seq   Current update sequence for the database   string
By default the information returned contains only the document ID and revision. For example, the request:
GET /test/_all_docs HTTP/1.1
Accept: application/json
The information is returned in the form of a temporary view of all the database documents, with the returned
key consisting of the ID of the document. The remainder of the interface is therefore identical to the View query
arguments and their behavior.
POST /db/_all_docs
Method: POST /db/_all_docs
Request: JSON of the document IDs you want included
Response: JSON of the returned view
roles permitted: _admin, _reader
A POST to _all_docs allows you to specify multiple keys to be selected from the database. This enables you to
request multiple documents in a single request, in place of multiple Retrieving a document requests.
The request body should contain a list of the keys to be returned as an array to a keys object. For example:
POST /recipes/_all_docs
User-Agent: MyApp/0.1 libwww-perl/5.837
{
"keys" : [
"Zingylemontart",
"Yogurtraita"
]
}
The return JSON is the all documents structure, but with only the selected keys in the output:
{
"total_rows" : 2666,
"rows" : [
{
"value" : {
"rev" : "1-a3544d296de19e6f5b932ea77d886942"
},
"id" : "Zingylemontart",
"key" : "Zingylemontart"
},
{
"value" : {
"rev" : "1-91635098bfe7d40197a1b98d7ee085fc"
},
"id" : "Yogurtraita",
"key" : "Yogurtraita"
}
],
"offset" : 0
}
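A small sketch of both sides of this exchange: one helper builds the request body with the keys array, the other turns a parsed response into an ID-to-revision lookup. Both function names are hypothetical, and sending the request is left out.

```python
import json

def keys_request_body(keys):
    """Build the JSON body for a POST to _all_docs that selects specific keys."""
    return json.dumps({"keys": list(keys)})

def revisions_by_id(response):
    """Map each returned document ID to its current revision."""
    return {row["id"]: row["value"]["rev"] for row in response["rows"]}

print(keys_request_body(["Zingylemontart", "Yogurtraita"]))
```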
Description
All documents have been created or updated.
For at least one document, the write quorum (specified by w) has not been met.
The bulk document API allows you to create and update multiple documents at the same time within a single
request. The basic operation is similar to creating or updating a single document, except that you batch the
document structure and information. When creating new documents the document ID is optional. For updating
existing documents, you must provide the document ID, revision information, and new document values.
For both inserts and updates the basic structure of the JSON document in the request is the same:
Request Body
Field   Description               Type               Optional
docs    Bulk Documents Document   array of objects   no
Description                               Type      Optional
Document ID                               string    yes, but mandatory for updates
Document revision                         string    yes, but mandatory for updates
Whether the document should be deleted    boolean   yes
The return code from a bulk insertion will be 201, with the content of the returned structure indicating specific
success or failure messages on a per-document basis.
The return structure from the example above contains a list of the documents created, with their IDs
and their revision IDs:
201 Created
Cache-Control: must-revalidate
Content-Length: 269
Content-Type: application/json
Date: Mon, 04 Mar 2013 14:06:20 GMT
server: CouchDB/1.0.2 (Erlang OTP/R14B)
x-couch-request-id: e8ff64d5
[{
"id": "96f898f0-f6ff-4a9b-aac4-503992f31b01",
"rev": "1-54dd23d6a630d0d75c2c5d4ef894454e"
}, {
"id": "5a049246-179f-42ad-87ac-8f080426c17c",
"rev": "1-0cde94a828df5cdc0943a10f3f36e7e5"
}, {
"id": "d1f61e66-7708-4da6-aa05-7cbc33b44b7e",
"rev": "1-a2b6e5dac4e0447e7049c8c540b309d6"
}]
The content and structure of the returned JSON will depend on the transaction semantics being used for the bulk
update; see Bulk Documents Transaction Semantics for more information. Conflicts and validation errors when
updating documents in bulk must be handled separately; see Bulk Document Validation and Conflict Errors.
Updating Documents in Bulk
The bulk document update procedure is similar to the insertion procedure, except that you must specify the document ID and current revision for every document in the bulk update JSON string.
For example, you could send the following request:
POST /test/_bulk_docs HTTP/1.1
Accept: application/json
{
"docs": [{
"name": "Nicholas",
"age": 45,
"gender": "female",
"_id": "96f898f0-f6ff-4a9b-aac4-503992f31b01",
"_attachments": {
},
"_rev": "1-54dd23d6a630d0d75c2c5d4ef894454e"
}, {
"name": "Taylor",
"age": 50,
"gender": "female",
"_id": "5a049246-179f-42ad-87ac-8f080426c17c",
"_attachments": {
},
"_rev": "1-0cde94a828df5cdc0943a10f3f36e7e5"
}, {
"name": "Owen",
"age": 51,
"gender": "female",
"_id": "d1f61e66-7708-4da6-aa05-7cbc33b44b7e",
"_attachments": {
},
"_rev": "1-a2b6e5dac4e0447e7049c8c540b309d6"
}]
}
The return structure is the JSON of the updated documents, with the new revision and ID information:
[{
"id": "96f898f0-f6ff-4a9b-aac4-503992f31b01",
"rev": "2-ff7b85665c4c297838963c80ecf481a3"
}, {
"id": "5a049246-179f-42ad-87ac-8f080426c17c",
"rev": "2-9d5401898196997853b5ac4163857a29"
}, {
"id": "d1f61e66-7708-4da6-aa05-7cbc33b44b7e",
"rev": "2-cbdef49ef3ddc127eff86350844a6108"
}]
You can optionally delete documents during a bulk update by adding the _deleted field with a value of true
to each document ID/revision combination within the submitted JSON structure.
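For instance, a bulk deletion body can be generated from a list of ID/revision pairs. The helper below is a hypothetical sketch; it only builds the JSON that would be sent to _bulk_docs.

```python
import json

def bulk_delete_body(id_rev_pairs):
    """Build a _bulk_docs body that marks each given ID/revision pair as deleted."""
    docs = [{"_id": doc_id, "_rev": rev, "_deleted": True}
            for doc_id, rev in id_rev_pairs]
    return json.dumps({"docs": docs})

print(bulk_delete_body([
    ("96f898f0-f6ff-4a9b-aac4-503992f31b01", "2-ff7b85665c4c297838963c80ecf481a3"),
]))
```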
The return type from a bulk insertion will be 201, with the content of the returned structure indicating specific
success or otherwise messages on a per-document basis.
The content and structure of the returned JSON will depend on the transaction semantics being used for the bulk
update; see Bulk Documents Transaction Semantics for more information. Conflicts and validation errors when
updating documents in bulk must be handled separately; see Bulk Document Validation and Conflict Errors.
Bulk Documents Transaction Semantics
Cloudant will only guarantee that some of the documents will be saved if your request yields a 202
response. The response will contain the list of documents successfully inserted or updated during the
process.
The response structure will indicate whether the document was updated by supplying the new _rev
parameter indicating a new document revision was created. If the update failed, then you will get an
error of type conflict. For example:
[
{
"id" : "FishStew",
"error" : "conflict",
"reason" : "Document update conflict."
},
{
"id" : "LambStew",
"error" : "conflict",
"reason" : "Document update conflict."
},
{
"id" : "7f7638c86173eb440b8890839ff35433",
"error" : "conflict",
"reason" : "Document update conflict."
}
]
In this case no new revision has been created and you will need to submit the document update with
the correct revision tag, to update the document.
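Because results are reported per document, a client typically separates the entries that were saved from those that need to be retried. A minimal sketch, assuming that failed entries are exactly those carrying an error field (the function name is hypothetical):

```python
def split_bulk_results(results):
    """Separate a _bulk_docs response into saved entries and failed entries."""
    saved = [row for row in results if "error" not in row]
    failed = [row for row in results if "error" in row]
    return saved, failed

response = [
    {"id": "FishStew", "error": "conflict", "reason": "Document update conflict."},
    {"id": "LambStew", "rev": "2-abc"},  # hypothetical successful entry
]
saved, failed = split_bulk_results(response)
print([row["id"] for row in saved], [row["id"] for row in failed])
```

The IDs in the failed list are the documents to fetch again and resubmit with their current revision.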
Bulk Document Validation and Conflict Errors
The JSON returned by the _bulk_docs operation consists of an array of JSON structures, one for each document
in the original submission. The returned JSON structure should be examined to ensure that all of the documents
submitted in the original request were successfully added to the database.
The structure of the returned information is:
Field          Description               Type
docs [array]   Bulk Documents Document   array of objects

Field    Description                         Type
id       Document ID                         string
error    Error type                          string
reason   Error string with extended reason   string
When a document (or document revision) is not correctly committed to the database because of an error, you
should check the error field to determine the error type and course of action. Errors will be one of the following
types:
conflict
The document as submitted is in conflict. If you used the default bulk transaction mode then the new revision
will not have been created and you will need to re-submit the document to the database.
Conflict resolution of documents added using the bulk docs interface is identical to the resolution procedures
used when resolving conflict errors during replication.
forbidden
Entries with this error type indicate that the validation routine applied to the document during submission
has returned an error.
For example, if your validation routine includes the following:
throw({forbidden: 'invalid recipe ingredient'});
heartbeat   Time in milliseconds after which an empty line is sent during longpoll or continuous if there have been no changes
limit       Maximum number of results to return
yes
yes
yes
yes
yes
numeric
yes
boolean
false
Response Headers
Independent of the feed parameter, the changes feed always uses Transfer-Encoding: chunked for all
its responses. This means that the response does not have a Content-Length header. Instead, the body contains
the size of each chunk (there will be one chunk for each update). Most HTTP client libraries are able to decode
these responses so that there is no difference from the perspective of the application. However, some libraries
might require manual processing of the chunks. See RFC 2616 for more information.
Description
Obtains a list of changes to databases. Changes can be either updates to the database, creation, or deletion of
a database. This can be used to monitor for updates and modifications to the database for post processing or
synchronization across databases. It is most useful in applications that use many small databases, so that the
application does not need to keep changes feeds open for each database. The feed is not guaranteed to return
changes in the correct order and might contain changes more than once. In rare cases, changes might even be
skipped. There are three kinds of feeds: polling, long polling, and continuous. All requests are polling requests by
default. You can select any feed type explicitly using the feed query argument.
Polling
If you do not set the feed parameter, you will get all changes that have occurred until now. Once they have been
sent, the HTTP connection will be closed and another request can be made later to get more changes. This type
of request returns a single JSON document containing information about updates to databases. For example, the
query
GET /_db_updates
Will get all of the changes to all databases. You can request a starting point using the since query parameter and
specifying the sequence number. You will need to record the latest sequence number in your client and then use
this when making another request as the new value to the since parameter.
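The bookkeeping described above can be sketched in Python as follows. This is an illustration only: the helper names db_updates_path and next_since are hypothetical, the response is assumed to be already parsed from JSON, and the sequence value shown is a shortened placeholder rather than a real sequence number.

```python
from urllib.parse import urlencode


def db_updates_path(since=None):
    """Build the request path for a polling request to /_db_updates."""
    if since is None:
        return "/_db_updates"
    return "/_db_updates?" + urlencode({"since": since})


def next_since(response, previous=None):
    """Return the sequence value to send as `since` on the next request."""
    return response.get("last_seq", previous)


# Hypothetical, shortened response body used only for illustration.
response = {"results": [], "last_seq": "673-abc"}
print(db_updates_path(next_since(response)))
```

Storing the value returned by next_since between requests lets the client resume from where the previous poll ended.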
Longpoll
With long polling, the request to the server will remain open until a change is made on the database, at which point
the changes will be reported and the connection will close. The long poll is useful when you want to monitor
for changes for a specific purpose without wanting to monitor continuously.
Because the wait for a change can be significant, you can set a timeout before the connection is automatically
closed (the timeout parameter). You can also set a heartbeat interval (using the heartbeat query parameter),
which lets the server send a newline to keep the connection open.
Example request and response:
GET /_db_updates?feed=longpoll&since=672-g1AAAAHeeJzLYWBg4MhgTmFQSElKzi9KdUhJMtIrSS0uqTQwMNNLzskvT
{
"results": [{
"dbname": "documentationchanges1documentation9f4f4b7e-7d6c-4df2-865d-b5899d0e4c96",
"type": "created",
"account": "testy006-admin",
"seq": "673-g1AAAAJAeJyN0EtuwjAQgGGXViq3KCzYsIjsxI9k1UgcpPV4jBBKEwnCghXcpL1JuUl7k-BHVAkWaTZjyR
}],
"last_seq": "673-g1AAAAJAeJyN0EtuwjAQgGGXViq3KCzYsIjsxI9k1UgcpPV4jBBKEwnCghXcpL1JuUl7k-BHVAkWaTZ
}
Continuous
Continuous sends all new changes back to the client immediately, without closing the connection. In continuous
mode the format of the changes is slightly different to accommodate the continuous nature while ensuring that the
JSON output is still valid for each change notification.
As with the longpoll feed type you can set both the timeout and heartbeat intervals to ensure that the connection
is kept open for new changes and updates.
The return structure for normal and longpoll modes is a JSON array of changes objects, and the last update
sequence number.
results: Array of changes
dbname: name of the database that changed
type: type of change, created, updated, or deleted
seq: sequence number of the change. Sequence numbers are non-contiguous.
last_seq: sequence number of the last change
In continuous mode, the server sends a CRLF (carriage-return, linefeed) delimited line for each change. Each
line contains the JSON object with the same structure as the object inside the results array of other feed types.
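A minimal Python sketch of reading such a feed is shown below. It assumes, as described above, that each change arrives as one JSON object per CRLF-delimited line and that heartbeats arrive as empty lines. The function name parse_continuous_feed and the sample lines are hypothetical.

```python
import json


def parse_continuous_feed(lines):
    """Yield one change object per non-empty line of a continuous feed."""
    for raw in lines:
        line = raw.strip()  # removes the trailing CRLF
        if not line:
            continue  # an empty line is a heartbeat, not a change
        yield json.loads(line)


# Hypothetical lines as they might be read from the open connection.
sample = [
    '{"dbname": "dbs", "type": "updated", "seq": "667-abc"}\r\n',
    "\r\n",
    '{"dbname": "recipes", "type": "deleted", "seq": "668-def"}\r\n',
]
for change in parse_continuous_feed(sample):
    print(change["dbname"], change["type"])
```

Because the function accepts any iterable of lines, it can be fed from a streaming HTTP response or from a list in a test.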
Example request and response:
GET /_db_updates?feed=continuous&since=665-g1AAAAHeeJzLYWBg4MhgTmFQSElKzi9KdUhJMtErSS0uqTQwMNNLzsk
{
"dbname": "documentationchangescontinuous1documentation94fb157e-d35e-4b2d-b14c-c2eeadfdec71",
"type": "created",
"account": "testy006-admin",
"seq": "666-g1AAAAJAeJyN0EkKwjAUgOE4gN5CxZWbksakSVcWL6IZEakVtC5c6U30JnoTvUnNIEJdVDcvEMLHy58DAPqr
}
{
"dbname": "dbs",
"type": "updated",
"account": "_admin",
"seq": "667-g1AAAAJAeJyN0EkKwjAUgOE4gN5CBTduShqTJl1ZvIhmRKRW0LpwpTfRm-hN9CY1gwh1Ud28QAgfL38OAOiv
}
{
"dbname": "documentationchangescontinuous2documentation94fb157e-d35e-4b2d-b14c-c2eeadfdec71",
"type": "created",
"account": "testy006-admin",
"seq": "668-g1AAAAJAeJyN0EuqwjAUgOH4AN2FCk6c1DQmTTqyuBHNE5FaQevAke5Ed6I70Z3UPEToHfQ6OYEQPk7-HADQ
}
{
"dbname": "documentationchangescontinuous1documentation94fb157e-d35e-4b2d-b14c-c2eeadfdec71",
"type": "deleted",
"account": "testy006-admin",
"seq": "669-g1AAAAJAeJyN0EuqwjAUgOH4AN2FCo4c1DQmTTqyuBHNE5FaQevAke5Ed6I70Z3UPEToHfQ6OYEQPk7-HADQ
}
{
"dbname": "dbs",
"type": "updated",
"account": "_admin",
"seq": "670-g1AAAAJAeJyN0EuqwjAUgOH4AN2FOnDioKYxadKRxY1onojUCloHjnQnuhPdie6k5iFC76DXyQmE8HHy5wCA
}
{
"dbname": "documentationchangescontinuous2documentation94fb157e-d35e-4b2d-b14c-c2eeadfdec71",
"type": "deleted",
"account": "testy006-admin",
"seq": "671-g1AAAAJAeJyN0EuqwjAUgOH4gOsu1IETByWNSZOOLG7k3jwRqRW0DhzpTnQnuhPdSc1DLtZB7eQEwuEj-XMA
}
{
"dbname": "documentationchangescontinuous2documentation94fb157e-d35e-4b2d-b14c-c2eeadfdec71",
"type": "deleted",
"account": "testy006-admin",
"seq": "672-g1AAAAJAeJyN0DsKwjAYwPH4AL2FOrg4lDQmTTpZvIjmiUitoHVw0pvoTfQmepOahwh1qC5fIIQfX_45AKC_
}
{
"dbname": "documentationchanges1documentation9f4f4b7e-7d6c-4df2-865d-b5899d0e4c96",
"type": "created",
"account": "testy006-admin",
"seq": "673-g1AAAAJAeJyN0DsKwjAYwPH4AL2FOrg4lDQmTTpZ8CCaJyK1gtbBSW-iN9Gb6E1qHiLUobp8gRB-fPnnAID
}
{
"dbname": "documentationchanges2documentation9f4f4b7e-7d6c-4df2-865d-b5899d0e4c96",
"type": "created",
"account": "testy006-admin",
"seq": "674-g1AAAAJAeJyN0EsOATEYwPF6JNwCCxuLSafaaWeFOAh9RmSMhLGw4ibchJtwk9GHSMZi2HxNmuaXr_8MANBd
}
{
"dbname": "dbs",
"type": "updated",
"account": "_admin",
"seq": "675-g1AAAAJAeJyN0EsOATEYwPF6JNwCGwuLSafaaWeFOAh9RmSMhLGw4ibchJtwk9GHSMZi2HxNmuaXr_8MANBd
}
{
"dbname": "documentationchanges1documentation9f4f4b7e-7d6c-4df2-865d-b5899d0e4c96",
"type": "deleted",
"account": "testy006-admin",
"seq": "676-g1AAAAJAeJyN0EsOATEYwPF6JNwCGwuLSafaaWeFOAh9RmSMhLGw4ibchJtwk9GHSMZi2HxNmuaXr_8MANBd
}
{
"dbname": "dbs",
"type": "updated",
"account": "_admin",
"seq": "677-g1AAAAJAeJyN0EsOATEYwPF6JNwCK4nFpFPttLNCHIQ-IzJGwlhYcRNuwk24yehDJGMxbL4mTfPL138GAOiu
}
{
"dbname": "documentationchanges2documentation9f4f4b7e-7d6c-4df2-865d-b5899d0e4c96",
"type": "deleted",
"account": "testy006-admin",
"seq": "678-g1AAAAJAeJyN0EsOATEYwPF6JNwCKwvJpFPttLNCHIQ-IzJGwlhYcRNuwk24yehDJGMxbL4mTfPL138GAOiu
}
{
"dbname": "dbs",
"type": "updated",
"account": "_admin",
"seq": "679-g1AAAAJAeJyN0EkKwjAUgOE4gN5C3QlCSWPSpCsVD6IZEakVtC5c6U30JnoTvUnNIEJdVDcvEMLHy58BALqr
}
feed
Type of feed
yes
yes
yes
array of strings
string
normal
continuous: Continuous mode, longpoll: Long polling mode, normal: Polling mode
string
yes
numeric
60000
yes
yes
boolean
false
numeric
none
string
0
yes
yes
boolean
false
numeric
Obtains a list of the changes made to the database. This can be used to monitor for updates and modifications to
the database for post processing or synchronization. The _changes feed is not guaranteed to return changes in
the correct order. There are three types of supported changes feeds: poll, longpoll, and continuous. All
requests are poll requests by default. You can select any feed type explicitly using the feed query argument.
Polling
With polling you can request the changes that have occurred since a specific sequence number. This returns the
JSON structure containing the changed document information. When you perform a poll change request, only the
changes since the specific sequence number are returned. For example, the query
GET /recipes/_changes
Content-Type: application/json
Will get all of the changes in the database. You can request a starting point using the since query argument and
specifying the sequence number. You will need to record the latest sequence number in your client and then use
this when making another request as the new value to the since parameter.
Longpoll
With long polling, the request to the server will remain open until a change is made on the database, at which point
the changes will be reported and the connection will close. The long poll is useful when you want to monitor
for changes for a specific purpose without wanting to monitor continuously for changes.
Because the wait for a change can be significant you can set a timeout before the connection is automatically
closed (the timeout argument). You can also set a heartbeat interval (using the heartbeat query argument),
which sends a newline to keep the connection open.
The return structure for normal and longpoll modes is a JSON array of changes objects, and the last update
sequence number. The response is structured as follows:
last_seq: Last change sequence string
pending: Number of changes after the ones in this response
results: Array of changes made to a database
changes: Array of changes, field-by-field, for this document
id: Document ID
seq: Update sequence string
Example request and response
GET /db/_changes?feed=longpoll&since=0-g1AAAAI9eJyV0F8KgjAcwPFRQd0iu4BsTZ17ypvU_iJiCrUeeqqb1E3qJnU
{
"results": [{
"seq": "1-g1AAAAI9eJyV0EsKwjAUBdD4Ad2FdQMlMW3TjOxONF9KqS1oHDjSnehOdCe6k5oQsNZBqZP3HiEcLrcEAMzz
"id": "foo",
"changes": [{
"rev": "1-967a00dff5e02add41819138abb3284d"
}]
}],
"last_seq": "1-g1AAAAI9eJyV0EsKwjAUBdD4Ad2FdQMlMW3TjOxONF9KqS1oHDjSnehOdCe6k5oQsNZBqZP3HiEcLrcEA
"pending": 0
}
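To illustrate how last_seq, pending, and results fit together, here is a minimal Python sketch that keeps requesting changes until nothing is pending. The fetch argument is a hypothetical callable that performs the HTTP request for a given since value and returns the parsed JSON response; the sequence strings are shortened placeholders.

```python
def collect_changes(fetch, since="0"):
    """Poll until no changes are pending; return (changed IDs, last sequence)."""
    ids = []
    while True:
        page = fetch(since)
        ids.extend(row["id"] for row in page["results"])
        since = page["last_seq"]  # resume point for the next request
        if page.get("pending", 0) == 0:
            return ids, since


# Stand-in for the HTTP layer, keyed by the `since` value requested.
fake_pages = {
    "0": {"results": [{"seq": "1-a", "id": "foo", "changes": []}],
          "last_seq": "1-a", "pending": 1},
    "1-a": {"results": [{"seq": "2-b", "id": "bar", "changes": []}],
            "last_seq": "2-b", "pending": 0},
}
print(collect_changes(fake_pages.__getitem__))
```

The returned sequence value is the one to store and pass as since on the next run.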
Continuous
Continuous sends all new changes back to the client immediately, without closing the connection. In continuous
mode the format of the changes is slightly different to accommodate the continuous nature while ensuring that the
JSON output is still valid for each change notification.
As with the longpoll feed type you can set both the timeout and heartbeat intervals to ensure that the connection
is kept open for new changes and updates.
In continuous mode, the server sends a CRLF (carriage-return, linefeed) delimited line for each change. Each
line contains the JSON object.
Example request and response
GET /db/_changes?feed=continuous&since=0-g1AAAAI7eJyN0EEOgjAQBdBGTfQWcgLSVtriSm6iTDuEGIRE68KV3kRvo
{
"seq": "1-g1AAAAI7eJyN0EsOgjAQBuD6SPQWcgLSIm1xJTdRph1CCEKiuHClN9Gb6E30JlisCXaDbGYmk8mXyV8QQubZRB
"id": "2documentation22d01513-c30f-417b-8c27-56b3c0de12ac",
"changes": [{
"rev": "1-967a00dff5e02add41819138abb3284d"
}]
}
{
"seq": "2-g1AAAAI7eJyN0E0OgjAQBeD6k-gt5ASkRdriSm6iTDuEEIREceFKb6I30ZvoTbBYE-wG2cxMmubLyysIIfNsoo
"id": "1documentation22d01513-c30f-417b-8c27-56b3c0de12ac",
"changes": [{
"rev": "1-967a00dff5e02add41819138abb3284d"
}]
}
{
"seq": "3-g1AAAAI7eJyN0EsOgjAQBuD6SPQWcgLSIqW4kpso0w4hBCFRXLjSm-hN9CZ6EyyUBLtBNjOTyeTL5M8JIct0po
"id": "1documentation22d01513-c30f-417b-8c27-56b3c0de12ac",
"changes": [{
"rev": "2-eec205a9d413992850a6e32678485900"
}],
"deleted": true
}
{
"seq": "4-g1AAAAI7eJyN0EEOgjAQBdAGTfQWcgLSIm1xJTdRph1CCEKiuHClN9Gb6E30JlisCXaDbGYmTfPy80tCyDyfaO
"id": "2documentation22d01513-c30f-417b-8c27-56b3c0de12ac",
"changes": [{
"rev": "2-eec205a9d413992850a6e32678485900"
}],
"deleted": true
}
You can also request the full contents of each document change (instead of just the change notification) by using
the include_docs parameter.
Filtering
You can filter the contents of the changes feed in a number of ways. The most basic way is to specify one or more
document IDs to the query. This causes the returned structure value to only contain changes for the specified IDs.
Note that the value of this query argument should be a JSON formatted array.
You can also filter the _changes feed by defining a filter function within a design document. The specification
for the filter is the same as for replication filters. You specify the name of the filter function to the filter
parameter, specifying the design document name and filter name. For example:
GET /db/_changes?filter=design_doc/filtername
The _changes feed can be used to watch changes to specific document IDs or the list of _design documents
in a database. If the filter parameter is set to _doc_ids, a list of document IDs can be passed in the doc_ids
parameter as a JSON array.
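As a sketch of how such a request path could be assembled, the Python below encodes the document IDs as a JSON array and URL-encodes the query string. The function name changes_path_for_ids is hypothetical, and the database and document names are only examples.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def changes_path_for_ids(db, doc_ids):
    """Build a _changes path restricted to the given document IDs."""
    query = urlencode({
        "filter": "_doc_ids",
        "doc_ids": json.dumps(doc_ids),  # doc_ids must be a JSON array
    })
    return "/{}/_changes?{}".format(db, query)


path = changes_path_for_ids("recipes", ["FishStew", "LambStew"])
print(path)

# Decoding the query string again recovers the original list of IDs.
decoded = parse_qs(urlsplit(path).query)
print(json.loads(decoded["doc_ids"][0]))
```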
["dbcore@db1.testy004.cloudant.net",
["dbcore@db1.testy004.cloudant.net",
["dbcore@db1.testy004.cloudant.net",
["dbcore@db1.testy004.cloudant.net",
["dbcore@db1.testy004.cloudant.net",
["dbcore@db1.testy004.cloudant.net",
["dbcore@db1.testy004.cloudant.net",
["dbcore@db1.testy004.cloudant.net",
"dbcore@db2.testy004.cloudant.net",
"dbcore@db2.testy004.cloudant.net",
"dbcore@db2.testy004.cloudant.net",
"dbcore@db2.testy004.cloudant.net",
"dbcore@db2.testy004.cloudant.net",
"dbcore@db2.testy004.cloudant.net",
"dbcore@db2.testy004.cloudant.net",
"dbcore@db2.testy004.cloudant.net",
"range": "80000000-9fffffff",
"nodes": ["dbcore@db1.testy004.cloudant.net", "dbcore@db2.testy004.cloudant.net", "dbcore@db3.te
}
{
"ok" : true
}
1000
2.5 Documents
The document endpoints can be used to create, read, update and delete documents within a database.
A list of the available methods and URL paths is provided below:
Method   Path                 Description
POST     /db                  Create a new document
GET      /db/doc              Returns the latest revision of the document
HEAD     /db/doc              Returns bare information in the HTTP Headers for the document
PUT      /db/doc              Inserts a new document, or new version of an existing document
DELETE   /db/doc              Deletes the document
COPY     /db/doc              Copies the document
GET      /db/doc/attachment   Gets the attachment of a document
PUT      /db/doc/attachment   Adds an attachment of a document
DELETE   /db/doc/attachment   Deletes an attachment of a document
Query Arguments
Argument   Description                                              Optional   Type     Supported Values
batch      Allow document store request to be batched with others   yes        string   ok: enable batching
Return Codes
Code   Description
201    Document has been created successfully
409    Conflict - a document with the specified document ID already exists
Response Headers
Field   Description
ETag    Revision of the document, same as the _rev field.
Create a new document in the specified database, using the supplied JSON document structure. If the JSON
structure includes the _id field, then the document will be created with the specified document ID. If the _id
field is not specified, a new unique ID will be generated.
For example, you can generate a new document with a generated UUID using the following request:
POST /recipes/
Content-Type: application/json
{
"servings" : 4,
"subtitle" : "Delicious with fresh bread",
"title" : "Fish Stew"
}
The returned JSON will specify the automatically generated ID and revision information:
{
"id" : "64575eef70ab90a2b8d55fc09e00440d",
"ok" : true,
"rev" : "1-9c65296036141e575d32ba9c034dd3ee"
}
The document ID can be specified by including the _id field in the JSON of the submitted record. The following
request will create the same document with the ID FishStew:
POST /recipes/
Content-Type: application/json
{
"_id" : "FishStew",
"servings" : 4,
"subtitle" : "Delicious with fresh bread",
"title" : "Fish Stew"
}
{
"id" : "FishStew",
"ok" : true,
"rev" : "1-9c65296036141e575d32ba9c034dd3ee"
}
If a document with the given id already exists, a 409 conflict response will be returned.
Batch Mode Writes
You can write documents to the database at a higher rate by using the batch option. This collects document writes
together in memory (on a user-by-user basis) before they are committed to disk. This increases the risk of the
documents not being stored in the event of a failure, since the documents are not written to disk immediately.
To use the batched mode, append the batch=ok query argument to the URL of the PUT or POST request. The
server will respond with a 202 HTTP response code immediately.
Including Attachments
You can include one or more attachments with a given document by incorporating the attachment information
within the JSON of the document. This provides a simpler alternative to loading documents with attachments than
making a separate call (see Creating or updating an attachment).
_id (optional): Document ID
_rev (optional): Revision ID (when updating an existing document)
_attachments (optional): Document Attachment
filename: Attachment information
* content_type: MIME Content type string
* data: File attachment content, Base64 encoded
The filename will be the attachment name. For example, when sending the JSON structure below:
{
"_id" : "FishStew",
"servings" : 4,
"subtitle" : "Delicious with fresh bread",
"title" : "Fish Stew",
"_attachments" : {
"styling.css" : {
"content_type" : "text/css",
"data" : "cCB7IGZvbnQtc2l6ZTogMTJwdDsgfQo="
}
}
}
You can use the If-None-Match header to retrieve the document only if it has been modified. See HTTP
basics.
Query Arguments
Argument    Description                                   Optional   Type      Default   Supported Values
conflicts   Returns the conflict tree for the document.   yes        boolean   false     true: Includes conflicting revisions
rev                                                       yes        string
revs                                                      yes        boolean
Return Codes
Code   Description
200    Document retrieved
304    See HTTP basics
400    The format of the request or revision was invalid
404    The specified document or revision cannot be found, or has been deleted
Returns the specified doc from the specified db. For example, to retrieve the document with the id DocID you
would send the following request:
GET /db/DocID HTTP/1.1
Accept: application/json
The returned JSON is the JSON of the document, including the document ID and revision number:
{
"_id": "DocID",
"_rev": "1-2b458b0705e3007bce80b0499a1199e7",
"name": "Anna",
"age": 89,
"gender": "female"
}
Unless you request a specific revision, the latest revision of the document will always be returned.
Attachments
If the document includes attachments, then the returned structure will contain a summary of the attachments
associated with the document, but not the attachment data itself.
The JSON for the returned document will include the _attachments field, with one or more attachment definitions. For example:
{
"_id": "DocID",
"_rev": "2-f29c836d0bedc4b4b95cfaa6d99e95df",
"name": "Anna",
"age": 89,
"gender": "female",
"_attachments": {
"my attachment": {
"content_type": "application/json; charset=UTF-8",
"revpos": 2,
"digest": "md5-37IZysiyWLRWx31J/1WQHw==",
"length": 12,
"stub": true
}
}
}
You can obtain a list of the revisions for a given document by adding the revs=true parameter to the request
URL. For example:
GET /recipes/FishStew?revs=true
Accept: application/json
The returned JSON structure includes the original document, including a _revisions structure that includes
the revision information:
{
"servings" : 4,
"subtitle" : "Delicious with a green salad",
"_id" : "FishStew",
"title" : "Irish Fish Stew",
"_revisions" : {
"ids" : [
"a1a9b39ee3cc39181b796a69cb48521c",
"7c4740b4dcf26683e941d6641c00c39d",
"9c65296036141e575d32ba9c034dd3ee"
],
"start" : 3
},
"_rev" : "3-a1a9b39ee3cc39181b796a69cb48521c"
}
ids [array]: Array of valid revision IDs, in reverse order (latest first)
start: Prefix number for the latest revision
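The full revision strings can be rebuilt from these two fields, because the prefix number decreases by one for each entry in ids. The Python below is a minimal sketch of that calculation; the function name full_revision_ids is hypothetical.

```python
def full_revision_ids(revisions):
    """Expand a _revisions structure into full revision strings, latest first."""
    start = revisions["start"]
    return [
        "{}-{}".format(start - offset, rev_hash)
        for offset, rev_hash in enumerate(revisions["ids"])
    ]


# The _revisions structure from the example above.
revisions = {
    "ids": [
        "a1a9b39ee3cc39181b796a69cb48521c",
        "7c4740b4dcf26683e941d6641c00c39d",
        "9c65296036141e575d32ba9c034dd3ee",
    ],
    "start": 3,
}
for rev in full_revision_ids(revisions):
    print(rev)
```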
Obtaining an Extended Revision History
You can get additional information about the revisions for a given document by supplying the revs_info argument to the query:
GET /recipes/FishStew?revs_info=true
Accept: application/json
This returns extended revision information, including the availability and status of each revision:
{
"servings" : 4,
"subtitle" : "Delicious with a green salad",
"_id" : "FishStew",
"_revs_info" : [
{
"status" : "available",
"rev" : "3-a1a9b39ee3cc39181b796a69cb48521c"
},
{
"status" : "available",
"rev" : "2-7c4740b4dcf26683e941d6641c00c39d"
},
{
"status" : "available",
"rev" : "1-9c65296036141e575d32ba9c034dd3ee"
}
],
"title" : "Irish Fish Stew",
"_rev" : "3-a1a9b39ee3cc39181b796a69cb48521c"
}
To get a specific revision, add the rev argument to the request, and specify the full revision number:
GET /recipes/FishStew?rev=2-7c4740b4dcf26683e941d6641c00c39d
Accept: application/json
The specified revision of the document will be returned, including a _rev field specifying the revision that was
requested:
{
"_id" : "FishStew",
"_rev" : "2-7c4740b4dcf26683e941d6641c00c39d",
"servings" : 4,
"subtitle" : "Delicious with a green salad",
"title" : "Fish Stew"
}
If there are conflicts, the returned document will include a _conflicts field specifying the revisions that are in
conflict.
{
"_id" : "FishStew",
"_rev" : "2-7c4740b4dcf26683e941d6641c00c39d",
"servings" : 4,
"subtitle" : "Delicious with a green salad",
"title" : "Fish Stew",
"_conflicts": ["2-65db2a11b5172bf928e3bcf59f728970","2-5bc3c6319edf62d4c624277fdd0ae191"]
}
As in the case of updates there is an r query-string parameter that sets the quorum for reads. When a document is
read, requests are issued to all N copies of the partition hosting the document and the client receives a response
when r matching success responses are received. The default quorum is the simple majority of N, which is the
recommended choice for most applications.
Retrieving revision and size of a document
Method: HEAD /db/doc
Request: None
Response: None
Roles permitted: _reader
Returns the HTTP Headers containing a minimal amount of information about the specified document. The HEAD
method supports the same query arguments and returns the same status codes as the GET method, but only the
header information (including document size, and the revision as an ETag), is returned. For example, a simple
HEAD request:
HEAD /recipes/FishStew
Content-Type: application/json
The ETag header shows the current revision for the requested document, and the Content-Length specifies
the length of the data, if the document were requested in full.
If you add any of the query arguments supported by the GET method, the resulting
HTTP headers will correspond to what would be returned for the equivalent GET request. Note that the
current revision is not returned when the revs_info argument is used. For example:
HTTP/1.1 200 OK
Server: CouchDB/1.0.1 (Erlang OTP/R13B)
Date: Fri, 05 Nov 2010 14:57:16 GMT
Content-Type: text/plain;charset=utf-8
Content-Length: 609
Cache-Control: must-revalidate
Argument   Description                                              Optional   Type     Supported Values
batch      Allow document store request to be batched with others   yes        string   ok: Enable batching
HTTP Headers
Header     Description                                       Optional
If-Match   Current revision of the document for validation   yes
Return Codes
Code   Description
201    Document has been created successfully
202    Document accepted for writing (batch mode)
The PUT method creates a new named document, or creates a new revision of the existing document. Unlike the
POST method, you must specify the document ID in the request URL.
For example, to create the document DocID, you would send the following request:
PUT /db/DocID HTTP/1.1
Accept: application/json
{
"name": "Hannah",
"age": 120,
"gender": "female",
"_id": "DocID",
"_attachments": {
}
}
The return type is JSON of the status, document ID, and revision number:
{
"ok": true,
"id": "DocID",
"rev": "1-764b9b11845fd0b73cfa0e61acc74ecf"
}
To update an existing document you must specify the current revision number within the rev parameter. For
example:
PUT /db/DocID?rev=1-764b9b11845fd0b73cfa0e61acc74ecf HTTP/1.1
Accept: application/json
{
"name": "Hannah",
"age": 40,
"gender": "female",
"_id": "DocID",
"_attachments": {
},
"_rev": "1-764b9b11845fd0b73cfa0e61acc74ecf"
}
Alternatively, you can supply the current revision number in the If-Match HTTP header of the request. For
example:
PUT /test/DocID
If-Match: 1-764b9b11845fd0b73cfa0e61acc74ecf
Content-Type: application/json
{
"name": "Hannah",
"age": 40,
"gender": "female",
"_id": "DocID",
"_attachments": {
},
"_rev": "1-764b9b11845fd0b73cfa0e61acc74ecf"
}
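The update pattern above (carry the current revision forward, change the fields, send the result) can be sketched in Python as follows. The helper names prepare_update and update_request are hypothetical, and the sketch assumes the document ID needs no URL encoding.

```python
import copy


def prepare_update(current_doc, changes):
    """Return a new document body that keeps _id and _rev from current_doc."""
    updated = copy.deepcopy(current_doc)
    updated.update(changes)
    updated["_id"] = current_doc["_id"]
    updated["_rev"] = current_doc["_rev"]  # the revision being replaced
    return updated


def update_request(db, doc):
    """Return the PUT path and headers for an update validated by If-Match."""
    path = "/{}/{}".format(db, doc["_id"])
    headers = {"If-Match": doc["_rev"], "Content-Type": "application/json"}
    return path, headers


current = {
    "_id": "DocID",
    "_rev": "1-764b9b11845fd0b73cfa0e61acc74ecf",
    "name": "Hannah",
    "age": 120,
}
body = prepare_update(current, {"age": 40})
print(update_request("db", body), body)
```

If another client has updated the document in the meantime, the supplied revision is no longer current; the client then needs to fetch the document again and repeat the steps.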
The w query-string parameter on updates overrides the default write quorum for the database. When the N copies
of each document are written, the client will receive a response after w of them have been committed successfully
(the operations to commit the remaining copies will continue in the background). w defaults to the simple majority
of N, which is the recommended choice for most applications.
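As a small worked example, and assuming that "simple majority" means more than half of the N copies, the default value for the r and w parameters could be computed as in the Python sketch below. The function name default_quorum is hypothetical.

```python
def default_quorum(n):
    """Smallest number of copies that is more than half of n (assumed meaning)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return n // 2 + 1


for copies in (1, 3, 4):
    print(copies, default_quorum(copies))
```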
See also
For information on batched writes, which can provide improved performance, see Batch Mode Writes.
Deleting a document
Method: DELETE /db/doc
Request: None
Response: JSON of the deleted revision
Roles permitted: _writer
Query Arguments
Argument   Description                                       Optional   Type
rev        Current revision of the document for validation   yes        string
HTTP Headers
Header     Description                                       Optional
If-Match   Current revision of the document for validation   yes
Return Codes
Code   Description
409    Revision is missing, invalid or not the latest
Deletes the specified document from the database. You must supply (one of) the current revision(s), either by
using the rev parameter...
DELETE /test/DocID?rev=3-a1a9b39ee3cc39181b796a69cb48521c
The returned JSON contains the document ID, revision and status:
{
"id" : "DocID",
"ok" : true,
"rev" : "4-2719fd41187c60762ff584761b714cfb"
}
Note: Deleting a record increments the revision number. Supplying a revision when deleting the
record allows replication of the database to correctly track the deletion in synchronized copies.
Copying a document
Method: COPY /db/doc
Request: None
Response: JSON of the new document and revision
Roles permitted: _writer
Query Arguments
Argument   Description             Optional   Type
rev        Revision to copy from   yes        string
HTTP Headers
Header        Description                                    Optional
Destination   Destination document (and optional revision)   no
Return Codes
Code   Description
201    Document has been copied and created successfully
409    Revision is missing, invalid or not the latest
The COPY command (which is non-standard HTTP) copies an existing document to a new or existing document.
The source document is specified on the request line, with the Destination HTTP Header of the request
specifying the target document.
Copying a Document to a new document
You can copy the latest version of a document to a new document by specifying the current document and target
document:
COPY /test/DocID
Content-Type: application/json
Destination: NewDocId
The above request copies the document DocID to the new document NewDocId. The response is the ID and
revision of the new document.
{
"id" : "NewDocId",
"rev" : "1-9c65296036141e575d32ba9c034dd3ee"
}
To copy from a specific version, add the rev argument to the query string:
COPY /test/DocID?rev=5-acfd32d233f07cea4b4f37daaacc0082
Content-Type: application/json
Destination: NewDocID
The new document will be created using the information in the specified revision of the source document.
Copying to an Existing Document
To copy to an existing document, you must specify the current revision string for the target document, adding the
rev parameter to the Destination HTTP Header string. For example:
COPY /test/DocID
Content-Type: application/json
Destination: ExistingDocID?rev=1-9c65296036141e575d32ba9c034dd3ee
The return value will be the new revision of the copied document:
{
"id" : "ExistingDocID",
"rev" : "2-55b6a1b251902a2c249b667dab1c6692"
}
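The two forms of the Destination header described above can be produced with a small helper such as the Python sketch below. The function name copy_headers is hypothetical.

```python
def copy_headers(target_id, target_rev=None):
    """Build the Destination header for a COPY request.

    Pass target_rev only when the target document already exists.
    """
    if target_rev is None:
        destination = target_id
    else:
        destination = "{}?rev={}".format(target_id, target_rev)
    return {"Destination": destination}


print(copy_headers("NewDocId"))
print(copy_headers("ExistingDocID", "1-9c65296036141e575d32ba9c034dd3ee"))
```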
2.5.2 Attachments
Retrieving an attachment
Method: GET /db/doc/attachment
Request: None
Response: Returns the attachment data
Roles permitted: _reader
Returns the file attachment attachment associated with the document doc. The raw data of the associated
attachment is returned (just as if you were accessing a static file). The returned HTTP Content-Type will be
the same as the content type set when the document attachment was submitted into the database.
HTTP Range Requests
HTTP allows you to specify byte ranges for requests. This allows the implementation of resumable downloads
and skippable audio and video streams alike. This is available for all attachments inside Cloudant. To request a
range of bytes from an attachment, submit a Range header with your request:
GET /db/doc/attachment HTTP/1.1
Host: username.cloudant.com
Range: bytes=0-12
The response will return a status code 206 and specify the number of bytes sent in the Content-Length header
as well as the range in the Content-Range header.
206 Partial Content
Content-Type: application/octet-stream
Content-Range: bytes 0-12/30
Content-Length: 13
Accept-Ranges: bytes
HTTP supports many ways to specify single and even multiple byte ranges. Read all about it in RFC 2616.
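For the single-range case shown above, a client can check the Content-Range header against the number of bytes it received. The Python below is a minimal sketch that handles only the "bytes first-last/total" form; the function names are hypothetical.

```python
import re


def parse_content_range(value):
    """Parse 'bytes first-last/total' into (first, last, total).

    total is None when the server reports it as '*'.
    """
    match = re.fullmatch(r"bytes (\d+)-(\d+)/(\d+|\*)", value.strip())
    if match is None:
        raise ValueError("unsupported Content-Range: {!r}".format(value))
    first, last = int(match.group(1)), int(match.group(2))
    total = None if match.group(3) == "*" else int(match.group(3))
    return first, last, total


def range_length(first, last):
    """Number of bytes in an inclusive byte range."""
    return last - first + 1


first, last, total = parse_content_range("bytes 0-12/30")
print(range_length(first, last), total)
```

Byte ranges are inclusive at both ends, which is why the range 0-12 in the example corresponds to a Content-Length of 13.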
Creating or updating an attachment
Method: PUT /db/doc/attachment
Request: Raw document data
Response: JSON document status
Roles permitted: _writer
Query Arguments
Argument   Description                 Optional   Type
rev        Current document revision   no         string
HTTP Headers
Header           Description                                       Optional
Content-Length   Length (bytes) of the attachment being uploaded   no
Content-Type     MIME type for the uploaded attachment             no
If-Match         Current revision of the document for validation   yes
Return Codes
Code   Description
201    Attachment has been accepted
Upload the supplied content as an attachment to the specified document (doc). The attachment name provided
must be a URL encoded string. You must also supply either the rev query argument or the If-Match HTTP
header for validation, and the HTTP headers (to set the attachment content type). The content type is used when
the attachment is requested as the corresponding content-type in the returned document header.
For example, you could upload a simple text document using the following request:
PUT /recipes/FishStew/basic?rev=8-a94cb7e50ded1e06f943be5bfbddf8ca
Content-Length: 10
Content-Type: text/plain
Roast it
Note: Uploading an attachment updates the corresponding document revision. Revisions are tracked for the
parent document, not individual attachments.
Uploading an attachment using an existing attachment name will update the corresponding stored content of the
database. Since you must supply the revision information to add an attachment to a document, this serves as
validation to update the existing attachment.
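Because the attachment name must be URL encoded, a client would typically build the request path with a helper along the lines of the Python sketch below. The function name attachment_path is hypothetical, and percent-encoding every path segment is an assumption made here for safety.

```python
from urllib.parse import quote


def attachment_path(db, doc_id, name, rev):
    """Build the path for an attachment request, percent-encoding each part."""
    return "/{}/{}/{}?rev={}".format(
        quote(db, safe=""),
        quote(doc_id, safe=""),
        quote(name, safe=""),  # spaces and slashes in the name are escaped
        quote(rev, safe=""),
    )


print(attachment_path("recipes", "FishStew", "basic",
                      "8-a94cb7e50ded1e06f943be5bfbddf8ca"))
print(attachment_path("db", "DocID", "my attachment", "2-f29c"))
```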
Creating a document with an inline attachment
Inline attachments are just like any other attachment, except that their data is included in the document itself via
Base 64 encoding when the document is created or updated.
{
"_id":"attachment_doc",
"_attachments": {
"foo.txt": {
"content_type":"text/plain",
"data": "VGhpcyBpcyBhIGJhc2U2NCBlbmNvZGVkIHRleHQ="
}
}
}
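The data value in the document above is the Base64 encoding of the attachment content. As a minimal sketch, the Python below builds the same document from the raw bytes; the helper name inline_attachment is hypothetical.

```python
import base64
import json


def inline_attachment(content_type, raw_bytes):
    """Return the attachment entry for embedding in a document's _attachments."""
    return {
        "content_type": content_type,
        "data": base64.b64encode(raw_bytes).decode("ascii"),
    }


doc = {
    "_id": "attachment_doc",
    "_attachments": {
        "foo.txt": inline_attachment("text/plain", b"This is a base64 encoded text"),
    },
}
print(json.dumps(doc, indent=2))
```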
Deleting an attachment
Method: DELETE /db/doc/attachment
Request: None
Response: JSON status
Roles permitted: _writer
Query Arguments
Argument   Description                 Optional   Type
rev        Current document revision   no         string
HTTP Headers
Header     Description                                       Optional
If-Match   Current revision of the document for validation   yes
Return Codes
Code   Description
200    Attachment deleted successfully
409    Supplied revision is incorrect or missing
Deletes the attachment attachment from the specified document doc. You must supply the rev argument with the
current revision to delete the attachment.
For example, to delete the attachment my attachment from the document DocID:
DELETE /db/DocID/my+attachment?rev=2-f29c836d0bedc4b4b95cfaa6d99e95df HTTP/1.1
Accept: application/json
The result will be that the view contains every document with the key being the id of the document, effectively
creating a copy of the database.
If the object passed to emit has an _id field, a view query with include_docs set to true will contain the
document with the given ID.
Reduce functions
If a view has a reduce function, it is used to produce aggregate results for that view. A reduce function is passed
a set of intermediate values and combines them into a single value. Reduce functions must accept, as input, results
emitted by their corresponding map function as well as results returned by the reduce function itself. The latter
case is referred to as a rereduce.
Here is an example of a reduce function:
function (key, values, rereduce) {
return sum(values);
}
Reduce functions are passed three arguments in the order key, values, and rereduce.
Reduce functions must handle two cases:
1. When rereduce is false:
   key will be an array whose elements are arrays of the form [key,id], where key is a key emitted by the
   map function and id is that of the document from which the key was generated.
   values will be an array of the values emitted for the respective elements in key,
   i.e. reduce([ [key1,id1], [key2,id2], [key3,id3] ], [value1,value2,value3], false)
2. When rereduce is true:
   key will be null.
   values will be an array of values returned by previous calls to the reduce function,
   i.e. reduce(null, [intermediate1,intermediate2,intermediate3], true)
Reduce functions should return a single value, suitable for both the value field of the final view and as a member
of the values array passed to the reduce function.
Often, reduce functions can be written to handle rereduce calls without any extra code, like the summation function
above. In that case, the rereduce argument can be ignored.
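Not every reduce function can be reapplied to its own output unchanged. The sketch below counts rows, so it has to treat the two cases differently: on the first pass it counts the emitted values, and on a rereduce it adds up the partial counts. The name countRows and the sample data are illustrative assumptions; in a design document the function would be anonymous.

```javascript
// Illustrative reduce function that counts rows. The name countRows is
// hypothetical and only exists so the sketch can be called directly.
function countRows(key, values, rereduce) {
  if (rereduce) {
    // values holds counts returned by earlier reduce calls: add them up.
    return values.reduce(function (total, n) { return total + n; }, 0);
  }
  // values holds one entry per emitted row: count them.
  return values.length;
}

// Simulate a first pass over two batches of rows, then a rereduce pass.
var partialA = countRows([["a", "doc1"], ["b", "doc2"]], [10, 20], false);
var partialB = countRows([["c", "doc3"]], [30], false);
var total = countRows(null, [partialA, partialB], true);
```

Summing the values in both cases, as the summation example does, would give 60 here rather than the row count of 3, which is why the rereduce flag matters for this function.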
For performance reasons, a few simple reduce functions are built in. To use one of the built-in functions, put its
name into the reduce field of the view object in your design document.
Function  Description
_sum      Produces the sum of all values for a key; values must be numeric
_count    Produces the row count for a given key; values can be any valid JSON
_stats    Produces a JSON structure containing sum, count, min, max, and sum squared; values must be numeric
Dbcopy
If the dbcopy field of a view is set, the view contents will be written to a database of that name. If dbcopy is
set, the view must also have a reduce function. For every key/value pair created by a reduce query with group
set to true, a document will be created in the dbcopy database. If the database does not exist, it will be created.
The documents created have the following fields:
Field     Description
key       The key of the view result. This can be a string or an array.
value     The value calculated by the reduce function.
_id       The ID is a hash of the key.
salt      This value is an implementation detail used internally.
partials  This value is an implementation detail used internally.
This index function indexes only a single field in the document. You can, however, compute the value to be indexed
from several fields or index only part of a field (rather than its entire value).
The index function also accepts a third parameter, options, that receives a JavaScript object with the following
possible values and defaults:
Option  Values                                Default            Description
boost   Float                                 1.0 (no boosting)  Analogous to the boost query string parameter, but applied
                                                                 at index time rather than query time.
index   analyzed, analyzed_no_norms, no,      analyzed           Whether (and how) the field is indexed.
        not_analyzed, not_analyzed_no_norms
store   true, false                           false              Whether the value is stored and returned in search results.
For more information on indexing and searching, see Searching for documents using Lucene queries.
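As a rough illustration of computing an indexed value from several fields and of passing the options parameter, the sketch below assumes that the search indexer supplies a function index(name, value, options). A stub stands in for it here so the code runs outside Cloudant. The document fields first, last and summary, and the index names fullname and summary_start, are made up for the example.

```javascript
// Stub for the index(name, value, options) call assumed to be available to
// search index functions; it records calls so the sketch runs on its own.
var indexedFields = [];
function index(name, value, options) {
  indexedFields.push({ name: name, value: value, options: options });
}

// Hypothetical index function: doc.first, doc.last and doc.summary are
// assumed field names, not part of the reference.
function indexPerson(doc) {
  if (doc.first && doc.last) {
    // Value computed from several fields, boosted at index time.
    index("fullname", doc.first + " " + doc.last, { boost: 2.0 });
  }
  if (typeof doc.summary === "string") {
    // Index only part of a field: its first 20 characters.
    index("summary_start", doc.summary.slice(0, 20));
  }
}

indexPerson({ first: "Ada", last: "Lovelace", summary: "Wrote the first published program." });
```

Guarding each call with a check on the field keeps the function from indexing undefined values for documents that lack those fields.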
Show functions
Show functions can be used to render a document in a different format or extract only some information from a
larger document. Some show functions don't deal with documents at all and just return information about the user
making the request or other request parameters. Show functions take two arguments: the document identified by
the doc-id part of the URL (if specified) and an object describing the HTTP request. The return value of a show
function is either a string containing any data to be returned in the HTTP response or a JavaScript object with fields
for the headers and the body of the HTTP response.
Example of a simple show function
function(doc, req) {
  // Render the document as a single XML element.
  return '<person name="' + doc.name + '" birthday="' + doc.birthday + '" />';
}
The request object passed to the show function describes the HTTP request and has the following fields:
info: An object containing information about the database.
id: ID of the object being shown or null if there is no object.
method: The HTTP method used, e.g. GET.
path: An array of strings describing the path of the request URL.
query: An object that contains a field for each query parameter.
headers: An object that contains a field for each header of the HTTP request.
peer: The IP address making the request.
cookie: An object that contains a field for each cookie submitted with the HTTP request.
body: The body of the HTTP request.
form: An object containing a field for each form field of the request, if the request has the
x-www-form-urlencoded content type.
userCtx: An object describing the identity and permissions of the user making the request.
    db: Database name
    name: User name
    roles: An array of strings for each role the user has, e.g. ["_admin", "_reader", "_writer"]
Here is an example of a request object:
{
"info": {
"update_seq": "31-g1AAAADneJzLYWBgYMlgTmFQSElKzi9KdUhJMtbLTS3KLElMT9VLzskvTUnMK9HLSy3JAapkSmRI
"db_name": "dbname",
"purge_seq": 0,
"other": {
"data_size": 209
},
"doc_del_count": 0,
"doc_count": 2,
"disk_size": 1368408,
"disk_format_version": 5,
"compact_running": false,
"instance_start_time": "0"
},
"uuid": "d2b979d10234eaedc505a090968a4e7e",
"id": "74b2be56045bed0c8c9d24b939000dbe",
"method": "GET",
"path": [
"dbname",
"_design",
"designdocname",
"_show",
"showfunctionname",
"74b2be56045bed0c8c9d24b939000dbe"
],
"query": {
"foo": "bar"
},
"headers": {
"Accept": "text\/html,application\/xhtml+xml,application\/xml;q=0.9,*\/*;q=0.8",
"Accept-Charset": "ISO-8859-1,utf-8;q=0.7,*;q=0.3",
"Accept-Encoding": "gzip,deflate,sdch",
"Accept-Language": "en-US,en;q=0.8,de-DE;q=0.6,de;q=0.4",
"Connection": "close",
"Host": "username.cloudant.com",
"User-Agent": "Mozilla\/5.0 (X11; Linux x86_64) AppleWebKit\/537.22 (KHTML, like Gecko) Ubuntu
"X-Forwarded-For": "109.69.82.183"
},
"body": "undefined",
"peer": "109.69.82.183",
"form": {
},
"cookie": {
"foo": "bar"
},
"userCtx": {
"db": "dbname",
"name": "username",
"roles": [
"_admin",
"_reader",
"_writer"
]
}
}
Return values
Show functions can either return a string or an object with the headers and body of the HTTP response. The object
returned should have the following fields:
body: A String containing the body of the HTTP response
headers: An object with fields for each HTTP header of the response
Example show function returning a response object
function(doc, req) {
  return {
    // Build an HTML body from a query parameter and two document fields.
    body: '<h1>' + req.query.header + '</h1>' +
          '<ul><li>' + doc.first + '</li>' +
          '<li>' + doc.second + '</li></ul>',
    headers: { 'Content-Type': 'text/html' }
  };
}
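Because the request object exposes the query parameters, a show function can pick its output format per request. The following sketch is one possible arrangement, not a prescribed pattern: the name showPerson, the format query parameter and the doc.name field are assumptions, and the document argument is assumed to be null when the URL carries no doc-id.

```javascript
// Hypothetical show function: returns JSON when the request carries
// ?format=json and an HTML fragment otherwise. doc.name is an assumed field.
function showPerson(doc, req) {
  if (!doc) {
    // No document was identified by the URL.
    return { body: "No document requested", headers: { "Content-Type": "text/plain" } };
  }
  if (req.query.format === "json") {
    return {
      body: JSON.stringify({ name: doc.name }),
      headers: { "Content-Type": "application/json" }
    };
  }
  return {
    body: "<p>" + doc.name + "</p>",
    headers: { "Content-Type": "text/html" }
  };
}

// Minimal stand-ins for the arguments a show function receives.
var htmlResponse = showPerson({ name: "Alice" }, { query: {} });
var jsonResponse = showPerson({ name: "Alice" }, { query: { format: "json" } });
var emptyResponse = showPerson(null, { query: {} });
```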
List functions
List functions are a lot like show functions, but instead of taking just one object as their input, they are applied to
all data returned from a view. Like the name suggests, they can be used to create lists of objects in various formats
(xml, html, csv).
List functions take two parameters: The first one is usually called head and contains information about the
number of rows returned from the view. The second parameter is identical to the request parameter described
under The request object.
Head parameter
The send function is used to send content in the body of the response.
send('hello');
send('bye');
The getRow function returns the next row from the view data or null if there are no more rows. The object
returned has the following fields:
id: The ID of the document associated with this row.
key: The key emitted by the view.
value: The data emitted by the view.
This example function creates an unordered HTML list from the foo fields of the view values.
function(head, req) {
  start({code: 200, headers: {"Content-Type": "text/html"}});
  var row;
  send("<ul>");
  // getRow() returns null once all rows have been consumed.
  while (row = getRow()) {
    send("<li>" + row.value.foo + "</li>");
  }
  send("</ul>");
}
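The same pattern works for other output formats. The sketch below renders view rows as CSV and can be run outside Cloudant because start, send and getRow are replaced by small stand-ins. The sample rows, the shape of the head argument and the name asCsv are assumptions made for the example.

```javascript
// Stand-ins for the start(), send() and getRow() functions that are
// available inside a list function; they let the sketch run in plain Node.js.
var viewRows = [
  { id: "doc1", key: "apple", value: { foo: 1 } },
  { id: "doc2", key: "pear", value: { foo: 2 } }
];
var output = [];
var responseInfo = null;
function start(info) { responseInfo = info; }
function send(chunk) { output.push(chunk); }
function getRow() { return viewRows.length ? viewRows.shift() : null; }

// Hypothetical list function that renders one CSV line per view row.
function asCsv(head, req) {
  start({ code: 200, headers: { "Content-Type": "text/csv" } });
  send("id,key,foo\n");
  var row;
  while ((row = getRow()) !== null) {
    send([row.id, row.key, row.value.foo].join(",") + "\n");
  }
}

asCsv({ total_rows: 2, offset: 0 }, { query: {} });
var csv = output.join("");
```

The sketch does no quoting or escaping, so it is only suitable for values that contain no commas or line breaks.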
Rewrite rules
A design document can contain rules for URL rewriting as an array in the rewrites field. Requests that match
the rewrite rules must have a URL path that starts with /db/_design/doc/_rewrite.
"rewrites": [
{
"from": "/",
"to": "index.html",
"method": "GET",
"query": {}
},{
"from": "/foo/:var",
"to": "/foo",
"method": "GET",
"query": {"v": "var"}
}
]
Field   Description
from    A path relative to /db/_design/doc/_rewrite used to match URLs to rewrite rules. Path
        elements that start with a : are treated as variables and match any string that does not contain a /. A
        * can only appear at the end of the string and matches any string, including slashes.
to      The path (relative to /db/_design/doc/ and not including the query part of the URL) that will
        be the result of the rewriting step. Variables captured in from can be used in to. * can also be used
        and will contain everything captured by the pattern in from.
method  The HTTP method that should be matched on.
query   The query part of the resulting URL. This is a JSON object containing the key/value pairs of the
        query.
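The matching rules for the from field can be sketched in a few lines. This is only a model of the behavior described above, written for illustration, and not the code Cloudant runs; matchFrom is a hypothetical helper that returns the captured variables, or null when the path does not match.

```javascript
// Sketch of the matching rules described for the "from" field: elements
// starting with ":" capture one path segment, and a trailing "*" captures
// the rest of the path, slashes included. matchFrom is a hypothetical helper.
function matchFrom(pattern, path) {
  var patternParts = pattern.split("/").filter(Boolean);
  var pathParts = path.split("/").filter(Boolean);
  var captured = {};
  for (var i = 0; i < patternParts.length; i++) {
    var part = patternParts[i];
    if (part === "*" && i === patternParts.length - 1) {
      captured["*"] = pathParts.slice(i).join("/");
      return captured;
    }
    if (i >= pathParts.length) return null;
    if (part.charAt(0) === ":") {
      captured[part.slice(1)] = pathParts[i];
    } else if (part !== pathParts[i]) {
      return null;
    }
  }
  // Without a trailing "*", the path must not have leftover segments.
  return pathParts.length === patternParts.length ? captured : null;
}

var single = matchFrom("/foo/:var", "/foo/bar");
var rest = matchFrom("/a/*", "/a/b/c");
var miss = matchFrom("/foo/:var", "/foo/bar/baz");
```

Here single captures var as bar, rest captures b/c under the * key, and miss is null because a variable cannot span a slash.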
Examples
Url                               Rewrite to                       Tokens
/db/_design/doc/_rewrite/a/b?k=v  /db/_design/doc/some/k=v         k=v
/db/_design/doc/_rewrite/a/b      /db/_design/doc/some/b?var=b     var=b
/db/_design/doc/_rewrite/a        /db/_design/doc/some
/db/_design/doc/_rewrite/a/b/c    /db/_design/doc/some/b/c
/db/_design/doc/_rewrite/a        /db/_design/doc/some
/db/_design/doc/_rewrite/a/b/c    /db/_design/doc/some/b/c?foo=b   foo=b
/db/_design/doc/_rewrite/a/b      /db/_design/doc/some/?k=b&foo=b  foo=:=b
/db/_design/doc/_rewrite/a?foo=b  /db/_design/doc/some/b&foo=b     foo=b
Argument  Description                                      Optional  Type
rev       Current revision of the document for validation  yes       string
HTTP Headers
Header    Description                                      Optional
If-Match  Current revision of the document for validation  yes
Delete an existing design document. Deleting a design document also deletes all of the associated view indexes,
and recovers the corresponding space on disk for the indexes in question.
To delete, you must specify the current revision of the design document using the rev query argument.
For example:
DELETE /recipes/_design/recipes?rev=2-ac58d589b37d01c00f45a4418c5a15a8
Content-Type: application/json
The above request copies the design document recipes to the new design document recipelist. The
response is the ID and revision of the new document.
{
"id" : "recipes/_design/recipelist",
"rev" : "1-9c65296036141e575d32ba9c034dd3ee"
}
Note: Copying a design document does not automatically reconstruct the view indexes. These will be recreated,
as with other views, the first time the new view is accessed.
The new design document will be created using the specified revision of the source document.
The return value will be the new revision of the copied document:
{
"id" : "recipes/_design/recipes",
"rev" : "2-55b6a1b251902a2c249b667dab1c6692"
}
The individual fields in the returned JSON structure are detailed below:
name: Name/ID of Design Document
view_index: View Index
    compact_running: Indicates whether a compaction routine is currently running on the view
    disk_size: Size in bytes of the view as stored on disk
    language: Language for the defined views
    purge_seq: The purge sequence that has been processed
    signature: MD5 signature of the views for the design document
    update_seq: The update sequence of the corresponding database that has been indexed
    updater_running: Indicates if the view is currently being updated
    waiting_clients: Number of clients waiting on views from this design document
    waiting_commit: Indicates if there are outstanding commits to the underlying database that need to
    be processed
Query Arguments
Argument        Optional  Type                  Default  Description
descending      yes       boolean               false    Return the documents in descending key order
endkey          yes       string or JSON array           Stop returning records when the specified key is reached
endkey_docid    yes       string                         Stop returning records when the specified document ID is reached
group           yes       boolean               false    Group the results using the reduce function to a group or single row
group_level     yes       numeric                        Only applicable if the view uses complex keys, i.e. keys that are JSON
                                                         arrays. Groups reduce results for the specified number of array fields.
include_docs    yes       boolean               false    Include the full content of the documents in the response
inclusive_end   yes       boolean               true     Include rows with the specified endkey
key             yes       string                         Return only documents that match the specified key. Note that keys are
                                                         JSON values and must be URL-encoded.
limit           yes       numeric                        Limit the number of returned documents to the specified number
reduce          yes       boolean               true     Use the reduce function
skip            yes       numeric               0        Skip this number of rows from the start
stale           yes       string                false    Allow the results from a stale view to be used. This makes the request
                                                         return immediately, even if the view has not been completely built yet.
                                                         If this parameter is not given, a response will be returned only after
                                                         the view has been built. Supported values: ok (allow stale views),
                                                         update_after (allow stale views, but update them immediately after the
                                                         request).
startkey        yes       string or JSON array           Return records starting with the specified key
startkey_docid  yes       string                         Return records starting with the specified document ID
update_seq      yes       boolean               false    Include the update sequence in the generated results
Executes the specified view-name from the specified design-doc design document.
Querying Views and Indexes
The definition of a view within a design document also creates an index based on the key information defined
within each view. The production and use of the index significantly increases the speed of access and searching or
selecting documents from the view.
However, the index is not updated when new documents are added or modified in the database. Instead, the index
is generated or updated, either when the view is first accessed, or when the view is accessed after a document has
been updated. In each case, the index is updated before the view query is executed against the database.
View indexes are updated incrementally in the following situations:
A new document has been added to the database.
Requesting the same in descending order will reverse the entire view content. For example the request
GET /recipes/_design/recipes/_view/by_title?limit=5&descending=true
Accept: application/json
Content-Type: application/json
{
"id" : "Zucchiniinagrodolcesweet-sourcourgettes",
"key" : "Zucchini in agrodolce (sweet-sour courgettes)",
"value" : [
null,
"Zucchini in agrodolce (sweet-sour courgettes)"
]
},
{
"id" : "Zingylemontart",
"key" : "Zingy lemon tart",
"value" : [
null,
"Zingy lemon tart"
]
},
{
"id" : "Zestyseafoodavocado",
"key" : "Zesty seafood avocado",
"value" : [
null,
"Zesty seafood avocado"
]
},
{
"id" : "Zabaglione",
"key" : "Zabaglione",
"value" : [
null,
"Zabaglione"
]
},
{
"id" : "Yogurtraita",
"key" : "Yogurt raita",
"value" : [
null,
"Yogurt raita"
]
}
],
"total_rows" : 2667
}
The sorting direction is applied before the filtering is applied using the startkey and endkey query arguments.
For example the following query:
GET /recipes/_design/recipes/_view/by_ingredient?startkey=%22carrots%22&endkey=%22egg%22
Accept: application/json
Content-Type: application/json
This query operates correctly, listing all the matching entries between carrots and egg. If the order of output is
reversed with the descending query argument, the view request will return no entries:
GET /recipes/_design/recipes/_view/by_ingredient?descending=true&startkey=%22carrots%22&endkey=%22
Accept: application/json
Content-Type: application/json
"offset" : 21882
}
The results will be empty because the entries in the view are reversed before the key filter is applied, and therefore
the endkey of egg will be seen before the startkey of carrots, resulting in an empty list.
Instead, you should reverse the values supplied to the startkey and endkey parameters to match the descending sorting applied to the keys. Changing the previous example to:
GET /recipes/_design/recipes/_view/by_ingredient?descending=true&startkey=%22egg%22&endkey=%22carr
Accept: application/json
Content-Type: application/json
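The interaction between descending and the key range can be reproduced with a toy model. The sketch below is not how Cloudant evaluates views. It orders an in-memory list of keys with plain JavaScript string comparison (an assumption that ignores the real collation rules) and then applies the startkey to endkey window in that order. The function queryKeys and the sample ingredients are hypothetical.

```javascript
// Toy model of a view query over sorted keys: rows are ordered first
// (ascending or descending), then the startkey..endkey window is applied
// in that order. queryKeys is a hypothetical helper.
function queryKeys(keys, options) {
  var ordered = keys.slice().sort();
  if (options.descending) ordered.reverse();
  return ordered.filter(function (key) {
    return options.descending
      ? key <= options.startkey && key >= options.endkey
      : key >= options.startkey && key <= options.endkey;
  });
}

var ingredients = ["apple", "carrots", "cheese", "egg", "flour"];
var ascending = queryKeys(ingredients, { startkey: "carrots", endkey: "egg" });
var emptyResult = queryKeys(ingredients, { descending: true, startkey: "carrots", endkey: "egg" });
var swapped = queryKeys(ingredients, { descending: true, startkey: "egg", endkey: "carrots" });
```

In this model ascending holds carrots, cheese and egg, emptyResult holds nothing, and swapped holds egg, cheese and carrots, which mirrors the three requests shown in this section.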
Query Arguments
Argument        Optional  Type                  Default  Description
descending      yes       boolean               false    Return the documents in descending key order
endkey          yes       string or JSON array           Stop returning records when the specified key is reached
endkey_docid    yes       string                         Stop returning records when the specified document ID is reached
group           yes       boolean               false    Group the results using the reduce function to a group or single row
group_level     yes       numeric                        Only applicable if the view uses complex keys, i.e. keys that are JSON
                                                         arrays. Groups reduce results for the specified number of array fields.
include_docs    yes       boolean               false    Include the full content of the documents in the response
inclusive_end   yes       boolean               true     Include rows with the specified endkey
key             yes       string                         Return only documents that match the specified key. Note that keys are
                                                         JSON values and must be URL-encoded.
limit           yes       numeric                        Limit the number of returned documents to the specified number
reduce          yes       boolean               true     Use the reduce function
skip            yes       numeric               0        Skip this number of rows from the start
stale           yes       string                false    Allow the results from a stale view to be used. Supported values: ok
                                                         (allow stale views), update_after (allow stale views, but update them
                                                         immediately after the request).
startkey        yes       string or JSON array           Return records starting with the specified key
startkey_docid  yes       string                         Return records starting with the specified document ID
update_seq      yes       boolean               false    Include the update sequence in the generated results
Executes the specified view-name from the specified design-doc design document. Unlike the GET method
for accessing views, the POST method supports the specification of explicit keys to be retrieved from the view
results. The remainder of the POST view functionality is identical to the Querying a view API.
For example, the request below will return all the recipes where the key for the view matches either claret or
clear apple juice:
POST /recipes/_design/recipes/_view/by_ingredient
Content-Type: application/json
{
"keys" : [
"claret",
"clear apple juice"
]
}
The returned view data contains the standard view information, but only where the keys match.
{
"total_rows" : 26484,
"rows" : [
{
"value" : [
"Scotch collops"
],
"id" : "Scotchcollops",
"key" : "claret"
},
{
"value" : [
"Stand pie"
],
"id" : "Standpie",
"key" : "clear apple juice"
}
],
"offset" : 6324
}
Multi-document Fetching
By combining the POST method to a given view with the include_docs=true query argument you can
obtain multiple documents from a database. The result is more efficient than using multiple Retrieving a document
requests.
For example, sending the following request for ingredients matching claret and clear apple juice:
POST /recipes/_design/recipes/_view/by_ingredient?include_docs=true
Content-Type: application/json
{
"keys" : [
"claret",
"clear apple juice"
]
}
"cuisine@british traditional",
"diet@corn-free",
"diet@citrus-free",
"special collections@very easy",
"diet@shellfish-free",
"main ingredient@meat",
"occasion@christmas",
"meal type@main",
"diet@egg-free",
"diet@gluten-free"
],
"preptime" : "10",
"servings" : "4",
"subtitle" : "This recipe comes from an old recipe book of 1683 called The Gentlewoma
"title" : "Scotch collops",
"totaltime" : "18"
},
"id" : "Scotchcollops",
"key" : "claret",
"value" : [
"Scotch collops"
]
},
{
...
"doc" : {
"_id" : "Standpie",
"_rev" : "1-bff6edf3ca2474a243023f2dad432a5a",
"cooktime" : "92",
"ingredients" : [
],
"keywords" : [
"diet@dairy-free",
"diet@peanut-free",
"special collections@classic recipe",
"cuisine@british traditional",
"diet@corn-free",
"diet@citrus-free",
"occasion@buffet party",
"diet@shellfish-free",
"occasion@picnic",
"special collections@lunchbox",
"main ingredient@meat",
"convenience@serve with salad for complete meal",
"meal type@main",
"cook method.hob, oven, grill@hob / oven",
"diet@cow dairy-free"
],
"preptime" : "30",
"servings" : "6",
"subtitle" : "Serve this pie with pickled vegetables and potato salad.",
"title" : "Stand pie",
"totaltime" : "437"
},
"id" : "Standpie",
"key" : "clear apple juice",
"value" : [
"Stand pie"
]
}
],
"total_rows" : 26484
}
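Once such a response has been received, the documents sit in the doc field of each row. As a small sketch of client-side handling, the helper below groups the fetched documents by the key that matched. The function name docsByKey is hypothetical, and the response object is a heavily trimmed stand-in for the one shown above.

```javascript
// Shape of a view response requested with include_docs=true, trimmed to the
// fields used here. The documents are shortened stand-ins.
var response = {
  total_rows: 26484,
  rows: [
    { id: "Scotchcollops", key: "claret", value: ["Scotch collops"],
      doc: { _id: "Scotchcollops", title: "Scotch collops", servings: "4" } },
    { id: "Standpie", key: "clear apple juice", value: ["Stand pie"],
      doc: { _id: "Standpie", title: "Stand pie", servings: "6" } }
  ]
};

// Hypothetical helper: collect the fetched documents under the key that matched.
function docsByKey(viewResponse) {
  var result = {};
  viewResponse.rows.forEach(function (row) {
    if (!result[row.key]) result[row.key] = [];
    result[row.key].push(row.doc);
  });
  return result;
}

var grouped = docsByKey(response);
```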
The JSON object contains only the queries field, which holds an array of query objects. Each query object can
have fields for the parameters of a query. The field names and their meaning are the same as the query parameters
of a regular view request.
Here is an example of a response:
{
"results": [{
"total_rows": 3,
"offset": 0,
"rows": [{
"id": "8fbb1250-6908-42e0-8862-aef60dc430a2",
"key": 0,
"value": {
"_id": "8fbb1250-6908-42e0-8862-aef60dc430a2",
"_rev": "1-ad1680946839206b088da5d9ac01e4ef",
"foo": 0,
"bar": "foo"
}
}, {
"id": "d69fb42c-b3b1-4fae-b2ac-55a7453b4e41",
"key": 1,
"value": {
"_id": "d69fb42c-b3b1-4fae-b2ac-55a7453b4e41",
"_rev": "1-abb9a4fc9f0f339efbf667ace66ee6a0",
"foo": 1,
"bar": "bar"
}
}, {
"id": "d1fa85cd-cd18-4790-8230-decf99e1f60f",
"key": 2,
"value": {
"_id": "d1fa85cd-cd18-4790-8230-decf99e1f60f",
"_rev": "1-d075a71f2d47af7d4f64e4a367160e2a",
"foo": 2,
"bar": "baz"
}
}]
}, {
"total_rows": 3,
"offset": 1,
"rows": [{
"id": "d69fb42c-b3b1-4fae-b2ac-55a7453b4e41",
"key": 1,
"value": {
"_id": "d69fb42c-b3b1-4fae-b2ac-55a7453b4e41",
"_rev": "1-abb9a4fc9f0f339efbf667ace66ee6a0",
"foo": 1,
"bar": "bar"
}
}, {
"id": "d1fa85cd-cd18-4790-8230-decf99e1f60f",
"key": 2,
"value": {
"_id": "d1fa85cd-cd18-4790-8230-decf99e1f60f",
"_rev": "1-d075a71f2d47af7d4f64e4a367160e2a",
"foo": 2,
"bar": "baz"
}
}]
}]
}
The JSON object contains only the results field, which holds an array of result objects - one for each query.
Each result object contains the same fields as the response to a regular view request.
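To show how the two structures relate, the sketch below pairs each query with its result by array position. That the results arrive in the same order as the queries is an assumption here, as are the particular query parameters chosen and the helper name pairResults. The response is trimmed to the keys of each row.

```javascript
// Hypothetical request body: two queries against the same view. The fields
// inside each query object are ordinary view query parameters.
var requestBody = {
  queries: [
    { limit: 3 },
    { skip: 1, limit: 2 }
  ]
};

// Trimmed stand-in for a response: one result object per query.
var responseBody = {
  results: [
    { total_rows: 3, offset: 0, rows: [{ key: 0 }, { key: 1 }, { key: 2 }] },
    { total_rows: 3, offset: 1, rows: [{ key: 1 }, { key: 2 }] }
  ]
};

// Pair each query with the keys of its result, matching by array position.
function pairResults(request, response) {
  return request.queries.map(function (query, i) {
    var keys = response.results[i].rows.map(function (row) { return row.key; });
    return { query: query, keys: keys };
  });
}

var paired = pairResults(requestBody, responseBody);
```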
Example request
GET /db/_design/my+lists/_list/asHtml/myview HTTP/1.1
Accept: application/json
Query Arguments
Argument      Optional  Type              Default  Description
query         no        string or number           A Lucene query.
bookmark      yes       string                     A bookmark that was received from a previous search. This allows you to
                                                   page through the results. If there are no more results after the
                                                   bookmark, you will get a response with an empty rows array and the same
                                                   bookmark. That way you can determine that you have reached the end of
                                                   the result list.
stale         yes       string                     Allow the results from a stale search index to be used.
limit         yes       numeric                    Limit the number of returned documents to the specified number. In
                                                   case of a grouped search, this parameter limits the number of documents
                                                   per group.
include_docs  yes       boolean           false    Include the full content of the documents in the response.
sort          yes       JSON                       Specifies the sort order of the results. In a grouped search (i.e. when
                                                   group_field is used), this specifies the sort order within a group. The
                                                   default sort order is relevance. Supported values: a JSON string of the
                                                   form "fieldname<type>" or "-fieldname<type>" for descending order,
                                                   where fieldname is the name of a string or number field and type is
                                                   either number or string, or a JSON array of such strings. The type part
                                                   is optional and defaults to number. Some examples are "foo", "-foo",
                                                   "bar<string>", "-foo<number>" and ["-foo<number>", "bar<string>"].
                                                   String fields used for sorting must not be analyzed fields. The field(s)
                                                   used for sorting must be indexed by the same indexer used for the
                                                   search query.
group_field   yes       string                     Field by which to group search matches. Supported values: a string
                                                   containing the field name and optionally the type of the field (string
                                                   or number) in angle brackets. If the type is not specified, it defaults
                                                   to string. Examples are name<string>, which is equivalent to name, and
                                                   age<number>.
group_limit   yes       numeric                    Maximum group count. This field can only be used if group_field is
                                                   specified.
group_sort    yes       JSON                       This field defines the order of the groups in a search using
                                                   group_field. The default sort order is relevance. This field can have
                                                   the same values as the sort field, so single fields as well as arrays of
                                                   fields are supported.
This request searches for documents whose index fields match the Lucene query. Which fields of a document are
indexed, and how, is determined by the index functions in the design document. For more information, see Creating
or updating a design document.
Here is an example of an HTTP request:
Search Response
The response is a JSON document that has the following structure.
total_rows: Number of results that match the search query. This number can be higher than the number
of objects in the rows array.
bookmark: String to be submitted in the next query to page through results. If this response contained no
results, the bookmark will be the same as the one used to obtain this response.
rows: Array of objects describing a search result for ungrouped (i.e. without group_field) searches.
    id: Document ID
    order: Specifies the order with regard to the indexed fields
    fields: Object containing other search indexes
groups: Array of group objects describing each group of the search result. This field is only present for
grouped searches.
    rows: Array of objects in this group that match the search. The objects in the array have the same
    fields as the objects in the rows array for ungrouped searches.
    total_rows: Number of objects that match the search. This number can be higher than the number
    of objects in the rows array.
    by: The value of the grouping field for this group.
Here is the response corresponding to the request above:
{
"total_rows": 3,
"bookmark": "g1AAAACWeJzLYWBgYMpgTmFQSElKzi9KdUhJMtbLTS3KLElMT9VLzskvTUnMK9HLSy3JAalMcgCSSfX____
"rows": [{
"id": "dd828eb4-c3f1-470f-aeff-c375ef70e4ad",
"order": [0.0, 1],
"fields": {
"default": "aa",
"foo": 0.0
}
}, {
"id": "ea522cf1-eb8e-4477-aa92-d1fa459bb216",
"order": [1.0, 0],
"fields": {
"default": "ab",
"foo": 1.0
}
}, {
"id": "c838baed-d573-43ea-9c34-621cf0f13301",
"order": [2.0, 0],
"fields": {
"default": "ac",
"foo": 2.0
}
}]
}
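The bookmark field makes it possible to walk through all results page by page. The loop below is a sketch of that idea under stated assumptions: fetchPage is a hypothetical stub that stands in for one search request and returns canned pages, whereas a real client would send the query and the bookmark over HTTP. The loop stops when a response comes back with an empty rows array, as described for the bookmark argument.

```javascript
// Stub standing in for one search request; a real client would issue an HTTP
// request with the query and bookmark. fetchPage and its data are hypothetical.
var pages = {
  "": { bookmark: "b1", rows: [{ id: "a" }, { id: "b" }] },
  "b1": { bookmark: "b2", rows: [{ id: "c" }] },
  "b2": { bookmark: "b2", rows: [] }
};
function fetchPage(bookmark) {
  return pages[bookmark || ""];
}

// Follow bookmarks until a response comes back with an empty rows array.
function fetchAllIds() {
  var ids = [];
  var bookmark = null;
  while (true) {
    var page = fetchPage(bookmark);
    if (page.rows.length === 0) break;
    page.rows.forEach(function (row) { ids.push(row.id); });
    bookmark = page.bookmark;
  }
  return ids;
}

var allIds = fetchAllIds();
```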
"groups": [{
"by": "group0",
"total_rows": 3,
"rows": [{
"id": "3497ff56-6d8c-435a-bcf3-704ac92252ff",
"order": [1.0, 0],
"fields": {
"default": "ac",
"bar": "ac",
"foo": "group0"
}
}, {
"id": "47d6a6cc-4533-42a3-87b7-91850fbadac8",
"order": [1.0, 0],
"fields": {
"default": "aa",
"bar": "aa",
"foo": "group0"
}
}, {
"id": "5f6f54e0-e947-412b-97e3-5942958b509d",
"order": [1.0, 1],
"fields": {
"default": "ab",
"bar": "ab",
"foo": "group0"
}
}]
}, {
"by": "group1",
"total_rows": 2,
"rows": [{
"id": "9a7e5990-d396-46d0-b642-420ce7178902",
"order": [1.0, 0],
"fields": {
"default": "ae",
"bar": "ae",
"foo": "group1"
}
}, {
"id": "fc91e465-fda9-44aa-a539-1a15c639d468",
"order": [1.0, 1],
"fields": {
"default": "ad",
"bar": "ad",
"foo": "group1"
}
}]
}]
}
2.7 Miscellaneous
These endpoints provide information about the state of the cluster and let you start replication tasks.
A list of the available methods and URL paths is provided below:
Method  Path            Description
GET     /               Get the welcome message and version information
GET     /_active_tasks  Obtain a list of the tasks running in the server
GET     /_membership    Obtain a list of nodes in the cluster
GET     /_all_dbs       Get a list of all the DBs
POST    /_replicate     Set or cancel replication
GET     /_uuids         Get generated UUIDs from the server
"user": null,
"updated_on": 1363274088,
"type": "replication",
"target": "https://repl:*****@tsm.cloudant.com/user-3dglstqg8aq0uunzimv4uiimy/",
"docs_read": 0,
"doc_write_failures": 0,
"doc_id": "tsm-admin__to__user-3dglstqg8aq0uunzimv4uiimy",
"continuous": true,
"checkpointed_source_seq": "403-g1AAAADfeJzLYWBgYMlgTmGQS0lKzi9KdUhJMjTRyyrNSS3QS87JL01JzCvRy0
"changes_pending": 134,
"pid": "<0.1781.4101>",
"node": "dbcore@db11.julep.cloudant.net",
"docs_written": 0,
"missing_revisions_found": 0,
"replication_id": "d0cdbfee50a80fd43e83a9f62ea650ad+continuous",
"revisions_checked": 0,
"source": "https://repl:*****@tsm.cloudant.com/tsm-admin/",
"source_seq": "537-g1AAAADfeJzLYWBgYMlgTmGQS0lKzi9KdUhJMjTUyyrNSS3QS87JL01JzCvRy0styQGqY0pkSLL
"started_on": 1363274083
},
{
"user": "acceptly",
"updated_on": 1363273779,
"type": "indexer",
"node": "dbcore@db11.julep.cloudant.net",
"pid": "<0.20723.4070>",
"changes_done": 189,
"database": "shards/00000000-3fffffff/acceptly/acceptly_my_chances_logs_live.1321035717",
"design_document": "_design/MyChancesLogCohortReport",
"started_on": 1363273094,
"total_changes": 26389
},
{
"user": "username",
"updated_on": 1371118433,
"type": "search_indexer",
"total_changes": 5466,
"node": "dbcore@db7.meritage.cloudant.net",
"pid": "<0.29569.7037>",
"changes_done": 4611,
"database": "shards/40000000-7fffffff/username/database_name",
"design_document": "_design/lucene",
"index": "search1",
"started_on": 1371118426
},
{
"view": 1,
"user": "acceptly",
"updated_on": 1363273504,
"type": "view_compaction",
"total_changes": 26095,
"node": "dbcore@db11.julep.cloudant.net",
"pid": "<0.21218.4070>",
"changes_done": 20000,
"database": "shards/80000000-bfffffff/acceptly/acceptly_my_chances_logs_live.1321035717",
"design_document": "_design/MyChancesLogCohortReport",
"phase": "view",
"started_on": 1363273094
},
{
"updated_on": 1363274040,
"node": "dbcore@db11.julep.cloudant.net",
"pid": "<0.29256.4053>",
"changes_done": 272195,
"database": "shards/00000000-3fffffff/heroku/app3245179/id_f21a08b7005e_logs.1346083461",
"started_on": 1363272496,
"total_changes": 272195,
"type": "database_compaction"
}
]
The returned structure includes the following fields for each task:
pid: Erlang Process ID
type: Operation Type
updated_on: Time when the last update was made to this task record. Updates are made by the job as
progress occurs. The value is in Unix time UTC.
started_on: Time when the task was started. The value is in Unix time UTC.
total_changes: Total number of documents to be processed by the task. The exact meaning depends on the
type of the task.
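For tasks that report both changes_done and total_changes, as the indexer and compaction entries in the example above do, a rough progress figure can be derived on the client. The helper below is a sketch. Its name taskProgress is hypothetical, and it returns null for entries without those counters, such as the replication task, which reports changes_pending instead.

```javascript
// Hypothetical helper: percentage of work done for a task entry, or null when
// the entry lacks the counters needed to compute it.
function taskProgress(task) {
  if (typeof task.changes_done !== "number" ||
      typeof task.total_changes !== "number" ||
      task.total_changes === 0) {
    return null;
  }
  return Math.round((task.changes_done / task.total_changes) * 100);
}

var indexer = taskProgress({ type: "search_indexer", changes_done: 4611, total_changes: 5466 });
var compaction = taskProgress({ type: "database_compaction", changes_done: 272195, total_changes: 272195 });
var replication = taskProgress({ type: "replication", changes_pending: 134 });
```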
Description
Replication request successfully completed
Continuous replication request has been accepted
Either the source or target DB is not found
JSON specification was invalid
Push replication is where the source is a local database, and destination is a remote database.
For example, to request replication between a database on the server example.com, and a database on Cloudant
you might use the following request:
POST /_replicate
Content-Type: application/json
Accept: application/json
{
"source" : "http://user:pass@example.com/db",
"target" : "http://user:pass@user.cloudant.com/db"
}
In all cases, the requested databases in the source and target specification must exist. If they do not, an error
will be returned within the JSON object:
{
"error" : "db_not_found",
"reason" : "could not open http://username.cloudant.com/ol1ka/"
}
You can create the target database (providing your user credentials allow it) by adding the create_target
field to the request object:
POST http://username.cloudant.com/_replicate
Content-Type: application/json
Accept: application/json
{
"create_target" : true,
"source" : "http://user:pass@example.com/db",
"target" : "http://user:pass@user.cloudant.com/db"
}
The create_target field is not destructive. If the database already exists, the replication proceeds as normal.
Single Replication
You can request replication of a database so that the two databases can be synchronized. By default, the replication
process occurs one time and synchronizes the two databases together. For example, you can request a single
synchronization between two databases by supplying the source and target fields within the request JSON
content.
POST /_replicate
Content-Type: application/json
Accept: application/json
{
"source" : "http://user:pass@user.cloudant.com/recipes",
"target" : "http://user:pass@user.cloudant.com/recipes2"
}
In the above example, the databases recipes and recipes2 will be synchronized. The response will be a
JSON structure containing the success (or failure) of the synchronization process, and statistics about the process:
{
"ok" : true,
"history" : [
{
"docs_read" : 1000,
"session_id" : "52c2370f5027043d286daca4de247db0",
"recorded_seq" : 1000,
"end_last_seq" : 1000,
"doc_write_failures" : 0,
"start_time" : "Thu, 28 Oct 2010 10:24:13 GMT",
"start_last_seq" : 0,
"end_time" : "Thu, 28 Oct 2010 10:24:14 GMT",
"missing_checked" : 0,
"docs_written" : 1000,
"missing_found" : 1000
}
],
"session_id" : "52c2370f5027043d286daca4de247db0",
"source_last_seq" : 1000
}
The structure defines the replication status, as described in the table below:
history [array]: Replication History
doc_write_failures: Number of document write failures
docs_read: Number of documents read
docs_written: Number of documents written to target
end_last_seq: Last sequence number in changes stream
end_time: Date/Time replication operation completed
missing_checked: Number of missing documents checked
missing_found: Number of missing documents found
recorded_seq: Last recorded sequence number
session_id: Session ID for this replication operation
start_last_seq: First sequence number in changes stream
start_time: Date/Time replication operation started
ok: Replication status
session_id: Unique session ID
source_last_seq: Last sequence number read from source database
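To make the response fields concrete, here is a small Python sketch that parses an abridged copy of the example response and reports how many documents were read, written, and failed. The function name summarize_replication and the choice to match the history entry by session_id are illustrative assumptions, not documented client behaviour.

```python
import json

# Response body abridged from the single-replication example.
response_text = """
{
  "ok": true,
  "history": [
    {
      "session_id": "52c2370f5027043d286daca4de247db0",
      "docs_read": 1000,
      "docs_written": 1000,
      "doc_write_failures": 0
    }
  ],
  "session_id": "52c2370f5027043d286daca4de247db0",
  "source_last_seq": 1000
}
"""


def summarize_replication(result):
    """Return a one-line summary of a parsed /_replicate response."""
    if not result.get("ok"):
        return "replication failed: " + str(result.get("error", "unknown"))
    # Find the history entry that belongs to the session reported at top level.
    for entry in result.get("history", []):
        if entry["session_id"] == result["session_id"]:
            return "read {docs_read}, wrote {docs_written}, failed {doc_write_failures}".format(**entry)
    return "replication ok, no matching history entry"


summary = summarize_replication(json.loads(response_text))
print(summary)
```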
Continuous Replication
Synchronization of a database with the previously noted methods happens only once, at the time the replicate request is made. To have the target database permanently replicated from the source, you must set the continuous
field of the JSON object within the request to true.
With continuous replication, changes in the source database are replicated to the target database in perpetuity until
you specifically request that replication cease.
POST /_replicate
Content-Type: application/json
Accept: application/json
{
  "continuous" : true,
  "source" : "http://user:pass@example.com/db",
  "target" : "http://user:pass@user.cloudant.com/db"
}
Changes will be replicated between the two databases as long as a network connection is available between the
two instances.
Note: To keep two databases synchronized with each other, you need to set replication in both directions; that is,
you must replicate from databasea to databaseb, and separately from databaseb to databasea.
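The two-directional setup described in the note can be sketched as a pair of request bodies, one per direction. This is only an illustration: the helper name and the database URLs are made up, and each body would still need to be POSTed to /_replicate separately.

```python
import json


def bidirectional_replications(database_a, database_b):
    """Return the two request bodies needed to keep both databases in sync."""
    return [
        {"source": database_a, "target": database_b, "continuous": True},
        {"source": database_b, "target": database_a, "continuous": True},
    ]


bodies = bidirectional_replications(
    "https://user:pass@user.cloudant.com/databasea",
    "https://user:pass@user.cloudant.com/databaseb",
)
for body in bodies:
    print(json.dumps(body))
```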
Requesting cancellation of a replication that does not exist results in a 404 error.
Argument: count
* Description: Number of UUIDs to return
* Optional: yes
* Type: numeric
Requests one or more Universally Unique Identifiers (UUIDs). The response is a JSON object providing a list of
UUIDs. For example:
{
"uuids" : [
"7e4b5a14b22ec1cf8e58b9cdd0000da3"
]
}
You can use the count argument to specify the number of UUIDs to be returned. For example:
GET /_uuids?count=5
Returns:
{
"uuids" : [
"c9df0cdf4442f993fc5570225b405a80",
"c9df0cdf4442f993fc5570225b405bd2",
"c9df0cdf4442f993fc5570225b405e42",
"c9df0cdf4442f993fc5570225b4061a0",
"c9df0cdf4442f993fc5570225b406a20"
]
}
Path: /db/_local/local-doc
* Description: Returns the latest revision of the non-replicated document
Path: /db/_local/local-doc
* Description: Inserts a new version of the non-replicated document
Path: /db/_local/local-doc
* Description: Deletes the non-replicated document
Path: /db/_local/local-doc
* Description: Copies the non-replicated document
Argument: revs
* Description: Return a list of the revisions for the document
* Optional: yes
* Type: boolean
Argument: revs_info
* Description: Return a list of detailed revision information for the document
* Optional: yes
* Type: boolean
* Supported Values
true: Includes the revisions
Return Codes:
400: The format of the request or revision was invalid
404: The specified document or revision cannot be found, or has been deleted
Gets the specified local document. The semantics are identical to accessing a standard document in the specified
database, except that the document is not replicated. See Retrieving a document.
* Optional: yes
Return Codes:
409: Supplied revision is incorrect or missing
Deletes the specified local document. The semantics are identical to deleting a standard document in the specified
database, except that the document is not replicated. See Deleting a document.
CHAPTER
THREE
Although you can access Cloudant directly over HTTP, plenty of language-specific tools and frameworks exist to
make it even easier.
3.1 Python
Python's readability and standardized coding practices make it effortless to read and write, and as a result it's great
for teaching. Academicians and scientists make heavy use of it for all manner of research and analysis, notably
through the Natural Language ToolKit, SciPy, NumPy, Matplotlib, Numba, and PyTables.
Python is also superb for working with Cloudant. If you don't already have Python, oh golly, go get it.
That'll install to someplace on your PYTHONPATH, so your projects have access to the downloaded package. If
you want to isolate dependencies between projects, use virtualenv, which you can install like this:
sudo pip install virtualenv
Then, to create an isolated Python environment for your project, just do this:
# create a venv folder
virtualenv venv
# use its bin, etc., folders for Python business
source venv/bin/activate
You should see (venv) prepended to your terminal's command line. Now, any pip packages you install will go into
that venv folder.
For even more convenience creating virtual environments for your Python projects, check out virtualenvwrapper,
which extends virtualenv for more ease-of-use.
CouchDB-Python
Cloudant's API resembles the way Python dict objects work, which makes CouchDB-Python wickedly intuitive
to use. To get the library, just pip it!
pip install couchdb
Its interface is very similar, too, which you can see in more detail here:
import couchdbkit
# connect to cloudant
server = couchdbkit.Server("https://USERNAME:PASSWORD@USERNAME.cloudant.com")
db = server.get_or_create_db("posts")
# save a document
db.save_doc({
    "author": "Mike Broberg",
    "content": "In my younger and more vulnerable years, my father told me, 'Son, turn that racket ..."
})
Couchdbkit exposes a document class for adding simple schemas to your application, like so:
class Post(couchdbkit.Document):
    author = couchdbkit.StringProperty()
    content = couchdbkit.StringProperty()

# associate posts with a given database object
Post.set_db(db)
new_post = Post(
    author="Mike Broberg",
    content="In his spare time, Mike Broberg enjoys barbecuing meats of various origins."
)
# save the post to its associated database
new_post.save()
For more detail than you could ever want, check out the API Documentation.
Requests
Because Cloudant's API is just an HTTP interface, you can interact with it from any HTTP library. Python has a
beautiful HTTP library called Requests, so if you don't want to deal with all the abstraction of a client library, use
that. Here's how:
First, pip!
pip install requests
import json
Dandy, eh? Just hit Cloudant's API by URL directly. Don't forget to set headers and request bodies appropriately!
See Requests documentation for more info.
Run it with python [file] and boom, you're live. That's all it takes to build a website with Flask.
A rich community of addons, plugins, and integrations means you can add features and functionality quickly
without relying on monolithic software choices. Just google flask [thing] and you'll more than probably find
integrations that suit your needs.
Django
Django is to Python as Ruby on Rails is to Ruby: an enormous, web-focused project that attempts to make
intelligent design decisions about app structure, architecture, layout, and tooling so that you can focus on building
your app. Some of what it provides out of the box:
A SQL model layer
HTML templating
Automatic administrative interface
Elegant URL routing
Internationalization support
Authentication and authorization systems
Django has an enormous community of addons, plugins, integrations, and the like, so if you need more, just google
django [thing] and you'll likely find what you need. To get started with Django, check out their documentation.
3.1.4 CouchApps
The original CouchApp utility, couchapp, is written in Python. Though the original author has chosen to focus on
Erica instead, the python utility is still more feature-complete and works just fine. To install the utility, just pip it:
sudo pip install couchapp
Then, you can scaffold, push, and even pull CouchApps right from the command line:
# scaffold a couchapp
couchapp generate cloudant
# push it live
couchapp push cloudant https://USERNAME:PASSWORD@USERNAME.cloudant.com/DATABASE
# clone it!
couchapp clone https://USERNAME:PASSWORD@USERNAME.cloudant.com/DATABASE/_design/cloudant cloudant2
# oh look, there it is!
ls cloudant2
The python utility also supports the _docs folder, which couchapp will upload to your target database as JSON
documents rather than attachments. Very nifty for scaffolding data, or syncing projects.
As always, if you have any trouble, post your question to StackOverflow, ping us on IRC, or if you'd like to
discuss the matter in private, email us at support@cloudant.com.
Happy coding!
3.2 Node.js
Node.js is an open-source platform for writing JavaScript on the server. It's fast, asynchronous, and has a tremendous community that is only getting bigger. Because Cloudant indexes are in JavaScript, along with all client-side
code, writing JavaScript on the server means your head never needs to switch gears. But perhaps most importantly,
there are a ton of tools that make developing on Cloudant with Node.js effortless.
To get started with Node.js, download the binary for your operating system here.
Because npmjs.org is a CouchApp, you can replicate it and host your own registry using Cloudant. Specifically,
replicate from https://registry.npmjs.org/ to your database, and bam, you have your own private
registry. Then, you can push custom or private libraries to your registry and download them just like you would
from npm.
3.2.3 Libraries
Several JavaScript libraries make it effortless to work with Cloudant:
PouchDB
PouchDB is a JavaScript package that runs either in the browser or in a Node.js environment, and acts like its own
little Cloudant instance, so you can write data to it even if you're not online, and sync data between it and remote
Cloudant instances.
Creating a PouchDB instance that syncs with a Cloudant database:
var db = new PouchDB('dbname'),
    remote = 'https://USERNAME:PASSWORD@USERNAME.cloudant.com/DATABASE',
    opts = {
      continuous: true
    };
db.replicate.to(remote, opts);
db.replicate.from(remote, opts);
db.query({
// write your map function in JavaScript
map: function (doc) {
if (doc.title) emit(doc.title, null);
}
}, {
// in this example, we won't use a reduce function
reduce: false
}, function (err, response) {
// log the error, or the response if no error
console.log(err || response);
});
Nano
Nano tries to be as out-of-your-way as possible, so it's very lightweight to use:
// require nano, point it at our instance's root
var nano = require('nano')('https://garbados.cloudant.com');
// create a database
nano.db.create('example');
// create an alias for working with that database
var example = nano.db.use('example');
// fetch the primary index
example.list(function(err, body){
if (err) {
// something went wrong!
throw new Error(err);
} else {
// print all the documents in our database
console.log(body);
}
});
Nano has begun to support Cloudant-specific features like search, which makes it my library of choice for working
with Cloudant from Node.js.
Cradle
Cradle is a more full-bodied library than Nano, with features like caching, and convenience methods to get and
update documents. This usage example comes from its readme:
var cradle = require('cradle');
var db = new(cradle.Connection)().database('starwars');
db.get('vader', function (err, doc) {
  doc.name; // 'Darth Vader'
  assert.equal(doc.force, 'dark');
});
db.save('skywalker', {
  force: 'light',
  name: 'Luke Skywalker'
}, function (err, res) {
if (err) {
// Handle error
} else {
// Handle success
}
});
Nifty, eh?
Check out these CouchApps built with node.couchapp.js as examples:
Egg Chair: like Pinterest and Flickr, but without the terms and conditions.
Chaise Blog: A CouchApp blog, using two databases and filtered replication to share only what you want.
With only a few commands, my project is built, tested, and deployed. Here are some generators I use frequently
at Cloudant:
generator-reveal: Scaffold reveal.js presentations and upload them to Cloudant by running grunt couch.
generator-couchapp: Scaffolds a blank CouchApp that you can upload to Cloudant by running grunt.
And Grunt plugins:
grunt-couchapp automates pushing CouchApps, using node.couchapp.js, along with creating and deleting
databases, using nano
grunt-couch: like grunt-couchapp, but with an interface more like the classic Python CouchApp utility.
For more on asynchronous programming in Node.js, check out Control Flow in Node.
3.3.1 Libraries
MyCouch
MyCouch is a modern, asynchronous CouchDb / Cloudant client for .NET. It provides an extensible, thin wrapper
around the CouchDb HTTP API, allowing you to plug in your own serialisation layer. MyCouch is also the only
.NET client to currently have first-class support for Cloudant.
An example:
// connect to Cloudant
using (var client = new Client("https://USERNAME:PASSWORD@USERNAME.cloudant.com/DATABASE"))
{
    // get document by ID
    await client.Documents.Get("12345");
    // get document by ID (strongly typed POCO version)
    MyObject myObj = await client.Documents.Get<MyObject>("12345");
}
Installation: NuGet
Compatibility: .NET 4 and above, Windows Store
LoveSeat
LoveSeat is a popular and well established CouchDB / Cloudant C# client, architected with the intent to abstract
away just enough so that it's easy to use, but not enough so that you don't know what's going on. The API is
synchronous and doesn't yet support Cloudant-specific features such as Search or API key management.
An example:
// connect to Cloudant
var client = new CouchClient("username.cloudant.com", 443, username, password, true, Authentic
var db = client.GetDatabase("Northwind");
// get document by ID
Document myDoc = db.GetDocument("12345");
// get document by ID (strongly typed POCO version)
MyObject myObj = db.GetDocument<MyObject>("12345");
Installation: NuGet
Compatibility: .NET 3.5 and above, Mono 2.9
3.3.2 Tutorials
MyCouch says hello to Cloudant
In this tutorial, MyCouch developer, Daniel Wertheim, walks us through setting up MyCouch to talk to Cloudant
and perform basic CRUD operations.
3.3.3 Resources
Windows Azure
Windows Azure is an open and flexible cloud platform that enables you to quickly build, deploy and manage
applications across a global network of Microsoft-managed datacenters. Cloudant have partnered with Microsoft
to provide a multi-tenant cluster on Azure (Lagoon 2 - currently in beta).
AppHarbor
AppHarbor is a fully-hosted .NET platform as a service. Cloudant is available as an add-on for the service or
you can choose to sign up to Cloudant directly.
FoxWeave
FoxWeave is a service that allows you to build a data import workflow, processing data as it streams from one data
store to another, for example taking data from an existing SQL database and shipping it to Cloudant.
CHAPTER
FOUR
GUIDES
The consistency of application data can be addressed after the fact. As Seth Gilbert and Nancy Lynch of MIT
conclude in their proof of CAP theorem, most real-world systems today are forced to settle with returning most
of the data, most of the time.
4.2 MapReduce
MapReduce is an algorithm for slicing and dicing large datasets across distributed computing systems, such as
Cloudant clusters.
A MapReduce program has two parts:
a map function, which processes documents from your dataset into key-value pairs.
a reduce function, which combines the set returned by map or the results of previous reduce functions
into a single value per key.
Keep reading to see how it works at Cloudant, or head to our blog to read about the math and science behind
MapReduce with some foundational MapReduce literature.
For secondary indexes, Cloudant uses an implementation of MapReduce which works incrementally. When you
insert or update a document, rather than rerun the program on the entire dataset, we compute only for the documents that changed, and the reduce results those documents impact, so you can access MapReduce results in
only the time it takes to read them from disk, rather than the time it takes to compute them anew.
This will let us sort and group events by location, and use a reduce value to, for example, sum the number of
attendees. The result of this map over a dataset might look something like this:
{
"total_rows": 4,
"offset": 0,
"rows": [
{
// the document's ID
"id": "eac6f1faf2cc8dd6fbbbb5205c001763",
// the key we emitted
"key": ["France", "Paris"],
// the value we emitted
"value": 67
},
{
"id": "eac6f1faf2cc8dd6fbbbb5205c0021ce",
"key": ["UK", "Bristol"],
"value": 32
},
{
"id": "986d02a1d491fe906856609e9935fa47",
"key": ["USA", "Boston"],
"value": 194
},
{
"id": "ecfaf6648cec1f8f1f7c6b365c1115f4",
"key": ["UK", "Bristol"],
"value": 45
}
]
}
Both keys and values can be any valid JSON data structure: strings, numbers, arrays, or objects.
Check out query options for all the options you can use to modify map results.
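To illustrate what "group by location and sum the attendees" would produce for these rows, here is a plain-Python model of the grouping step. It runs on your own machine, not inside Cloudant, and the function name sum_by_key is invented for the illustration.

```python
from collections import defaultdict

# Keys and values copied from the example map output.
rows = [
    {"key": ["France", "Paris"], "value": 67},
    {"key": ["UK", "Bristol"], "value": 32},
    {"key": ["USA", "Boston"], "value": 194},
    {"key": ["UK", "Bristol"], "value": 45},
]


def sum_by_key(rows):
    """Group rows by their full key and add up the values in each group."""
    totals = defaultdict(int)
    for row in rows:
        # Lists are not hashable, so use a tuple as the dictionary key.
        totals[tuple(row["key"])] += row["value"]
    return dict(sorted(totals.items()))


attendees = sum_by_key(rows)
for location, total in attendees.items():
    print(location, total)
```

The two Bristol rows collapse into a single total, which is the effect a summing reduce with grouping has on the view.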
Reduce
If at all possible, don't use custom reduce functions! Use this section to learn about how reduces work, but prefer
the built-in functions outlined in the next section. They are simpler, faster, and will save you time.
Let's say we wanted to sum up all the values a map function emitted. That operation would be done in the reduce
function.
Reduces are called with three parameters: keys, values and rereduce.
* keys will be a list of keys as emitted by the map or, if rereduce is true, null.
* values will be a list of values for each element in keys, or if rereduce is true, a list of results from previous
reduce functions.
* rereduce will be true or false.
Here's an example that finds the largest value within the dataset:
function (keys, values, rereduce) {
  // Return the maximum numeric value.
  var max = -Infinity;
  for (var i = 0; i < values.length; i++) {
    if (typeof values[i] === 'number') {
      max = Math.max(values[i], max);
    }
  }
  return max;
}
ReReduce
Reduce functions can be given either the results of map functions, or the results of reduce functions that already
ran. In that latter case, rereduce is true, because the reduce function is re-reducing the data. (Get it?)
This way, nodes reduce datasets more quickly by handling both map results and, once that's all been processed,
newly computed reduce values.
Here's a simple reduce function that counts values, and handles rereduce:
For the mathematically inclined: operations which are both commutative and associative need not worry about
rereduce.
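Counting is a good example of an operation that does have to care about rereduce. The following is a Python model of the calling convention described above, not code that runs inside Cloudant; the sample keys and values are made up.

```python
def count_reduce(keys, values, rereduce):
    """Count emitted rows, following the (keys, values, rereduce) convention."""
    if rereduce:
        # values holds counts returned by earlier reduce calls: add them up.
        return sum(values)
    # values holds raw map output: each element is one emitted row.
    return len(values)


emitted_keys = ["a", "b", "c", "d", "e"]
emitted_values = [67, 32, 194, 45, 12]

# One pass over everything.
single_pass = count_reduce(emitted_keys, emitted_values, False)

# Two partial reductions, followed by a re-reduce of their results.
partials = [
    count_reduce(emitted_keys[:2], emitted_values[:2], False),
    count_reduce(emitted_keys[2:], emitted_values[2:], False),
]
two_pass = count_reduce(None, partials, True)

# Ignoring the rereduce flag would count the partial results instead of the rows.
ignoring_flag = count_reduce(None, partials, False)

print(single_pass, two_pass, ignoring_flag)
```

Both correct routes agree on the count, while the version that ignores the flag reports the number of partial results. A plain sum behaves the same either way, which is why it needs no special case.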
Built-in Reduces
Cloudant exposes several built-in reduce functions which, because they're written in Cloudant's native Erlang
rather than JavaScript, run much faster than custom functions.
_sum
Given an array of numeric values, _sum just, well, sums them up. Our Chained MapReduce example uses _sum
to report the best sales months and top sales reps. Here's an example view:
"map": "function(doc){
if (doc.rep){
emit({"rep": doc.rep}, doc.amount);
}
}",
"reduce": "_sum"
This yields sales by rep. Queried without options, the view will report the total sales for all reps. But, if you group
the results using group=true, you'll get sales by rep.
_sum works for documents containing objects and arrays with numeric values inside of them, as long as the
structure of those documents is consistent. So, two documents like...
[
  {
    "x": 1,
    "y": 2,
    "z": 3
  },
  {
    "x": 4,
    "y": 5,
    "z": 6
  }
]
_count reports the number of docs emitted by the map function, regardless of the emitted values' types. Consider
this example:
"map": "function(doc){ if(doc.type === 'event'){ emit(doc.location, null); } }",
"reduce": "_count"
If we grouped by key, this would tell us how many events happened at each location.
_stats
Like _sum on steroids, _stats produces a JSON structure containing the sum, count, min, max and sum squared.
Also like _sum, _stats only deals with numeric values and arrays of numbers; it'll get mighty angry if you
start passing it strings or objects. Consider how you might use _stats to get statistics about shopping cart
interactions:
"map": "function(doc){
if(doc.type === "stock"){
emit([doc.stock_symbol, doc.created_at.hour], doc.value);
}
}",
"reduce": "_stats"
With group=true&group_level=1, which groups results on the first key, you'll get stats per symbol across
all time. With group=true&group_level=2, you'll get stats for trades by stock symbol by hour. Nifty, eh?
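The effect of group_level can be modelled in a few lines of Python. This is a local sketch, not Cloudant's implementation: the sample rows are invented, and the field name sumsqr for the sum of squares follows CouchDB's _stats output, which is an assumption here since the text only says "sum squared".

```python
def stats(values):
    """Return the five figures that a _stats-style reduce reports."""
    return {
        "sum": sum(values),
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "sumsqr": sum(v * v for v in values),
    }


def grouped_stats(rows, group_level):
    """Group (key, value) rows on the first group_level key elements."""
    groups = {}
    for key, value in rows:
        groups.setdefault(tuple(key[:group_level]), []).append(value)
    return {prefix: stats(values) for prefix, values in groups.items()}


# Hypothetical map output: key is [stock_symbol, hour], value is the trade value.
rows = [
    (["ACME", 9], 10),
    (["ACME", 9], 12),
    (["ACME", 10], 11),
    (["INIT", 9], 5),
]

per_symbol = grouped_stats(rows, group_level=1)
per_symbol_and_hour = grouped_stats(rows, group_level=2)
print(per_symbol[("ACME",)])
print(per_symbol_and_hour[("ACME", 9)])
```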
{
"map": "function(doc){
if(doc.type === 'event'){
emit(doc.location, doc.attendees);
}
}",
"dbcopy": "other_database"
}
This will populate other_database (or whatever database you indicate) with the results of that map function, like
this:
{
"id": "eac6f1faf2cc8dd6fbbbb5205c0021ce",
"key": ["UK", "Bristol"],
"value": 32
}
You can then write secondary indexes for other_database that manipulate the results accordingly, potentially
including secondary indexes that use dbcopy again to emit another transformation to another database.
4.3.1 Revisions
In a Cloudant database, every document has a revision. The revision is stored in the _rev field of the document. As
a developer, you should treat it as an opaque string used internally by the database and not rely on it as a counter.
When you retrieve a document from the database, you can either retrieve the latest revision or you can ask for
a past revision by specifying the rev query parameter. However, past revisions will only be kept in the database
for a short time or if the revisions are in conflict. Otherwise, old revisions will be deleted regularly by a process
called compaction. Cloudant's revisions are thus not a good fit for implementing a version control system. For this
purpose, we recommend creating a new document per revision. When you update a document, you have to specify
the previous revision, and if the update is successful, the _rev field will be updated automatically. However, if
the revision you specified in your update request does not match the latest revision in the database, your request
will fail with HTTP status 409 (conflict). This technique is called multi-version concurrency control (MVCC);
it prevents concurrent updates from accidentally overwriting or reversing each other's changes, works well with
disconnected clients and does not require write locks. That said, like any mechanism for dealing with concurrency,
it does have some tricky parts.
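The update rule can be illustrated with a tiny in-memory stand-in for the database. Everything here is hypothetical (the TinyStore class, the ConflictError exception, the random revision tokens); it only mimics the check that a write must carry the latest _rev, with the exception playing the role of an HTTP 409.

```python
import uuid


class ConflictError(Exception):
    """Stands in for an HTTP 409 (conflict) response."""


class TinyStore:
    """In-memory model of the 'supply the latest _rev or be rejected' rule."""

    def __init__(self):
        self._docs = {}

    def get(self, doc_id):
        return dict(self._docs[doc_id])

    def put(self, doc):
        current = self._docs.get(doc["_id"])
        if current is not None and doc.get("_rev") != current["_rev"]:
            raise ConflictError("409 conflict: stale or missing _rev")
        # The revision is an opaque token; clients must not interpret it.
        stored = dict(doc, _rev=uuid.uuid4().hex)
        self._docs[doc["_id"]] = stored
        return stored["_rev"]


store = TinyStore()
store.put({"_id": "phone", "price": 650})

first_client = store.get("phone")
second_client = store.get("phone")

first_client["price"] = 600
store.put(first_client)  # accepted: carries the latest _rev

second_client["description"] = "Latest smartphone"
try:
    store.put(second_client)  # rejected: its _rev is now stale
    outcome = "accepted"
except ConflictError:
    outcome = "rejected"
    # Usual recovery: fetch the latest revision, reapply the change, retry.
    fresh = store.get("phone")
    fresh["description"] = second_client["description"]
    store.put(fresh)

print(outcome, store.get("phone"))
```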
You can then regularly query this view and resolve conflicts as needed or query the view after each replication.
As the document doesn't have a description yet, someone might add one.
{
"_id": "74b2be56045bed0c8c9d24b939000dbe",
"_rev": "2-61ae00e029d4f5edd2981841243ded13",
"name": "Samsung Galaxy S4",
"description": "Latest smartphone from Samsung",
"price": 650
}
At the same time, someone else - working with a replicated database - reduces the price.
{
"_id": "74b2be56045bed0c8c9d24b939000dbe",
"_rev": "2-f796915a291b37254f6df8f6f3389121",
"name": "Samsung Galaxy S4",
"description": "",
"price": 600
}
http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe?conflicts=true
...and get the following response:
{
"_id":"74b2be56045bed0c8c9d24b939000dbe",
"_rev":"2-f796915a291b37254f6df8f6f3389121",
"name":"Samsung Galaxy S4",
"description":"",
"price":600,
"_conflicts":["2-61ae00e029d4f5edd2981841243ded13"]
}
The version with the changed price has been chosen arbitrarily as the latest version of the document and the
conflict is noted in the _conflicts array. In most cases this array has only one element, but there can be many
conflicting revisions.
2. Merge the changes
Now your application needs to compare the revisions to see what has been changed. To do that, it gets all the
versions from the database with the following URLs:
http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe
http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe?rev=2-61ae00e029d4f5edd2981841243ded13
http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe?rev=1-7438df87b632b312c53a08361a7c3299
Since the two changes are for different fields of the document, it is easy to merge them automatically.
Depending on your application and the nature of the changes, other conflict resolution strategies might be useful.
Some common strategies are:
time based: first or last edit
reporting conflicts to users and letting them decide on the best resolution
more sophisticated merging algorithms, e.g. 3-way merges of text fields
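For the simple case where the two edits touch different fields, a field-wise three-way merge might look like the following Python sketch. The function name merge_fields is invented, and the common ancestor document is an assumption inferred from the two conflicting revisions (empty description, original price).

```python
def merge_fields(base, ours, theirs):
    """Field-wise three-way merge.

    Returns (merged_document, conflicting_field_names). The _rev field is
    skipped because the database assigns it.
    """
    merged, conflicts = {}, []
    for field in sorted(set(base) | set(ours) | set(theirs)):
        if field == "_rev":
            continue
        b, o, t = base.get(field), ours.get(field), theirs.get(field)
        if o == t:
            value = o  # both sides agree
        elif o == b:
            value = t  # only "theirs" changed this field
        elif t == b:
            value = o  # only "ours" changed this field
        else:
            conflicts.append(field)  # both changed it, in different ways
            continue
        if value is not None:
            merged[field] = value
    return merged, conflicts


# Assumed common ancestor (revision 1): no description, original price.
base = {
    "_id": "74b2be56045bed0c8c9d24b939000dbe",
    "name": "Samsung Galaxy S4",
    "description": "",
    "price": 650,
}
with_description = dict(base, description="Latest smartphone from Samsung")
with_new_price = dict(base, price=600)

merged, conflicts = merge_fields(base, with_description, with_new_price)
print(merged, conflicts)
```

Fields that both sides changed differently are reported rather than resolved, so the application can fall back to one of the other strategies for those.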
3. Upload the new revision
We produce the following document and update the database with it.
{
"_id": "74b2be56045bed0c8c9d24b939000dbe",
"_rev": "3-daaecd7213301a1ad5493186d6916755",
"name": "Samsung Galaxy S4",
"description": "Latest smartphone from Samsung",
"price": 600
}
DELETE http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe?rev=2-61ae00e029d4f5
DELETE http://username.cloudant.com/products/74b2be56045bed0c8c9d24b939000dbe?rev=2-f796915a291b37
After that, the document is not in conflict any more and you can verify that by getting the document again with
the conflicts parameter set to true.
4.5 Replication
Replication is an incremental, one-way process involving two databases, a source and a destination. At the end of
the replication process, all latest revisions of documents in the source database are also in the destination database
and all documents that were deleted from the source database are also deleted (if necessary) from the destination
database.
The replication process only copies the latest revision of a document, so all previous revisions that were only on
the source database are not copied to the destination database.
Field: source
* Required: yes
* Description: Identifies the database to copy revisions from. Can be a database URL, or an object whose url property contains the full URL of the database.
Field: target
* Required: yes
* Description: Identifies the database to copy revisions to. Same format and interpretation as source.
Field: cancel
* Required: no
* Description: Include this property with a value of true to cancel an existing replication between the specified source and target.
Field: continuous
* Required: no
* Description: A value of true makes the replication continuous (see below for details).
Field: create_target
* Required: no
* Description: A value of true tells the replicator to create the target database if it doesn't exist.
Field: doc_ids
* Required: no
* Description: Array of document IDs; if given, only these documents will be replicated.
Field: filter
* Required: no
* Description: Name of a filter function that can choose which revisions get replicated.
Field: proxy
* Required: no
* Description: Proxy server URL.
Field: query_params
* Required: no
* Description: Object containing properties that are passed to the filter function.
Field: use_checkpoints
* Required: no
* Description: Whether to create checkpoints. Checkpoints greatly reduce the time and resources needed for repeated replications. Setting this to false removes the requirement for write access to the source database. Defaults to true.
The source and target fields indicate the databases that documents will be copied from and to, respectively.
Unlike CouchDB, you have to use the full URL of the database.
POST /_replicate HTTP/1.1
{
"source": "http://username.cloudant.com/example-database",
"target": "http://example.org/example-database"
}
The target database has to exist and is not implicitly created. Add "create_target":true to the JSON
object to create the target database (remote or local) prior to replication. The names of the source and target
databases do not have to be the same.
Canceling replication
A replication triggered by POSTing to /_replicate/ can be canceled by POSTing the exact same JSON object
but with the additional cancel property set to true.
POST /_replicate HTTP/1.1
{
"source": "https://username:password@username.cloudant.com/example-database",
"target": "https://username:password@example.org/example-database",
"cancel": true
}
Notice: the request which initiated the replication will fail with error 500 (shutdown).
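Because the cancel request has to repeat the original JSON object exactly, it can help to derive it from the original rather than retype it. A minimal Python sketch, with a made-up helper name:

```python
import json


def cancel_body(original_body):
    """Return a copy of the original replication body with cancel set to true."""
    body = dict(original_body)  # copy, so the original stays untouched
    body["cancel"] = True
    return body


original = {
    "source": "https://username:password@username.cloudant.com/example-database",
    "target": "https://username:password@example.org/example-database",
}
cancellation = cancel_body(original)
print(json.dumps(cancellation, indent=2))
```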
The replication ID can be obtained from the original replication request (if it's a continuous replication) or from
/_active_tasks.
Example
The "ok":
{
"ok": true,
"_local_id": "0a81b645497e6270611ec3419767a584+continuous+create_target"
}
Continuous replication
To make replication continuous, add a "continuous":true parameter to the JSON, for example:
$ curl -H 'Content-Type: application/json' -X POST http://username.cloudant.com/_replicate \
  -d '{
    "source": "http://username:password@example.com/foo",
    "target": "http://username:password@username.cloudant.com/bar",
    "continuous": true
  }'
Replications can be persisted, so that they survive server restarts. For more, see Replicator Database.
Filtered Replication
Sometimes you don't want to transfer all documents from source to target. You can include one or more filter
functions in a design document on the source and then tell the replicator to use them.
A filter function takes two arguments (the document to be replicated and the replication request) and returns
true or false. If the result is true, the document is replicated.
function(doc, req) {
return !!(doc.type && doc.type == "foo");
}
Authentication
The source and the target database may require authentication, and if checkpoints are used (on by default), even
the source will require write access. The easiest way to authenticate is to put a username and password into the
URL; the replicator will use these for HTTP Basic auth:
{
"source": "https://username:password@example.com/db",
"target": "https://username:password@username.cloudant.com/db"
}
http_connections - The maximum number of HTTP connections per replication. For push replications, the effective number of HTTP connections used is min(worker_processes + 1, http_connections). For
pull replications, the effective number of connections used corresponds to this parameter's value. Default
value is 20.
connection_timeout - The maximum period of inactivity for a connection in milliseconds. If a connection is idle for this period of time, its current request will be retried. Default value is 30000 milliseconds
(30 seconds).
retries_per_request - The maximum number of retries per request. Before a retry, the replicator
will wait for a short period of time before repeating the request. This period of time doubles between each
consecutive retry attempt. This period of time never goes beyond 5 minutes and its minimum value (before
the first retry is attempted) is 0.25 seconds. The default value of this parameter is 10 attempts.
socket_options - A list of options to pass to the connection sockets. The available options can be found
in the documentation for the Erlang function setopts/2 of the inet module. Default value is [{keepalive,
true}, {nodelay, false}].
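The arithmetic behind two of these settings can be worked through in Python. This is one reading of the descriptions above, not the replicator's actual code: it assumes a wait before every retry, starting at 0.25 seconds, doubling each time and capped at 5 minutes. The function names are invented.

```python
def retry_delays(retries_per_request=10, first_delay=0.25, max_delay=300.0):
    """Seconds waited before each retry: doubling from first_delay, capped at max_delay."""
    delays = []
    delay = first_delay
    for _ in range(retries_per_request):
        delays.append(min(delay, max_delay))
        delay *= 2
    return delays


def push_connections(worker_processes, http_connections=20):
    """Effective HTTP connections for a push replication."""
    return min(worker_processes + 1, http_connections)


default_delays = retry_delays()
print(default_delays)
print("total wait with defaults:", sum(default_delays), "seconds")
print("delays 11 and 12 of 20:", retry_delays(20)[10:12])
print("push connections with 4 workers:", push_connections(4))
```

Under these assumptions the default of 10 retries adds up to a little over four minutes of waiting, and the 5-minute cap only comes into play from the twelfth retry onward.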
Example
POST /_replicate HTTP/1.1
{
"source": "https://username:password@example.com/example-database",
"target": "https://username:password@example.org/example-database",
"connection_timeout": 60000,
"retries_per_request": 20,
"http_connections": 30
}
As soon as the replication is triggered, the document will be updated with 3 new fields:
{
"_id": "my_rep",
"source": "https://username:password@myserver.com:5984/foo",
"target": "https://username:password@username.cloudant.com/bar",
"create_target": true,
"_replication_id": "c0ebe9256695ff083347cbf95f93e280",
"_replication_state": "triggered",
"_replication_state_time": "2011-06-07T16:54:35+01:00"
}
Note: special fields set by the replicator start with the prefix _replication_.
_replication_id: the ID internally assigned to the replication. This is the ID exposed by the output
from /_active_tasks/;
_replication_state: the current state of the replication;
_replication_state_time: an RFC3339 compliant timestamp that tells us when the current replication state (defined in _replication_state) was set.
When the replication finishes, it will update the _replication_state field (and
_replication_state_time) with the value "completed", so the document will look like:
{
  "_id": "my_rep",
  "source": "https://username:password@myserver.com:5984/foo",
  "target": "https://username:password@username.cloudant.com/bar",
  "create_target": true,
  "_replication_id": "c0ebe9256695ff083347cbf95f93e280",
  "_replication_state": "completed",
  "_replication_state_time": "2011-06-07T16:56:21+01:00"
}
When an error happens during replication, the _replication_state field is set to "error".
There are only 3 possible values for the _replication_state field: "triggered", "completed" and
"error". Continuous replications never get their state to "completed".
Canceling replications
To cancel a replication simply DELETE the document which triggered the replication. Note that if the replication
is in an error state, the replicator will try it again and again, updating the replication document and thereby
changing the revision. You thus need to get the revision immediately before deleting the document or you might
get a document update conflict response.
Example
$ curl -X DELETE 'http://username.cloudant.com/_replicator/replication1?rev=...'
Note: You need to DELETE the document that triggered the replication. DELETEing another document that
describes the same replication but did not trigger it will not cancel the replication.
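Because the revision can change between reading and deleting, a client may want to repeat the read-then-delete sequence when it runs into a conflict. The following is a minimal sketch of that loop. The get_rev and delete_doc callables and the Conflict exception are hypothetical stand-ins for whatever HTTP client you use (for example, a GET on the replication document, and a DELETE with ?rev= whose conflict response is raised as Conflict). They are not part of any Cloudant library.

```python
class Conflict(Exception):
    """Hypothetical: raised by delete_doc on a document update conflict."""


def cancel_replication(get_rev, delete_doc, doc_id, max_attempts=5):
    """Delete a replication document, re-reading its revision on conflict.

    get_rev(doc_id) returns the current _rev, or None if the document is gone.
    delete_doc(doc_id, rev) deletes that revision or raises Conflict.
    Returns True once the document is deleted (or already absent),
    False if every attempt ended in a conflict.
    """
    for _ in range(max_attempts):
        rev = get_rev(doc_id)  # fetch the revision immediately before deleting
        if rev is None:
            return True
        try:
            delete_doc(doc_id, rev)
            return True
        except Conflict:
            continue  # the replicator updated the document; try again
    return False
```

Bounding the number of attempts keeps a replication that is constantly being retried by the replicator from turning the cancel call into an endless loop.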
The user_ctx property and delegations
Replication documents can have a custom user_ctx property. This property defines the user context under
which a replication runs. For the old way of triggering replications (POSTing to /_replicate/), this property
was not needed (in fact, it didn't exist), because the authenticated user is known at the moment the replication
is triggered. With the replicator database, since it's a regular database, the information about the
authenticated user is only present at the moment the replication document is written to the database. The replicator
database implementation works like a _changes feed consumer (with ?include_docs=true) that reacts to what
was written to the replicator database; in fact, this feature could be implemented with an external script or program.
This implementation detail implies that for non-admin users, a user_ctx property containing the user's name and
a subset of his or her roles must be defined in the replication document. This is enforced by the document update
validation function present in the default design document of the replicator database. This validation function also
ensures that a non-admin user cannot set a user name in the user_ctx property that doesn't match his or her
own name (the same principle applies to the roles).
For admins, the user_ctx property is optional. If it's missing, it defaults to a user context with name null
and an empty list of roles, which means design documents will not be written to local targets. If writing design
documents to local targets is desired, then a user context with the role _admin must be set explicitly.
Also, for admins the user_ctx property can be used to trigger a replication on behalf of another user. This is
the user context that will be passed to local target database document validation functions.
Note: The user_ctx property only has effect for local endpoints.
Example delegated replication document:
{
  "_id": "my_rep",
  "source": "https://username:password@myserver.com:5984/foo",
  "target": "https://username:password@username.cloudant.com/bar",
  "continuous": true,
  "user_ctx": {
    "name": "joe",
    "roles": ["erlanger", "researcher"]
  }
}
As stated before, for admins the user_ctx property is optional, while for regular (non-admin) users it's mandatory. When the roles property of user_ctx is missing, it defaults to the empty list [].
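The rules above can be restated as a short sketch. This is a plain Python paraphrase of the text, not the validation function that actually ships in the replicator database. The function name effective_user_ctx is hypothetical, and treating a user as an admin when their roles include _admin is an assumption.

```python
def effective_user_ctx(replication_doc, user_name, user_roles):
    """Return the user context a replication would run under, or raise ValueError.

    user_name and user_roles describe the user who writes the document.
    """
    is_admin = "_admin" in user_roles  # assumption: admins carry the _admin role
    ctx = replication_doc.get("user_ctx")
    if ctx is None:
        if not is_admin:
            raise ValueError("user_ctx is mandatory for non-admin users")
        return {"name": None, "roles": []}  # default context for admins
    name = ctx.get("name")
    roles = ctx.get("roles", [])  # a missing roles property defaults to []
    if not is_admin:
        if name != user_name:
            raise ValueError("user_ctx.name must match the user's own name")
        if not set(roles) <= set(user_roles):
            raise ValueError("user_ctx.roles must be a subset of the user's roles")
    return {"name": name, "roles": list(roles)}
```

An admin passing a user_ctx for another user (delegation) goes through unchanged, while a regular user can only name themselves and a subset of their own roles.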
Monitoring progress
The active tasks API was enhanced to report additional information for replication tasks. Example:
$ curl http://username.cloudant.com/_active_tasks
[
  {
    "pid": "<0.1303.0>",
    "replication_id": "e42a443f5d08375c8c7a1c3af60518fb+create_target",
    "checkpointed_source_seq": 17333,
    "continuous": false,
    "doc_write_failures": 0,
    "docs_read": 17833,
    "docs_written": 17833,
    "missing_revisions_found": 17833,
    "progress": 3,
    "revisions_checked": 17833,
    "source": "http://username.cloudant.com/db/",
    "source_seq": 551202,
    "started_on": 1316229471,
    "target": "test_db",
    "type": "replication",
    "updated_on": 1316230082
  }
]
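In this sample, the progress value is consistent with the checkpointed sequence expressed as a whole-number percentage of the source sequence (17333 of 551202 is roughly 3%). That relationship is inferred from the sample output and is not a documented formula, and it only makes sense when both sequence values are plain numbers. A sketch under that assumption, with hypothetical function names:

```python
def replication_tasks(tasks):
    """Keep only the replication entries from an _active_tasks response."""
    return [task for task in tasks if task.get("type") == "replication"]


def replication_progress(task):
    """Estimate progress of a replication task as a percentage, rounded down.

    Assumption: progress = checkpointed_source_seq / source_seq, which
    matches the sample output but is not a documented formula.
    """
    source_seq = task["source_seq"]
    if source_seq == 0:
        return 100  # nothing to replicate
    return task["checkpointed_source_seq"] * 100 // source_seq


if __name__ == "__main__":
    sample = [{"type": "replication",
               "checkpointed_source_seq": 17333,
               "source_seq": 551202}]
    for task in replication_tasks(sample):
        print(replication_progress(task))
```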
data in the database, no amount of duplication will prevent that. For the first scenario, you need a cluster that spans
multiple geographic locations, which we offer to customers on our dedicated pricing plan, or you can replicate
your data to a cluster (dedicated or multi-tenant) in a different geographic location. The second scenario is what
this guide is about. In the case of a faulty application, you need a backup that preserves the state of the database
at certain points in time.
4.6.3 An example
Let's say you have one database to back up, and you want to create a full backup on Monday and an incremental
one on Tuesday. You can use curl and jq to do this, but of course any other HTTP client will work.
You save your base URL and the content type in variables, so that you don't have to enter them again and again for
each request.
$ url='https://<username>:<password>@<username>.cloudant.com'
$ ct='Content-Type: application/json'
You create three databases, one original and two for backups.
$ curl -X PUT "${url}/original"
$ curl -X PUT "${url}/backup-monday"
$ curl -X PUT "${url}/backup-tuesday"
On Monday, you back up your data for the first time, so you replicate everything from original to
backup-monday.
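A sketch of the replication document this step would create in the _replicator database. The document ID backup-monday is inferred from the Tuesday step, which reads ${url}/_replicator/backup-monday. The helper name replication_doc and the placeholder URL are illustrative assumptions.

```python
import json


def replication_doc(doc_id, source, target, since_seq=None, create_target=False):
    """Build the JSON body of a _replicator document."""
    doc = {"_id": doc_id, "source": source, "target": target}
    if since_seq is not None:
        doc["since_seq"] = since_seq  # start from a given sequence (incremental)
    if create_target:
        doc["create_target"] = True
    return doc


if __name__ == "__main__":
    url = "https://USERNAME:PASSWORD@USERNAME.cloudant.com"  # placeholder
    monday = replication_doc("backup-monday",
                             url + "/original",
                             url + "/backup-monday")
    # PUT this body to <url>/_replicator/backup-monday with a JSON content type.
    print(json.dumps(monday, indent=2))
```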
On Tuesday, things get more complicated. You first need to get the ID of the checkpoint document.
$ repl_id=$(curl "${url}/_replicator/backup-monday" | jq -r '._replication_id')
Once you have that, you use it to get the recorded_seq value.
$ recorded_seq=$(curl "${url}/original/_local/${repl_id}" | jq -r '.history[0].recorded_seq')
And with the recorded_seq you can start the incremental backup for Tuesday.
$ curl -X PUT "${url}/_replicator/backup-tuesday" -H "${ct}" -d @- <<END
{
  "_id": "backup-tuesday",
  "source": "${url}/original",
  "target": "${url}/backup-tuesday",
  "since_seq": "${recorded_seq}"
}
END
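The checkpoint lookup can also be done in code rather than with jq. A minimal sketch, assuming the checkpoint document has the shape implied by the jq filter (a history list whose first entry carries recorded_seq). The function name and the sample data are made up for illustration.

```python
def latest_recorded_seq(checkpoint_doc):
    """Return recorded_seq from the first history entry of a checkpoint document.

    Mirrors the jq filter .history[0].recorded_seq and raises ValueError
    if the document has no history entries.
    """
    history = checkpoint_doc.get("history") or []
    if not history:
        raise ValueError("checkpoint document has no history entries")
    return history[0]["recorded_seq"]


if __name__ == "__main__":
    # Made-up sample with the assumed shape.
    checkpoint = {"history": [{"recorded_seq": "42-abc"},
                              {"recorded_seq": "7-def"}]}
    incremental = {
        "_id": "backup-tuesday",
        "source": "<url>/original",        # placeholder
        "target": "<url>/backup-tuesday",  # placeholder
        "since_seq": latest_recorded_seq(checkpoint),
    }
    print(incremental)
```

Failing loudly on an empty history is a design choice: silently starting an "incremental" backup without a since_seq would copy everything again.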
To restore from the backup, you replicate the initial full backup and any number of incremental backups to a new
database.
If you want to restore Monday's state, just replicate from the backup-monday database:
$ curl -X PUT "${url}/_replicator/restore-monday" -H "$ct" -d @- <<END
{
  "_id": "restore-monday",
  "source": "${url}/backup-monday",
  "target": "${url}/restore",
  "create_target": true
}
END
If you want to restore Tuesday's state, first replicate from backup-tuesday and then from backup-monday.
Using this order, documents that were updated on Tuesday will only have to be written to the target database once.
$ curl -X PUT "${url}/_replicator/restore-tuesday" -H "$ct" -d @- <<END
{
  "_id": "restore-tuesday",
  "source": "${url}/backup-tuesday",
  "target": "${url}/restore",
  "create_target": true
}
END
$ curl -X PUT "${url}/_replicator/restore-monday" -H "$ct" -d @- <<END
{
  "_id": "restore-monday",
  "source": "${url}/backup-monday",
  "target": "${url}/restore"
}
END
Design documents
If you back up design documents, indexes will be created on the backup destination. This slows down the backup
process and unnecessarily takes up disk space. So if you don't need indexes on the backup system, use a filter
function in all replications that filters out design documents. This can also be a good place to filter out other
documents that aren't needed anymore.
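A sketch of what such a setup could look like. In CouchDB-style databases, filter functions are JavaScript functions stored under the filters key of a design document in the source database, and a replication document refers to one through its filter field as "designdoc/filtername". The names backup_filters and no_design_docs are hypothetical, and the Python helpers only assemble the JSON bodies; they do not send any requests.

```python
import json

# JavaScript filter function: pass every document except design documents.
NO_DESIGN_DOCS_JS = (
    "function(doc, req) {"
    " return doc._id.indexOf('_design/') !== 0;"
    " }"
)


def filter_design_doc():
    """Design document (stored in the source database) that holds the filter."""
    return {
        "_id": "_design/backup_filters",  # hypothetical name
        "filters": {"no_design_docs": NO_DESIGN_DOCS_JS},
    }


def filtered_backup_doc(doc_id, source, target):
    """Replication document that applies the filter above."""
    return {
        "_id": doc_id,
        "source": source,
        "target": target,
        "filter": "backup_filters/no_design_docs",
    }


if __name__ == "__main__":
    print(json.dumps(filter_design_doc(), indent=2))
    print(json.dumps(filtered_backup_doc("backup-monday",
                                         "<url>/original",
                                         "<url>/backup-monday"), indent=2))
```

The same filter function is the natural place for further conditions, for example skipping documents of a type you no longer need.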
Backing up many databases
If your application uses one database per user or allows each user to create several databases, backup jobs will
need to be created for each new database. Make sure that the replication jobs don't all start at the same time.
jq lets you filter a list of documents by their field values, which makes it easy to get all replication documents or
just one particular view indexing task you are interested in. Have a look at the detailed manual to find out more!
To estimate the time needed until the indexing task is complete, you can monitor the number of changes_done and
compare this value to total_changes. For instance, if changes_done increases by 250 per second and total_changes
is 1,000,000, the task will take about 67 minutes (4,000 seconds) to complete. However, this is only an estimate. How long the
process will really take depends on:
The time it takes to process each document. For instance, a view might check the type of a document first
and only emit new index entries for one type.
The size of the documents
The current workload on the cluster
These factors combined can lead to your estimate being off by as much as 100%.
You can extract the changes_done field using jq like this:
curl ... | jq '.[] | select(.type=="search_indexer") | .changes_done'
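The estimate itself is simple arithmetic on two readings of changes_done. A minimal sketch, with hypothetical function names, that derives the rate from two samples and the remaining time from that rate. As noted above, the result is only a rough guide.

```python
def rate_between(sample_a, sample_b):
    """Changes per second between two (timestamp_seconds, changes_done) samples."""
    (t0, done0), (t1, done1) = sample_a, sample_b
    if t1 <= t0:
        raise ValueError("samples must be in chronological order")
    return (done1 - done0) / (t1 - t0)


def estimate_remaining_seconds(changes_done, total_changes, changes_per_second):
    """Rough time left for an indexing task, in seconds."""
    if changes_per_second <= 0:
        raise ValueError("changes_per_second must be positive")
    remaining = max(total_changes - changes_done, 0)
    return remaining / changes_per_second


if __name__ == "__main__":
    # Two readings taken 10 seconds apart, 2,500 changes processed in between.
    rate = rate_between((0, 0), (10, 2500))
    seconds = estimate_remaining_seconds(0, 1_000_000, rate)
    print(round(seconds / 60, 1), "minutes")
```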
We recommend that you start a replication process by creating a document in the _replicator database and setting
its _id field. That makes it easier to select the information about this process from the active tasks:
curl ... | jq '.[] | select(.doc_id=="ID")'
Is it stuck?
So what can you do with all this information? In the case of a one-off (i.e. non-continuous) replication
where the source database isn't updated a lot during the replication, the changes_pending value tells you
how many documents are still to be processed and is a good indicator of when the replication will be finished.
In the case of a continuous replication, you will be more interested in how the number of documents
processed changes over time and whether changes_pending increases. If changes_pending increases and
revisions_checked stays constant for a while, the replication is probably stalled. If changes_pending
increases, but revisions_checked also increases, this might indicate that the replication can't keep up with
the volume of data added to or updated in the database.
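These heuristics translate into a comparison of two snapshots of the same task. The sketch below is one way to encode them. The function name and the returned labels are hypothetical, and how far apart the snapshots should be taken ("for a while") is left to the caller.

```python
def diagnose_replication(earlier, later):
    """Compare two _active_tasks snapshots of the same replication task.

    Each snapshot is a dict with changes_pending and revisions_checked.
    Returns "stalled", "falling_behind", or "ok".
    """
    pending_grew = later["changes_pending"] > earlier["changes_pending"]
    checked_grew = later["revisions_checked"] > earlier["revisions_checked"]
    if pending_grew and not checked_grew:
        return "stalled"  # work is piling up and nothing is being processed
    if pending_grew and checked_grew:
        return "falling_behind"  # processing, but slower than changes arrive
    return "ok"
```

A single "stalled" result is only a hint. Repeating the check over several intervals before acting avoids reacting to a brief pause.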
What to do?
To resolve a stalled replication, it is sometimes necessary to cancel the replication process and start it again. If
that does not help, the replication might be stalled because the user accessing the source or target database does
not have write permissions. Note that replication makes use of checkpoints so that it doesn't have to repeat work
if it is triggered again. However, that means you need write permission on both the target and the source. If you
created the replication process by creating a document in the _replicator database, you can also check the status
of the replication there.
item and account, then, are IDs for other objects in your database. To calculate a running total for an account,
we would use a view like this:
{
  views: {
    totals: {
      map: function(doc){
        if(doc.type === 'purchase'){
          emit(doc.account, doc.quantity * doc.unit_price);
        }else if(doc.type === 'payment'){
          emit(doc.account, -doc.value);
        }
      },
      reduce: '_sum'
    }
  }
}
Voila! Now calling this view with the group=true&key={account} options will give us a running balance
for a particular account. If you need to roll back a purchase or payment, just insert a document with values to
balance out the interaction you want to negate.
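To see how the balance and a rollback work out, the view can be imitated locally. The sketch below is a Python rendering of the map function plus a per-key sum, which is what the _sum reduce returns with group=true. The sample documents are made up; only the field names come from the view.

```python
from collections import defaultdict


def map_totals(doc):
    """Python rendering of the map function: yield (key, value) pairs."""
    if doc.get("type") == "purchase":
        yield doc["account"], doc["quantity"] * doc["unit_price"]
    elif doc.get("type") == "payment":
        yield doc["account"], -doc["value"]


def balances(docs):
    """Apply the map step, then sum the emitted values per key."""
    totals = defaultdict(int)
    for doc in docs:
        for key, value in map_totals(doc):
            totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    docs = [
        {"type": "purchase", "account": "a1", "quantity": 2, "unit_price": 5},
        {"type": "payment", "account": "a1", "value": 4},
    ]
    print(balances(docs))
    # Roll back the payment by adding a document that cancels it out.
    docs.append({"type": "payment", "account": "a1", "value": -4})
    print(balances(docs))
```

Nothing is updated or deleted in the rollback: the compensating document simply shifts the aggregate back.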
This practice of logging events, and aggregating them to determine an object's state, is called event sourcing. Used
well, it provides SQL-like transactional atomicity even in a NoSQL database like Cloudant.
This way, when the user purchases everything in their cart, you can use _uuids to generate a shared transaction_id that allows you to retrieve them as a group later. For that, we might use a view like this:
{
  views: {
    transactions: {
      map: function(doc){
        if(doc.type === 'purchase'){
          emit(doc.transaction_id, null);
        }
      }
    }
  }
}
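A local sketch of the same idea: every purchase document from one cart gets the same transaction_id, and the documents can then be collected per ID, much like querying the view with key=transaction_id. uuid4 is used here as a local stand-in for an ID obtained from _uuids, and the helper names and sample documents are hypothetical.

```python
import uuid
from collections import defaultdict


def tag_cart(cart_items, transaction_id=None):
    """Give every purchase document in a cart the same transaction_id.

    uuid4 is a local stand-in; the text suggests requesting an ID from _uuids.
    """
    transaction_id = transaction_id or uuid.uuid4().hex
    return [dict(item, type="purchase", transaction_id=transaction_id)
            for item in cart_items]


def group_by_transaction(docs):
    """Mimic the transactions view: purchase document IDs per transaction_id."""
    groups = defaultdict(list)
    for doc in docs:
        if doc.get("type") == "purchase":
            groups[doc["transaction_id"]].append(doc["_id"])
    return dict(groups)


if __name__ == "__main__":
    cart = [{"_id": "p1", "item": "book"}, {"_id": "p2", "item": "pen"}]
    print(group_by_transaction(tag_cart(cart)))
```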
To map that into another database as a series of transaction events, try this:
{
  views: {
    events: {
      map: function(doc){
        for(var i in doc.transaction_history){
          var transaction = doc.transaction_history[i];
          emit({
            from: doc.account_id,
            to: transaction.destination_account,
            transaction_id: transaction.transaction_id,
            date: transaction.date
          }, transaction.change);
        }
      },
      dbcopy: 'events'
    }
  }
}
This will output the results of the map function into the events database, filling it with documents like this:
{
  key: {
    from: ...,
    to: ...,
    transaction_id: ...,
    date: ...
  },
  value: 100
}
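The flattening step can be tried out locally before defining the view. The sketch below is a Python rendering of the map function that produces one event per history entry in the key/value shape shown above. It does not involve dbcopy itself, and the sample account document is made up; only the field names come from the view.

```python
def map_events(doc):
    """Python rendering of the map function: one event per history entry."""
    for transaction in doc.get("transaction_history", []):
        key = {
            "from": doc["account_id"],
            "to": transaction["destination_account"],
            "transaction_id": transaction["transaction_id"],
            "date": transaction["date"],
        }
        yield {"key": key, "value": transaction["change"]}


if __name__ == "__main__":
    account = {
        "account_id": "acct-1",
        "transaction_history": [
            {"destination_account": "acct-2", "transaction_id": "t1",
             "date": "2014-01-01", "change": 100},
            {"destination_account": "acct-3", "transaction_id": "t2",
             "date": "2014-01-02", "change": -25},
        ],
    }
    for event in map_events(account):
        print(event)
```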
And lo, from barren earth we have made a garden. Nifty, eh?
4.9.4 Summary
Although Cloudant's eventual consistency model makes satisfying ACID's consistency requirement difficult, you
can satisfy the rest of the requirements through how you structure your data. For event sourcing, keep these
guidelines in mind:
The atomic unit is the document. The database should never find itself in an inconsistent state because a
document failed to write.
Use secondary indexes, not documents, to reflect overall application state.
If you've got unruly data, use dbcopy to map it into a friendlier form and output it to another database.
If you have any trouble with any of this, post your question on StackOverflow, hit us up on IRC, or if you'd like
to speak more privately, send us a note at support@cloudant.com.