Rasterio
Release 1.4dev
Sean Gillies
1 Introduction
1.1 Philosophy
1.2 Rasterio license
2 Installation
2.1 Easy installation
2.2 Advanced installation
3 Python Quickstart
3.1 Opening a dataset in reading mode
3.2 Dataset attributes
3.3 Dataset georeferencing
3.4 Reading raster data
3.5 Spatial indexing
3.6 Creating data
3.7 Opening a dataset in writing mode
3.8 Saving raster data
5 Advanced Topics
5.1 Using rio-calc
5.2 Color
5.3 Concurrent processing
5.4 GDAL Option Configuration
5.5 Advanced Datasets
5.6 Error Handling
5.7 Vector Features
5.8 Filling nodata areas
5.9 Georeferencing
5.10 Options
5.11 Interoperability
5.12 Masking a raster using a shapefile
5.13 Nodata Masks
5.14 In-Memory Files
5.15 Migrating to Rasterio 1.0
5.16 Overviews
5.17 Plotting
5.18 Profiles and Writing Files
5.19 Reading Datasets
5.20 Reprojection
5.21 Resampling
5.22 Switching from GDAL’s Python bindings
5.23 Tagging datasets and bands
5.24 Transforms
5.25 Virtual Warping
5.26 Virtual Filesystems
5.27 Windowed reading and writing
5.28 Writing Datasets
7 Contributing
7.1 Code of Conduct
7.2 Rights
7.3 Issue Conventions
7.4 Design Principles
7.5 Dataset Objects
7.6 Path Objects
7.7 Band Objects
7.8 GDAL Context
7.9 Git Conventions
7.10 Conventions
7.11 New Containerized Development Environment
7.12 Historical Development Environment
7.13 Additional Information
Python Module Index
Index
rasterio Documentation, Release 1.4dev
Geographic information systems use GeoTIFF and other formats to organize and store gridded raster datasets such as
satellite imagery and terrain models. Rasterio reads and writes these formats and provides a Python API based on
Numpy N-dimensional arrays and GeoJSON.
Here’s an example program that extracts the GeoJSON shapes of a raster’s valid data footprint.
import rasterio
import rasterio.features
import rasterio.warp
CHAPTER
ONE
INTRODUCTION
1.1 Philosophy
Before Rasterio there was one Python option for accessing the many different kinds of raster data files used in the GIS
field: the Python bindings distributed with the Geospatial Data Abstraction Library, GDAL. These bindings extend
Python, but provide little abstraction for GDAL’s C API. This means that Python programs using them tend to read
and run like C programs. For example, GDAL’s Python bindings require users to watch out for dangling C pointers,
potential crashers of programs. This is bad: among other considerations we’ve chosen Python instead of C to avoid
problems with pointers.
What would it be like to have a geospatial data abstraction in the Python standard library? One that used modern Python
language features and idioms? One that freed users from concern about dangling pointers and other C programming
pitfalls? Rasterio’s goal is to be this kind of raster data library – expressing GDAL’s data model using fewer non-
idiomatic extension classes and more idiomatic Python types and protocols, while performing as fast as GDAL’s Python
bindings.
High performance, lower cognitive load, cleaner and more transparent code. This is what Rasterio is about.
CHAPTER
TWO
INSTALLATION
Installation of the Rasterio package is complicated by its dependency on libgdal and other C libraries. There are easy
installation paths and an advanced installation path.
Rasterio has several extension modules which link against libgdal. This complicates installation. Binary distributions
(wheels) containing libgdal and its own dependencies are available from the Python Package Index and can be installed
using pip.
pip install rasterio
These wheels are mainly intended to make installation easy for simple applications, not so much for production. They
are not tested for compatibility with all other binary wheels, conda packages, or QGIS, and omit many of GDAL’s
optional format drivers.
Many users find Anaconda and conda-forge a good way to install Rasterio and get access to more optional format
drivers (like TileDB and others).
Rasterio 1.4 requires Python 3.9 or higher and GDAL 3.3 or higher.
Once GDAL and its dependencies are installed on your computer (how to do this is documented at https://gdal.org)
Rasterio can be built and installed using setuptools or pip. If your GDAL installation provides the gdal-config
program, the process is simpler.
Without pip:
GDAL_CONFIG=/path/to/gdal-config python setup.py install
These are pretty much equivalent. Pip will use setuptools as the build backend. If the gdal-config program is on your
executable path, then you don’t need to set the environment variable.
Without gdal-config you will need to configure header and library locations for the build in another way. One way to
do this is to create a setup.cfg file in the source directory with content like this:
[build_ext]
include_dirs = C:/vcpkg/installed/x64-windows/include
libraries = gdal
library_dirs = C:/vcpkg/installed/x64-windows/lib
This is the approach taken by Rasterio’s wheel-building workflow. With this file in place you can run either python
setup.py install or python -m pip install --user ..
You can also pass those three values on the command line following the setuptools documentation. However, the
setup.cfg approach is easier.
CHAPTER
THREE
PYTHON QUICKSTART
Reading and writing data files is a spatial data programmer’s bread and butter. This document explains how to use
Rasterio to read existing files and to create new files. Some advanced topics are glossed over to be covered in more
detail elsewhere in Rasterio’s documentation. Only the GeoTIFF format is used here, but the examples do apply to
other raster data formats. It is presumed that Rasterio has been installed.
Consider a GeoTIFF file named example.tif with 16-bit Landsat 8 imagery covering a part of the United States’s
Colorado Plateau [1]. Because the imagery is large (70 MB) and has a wide dynamic range, it is difficult to display in a
browser. A rescaled and dynamically squashed version is shown below.
[1] “example.tif” is an alias for band 4 of Landsat scene LC80370342016194LGN00.
Rasterio’s open() function takes a path string or path-like object and returns an opened dataset object. The path may
point to a file of any supported raster format. Rasterio will open it using the proper GDAL format driver. Dataset
objects have some of the same attributes as Python file objects.
>>> dataset.name
Properties of the raster data stored in the example GeoTIFF can be accessed through attributes of the opened dataset
object. Dataset objects have bands and this example has a band count of 1.
>>> dataset.count
1
A dataset band is an array of values representing the partial distribution of a single variable in 2-dimensional (2D)
space. All band arrays of a dataset have the same number of rows and columns. The variable represented by the
example dataset’s sole band is Level-1 digital numbers (DN) for the Landsat 8 Operational Land Imager (OLI) band 4
(wavelengths between 640-670 nanometers). These values can be scaled to radiance or reflectance values. The array
of DN values is 7731 columns wide and 7871 rows high.
>>> dataset.width
7731
>>> dataset.height
7871
Some dataset attributes expose the properties of all dataset bands via a tuple of values, one per band. To get a
mapping of band indexes to variable data types, apply a dictionary comprehension to the zip() product of a dataset’s
DatasetReader.indexes and DatasetReader.dtypes attributes.
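As a minimal sketch of that comprehension, the snippet below uses plain tuples as stand-ins for the two attributes of the example dataset. The names indexes and dtypes are assumptions for illustration; with an opened dataset you would pass dataset.indexes and dataset.dtypes instead.

```python
# Stand-ins for dataset.indexes and dataset.dtypes of the example file
# (assumed values: one band of unsigned 16-bit integers).
indexes = (1,)
dtypes = ("uint16",)

# Map each band index to the data type of that band.
band_dtypes = {i: dtype for i, dtype in zip(indexes, dtypes)}
print(band_dtypes)
```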
The example file’s sole band contains unsigned 16-bit integer values. The GeoTIFF format also supports signed integers
and floats of different size.
A GIS raster dataset is different from an ordinary image; its elements (or “pixels”) are mapped to regions on the earth’s
surface. Every pixel of a dataset is contained within a spatial bounding box.
>>> dataset.bounds
BoundingBox(left=358485.0, bottom=4028985.0, right=590415.0, top=4265115.0)
Our example covers the world from 358485 meters (in this case) to 590415 meters, left to right, and 4028985 meters
to 4265115 meters bottom to top. It covers a region 231.93 kilometers wide by 236.13 kilometers high.
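The extent quoted above follows directly from the bounds. The sketch below redoes the arithmetic with a namedtuple standing in for the BoundingBox object (a hypothetical stand-in so the snippet runs without a dataset).

```python
from collections import namedtuple

# Hypothetical stand-in for the BoundingBox printed above.
BoundingBox = namedtuple("BoundingBox", "left bottom right top")
bounds = BoundingBox(left=358485.0, bottom=4028985.0, right=590415.0, top=4265115.0)

# Extent in kilometers: the bounds are in meters for this dataset.
width_km = (bounds.right - bounds.left) / 1000.0
height_km = (bounds.top - bounds.bottom) / 1000.0
print(width_km, height_km)
```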
The value of the DatasetReader.bounds attribute is derived from a more fundamental attribute: the dataset’s geospatial
transform.
>>> dataset.transform
Affine(30.0, 0.0, 358485.0,
0.0, -30.0, 4265115.0)
A dataset’s DatasetReader.transform is an affine transformation matrix that maps pixel locations in (col, row)
coordinates to (x, y) spatial positions. The product of this matrix and (0, 0), the column and row coordinates of the
upper left corner of the dataset, is the spatial position of the upper left corner.
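To make the matrix product concrete, here is a sketch that writes the six coefficients shown above as a 3x3 numpy matrix acting on homogeneous (col, row, 1) vectors. The helper pixel_to_xy is hypothetical; with Rasterio the same product is normally written dataset.transform * (col, row).

```python
import numpy as np

# The example transform as a 3x3 matrix for homogeneous coordinates.
transform = np.array([
    [30.0, 0.0, 358485.0],
    [0.0, -30.0, 4265115.0],
    [0.0, 0.0, 1.0],
])

def pixel_to_xy(col, row):
    """Map (col, row) pixel coordinates to (x, y) spatial coordinates."""
    x, y, _ = transform @ np.array([col, row, 1.0])
    return float(x), float(y)

# Upper left corner of the dataset.
upper_left = pixel_to_xy(0, 0)
# Lower right corner: one step past the last column (7731) and row (7871).
lower_right = pixel_to_xy(7731, 7871)
print(upper_left, lower_right)
```

The two corners reproduce the left/top and right/bottom values of the bounds shown earlier, which is the sense in which the bounds are derived from the transform.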
But what do these numbers mean? 4028985 meters from where? These coordinate values are relative to the origin of
the dataset’s coordinate reference system (CRS).
>>> dataset.crs
CRS.from_epsg(32612)
EPSG:32612 identifies a particular coordinate reference system: UTM zone 12N. This system is used for mapping areas
in the Northern Hemisphere between 108 and 114 degrees west. The upper left corner of the example dataset,
(358485.0, 4265115.0), is 141.5 kilometers west of zone 12’s central meridian (111 degrees west) and 4265 kilometers
north of the equator.
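The two distances can be checked by hand. The sketch below relies on the UTM convention that the central meridian of a zone has an easting of 500000 meters and that, in the northern hemisphere, northings are measured from the equator.

```python
# UTM convention: the central meridian is assigned an easting of 500 km.
FALSE_EASTING = 500000.0

# Upper left corner of the example dataset, in meters.
x, y = 358485.0, 4265115.0

west_of_central_meridian_km = (FALSE_EASTING - x) / 1000.0
north_of_equator_km = y / 1000.0
print(west_of_central_meridian_km, north_of_equator_km)
```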
Between the DatasetReader.crs and DatasetReader.transform attributes, the georeferencing of a raster dataset
is described and the dataset can be compared to other GIS datasets.
Data from a raster band can be accessed by the band’s index number. Following the GDAL convention, bands are
indexed from 1.
>>> dataset.indexes
(1,)
>>> band1 = dataset.read(1)
>>> band1
array([[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
...,
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0],
[0, 0, 0, ..., 0, 0, 0]], dtype=uint16)
Values from the array can be addressed by their row, column index.
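A small sketch of row, column addressing, using a tiny numpy array as a stand-in for band1 (the array contents are made up for illustration):

```python
import numpy as np

# Stand-in for band1: 3 rows by 4 columns of unsigned 16-bit integers.
band = np.arange(12, dtype="uint16").reshape(3, 4)

# The first index selects the row, the second selects the column.
value = band[1, 2]
# Negative indexes count from the end, as with any numpy array.
last = band[-1, -1]
print(value, last)
```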
Datasets have a DatasetReader.index() method for getting the array indices corresponding to points in
georeferenced space. To get the value for the pixel 100 kilometers east and 50 kilometers south of the dataset’s upper left
corner, do the following.
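The arithmetic behind such a lookup can be sketched without a dataset. This assumes the north-up, unrotated transform shown earlier and rounding down to whole pixels; with an opened dataset the equivalent call would be dataset.index(x, y).

```python
import math

# Origin and pixel size taken from the example transform.
left, top = 358485.0, 4265115.0
pixel_size = 30.0

# A point 100 km east and 50 km south of the upper left corner.
x = left + 100000.0
y = top - 50000.0

# Columns grow eastward, rows grow southward.
col = math.floor((x - left) / pixel_size)
row = math.floor((top - y) / pixel_size)
print(row, col)
```

The resulting (row, col) pair can then be used to address the band array, for example band1[row, col].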
To get the spatial coordinates of a pixel, use the dataset’s DatasetReader.xy() method. The coordinates of the center
of the image can be computed like this.
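As a sketch of the reverse mapping, the snippet below computes the coordinates of the center of the middle pixel from the example transform and dimensions. It assumes that the coordinates of a pixel refer to its center, hence the half-pixel offset; with an opened dataset the equivalent call would be dataset.xy(dataset.height // 2, dataset.width // 2).

```python
# Origin, pixel size and dimensions of the example dataset.
left, top = 358485.0, 4265115.0
pixel_size = 30.0
width, height = 7731, 7871

# Row and column of the middle pixel.
row, col = height // 2, width // 2

# Add half a pixel to move from the pixel's corner to its center.
x = left + (col + 0.5) * pixel_size
y = top - (row + 0.5) * pixel_size
print(x, y)
```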
Reading data is only half the story. Using Rasterio dataset objects, arrays of values can be written to a raster data file
and thus shared with other GIS applications such as QGIS.
As an example, consider an array of floating point values representing, e.g., a temperature or pressure anomaly field
measured or modeled on a regular grid, 240 columns by 180 rows. The first and last grid points on the horizontal
axis are located at 4.0 degrees west and 4.0 degrees east longitude, the first and last grid points on the vertical axis are
located at 3 degrees south and 3 degrees north latitude.
The fictional field for this example consists of the difference of two Gaussian distributions and is represented by the
array Z. Its contours are shown below.
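One way to build such an array with numpy is sketched here. The centers, widths and amplitude of the two Gaussians are assumptions chosen for illustration; only the grid (240 columns by 180 rows spanning 4 degrees west to 4 degrees east and 3 degrees south to 3 degrees north) comes from the description above.

```python
import numpy as np

# Grid point coordinates along each axis.
x = np.linspace(-4.0, 4.0, 240)
y = np.linspace(-3.0, 3.0, 180)
X, Y = np.meshgrid(x, y)

# Two Gaussian bumps with illustrative centers and widths.
Z1 = np.exp(-2 * np.log(2) * ((X - 0.5) ** 2 + (Y - 0.5) ** 2) / 1 ** 2)
Z2 = np.exp(-3 * np.log(2) * ((X + 0.5) ** 2 + (Y + 0.5) ** 2) / 2.5 ** 2)

# The field is their scaled difference.
Z = 10.0 * (Z2 - Z1)
print(Z.shape, Z.dtype)
```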
To save this array along with georeferencing information to a new raster data file, call rasterio.open() with a path
to the new file to be created, 'w' to specify writing mode, and several keyword arguments.
• driver: the name of the desired format driver
• width: the number of columns of the dataset
• height: the number of rows of the dataset
• count: a count of the dataset bands
• dtype: the data type of the dataset
• crs: a coordinate reference system identifier or description
• transform: an affine transformation matrix, and
• nodata: a “nodata” value
The first 5 of these keyword arguments parametrize fixed, format-specific properties of the data file and are required
when opening a file to write. The last 3 are optional.
In this example the coordinate reference system will be '+proj=latlong', which describes an equirectangular
coordinate reference system with units of decimal degrees. The proper affine transformation matrix can be computed from
the matrix product of a translation and a scaling.
>>> transform
Affine(0.033333333333333333, 0.0, -4.0166666666666666,
0.0, 0.033333333333333333, -3.0166666666666666)
The first point in the example grid, at row 0 and column 0 of Z, is at 4 degrees west and 3 degrees south. The raster
pixel centered on this grid point extends res / 2, or 1/60 degrees, in each direction, hence the shift in the expression above.
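The product of a translation and a scaling can be sketched with plain 3x3 numpy matrices. The variable names are assumptions, and res is taken as the grid span divided by the number of columns so that the result matches the coefficients printed above. With the affine package that Rasterio uses, the same product would be written Affine.translation(x[0] - res / 2, y[0] - res / 2) * Affine.scale(res, res).

```python
import numpy as np

# Grid point coordinates, as described for the example field.
x = np.linspace(-4.0, 4.0, 240)
y = np.linspace(-3.0, 3.0, 180)
res = (x[-1] - x[0]) / 240.0  # 1/30 degree per pixel

# Shift the origin half a pixel so that pixels are centered on grid points.
translation = np.array([
    [1.0, 0.0, x[0] - res / 2],
    [0.0, 1.0, y[0] - res / 2],
    [0.0, 0.0, 1.0],
])
# Scale pixel units to degrees.
scaling = np.diag([res, res, 1.0])

transform = translation @ scaling
print(transform[:2])
```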
A dataset for storing the example grid is opened like so
Values for the height, width, and dtype keyword arguments are taken directly from attributes of the 2-D array, Z. Not
all raster formats can support the 64-bit float values in Z, but the GeoTIFF format can.
To copy the grid to the opened dataset, call the new dataset’s DatasetWriter.write() method with the grid and
target band number as arguments.
>>> new_dataset.write(Z, 1)
Then call the DatasetWriter.close() method to sync data to disk and finish.
>>> new_dataset.close()
Because Rasterio’s dataset objects mimic Python’s file objects and implement Python’s context manager protocol, it is
possible to do the following instead.
with rasterio.open(
    '/tmp/new.tif',
    'w',
    driver='GTiff',
    height=Z.shape[0],
    width=Z.shape[1],
    count=1,
    dtype=Z.dtype,
    crs='+proj=latlong',
    transform=transform,
) as dst:
    dst.write(Z, 1)
These are the basics of reading and writing raster data files. More features and examples are contained in the advanced
topics section.
FOUR
$ rio --help
Options:
-v, --verbose Increase verbosity.
-q, --quiet Decrease verbosity.
--aws-profile TEXT Select a profile from the AWS credentials file
--aws-no-sign-requests Make requests anonymously
--aws-requester-pays Requester pays data transfer costs
--version Show the version and exit.
--gdal-version
--help Show this message and exit.
Commands:
blocks Write dataset blocks as GeoJSON features.
bounds Write bounding boxes to stdout as GeoJSON.
calc Raster data calculator.
clip Clip a raster to given bounds.
convert Copy and convert raster dataset.
edit-info Edit dataset metadata.
env Print information about the Rasterio environment.
gcps Print ground control points as GeoJSON.
info Print information about a data file.
insp Open a data file and start an interpreter.
mask Mask in raster using features.
merge Merge a stack of raster datasets.
overview Construct overviews in an existing dataset.
1 In some Linux distributions “rio” may instead refer to the command line Diamond Rio MP3 player controller. This conflict can be avoided by
Commands are shown below. See --help of individual commands for more details.
For commands that create new datasets, format specific creation options may also be passed using --co. For example,
to compress a new GeoTIFF output file using the LZW method, add the following.
--co compress=LZW
4.2 blocks
This command prints features describing a raster’s internal blocks, which are used directly for raster I/O. These features
can be used to visualize how a windowed operation would operate using those blocks.
Output features have two JSON encoded properties: block and window. Block is a two element array like [0, 0]
describing the window’s position in the input band’s window layout. Window is a JSON serialization of rasterio’s
Window class like {"col_off": 0, "height": 3, "row_off": 705, "width": 791}.
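To illustrate how block positions relate to windows, here is a sketch that enumerates block windows for a raster of given dimensions and block size. The function is a hypothetical helper, not Rasterio's implementation, and the 791 by 718 raster stored in strips 3 rows high is an assumed layout chosen to match the window shown above.

```python
def block_windows(width, height, block_width, block_height):
    """Yield ((block_row, block_col), window) pairs covering a raster."""
    for i, row_off in enumerate(range(0, height, block_height)):
        for j, col_off in enumerate(range(0, width, block_width)):
            yield (i, j), {
                "col_off": col_off,
                # Blocks on the bottom and right edges may be truncated.
                "height": min(block_height, height - row_off),
                "row_off": row_off,
                "width": min(block_width, width - col_off),
            }

# Assumed layout: 791 columns, 718 rows, strips of 3 full-width rows.
blocks = dict(block_windows(791, 718, 791, 3))
print(blocks[(235, 0)])
```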
Block windows are extracted from the dataset (all bands must have matching block windows) by default, or from the
band specified using the --bidx option:
By default a GeoJSON FeatureCollection is written. With the --sequence option a GeoJSON feature stream is written
instead.
Output features are reprojected to OGC:CRS84 (WGS 84) unless the --projected flag is provided, which causes the
output to be kept in the input datasource’s coordinate reference system.
For more information on exactly what blocks and windows represent, see
rasterio._base.DatasetBase.block_windows().
4.3 bounds
The bounds command writes the bounding boxes of raster datasets to GeoJSON for use with, e.g., geojsonio-cli.
Shoot the GeoJSON into a Leaflet map using geojsonio-cli by typing rio bounds tests/data/RGB.byte.tif |
geojsonio.
4.4 calc
The calc command reads files as arrays, evaluates lisp-like expressions in their context, and writes the result as a new
file. Members of the numpy module and arithmetic and logical operators are available as builtin functions and operators.
It is intended for simple calculations; any calculation requiring multiple steps is better done in Python using the Rasterio
and Numpy APIs.
Input files may have different numbers of bands but should have the same number of rows and columns. The output
file will have the same number of rows and columns as the inputs and one band per element of the expression result.
An expression involving arithmetic operations on N-D arrays will produce an N-D array and result in an N-band output
file.
The following produces a 3-band GeoTIFF with all values scaled by 0.95 and incremented by 2. In the expression,
(read 1) evaluates to the first input dataset (3 bands) as a 3-D array.
The following produces a 3-band GeoTIFF in which the first band is copied from the first band of the input and the next
two bands are scaled (down) by the ratio of the first band’s mean to their own means. The --name option is used to
bind datasets to a name within the expression. (take a 1) gets the first band of the dataset named a as a 2-D array
and (asarray ...) collects a sequence of 2-D arrays into a 3-D array for output.
$ rio calc "(asarray (take a 1) (* (take a 2) (/ (mean (take a 1)) (mean (take a 2)))) (* (take a 3) (/ (mean (take a 1)) (mean (take a 3)))))" \
The command above is also an example of a calculation that is far beyond the design of the calc command and something
that could be done much more efficiently in Python.
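For comparison, the same calculation can be sketched in numpy. A random array stands in for the 3-band dataset named a (an assumption so the snippet runs without a file); in practice the array would come from reading the dataset with Rasterio.

```python
import numpy as np

# Stand-in for the 3-band dataset bound to the name "a".
rng = np.random.default_rng(0)
a = rng.integers(1, 256, size=(3, 4, 5)).astype("float64")

# Equivalent of (take a 1), (take a 2) and (take a 3).
band1, band2, band3 = a

# Scale bands 2 and 3 by the ratio of band 1's mean to their own means.
out = np.asarray([
    band1,
    band2 * (band1.mean() / band2.mean()),
    band3 * (band1.mean() / band3.mean()),
])
print(out.shape)
```

After scaling, all three output bands share the mean of the first band.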
4.5 clip
The clip command clips a raster using bounds input directly or from a template raster.
If using --bounds, values must be in the coordinate reference system of the input. If using --like, bounds will
automatically be transformed to match the coordinate reference system of the input.
It can also be combined to read bounds of a feature dataset using Fiona:
4.6 convert
The convert command copies and converts raster datasets to other data types and formats (similar to
gdal_translate).
Data values may be linearly scaled when copying by using the --scale-ratio and --scale-offset options.
Destination raster values are calculated as
For example, to scale uint16 data with an actual range of 0-4095 to 0-255 as uint8:
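In numpy terms, a linear scaling of this kind can be sketched as follows. The formula dst = scale_ratio * src + scale_offset is an assumption about how the two options combine, and rounding before the cast to uint8 is a choice made for this sketch rather than documented behavior of the command.

```python
import numpy as np

# A few sample uint16 values spanning the actual data range 0-4095.
src = np.array([0, 1024, 2048, 4095], dtype="uint16")

# Ratio that maps 4095 to 255; no offset is needed for this case.
scale_ratio = 255.0 / 4095.0
scale_offset = 0.0

# Assumed linear scaling, rounded and cast to the destination type.
dst = np.round(scale_ratio * src + scale_offset).astype("uint8")
print(dst)
```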
4.7 edit-info
The edit-info command allows you to edit a raster dataset’s metadata, namely
• coordinate reference system
• affine transformation matrix
• nodata value
• tags
• color interpretation
A TIFF created by spatially-unaware image processing software like Photoshop or ImageMagick can be turned into a
GeoTIFF by editing these metadata items.
For example, you can set or change a dataset’s coordinate reference system to Web Mercator (EPSG:3857) with the
--crs option, or set its affine transformation matrix,
$ rio edit-info --transform "[300.0, 0.0, 101985.0, 0.0, -300.0, 2826915.0]" example.tif
See rasterio.enums.ColorInterp for a full list of supported color interpretations and the color docs for more
information.
4.8 info
More information, such as band statistics, can be had using the --verbose option.
4.9 insp
In [2]: print(src.bounds)
BoundingBox(left=101985.0, bottom=2611485.0, right=339315.0, top=2826915.0)
4.10 mask
The mask command masks in pixels from all bands of a raster using features (masking out all areas not covered by
features) and optionally crops the output raster to the extent of the features. Features are assumed to be in the same
coordinate reference system as the input raster.
A common use case is masking in raster data by political or other boundaries.
GeoJSON features may be provided using stdin or specified directly as first argument, and output can be cropped to the
extent of the features.
The feature mask can be inverted to mask out pixels covered by features and keep pixels not covered by features.
4.11 merge
The merge command can be used to flatten a stack of identically structured datasets.
4.12 overview
The overview command creates overviews stored in the dataset, which can improve performance in some applications.
The decimation levels at which to build overviews can be specified as a comma separated list.
Note that overviews cannot currently be removed and are not automatically updated when the dataset’s primary bands
are modified.
Information about existing overviews can be printed using the --ls option.
The block size (tile width and height) used for overviews (internal or external) can be specified by setting the
GDAL_TIFF_OVR_BLOCKSIZE environment variable to a power-of-two value between 64 and 4096. The default value
is 128.
4.13 rasterize
The rasterize command rasterizes GeoJSON features into a new or existing raster.
The resulting file will have an upper left coordinate determined by the bounds of the GeoJSON (in EPSG:4326, which
is the default), with a pixel size of approximately 30 arc seconds. Pixels whose center is within the polygon or that are
selected by Bresenham’s line algorithm will be burned in with a default value of 1.
It is possible to rasterize into an existing raster and use an alternative default value:
It is also possible to rasterize using a template raster, which will be used to determine the transform, dimensions, and
coordinate reference system of the output raster:
GeoJSON features may be provided using stdin or specified directly as first argument, and dimensions may be provided
in place of pixel resolution:
4.14 rm
Invoking the shell’s rm <path> on a dataset deletes the file at that path, but it won’t delete the dataset’s sidecar
files. The rm command described here is aware of datasets and their sidecar files.
4.15 sample
The sample command reads x, y positions from stdin and writes the dataset values at those positions to stdout.
The output of the transform command (see below) makes good input for sample.
4.16 shapes
The shapes command extracts and writes features of a specified dataset band out as GeoJSON.
4.17 stack
The stack command stacks a number of bands from one or more input files into a multiband dataset. Input datasets
must be of a kind: same data type, dimensions, etc. The output is cloned from the first input. By default, stack will
take all bands from each input and write them in the same order to the output. Optionally, bands for each input may be
specified using the following syntax:
• --bidx N takes the Nth band from the input (first band is 1).
• --bidx M,N,O takes bands M, N, and O.
• --bidx M..O takes bands M-O, inclusive.
• --bidx ..N takes all bands up to and including N.
• --bidx N.. takes all bands from N to the end.
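The band selection syntax listed above can be made concrete with a small parser. This is a hypothetical helper written for illustration, not the parser used by rio; count stands for the number of bands in the input.

```python
def parse_bidx(spec, count):
    """Expand a --bidx style spec into a list of 1-based band indexes."""
    if ".." in spec:
        start, stop = spec.split("..")
        # An omitted start means band 1; an omitted stop means the last band.
        first = int(start) if start else 1
        last = int(stop) if stop else count
        return list(range(first, last + 1))
    # A single index or a comma separated list of indexes.
    return [int(part) for part in spec.split(",")]

# For a 3-band input:
print(parse_bidx("1..3", 3), parse_bidx("..2", 3), parse_bidx("2..", 3))
```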
Examples using the Rasterio testing dataset that produce a copy of it.
4.18 transform
The transform command reads a JSON array of coordinates, interleaved, and writes another array of transformed
coordinates to stdout.
To transform a longitude, latitude point (EPSG:4326 is the default) to another coordinate system with 2 decimal places
of output precision, do the following.
To transform a longitude, latitude bounding box to the coordinate system of a raster dataset, do the following.
4.19 warp
The warp command warps (reprojects) a raster based on parameters that can be obtained from a template raster, or
input directly. The output is always overwritten.
To copy coordinate reference system, transform, and dimensions from a template raster, do the following:
You can specify an output coordinate system using a PROJ.4 or EPSG:nnnn string, or a JSON text-encoded PROJ.4
object:
You can also specify dimensions, which will automatically calculate appropriate resolution based on the relationship
between the bounds in the target crs and these dimensions:
$ rio warp input.tif output.tif --dst-crs EPSG:4326 --bounds -78 22 -76 24 --res 0.1
$ rio warp input.tif output.tif --dst-crs EPSG:4326 --bounds -78 22 -76 24 --res 0.1 -- -0.1
Rio uses click-plugins to provide the ability to create additional subcommands using plugins developed outside
rasterio. This is ideal for commands that require additional dependencies beyond those used by rasterio, or that provide
functionality beyond the intended scope of rasterio.
For example, rio-mbtiles provides a command rio mbtiles to export a raster to an MBTiles file.
See click-plugins for more information on how to build these plugins in general.
To use these plugins with rio, add the commands to the rasterio.rio_plugins entry point in your setup.py file,
as described here and in rasterio/rio/main.py.
See the plugin registry for a list of available plugins.
FIVE
ADVANCED TOPICS
Simple raster data processing on the command line is possible using Rasterio’s rio-calc command. It uses the snuggs
Numpy S-expression engine. The snuggs README explains how expressions are written and evaluated in general.
This document explains Rasterio-specific details of rio-calc and offers some examples.
5.1.1 Expressions
In a rio-calc expression, func may be the name of any function in the module numpy or one of the rio-calc builtins:
read, fillnodata, or sieve; and operator may be any of the standard Python arithmetic or logical operators. The
arguments may themselves be expressions.
Here’s a trivial example of copying a dataset. The expression (read 1) evaluates to all bands of the first input dataset,
an array with shape (3, 718, 791) in this case.
Note: rio-calc’s indexes start at 1.
The expression (read i j) evaluates to the j-th band of the i-th input dataset. The asarray function collects bands
read in reverse order into an array with shape (3, 718, 791) for output.
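In numpy terms, collecting bands in reverse order looks roughly like the sketch below. A small array stands in for the 3-band input, and read_band is a hypothetical helper that mimics the 1-based band numbering of (read i j).

```python
import numpy as np

# Stand-in for the first input dataset: 3 bands of 2 x 2 pixels.
rgb = np.arange(3 * 2 * 2).reshape(3, 2, 2)

def read_band(j):
    """Return band j of the stand-in dataset; band numbers start at 1."""
    return rgb[j - 1]

# Collect the 2-D bands, last band first, into a 3-D array.
reversed_bands = np.asarray([read_band(3), read_band(2), read_band(1)])
print(reversed_bands.shape)
```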
Bands can be read from multiple input files. This example is another (slower) way to copy a file.
Datasets can be referenced in expressions by name and single bands picked out using the take function.
The functions read and take overlap a bit in the previous examples but are rather different. The former involves I/O
and the latter does not. You may also take from any array, as in this example.
Arithmetic operations can be performed as with Numpy. Here is an example of scaling and offsetting each of the three
bands of a dataset by different factors.
$ rio calc "(asarray (+ 2 (* 0.95 (read 1 1))) (+ 3 (* 0.9 (read 1 2))) (+ 4 (* 0.85 (read 1 3))))" tests/data/RGB.byte.tif out.tif
Logical operations can be used in conjunction with arithmetic operations. In this example, the output values are 255
wherever the input values are greater than or equal to 40.
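The numpy equivalent of such a thresholding expression is sketched below on a made-up 2 x 2 array: the comparison yields a boolean array, and multiplying by 255 turns True into 255 and False into 0.

```python
import numpy as np

# Made-up input values on both sides of the threshold.
arr = np.array([[10, 40], [39, 200]], dtype="uint8")

# Boolean mask times 255: 255 where arr >= 40, 0 elsewhere.
out = 255 * (arr >= 40)
print(out)
```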
5.2 Color
GDAL builds the color interpretation based on the driver and creation options. With the GTiff driver, rasters with
exactly 3 bands of uint8 type will be RGB, 4 bands of uint8 will be RGBA by default.
Color interpretation can be set when creating a new datasource with the photometric creation option:
Mappings from 8-bit (rasterio.uint8) pixel values to RGBA values can be attached to bands using the
write_colormap() method.
import rasterio
with rasterio.Env():
subprocess.call(['open', '/tmp/colormap.tif'])
The program above (on OS X, another viewer is needed with a different OS) yields the image below:
As shown above, the colormap() returns a dict holding the colormap for the given band index. For TIFF format files,
the colormap will have 256 items, and all but two of those would map to (0, 0, 0, 0) in the example above.
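The shape of such a colormap can be illustrated with a plain dict. This is a minimal sketch, assuming a hypothetical colormap in which only two entries were set (the indexes 0 and 255 and their colors are illustrative assumptions); it does not call Rasterio.

```python
# Hypothetical entries written to the band's colormap; the indexes and
# colors are illustrative assumptions.
written = {0: (255, 0, 0, 255), 255: (0, 0, 255, 255)}

# An 8-bit colormap has one RGBA entry per possible pixel value. Entries
# that were never set map to (0, 0, 0, 0) in this sketch.
colormap = {i: written.get(i, (0, 0, 0, 0)) for i in range(256)}

unset = [i for i, rgba in colormap.items() if rgba == (0, 0, 0, 0)]
print(len(colormap), len(unset))
```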
5.3 Concurrent processing
Rasterio affords concurrent processing of raster data. Python’s global interpreter lock (GIL) is released when calling
GDAL’s GDALRasterIO() function, which means that Python threads can read and write concurrently.
The Numpy library also often releases the GIL, e.g., in applying universal functions to arrays, and this makes it possible
to distribute processing of an array across cores of a processor.
This means that it is possible to parallelize tasks that need to be performed for a set of windows/pixels in the raster.
Reading, writing and processing can always be done concurrently, but how much of a speedup can be obtained depends on
the hardware and on where the bottlenecks are. In the case that the processing function releases the GIL,
multiple threads processing simultaneously can lead to further speedups.
Note: If you wish to do multiprocessing that is not trivially parallelizable across very large images that do not fit in
memory, or if you wish to do multiprocessing across multiple machines, you might want to have a look at dask and in
particular this example.
The Cython function below, included in Rasterio’s _example module, simulates a GIL-releasing CPU-intensive raster
processing function. You can also easily create GIL-releasing functions by using numba.
# cython: boundscheck=False
import numpy as np
Here is the program in examples/thread_pool_executor.py. It is set up in such a way that at most 1 thread is reading and
at most 1 thread is writing at the same time. Processing is not protected by a lock and can be done by multiple threads
simultaneously.
"""thread_pool_executor.py
import concurrent.futures
import multiprocessing
import threading
import rasterio
from rasterio._example import compute
The output is the same as the input, but with band order
reversed.
"""
def process(window):
with read_lock:
with write_lock:
dst.write(result, window=window)
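The code above simulates a CPU-intensive calculation that runs faster when spread over multiple cores using
concurrent.futures.ThreadPoolExecutor than in the case of one concurrent job (-j 1). In the timings below, the first
block is the single-job run and the second is a run with several concurrent jobs: wall-clock (real) time drops while
CPU (user) time stays about the same.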
real 0m4.277s
user 0m4.356s
sys 0m0.184s
real 0m1.251s
user 0m4.402s
sys 0m0.168s
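The locking pattern described above (one reader at a time, one writer at a time, unprotected processing) can be sketched without Rasterio. In this minimal sketch the lists source and destination, the row-range windows, and the compute function are all hypothetical stand-ins for a dataset reader, a dataset writer, block windows, and a GIL-releasing function.

```python
import concurrent.futures
import threading

# Hypothetical stand-ins for a source and destination dataset: the "raster"
# is a list of rows and each "window" is a range of row indexes.
source = [[row * 10 + col for col in range(4)] for row in range(8)]
destination = [None] * len(source)
windows = [range(0, 4), range(4, 8)]

read_lock = threading.Lock()
write_lock = threading.Lock()


def compute(rows):
    """Placeholder for a CPU-intensive function; reverses each row."""
    return [list(reversed(row)) for row in rows]


def process(window):
    # At most one thread reads at a time.
    with read_lock:
        rows = [source[i] for i in window]

    # The computation itself is not protected by a lock.
    result = compute(rows)

    # At most one thread writes at a time.
    with write_lock:
        for i, row in zip(window, result):
            destination[i] = row


with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    # Consume the iterator so exceptions raised in workers surface here.
    list(executor.map(process, windows))
```

Because this placeholder compute is pure Python and holds the GIL, the sketch shows only the structure of the program, not the speedup reported in the timings.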
If the function that you’d like to map over raster windows doesn’t release the GIL, you unfortunately cannot simply
replace ThreadPoolExecutor with ProcessPoolExecutor. The DatasetReader/DatasetWriter cannot be shared
by multiple processes, which means that either each process needs to open the file separately or you do all the reading
and writing from the main thread, as shown in this next example. The latter is much less efficient memory-wise, however.
with concurrent.futures.ProcessPoolExecutor(
    max_workers=num_workers
) as executor:
    futures = executor.map(compute, arrays)
    for window, result in zip(windows, futures):
        dst.write(result, window=window)
5.4 GDAL Option Configuration
GDAL format drivers and some parts of the library are configurable.
From https://trac.osgeo.org/gdal/wiki/ConfigOptions:
ConfigOptions are normally used to alter the default behavior of GDAL and OGR drivers and in some
cases the GDAL and OGR core. They are essentially global variables the user can set.
gdal.SetConfigOption('GTIFF_FORCE_RGBA', 'YES')
ds = gdal.Open('data/stefan_full_greyalpha.tif')
gdal.SetConfigOption('GTIFF_FORCE_RGBA', None)
With GDAL’s C or Python API, you call a function once to set a global configuration option before you need it and
once again after you’re through to unset it.
Downsides of this style of configuration include:
• Options can be configured far from the code they affect.
• There is no API for finding what options are currently set.
• If gdal.Open() raises an exception in the code above, the GTIFF_FORCE_RGBA option will not be unset.
That code example can be generalized to multiple options and made to recover better from errors.
This is better, but has a lot of boilerplate. Rasterio uses elements of Python syntax, keyword arguments and the with
statement, to make this cleaner and easier to use.
5.4.2 Rasterio
The object returned when you call rasterio.Env is a context manager. It handles the GDAL configuration for a
specific block of code and resets the configuration when the block exits for any reason, success or failure. The Rasterio
with rasterio.Env() pattern organizes GDAL configuration into single statements and makes its relationship to a
block of code clear.
If you want to know what options are configured at any time, you could bind it to a name like so.
# Prints:
# ('GTIFF_FORCE_RGBA', True)
# ('CPL_DEBUG', True)
Rasterio code is often written without the use of an Env context block. For instance, you could use rasterio.open() directly
without explicitly creating an Env. In that case, the open() function will initialize a default environment in which to
execute the code. This default environment is sufficient for most use cases, and you only need to create an explicit
Env if you are customizing the default GDAL or format options.
5.5 Advanced Datasets
The analogy of Python file objects influences the design of Rasterio dataset objects. Datasets of a few different kinds
exist and the canonical way to obtain one is to call rasterio.open() with a path-like object or URI-like identifier, a
mode (such as “r” or “w”), and other keyword arguments.
Datasets in a computer’s filesystem are identified by paths, “file” URLs, or instances of pathlib.Path. The following
are equivalent.
• '/path/to/file.tif'
• 'file:///path/to/file.tif'
• pathlib.Path('/path/to/file.tif')
Datasets within a local zip file are identified using the “zip” scheme from Apache Commons VFS.
• 'zip:///path/to/file.zip!/folder/file.tif'
• 'zip+file:///path/to/file.zip!/folder/file.tif'
Note that ! is the separator between the path of the archive file and the path within the archive file. Also note that this
kind of identifier can’t be expressed using pathlib.
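The role of the ! separator can be made concrete with a few lines of string handling. This is a minimal sketch; split_archive_identifier is a hypothetical helper written for illustration and is not Rasterio's own path parser.

```python
def split_archive_identifier(identifier):
    """Split 'scheme://archive!inner' into (scheme, archive, inner)."""
    scheme, _, rest = identifier.partition("://")
    # Everything before "!" is the archive, everything after is the
    # path within the archive.
    archive, _, inner = rest.partition("!")
    return scheme, archive, inner


parts = split_archive_identifier("zip:///path/to/file.zip!/folder/file.tif")
print(parts)
```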
Similarly, variables of a netCDF dataset can be accessed using “netcdf” scheme identifiers.
'netcdf:/path/to/file.nc:variable'
Datasets on the web are identified by “http” or “https” URLs such as
• 'https://example.com/file.tif'
• 'https://landsat-pds.s3.amazonaws.com/L8/139/045/LC81390452014295LGN00/
LC81390452014295LGN00_B1.TIF'
Datasets within a zip file on the web are identified using a “zip+https” scheme and paths separated by ! as above. For
example:
'zip+https://example.com/file.tif&p=x&q=y!/folder/file.tif'
Todo: error enums, context managers, converting GDAL errors to python exceptions
Note: If setting the PROJ_DEBUG environment variable inside a Python script, make sure that it is set
before importing rasterio.
import os
os.environ["PROJ_DEBUG"] = "2"
import rasterio
with rasterio.Env(CPL_DEBUG=True):
...
import logging
console_handler = logging.StreamHandler()
formatter = logging.Formatter("%(levelname)s:%(message)s")
console_handler.setFormatter(formatter)
logger = logging.getLogger("rasterio")
logger.addHandler(console_handler)
logger.setLevel(logging.DEBUG)
import logging
logging.basicConfig(format="%(levelname)s:%(message)s", level=logging.DEBUG)
Rasterio’s features module provides functions to extract shapes of raster features and to create new features by “burn-
ing” shapes into rasters: shapes() and rasterize(). These functions expose GDAL functions in a general way, using
iterators over GeoJSON-like Python objects instead of GIS layers.
import pprint
import rasterio
from rasterio import features
# shapes() requires that the image dtype be one of: int16, int32, uint8, uint16,
# float32.
# If your data comes as int8 you can cast your data to an appropriate dtype like this:
# data = data.astype('int16')
# Output
# pprint.pprint(next(shapes))
# ({'coordinates': [[(71.0, 6.0),
# (71.0, 7.0),
# (72.0, 7.0),
# (72.0, 6.0),
# (71.0, 6.0)]],
# 'type': 'Polygon'},
# 253)
The shapes iterator yields geometry, value pairs. The second item is the value of the raster feature corresponding
to the shape and the first is its geometry. The coordinates of the geometries in this case are in pixel units with origin at
the upper left of the image. If the source dataset was georeferenced, you would get similarly georeferenced geometries
like this:
To go the other direction, use rasterize() to burn values into the pixels intersecting with geometries.
image = features.rasterize(
    ((g, 255) for g, v in shapes),
    out_shape=src.shape)
By default, only pixels whose center is within the polygon or that are selected by Bresenham’s line algorithm will be
burned in. You can specify all_touched=True to burn in all pixels touched by the geometry. The geometries will
be rasterized by the “painter’s algorithm” - geometries are handled in order and later geometries will overwrite earlier
values.
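The pixel-center rule and the painter's algorithm can be illustrated in pure Python. This is a simplified sketch under stated assumptions: it handles only axis-aligned rectangles rather than arbitrary polygons, and burn_rectangles is a hypothetical helper, not the algorithm GDAL uses internally.

```python
def burn_rectangles(rectangles, out_shape, fill=0):
    """Burn (left, top, right, bottom, value) boxes into a grid, in order.

    Coordinates are in pixel units with the origin at the upper left.
    """
    height, width = out_shape
    image = [[fill] * width for _ in range(height)]
    for left, top, right, bottom, value in rectangles:
        for row in range(height):
            for col in range(width):
                # Test the pixel center, not its corners.
                x, y = col + 0.5, row + 0.5
                if left <= x <= right and top <= y <= bottom:
                    # Later rectangles overwrite earlier values.
                    image[row][col] = value
    return image


image = burn_rectangles(
    [(0, 0, 3, 3, 100), (2, 2, 4, 4, 255)],
    out_shape=(4, 4),
)
for row in image:
    print(row)
```

Where the two rectangles overlap, the second value wins, and cells covered by neither keep the fill value.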
Again, to burn in georeferenced shapes, pass an appropriate transform for the image to be created.
image = features.rasterize(
    ((g, 255) for g, v in shapes),
    out_shape=src.shape,
    transform=src.transform)
The values for the input shapes are replaced with 255 in a generator expression. Areas not covered by input geometries
are replaced with an optional fill value, which defaults to 0. The resulting image, written to disk like this,
with rasterio.open(
        '/tmp/rasterized-results.tif', 'w',
        driver='GTiff',
        dtype=rasterio.uint8,
        count=1,
        width=src.width,
        height=src.height) as dst:
    dst.write(image, indexes=1)
Todo: fillnodata()
5.9 Georeferencing
There are two parts to the georeferencing of raster datasets: the definition of the local, regional, or global system in
which a raster’s pixels are located; and the parameters by which pixel coordinates are transformed into coordinates in
that system.
The coordinate reference system of a dataset is accessed from its DatasetReader.crs attribute.
Rasterio follows pyproj and uses PROJ.4 syntax in dict form as its native CRS syntax. If you want a WKT representation
of the CRS, see: CRS.to_wkt():
>>> src.crs.to_wkt()
'PROJCS["WGS 84 / UTM zone 18N",GEOGCS["WGS 84",DATUM["WGS_1984",SPHEROID["WGS 84",6378137,298.257223563,AUTHORITY["EPSG","7030"]],AUTHORITY["EPSG","6326"]],PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],AUTHORITY["EPSG","4326"]],PROJECTION["Transverse_Mercator"],PARAMETER["latitude_of_origin",0],PARAMETER["central_meridian",-75],PARAMETER["scale_factor",0.9996],PARAMETER["false_easting",500000],PARAMETER["false_northing",0],UNIT["metre",1,AUTHORITY["EPSG","9001"]],AXIS["Easting",EAST],AXIS["Northing",NORTH],AUTHORITY["EPSG","32618"]]'
When opening a new file for writing, you may also use a CRS string as an argument.
>>> profile = {'driver': 'GTiff', 'height': 100, 'width': 100, 'count': 1, 'dtype': rasterio.uint8}
This section describes the three primary kinds of georeferencing metadata supported by rasterio.
Affine
A dataset’s pixel coordinate system has its origin at the “upper left” (imagine it displayed on your screen). Column index
increases to the right, and row index increases downward. The mapping of these coordinates to “world” coordinates in
the dataset’s reference system is typically done with an affine transformation matrix.
>>> src.transform
Affine(300.0379266750948, 0.0, 101985.0,
0.0, -300.041782729805, 2826915.0)
The Affine object is a named tuple with elements a, b, c, d, e, f corresponding to the elements in the matrix
equation below, in which a pixel’s image coordinates are x, y and its world coordinates are x', y':
| x' | | a b c | | x |
| y' | = | d e f | | y |
| 1 | | 0 0 1 | | 1 |
The Affine class has some useful properties and methods described at https://github.com/sgillies/affine.
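The matrix equation above can be worked through by hand. This is a minimal sketch in plain Python using the coefficients from the src.transform shown earlier; pixel_to_world is a hypothetical helper, and x is taken as the column index and y as the row index.

```python
# Coefficients copied from the src.transform shown above.
a, b, c = 300.0379266750948, 0.0, 101985.0
d, e, f = 0.0, -300.041782729805, 2826915.0


def pixel_to_world(x, y):
    """Apply the affine matrix to image coordinates (column x, row y)."""
    return (a * x + b * y + c, d * x + e * y + f)


upper_left = pixel_to_world(0, 0)
one_pixel_in = pixel_to_world(1, 1)
print(upper_left, one_pixel_in)
```

Because e is negative, world y decreases as the row index increases, which matches a pixel origin at the upper left.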
Some datasets may not have an affine transformation matrix, but are still georeferenced.
A ground control point (GCP) is the mapping of a dataset’s row and pixel coordinate to a single world x, y, and
optionally z coordinate. Typically a dataset will have multiple GCPs distributed across the image. Rasterio can calculate
an affine transformation matrix from a collection of GCPs using the rasterio.transform.from_gcps() method.
Alternatively GCP interpolation can also be used for coordinate transforms.
A dataset may also be georeferenced with a set of rational polynomial coefficients (RPCs) which can be used to compute
pixel coordinates from x, y, and z coordinates. The RPCs are an application of the Rigorous Projection Model which
uses four sets of 20 term cubic polynomials and several normalizing parameters to establish a relationship between
image and world coordinates. RPCs are defined with image coordinates in pixel units and world coordinates in decimal
degrees of longitude and latitude and height above the WGS84 ellipsoid (EPSG:4326).
RPCs are usually provided by the dataset provider and are only well behaved over the extent of the image. Additionally,
accurate height values are required for the best results. Datasets with low terrain variation may use an
average height over the extent of the image, while datasets with higher terrain variation should use a digital elevation
model to sample height values. The coordinate transformation from world to pixel coordinates is exact while the
reverse is not, and must be computed iteratively. For more details on coordinate transformations using RPCs see
GDALCreateRPCTransformerV2().
5.10 Options
GDAL’s format drivers have many configuration options. These options come in two flavors:
• Configuration options are used to alter the default behavior of GDAL and OGR and are generally treated as
global environment variables by GDAL. These are set through a rasterio.Env context block in Python.
• Creation options are passed into the driver at dataset creation time as keyword arguments to rasterio.
open(mode='w').
GDAL options are typically set as environment variables. While environment variables will influence the behavior of
rasterio, we highly recommend avoiding them in favor of defining behavior programmatically.
The preferred way to set options for rasterio is via rasterio.Env. Options set on entering the context are deleted on
exit.
import rasterio
with rasterio.Env(GDAL_TIFF_INTERNAL_MASK=True):
    # GeoTIFFs written here will have internal masks, not the
    # .msk sidecars.
    ...
Use native Python forms (True and False) for boolean options. Rasterio will convert them to GDAL’s internal forms.
See the configuration options page for a complete list of available options.
Each format has its own set of driver-specific creation options that can be used to fine tune the output rasters. For
details on a particular driver, see the formats list.
For the purposes of this document, we will focus on the GeoTIFF creation options. Some of the common GeoTIFF
creation options include:
• TILED, BLOCKXSIZE, and BLOCKYSIZE to define the internal tiling
• COMPRESS to define the compression method
• PHOTOMETRIC to define the band’s color interpretation
To specify these creation options in python code, you pass them as keyword arguments to the rasterio.open()
command in write mode.
Note: The GeoTIFF format requires that blockxsize and blockysize be multiples of 16.
On the command line, rio commands will accept multiple --co options.
Attention: Some options may at a glance appear to be boolean, but are not. The GeoTIFF format’s BIGTIFF
option is one of these. The value must be YES, NO, IF_NEEDED, or IF_SAFER.
Note: Some configuration options also have an effect on driver behavior at creation time.
5.11 Interoperability
Some python image processing software packages organize arrays differently than rasterio. The interpretation of a
3-dimension array read from rasterio is:
while image processing software like scikit-image, pillow and matplotlib are generally ordered:
The number of rows defines the dataset’s height, the columns are the dataset’s width.
Numpy provides a way to efficiently swap the axis order and you can use the following reshape functions to convert
between raster and image axis order:
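The axis swap itself can be shown with Numpy alone. This is a minimal sketch on an empty array with the dimensions of the RGB.byte.tif test dataset; it uses numpy.transpose directly rather than any Rasterio helper, and the variable names are illustrative.

```python
import numpy as np

# Hypothetical array in Rasterio's band-first order: (bands, rows, columns).
raster = np.zeros((3, 718, 791), dtype=np.uint8)

# Move the band axis to the end for image-style order: (rows, columns, bands).
image = np.transpose(raster, (1, 2, 0))

# And back again to band-first order.
restored = np.transpose(image, (2, 0, 1))

print(image.shape, restored.shape)
```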
Using rasterio with fiona, it is simple to open a shapefile, read geometries, and mask out regions of a raster that
are outside the polygons defined in the shapefile.
import fiona
import rasterio
import rasterio.mask
This shapefile contains a single polygon, a box near the center of the raster, so in this case, our list of features is one
element long.
Using matplotlib.pyplot.plot() and matplotlib.pyplot.imshow(), we can see the region defined by the
shapefile in red overlaid on the original raster.
Applying the features in the shapefile as a mask on the raster sets all pixels outside of the features to be zero. Since
crop=True in this example, the extent of the raster is also set to be the extent of the features in the shapefile. We can
then use the updated spatial transform and raster height and width to write the masked raster to a new file.
out_meta.update({"driver": "GTiff",
                 "height": out_image.shape[1],
                 "width": out_image.shape[2],
                 "transform": out_transform})
Nodata masks allow you to identify regions of valid data values. In using Rasterio, you’ll encounter two different kinds
of masks.
One is the valid data mask from GDAL, an unsigned byte array with the same number of rows and columns as
the dataset in which non-zero elements (typically 255) indicate that the corresponding data elements are valid. Other
elements are invalid, or nodata elements.
The other kind of mask is a numpy.ma.MaskedArray which has the inverse sense: True values in a masked array’s
mask indicate that the corresponding data elements are invalid. With care, you can safely convert between the
two mask types.
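The inverse relationship between the two kinds of mask can be sketched with Numpy alone. This is a minimal illustration on a tiny made-up array (gdal_mask and its values are assumptions, not data read from a file).

```python
import numpy as np

# Hypothetical GDAL-style valid data mask: 255 marks valid, 0 marks nodata.
gdal_mask = np.array([[0, 255], [255, 255]], dtype=np.uint8)

# A masked array's mask has the inverse sense: True marks invalid elements.
ma_mask = gdal_mask == 0

# Converting back: invert, then scale True/False to 255/0.
round_trip = (~ma_mask * 255).astype(np.uint8)

print(ma_mask)
print(round_trip)
```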
Consider Rasterio’s RGB.byte.tif test dataset. It has 718 rows and 791 columns of pixels. Each pixel has 3 8-bit (uint8)
channels or bands. It has a trapezoid of image data within a rectangular background of 0,0,0 value pixels.
Metadata in the dataset declares that values of 0 will be interpreted as invalid data or nodata pixels. In, e.g., merging
the image with adjacent scenes, we’d like to ignore the nodata pixels and have only valid image data in our final mosaic.
Let’s look at the two kinds of masks and their inverse relationship in the context of RGB.byte.tif.
For every band of a dataset there is a mask. These masks can be had as arrays using the dataset’s read_masks()
method. Below, msk is the valid data mask corresponding to the first dataset band.
This 2D array is a valid data mask in the sense of GDAL RFC 15. The 0 values in its corners represent nodata regions.
Zooming in on the interior of the mask array shows the 255 values that indicate valid data regions.
>>> msk[200:205,200:205]
array([[255, 255, 255, 255, 255],
[255, 255, 255, 255, 255],
[255, 255, 255, 255, 255],
[255, 255, 255, 255, 255],
[255, 255, 255, 255, 255]], dtype=uint8)
Wait, what are these 0 values in the mask interior? This is an example of a problem inherent in 8-bit raster data: lack
of dynamic range. The dataset creator has said that 0 values represent missing data (see the nodatavals property
in the first code block of this document), but some of the valid data have values so low they’ve been rounded during
processing to zero. This can happen in scaling 16-bit data to 8 bits. There’s no magic nodata value bullet for this.
Using 16 bits per band helps, but you really have to be careful with 8-bit per band datasets and their nodata values.
Writing a mask that applies to all dataset bands is just as straightforward: pass an ndarray with True (or values that
evaluate to True) to indicate valid data and False to indicate no data to write_mask(). Consider a copy of the test
data opened in “r+” (update) mode.
To mark that all pixels of all bands are valid (i.e., to override nodata metadata values that can’t be unset), you’d do this.
>>> src.write_mask(True)
>>> src.read_masks(1).all()
True
No data have been altered, nor have the dataset’s nodata values been changed. A new band has been added to the dataset
to store the valid data mask. By default it is saved to a “sidecar” GeoTIFF alongside the dataset file. When such a .msk
GeoTIFF exists, Rasterio will ignore the nodata metadata values and return mask arrays based on the .msk file.
$ ls -l copy.tif*
-rw-r--r--@ 1 sean staff 1713704 Mar 24 14:19 copy.tif
-rw-r--r-- 1 sean staff 916 Mar 24 14:25 copy.tif.msk
Can Rasterio help fix buggy nodata masks like the ones in RGB.byte.tif? It certainly can. Consider a fresh copy of that
file.
>>> src.close()
>>> tmp = shutil.copy("tests/data/RGB.byte.tif", "/tmp/RGB.byte.tif")
>>> src = rasterio.open(tmp, mode="r+")
This time we’ll read all 3 band masks (based on the nodata values, not a .msk GeoTIFF) and show them as an RGB
image (with the help of numpy.dstack()):
Colored regions appear where valid data pixels don’t quite coincide. This is, again, an artifact of scaling data down to
8 bits per band. We’ll begin by constructing a new mask array from the logical conjunction of the three band masks
we’ve read.
Now we’ll use sieve() to shake out the small buggy regions of the mask. I’ve found the right value for the size
argument empirically.
>>> src.write_mask(sieved_msk)
>>> src.close()
The result is a properly masked dataset that allows some 0 value pixels to be considered valid.
As mentioned earlier, this mask is the inverse of the GDAL band mask. To get a mask conforming to GDAL RFC 15,
do this:
You can rely on this Rasterio identity for any integer value N.
>>> N = 1
>>> (~src.read(N, masked=True).mask * 255 == src.read_masks(N)).all()
True
Sometimes a per-band mask is not appropriate. In this case you can either construct a mask out of the component bands
(or other auxiliary data) manually or use the Rasterio dataset’s dataset_mask() function. This returns a 2D array
with a GDAL-style mask determined by the following criteria, in order of precedence:
1. If a .msk file, dataset-wide alpha or internal mask exists, it will be used as the dataset mask.
2. If the dataset is a 4-band RGBA with a shadow nodata value, band 4 will be used as the dataset mask.
3. If a nodata value exists, use the binary OR (|) of the band masks
4. If no nodata value exists, return a mask filled with all valid data (255)
Note that this differs from read_masks and GDAL RFC15 in that it applies per-dataset, not per-band.
The storage and representation of nodata differs depending on the data format and configuration options. While Rasterio
provides an abstraction for those details when reading, it’s often important to understand the differences when creating,
manipulating and writing raster data.
• Nodata values: the nodata value is used to define which pixels should be masked.
• Alpha band: with RGB imagery, an additional 4th band (containing a GDAL-style 8-bit mask) is sometimes
provided to explicitly define the mask.
• Internal mask band: GDAL provides the ability to store an additional boolean 1-bit mask that is stored internally
to the dataset. This option relies on a GDAL environment with GDAL_TIFF_INTERNAL_MASK=True. Otherwise
the mask will be written externally.
• External mask band: Same as above but the mask band is stored in a sidecar .msk file (default).
Other sections of this documentation have explained how Rasterio can access data stored in existing files on disk written
by other programs or write files to be used by other GIS programs. Filenames have been the typical inputs and files on
disk have been the typical outputs.
There are different options for Python programs that have streams of bytes, e.g., from a network socket, as their input
or output instead of filenames. One is the use of a temporary file on disk.
import tempfile
The MemoryFile class behaves a bit like BytesIO and NamedTemporaryFile(). A GeoTIFF file in a sequence of
data bytes can be opened in memory as shown below.
This code can be several times faster than the code using NamedTemporaryFile() at roughly double the price in
memory.
These two modes are incompatible: a MemoryFile initialized with a sequence of bytes cannot be extended.
An empty MemoryFile can also be written to using dataset API methods.
Like BytesIO, MemoryFile implements the Python file protocol and provides read(), seek(), and tell() methods.
Instances are thus suitable as arguments for methods like requests.post().
requests.post('https://example.com/upload', data=memfile)
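The file protocol methods named above behave as they do on io.BytesIO. This is a minimal sketch that uses BytesIO as a stand-in for MemoryFile; the bytes are arbitrary and are not a valid GeoTIFF.

```python
import io

# BytesIO stands in for MemoryFile here; the bytes are arbitrary.
buffer = io.BytesIO(b"example bytes")

first = buffer.read(7)      # read advances the position
position = buffer.tell()    # current offset into the stream
buffer.seek(0)              # rewind to the start
everything = buffer.read()

print(first, position, everything)
```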
One of the biggest API changes on the road to Rasterio 1.0 is the full deprecation of GDAL-style geotransforms in favor
of the affine library. For reference, an affine.Affine() looks like:
affine.Affine(a, b, c,
d, e, f)
and a GDAL geotransform looks like:
(c, a, b, f, d, e)
Fundamentally these two constructs provide the same information, but the Affine() object is more useful.
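The reordering between the two forms can be written out directly. This is a minimal sketch with plain tuples; affine_to_gdal and gdal_to_affine are hypothetical helper names used for illustration, and the coefficient values are made up.

```python
def affine_to_gdal(coefficients):
    """Reorder (a, b, c, d, e, f) into GDAL's (c, a, b, f, d, e)."""
    a, b, c, d, e, f = coefficients
    return (c, a, b, f, d, e)


def gdal_to_affine(geotransform):
    """Reorder GDAL's (c, a, b, f, d, e) into (a, b, c, d, e, f)."""
    c, a, b, f, d, e = geotransform
    return (a, b, c, d, e, f)


# Illustrative coefficients, not taken from a real dataset.
coefficients = (300.0, 0.0, 101985.0, 0.0, -300.0, 2826915.0)
geotransform = affine_to_gdal(coefficients)
print(geotransform)
```

Since both forms are 6-element tuples of numbers, passing one where the other is expected cannot be detected from the shape of the data alone.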
Here’s a history of this feature:
1. Originally, functions with a transform argument expected a GDAL geotransform.
2. The introduction of the affine library involved creating a temporary affine argument for rasterio.open() and
a src.affine property. Users could pass an Affine() to affine or transform, but a GDAL geotransform
passed to transform would issue a deprecation warning.
3. src.transform remained a GDAL geotransform, but issued a warning. Users were pointed to src.affine
during the transition phase.
4. Since the above changes, several functions have been added to Rasterio that accept a transform argument.
Rather than add an affine argument to each, the transform argument could be either an Affine() object or
a GDAL geotransform, the latter issuing the same deprecation warning.
The original plan was to remove the affine argument + property, and assume that the object passed to transform is
an Affine(). However, after further discussion it was determined that since Affine() and GDAL geotransforms are
both 6 element tuples users may experience unexplained errors and outputs, so an exception is raised instead to better
highlight the error.
Before 1.0b1:
• rasterio.open() will still accept affine and transform, but the former now issues a deprecation warning
and the latter raises an exception if it does not receive an Affine().
• If rasterio.open() receives both affine and transform a warning is issued and transform is used.
• src.affine remains but issues a deprecation warning.
• src.transform returns an Affine().
• All other Rasterio functions with a transform argument now raise an exception if they receive a GDAL geo-
transform.
Tickets
I/O Operations
Methods related to reading band data and dataset masks have changed in 1.0.
Beginning with version 1.0b1, there is no longer a read_mask method, only read_masks. Datasets may be opened in
read-write “w+” mode when their formats allow and a warning will be raised when band data or masks are read from
datasets opened in “w” mode.
Beginning with 1.0.0, the “w” mode will become write-only and reading data or masks from datasets opened in “w”
will be prohibited.
Previously users could register GDAL’s drivers and open a datasource with:
import rasterio
with rasterio.drivers():
but Rasterio 1.0 contains more interactions with GDAL’s environment, so rasterio.drivers() has been replaced
with:
import rasterio
import rasterio.env
with rasterio.Env():
Tickets
Removed: src.read_band()
The read_band() method has been replaced by read(), which allows for faster I/O and reading multiple bands into
a single numpy.ndarray.
For example:
import numpy as np
import rasterio
is now:
import rasterio
Tickets
• #83 - Introduction of src.read().
• #96, #284 - Deprecation of src.read_band().
Removed: src.read_mask()
The src.read_mask() method produced a single mask for the entire datasource, but could not handle producing a
single mask per band, so it was deprecated in favor of src.read_masks(), although it has no direct replacement.
Tickets
Several functions in the top level rasterio namespace for working with dataset windows have been moved to
rasterio.windows.*:
• rasterio.get_data_window()
• rasterio.window_union()
• rasterio.window_intersection()
• rasterio.windows_intersect()
Tickets
This module has been removed completely and its contents have been moved to several different locations:
Tickets
This module has been removed completely and its contents have been moved to several different locations:
Tickets
For both rasterio.features.sieve() and rasterio.features.rasterize() the output argument has been
replaced with out. Previously the use of output issued a deprecation warning.
Methods get_crs, set_crs, set_nodatavals, set_descriptions, set_units, and set_gcps are deprecated and will be removed
in version 1.0. They have been replaced by fully settable dataset properties crs, nodatavals, descriptions, units, and
gcps.
In the cases of units and descriptions, set_band_unit and set_band_description methods remain to support the rio-edit-
info command.
Rasterio no longer saves dataset creation options to the metadata of created datasets and will ignore such metadata
starting in version 1.0. Users may opt in to this by setting RIO_IGNORE_CREATION_KWDS=TRUE in their envi-
ronments.
5.16 Overviews
Overviews are reduced resolution versions of your dataset that can speed up rendering when you don’t need full
resolution. By precomputing the downsampled pixels, rendering can be significantly faster when zoomed out.
Overviews can be stored internally or externally, depending on the file format.
In some cases we may want to make a copy of the test data to avoid altering the original.
We must specify the zoom factors for which to build overviews. Commonly these are powers of 2.
To control the visual quality of the overviews, the ‘nearest’, ‘cubic’, ‘average’, ‘mode’, and ‘gauss’ resampling
algorithms are available. These are available through the Resampling enum.
Creating overviews requires opening a dataset in r+ mode, which gives us access to update the data in place. By
convention we also add a tag in the rio_overview namespace so that readers can determine what resampling method
was used.
We can read the updated dataset and confirm that the overviews are present.
And to leverage the overviews, we can perform a decimated read at a reduced resolution which should allow libgdal to
read directly from the overviews rather than compute them on-the-fly.
>>> src.read().shape
(3, 718, 791)
>>> src.read(out_shape=(3, int(src.height / 4), int(src.width / 4))).shape
(3, 179, 197)
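The out_shape arithmetic used in the decimated read can be checked without opening a dataset. This is a minimal sketch; decimated_shape is a hypothetical helper that repeats the int(height / factor) calculation from the session above for a few factors.

```python
def decimated_shape(count, height, width, factor):
    """Return the out_shape for reading at 1/factor resolution."""
    return (count, int(height / factor), int(width / factor))


# Band count, height, and width of the dataset read above.
for factor in (2, 4, 8, 16):
    print(factor, decimated_shape(3, 718, 791, factor))
```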
5.17 Plotting
Rasterio reads raster data into numpy arrays so plotting a single band as two dimensional data can be accomplished
directly with pyplot.
Rasterio also provides rasterio.plot.show() to perform common tasks such as displaying multi-band images as
RGB and labeling the axes with proper geo-referenced extents.
The first argument to show() represents the data source to be plotted. This can be one of
• A dataset object opened in ‘r’ mode
• A single band of a source, represented by a (src, band_index) tuple
• A numpy.ndarray, 2D or 3D. If the array is 3D, ensure that it is in rasterio band order.
Thus the following operations for 3-band RGB data are equivalent. Note that when passing arrays, you can pass in a
transform in order to get extent labels.
and similarly for single band plots. Note that you can pass in cmap to specify a matplotlib color ramp. Any kwargs
passed to show() will be passed through to the underlying pyplot functions.
You can create a figure with multiple subplots by passing an axes object through the ax argument, as in show(..., ax=ax1).
The overall figure size and a title for each subplot can be set in the same way.
Rasterio also provides a show_hist() function for generating histograms of single or multiband rasters:
The rasterio.profiles module contains an example of a named profile that may be useful in applications:
class DefaultGTiffProfile(Profile):
    """Tiled, band-interleaved, LZW-compressed, 8-bit GTiff."""

    defaults = {
        'driver': 'GTiff',
        'interleave': 'band',
        'tiled': True,
        'blockxsize': 256,
        'blockysize': 256,
        'compress': 'lzw',
        'nodata': 0,
        'dtype': uint8
    }
It can be used to create new datasets. Note that it doesn’t count bands and that a count keyword argument needs to be
passed when creating a profile.
with rasterio.open(
        'output.tif', 'w', **DefaultGTiffProfile(count=3)) as dst_dataset:
    # Write data to the destination dataset.
    ...
Todo:
• Discuss and/or link to topics
– supported formats, drivers
– vsi
– tags
– profile
– crs
– transforms
– dtypes
Dataset objects provide read, read-write, and write access to raster data files and are obtained by calling
rasterio.open(). That function mimics Python’s built-in open() and the dataset objects it returns mimic Python file objects.
If you try to access a nonexistent path, rasterio.open() does the same thing as open(), raising an exception
immediately.
>>> open('/lol/wut.tif')
Traceback (most recent call last):
...
FileNotFoundError: [Errno 2] No such file or directory: '/lol/wut.tif'
>>> rasterio.open('/lol/wut.tif')
Traceback (most recent call last):
...
rasterio.errors.RasterioIOError: No such file or directory
Datasets generally have one or more bands (or layers). Following the GDAL convention, these are indexed starting
with the number 1. The first band of a file can be read like this:
The returned object is a 2-dimensional numpy.ndarray. The representation of that array at the Python prompt is a
summary; the GeoTIFF file that Rasterio uses for testing has 0 values in the corners, but has nonzero values elsewhere.
Instead of reading single bands, all bands of the input dataset can be read into a 3-dimensional ndarray. Note that the
interpretation of the 3 axes is (bands, rows, columns). See Image processing software for more details on how to
convert to the ordering expected by some software.
In order to read smaller chunks of the dataset, refer to Windowed reading and writing.
The indexes, Numpy data types, and nodata values of all of a dataset’s bands can be had from its indexes, dtypes, and
nodatavals attributes.
>>> src.close()
>>> src
<closed DatasetReader name='tests/data/RGB.byte.tif' mode='r'>
>>> src.read(1)
Traceback (most recent call last):
...
ValueError: can't read closed raster file
>>> f = open('README.rst')
>>> f.close()
>>> f.read()
Traceback (most recent call last):
...
ValueError: I/O operation on closed file.
Like Python file objects, Rasterio datasets can manage the entry into and exit from runtime contexts created using
a with statement. This ensures that files are closed no matter what exceptions may be raised within the block.
Format-specific dataset reading options may be passed as keyword arguments. For example, to turn off all types of
GeoTIFF georeference except that within the TIFF file’s keys and tags, pass GEOREF_SOURCES='INTERNAL'.
5.20 Reprojection
Rasterio can map the pixels of a destination raster with an associated coordinate reference system and transform to
the pixels of a source image with a different coordinate reference system and transform. This process is known as
reprojection.
Rasterio’s rasterio.warp.reproject() is a geospatial-specific analog to SciPy’s
scipy.ndimage.interpolation.geometric_transform() [1].
The code below reprojects between two arrays, using no pre-existing GIS datasets. rasterio.warp.reproject()
has two positional arguments: source and destination. The remaining keyword arguments parameterize the reprojection
transform.
[1] https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.geometric_transform.html#scipy.ndimage.geometric_transform
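import numpy as np
import rasterio
from rasterio import Affine as A
from rasterio.warp import reproject, Resampling

with rasterio.Env():
    # Source: a 512 x 512 raster centered on 0 degrees E, 0 degrees N,
    # each pixel covering 15 arc-seconds.
    rows, cols = src_shape = (512, 512)
    d = 1.0 / 240  # decimal degrees per pixel
    src_transform = A.translation(-cols * d / 2, rows * d / 2) * A.scale(d, -d)
    src_crs = 'EPSG:4326'
    source = np.ones(src_shape, np.uint8) * 255
    # Destination: a 1024 x 1024 raster in Web Mercator (EPSG:3857).
    dst_transform = A.translation(-237481.5, 237536.4) * A.scale(425.0, -425.0)
    dst_crs = 'EPSG:3857'
    destination = np.zeros((1024, 1024), np.uint8)
    reproject(
        source,
        destination,
        src_transform=src_transform,
        src_crs=src_crs,
        dst_transform=dst_transform,
        dst_crs=dst_crs,
        resampling=Resampling.nearest)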
See examples/reproject.py for code that writes the destination array to a GeoTIFF file. I’ve uploaded the resulting file
to a Mapbox map to show that the reprojection is correct:
https://a.tiles.mapbox.com/v3/sgillies.hfek2oko/page.html?secure=1#6/0.000/0.033 (dead link).
Reprojecting a GeoTIFF dataset from one coordinate reference system to another is a common use case. Rasterio provides a few
utilities to make this even easier:
transform_bounds() transforms the bounding coordinates of the source raster to the target coordinate reference
system, densifying points along the edges to account for non-linear transformations of the edges.
calculate_default_transform() transforms bounds to the target coordinate system, calculates resolution if not
provided, and returns the destination transform and dimensions.
import numpy as np
import rasterio
from rasterio.warp import calculate_default_transform, reproject, Resampling
dst_crs = 'EPSG:4326'
See rasterio/rio/warp.py for more complex examples of reprojection based on new bounds, dimensions, and
resolution (as well as a command-line interface described here).
It is also possible to use reproject() to create an output dataset zoomed out by a factor of 2. Methods of the
rasterio.Affine class help us generate the output dataset’s transform matrix and, thereby, its spatial extent.
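import numpy as np
import rasterio
from rasterio import Affine as A
from rasterio.warp import reproject, Resampling

with rasterio.open('tests/data/RGB.byte.tif') as src:
    src_transform = src.transform
    # Zoom out by a factor of 2 about the center of the source dataset:
    # shift the origin by half the source size, then double the cell size.
    dst_transform = src_transform * A.translation(
        -src.width / 2.0, -src.height / 2.0) * A.scale(2.0)
    data = src.read()
    kwargs = src.meta
    kwargs['transform'] = dst_transform
    # '/tmp/zoomed-out.tif' is an example output path.
    with rasterio.open('/tmp/zoomed-out.tif', 'w', **kwargs) as dst:
        for i, band in enumerate(data, 1):
            dest = np.zeros_like(band)
            reproject(
                band,
                dest,
                src_transform=src_transform,
                src_crs=src.crs,
                dst_transform=dst_transform,
                dst_crs=src.crs,
                resampling=Resampling.nearest)
            dst.write(dest, indexes=i)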
Most geospatial datasets have a geotransform which can be used to reproject a dataset from one coordinate reference
system to another. Datasets may also be georeferenced by alternative metadata, namely Ground Control Points (gcps)
or Rational Polynomial Coefficients (rpcs). For details on gcps and rpcs, see Georeferencing. A common scenario is
using gcps or rpcs to geocode (orthorectify) datasets, resampling and reorienting them to a coordinate reference system
with a newly computed geotransform.
import numpy as np
import rasterio
from rasterio.warp import reproject
from rasterio.enums import Resampling
kwargs = {
    'RPC_DEM': '/path/to/dem.tif'
}
_, dst_transform = reproject(
    rasterio.band(source, 1),
    destination,
    rpcs=source.rpcs,
    src_crs=src_crs,
    dst_crs=dst_crs,
    resampling=Resampling.nearest,
    **kwargs
)
assert destination.any()
Note: When reprojecting a dataset with gcps or rpcs, the src_crs parameter should be supplied with the coordinate
reference system that the gcps or rpcs are referenced against. By definition rpcs are always referenced against the WGS84
ellipsoid with geographic coordinates (EPSG:4326) [2].
5.20.4 References
[2] http://geotiff.maptools.org/rpc_prop.html
5.21 Resampling
Resampling refers to changing the cell values due to changes in the raster cell grid. This can occur during reprojection.
Even if the projection is not changing, we may want to change the effective cell size of an existing dataset.
Upsampling refers to cases where we are converting to higher resolution/smaller cells. Downsampling is resampling
to lower resolution/larger cell sizes.
By reading from a raster source into an output array of a different size or by specifying an out_shape of a different size
you are effectively resampling the data.
Here is an example of upsampling by a factor of 2 using the bilinear resampling method.
import rasterio
from rasterio.enums import Resampling
upscale_factor = 2
When you change the raster cell grid, you must recalculate the pixel values. There is no “correct” way to do this as all
methods involve some interpolation.
The current resampling methods can be found in the rasterio.enums.Resampling class.
Of note, the default nearest method may not be suitable for continuous data. In those cases, bilinear and cubic are
better suited. Some specialized statistical resampling methods exist, e.g. average, which may be useful when certain
numerical properties of the data are to be retained.
This document is written specifically for users of GDAL’s Python bindings (osgeo.gdal) who have read about
Rasterio’s philosophy and want to know what switching entails. The good news is that switching may not be complicated.
This document explains the key similarities and differences between these two Python packages and highlights the
features of Rasterio that can help in switching.
Rasterio and GDAL’s bindings can contend for global GDAL objects. Unless you have deep knowledge about both
packages, choose exactly one of import osgeo.gdal or import rasterio.
GDAL’s bindings (gdal for the rest of this document) and Rasterio are not entirely compatible and should not, without
a great deal of care, be imported and used in a single Python program. The reason is that the dynamic library they
each load (these are C extension modules, remember), libgdal.so on Linux, gdal.dll on Windows, has a number
of global objects and the two modules take different approaches to managing these objects.
Static linking of the GDAL library for gdal and rasterio can avoid this contention, but in practice you will almost
never see distributions of these modules that statically link the GDAL library.
Beyond the issues above, the modules have different styles – gdal reads and writes like C while rasterio is more
Pythonic – and don’t complement each other well.
GDAL library functions are executed in a context of format drivers, error handlers, and format-specific configuration
options that this document will call the “GDAL Environment.” Rasterio has an abstraction for the GDAL environment;
gdal does not.
With gdal, this context is initialized upon import of the module. This makes sense because gdal objects are thin
wrappers around functions and classes in the GDAL dynamic library that generally require registration of drivers and
error handlers. The gdal module doesn’t have an abstraction for the environment, but it can be modified using functions
like gdal.SetErrorHandler() and gdal.UseExceptions().
Rasterio has modules that don’t require complete initialization and configuration of GDAL (rasterio.dtypes,
rasterio.profiles, and rasterio.windows, for example) and in the interest of reducing overhead doesn’t register
format drivers and error handlers until they are needed. The functions that do need fully initialized GDAL environments
will ensure that they exist. rasterio.open() is the foremost of this category of functions. Consider the example code
below.
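import rasterio
# The GDAL environment has no registered format drivers or error
# handlers at this point.
with rasterio.open('example.tif') as src:
    # Drivers and error handlers are registered just before open() runs.
    pass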
Importing rasterio does not initialize the GDAL environment. Calling rasterio.open() does. This is different
from gdal where import osgeo.gdal, not osgeo.gdal.Open(), initializes the GDAL environment.
Rasterio has an abstraction for the GDAL environment, rasterio.Env, that can be invoked explicitly for more control
over the configuration of GDAL as shown below.
import rasterio
# The GDAL environment has no registered format drivers or error
# handlers at this point.
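As mentioned previously, gdal has no such abstraction for the GDAL environment. The nearest approximation would
be to save the current values with gdal.GetConfigOption(), apply new ones with gdal.SetConfigOption(), perform the GDAL
operations, and then restore the saved values.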
Please note that to the Env class, GDAL_CACHEMAX is strictly an integer number of bytes. GDAL’s shorthand notation
is not supported.
gdal provides objects for each of the GDAL format drivers. With Rasterio, format drivers are represented by strings
and are used only as arguments to functions like rasterio.open().
Rasterio uses URIs to identify datasets, with schemes for different protocols. The GDAL bindings have their own
special syntax.
Unix-style filenames such as /var/data/example.tif identify dataset files for both Rasterio and gdal. Rasterio
also accepts ‘file’ scheme URIs like file:///var/data/example.tif.
Rasterio identifies datasets within ZIP or tar archives using Apache VFS style identifiers like
zip:///var/data/example.zip!example.tif or tar:///var/data/example.tar!example.tif.
Datasets served via HTTPS are identified using ‘https’ URIs like
https://landsat-pds.s3.amazonaws.com/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF.
Datasets on AWS S3 are identified using ‘s3’ scheme identifiers like
s3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF.
With gdal, the equivalent identifiers are respectively /vsizip//var/data/example.zip/example.tif,
/vsitar//var/data/example.tar/example.tif,
/vsicurl/landsat-pds.s3.amazonaws.com/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF, and
/vsis3/landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF.
To help developers switch, Rasterio will accept these identifiers and other format-specific connection strings, too, and
dispatch them to the proper format drivers and protocols.
Rasterio and gdal each have dataset objects. Not the same classes, of course, but not radically different ones. In each
case, you generally get dataset objects through an “opener” function: rasterio.open() or gdal.Open().
So that Python developers can spend less time reading docs, the dataset object returned by rasterio.open() is
modeled on Python’s file object. It even has the close() method that gdal lacks so that you can actively close dataset
connections.
5.22.6 Bands
gdal and Rasterio both have band objects. But unlike gdal’s band, Rasterio’s band is just a tuple of the dataset, band
index and some other band properties. Thus Rasterio never has objects with dangling dataset pointers. With Rasterio,
bands are represented by a numerical index, starting from 1 (as GDAL does), and are used as arguments to dataset
methods. To read the first band of a dataset as a numpy.ndarray, do this.
A band object can be used to represent a single band (or a sequence of bands):
Other attributes of GDAL band objects generally surface in Rasterio as tuples returned by dataset attributes, with one
value per band, in order.
Developers that want read-only band objects for their applications can create them by zipping these tuples together.
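from collections import namedtuple

Band = namedtuple('Band', 'index dtype description units')  # application-defined, not a Rasterio class
src = rasterio.open('example.tif')
bands = [Band(*vals) for vals in zip(
    src.indexes, src.dtypes, src.descriptions, src.units)]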
5.22.7 Geotransforms
The DatasetReader.transform attribute is comparable to the GeoTransform attribute of a GDAL dataset, but
Rasterio’s has more power. It’s not just an array of affine transformation matrix elements, it’s an instance of an Affine
class and has many handy methods. For example, the spatial coordinates of the upper left corner of any raster element
are the product of the DatasetReader.transform matrix and the (column, row) index of the element.
To help developers switch, Affine instances can be created from or converted to the sequences used by gdal.
The DatasetReader.crs attribute is an instance of Rasterio’s CRS() class and works well with pyproj.
5.22.9 Tags
GDAL metadata items are called “tags” in Rasterio. The tag set for a given GDAL metadata namespace is represented
as a dict.
>>> src.tags()
{'AREA_OR_POINT': 'Area'}
>>> src.tags(ns='IMAGE_STRUCTURE')
{'INTERLEAVE': 'PIXEL'}
The semantics of the tags in GDAL’s default and IMAGE_STRUCTURE namespaces are described in
https://gdal.org/user/raster_data_model.html. Rasterio uses several namespaces of its own: rio_creation_kwds and
rio_overviews, each with their own semantics.
Rasterio adds an abstraction for subsets or windows of a raster array that GDAL does not have. A window is a pair
of tuples, the first of the pair being the raster row indexes at which the window starts and stops, the second being
the column indexes at which the window starts and stops. Row before column, as with ndarray slices. Instances of
Window are created by passing the four subset parameters used with gdal to the class constructor.
import rasterio
from rasterio.windows import Window

src = rasterio.open('example.tif')
xoff, yoff = 0, 0
xsize, ysize = 10, 10
subset = src.read(1, window=Window(xoff, yoff, xsize, ysize))
Rasterio provides an array for every dataset representing its valid data mask using the same indicators as GDAL: 0 for
invalid data and 255 for valid data.
Where the masked array’s mask is True, the data is invalid and has been masked “out” in the opposite sense of GDAL’s
mask.
Rasterio always raises Python exceptions when an error occurs and never returns an error code or None to indicate an
error. gdal takes the opposite approach, although developers can turn on exceptions by calling gdal.UseExceptions().
GDAL’s data model includes collections of key, value pairs for major classes. In that model, these are “metadata”, but
since they don’t have to be just for metadata, these key, value pairs are called “tags” in rasterio.
I’m going to use the rasterio interactive inspector in these examples below.
Tags belong to namespaces. To get a copy of a dataset’s tags from the default namespace, call tags() with no
arguments.
A dataset’s bands may have tags, too. Here are the tags from the default namespace for the first band, accessed using
the positional band index argument of tags().
>>> src.tags(1)['STATISTICS_MEAN']
'29.947726688477'
These are the tags that came with the sample data I’m using to test rasterio. In practice, maintaining stats in the tags
can be unreliable as there is no automatic update of the tags when the band’s image data changes.
The 3 standard, non-default GDAL tag namespaces are ‘SUBDATASETS’, ‘IMAGE_STRUCTURE’, and ‘RPC’. You
can get the tags from these namespaces using the ns keyword of tags().
>>> src.tags(ns='IMAGE_STRUCTURE')
{'INTERLEAVE': 'PIXEL'}
>>> src.tags(ns='SUBDATASETS')
{}
>>> src.tags(ns='RPC')
{}
A special case for GDAL tag namespaces are those prefixed with ‘xml’ e.g. ‘xml:TRE’ or ‘xml:VRT’. GDAL will treat
these namespaces as a single xml string.
You can add new tags to a dataset or band, in the default or another namespace, using the update_tags() method.
Unicode tag values are supported too, at least for TIFF files.
import pytest
import rasterio

with rasterio.open(
        '/tmp/test.tif',
        'w',
        driver='GTiff',
        count=1,
        dtype=rasterio.uint8,
        width=10,
        height=10) as dst:
    dst.update_tags(a='1', b='2')
    dst.update_tags(1, c=3)
    # The dataset has only one band, so band index 4 is rejected.
    with pytest.raises(ValueError):
        dst.update_tags(4, d=4)
    # Tag values are stored and returned as strings.
    assert dst.tags() == {'a': '1', 'b': '2'}
    assert dst.tags(1) == {'c': '3'}
As with image data, tags aren’t written to the file on disk until the dataset is closed.
5.24 Transforms
Rasterio supports three primary methods for transforming coordinates from image pixel (row, col) to and from
geographic/projected (x, y) coordinates. The interface for performing these coordinate transformations is available in
rasterio.transform through one of AffineTransformer, GCPTransformer, or RPCTransformer. The methods
xy() and rowcol() convert (row, col) -> (x, y) and (x, y) -> (row, col), respectively.
AffineTransformer takes care of coordinate transformations given an Affine transformation matrix.
The dataset methods xy() and index() use rasterio.transform under the hood.
For accuracy a height value is typically required when using RPCTransformer. By default, a value of 0 is assumed.
A first order correction would be to use a mean elevation value for the image.
Better yet is to sample height values from a digital elevation model (DEM). RPCTransformer allows for options to be
passed to GDALCreateRPCTransformerV2().
>>> transformer.xy(0, 0)
(-123.47954729595642, 49.5279448909449)
The AffineTransformer is a pure Python class; however, GCPTransformer and RPCTransformer make use of
C/C++ GDAL objects. Explicit control of the transformer object can be achieved by using it within a context manager or
by calling its close() method.
Note: If RPC_DEM is specified in rpc_options, GDAL will maintain an open file handle to the DEM until the
transformer is closed.
Rasterio has a WarpedVRT class that abstracts many of the details of raster warping by using an in-memory Warped
VRT. A WarpedVRT can be the easiest solution for tiling large datasets.
For example, to virtually warp the RGB.byte.tif test dataset from its proper EPSG:32618 coordinate reference system
to EPSG:3857 (Web Mercator) and extract pixels corresponding to its central zoom 9 tile, do the following.
import rasterio
from rasterio.enums import Resampling
from rasterio.vrt import WarpedVRT
A WarpedVRT can be used to normalize a stack of images with differing projections, bounds, cell sizes, or dimensions
against a regular grid in a defined bounding box.
The tests/data/RGB.byte.tif file is in UTM zone 18, so another file in a different CRS is required for demonstration.
This command will create a new image with drastically different dimensions and cell size, and reproject to WGS84.
As of this writing rio warp implements only a subset of gdalwarp’s features, so gdalwarp must be used to achieve
the desired transform:
$ gdalwarp \
-t_srs EPSG:4326 \
-te_srs EPSG:32618 \
-te 101985 2673031 339315 2801254 \
-ts 200 250 \
tests/data/RGB.byte.tif \
tests/data/WGS84-RGB.byte.tif
and this snippet demonstrates how to normalize data to consistent dimensions, CRS, and cell size within a pre-defined
bounding box:
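import os

import affine
import rasterio
from rasterio.crs import CRS
from rasterio.enums import Resampling
from rasterio import shutil as rio_shutil
from rasterio.vrt import WarpedVRT

input_files = (
    # This file is in EPSG:32618
    'tests/data/RGB.byte.tif',
    # This file is in EPSG:4326
    'tests/data/WGS84-RGB.byte.tif'
)

# Destination grid: Web Mercator, 100 x 100 pixels within these bounds.
dst_crs = CRS.from_epsg(3857)
dst_bounds = -8744355, 2768114, -8559167, 2908677
dst_height = dst_width = 100
left, bottom, right, top = dst_bounds
xres = (right - left) / dst_width
yres = (top - bottom) / dst_height
dst_transform = affine.Affine(xres, 0.0, left, 0.0, -yres, top)

vrt_options = {
    'resampling': Resampling.cubic,
    'crs': dst_crs,
    'transform': dst_transform,
    'height': dst_height,
    'width': dst_width,
}

for path in input_files:
    with rasterio.open(path) as src:
        with WarpedVRT(src, **vrt_options) as vrt:
            # 'vrt' is a full dataset matching 'vrt_options'.
            directory, name = os.path.split(path)
            outfile = os.path.join(directory, 'aligned-{}'.format(name))
            rio_shutil.copy(vrt, outfile, driver='GTiff')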
Todo: Support for URIs describing zip, s3, https resources. Relationship to GDAL vsicurl, vsis3 et al.
Rasterio relies on GDAL’s virtual filesystem interface to access datasets on the web, in cloud storage, in archive files,
and in Python objects.
5.26.1 AWS S3
After you have configured your AWS credentials as explained in the boto3 guide you can read metadata and imagery
from TIFFs stored as S3 objects with no change to your code.
with rasterio.open(
        's3://landsat-pds/L8/139/045/LC81390452014295LGN00/LC81390452014295LGN00_B1.TIF') as src:
    print(src.profile)
# Printed:
# {'blockxsize': 512,
# 'blockysize': 512,
# 'compress': 'deflate',
# 'count': 1,
# 'crs': {'init': u'epsg:32645'},
# 'driver': u'GTiff',
# 'dtype': 'uint16',
# 'height': 7791,
# 'interleave': 'band',
# 'nodata': None,
# 'tiled': True,
# 'transform': Affine(30.0, 0.0, 381885.0,
# 0.0, -30.0, 2512815.0),
# 'width': 7621}
Note (AWS pricing concerns): While this feature can reduce latency by reading fewer bytes from S3 compared to
downloading the entire TIFF and opening locally, it does make at least 3 GET requests to fetch a TIFF’s profile as
shown above and likely many more to fetch all the imagery from the TIFF. Consult the AWS S3 pricing guidelines
before deciding if aws.Session is for you.
Datasets stored in proprietary systems or addressable only through protocols not directly supported by GDAL can
be accessed using the opener keyword argument of rasterio.open. Here is an example of using fs_s3fs to
access the dataset in sentinel-s2-l2a-cogs/45/C/VQ/2022/11/S2B_45CVQ_20221102_0_L2A/B01.tif from the
sentinel-cogs AWS S3 bucket. Rasterio can access this without using the opener argument, but it makes a good
usage example. Other custom openers would work in the same way.
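import rasterio
from fs_s3fs import S3FS

fs = S3FS(
    bucket_name="sentinel-cogs",
    dir_path="sentinel-s2-l2a-cogs/45/C/VQ/2022/11/S2B_45CVQ_20221102_0_L2A",
    aws_access_key_id=AWS_ACCESS_KEY_ID,
    aws_secret_access_key=AWS_SECRET_ACCESS_KEY,
)

# The filesystem's open() method serves as the opener.
with rasterio.open("B01.tif", opener=fs.open) as src:
    print(src.profile)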
Where AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are placeholders for the appropriate credentials.
Beginning in rasterio 0.3, you can read and write “windows” of raster files. This feature allows you to work on rasters
that are larger than your computer’s RAM or process chunks of large rasters in parallel.
5.27.1 Windows
A Window is a view onto a rectangular subset of a raster dataset and is described in rasterio by column and row offsets
and width and height in pixels. These may be ints or floats.
Windows may also be constructed from numpy array index tuples or slice objects. Only int values are permitted in
these cases.
If height and width keyword arguments are passed to from_slices(), relative and open-ended slices may be used.
5.27.2 Reading
Here is an example of reading a 256 row x 512 column subset of the rasterio test file.
Attention: In getting data to fill a window Rasterio will read the entirety of one or more chunks of data from
the dataset. If you’re reading from a GeoTIFF with 512 x 512 pixel chunks (blocks), that determines the minimum
number of bytes that will be read from disk or copied over your network, even if your read window is only 1 x 1
pixels. In the case that your source dataset does not use chunks (rare, but possible) Rasterio will read the entire
dataset in order to fill even a 1 x 1 pixel window. In practice, it’s important to chunk the data you create and store
for your applications.
5.27.3 Writing
Writing works similarly. The following creates a blank 500 column x 300 row GeoTIFF and plops 37,500 pixels with
value 127 into a window 30 pixels down from and 50 pixels to the right of the upper left corner of the GeoTIFF.
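import numpy as np
import rasterio
from rasterio.windows import Window

# 150 rows x 250 columns = 37,500 pixels with value 127.
image = np.ones((150, 250), dtype=rasterio.ubyte) * 127
with rasterio.open(
        '/tmp/example.tif', 'w',
        driver='GTiff', width=500, height=300, count=1,
        dtype=image.dtype) as dst:
    dst.write(image, window=Window(50, 30, 250, 150), indexes=1)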
5.27.4 Decimation
If the write window is smaller than the data, the data will be decimated. Below, the window is scaled to one third of
the source image.
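with rasterio.open('tests/data/RGB.byte.tif') as src:
    b, g, r = (src.read(k) for k in (1, 2, 3))

# One third of the 718 x 791 source: 239 rows x 263 columns.
write_window = Window.from_slices((30, 269), (50, 313))

with rasterio.open(
        '/tmp/example.tif', 'w',
        driver='GTiff', width=500, height=300, count=3,
        dtype=r.dtype) as dst:
    for k, arr in [(1, b), (2, g), (3, r)]:
        dst.write(arr, indexes=k, window=write_window)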
Sometimes it is desirable to crop off an outer boundary of NODATA values around a dataset. You can do this with
get_data_window():
kwargs = src.meta.copy()
kwargs.update({
'height': window.height,
'width': window.width,
'transform': rasterio.windows.transform(window, src.transform)})
The affine transform of a window can be accessed using a dataset’s window_transform() method:
Basic union and intersection operations are available for windows, to streamline operations across dynamically created
windows for a series of bands or datasets with the same full extent.
5.27.8 Blocks
Raster datasets are generally composed of multiple blocks of data and windowed reads and writes are most efficient
when the windows match the dataset’s own block structure. When a file is opened to read, the shape of blocks for any
band can be had from the block_shapes property.
The block windows themselves can be had from the block_windows function.
This function returns an iterator that yields a pair of values. The second is a window tuple that can be used in calls to
read() or write(). The first is the pair of row and column indexes of this block within all blocks of the dataset.
You may read windows of data from a file block-by-block like this.
Well-bred files have identically blocked bands, but GDAL allows otherwise and it’s a good idea to test this assumption
in your code.
The block_shapes property is a band-ordered list of block shapes and set(src.block_shapes) gives you the set of unique
shapes. Asserting that there is only one item in the set is effectively the same as asserting that all bands have the same
block structure. If they do, you can use the same windows for each.
Todo:
• appending to existing data
• context manager
• write 3d vs write 2d
• document issues with writing compressed files (per #77)
• discuss and refer to topics
– creation options
– transforms
– dtypes
– block windows
Opening a file in writing mode is a little more complicated than opening a text file in Python. The dimensions of the
raster dataset, the data types, and the specific format must be specified.
Here’s an example of basic rasterio functionality. An array is written to a new single band TIFF.
Writing data mostly works as with a Python file. There are a few format-specific differences.
GTiff is the only driver that supports writing directly to disk. GeoTiffs use the RasterUpdater and leverage the
full capabilities of the GDALCreate() function. We highly recommend using the GeoTiff driver for writing as it is the
best-tested and best-supported format.
Some other formats that are writable by GDAL can also be written by Rasterio. These use an
IndirectRasterUpdater which does not create directly but uses a temporary in-memory dataset and
GDALCreateCopy() to produce the final output.
Some formats are known to produce invalid results using the IndirectRasterUpdater. These formats will raise a
RasterioIOError if you attempt to write to them. Currently this applies to the netCDF driver, but please let us know if
you experience problems writing other formats.
6.1.1 Subpackages
rio CLI
rio blocks
Options
Arguments
INPUT
Required argument
rio bounds
Write bounding boxes to stdout as GeoJSON for use with, e.g., geojsonio
$ rio bounds *.tif | geojsonio
If a destination crs is passed via dst_crs, it takes precedence over the projection parameter.
Options
--precision <precision>
Decimal precision of coordinates.
--indent <indent>
Indentation level for JSON output
--compact, --not-compact
Use compact separators (‘,’, ‘:’).
--geographic
Output in geographic coordinates (the default).
--projected
Output in dataset’s own, projected coordinates.
--mercator
Output in Web Mercator coordinates.
--dst-crs <EPSG:NNNN>
Output in specified coordinates.
--sequence, --collection
Write a LF-delimited sequence of texts containing individual objects (the default) or write a single JSON text
containing a feature collection object.
--rs, --no-rs
Use RS (0x1E) as a prefix for individual texts in a sequence as per
http://tools.ietf.org/html/draft-ietf-json-text-sequence-13 (default is False).
--feature
Output as GeoJSON feature(s).
--bbox
Output as GeoJSON bounding box array(s).
Arguments
INPUT
Required argument(s)
rio calc
Example:
The command above produces a 3-band GeoTIFF with all values scaled by 0.95 and incremented by 2.
The command above produces a 3-band RGB GeoTIFF, with red levels incremented by 125, from the single-band input.
The maximum amount of memory used to perform calculations defaults to 64 MB. This number can be increased to
improve speed of calculation.
Options
Arguments
COMMAND
Required argument
INPUTS... OUTPUT
Required argument(s)
rio clip
Examples
Options
--overwrite
Always overwrite an existing output file.
--co, --profile <NAME=VALUE>
Driver specific creation options. See the documentation for the selected output driver for more information.
--with-complement, --without-complement
Include the relative complement of the raster in the given bounds (giving a larger result), else return results only
from the intersection of the raster and the bounds (the default).
Arguments
INPUT OUTPUT
Required argument(s)
rio convert
Copy and convert raster datasets to other data types and formats.
Data values may be linearly scaled when copying by using the –scale-ratio and –scale-offset options. Destination raster
values are calculated as
dst = scale_ratio * src + scale_offset
For example, to scale uint16 data with an actual range of 0-4095 to 0-255 as uint8:
$ rio convert in16.tif out8.tif --dtype uint8 --scale-ratio 0.0625
Format specific creation options may also be passed using --co. To tile a new GeoTIFF output file, do the following.
--co tiled=true --co blockxsize=256 --co blockysize=256
To compress it using the LZW method, add
--co compress=LZW
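The scaling formula above can be sketched with NumPy. This is only an illustration of the arithmetic: the sample values are hypothetical, and the truncating cast shown here is NumPy's behavior, which is an assumption and may not match how the command rounds.

```python
import numpy as np

# Hypothetical uint16 samples covering the 0-4095 range from the example.
src = np.array([0, 16, 2048, 4095], dtype="uint16")

scale_ratio = 0.0625  # 256 / 4096
scale_offset = 0.0

# dst = scale_ratio * src + scale_offset, then cast to the output dtype.
# NumPy's astype() truncates toward zero (an assumption about rounding).
dst = (scale_ratio * src + scale_offset).astype("uint8")
print(dst.tolist())
```

With a ratio of 0.0625 the largest 12-bit value, 4095, lands on 255, which is why the example needs no offset.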
Options
--scale-offset <scale_offset>
Source to destination scaling offset.
--rgb
Set RGB photometric interpretation.
--overwrite
Always overwrite an existing output file.
--co, --profile <NAME=VALUE>
Driver specific creation options. See the documentation for the selected output driver for more information.
Arguments
INPUTS... OUTPUT
Required argument(s)
rio edit-info
Edit a dataset’s metadata: coordinate reference system, affine transformation matrix, nodata value, and tags.
The coordinate reference system may be either a PROJ.4 or EPSG:nnnn string,
--crs 'EPSG:4326'
or a JSON text-encoded PROJ.4 object.
--crs '{"proj": "utm", "zone": 18, ...}'
Transforms are JSON-encoded Affine objects like:
--transform '[300.038, 0.0, 101985.0, 0.0, -300.042, 2826915.0]'
Prior to Rasterio 1.0 GDAL geotransforms were supported for --transform, but are no longer supported.
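To make the coefficient order concrete, here is a minimal sketch that decodes the example transform and applies it by hand. It assumes the usual Affine ordering (a, b, c, d, e, f), where c and f are the x and y offsets of the upper left corner; the helper name pixel_to_xy is hypothetical.

```python
import json

# The coefficient list from the example above, in (a, b, c, d, e, f) order.
a, b, c, d, e, f = json.loads("[300.038, 0.0, 101985.0, 0.0, -300.042, 2826915.0]")


def pixel_to_xy(col, row):
    """Map pixel (col, row) offsets to spatial (x, y) coordinates."""
    return a * col + b * row + c, d * col + e * row + f


upper_left = pixel_to_xy(0, 0)
one_pixel_in = pixel_to_xy(1, 1)
print(upper_left, one_pixel_in)
```

The negative e coefficient is what makes y decrease as the row index increases, as is typical of north-up rasters.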
Metadata items may also be read from an existing dataset using a combination of the --like option with at least one of
--all, --crs like, --nodata like, and --transform like.
rio edit-info example.tif --like template.tif --all
To get just the transform from the template:
rio edit-info example.tif --like template.tif --transform like
Options
--crs <crs>
New coordinate reference system
--unset-crs
Unset the dataset’s CRS value.
--transform <transform>
New affine transform matrix
--units <units>
Edit units of a band (requires --bidx)
--description <description>
Edit description of a band (requires --bidx)
--tag <KEY=VAL>
New tag.
--all
Copy all metadata items from the template file.
--colorinterp <name[,name,...]|RGB|RGBA|like>
Set color interpretation for all bands like ‘red,green,blue,alpha’. Can also use ‘RGBA’ as shorthand for
‘red,green,blue,alpha’ and ‘RGB’ for the same sans alpha band. Use ‘like’ to inherit color interpretation from
‘--like’.
--like <like>
Raster dataset to use as a template for obtaining affine transform (bounds and resolution), crs, and nodata values.
Arguments
INPUT
Required argument
rio env
Options
--formats
Enumerate the available formats.
--credentials
Print credentials.
--gdal-data
Print GDAL data path.
--proj-data
Print PROJ data path.
rio gcps
row, col:
row (or line) and col (or pixel) coordinates.
x, y, z:
x, y, and z spatial coordinates.
crs:
The coordinate reference system for x, y, and z.
id:
A unique (within the dataset) identifier for the control
point.
info:
A brief description of the control point.
Options
--collection
Output as GeoJSON feature collection(s).
--feature
Output as GeoJSON feature(s).
--geographic
Output in geographic coordinates (the default).
--projected
Output in dataset’s own, projected coordinates.
--precision <precision>
Decimal precision of coordinates.
--rs, --no-rs
Use RS (0x1E) as a prefix for individual texts in a sequence as per
http://tools.ietf.org/html/draft-ietf-json-text-sequence-13 (default is False).
--indent <indent>
Indentation level for JSON output
--compact, --not-compact
Use compact separators (‘,’, ‘:’).
Arguments
INPUT
Required argument
rio info
Options
--meta
Show data file structure (default).
--tags
Show data file tags.
--namespace <namespace>
Select a tag namespace.
--indent <indent>
Indentation level for pretty printed output
--count
Print the count of bands.
-t, --dtype
Print the dtype name.
--nodata
Print the nodata value.
-f, --format, --driver
Print the format driver.
--shape
Print the (height, width) shape.
--height
Print the height (number of rows).
--width
Print the width (number of columns).
--crs
Print the CRS as a PROJ.4 string.
--bounds
Print the boundary coordinates (left, bottom, right, top).
-r, --res
Print pixel width and height.
--lnglat
Print longitude and latitude at center.
--stats
Print statistics (min, max, mean) of a single band (use --bidx).
--checksum
Print integer checksum of a single band (use --bidx).
--subdatasets
Print subdataset identifiers.
-v, --tell-me-more, --verbose
Output extra information.
-b, --bidx <bidx>
Input file band index (default: 1).
--masked, --not-masked
Evaluate expressions using masked arrays (the default) or ordinary numpy arrays.
Arguments
INPUT
Required argument
rio insp
Options
--ipython
Use IPython as interpreter.
-m, --mode <mode>
File mode (default ‘r’).
Options
r | r+
Arguments
INPUT
Required argument
rio mask
Masks in raster using GeoJSON features (masks out all areas not covered by features), and optionally crops the output
raster to the extent of the features. Features are assumed to be in the same coordinate reference system as the input
raster.
GeoJSON must be the first input file or provided from stdin:
> rio mask input.tif output.tif --geojson-mask features.json
> rio mask input.tif output.tif --geojson-mask - < features.json
If the output raster exists, it will be completely overwritten with the results of this operation.
The result is always equal to or within the bounds of the input raster.
The --crop and --invert options are mutually exclusive.
The --crop option is not valid if features are completely outside the extent of the input raster.
Options
Arguments
INPUTS... OUTPUT
Required argument(s)
rio merge
Options
Options
first | last | min | max | sum | count
--nodata <NUMBER|nan>
Set a Nodata value.
-t, --dtype <dtype>
Output data type.
Options
ubyte | uint8 | uint16 | int16 | uint32 | int32 | float32 | float64
-b, --bidx <bidx>
Indexes of input file bands.
--overwrite
Always overwrite an existing output file.
--precision <precision>
Unused, deprecated, and will be removed in 2.0.0.
--co, --profile <NAME=VALUE>
Driver specific creation options. See the documentation for the selected output driver for more information.
Arguments
INPUTS... OUTPUT
Required argument(s)
rio overview
Options
--build <f1,f2,...|b^min..max|auto>
A sequence of decimation factors specified as comma-separated list of numbers or a base and range of exponents,
or ‘auto’ to automatically determine the maximum factor.
--ls
Print the overviews for each band.
--rebuild
Reconstruct existing overviews.
--resampling <resampling>
Resampling algorithm.
Default
nearest
Options
nearest | bilinear | cubic | cubic_spline | lanczos | average | mode | gauss | rms
Arguments
INPUT
Required argument
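The two explicit forms accepted by --build can be illustrated with a short parser. This is a sketch rather than the command's own code: parse_factors is a hypothetical helper, the exponent range is assumed to be inclusive, and the 'auto' form is deliberately not handled.

```python
def parse_factors(spec):
    """Expand a --build expression into a list of decimation factors.

    Hypothetical helper: handles 'f1,f2,...' and 'b^min..max' only.
    """
    if "^" in spec:
        base, exponents = spec.split("^")
        low, high = exponents.split("..")
        # Assumes the exponent range includes both endpoints.
        return [int(base) ** k for k in range(int(low), int(high) + 1)]
    return [int(factor) for factor in spec.split(",")]


print(parse_factors("2^1..4"))
print(parse_factors("2,4,8"))
```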
rio rasterize
Note
The GeoJSON is not projected to match the coordinate reference system of the output or --like rasters at this time. This
functionality may be added in the future.
Options
Arguments
INPUTS... OUTPUT
Optional argument(s)
rio rm
Delete a dataset.
Invoking the shell’s ‘$ rm <path>’ on a dataset can be used to delete a dataset referenced by a file path, but it won’t
handle deleting sidecar files. This command is aware of datasets and their sidecar files.
Options
--yes
Confirm delete without prompting
-f, --format, --driver <driver>
Explicitly delete with this driver rather than probing for the appropriate driver.
Arguments
PATH
Required argument
rio sample
By default, rio-sample will sample all bands. Optionally, bands may be specified using a simple syntax:
--bidx N samples the Nth band (first band is 1).
--bidx M,N,O samples bands M, N, and O.
Options
Arguments
rio shapes
Extracts shapes from one band or mask of a dataset and writes them out as GeoJSON. Unless otherwise specified, the
shapes will be transformed to WGS 84 coordinates.
The default action of this command is to extract shapes from the first band of the input dataset. The shapes are polygons
bounding contiguous regions (or features) of the same raster value. This command performs poorly for int16 or float
type datasets.
Bands other than the first can be specified using the --bidx option:
$ rio shapes --bidx 3 tests/data/RGB.byte.tif
The valid data footprint of a dataset’s i-th band can be extracted by using the --mask and --bidx options:
$ rio shapes --mask --bidx 1 tests/data/RGB.byte.tif
Omitting the --bidx option results in a footprint extracted from the conjunction of all band masks. This is generally
smaller than any individual band’s footprint.
A dataset band may be analyzed as though it were a binary mask with the --as-mask option:
$ rio shapes --as-mask --bidx 1 tests/data/RGB.byte.tif
Options
Arguments
INPUT
Required argument
rio stack
Stack a number of bands from one or more input files into a multiband dataset.
Input datasets must be of a kind: same data type, dimensions, etc. The output is cloned from the first input.
By default, rio-stack will take all bands from each input and write them in same order to the output. Optionally, bands
for each input may be specified using a simple syntax:
--bidx N takes the Nth band from the input (first band is 1).
--bidx M,N,O takes bands M, N, and O.
--bidx M..O takes bands M-O, inclusive.
--bidx ..N takes all bands up to and including N.
--bidx N.. takes all bands from N to the end.
Examples, using the Rasterio testing dataset, which produce a copy.
rio stack RGB.byte.tif -o stacked.tif
rio stack RGB.byte.tif --bidx 1,2,3 -o stacked.tif
rio stack RGB.byte.tif --bidx 1..3 -o stacked.tif
rio stack RGB.byte.tif --bidx ..2 RGB.byte.tif --bidx 3.. -o stacked.tif
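The band index syntax described above can be expanded as in the following sketch. The helper parse_bidx and its count argument (the number of bands in the input) are hypothetical and only illustrate the documented forms; this is not the command's own parser.

```python
def parse_bidx(spec, count):
    """Expand a band index expression into a list of 1-based band numbers.

    Hypothetical helper; `count` is the number of bands in the input.
    """
    if ".." in spec:
        start, stop = spec.split("..")
        first = int(start) if start else 1    # '..N' starts at band 1
        last = int(stop) if stop else count   # 'N..' runs to the last band
        return list(range(first, last + 1))
    return [int(part) for part in spec.split(",")]


# The three-band test dataset used in the examples above.
print(parse_bidx("1,2,3", 3))
print(parse_bidx("..2", 3) + parse_bidx("3..", 3))
```

Both calls yield bands 1, 2, and 3 in order, which is why each of the example commands produces a copy of the input.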
Options
Arguments
INPUTS... OUTPUT
Required argument(s)
rio transform
Options
Arguments
INPUT
Optional argument
rio warp
The destination’s coordinate reference system may be an authority name, PROJ4 string, JSON-encoded PROJ4, or
WKT.
--dst-crs EPSG:4326
--dst-crs '+proj=longlat +ellps=WGS84 +datum=WGS84'
--dst-crs '{"proj": "utm", "zone": 18, ...}'
If --dimensions are provided, --res and --bounds are not applicable and an exception will be raised. Resolution is calculated
based on the relationship between the raster bounds in the target coordinate system and the dimensions, and may
produce rectangular rather than square pixels.
If --bounds are provided, --res is required if --dst-crs is provided (defaults to source raster resolution otherwise).
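As a rough illustration of why fixed dimensions can yield rectangular pixels, the resolution is the extent along each axis divided by the pixel count along that axis. The bounds and dimensions below are hypothetical, and the arithmetic is a simplification of what the warper actually computes.

```python
# Hypothetical destination bounds (left, bottom, right, top) in the target CRS.
left, bottom, right, top = 0.0, 0.0, 1000.0, 500.0

# Hypothetical output dimensions (width, height) in pixels.
width, height = 100, 100

xres = (right - left) / width    # pixel width in CRS units
yres = (top - bottom) / height   # pixel height in CRS units
print(xres, yres)
```

Here the extent is twice as wide as it is tall but the dimensions are square, so each pixel is twice as wide as it is tall.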
Options
--resampling <resampling>
Resampling method.
Default
nearest
Options
nearest | bilinear | cubic | cubic_spline | lanczos | average | mode | max | min | med | q1 | q3 | sum
| rms
--src-nodata <src_nodata>
Manually override source nodata
--dst-nodata <dst_nodata>
Manually override destination nodata
--threads <threads>
Number of processing threads.
--check-invert-proj, --no-check-invert-proj
Constrain output to valid coordinate region in dst-crs
--target-aligned-pixels, --no-target-aligned-pixels
Align the output bounds based on the resolution.
--overwrite
Always overwrite an existing output file.
--co, --profile <NAME=VALUE>
Driver specific creation options. See the documentation for the selected output driver for more information.
--to, --wo, --transformer-option, --warper-option <NAME=VALUE>
GDAL warper and coordinate transformer options.
--dry-run
Do not create an output file, but report on its expected size and other characteristics.
Arguments
INPUTS... OUTPUT
Required argument(s)
6.1.2 Submodules
rasterio.control module
rasterio.coords module
left
Left coordinate
bottom
Bottom coordinate
right
Right coordinate
top
Top coordinate
bottom
Alias for field number 1
left
Alias for field number 0
right
Alias for field number 2
top
Alias for field number 3
rasterio.coords.disjoint_bounds(bounds1, bounds2)
Compare two bounds and determine if they are disjoint.
Parameters
• bounds1 (4-tuple) – rasterio bounds tuple (left, bottom, right, top)
• bounds2 (4-tuple) – rasterio bounds tuple
Returns
• boolean
• True if bounds are disjoint,
• False if bounds overlap
rasterio.crs module
Examples
data
A PROJ4 dict representation of the CRS.
static from_authority(auth_name, code)
Make a CRS from an authority name and code.
New in version 1.1.7.
Parameters
• auth_name (str) – The name of the authority.
• code (int or str) – The code used by the authority.
Return type
CRS
Raises
CRSError –
Notes
is_epsg_code
Test if the CRS is defined by an EPSG code.
Return type
bool
is_geographic
Test if the CRS is a geographic coordinate reference system.
Return type
bool
Raises
CRSError –
is_projected
Test if the CRS is a projected coordinate reference system.
Return type
bool
Raises
CRSError –
is_valid
Test that the CRS is a geographic or projected CRS.
Return type
bool
items(self )
linear_units
Get a short name for the linear units of the CRS.
Returns
units – “m”, “ft”, etc.
Return type
str
Raises
CRSError –
linear_units_factor
Get linear units and the conversion factor to meters of the CRS.
Returns
• units (str) – “m”, “ft”, etc.
• factor (float) – Ratio of one unit to one meter.
Raises
CRSError –
to_authority(self , confidence_threshold=70)
Convert to the best match authority name and code.
For a CRS created using an EPSG code, that same value is returned. For other CRS, including custom
CRS, an attempt is made to match it to definitions in authority files. Matches with a confidence below the
threshold are discarded.
Parameters
confidence_threshold (int) – Percent match confidence threshold (0-100).
Returns
• name (str) – Authority name.
• code (str) – Code from the authority file.
• or None
to_dict(self , projjson=False)
Convert CRS to a PROJ dict.
Note: If there is a corresponding EPSG code, it will be used when returning PROJ parameter dict.
Parameters
confidence_threshold (int) – Percent match confidence threshold (0-100).
Return type
int or None
Raises
CRSError –
to_proj4(self )
Convert to a PROJ4 representation.
Return type
str
to_string(self )
Convert to a PROJ4 or WKT string.
The output will be reduced as much as possible by attempting a match to CRS defined in authority files.
Notes
Mapping keys are tested against the all_proj_keys list. Values of True are omitted, leaving the key bare:
{‘no_defs’: True} -> “+no_defs” and items where the value is otherwise not a str, int, or float are omitted.
Return type
str
Raises
CRSError –
to_wkt(self , morph_to_esri_dialect=False, version=None)
Convert to an OGC WKT representation.
New in version 1.3.0: version
Parameters
• morph_to_esri_dialect (bool, optional) – Whether or not to morph to the Esri
dialect of WKT. Only applies to GDAL versions < 3. This parameter will be removed in a
future version of rasterio.
• version (WktVersion or str, optional) – The version of the WKT output. Only
works with GDAL 3+. Default is WKT1_GDAL.
Return type
str
Raises
CRSError –
units_factor
Get units and the conversion factor of the CRS.
Returns
• units (str) – “m”, “ft”, etc.
• factor (float) – Ratio of one unit to one radian if the CRS is geographic otherwise, it is to
one meter.
Raises
CRSError –
wkt
An OGC WKT representation of the CRS
Return type
str
rasterio.crs.epsg_treats_as_latlong(input_crs)
Test if the CRS is in latlon order
From GDAL docs:
> This method returns TRUE if EPSG feels this geographic coordinate system should be treated as having lat/long
coordinate ordering.
> Currently this returns TRUE for all geographic coordinate systems with an EPSG code set, and axes set defining
it as lat, long.
> FALSE will be returned for all coordinate systems that are not geographic, or that do not have an EPSG code
set.
> Note
> Important change of behavior since GDAL 3.0. In previous versions, geographic CRS imported with
importFromEPSG() would cause this method to return FALSE on them, whereas now it returns TRUE, since
importFromEPSG() is now equivalent to importFromEPSGA().
Parameters
input_crs (CRS) – Coordinate reference system, as a rasterio CRS object. Example:
CRS({'init': 'EPSG:4326'})
Return type
bool
rasterio.crs.epsg_treats_as_northingeasting(input_crs)
Test if the CRS should be treated as having northing/easting coordinate ordering
From GDAL docs:
> This method returns TRUE if EPSG feels this projected coordinate system should be treated as having
northing/easting coordinate ordering.
> Currently this returns TRUE for all projected coordinate systems with an EPSG code set, and axes set defining
it as northing, easting.
> FALSE will be returned for all coordinate systems that are not projected, or that do not have an EPSG code set.
> Note
> Important change of behavior since GDAL 3.0. In previous versions, projected CRS with northing, easting
axis order imported with importFromEPSG() would cause this method to return FALSE on them, whereas now
it returns TRUE, since importFromEPSG() is now equivalent to importFromEPSGA().
Parameters
input_crs (CRS) – Coordinate reference system, as a rasterio CRS object. Example:
CRS({'init': 'EPSG:4326'})
Return type
bool
rasterio.drivers module
Returns
Map of extensions to the driver.
Return type
dict
rasterio.dtypes module
Parameters
values (list-like) –
Return type
rasterio dtype string
rasterio.dtypes.in_dtype_range(value, dtype)
Test if the value is within the dtype’s range of values, NaN, or Inf.
rasterio.dtypes.is_ndarray(array)
Check if array is a ndarray.
rasterio.dtypes.validate_dtype(values, valid_dtypes)
Test if dtype of values is one of valid_dtypes.
Parameters
• values (list-like) –
• valid_dtypes (list-like) – list of valid dtype strings, e.g., (‘int16’, ‘int32’)
Returns
True if dtype of values is one of valid_dtypes
Return type
boolean
rasterio.enums module
Enumerations.
class rasterio.enums.ColorInterp(value, names=None, *, module=None, qualname=None, type=None,
start=1, boundary=None)
Bases: IntEnum
Raster band color interpretation.
Cb = 15
Cr = 16
Y = 14
alpha = 6
black = 13
blue = 5
cyan = 10
gray = 1
green = 4
grey = 1
hue = 7
lightness = 9
magenta = 11
palette = 2
red = 3
saturation = 8
undefined = 0
yellow = 12
ccittfax4 = 'CCITTFAX4'
ccittrle = 'CCITTRLE'
deflate = 'DEFLATE'
jpeg = 'JPEG'
jpeg2000 = 'JPEG2000'
lerc = 'LERC'
lerc_deflate = 'LERC_DEFLATE'
lerc_zstd = 'LERC_ZSTD'
lzma = 'LZMA'
lzw = 'LZW'
none = 'NONE'
packbits = 'PACKBITS'
webp = 'WEBP'
zstd = 'ZSTD'
line = 'LINE'
pixel = 'PIXEL'
all_valid = 1
alpha = 4
nodata = 8
per_dataset = 2
replace = 'REPLACE'
cielab = 'CIELAB'
cmyk = 'CMYK'
icclab = 'ICCLAB'
itulab = 'ITULAB'
rgb = 'RGB'
white = 'MINISWHITE'
ycbcr = 'YCbCr'
average
Average resampling, computes the weighted average of all non-NODATA contributing pixels.
mode
Mode resampling, selects the value which appears most often of all the sampled points.
gauss
Gaussian resampling. Note: not available to the functions in rio.warp.
max
Maximum resampling, selects the maximum value from all non-NODATA contributing pixels. (GDAL >=
2.0)
min
Minimum resampling, selects the minimum value from all non-NODATA contributing pixels. (GDAL >=
2.0)
med
Median resampling, selects the median value of all non-NODATA contributing pixels. (GDAL >= 2.0)
q1
Q1, first quartile resampling, selects the first quartile value of all non-NODATA contributing pixels. (GDAL
>= 2.0)
q3
Q3, third quartile resampling, selects the third quartile value of all non-NODATA contributing pixels.
(GDAL >= 2.0)
sum
Sum, compute the weighted sum of all non-NODATA contributing pixels. (GDAL >= 3.1)
rms
RMS, root mean square / quadratic mean of all non-NODATA contributing pixels. (GDAL >= 3.3)
Notes
The first 8, ‘nearest’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘lanczos’, ‘average’, ‘mode’, and ‘gauss’, are available for
making dataset overviews.
‘max’, ‘min’, ‘med’, ‘q1’, ‘q3’ are only supported in GDAL >= 2.0.0.
‘nearest’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘lanczos’, ‘average’, ‘mode’ are always available (GDAL >= 1.10).
‘sum’ is only supported in GDAL >= 3.1.
‘rms’ is only supported in GDAL >= 3.3.
Note: ‘gauss’ is not available to the functions in rio.warp.
average = 5
bilinear = 1
cubic = 2
cubic_spline = 3
gauss = 7
lanczos = 4
max = 8
med = 10
min = 9
mode = 6
nearest = 0
q1 = 11
q3 = 12
rms = 14
sum = 13
Notes
The convention for transform direction for RPC based coordinate transform is typically the opposite of what is
previously described. For consistency all coordinate transforms methods use the same convention.
forward = 1
reverse = 0
gcps = 'gcps'
rpcs = 'rpcs'
WKT1_ESRI = 'WKT1_ESRI'
WKT Version 1 ESRI Style
WKT1_GDAL = 'WKT1_GDAL'
WKT Version 1 GDAL Style
WKT2 = 'WKT2'
Alias for latest WKT Version 2
WKT2_2015 = 'WKT2_2015'
WKT Version 2 from 2015
WKT2_2019 = 'WKT2_2018'
WKT Version 2 from 2019
rasterio.env module
Example
credentialize()
Get credentials and configure GDAL
Note well: this method is a no-op if the GDAL environment already has credentials, unless session is not
None.
Return type
None
classmethod default_options()
Default configuration options
Parameters
None –
Return type
dict
drivers()
Return a mapping of registered drivers.
classmethod from_defaults(*args, **kwargs)
Create an environment with default config options
Parameters
• args (optional) – Positional arguments for Env()
• kwargs (optional) – Keyword arguments for Env()
Return type
Env
Notes
major
minor
classmethod parse(input)
Parses input tuple or string to GDALVersion. If input is a GDALVersion instance, it is returned.
Parameters
input (tuple of (major, minor), string, or instance of GDALVersion) –
Return type
GDALVersion instance
classmethod runtime()
Return GDALVersion of current GDAL runtime
class rasterio.env.NullContextManager
Bases: object
class rasterio.env.ThreadEnv
Bases: _local
rasterio.env.defenv(**options)
Create a default environment if necessary.
rasterio.env.delenv()
Delete options in the existing environment.
rasterio.env.ensure_env(f )
A decorator that ensures an env exists before a function calls any GDAL C functions.
rasterio.env.ensure_env_credentialled(f )
DEPRECATED alias for ensure_env_with_credentials
rasterio.env.ensure_env_with_credentials(f )
Ensures a config environment exists and is credentialized
Parameters
f (function) – A function.
Return type
A function wrapper.
Notes
The function wrapper checks the first argument of f and credentializes the environment if the first argument is a
URI with scheme “s3”.
rasterio.env.env_ctx_if_needed()
Return an Env if one does not exist
Return type
Env or a do-nothing context manager
rasterio.env.getenv()
Get a mapping of current options.
rasterio.env.hascreds()
rasterio.env.hasenv()
Examples
@require_gdal_version('2.2')
def some_func():
Calling some_func with a runtime version of GDAL that is < 2.2 raises a GDALVersionError.
@require_gdal_version('2.2', param='foo')
def some_func(foo='bar'):
calling some_func with parameter foo of any value on GDAL < 2.2 raises a GDALVersionError.
calling some_func with parameter foo and value bar on GDAL < 2.2 raises a GDALVersionError.
Parameters
• version (tuple, string, or GDALVersion) –
• param (string (optional, default: None)) – If values are absent, then all use of
this parameter with a value other than default value requires at least GDAL version.
• values (tuple, list, or set (optional, default: None)) – contains values
that require at least GDAL version. param is required for values.
• is_max_version (bool (optional, default: False)) – if True indicates that the
version provided is the maximum version allowed, instead of requiring at least that version.
• reason (string (optional: default: '')) – custom error message presented to user
in addition to message about GDAL version. Use this to explain what changed when the
user needs that context.
Return type
wrapped function
rasterio.env.setenv(**options)
Set options in the existing environment.
rasterio.errors module
exception rasterio.errors.DriverRegistrationError
Bases: ValueError
Raised when a format driver is requested but is not registered.
exception rasterio.errors.EnvError
Bases: RasterioError
Raised when the state of GDAL/AWS environment cannot be created or modified.
exception rasterio.errors.FileOverwriteError(message)
Bases: FileError
Raised when Rasterio’s CLI refuses to clobber output files.
exception rasterio.errors.GDALBehaviorChangeException
Bases: RuntimeError
Raised when GDAL’s behavior differs from the given arguments. For example, antimeridian cutting is always on
as of GDAL 2.2.0. Users expecting it to be off will be presented with a MultiPolygon when the rest of their code
expects a Polygon.
Examples
exception rasterio.errors.GDALOptionNotImplementedError
Bases: RasterioError
A dataset opening or dataset creation option can’t be supported
This will be raised from Rasterio’s shim modules. For example, when a user passes arguments to open_dataset()
that can’t be evaluated by GDAL 1.x.
exception rasterio.errors.GDALVersionError
Bases: RasterioError
Raised if the runtime version of GDAL does not meet the required version of GDAL.
exception rasterio.errors.InvalidArrayError
Bases: RasterioError
Raised when methods are passed invalid arrays
exception rasterio.errors.NodataShadowWarning
Bases: UserWarning
Warn that a dataset’s nodata attribute is shadowing its alpha band.
exception rasterio.errors.NotGeoreferencedWarning
Bases: UserWarning
Warn that a dataset isn’t georeferenced.
exception rasterio.errors.OverviewCreationError
Bases: RasterioError
Raised when creation of an overview fails
exception rasterio.errors.PathError
Bases: RasterioError
Raised when a dataset path is malformed or invalid
exception rasterio.errors.RPCError
Bases: ValueError
Raised when RPC transformation is invalid
exception rasterio.errors.RasterBlockError
Bases: RasterioError
Raised when raster block access fails
exception rasterio.errors.RasterioDeprecationWarning
Bases: FutureWarning
Rasterio module deprecations
Following https://www.python.org/dev/peps/pep-0565/#additional-use-case-for-futurewarning we base this on
FutureWarning while continuing to support Python < 3.7.
exception rasterio.errors.RasterioError
Bases: Exception
Root exception class
exception rasterio.errors.RasterioIOError
Bases: OSError
Raised when a dataset cannot be opened using one of the registered format drivers.
exception rasterio.errors.ResamplingAlgorithmError
Bases: RasterioError
Raised when a resampling algorithm is invalid or inapplicable
exception rasterio.errors.ShapeSkipWarning
Bases: UserWarning
Warn that an invalid or empty shape in a collection has been skipped
exception rasterio.errors.StatisticsError
Bases: RasterioError
Raised when dataset statistics cannot be computed.
exception rasterio.errors.TransformError
Bases: RasterioError
Raised when transform arguments are invalid
exception rasterio.errors.TransformWarning
Bases: UserWarning
Warn that coordinate transformations may behave unexpectedly
exception rasterio.errors.UnsupportedOperation
Bases: RasterioError
Raised when reading from a file opened in ‘w’ mode
exception rasterio.errors.WarpOperationError
Bases: RasterioError
Raised when a warp operation fails.
exception rasterio.errors.WarpOptionsError
Bases: RasterioError
Raised when options for a warp operation are invalid
exception rasterio.errors.WarpedVRTError
Bases: RasterioError
Raised when WarpedVRT can’t be initialized
exception rasterio.errors.WindowError
Bases: RasterioError
Raised when errors occur during window operations
exception rasterio.errors.WindowEvaluationError
Bases: ValueError
Raised when window evaluation fails
rasterio.features module
Notes
• north_up (optional) – This parameter is ignored since version 1.2.1. A deprecation warning
will be emitted in 1.3.0.
• rotated (optional) – This parameter is ignored since version 1.2.1. A deprecation warning
will be emitted in 1.3.0.
• pixel_precision (int or float, optional) – Number of places of rounding precision
or absolute precision for evaluating bounds of shapes.
• boundless (bool, optional) – Whether to allow a boundless window or not.
Return type
rasterio.windows.Window
rasterio.features.is_valid_geom(geom)
Checks to see if geometry is a valid GeoJSON geometry type or GeometryCollection. Geometry must be GeoJSON
or implement the geo interface.
Geometries must be non-empty, and have at least x, y coordinates.
Note: only the first coordinate is checked for validity.
Parameters
geom (an object that implements the geo interface or GeoJSON-like object)
–
Returns
True if object is a valid GeoJSON geometry type
Return type
bool
rasterio.features.rasterize(shapes, out_shape=None, fill=0, out=None, transform=(1.0, 0.0, 0.0, 0.0, 1.0,
0.0, 0.0, 0.0, 1.0), all_touched=False, merge_alg=MergeAlg.replace,
default_value=1, dtype=None)
Return an image array with input geometries burned in.
Warnings will be raised for any invalid or empty geometries, and an exception will be raised if there are no valid
shapes to rasterize.
Parameters
• shapes (iterable of (geometry, value) pairs or geometries) – The geometry can either be an
object that implements the geo interface or GeoJSON-like object. If no value is provided the
default_value will be used. If value is None the fill value will be used.
• out_shape (tuple or list with 2 integers) – Shape of output numpy.ndarray.
• fill (int or float, optional) – Used as fill value for all areas not covered by input
geometries.
• out (numpy.ndarray, optional) – Array in which to store results. If not provided,
out_shape and dtype are required.
• transform (Affine transformation object, optional) – Transformation from
pixel coordinates of source to the coordinate system of the input shapes. See the transform
property of dataset objects.
• all_touched (boolean, optional) – If True, all pixels touched by geometries will be
burned in. If False, only pixels whose center is within the polygon or that are selected by
Bresenham’s line algorithm will be burned in.
• merge_alg (MergeAlg, optional) –
Notes
Valid data types for fill, default_value, out, dtype and shape values are “int16”, “int32”, “uint8”, “uint16”,
“uint32”, “float32”, and “float64”.
This function requires significant memory resources. The shapes iterator will be materialized to a Python list and
another C copy of that list will be made. The out array will be copied and additional temporary raster memory
equal to 2x the smaller of out data or GDAL’s max cache size (controlled by GDAL_CACHEMAX, default is
5% of the computer’s physical memory) is required.
If GDAL max cache size is smaller than the output data, the array of shapes will be iterated multiple times.
Performance is thus a linear function of buffer size. For maximum speed, ensure that GDAL_CACHEMAX is
larger than the size of out or out_shape.
rasterio.features.shapes(source, mask=None, connectivity=4, transform=(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
1.0))
Get shapes and values of connected regions in a dataset or array.
Parameters
• source (numpy.ndarray, dataset object, Band, or tuple(dataset, bidx)) –
Data type must be one of rasterio.int16, rasterio.int32, rasterio.uint8, rasterio.uint16, or
rasterio.float32.
• mask (numpy.ndarray or rasterio Band object, optional) – Must evaluate to
bool (rasterio.bool_ or rasterio.uint8). Values of False or 0 will be excluded from feature
generation. Note well that this is the inverse sense from Numpy’s, where a mask value of
True indicates invalid data in an array. If source is a numpy.ma.MaskedArray and mask is
None, the source’s mask will be inverted and used in place of mask.
• connectivity (int, optional) – Use 4 or 8 pixel connectivity for grouping pixels into
features
• transform (Affine transformation, optional) – If not provided, feature coordi-
nates will be generated based on pixel coordinates
Yields
polygon, value – A pair of (polygon, value) for each feature found in the image. Polygons are
GeoJSON-like dicts and the values are the associated value from the image, in the data type of
the image. Note: due to floating point precision issues, values returned from a floating point
image may not exactly match the original values.
Notes
The amount of memory used by this algorithm is proportional to the number and complexity of polygons produced.
This algorithm is most appropriate for simple thematic data. Data with high pixel-to-pixel variability,
such as imagery, may produce one polygon per pixel and consume large amounts of memory.
Because the low-level implementation uses either an int32 or float32 buffer, uint32 and float64 data cannot be
operated on without truncation issues.
rasterio.features.sieve(source, size, out=None, mask=None, connectivity=4)
Remove small polygon regions from a raster.
Polygons are found for each set of neighboring pixels of the same value.
Parameters
• source (ndarray, dataset, or Band) – The source is a 2 or 3-D ndarray, a dataset
opened in “r” mode, or a single-band or multi-band Rasterio Band object. Must be of type
rasterio.int16, rasterio.int32, rasterio.uint8, rasterio.uint16, or rasterio.float32.
• size (int) – Minimum polygon size (number of pixels) to retain.
• out (numpy ndarray, optional) – Array of same shape and data type as source in which
to store results.
• mask (numpy ndarray or rasterio Band object, optional) – Values of False or
0 will be excluded from feature generation. Must evaluate to bool (rasterio.bool_ or
rasterio.uint8).
• connectivity (int, optional) – Use 4 or 8 pixel connectivity for grouping pixels into
features.
Returns
out – Result
Return type
numpy.ndarray
Notes
GDAL only supports values that can be cast to 32-bit integers for this operation.
The amount of memory used by this algorithm is proportional to the number and complexity of polygons found in
the image. This algorithm is most appropriate for simple thematic data. Data with high pixel-to-pixel variability,
such as imagery, may produce one polygon per pixel and consume large amounts of memory.
rasterio.fill module
rasterio.io module
block_size(bidx, i, j)
Returns the size in bytes of a particular block
Only useful for TIFF formatted datasets.
Parameters
• bidx (int) – Band index, starting with 1.
• i (int) – Row index of the block, starting with 0.
• j (int) – Column index of the block, starting with 0.
Return type
int
block_window(bidx, i, j)
Returns the window for a particular block
Parameters
• bidx (int) – Band index, starting with 1.
• i (int) – Row index of the block, starting with 0.
• j (int) – Column index of the block, starting with 0.
Return type
Window
block_windows(bidx=0)
Iterator over a band’s blocks and their windows
The primary use of this method is to obtain windows to pass to read() for highly efficient access to raster
block data.
The positional parameter bidx takes the index (starting at 1) of the desired band. This iterator yields blocks
“left to right” and “top to bottom” and is similar to Python’s enumerate() in that the first element is the
block index and the second is the dataset window.
Blocks are built-in to a dataset and describe how pixels are grouped within each band and provide a
mechanism for efficient I/O. A window is a range of pixels within a single band defined by row start, row stop,
column start, and column stop. For example, ((0, 2), (0, 2)) defines a 2 x 2 window at the upper
left corner of a raster band. Blocks are referenced by an (i, j) tuple where (0, 0) would be a band’s
upper left block.
Raster I/O is performed at the block level, so accessing a window spanning multiple rows in a striped raster
requires reading each row. Accessing a 2 x 2 window at the center of a 1800 x 3600 image requires
reading 2 rows, or 7200 pixels, just to get the target 4. The same image with internal 256 x 256 blocks
would require reading at least 1 block (if the entire window falls within a single block) and at most
4 blocks, or at least 65,536 pixels and at most 262,144.
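The block arithmetic above can be worked through in a few lines. This is a minimal sketch under stated assumptions: blocks_touched is a hypothetical helper (not a rasterio function), windows are ((row_start, row_stop), (col_start, col_stop)) tuples with exclusive stops, and a striped raster is modeled as blocks one row high and as wide as the image.

```python
def blocks_touched(window, block_shape):
    """Return (i, j) indexes of the blocks that a window intersects.

    Hypothetical helper for illustration; stops are exclusive.
    """
    (row_start, row_stop), (col_start, col_stop) = window
    block_height, block_width = block_shape
    return [
        (i, j)
        for i in range(row_start // block_height, (row_stop - 1) // block_height + 1)
        for j in range(col_start // block_width, (col_stop - 1) // block_width + 1)
    ]


# A 2 x 2 window at the center of an image with 1800 rows and 3600 columns.
center = ((899, 901), (1799, 1801))

# Striped layout: each block is one full row of 3600 pixels.
striped = blocks_touched(center, (1, 3600))
striped_pixels = len(striped) * 1 * 3600

# Tiled layout: 256 x 256 blocks.
tiled = blocks_touched(center, (256, 256))
tiled_pixels = len(tiled) * 256 * 256

# A window that straddles a block corner touches four blocks.
corner = blocks_touched(((255, 257), (255, 257)), (256, 256))
```

The count of pixels read is simply the number of intersected blocks multiplied by the block size, which is why block shape matters so much for small windowed reads.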
Given an image that is 512 x 512 with blocks that are 256 x 256, its blocks and windows would look
like:
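Blocks:
        0       256     512
      0 +--------+--------+
        |        |        |
        | (0, 0) | (0, 1) |
        |        |        |
    256 +--------+--------+
        |        |        |
        | (1, 0) | (1, 1) |
        |        |        |
    512 +--------+--------+
Windows:
    UL: ((0, 256), (0, 256))
    UR: ((0, 256), (256, 512))
    LL: ((256, 512), (0, 256))
    LR: ((256, 512), (256, 512))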
Parameters
bidx (int, optional) – The band index (using 1-based indexing) from which to extract
windows. A value less than 1 uses the first band if all bands have homogeneous windows and
raises an exception otherwise.
Yields
block, window
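The (block, window) pairs described above can be reproduced with plain arithmetic. The following is a minimal sketch, assuming a regular block grid; iter_block_windows is a hypothetical stand-in for illustration, not the method itself, and clipping edge blocks to the image extent is an assumption of the sketch.

```python
def iter_block_windows(height, width, block_height, block_width):
    """Yield ((i, j), window) pairs, left to right and top to bottom.

    Each window is ((row_start, row_stop), (col_start, col_stop)).
    Hypothetical helper; edge blocks are clipped to the image extent.
    """
    for i, row_start in enumerate(range(0, height, block_height)):
        row_stop = min(row_start + block_height, height)
        for j, col_start in enumerate(range(0, width, block_width)):
            col_stop = min(col_start + block_width, width)
            yield (i, j), ((row_start, row_stop), (col_start, col_stop))


# The 512 x 512 image with 256 x 256 blocks discussed above.
pairs = list(iter_block_windows(512, 512, 256, 256))

# An image whose size is not a multiple of the block size.
ragged = list(iter_block_windows(300, 300, 256, 256))
```

With a real dataset, each yielded window would be passed to read() so that every read lines up with exactly one block.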
bounds
Returns the lower left and upper right bounds of the dataset in the units of its coordinate reference system.
The returned value is a tuple: (lower left x, lower left y, upper right x, upper right y)
build_overviews(factors, resampling=Resampling.nearest)
Build overviews at one or more decimation factors for all bands of the dataset.
checksum(bidx, window=None)
Compute an integer checksum for the stored band
Parameters
• bidx (int) – The band’s index (1-indexed).
• window (tuple, optional) – A window of the band. Default is the entire extent of the
band.
Return type
int
close()
Close the dataset and unwind attached exit stack.
closed
Test if the dataset is closed
Return type
bool
colorinterp
A sequence of ColorInterp.<enum> in band order.
Return type
tuple
colormap(bidx)
Returns a dict containing the colormap for a band.
Parameters
bidx (int) – Index of the band whose colormap will be returned. Band index starts at 1.
Returns
Mapping of color index value (starting at 0) to RGBA color as a 4-element tuple.
Return type
dict
Raises
• ValueError – If no colormap is found for the specified band (NULL color table).
• IndexError – If no band exists for the provided index.
compression
count
The number of raster bands in the dataset
Return type
int
crs
The dataset’s coordinate reference system
In setting this property, the value may be a CRS object or an EPSG:nnnn or WKT string.
Return type
CRS
dataset_mask(out=None, out_shape=None, window=None, boundless=False,
resampling=Resampling.nearest)
Get the dataset’s 2D valid data mask.
Parameters
• out (numpy.ndarray, optional) – As with Numpy ufuncs, this is an optional reference
to an output array with the same dimensions and shape into which data will be placed.
Note: the method’s return value may be a view on this array. In other words, out is likely
to be an incomplete representation of the method’s results.
Cannot be combined with out_shape.
• out_shape (tuple, optional) – A tuple describing the output array’s shape. Allows
for decimated reads without constructing an output Numpy array.
Cannot be combined with out.
• window (a pair (tuple) of pairs of ints or Window, optional) – The optional
window argument is a 2 item tuple. The first item is a tuple containing the indexes
of the rows at which the window starts and stops and the second is a tuple containing the
indexes of the columns at which the window starts and stops. For example, ((0, 2), (0, 2))
defines a 2x2 window at the upper left of the raster dataset.
• boundless (bool, optional (default False)) – If True, windows that extend beyond the
dataset’s extent are permitted and partially or completely filled arrays will be returned as
appropriate.
• resampling (Resampling) – By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other resampling algorithms may be
specified. Resampled pixels are not cached.
Returns
The dtype of this array is uint8. 0 = nodata, 255 = valid data.
Return type
Numpy ndarray or a view on a Numpy ndarray
Notes
Note: as with Numpy ufuncs, an object is returned even if you use the optional out argument and the return
value shall be preferentially used by callers.
The dataset mask is calculated based on the individual band masks according to the following logic, in
order of precedence:
1. If a .msk file, dataset-wide alpha, or internal mask exists it will be used for the dataset mask.
2. Else if the dataset is a 4-band with a shadow nodata value, band 4 will be used as the dataset mask.
3. If a nodata value exists, use the binary OR (|) of the band masks.
4. If no nodata value exists, return a mask filled with 255.
Note that this differs from read_masks and GDAL RFC15 in that it applies per-dataset, not per-band (see
https://trac.osgeo.org/gdal/wiki/rfc15_nodatabitmask)
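Step 3 of the logic above, the binary OR of the band masks, can be sketched with nested lists standing in for uint8 arrays. The helper combine_band_masks is hypothetical and only illustrates the rule; it is not how rasterio computes the mask internally.

```python
def combine_band_masks(band_masks):
    """OR per-band masks (0 = nodata, 255 = valid) into one 2D mask.

    Hypothetical helper for illustration only.
    """
    rows = len(band_masks[0])
    cols = len(band_masks[0][0])
    combined = [[0] * cols for _ in range(rows)]
    for mask in band_masks:
        for r in range(rows):
            for c in range(cols):
                # A pixel is valid in the dataset mask if any band is valid.
                combined[r][c] |= mask[r][c]
    return combined


band1 = [[0, 255], [0, 0]]
band2 = [[0, 0], [255, 0]]
band3 = [[0, 0], [0, 0]]
dataset_mask = combine_band_masks([band1, band2, band3])
```

Only the pixel that is nodata in every band stays 0 in the combined mask.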
descriptions
Descriptions for each dataset band
To set descriptions, one for each band is required.
Return type
tuple[str | None, . . . ]
driver
dtypes
The data types of each band in index order
Return type
list of str
files
Returns a sequence of files associated with the dataset.
Return type
tuple
gcps
Ground control points and their coordinate reference system.
This property is a 2-tuple, or pair: (gcps, crs).
• gcps (list of GroundControlPoint) – Zero or more ground control points.
• crs (CRS) – The coordinate reference system of the ground control points.
get_gcps()
Get GCPs and their associated CRS.
get_nodatavals()
interleaving
is_tiled
mask_flag_enums
Sets of flags describing the sources of band masks.
Examples
For a 3 band dataset that has masks derived from nodata values:
>>> dataset.mask_flag_enums
([<MaskFlags.nodata: 8>], [<MaskFlags.nodata: 8>], [<MaskFlags.nodata: 8>])
>>> band1_flags = dataset.mask_flag_enums[0]
>>> rasterio.enums.MaskFlags.nodata in band1_flags
True
>>> rasterio.enums.MaskFlags.alpha in band1_flags
False
meta
The basic metadata of this dataset.
mode
name
nodata
The dataset’s single nodata value
Notes
May be set.
Return type
float
nodatavals
Nodata values for each band
overviews(bidx)
photometric
profile
Basic metadata and creation options of this dataset.
May be passed as keyword arguments to rasterio.open() to create a clone of this dataset.
read(indexes=None, out=None, window=None, masked=False, out_shape=None, boundless=False,
resampling=Resampling.nearest, fill_value=None, out_dtype=None)
Read band data and, optionally, mask as an array.
A smaller (or larger) region of the dataset may be specified and it may be resampled and/or converted to a
different data type.
Parameters
• indexes (int or list, optional) – If indexes is a list, the result is a 3D array; if it is a
single band index number, the result is a 2D array.
• out (numpy.ndarray, optional) – As with Numpy ufuncs, this is an optional reference
to an output array into which data will be placed. If the height and width of out differ
from that of the specified window (see below), the raster image will be decimated or replicated
using the specified resampling method (also see below). This parameter cannot be
combined with out_shape.
Note: the method’s return value may be a view on this array. In other words, out is likely
to be an incomplete representation of the method’s results.
• out_dtype (str or numpy.dtype) – The desired output data type. For example: ‘uint8’
or rasterio.uint16.
• out_shape (tuple, optional) – A tuple describing the shape of a new output array.
See out (above) for notes on image decimation and replication. This parameter cannot be
combined with out.
• window (Window, optional) – The region (slice) of the dataset from which data will be
read. The default is the entire dataset.
• masked (bool, optional) – If masked is True the return value will be a masked array.
Otherwise (the default) the return value will be a regular array. Masks will be exactly the
inverse of the GDAL RFC 15 conforming arrays returned by read_masks().
• boundless (bool, optional (default False)) – If True, windows that extend beyond the
dataset’s extent are permitted and partially or completely filled arrays will be returned as
appropriate.
• resampling (Resampling) – By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other resampling algorithms may be
specified. Resampled pixels are not cached.
• fill_value (scalar) – Fill value applied in the boundless=True case only. Like the
fill_value of numpy.ma.MaskedArray, it should be a value valid for the dataset’s data type.
Return type
Numpy ndarray or a view on a Numpy ndarray
Raises
RasterioIOError – If the read fails.
Notes
This data is read from the dataset’s band cache, which means that repeated reads of the same windows may
avoid I/O.
As with Numpy ufuncs, an object is returned even if you use the optional out argument and the return value
shall be preferentially used by callers.
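The masked parameter above states that masked-array masks are the exact inverse of the arrays returned by read_masks(). A minimal sketch of that relationship, using nested lists in place of Numpy arrays; invert_gdal_mask is a hypothetical helper and not part of rasterio.

```python
def invert_gdal_mask(gdal_mask):
    """Convert a GDAL-style mask (0 = invalid, 255 = valid) to the
    Numpy masked-array sense (True = invalid, False = valid).

    Hypothetical helper for illustration only.
    """
    return [[value == 0 for value in row] for row in gdal_mask]


gdal_style = [
    [255, 255, 0],
    [0, 255, 255],
]
numpy_style = invert_gdal_mask(gdal_style)
```

The same inversion of sense applies wherever this reference contrasts GDAL masks (nonzero means valid) with Numpy masks (True means invalid).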
read_crs()
Return the GDAL dataset’s stored CRS
read_masks(indexes=None, out=None, out_shape=None, window=None, boundless=False,
resampling=Resampling.nearest)
Read band masks as an array.
A smaller (or larger) region of the dataset may be specified and it may be resampled and/or converted to a
different data type.
Parameters
• indexes (int or list, optional) – If indexes is a list, the result is a 3D array; if it is a
single band index number, the result is a 2D array.
• out (numpy.ndarray, optional) – As with Numpy ufuncs, this is an optional reference
to an output array into which data will be placed. If the height and width of out differ
from that of the specified window (see below), the raster image will be decimated or replicated
using the specified resampling method (also see below). This parameter cannot be
combined with out_shape.
Note: the method’s return value may be a view on this array. In other words, out is likely
to be an incomplete representation of the method’s results.
• out_shape (tuple, optional) – A tuple describing the shape of a new output array.
See out (above) for notes on image decimation and replication. This parameter cannot be
combined with out.
• window (Window, optional) – The region (slice) of the dataset from which data will be
read. The default is the entire dataset.
• boundless (bool, optional (default False)) – If True, windows that extend beyond the
dataset’s extent are permitted and partially or completely filled arrays will be returned as
appropriate.
• resampling (Resampling) – By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other resampling algorithms may be
specified. Resampled pixels are not cached.
Return type
Numpy ndarray or a view on a Numpy ndarray
Raises
RasterioIOError – If the read fails.
Notes
This data is read from the dataset’s band cache, which means that repeated reads of the same windows may
avoid I/O.
As with Numpy ufuncs, an object is returned even if you use the optional out argument and the return value
shall be preferentially used by callers.
read_transform()
Return the stored GDAL GeoTransform
res
Returns the (width, height) of pixels in the units of its coordinate reference system.
rpcs
Rational polynomial coefficients mapping between pixel and geodetic coordinates.
This property is a dict-like object.
rpcs : RPC instance containing coefficients. Empty if dataset does not have any metadata in the “RPC”
domain.
sample(xy, indexes=None, masked=False)
Get the values of a dataset at certain positions
Values are from the nearest pixel. They are not interpolated.
Parameters
• xy (iterable) – Pairs of x, y coordinates (floats) in the dataset’s reference system.
• indexes (int or list of int) – Indexes of dataset bands to sample.
• masked (bool, default: False) – Whether to mask samples that fall outside the
extent of the dataset.
Returns
Arrays of length equal to the number of specified indexes containing the dataset values for
the bands corresponding to those indexes.
Return type
iterable
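Nearest-pixel sampling amounts to inverting the georeferencing transform and indexing the pixel that contains each point. The sketch below is illustrative only and makes several assumptions: a north-up raster with no rotation, a transform given as the six coefficients (a, b, c, d, e, f), a single band held as a nested list, and None returned for points outside the extent (a choice of this sketch, not necessarily the behavior of sample()). The name sample_nearest is hypothetical.

```python
import math


def sample_nearest(band, transform, xy):
    """Yield the value of the pixel containing each (x, y) point.

    Hypothetical helper; assumes a north-up transform (b == d == 0).
    """
    a, _, c, _, e, f = transform
    height, width = len(band), len(band[0])
    for x, y in xy:
        # Invert x = a * col + c and y = e * row + f, then take the floor
        # to find the pixel whose footprint contains the point.
        col = math.floor((x - c) / a)
        row = math.floor((y - f) / e)
        if 0 <= row < height and 0 <= col < width:
            yield band[row][col]
        else:
            yield None


# 2 x 2 band with 10-unit pixels and its upper left corner at (0, 20).
band = [[1, 2], [3, 4]]
transform = (10.0, 0.0, 0.0, 0.0, -10.0, 20.0)
values = list(sample_nearest(band, transform, [(5.0, 15.0), (15.0, 5.0), (25.0, 5.0)]))
```

No interpolation takes place: each point simply receives the value of the pixel it falls in.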
scales
Raster scale for each dataset band
To set scales, one for each band is required.
Return type
list of float
set_band_description(bidx, value)
Sets the description of a dataset band.
Parameters
• bidx (int) – Index of the band (starting with 1).
• value (string) – A description of the band.
Return type
None
set_band_unit(bidx, value)
Sets the unit of measure of a dataset band.
Parameters
• bidx (int) – Index of the band (starting with 1).
• value (str) – A label for the band’s unit of measure such as ‘meters’ or ‘degC’. See the
Pint project for a suggested list of units.
Return type
None
shape
start()
Start the dataset’s life cycle
statistics(bidx, approx=False, clear_cache=False)
Get min, max, mean, and standard deviation of a raster band.
Parameters
• bidx (int) – The band’s index (1-indexed).
• approx (bool, optional) – If True, statistics will be calculated from reduced resolution
data.
• clear_cache (bool, optional) – If True, saved stats will be deleted and statistics will
be recomputed. Requires GDAL version >= 3.2.
Return type
Statistics
Notes
GDAL will preferentially use statistics kept in raster metadata like image tags or an XML sidecar. If that
metadata is out of date, the statistics may not correspond to the actual data.
Additionally, GDAL will save statistics to file metadata as a side effect if that metadata does not already
exist.
stop()
Close the GDAL dataset handle
subdatasets
Sequence of subdatasets
tag_namespaces(bidx=0)
Get a list of the dataset’s metadata domains.
Returned items may be passed as ns to the tags method.
Parameters
bidx (int, optional) – Can be used to select a specific band, otherwise the dataset’s general metadata
domains are returned.
Return type
list of str
tags(bidx=0, ns=None)
Returns a dict containing copies of the dataset or band’s tags.
Tags are pairs of key and value strings. Tags belong to namespaces. The standard namespaces are: default
(None) and ‘IMAGE_STRUCTURE’. Applications can create their own additional namespaces.
The optional bidx argument can be used to select the tags of a specific band. The optional ns argument can
be used to select a namespace other than the default.
transform
The dataset’s georeferencing transformation matrix
This transform maps pixel row/column coordinates to coordinates in the dataset’s coordinate reference
system.
Return type
Affine
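How the transform relates to the bounds and res properties can be shown with plain arithmetic. This is a minimal sketch, assuming a north-up raster and a transform expressed as the six coefficients (a, b, c, d, e, f) with x = a * col + b * row + c and y = d * col + e * row + f; apply_affine and the sample numbers are hypothetical.

```python
def apply_affine(transform, col, row):
    """Map a (col, row) pixel position to (x, y) coordinates.

    Hypothetical helper mirroring the affine matrix multiplication.
    """
    a, b, c, d, e, f = transform
    return (a * col + b * row + c, d * col + e * row + f)


# Assumed example: 10-unit pixels, upper left corner at (500000, 4100000).
transform = (10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)
height, width = 200, 300

upper_left = apply_affine(transform, 0, 0)
lower_right = apply_affine(transform, width, height)

# (lower left x, lower left y, upper right x, upper right y)
bounds = (upper_left[0], lower_right[1], lower_right[0], upper_left[1])

# (width, height) of a pixel; e is negative for north-up rasters.
res = (transform[0], -transform[4])
```

For rotated rasters the corner coordinates would need all four corners to be transformed; the sketch deliberately ignores that case.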
units
One units string for each dataset band
Possible values include ‘meters’ or ‘degC’. See the Pint project for a suggested list of units.
To set units, one for each band is required.
Return type
list of str
write(arr, indexes=None, window=None, masked=False)
Write the arr array into indexed bands of the dataset.
If given a Numpy MaskedArray and masked is True, the input’s data and mask will be written to the dataset’s
bands and band mask. If masked is False, no band mask is written. Instead, the input array’s masked values
are filled with the dataset’s nodata value (if defined) or the input’s own fill value.
Parameters
• arr (array-like) – This may be a numpy.ma.MaskedArray.
• indexes (int or list, optional) – Which bands of the dataset to write to. The
default is all.
• window (Window, optional) – The region (slice) of the dataset to which arr will be
written. The default is the entire dataset.
• masked (bool, optional) – Whether or not to write to the dataset’s band mask.
Return type
None
Raises
RasterioIOError – If the write fails.
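The fill behavior described above for masked=False can be sketched as follows. Nested lists stand in for a Numpy masked array, the mask uses the Numpy sense (True = masked), and fill_masked is a hypothetical helper rather than rasterio code.

```python
def fill_masked(data, mask, nodata=None, fill_value=0):
    """Replace masked cells with nodata if defined, else with fill_value.

    Hypothetical helper for illustration only.
    """
    value = fill_value if nodata is None else nodata
    return [
        [value if is_masked else cell for cell, is_masked in zip(data_row, mask_row)]
        for data_row, mask_row in zip(data, mask)
    ]


data = [[1, 2], [3, 4]]
mask = [[False, True], [False, False]]

# The dataset defines a nodata value: it takes precedence.
with_nodata = fill_masked(data, mask, nodata=-9999)

# No nodata value: fall back to the input's own fill value.
without_nodata = fill_masked(data, mask, fill_value=99)
```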
write_band(bidx, src, window=None)
Write the src array into the bidx band.
Band indexes begin with 1: write_band(1, src) writes src into the first band.
The optional window argument takes a tuple like:
((row_start, row_stop), (col_start, col_stop))
specifying a raster subset to write into.
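How a window tuple selects the subset that receives the data can be sketched with a nested list standing in for a band. The helper write_window is hypothetical; it only illustrates the ((row_start, row_stop), (col_start, col_stop)) convention with exclusive stops.

```python
def write_window(band, src, window):
    """Copy src into the part of band selected by window, in place.

    Hypothetical helper for illustration only.
    """
    (row_start, row_stop), (col_start, col_stop) = window
    expected_rows = row_stop - row_start
    expected_cols = col_stop - col_start
    if len(src) != expected_rows or any(len(row) != expected_cols for row in src):
        raise ValueError("src shape does not match window")
    for offset, row in enumerate(src):
        band[row_start + offset][col_start:col_stop] = row


# A 3 x 4 band of zeros; write a 2 x 2 block into its lower right corner.
band = [[0] * 4 for _ in range(3)]
write_window(band, [[7, 8], [9, 10]], ((1, 3), (2, 4)))
```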
write_colormap(bidx, colormap)
Write a colormap for a band to the dataset.
A colormap maps pixel values of a single-band dataset to RGB or RGBA colors.
Parameters
• bidx (int) – Index of the band (starting with 1).
• colormap (Mapping) – Keys are integers and values are 3 or 4-tuples of ints.
Return type
None
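A colormap is an ordinary mapping, so it can be prepared and checked before it is written. The sketch below is illustrative: normalize_colormap is a hypothetical helper, and treating a 3-tuple as fully opaque by appending an alpha of 255 is an assumption of this sketch.

```python
def normalize_colormap(colormap):
    """Return a copy of colormap with every color as an RGBA 4-tuple.

    Hypothetical helper; RGB colors get an assumed alpha of 255.
    """
    normalized = {}
    for index, color in colormap.items():
        color = tuple(color)
        if len(color) == 3:
            color = color + (255,)
        elif len(color) != 4:
            raise ValueError("colors must be 3 or 4-tuples of ints")
        normalized[int(index)] = color
    return normalized


# Keys are pixel values; values are RGB or RGBA colors.
colormap = {0: (255, 0, 0), 1: (0, 0, 255, 128)}
rgba = normalize_colormap(colormap)
```

A mapping of this shape is what the colormap parameter describes, and the RGBA form matches what colormap(bidx) is documented to return.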
write_mask(mask_array, window=None)
Write to the dataset’s band mask.
Values > 0 represent valid data.
Parameters
• mask_array (ndarray) – Values of 0 represent invalid or missing data. Values > 0
represent valid data.
• window (Window, optional) – A subset of the dataset’s band mask.
Return type
None
Raises
RasterioIOError – When no mask is written.
write_transform(transform)
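Blocks:
        0       256     512
      0 +--------+--------+
        |        |        |
        | (0, 0) | (0, 1) |
        |        |        |
    256 +--------+--------+
        |        |        |
        | (1, 0) | (1, 1) |
        |        |        |
    512 +--------+--------+
Windows:
    UL: ((0, 256), (0, 256))
    UR: ((0, 256), (256, 512))
    LL: ((256, 512), (0, 256))
    LR: ((256, 512), (256, 512))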
Parameters
bidx (int, optional) – The band index (using 1-based indexing) from which to extract
windows. A value less than 1 uses the first band if all bands have homogeneous windows and
raises an exception otherwise.
Yields
block, window
bounds
Returns the lower left and upper right bounds of the dataset in the units of its coordinate reference system.
The returned value is a tuple: (lower left x, lower left y, upper right x, upper right y)
checksum(bidx, window=None)
Compute an integer checksum for the stored band
Parameters
• bidx (int) – The band’s index (1-indexed).
• window (tuple, optional) – A window of the band. Default is the entire extent of the
band.
Return type
int
close()
Close the dataset and unwind attached exit stack.
closed
Test if the dataset is closed
Return type
bool
colorinterp
A sequence of ColorInterp.<enum> in band order.
Return type
tuple
colormap(bidx)
Returns a dict containing the colormap for a band.
Parameters
bidx (int) – Index of the band whose colormap will be returned. Band index starts at 1.
Returns
Mapping of color index value (starting at 0) to RGBA color as a 4-element tuple.
Return type
dict
Raises
• ValueError – If no colormap is found for the specified band (NULL color table).
• IndexError – If no band exists for the provided index.
compression
count
The number of raster bands in the dataset
Return type
int
crs
The dataset’s coordinate reference system
In setting this property, the value may be a CRS object or an EPSG:nnnn or WKT string.
Return type
CRS
dataset_mask(out=None, out_shape=None, window=None, boundless=False,
resampling=Resampling.nearest)
Get the dataset’s 2D valid data mask.
Parameters
• out (numpy.ndarray, optional) – As with Numpy ufuncs, this is an optional reference
to an output array with the same dimensions and shape into which data will be placed.
Note: the method’s return value may be a view on this array. In other words, out is likely
to be an incomplete representation of the method’s results.
Cannot be combined with out_shape.
• out_shape (tuple, optional) – A tuple describing the output array’s shape. Allows
for decimated reads without constructing an output Numpy array.
Cannot be combined with out.
• window (a pair (tuple) of pairs of ints or Window, optional) – The optional
window argument is a 2 item tuple. The first item is a tuple containing the indexes
of the rows at which the window starts and stops and the second is a tuple containing the
indexes of the columns at which the window starts and stops. For example, ((0, 2), (0, 2))
defines a 2x2 window at the upper left of the raster dataset.
• boundless (bool, optional (default False)) – If True, windows that extend beyond the
dataset’s extent are permitted and partially or completely filled arrays will be returned as
appropriate.
• resampling (Resampling) – By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other resampling algorithms may be
specified. Resampled pixels are not cached.
Returns
The dtype of this array is uint8. 0 = nodata, 255 = valid data.
Return type
Numpy ndarray or a view on a Numpy ndarray
Notes
Note: as with Numpy ufuncs, an object is returned even if you use the optional out argument and the return
value shall be preferentially used by callers.
The dataset mask is calculated based on the individual band masks according to the following logic, in
order of precedence:
1. If a .msk file, dataset-wide alpha, or internal mask exists it will be used for the dataset mask.
2. Else if the dataset is a 4-band with a shadow nodata value, band 4 will be used as the dataset mask.
3. If a nodata value exists, use the binary OR (|) of the band masks.
4. If no nodata value exists, return a mask filled with 255.
Note that this differs from read_masks and GDAL RFC15 in that it applies per-dataset, not per-band (see
https://trac.osgeo.org/gdal/wiki/rfc15_nodatabitmask)
descriptions
Descriptions for each dataset band
To set descriptions, one for each band is required.
Return type
tuple[str | None, . . . ]
driver
dtypes
The data types of each band in index order
Return type
list of str
files
Returns a sequence of files associated with the dataset.
Return type
tuple
gcps
Ground control points and their coordinate reference system.
This property is a 2-tuple, or pair: (gcps, crs).
• gcps (list of GroundControlPoint) – Zero or more ground control points.
• crs (CRS) – The coordinate reference system of the ground control points.
get_gcps()
Get GCPs and their associated CRS.
get_nodatavals()
height
is_tiled
mask_flag_enums
Sets of flags describing the sources of band masks.
Examples
For a 3 band dataset that has masks derived from nodata values:
>>> dataset.mask_flag_enums
([<MaskFlags.nodata: 8>], [<MaskFlags.nodata: 8>], [<MaskFlags.nodata: 8>])
>>> band1_flags = dataset.mask_flag_enums[0]
>>> rasterio.enums.MaskFlags.nodata in band1_flags
True
>>> rasterio.enums.MaskFlags.alpha in band1_flags
False
meta
The basic metadata of this dataset.
mode
name
nodata
The dataset’s single nodata value
Notes
May be set.
Return type
float
nodatavals
Nodata values for each band
options
overviews(bidx)
photometric
profile
Basic metadata and creation options of this dataset.
May be passed as keyword arguments to rasterio.open() to create a clone of this dataset.
read(indexes=None, out=None, window=None, masked=False, out_shape=None, boundless=False,
resampling=Resampling.nearest, fill_value=None, out_dtype=None)
Read band data and, optionally, mask as an array.
A smaller (or larger) region of the dataset may be specified and it may be resampled and/or converted to a
different data type.
Parameters
• indexes (int or list, optional) – If indexes is a list, the result is a 3D array; if it is a
single band index number, the result is a 2D array.
• out (numpy.ndarray, optional) – As with Numpy ufuncs, this is an optional reference
to an output array into which data will be placed. If the height and width of out differ
from that of the specified window (see below), the raster image will be decimated or replicated
using the specified resampling method (also see below). This parameter cannot be
combined with out_shape.
Note: the method’s return value may be a view on this array. In other words, out is likely
to be an incomplete representation of the method’s results.
• out_dtype (str or numpy.dtype) – The desired output data type. For example: ‘uint8’
or rasterio.uint16.
• out_shape (tuple, optional) – A tuple describing the shape of a new output array.
See out (above) for notes on image decimation and replication. This parameter cannot be
combined with out.
• window (Window, optional) – The region (slice) of the dataset from which data will be
read. The default is the entire dataset.
• masked (bool, optional) – If masked is True the return value will be a masked array.
Otherwise (the default) the return value will be a regular array. Masks will be exactly the
inverse of the GDAL RFC 15 conforming arrays returned by read_masks().
• boundless (bool, optional (default False)) – If True, windows that extend beyond the
dataset’s extent are permitted and partially or completely filled arrays will be returned as
appropriate.
• resampling (Resampling) – By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other resampling algorithms may be
specified. Resampled pixels are not cached.
• fill_value (scalar) – Fill value applied in the boundless=True case only. Like the
fill_value of numpy.ma.MaskedArray, it should be a value valid for the dataset’s data type.
Return type
Numpy ndarray or a view on a Numpy ndarray
Raises
RasterioIOError – If the read fails.
Notes
This data is read from the dataset’s band cache, which means that repeated reads of the same windows may
avoid I/O.
As with Numpy ufuncs, an object is returned even if you use the optional out argument and the return value
shall be preferentially used by callers.
read_crs()
Return the GDAL dataset’s stored CRS
read_masks(indexes=None, out=None, out_shape=None, window=None, boundless=False,
resampling=Resampling.nearest)
Read band masks as an array.
A smaller (or larger) region of the dataset may be specified and it may be resampled and/or converted to a
different data type.
Parameters
• indexes (int or list, optional) – If indexes is a list, the result is a 3D array; if it is a
single band index number, the result is a 2D array.
• out (numpy.ndarray, optional) – As with Numpy ufuncs, this is an optional reference
to an output array into which data will be placed. If the height and width of out differ
from that of the specified window (see below), the raster image will be decimated or replicated
using the specified resampling method (also see below). This parameter cannot be
combined with out_shape.
Note: the method’s return value may be a view on this array. In other words, out is likely
to be an incomplete representation of the method’s results.
• out_shape (tuple, optional) – A tuple describing the shape of a new output array.
See out (above) for notes on image decimation and replication. This parameter cannot be
combined with out.
• window (Window, optional) – The region (slice) of the dataset from which data will be
read. The default is the entire dataset.
• boundless (bool, optional (default False)) – If True, windows that extend beyond the
dataset’s extent are permitted and partially or completely filled arrays will be returned as
appropriate.
• resampling (Resampling) – By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other resampling algorithms may be
specified. Resampled pixels are not cached.
Return type
Numpy ndarray or a view on a Numpy ndarray
Raises
RasterioIOError – If the read fails.
Notes
This data is read from the dataset’s band cache, which means that repeated reads of the same windows may
avoid I/O.
As with Numpy ufuncs, an object is returned even if you use the optional out argument and the return value
shall be preferentially used by callers.
read_transform()
Return the stored GDAL GeoTransform
res
Returns the (width, height) of pixels in the units of its coordinate reference system.
rpcs
Rational polynomial coefficients mapping between pixel and geodetic coordinates.
This property is a dict-like object.
rpcs : RPC instance containing coefficients. Empty if dataset does not have any metadata in the “RPC”
domain.
sample(xy, indexes=None, masked=False)
Get the values of a dataset at certain positions
Values are from the nearest pixel. They are not interpolated.
Parameters
• xy (iterable) – Pairs of x, y coordinates (floats) in the dataset’s reference system.
• indexes (int or list of int) – Indexes of dataset bands to sample.
• masked (bool, default: False) – Whether to mask samples that fall outside the
extent of the dataset.
Returns
Arrays of length equal to the number of specified indexes containing the dataset values for
the bands corresponding to those indexes.
Return type
iterable
scales
Raster scale for each dataset band
To set scales, one for each band is required.
Return type
list of float
shape
start()
Start the dataset’s life cycle
statistics(bidx, approx=False, clear_cache=False)
Get min, max, mean, and standard deviation of a raster band.
Parameters
• bidx (int) – The band’s index (1-indexed).
• approx (bool, optional) – If True, statistics will be calculated from reduced resolution
data.
• clear_cache (bool, optional) – If True, saved stats will be deleted and statistics will
be recomputed. Requires GDAL version >= 3.2.
Return type
Statistics
Notes
GDAL will preferentially use statistics kept in raster metadata like image tags or an XML sidecar. If that
metadata is out of date, the statistics may not correspond to the actual data.
Additionally, GDAL will save statistics to file metadata as a side effect if that metadata does not already
exist.
stop()
Close the GDAL dataset handle
subdatasets
Sequence of subdatasets
tag_namespaces(bidx=0)
Get a list of the dataset’s metadata domains.
Returned items may be passed as ns to the tags method.
Parameters
bidx (int, optional) – Can be used to select a specific band, otherwise the dataset’s general metadata
domains are returned.
Return type
list of str
tags(bidx=0, ns=None)
Returns a dict containing copies of the dataset or band’s tags.
Tags are pairs of key and value strings. Tags belong to namespaces. The standard namespaces are: default
(None) and ‘IMAGE_STRUCTURE’. Applications can create their own additional namespaces.
The optional bidx argument can be used to select the tags of a specific band. The optional ns argument can
be used to select a namespace other than the default.
transform
The dataset’s georeferencing transformation matrix
This transform maps pixel row/column coordinates to coordinates in the dataset’s coordinate reference
system.
Return type
Affine
units
One units string for each dataset band
Possible values include ‘meters’ or ‘degC’. See the Pint project for a suggested list of units.
Parameters
• row (int) – Pixel row.
• col (int) – Pixel column.
• z (float, optional) – Height associated with coordinates. Primarily used for RPC
based coordinate transformations. Ignored for affine based transformations. Default: 0.
• offset (str, optional) – Determines if the returned coordinates are for the center of
the pixel or for a corner.
• transform_method (TransformMethod, optional) – The coordinate transformation
method. Default: TransformMethod.affine.
• rpc_options (dict, optional) – Additional arguments passed to
GDALCreateRPCTransformer.
Returns
x, y
Return type
tuple
class rasterio.io.DatasetWriter
Bases: DatasetWriterBase, WindowMethodsMixin, TransformMethodsMixin
An unbuffered data and metadata writer. Its methods write data directly to disk.
block_shapes
An ordered list of block shapes, one for each band
Shapes are tuples and have the same ordering as the dataset’s shape: (count of image rows, count of image
columns).
Return type
list
block_size(bidx, i, j)
Returns the size in bytes of a particular block
Only useful for TIFF formatted datasets.
Parameters
• bidx (int) – Band index, starting with 1.
• i (int) – Row index of the block, starting with 0.
• j (int) – Column index of the block, starting with 0.
Return type
int
block_window(bidx, i, j)
Returns the window for a particular block
Parameters
• bidx (int) – Band index, starting with 1.
• i (int) – Row index of the block, starting with 0.
• j (int) – Column index of the block, starting with 0.
Return type
Window
block_windows(bidx=0)
Iterator over a band’s blocks and their windows
The primary use of this method is to obtain windows to pass to read() for highly efficient access to raster
block data.
The positional parameter bidx takes the index (starting at 1) of the desired band. This iterator yields blocks
“left to right” and “top to bottom” and is similar to Python’s enumerate() in that the first element is the
block index and the second is the dataset window.
Blocks are built into a dataset. They describe how pixels are grouped within each band and provide a
mechanism for efficient I/O. A window is a range of pixels within a single band defined by row start, row stop,
column start, and column stop. For example, ((0, 2), (0, 2)) defines a 2 x 2 window at the upper
left corner of a raster band. Blocks are referenced by an (i, j) tuple where (0, 0) would be a band’s
upper left block.
Raster I/O is performed at the block level, so accessing a window spanning multiple rows in a striped raster
requires reading each row. Accessing a 2 x 2 window at the center of a 1800 x 3600 image requires
reading 2 rows, or 7200 pixels, just to get the target 4. The same image with internal 256 x 256 blocks
would require reading at least 1 block (if the entire window falls within a single block) and at most
4 blocks, or at least 65,536 pixels and at most 262,144.
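The block counts in this paragraph can be checked with a little arithmetic. The sketch below is plain Python rather than Rasterio API: the helper name blocks_touched is hypothetical, and windows are written as ((row_start, row_stop), (col_start, col_stop)) with exclusive stops.

```python
def blocks_touched(window, block_shape):
    """Count the blocks that a window overlaps.

    window: ((row_start, row_stop), (col_start, col_stop)), stops exclusive.
    block_shape: (block_rows, block_cols).
    """
    (row_start, row_stop), (col_start, col_stop) = window
    block_rows, block_cols = block_shape
    # Index of the first and last block row/column that the window overlaps.
    first_i, last_i = row_start // block_rows, (row_stop - 1) // block_rows
    first_j, last_j = col_start // block_cols, (col_stop - 1) // block_cols
    return (last_i - first_i + 1) * (last_j - first_j + 1)


# A 2 x 2 window at the center of an 1800 x 3600 image.
center = ((899, 901), (1799, 1801))

# Striped layout: each block is one full row of 3600 pixels.
striped_blocks = blocks_touched(center, (1, 3600))
striped_pixels = striped_blocks * 3600

# Tiled layout with 256 x 256 blocks: the window sits inside one block.
tiled_blocks = blocks_touched(center, (256, 256))

# A 2 x 2 window that straddles a block corner touches four blocks.
corner_blocks = blocks_touched(((255, 257), (255, 257)), (256, 256))
```

How many blocks a read touches depends on where the window falls relative to block boundaries, which is why aligning reads to block windows is usually worthwhile.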
Given an image that is 512 x 512 with blocks that are 256 x 256, its blocks and windows would look
like:
Blocks:
        0       256     512
      0 +--------+--------+
        |        |        |
        | (0, 0) | (0, 1) |
        |        |        |
    256 +--------+--------+
        |        |        |
        | (1, 0) | (1, 1) |
        |        |        |
    512 +--------+--------+
Windows:
Parameters
bidx (int, optional) – The band index (using 1-based indexing) from which to extract
windows. A value less than 1 uses the first band if all bands have homogeneous windows and
raises an exception otherwise.
Yields
block, window
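The pairing of block indexes and windows can be illustrated without opening a dataset. This is a minimal sketch in plain Python, not Rasterio's implementation: the generator name iter_block_windows is hypothetical, and windows are plain tuples rather than Window objects.

```python
def iter_block_windows(height, width, block_rows, block_cols):
    """Yield ((i, j), ((row_start, row_stop), (col_start, col_stop))).

    Blocks are visited left to right, then top to bottom. Blocks on the
    right and bottom edges are clipped to the image extent.
    """
    for i, row_start in enumerate(range(0, height, block_rows)):
        row_stop = min(row_start + block_rows, height)
        for j, col_start in enumerate(range(0, width, block_cols)):
            col_stop = min(col_start + block_cols, width)
            yield (i, j), ((row_start, row_stop), (col_start, col_stop))


# The 512 x 512 image with 256 x 256 blocks described above.
blocks = list(iter_block_windows(512, 512, 256, 256))
for block_index, window in blocks:
    print(block_index, window)
```

With a real dataset, each window yielded by block_windows() can be passed to read() through its window argument.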
bounds
Returns the lower left and upper right bounds of the dataset in the units of its coordinate reference system.
The returned value is a tuple: (lower left x, lower left y, upper right x, upper right y)
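For a north-up raster, the bounds follow directly from the affine transform and the dataset's shape. The sketch below is an illustration in plain Python and makes two assumptions: the six coefficients are ordered (a, b, c, d, e, f) as in the affine package, and the raster has no rotation. The helper name bounds_from_transform is hypothetical.

```python
def bounds_from_transform(coeffs, height, width):
    """Return (left, bottom, right, top) for a north-up raster.

    coeffs: (a, b, c, d, e, f), where x = a*col + b*row + c and
    y = d*col + e*row + f. Assumes b == d == 0 and e < 0.
    """
    a, b, c, d, e, f = coeffs
    left, top = c, f              # upper left corner of pixel (0, 0)
    right = c + a * width         # x grows with the column index
    bottom = f + e * height       # y shrinks with the row index
    return (left, bottom, right, top)


# Hypothetical raster: 10 m pixels, upper left corner at (500000, 4100000).
coeffs = (10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)
bounds = bounds_from_transform(coeffs, height=100, width=200)

# Pixel size in CRS units, (width, height), as reported by the res property.
res = (coeffs[0], -coeffs[4])
```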
build_overviews(factors, resampling=Resampling.nearest)
Build overviews at one or more decimation factors for all bands of the dataset.
checksum(bidx, window=None)
Compute an integer checksum for the stored band
Parameters
• bidx (int) – The band’s index (1-indexed).
• window (tuple, optional) – A window of the band. Default is the entire extent of the
band.
Return type
int
close()
Close the dataset and unwind attached exit stack.
closed
Test if the dataset is closed
Return type
bool
colorinterp
A sequence of ColorInterp.<enum> in band order.
Return type
tuple
colormap(bidx)
Returns a dict containing the colormap for a band.
Parameters
bidx (int) – Index of the band whose colormap will be returned. Band index starts at 1.
Returns
Mapping of color index value (starting at 0) to RGBA color as a 4-element tuple.
Return type
dict
Raises
• ValueError – If no colormap is found for the specified band (NULL color table).
• IndexError – If no band exists for the provided index.
compression
count
The number of raster bands in the dataset
Return type
int
crs
The dataset’s coordinate reference system
In setting this property, the value may be a CRS object or an EPSG:nnnn or WKT string.
Return type
CRS
dataset_mask(out=None, out_shape=None, window=None, boundless=False,
resampling=Resampling.nearest)
Get the dataset’s 2D valid data mask.
Parameters
• out (numpy.ndarray, optional) – As with Numpy ufuncs, this is an optional reference
to an output array with the same dimensions and shape into which data will be placed.
Note: the method’s return value may be a view on this array. In other words, out is likely
to be an incomplete representation of the method’s results.
Cannot be combined with out_shape.
• out_shape (tuple, optional) – A tuple describing the output array’s shape. Allows
for decimated reads without constructing an output Numpy array.
Cannot be combined with out.
• window (a pair (tuple) of pairs of ints or Window, optional) – The op-
tional window argument is a 2 item tuple. The first item is a tuple containing the indexes
of the rows at which the window starts and stops and the second is a tuple containing the
indexes of the columns at which the window starts and stops. For example, ((0, 2), (0, 2))
defines a 2x2 window at the upper left of the raster dataset.
• boundless (bool, optional (default False)) – If True, windows that extend beyond the
dataset’s extent are permitted and partially or completely filled arrays will be returned as
appropriate.
• resampling (Resampling) – By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other resampling algorithms may be
specified. Resampled pixels are not cached.
Returns
The dtype of this array is uint8. 0 = nodata, 255 = valid data.
Return type
Numpy ndarray or a view on a Numpy ndarray
Notes
Note: as with Numpy ufuncs, an object is returned even if you use the optional out argument and the return
value shall be preferentially used by callers.
The dataset mask is calculated based on the individual band masks according to the following logic, in
order of precedence:
1. If a .msk file, dataset-wide alpha, or internal mask exists it will be used for the dataset mask.
2. Else if the dataset is a 4-band with a shadow nodata value, band 4 will be used as the dataset mask.
3. If a nodata value exists, use the binary OR (|) of the band masks.
4. If no nodata value exists, return a mask filled with 255.
Note that this differs from read_masks and GDAL RFC15 in that it applies per-dataset, not per-band (see
https://trac.osgeo.org/gdal/wiki/rfc15_nodatabitmask)
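Rule 3 can be sketched with NumPy. The example below does not call Rasterio: the band values and the nodata value are made up, and the per-band masks are derived by comparing against nodata, which is an assumption about how such masks arise.

```python
import numpy as np

nodata = 0
# Three hypothetical 2 x 2 bands; 0 marks missing data.
bands = np.array(
    [
        [[0, 5], [0, 0]],
        [[0, 0], [7, 0]],
        [[0, 0], [0, 0]],
    ],
    dtype="uint8",
)

# Per-band masks: 255 where the band holds valid data, 0 where it is nodata.
band_masks = np.where(bands != nodata, 255, 0).astype("uint8")

# A pixel is valid in the dataset mask if it is valid in any band.
dataset_mask = np.bitwise_or.reduce(band_masks, axis=0)
print(dataset_mask)
```

Only pixels that are nodata in every band end up as 0 in the combined mask.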
descriptions
Descriptions for each dataset band
To set descriptions, one for each band is required.
Return type
tuple[str | None, . . . ]
driver
dtypes
The data types of each band in index order
Return type
list of str
files
Returns a sequence of files associated with the dataset.
Return type
tuple
gcps
Ground control points and their coordinate reference system.
This property is a 2-tuple, or pair: (gcps, crs).
gcps: list of GroundControlPoint
Zero or more ground control points.
crs: CRS
The coordinate reference system of the ground control points.
get_gcps()
Get GCPs and their associated CRS.
get_nodatavals()
is_tiled
mask_flag_enums
Returns
One list of rasterio.enums.MaskFlags members per band.
Return type
list [, list*]
Examples
For a 3 band dataset that has masks derived from nodata values:
>>> dataset.mask_flag_enums
([<MaskFlags.nodata: 8>], [<MaskFlags.nodata: 8>], [<MaskFlags.nodata: 8>])
>>> band1_flags = dataset.mask_flag_enums[0]
>>> rasterio.enums.MaskFlags.nodata in band1_flags
True
>>> rasterio.enums.MaskFlags.alpha in band1_flags
False
meta
The basic metadata of this dataset.
mode
name
nodata
The dataset’s single nodata value
Notes
May be set.
Return type
float
nodatavals
Nodata values for each band
Notes
overviews(bidx)
photometric
profile
Basic metadata and creation options of this dataset.
May be passed as keyword arguments to rasterio.open() to create a clone of this dataset.
read(indexes=None, out=None, window=None, masked=False, out_shape=None, boundless=False,
resampling=Resampling.nearest, fill_value=None, out_dtype=None)
Read band data and, optionally, mask as an array.
A smaller (or larger) region of the dataset may be specified and it may be resampled and/or converted to a
different data type.
Parameters
• indexes (int or list, optional) – If indexes is a list, the result is a 3D array, but is
a 2D array if it is a band index number.
• out (numpy.ndarray, optional) – As with Numpy ufuncs, this is an optional refer-
ence to an output array into which data will be placed. If the height and width of out differ
from that of the specified window (see below), the raster image will be decimated or repli-
cated using the specified resampling method (also see below). This parameter cannot be
combined with out_shape.
Note: the method’s return value may be a view on this array. In other words, out is likely
to be an incomplete representation of the method’s results.
• out_dtype (str or numpy.dtype) – The desired output data type. For example: ‘uint8’
or rasterio.uint16.
• out_shape (tuple, optional) – A tuple describing the shape of a new output array.
See out (above) for notes on image decimation and replication. This parameter cannot be
combined with out.
• window (Window, optional) – The region (slice) of the dataset from which data will be
read. The default is the entire dataset.
• masked (bool, optional) – If masked is True the return value will be a masked array.
Otherwise (the default) the return value will be a regular array. Masks will be exactly the
inverse of the GDAL RFC 15 conforming arrays returned by read_masks().
• boundless (bool, optional (default False)) – If True, windows that extend beyond the
dataset’s extent are permitted and partially or completely filled arrays will be returned as
appropriate.
• resampling (Resampling) – By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other resampling algorithms may be
specified. Resampled pixels are not cached.
• fill_value (scalar) – Fill value applied in the boundless=True case only. Like the
fill_value of numpy.ma.MaskedArray, should be value valid for the dataset’s data type.
Return type
Numpy ndarray or a view on a Numpy ndarray
Raises
RasterioIOError – If the read fails.
Notes
This data is read from the dataset’s band cache, which means that repeated reads of the same windows may
avoid I/O.
As with Numpy ufuncs, an object is returned even if you use the optional out argument and the return value
shall be preferentially used by callers.
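The relationship between masked=True and read_masks() can be shown with NumPy alone. In this sketch the data and the band mask are made-up arrays standing in for the results of read() and read_masks(). The point is the inversion: GDAL-style masks use 0 for invalid pixels, while NumPy masked arrays use True for masked (invalid) cells.

```python
import numpy as np

# Stand-ins for read(1) and read_masks(1) on a hypothetical 2 x 2 band.
data = np.array([[1, 2], [3, 4]], dtype="uint8")
band_mask = np.array([[255, 0], [255, 255]], dtype="uint8")  # 0 = nodata

# The NumPy mask is the logical inverse of the GDAL-style mask.
masked = np.ma.MaskedArray(data, mask=(band_mask == 0))

# Masked cells are skipped by reductions such as sum().
valid_total = int(masked.sum())
```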
read_crs()
Return the GDAL dataset’s stored CRS
read_masks(indexes=None, out=None, out_shape=None, window=None, boundless=False,
resampling=Resampling.nearest)
Read band masks as an array.
A smaller (or larger) region of the dataset may be specified and it may be resampled and/or converted to a
different data type.
Parameters
• indexes (int or list, optional) – If indexes is a list, the result is a 3D array, but is
a 2D array if it is a band index number.
• out (numpy.ndarray, optional) – As with Numpy ufuncs, this is an optional refer-
ence to an output array into which data will be placed. If the height and width of out differ
from that of the specified window (see below), the raster image will be decimated or repli-
cated using the specified resampling method (also see below). This parameter cannot be
combined with out_shape.
Note: the method’s return value may be a view on this array. In other words, out is likely
to be an incomplete representation of the method’s results.
• out_shape (tuple, optional) – A tuple describing the shape of a new output array.
See out (above) for notes on image decimation and replication. This parameter cannot be
combined with out.
• window (Window, optional) – The region (slice) of the dataset from which data will be
read. The default is the entire dataset.
• boundless (bool, optional (default False)) – If True, windows that extend beyond the
dataset’s extent are permitted and partially or completely filled arrays will be returned as
appropriate.
• resampling (Resampling) – By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other resampling algorithms may be
specified. Resampled pixels are not cached.
Return type
Numpy ndarray or a view on a Numpy ndarray
Raises
RasterioIOError – If the read fails.
Notes
This data is read from the dataset’s band cache, which means that repeated reads of the same windows may
avoid I/O.
As with Numpy ufuncs, an object is returned even if you use the optional out argument and the return value
shall be preferentially used by callers.
read_transform()
Return the stored GDAL GeoTransform
res
Returns the (width, height) of pixels in the units of its coordinate reference system.
rpcs
Rational polynomial coefficients mapping between pixel and geodetic coordinates.
This property is a dict-like object.
rpcs : RPC instance containing coefficients. Empty if dataset does not have any metadata in the “RPC”
domain.
sample(xy, indexes=None, masked=False)
Get the values of a dataset at certain positions
Values are from the nearest pixel. They are not interpolated.
Parameters
• xy (iterable) – Pairs of x, y coordinates (floats) in the dataset’s reference system.
• indexes (int or list of int) – Indexes of dataset bands to sample.
• masked (bool, default: False) – Whether to mask samples that fall outside the ex-
tent of the dataset.
Returns
Arrays of length equal to the number of specified indexes containing the dataset values for
the bands corresponding to those indexes.
Return type
iterable
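Nearest-pixel sampling amounts to converting each coordinate to a row and column and indexing the band. The sketch below is an illustration in plain Python, not the method's implementation. It assumes a north-up raster with square pixels and points that fall inside the extent; the helper name sample_nearest is hypothetical.

```python
import math


def sample_nearest(band, origin_x, origin_y, pixel_size, x, y):
    """Return the value of the pixel containing (x, y), without interpolation.

    band: list of rows. (origin_x, origin_y) is the upper left corner of the
    raster. Assumes the point lies within the raster's extent.
    """
    col = math.floor((x - origin_x) / pixel_size)
    row = math.floor((origin_y - y) / pixel_size)
    return band[row][col]


# Hypothetical 2 x 3 band with 10-unit pixels and upper left corner at (0, 20).
band = [
    [1, 2, 3],
    [4, 5, 6],
]
first = sample_nearest(band, 0.0, 20.0, 10.0, x=14.9, y=15.0)
second = sample_nearest(band, 0.0, 20.0, 10.0, x=25.0, y=1.0)
```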
scales
Raster scale for each dataset band
To set scales, one for each band is required.
Return type
list of float
set_band_description(bidx, value)
Sets the description of a dataset band.
Parameters
• bidx (int) – Index of the band (starting with 1).
• value (string) – A description of the band.
Return type
None
set_band_unit(bidx, value)
Sets the unit of measure of a dataset band.
Parameters
• bidx (int) – Index of the band (starting with 1).
• value (str) – A label for the band’s unit of measure such as ‘meters’ or ‘degC’. See the
Pint project for a suggested list of units.
Return type
None
shape
start()
Start the dataset’s life cycle
statistics(bidx, approx=False, clear_cache=False)
Get min, max, mean, and standard deviation of a raster band.
Parameters
• bidx (int) – The band’s index (1-indexed).
• approx (bool, optional) – If True, statistics will be calculated from reduced resolution
data.
• clear_cache (bool, optional) – If True, saved stats will be deleted and statistics will
be recomputed. Requires GDAL version >= 3.2.
Return type
Statistics
Notes
GDAL will preferentially use statistics kept in raster metadata, such as image tags or an XML sidecar. If that
metadata is out of date, the statistics may not correspond to the actual data.
Additionally, GDAL will save statistics to file metadata as a side effect if that metadata does not already
exist.
stop()
Close the GDAL dataset handle
subdatasets
Sequence of subdatasets
tag_namespaces(bidx=0)
Get a list of the dataset’s metadata domains.
Returned items may be passed as ns to the tags method.
Parameters
• bidx (int, optional) – Can be used to select a specific band; otherwise the dataset’s general
metadata domains are returned.
Return type
list of str
tags(bidx=0, ns=None)
Returns a dict containing copies of the dataset or band’s tags.
Tags are pairs of key and value strings. Tags belong to namespaces. The standard namespaces are: default
(None) and ‘IMAGE_STRUCTURE’. Applications can create their own additional namespaces.
The optional bidx argument can be used to select the tags of a specific band. The optional ns argument can
be used to select a namespace other than the default.
transform
The dataset’s georeferencing transformation matrix
This transform maps pixel row/column coordinates to coordinates in the dataset’s coordinate reference
system.
Return type
Affine
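The mapping from pixel coordinates to CRS coordinates is a pair of linear equations. The sketch below writes them out in plain Python, assuming the six coefficients are ordered (a, b, c, d, e, f) as in the affine package. The helper name pixel_to_crs and the coefficient values are hypothetical.

```python
def pixel_to_crs(coeffs, row, col):
    """Map a (row, col) pixel position to (x, y) in the dataset's CRS."""
    a, b, c, d, e, f = coeffs
    x = a * col + b * row + c
    y = d * col + e * row + f
    return x, y


# Hypothetical north-up raster: 10 m pixels, upper left corner at
# (500000, 4100000). The negative e means y decreases as rows increase.
coeffs = (10.0, 0.0, 500000.0, 0.0, -10.0, 4100000.0)

upper_left_corner = pixel_to_crs(coeffs, 0, 0)
first_pixel_center = pixel_to_crs(coeffs, 0.5, 0.5)
```

Integer row and column values map to the upper left corner of a pixel; adding 0.5 to each gives the pixel's center.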
units
One units string for each dataset band
Possible values include ‘meters’ or ‘degC’. See the Pint project for a suggested list of units.
To set units, one for each band is required.
Return type
list of str
Type
A list of str
update_tags(bidx=0, ns=None, **kwargs)
Updates the tags of a dataset or one of its bands.
Tags are pairs of key and value strings. Tags belong to namespaces. The standard namespaces are: default
(None) and ‘IMAGE_STRUCTURE’. Applications can create their own additional namespaces.
The optional bidx argument can be used to select the dataset band. The optional ns argument can be used
to select a namespace other than the default.
width
Return type
Window
window_bounds(window)
Get the bounds of a window
Parameters
window (rasterio.windows.Window) – Dataset window
Returns
bounds – x_min, y_min, x_max, y_max for the given window
Return type
tuple
window_transform(window)
Get the affine transform for a dataset window.
Parameters
window (rasterio.windows.Window) – Dataset window
Returns
transform – The affine transform matrix for the given window
Return type
Affine
write(arr, indexes=None, window=None, masked=False)
Write the arr array into indexed bands of the dataset.
If given a Numpy MaskedArray and masked is True, the input’s data and mask will be written to the dataset’s
bands and band mask. If masked is False, no band mask is written. Instead, the input array’s masked values
are filled with the dataset’s nodata value (if defined) or the input’s own fill value.
Parameters
• arr (array-like) – This may be a numpy.ma.MaskedArray.
• indexes (int or list, optional) – Which bands of the dataset to write to. The de-
fault is all.
• window (Window, optional) – The region (slice) of the dataset to which arr will be
written. The default is the entire dataset.
• masked (bool, optional) – Whether or not to write to the dataset’s band mask.
Return type
None
Raises
RasterioIOError – If the write fails.
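The two behaviors of the masked argument can be pictured with NumPy alone. This sketch does not call write(); it only shows the arrays involved under the description above. The nodata value and the data are made up, and the 0/255 mask encoding follows the GDAL convention used elsewhere in this reference.

```python
import numpy as np

nodata = 255
arr = np.ma.MaskedArray(
    np.array([[10, 20], [30, 40]], dtype="uint8"),
    mask=[[False, True], [False, False]],
)

# masked=False: no band mask is written. Masked cells are replaced by the
# dataset's nodata value before the data is stored.
filled = arr.filled(nodata)

# masked=True: a band mask is written alongside the data, using 0 for
# invalid pixels and 255 for valid ones.
band_mask = np.where(np.ma.getmaskarray(arr), 0, 255).astype("uint8")
```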
write_band(bidx, src, window=None)
Write the src array into the bidx band.
Band indexes begin with 1: write_band(1, src) writes to the first band.
The optional window argument takes a tuple like:
((row_start, row_stop), (col_start, col_stop))
specifying a raster subset to write into.
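The window tuple behaves like a pair of half-open slices. The sketch below uses a NumPy array as a stand-in for a band to show which pixels a given window addresses; it does not call write_band(), and the array values are made up.

```python
import numpy as np

# Stand-in for a 4 x 4 band, initially all zeros.
band = np.zeros((4, 4), dtype="uint8")
src = np.full((2, 2), 9, dtype="uint8")

# Rows 1 and 2, columns 0 and 1. Stop indexes are exclusive.
window = ((1, 3), (0, 2))
(row_start, row_stop), (col_start, col_stop) = window

# The shape of src must match the shape of the window.
band[row_start:row_stop, col_start:col_stop] = src
print(band)
```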
write_colormap(bidx, colormap)
Write a colormap for a band to the dataset.
A colormap maps pixel values of a single-band dataset to RGB or RGBA colors.
Parameters
• bidx (int) – Index of the band (starting with 1).
• colormap (Mapping) – Keys are integers and values are 3 or 4-tuples of ints.
Return type
None
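A colormap is an ordinary mapping, so it can be built with a dict comprehension. The sketch below constructs a hypothetical grayscale ramp for an 8-bit band and makes value 0 transparent. It only builds the mapping; passing it to write_colormap() requires a dataset opened for writing.

```python
# Map each pixel value 0..255 to an opaque gray (R, G, B, A) tuple.
colormap = {value: (value, value, value, 255) for value in range(256)}

# Make pixel value 0 fully transparent.
colormap[0] = (0, 0, 0, 0)
```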
write_mask(mask_array, window=None)
Write to the dataset’s band mask.
Values > 0 represent valid data.
Parameters
• mask_array (ndarray) – Values of 0 represent invalid or missing data. Values > 0 rep-
resent valid data.
• window (Window, optional) – A subset of the dataset’s band mask.
Return type
None
Raises
RasterioIOError – When no mask is written.
write_transform(transform)
Examples
A GeoTIFF can be loaded in memory and accessed using the GeoTIFF format driver
close()
exists()
Test if the in-memory file exists.
Returns
True if the in-memory file exists.
Return type
bool
getbuffer()
Return a view on bytes of the file.
open(driver=None, width=None, height=None, count=None, crs=None, transform=None, dtype=None,
nodata=None, sharing=False, **kwargs)
Open the file and return a Rasterio dataset object.
If data has already been written, the file is opened in ‘r’ mode. Otherwise, the file is opened in ‘w’ mode.
Parameters
Note well that there is no path parameter: a MemoryFile contains a single dataset and there is no need to
specify a path.
Other parameters are optional and have the same semantics as the parameters of rasterio.open().
read(size=-1)
Read bytes from MemoryFile.
Parameters
size (int) – Number of bytes to read. Default is -1 (all bytes).
Returns
String of bytes read.
Return type
bytes
seek(offset, whence=0)
tell()
write(data)
Write data bytes to MemoryFile.
Parameters
data (bytes) –
Returns
Number of bytes written.
Return type
int
class rasterio.io.ZipMemoryFile(file_or_bytes=None)
Bases: MemoryFile
A read-only BytesIO-like object backed by an in-memory zip file.
This allows a zip file containing formatted files to be read without I/O.
close()
exists()
Test if the in-memory file exists.
Returns
True if the in-memory file exists.
Return type
bool
getbuffer()
Return a view on bytes of the file.
open(path, driver=None, sharing=False, **kwargs)
Open a dataset within the zipped stream.
Parameters
• path (str) – Path to a dataset in the zip file, relative to the root of the archive.
Other parameters are optional and have the same semantics as the parameters of rasterio.open().
Return type
A Rasterio dataset object
read(size=-1)
Read bytes from MemoryFile.
Parameters
size (int) – Number of bytes to read. Default is -1 (all bytes).
Returns
String of bytes read.
Return type
bytes
seek(offset, whence=0)
tell()
write(data)
Write data bytes to MemoryFile.
Parameters
data (bytes) –
Returns
Number of bytes written.
Return type
int
rasterio.io.get_writer_for_driver(driver)
Return the writer class appropriate for the specified driver.
rasterio.io.get_writer_for_path(path, driver=None)
Return the writer class appropriate for the existing dataset.
rasterio.mask module
• nodata (int or float (opt)) – Value representing nodata within each raster band. If
not set, defaults to the nodata value for the input raster. If there is no set nodata value for the
raster, it defaults to 0.
• filled (bool (opt)) – If True, the pixels outside the features will be set to nodata. If
False, the output array will contain the original pixel data, and only the mask will be based
on shapes. Defaults to True.
• crop (bool (opt)) – Whether to crop the raster to the extent of the shapes. Defaults to
False.
• pad (bool (opt)) – If True, the features will be padded in each direction by one half of a
pixel prior to cropping raster. Defaults to False.
• indexes (list of ints or a single int (opt)) – If indexes is a list, the result is a
3D array, but is a 2D array if it is a band index number.
Returns
Two elements:
masked
[numpy.ndarray or numpy.ma.MaskedArray] Data contained in the raster after apply-
ing the mask. If filled is True and invert is False, the return will be an array where
pixels outside shapes are set to the nodata value (or nodata inside shapes if invert is
True).
If filled is False, the return will be a MaskedArray in which pixels outside shapes are
True (or False if invert is True).
out_transform
[affine.Affine()] Information for mapping pixel coordinates in masked to another
coordinate system.
Return type
tuple
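The effect of the filled option on the first returned element can be sketched with NumPy. The example does not call Rasterio: the inside array is a made-up stand-in for rasterized shapes, and invert is taken to be False.

```python
import numpy as np

nodata = 0
data = np.array([[1, 2], [3, 4]], dtype="uint8")
# Hypothetical rasterized shapes: True where a pixel falls inside a shape.
inside = np.array([[True, False], [True, True]])

# filled=True: pixels outside the shapes are set to the nodata value.
filled = np.where(inside, data, nodata).astype(data.dtype)

# filled=False: the original values are kept and pixels outside the shapes
# are flagged True in the mask of a MaskedArray.
masked = np.ma.MaskedArray(data, mask=~inside)
```

With filled=False the original pixel values stay available through masked.data, which is useful when nodata would otherwise collide with a legitimate value.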
rasterio.mask.raster_geometry_mask(dataset, shapes, all_touched=False, invert=False, crop=False,
pad=False, pad_width=0.5)
Create a mask from shapes, transform, and optional window within original raster.
By default, mask is intended for use as a numpy mask, where pixels that overlap shapes are False.
If shapes do not overlap the raster and crop=True, a ValueError is raised. Otherwise, a warning is raised, and a
completely True mask is returned (if invert is False).
Parameters
• dataset (a dataset object opened in 'r' mode) – Raster for which the mask will be
created.
• shapes (iterable object) – The values must be a GeoJSON-like dict or an object that
implements the Python geo interface protocol (such as a Shapely Polygon).
• all_touched (bool (opt)) – Include a pixel in the mask if it touches any of the shapes. If
False (default), include a pixel only if its center is within one of the shapes, or if it is selected
by Bresenham’s line algorithm.
• invert (bool (opt)) – If False (default), mask will be False inside shapes and True out-
side. If True, mask will be True inside shapes and False outside.
• crop (bool (opt)) – Whether to crop the dataset to the extent of the shapes. Defaults to
False.
• pad (bool (opt)) – If True, the features will be padded in each direction by one half of a
pixel prior to cropping dataset. Defaults to False.
• pad_width (float (opt)) – If pad is set (to maintain back-compatibility), then this will
be the pixel-size width of the padding around the mask.
Returns
Three elements:
mask
[numpy ndarray of type ‘bool’] Mask that is True outside shapes, and False within
shapes.
out_transform
[affine.Affine()] Information for mapping pixel coordinates in mask to another
coordinate system.
window: rasterio.windows.Window instance
Window within original raster covered by shapes. None if crop is False.
Return type
tuple
rasterio.merge module
Returns
Two elements:
dest: numpy.ndarray
Contents of all input rasters in single array
out_transform: affine.Affine()
Information for mapping pixel coordinates in dest to another coordinate system
Return type
tuple
rasterio.path module
rasterio.plot module
Parameters
• source (numpy.ndarray or dataset object opened in 'r' mode) – If array, data in
the order rows, columns and optionally bands. If array is band order (bands in the first
dimension), use arr[0]
• transform (Affine, required if source is array) – Defines the affine transform
if source is an array
Returns
left, right, bottom, top
Return type
tuple of float
rasterio.plot.reshape_as_image(arr)
Returns the source array reshaped into the order expected by image processing and visualization software (mat-
plotlib, scikit-image, etc) by swapping the axes order from (bands, rows, columns) to (rows, columns, bands)
Parameters
arr (array-like of shape (bands, rows, columns)) – image to reshape
rasterio.plot.reshape_as_raster(arr)
Returns the array in a raster order by swapping the axes order from (rows, columns, bands) to (bands, rows,
columns)
Parameters
arr (array-like in the image form of (rows, columns, bands)) – image to re-
shape
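Both functions amount to a reordering of axes. The sketch below shows the equivalent operation with NumPy's transpose on a made-up array. It illustrates the axis orders only and is not necessarily how the two functions are implemented.

```python
import numpy as np

# A hypothetical raster with 3 bands, 4 rows and 5 columns.
raster = np.zeros((3, 4, 5))

# (bands, rows, columns) -> (rows, columns, bands), as image libraries expect.
image = np.transpose(raster, (1, 2, 0))

# (rows, columns, bands) -> (bands, rows, columns), back to raster order.
back = np.transpose(image, (2, 0, 1))
```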
rasterio.plot.show(source, with_bounds=True, contour=False, contour_label_kws=None, ax=None,
title=None, transform=None, adjust=False, **kwargs)
Display a raster or raster band using matplotlib.
Parameters
• source (array or dataset object opened in 'r' mode or Band or
tuple(dataset, bidx)) – If Band or tuple (dataset, bidx), display the selected
band. If raster dataset display the rgb image as defined in the colorinterp metadata, or
default to first band.
• with_bounds (bool (opt)) – Whether to change the image extent to the spatial bounds of
the image, rather than pixel coordinates. Only works when source is (raster dataset, bidx) or
raster dataset.
• contour (bool (opt)) – Whether to plot the raster data as contours
• contour_label_kws (dictionary (opt)) – Keyword arguments for labeling the con-
tours, empty dictionary for no labels.
• ax (matplotlib.axes.Axes, optional) – Axes to plot on, otherwise uses current axes.
• title (str, optional) – Title for the figure.
• transform (Affine, optional) – Defines the affine transform if source is an array
• adjust (bool) – If the plotted data is an RGB image, adjust the values of each band so that
they fall between 0 and 1 before plotting. If True, values will be adjusted by the min / max
of each band. If False, no adjustment will be applied.
• **kwargs (key, value pairings optional) – These will be passed to
matplotlib.pyplot.imshow() or matplotlib.pyplot.contour(), depending on the
contour argument.
Returns
ax – Axes with plot.
Return type
matplotlib.axes.Axes
rasterio.plot.show_hist(source, bins=10, masked=True, title='Histogram', ax=None, label=None, **kwargs)
Easily display a histogram with matplotlib.
Parameters
rasterio.profiles module
rasterio.rpc module
err_rand
classmethod from_gdal(rpcs)
Deserialize dict values to float or list.
Return type
RPC
height_off
height_scale
lat_off
lat_scale
line_den_coeff
line_num_coeff
line_off
line_scale
long_off
long_scale
samp_den_coeff
samp_num_coeff
samp_off
samp_scale
to_dict()
Return a dictionary representation of RPC
to_gdal()
Serialize RPC attribute name and values in a form expected by GDAL.
Return type
dict
Notes
The err_bias and err_rand are optional, and are not written to datasets by GDAL.
rasterio.sample module
rasterio.session module
credentials
The session credentials.
Type
dict
static aws_or_dummy(*args, **kwargs)
Create an AWSSession if boto3 is available, else DummySession
Parameters
• path (str) – A dataset path or identifier.
• args (sequence) – Positional arguments for the foreign session constructor.
• kwargs (dict) – Keyword arguments for the foreign session constructor.
Return type
Session
static cls_from_path(path)
Find the session class suited to the data at path.
Parameters
path (str) – A dataset path or identifier.
Return type
class
static from_environ(*args, **kwargs)
Create a session object suited to the environment.
Parameters
• path (str) – A dataset path or identifier.
• args (sequence) – Positional arguments for the foreign session constructor.
• kwargs (dict) – Keyword arguments for the foreign session constructor.
Return type
Session
static from_foreign_session(session, cls=None)
Create a session object matching the foreign session.
Parameters
• session (obj) – A foreign session object.
• cls (Session class, optional) – The class to return.
Return type
Session
static from_path(path, *args, **kwargs)
Create a session object suited to the data at path.
Parameters
• path (str) – A dataset path or identifier.
• args (sequence) – Positional arguments for the foreign session constructor.
• kwargs (dict) – Keyword arguments for the foreign session constructor.
Return type
Session
get_credential_options()
Get credentials as GDAL configuration options
Return type
dict
classmethod hascreds(config)
Determine if the given configuration has proper credentials
Parameters
• cls (class) – A Session class.
• config (dict) – GDAL configuration as a dict.
Return type
bool
class rasterio.session.GSSession(google_application_credentials=None)
Bases: Session
Configures access to secured resources stored in Google Cloud Storage
static aws_or_dummy(*args, **kwargs)
Create an AWSSession if boto3 is available, else DummySession
Parameters
• path (str) – A dataset path or identifier.
• args (sequence) – Positional arguments for the foreign session constructor.
• kwargs (dict) – Keyword arguments for the foreign session constructor.
Return type
Session
static cls_from_path(path)
Find the session class suited to the data at path.
Parameters
path (str) – A dataset path or identifier.
Return type
class
property credentials
The session credentials as a dict
static from_environ(*args, **kwargs)
Create a session object suited to the environment.
Parameters
• path (str) – A dataset path or identifier.
• args (sequence) – Positional arguments for the foreign session constructor.
• kwargs (dict) – Keyword arguments for the foreign session constructor.
Return type
Session
static from_foreign_session(session, cls=None)
Create a session object matching the foreign session.
Parameters
• session (obj) – A foreign session object.
• cls (Session class, optional) – The class to return.
Return type
Session
static from_path(path, *args, **kwargs)
Create a session object suited to the data at path.
Parameters
• path (str) – A dataset path or identifier.
• args (sequence) – Positional arguments for the foreign session constructor.
• kwargs (dict) – Keyword arguments for the foreign session constructor.
Return type
Session
get_credential_options()
Get credentials as GDAL configuration options
Return type
dict
classmethod hascreds(config)
Determine if the given configuration has proper credentials
Parameters
• cls (class) – A Session class.
Notes
property credentials
The session credentials as a dict
static from_environ(*args, **kwargs)
Create a session object suited to the environment.
Parameters
• path (str) – A dataset path or identifier.
• args (sequence) – Positional arguments for the foreign session constructor.
• kwargs (dict) – Keyword arguments for the foreign session constructor.
Return type
Session
static from_foreign_session(session, cls=None)
Create a session object matching the foreign session.
Parameters
• session (obj) – A foreign session object.
• cls (Session class, optional) – The class to return.
Return type
Session
static from_path(path, *args, **kwargs)
Create a session object suited to the data at path.
Parameters
• path (str) – A dataset path or identifier.
• args (sequence) – Positional arguments for the foreign session constructor.
• kwargs (dict) – Keyword arguments for the foreign session constructor.
Return type
Session
get_credential_options()
Get credentials as GDAL configuration options
Return type
dict
classmethod hascreds(config)
Determine if the given configuration has proper credentials
Parameters
• cls (class) – A Session class.
• config (dict) – GDAL configuration as a dict.
Return type
bool
rasterio.session.parse_bool(v)
CPLTestBool equivalent
rasterio.shutil module
rasterio.tools module
rasterio.transform module
Geospatial transforms
class rasterio.transform.AffineTransformer(affine_transform)
Bases: TransformerBase
A pure Python class related to affine based coordinate transformations.
class rasterio.transform.GCPTransformer(gcps)
Bases: GCPTransformerBase, GDALTransformerBase
Class related to Ground Control Point (GCPs) based coordinate transformations.
Uses GDALCreateGCPTransformer and GDALGCPTransform for computations. Ensure that GDAL transformer
objects are destroyed by calling the close() method or by using the context manager interface.
class rasterio.transform.GDALTransformerBase
Bases: TransformerBase
close()
Notes
rasterio.transform.guard_transform(transform)
Return an Affine transformation instance.
rasterio.transform.rowcol(transform, xs, ys, zs=None, op=<built-in function floor>, precision=None,
**rpc_options)
Get rows and cols of the pixels containing (x, y).
Parameters
• transform (Affine or sequence of GroundControlPoint or RPC) – Transform
suitable for input to AffineTransformer, GCPTransformer, or RPCTransformer.
• xs (list or float) – x values in coordinate reference system.
• ys (list or float) – y values in coordinate reference system.
• zs (list or float, optional) – Height associated with coordinates. Primarily used for
RPC based coordinate transformations. Ignored for affine based transformations. Default:
0.
• op (function) – Function to convert fractional pixels to whole numbers (floor, ceiling,
round).
• precision (int or float, optional) – This parameter is unused, deprecated in rasterio
1.3.0, and will be removed in version 2.0.0.
• rpc_options (dict, optional) – Additional arguments passed to
GDALCreateRPCTransformer.
Returns
• rows (list of ints) – list of row indices
• cols (list of ints) – list of column indices
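For the affine case, the lookup described above can be sketched in plain Python. This is an illustration only, not rasterio's implementation: the helper name affine_rowcol and the six-coefficient tuple (a, b, c, d, e, f), with x = a * col + b * row + c and y = d * col + e * row + f, are assumptions made for the example.

```python
import math


def affine_rowcol(coeffs, x, y, op=math.floor):
    """Return the (row, col) of the pixel containing (x, y).

    coeffs is a hypothetical 6-tuple (a, b, c, d, e, f) such that
    x = a * col + b * row + c and y = d * col + e * row + f.
    """
    a, b, c, d, e, f = coeffs
    det = a * e - b * d
    if det == 0:
        raise ValueError("transform is not invertible")
    # Invert the 2 x 2 linear part to get fractional pixel coordinates.
    col = (e * (x - c) - b * (y - f)) / det
    row = (a * (y - f) - d * (x - c)) / det
    # op converts fractional pixels to whole numbers (floor by default).
    return op(row), op(col)


# North-up grid: 10-unit pixels, upper left corner at (1000, 2000).
grid = (10.0, 0.0, 1000.0, 0.0, -10.0, 2000.0)
row, col = affine_rowcol(grid, 1025.0, 1985.0)
```

Passing op=math.ceil or op=round instead changes how the fractional position (here row 1.5, column 2.5) is converted to whole indices.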
rasterio.transform.tastes_like_gdal(seq)
Return True if seq matches the GDAL geotransform pattern.
rasterio.transform.xy(transform, rows, cols, zs=None, offset='center', **rpc_options)
Get the x and y coordinates of pixels at rows and cols.
The pixel’s center is returned by default, but a corner can be returned by setting offset to one of ul, ur, ll, lr.
Supports affine, Ground Control Point (GCP), or Rational Polynomial Coefficients (RPC) based coordinate
transformations.
Parameters
• transform (Affine or sequence of GroundControlPoint or RPC) – Transform
suitable for input to AffineTransformer, GCPTransformer, or RPCTransformer.
• rows (list or int) – Pixel rows.
• cols (int or sequence of ints) – Pixel columns.
• zs (list or float, optional) – Height associated with coordinates. Primarily used for
RPC based coordinate transformations. Ignored for affine based transformations. Default:
0.
• offset (str, optional) – Determines if the returned coordinates are for the center of the
pixel or for a corner.
• rpc_options (dict, optional) – Additional arguments passed to
GDALCreateRPCTransformer.
Returns
• xs (float or list of floats) – x coordinates in coordinate reference system
• ys (float or list of floats) – y coordinates in coordinate reference system
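The effect of offset can be sketched for the affine case in plain Python. The helper affine_xy and the coefficient tuple (a, b, c, d, e, f) are hypothetical names chosen for this example, and the fractional shifts are an assumption about how center and corner positions map onto pixel coordinates, not a copy of rasterio's code.

```python
def affine_xy(coeffs, row, col, offset="center"):
    """Return the (x, y) coordinates of a pixel's center or corner.

    coeffs is a hypothetical 6-tuple (a, b, c, d, e, f) such that
    x = a * col + b * row + c and y = d * col + e * row + f.
    """
    # (column shift, row shift) in fractions of a pixel for each offset name.
    shifts = {
        "center": (0.5, 0.5),
        "ul": (0.0, 0.0),
        "ur": (1.0, 0.0),
        "ll": (0.0, 1.0),
        "lr": (1.0, 1.0),
    }
    if offset not in shifts:
        raise ValueError("offset must be one of: " + ", ".join(shifts))
    col_shift, row_shift = shifts[offset]
    a, b, c, d, e, f = coeffs
    frac_col = col + col_shift
    frac_row = row + row_shift
    x = a * frac_col + b * frac_row + c
    y = d * frac_col + e * frac_row + f
    return x, y


# North-up grid: 10-unit pixels, upper left corner at (1000, 2000).
grid = (10.0, 0.0, 1000.0, 0.0, -10.0, 2000.0)
center = affine_xy(grid, 1, 2)
upper_left = affine_xy(grid, 1, 2, offset="ul")
lower_right = affine_xy(grid, 1, 2, offset="lr")
```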
rasterio.vrt module
Examples
block_shapes
An ordered list of block shapes for each band
Shapes are tuples and have the same ordering as the dataset’s shape: (count of image rows, count of image
columns).
Return type
list
block_size(bidx, i, j)
Returns the size in bytes of a particular block
Only useful for TIFF formatted datasets.
Parameters
• bidx (int) – Band index, starting with 1.
• i (int) – Row index of the block, starting with 0.
• j (int) – Column index of the block, starting with 0.
Return type
int
block_window(bidx, i, j)
Returns the window for a particular block
Parameters
• bidx (int) – Band index, starting with 1.
• i (int) – Row index of the block, starting with 0.
• j (int) – Column index of the block, starting with 0.
Return type
Window
block_windows(bidx=0)
Iterator over a band’s blocks and their windows
The primary use of this method is to obtain windows to pass to read() for highly efficient access to raster
block data.
The positional parameter bidx takes the index (starting at 1) of the desired band. This iterator yields blocks
“left to right” and “top to bottom” and is similar to Python’s enumerate() in that the first element is the
block index and the second is the dataset window.
Blocks are built into a dataset and describe how pixels are grouped within each band and provide a mechanism
for efficient I/O. A window is a range of pixels within a single band defined by row start, row stop,
column start, and column stop. For example, ((0, 2), (0, 2)) defines a 2 x 2 window at the upper
left corner of a raster band. Blocks are referenced by an (i, j) tuple where (0, 0) would be a band’s
upper left block.
Raster I/O is performed at the block level, so accessing a window spanning multiple rows in a striped raster
requires reading each row. Accessing a 2 x 2 window at the center of a 1800 x 3600 image requires
reading 2 rows, or 7200 pixels just to get the target 4. The same image with internal 256 x 256 blocks
would require reading at least 1 block (if the entire window falls within a single block) and at most
4 blocks, or at least 65,536 pixels and at most 262,144.
Given an image that is 512 x 512 with blocks that are 256 x 256, its blocks and windows would look
like:
Blocks:
        0       256     512
      0 +--------+--------+
        |        |        |
        | (0, 0) | (0, 1) |
        |        |        |
    256 +--------+--------+
        |        |        |
        | (1, 0) | (1, 1) |
        |        |        |
    512 +--------+--------+
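Windows:
    UL: ((0, 256), (0, 256))
    UR: ((0, 256), (256, 512))
    LL: ((256, 512), (0, 256))
    LR: ((256, 512), (256, 512))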
Parameters
bidx (int, optional) – The band index (using 1-based indexing) from which to extract
windows. A value less than 1 uses the first band if all bands have homogeneous windows and
raises an exception otherwise.
Yields
block, window
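The relationship between block indexes and windows described above can be sketched without opening a dataset. The generator below is a plain-Python illustration, not rasterio's implementation; its name and arguments are hypothetical, and clipping edge blocks to the image extent is an assumption of the sketch.

```python
def iter_block_windows(height, width, block_height, block_width):
    """Yield ((i, j), ((row_start, row_stop), (col_start, col_stop))).

    Blocks are visited left to right, then top to bottom. Blocks on the
    right and bottom edges are clipped to the image extent.
    """
    n_block_rows = -(-height // block_height)  # ceiling division
    n_block_cols = -(-width // block_width)
    for i in range(n_block_rows):
        row_start = i * block_height
        row_stop = min(row_start + block_height, height)
        for j in range(n_block_cols):
            col_start = j * block_width
            col_stop = min(col_start + block_width, width)
            yield (i, j), ((row_start, row_stop), (col_start, col_stop))


# The 512 x 512 image with 256 x 256 blocks used in the description.
blocks = list(iter_block_windows(512, 512, 256, 256))
for block_index, window in blocks:
    print(block_index, window)
```

With a real dataset, each yielded window would typically be passed to read() so that one block is read at a time.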
bounds
Returns the lower left and upper right bounds of the dataset in the units of its coordinate reference system.
The returned value is a tuple: (lower left x, lower left y, upper right x, upper right y)
checksum(bidx, window=None)
Compute an integer checksum for the stored band
Parameters
• bidx (int) – The band’s index (1-indexed).
• window (tuple, optional) – A window of the band. Default is the entire extent of the
band.
Return type
int
close()
Close the dataset and unwind attached exit stack.
closed
Test if the dataset is closed
Return type
bool
colorinterp
A sequence of ColorInterp.<enum> in band order.
Return type
tuple
colormap(bidx)
Returns a dict containing the colormap for a band.
Parameters
bidx (int) – Index of the band whose colormap will be returned. Band index starts at 1.
Returns
Mapping of color index value (starting at 0) to RGBA color as a 4-element tuple.
Return type
dict
Raises
• ValueError – If no colormap is found for the specified band (NULL color table).
• IndexError – If no band exists for the provided index.
compression
count
The number of raster bands in the dataset
Return type
int
crs
The dataset’s coordinate reference system
dataset_mask(out=None, out_shape=None, window=None, boundless=False,
resampling=Resampling.nearest)
Get the dataset’s 2D valid data mask.
Parameters
• out (numpy.ndarray, optional) – As with Numpy ufuncs, this is an optional reference
to an output array with the same dimensions and shape into which data will be placed.
Note: the method’s return value may be a view on this array. In other words, out is likely
to be an incomplete representation of the method’s results.
Cannot be combined with out_shape.
• out_shape (tuple, optional) – A tuple describing the output array’s shape. Allows
for decimated reads without constructing an output Numpy array.
Cannot be combined with out.
• window (a pair (tuple) of pairs of ints or Window, optional) – The optional
window argument is a 2 item tuple. The first item is a tuple containing the indexes
of the rows at which the window starts and stops and the second is a tuple containing the
indexes of the columns at which the window starts and stops. For example, ((0, 2), (0, 2))
defines a 2x2 window at the upper left of the raster dataset.
• boundless (bool, optional (default False)) – If True, windows that extend beyond the
dataset’s extent are permitted and partially or completely filled arrays will be returned as
appropriate.
• resampling (Resampling) – By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other resampling algorithms may be
specified. Resampled pixels are not cached.
Returns
The dtype of this array is uint8. 0 = nodata, 255 = valid data.
Return type
Numpy ndarray or a view on a Numpy ndarray
Notes
Note: as with Numpy ufuncs, an object is returned even if you use the optional out argument and the return
value shall be preferentially used by callers.
The dataset mask is calculated based on the individual band masks according to the following logic, in
order of precedence:
1. If a .msk file, dataset-wide alpha, or internal mask exists it will be used for the dataset mask.
2. Else if the dataset is a 4-band dataset with a shadow nodata value, band 4 will be used as the dataset mask.
3. If a nodata value exists, use the binary OR (|) of the band masks.
4. If no nodata value exists, return a mask filled with 255.
Note that this differs from read_masks and GDAL RFC15 in that it applies per-dataset, not per-band (see
https://trac.osgeo.org/gdal/wiki/rfc15_nodatabitmask)
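The binary OR rule for band masks can be illustrated with nested lists standing in for arrays. This is a sketch under that assumption, with a hypothetical helper name; it covers only the nodata case of the precedence list and is not the library's implementation.

```python
def combine_band_masks(band_masks):
    """Binary OR of per-band masks, where 0 = nodata and 255 = valid data."""
    n_rows = len(band_masks[0])
    n_cols = len(band_masks[0][0])
    combined = [[0] * n_cols for _ in range(n_rows)]
    for mask in band_masks:
        for r in range(n_rows):
            for c in range(n_cols):
                # A pixel is valid in the result if it is valid in any band.
                combined[r][c] |= mask[r][c]
    return combined


band1 = [[255, 0], [0, 0]]
band2 = [[255, 255], [0, 0]]
band3 = [[255, 0], [255, 0]]
dataset_mask = combine_band_masks([band1, band2, band3])
```

Only the lower right pixel, which is nodata in every band, stays 0 in the combined mask.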
descriptions
Descriptions for each dataset band
To set descriptions, one for each band is required.
Return type
tuple[str | None, . . . ]
driver
dtypes
The data types of each band in index order
Return type
list of str
files
Returns a sequence of files associated with the dataset.
Return type
tuple
gcps
Ground control points and their coordinate reference system.
This property is a 2-tuple, or pair: (gcps, crs).
gcps : list of GroundControlPoint
Zero or more ground control points.
crs : CRS
The coordinate reference system of the ground control points.
get_gcps()
Get GCPs and their associated CRS.
get_nodatavals()
indexes
The 1-based indexes of each band in the dataset
For a 3-band dataset, this property will be [1, 2, 3].
Return type
list of int
interleaving
is_tiled
mask_flag_enums
Examples
For a 3 band dataset that has masks derived from nodata values:
>>> dataset.mask_flag_enums
([<MaskFlags.nodata: 8>], [<MaskFlags.nodata: 8>], [<MaskFlags.nodata: 8>])
>>> band1_flags = dataset.mask_flag_enums[0]
>>> rasterio.enums.MaskFlags.nodata in band1_flags
True
>>> rasterio.enums.MaskFlags.alpha in band1_flags
False
meta
The basic metadata of this dataset.
mode
name
nodata
The dataset’s single nodata value
Notes
May be set.
Return type
float
nodatavals
Nodata values for each band
Notes
overviews(bidx)
photometric
profile
Basic metadata and creation options of this dataset.
May be passed as keyword arguments to rasterio.open() to create a clone of this dataset.
read(indexes=None, out=None, window=None, masked=False, out_shape=None,
resampling=Resampling.nearest, fill_value=None, out_dtype=None, **kwargs)
Read a dataset’s raw pixels as an N-d array
This data is read from the dataset’s band cache, which means that repeated reads of the same windows may
avoid I/O.
Parameters
• indexes (list of ints or a single int, optional) – If indexes is a list, the result
is a 3D array; if it is a single band index number, the result is a 2D array.
• out (numpy.ndarray, optional) – As with Numpy ufuncs, this is an optional reference
to an output array into which data will be placed. If the height and width of out differ from
that of the specified window (see below), the raster image will be decimated or replicated
using the specified resampling method (also see below).
Note: the method’s return value may be a view on this array. In other words, out is likely
to be an incomplete representation of the method’s results.
This parameter cannot be combined with out_shape.
• out_dtype (str or numpy.dtype) – The desired output data type. For example: ‘uint8’
or rasterio.uint16.
• out_shape (tuple, optional) – A tuple describing the shape of a new output array.
See out (above) for notes on image decimation and replication.
Cannot be combined with out.
• window (a pair (tuple) of pairs of ints or Window, optional) – The optional
window argument is a 2 item tuple. The first item is a tuple containing the indexes
of the rows at which the window starts and stops and the second is a tuple containing the
indexes of the columns at which the window starts and stops. For example, ((0, 2), (0, 2))
defines a 2x2 window at the upper left of the raster dataset.
• masked (bool, optional) – If masked is True the return value will be a masked array.
Otherwise (the default) the return value will be a regular array. Masks will be exactly the
inverse of the GDAL RFC 15 conforming arrays returned by read_masks().
• resampling (Resampling) – By default, pixel values are read raw or interpolated using
a nearest neighbor algorithm from the band cache. Other resampling algorithms may be
specified. Resampled pixels are not cached.
• fill_value (scalar) – Fill value applied in the boundless=True case only.
• kwargs (dict) – This is only for backwards compatibility. No keyword arguments are
supported other than the ones named above.
Returns
numpy.ndarray or a view on a numpy.ndarray
Notes
As with Numpy ufuncs, an object is returned even if you use the optional out argument, and the return value
shall be preferentially used by callers.
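The inverse relationship between the two mask conventions mentioned for the masked parameter can be sketched with nested lists. In the GDAL RFC 15 convention, 0 marks nodata and 255 marks valid data, while in a Numpy masked array True marks an element as masked out. The helper name below is hypothetical and the lists stand in for arrays.

```python
def to_masked_flags(gdal_mask):
    """Invert a GDAL-style mask (0 = nodata, nonzero = valid).

    The result uses the masked-array convention: True means masked out.
    """
    return [[value == 0 for value in row] for row in gdal_mask]


gdal_mask = [[255, 255, 0], [0, 255, 255]]
flags = to_masked_flags(gdal_mask)
```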
read_crs()
Return the GDAL dataset’s stored CRS
read_masks(indexes=None, out=None, out_shape=None, window=None, resampling=Resampling.nearest,
**kwargs)
Read raster band masks as a multidimensional array
read_transform()
Return the stored GDAL GeoTransform
res
Returns the (width, height) of pixels in the units of its coordinate reference system.
rpcs
Rational polynomial coefficients mapping between pixel and geodetic coordinates.
This property is a dict-like object.
rpcs : RPC instance containing coefficients. Empty if dataset does not have any metadata in the “RPC”
domain.
sample(xy, indexes=None, masked=False)
Get the values of a dataset at certain positions
Values are from the nearest pixel. They are not interpolated.
Parameters
• xy (iterable) – Pairs of x, y coordinates (floats) in the dataset’s reference system.
• indexes (int or list of int) – Indexes of dataset bands to sample.
• masked (bool, default: False) – Whether to mask samples that fall outside the
extent of the dataset.
Returns
Arrays of length equal to the number of specified indexes containing the dataset values for
the bands corresponding to those indexes.
Return type
iterable
scales
Raster scale for each dataset band
To set scales, one for each band is required.
Return type
list of float
shape
start()
Start the dataset’s life cycle
statistics(bidx, approx=False, clear_cache=False)
Get min, max, mean, and standard deviation of a raster band.
Parameters
• bidx (int) – The band’s index (1-indexed).
• approx (bool, optional) – If True, statistics will be calculated from reduced resolution
data.
• clear_cache (bool, optional) – If True, saved stats will be deleted and statistics will
be recomputed. Requires GDAL version >= 3.2.
Return type
Statistics
Notes
GDAL will preferentially use statistics kept in raster metadata like image tags or an XML sidecar. If that
metadata is out of date, the statistics may not correspond to the actual data.
Additionally, GDAL will save statistics to file metadata as a side effect if that metadata does not already
exist.
stop()
Close the GDAL dataset handle
subdatasets
Sequence of subdatasets
tag_namespaces(bidx=0)
Get a list of the dataset’s metadata domains.
Returned items may be passed as ns to the tags method.
Parameters
bidx (int, optional) – Can be used to select a specific band, otherwise the dataset’s general metadata
domains are returned.
Return type
list of str
tags(bidx=0, ns=None)
Returns a dict containing copies of the dataset or band’s tags.
Tags are pairs of key and value strings. Tags belong to namespaces. The standard namespaces are: default
(None) and ‘IMAGE_STRUCTURE’. Applications can create their own additional namespaces.
The optional bidx argument can be used to select the tags of a specific band. The optional ns argument can
be used to select a namespace other than the default.
transform
The dataset’s georeferencing transformation matrix
This transform maps pixel row/column coordinates to coordinates in the dataset’s coordinate reference
system.
Return type
Affine
units
One units string for each dataset band
Possible values include ‘meters’ or ‘degC’. See the Pint project for a suggested list of units.
To set units, one for each band is required.
Return type
list of str
width
rasterio.warp module
• rpcs (RPC or dict, optional) – Instead of a bounding box for the source, rational
polynomial coefficients may be provided.
• resolution (tuple (x resolution, y resolution) or float, optional) –
Target resolution, in units of target coordinate reference system.
• dst_width (int, optional) – Output file size in pixels and lines. Cannot be used together
with resolution.
• dst_height (int, optional) – Output file size in pixels and lines. Cannot be used
together with resolution.
• kwargs (dict, optional) – Additional arguments passed to transformation function.
Returns
• transform (Affine) – Output affine transformation matrix
• width, height (int) – Output dimensions
Notes
• src_crs (CRS or dict) – Source coordinate reference system, as a rasterio CRS object.
Example: CRS({‘init’: ‘EPSG:4326’})
• dst_crs (CRS or dict) – Target coordinate reference system.
• xs (array_like) – Contains x values. Will be cast to double floating point values.
• ys (array_like) – Contains y values.
• zs (array_like, optional) – Contains z values. Assumed to be all 0 if absent.
Returns
out – Tuple of x, y, and optionally z vectors, transformed into the target coordinate reference
system.
Return type
tuple of array_like, (xs, ys, [zs])
rasterio.warp.transform_bounds(src_crs, dst_crs, left, bottom, right, top, densify_pts=21)
Transform bounds from src_crs to dst_crs.
Optionally densifies the edges (to account for nonlinear transformations along these edges) and extracts the
outermost bounds.
Note: antimeridian support added in version 1.3.0
Parameters
• src_crs (CRS or dict) – Source coordinate reference system, in rasterio dict format.
Example: CRS({‘init’: ‘EPSG:4326’})
• dst_crs (CRS or dict) – Target coordinate reference system.
• left (float) – Bounding coordinates in src_crs, from the bounds property of a raster.
• bottom (float) – Bounding coordinates in src_crs, from the bounds property of a raster.
• right (float) – Bounding coordinates in src_crs, from the bounds property of a raster.
• top (float) – Bounding coordinates in src_crs, from the bounds property of a raster.
• densify_pts (uint, optional) – Number of points to add to each edge to account for
nonlinear edges produced by the transform process. Large numbers will produce worse
performance. Default: 21 (gdal default).
Returns
left, bottom, right, top – Outermost coordinates in target coordinate reference system.
Return type
float
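Why densifying edges matters can be shown with a toy example in plain Python. The function densified_bounds and the nonlinear mapping bulge are both hypothetical; the sketch does not reproduce the GDAL algorithm, it only samples extra points along each edge of the box and keeps the outermost transformed coordinates.

```python
def densified_bounds(transform_point, left, bottom, right, top, densify_pts=21):
    """Transform a box edge by edge and return the outermost bounds.

    transform_point is any callable mapping (x, y) to (x', y').
    densify_pts extra points are added between the corners of each edge.
    """
    steps = densify_pts + 1
    xs, ys = [], []
    for k in range(steps + 1):
        t = k / steps
        x = left + (right - left) * t
        y = bottom + (top - bottom) * t
        # One sample on each of the four edges at parameter t.
        for px, py in ((x, bottom), (x, top), (left, y), (right, y)):
            tx, ty = transform_point(px, py)
            xs.append(tx)
            ys.append(ty)
    return min(xs), min(ys), max(xs), max(ys)


def bulge(x, y):
    """Hypothetical nonlinear transform: lowers y by (x - 5)^2 / 10."""
    return x, y - (x - 5.0) ** 2 / 10.0


corners_only = densified_bounds(bulge, 0.0, 0.0, 10.0, 10.0, densify_pts=0)
densified = densified_bounds(bulge, 0.0, 0.0, 10.0, 10.0, densify_pts=21)
```

With corners only, the curved top edge is missed and the reported top is 7.5; with 21 extra points per edge the midpoint of the top edge is sampled and the top becomes 10.0.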
rasterio.warp.transform_geom(src_crs, dst_crs, geom, antimeridian_cutting=True, antimeridian_offset=10.0,
precision=-1)
Transform geometry from source coordinate reference system into target.
Parameters
• src_crs (CRS or dict) – Source coordinate reference system, in rasterio dict format.
Example: CRS({‘init’: ‘EPSG:4326’})
• dst_crs (CRS or dict) – Target coordinate reference system.
• geom (GeoJSON like dict object or iterable of GeoJSON like objects.) –
rasterio.windows module
Notes
Previously the lengths were called ‘num_cols’ and ‘num_rows’ but this is a bit confusing in the new float precision
world and the attributes have been changed. The originals are deprecated.
col_off
crop(height, width)
Return a copy cropped to height and width
flatten()
A flattened form of the window.
Returns
col_off, row_off, width, height – Window offsets and lengths.
Return type
float
classmethod from_slices(rows, cols, height=-1, width=-1, boundless=False)
Construct a Window from row and column slices or tuples / lists of start and stop indexes. Converts the
rows and cols to offsets, height, and width.
In general, indexes are defined relative to the upper left corner of the dataset: rows=(0, 10), cols=(0, 4)
defines a window that is 4 columns wide and 10 rows high starting from the upper left.
Start indexes may be None and will default to 0. Stop indexes may be None and will default to width or
height, which must be provided in this case.
Negative start indexes are evaluated relative to the lower right of the dataset: rows=(-2, None), cols=(-2,
None) defines a window that is 2 rows high and 2 columns wide starting from the bottom right.
Parameters
• rows (slice, tuple, or list) – Slices or 2 element tuples/lists containing start, stop
indexes.
• cols (slice, tuple, or list) – Slices or 2 element tuples/lists containing start, stop
indexes.
• height (float) – A shape to resolve relative values against. Only used when a start or
stop index is negative or a stop index is None.
• width (float) – A shape to resolve relative values against. Only used when a start or stop
index is negative or a stop index is None.
• boundless (bool, optional) – Whether the inputs are bounded (default) or not.
Return type
Window
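The index rules above (None defaults and negative start indexes) can be sketched in plain Python. The function below is a simplified, hypothetical stand-in that accepts only (start, stop) pairs and returns a (col_off, row_off, width, height) tuple; it does not handle slice objects, negative stop indexes, or the boundless option.

```python
def window_from_ranges(rows, cols, height=-1, width=-1):
    """Resolve (start, stop) pairs into (col_off, row_off, width, height)."""

    def resolve(pair, length):
        start, stop = pair
        if start is None:
            start = 0
        # A length is needed to resolve a missing stop or a negative start.
        if (stop is None or start < 0) and length < 0:
            raise ValueError("height and width are required for relative indexes")
        if stop is None:
            stop = length
        if start < 0:
            start += length
        return start, stop - start

    row_off, win_height = resolve(rows, height)
    col_off, win_width = resolve(cols, width)
    return col_off, row_off, win_width, win_height


# 4 columns wide and 10 rows high, starting from the upper left.
upper_left = window_from_ranges((0, 10), (0, 4))
# 2 x 2 window at the bottom right of a 100-row, 200-column dataset.
lower_right = window_from_ranges((-2, None), (-2, None), height=100, width=200)
```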
height
intersection(other)
Return the intersection of this window and another
Parameters
other (Window) – Another window
Return type
Window
round_lengths(**kwds)
Return a copy with width and height rounded.
Lengths are rounded to the nearest whole number. The offsets are not changed.
Parameters
kwds (dict) – Collects keyword arguments that are no longer used.
Return type
Window
round_offsets(**kwds)
Return a copy with column and row offsets rounded.
Offsets are rounded to the preceding whole number. The lengths are not changed.
Parameters
kwds (dict) – Collects keyword arguments that are no longer used.
Return type
Window
round_shape(**kwds)
row_off
todict()
A mapping of attribute names and values.
Return type
dict
toranges()
Makes an equivalent pair of range tuples
toslices()
Slice objects for use as an ndarray indexer.
Returns
row_slice, col_slice – A pair of slices in row, column order
Return type
slice
width
class rasterio.windows.WindowMethodsMixin
Bases: object
Mixin providing methods for window-related calculations. These methods are wrappers for the functionality in
rasterio.windows module.
A subclass with this mixin MUST provide the following properties: transform, height and width.
window(left, bottom, right, top, precision=None)
Get the window corresponding to the bounding coordinates.
The resulting window is not cropped to the row and column limits of the dataset.
Parameters
• left (float) – Left (west) bounding coordinate
• bottom (float) – Bottom (south) bounding coordinate
• right (float) – Right (east) bounding coordinate
• top (float) – Top (north) bounding coordinate
Return type
Window
rasterio.windows.evaluate(window, height, width, boundless=False)
Evaluates a window tuple that may contain relative index values.
The height and width of the array the window targets is the context for evaluation.
Parameters
• window (Window or tuple of (rows, cols).) – The input window.
• height (int) – The number of rows in the array that the window targets.
• width (int) – The number of columns in the array that the window targets.
Returns
A new Window object with absolute index values.
Return type
Window
rasterio.windows.from_bounds(left, bottom, right, top, transform=None, height=None, width=None,
precision=None)
Get the window corresponding to the bounding coordinates.
Parameters
• left (float, required) – Left (west) bounding coordinates
• bottom (float, required) – Bottom (south) bounding coordinates
• right (float, required) – Right (east) bounding coordinates
• top (float, required) – Top (north) bounding coordinates
• transform (Affine, required) – Affine transform matrix.
• precision (int, optional) – These parameters are unused, deprecated in rasterio 1.3.0,
and will be removed in version 2.0.0.
• height (int, optional) – These parameters are unused, deprecated in rasterio 1.3.0, and
will be removed in version 2.0.0.
• width (int, optional) – These parameters are unused, deprecated in rasterio 1.3.0, and
will be removed in version 2.0.0.
Returns
A new Window.
Return type
Window
Raises
WindowError – If a window can’t be calculated.
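For a north-up transform without rotation, the mapping from bounding coordinates to a window reduces to simple arithmetic. The sketch below assumes such a transform, described by hypothetical arguments (pixel sizes and the coordinates of the upper left corner), and returns floating point offsets and lengths as a plain tuple rather than a Window.

```python
def north_up_window(left, bottom, right, top,
                    pixel_width, pixel_height, x_origin, y_origin):
    """Return (col_off, row_off, width, height) for a north-up grid.

    Assumes x = x_origin + col * pixel_width and
    y = y_origin - row * pixel_height, with positive pixel sizes.
    """
    col_off = (left - x_origin) / pixel_width
    row_off = (y_origin - top) / pixel_height
    width = (right - left) / pixel_width
    height = (top - bottom) / pixel_height
    return col_off, row_off, width, height


# 10-unit pixels with the upper left corner of the grid at (1000, 2000).
window = north_up_window(1020.0, 1950.0, 1060.0, 1990.0,
                         10.0, 10.0, 1000.0, 2000.0)
```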
rasterio.windows.get_data_window(arr, nodata=None)
Window covering the input array’s valid data pixels.
Parameters
• arr (numpy ndarray, <= 3 dimensions) –
• nodata (number) – If None, will either return a full window if arr is not a masked array, or
will use the mask to determine non-nodata pixels. If provided, it must be a number within
the valid range of the dtype of the input array.
Return type
Window
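The idea of a data window can be sketched for a 2D nested list. The helper below is hypothetical and much simpler than the library function: it takes an explicit nodata value, ignores masked arrays and 3D input, and returns an empty window tuple when no valid pixel exists, which is a choice made for this sketch.

```python
def data_window(arr, nodata):
    """Return (col_off, row_off, width, height) covering all valid pixels."""
    valid_rows = [r for r, row in enumerate(arr)
                  if any(value != nodata for value in row)]
    if not valid_rows:
        return (0, 0, 0, 0)
    valid_cols = [c for c in range(len(arr[0]))
                  if any(row[c] != nodata for row in arr)]
    row_off = valid_rows[0]
    col_off = valid_cols[0]
    height = valid_rows[-1] - row_off + 1
    width = valid_cols[-1] - col_off + 1
    return col_off, row_off, width, height


arr = [
    [0, 0, 0, 0],
    [0, 5, 0, 0],
    [0, 0, 7, 0],
    [0, 0, 0, 0],
]
window = data_window(arr, nodata=0)
```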
rasterio.windows.intersect(*windows)
Test if all given windows intersect.
Parameters
windows (sequence) – One or more Windows.
Returns
True if all windows intersect.
Return type
bool
rasterio.windows.intersection(*windows)
Innermost extent of window intersections.
Will raise WindowError if windows do not intersect.
Parameters
windows (sequence) – One or more Windows.
Return type
Window
rasterio.windows.iter_args(function)
Decorator to allow function to take either *args or a single iterable which gets expanded to *args.
rasterio.windows.round_window_to_full_blocks(window, block_shapes, height=0, width=0)
Round window to include full expanse of intersecting tiles.
Parameters
• window (Window) – The input window.
• block_shapes (tuple of block shapes) – The input raster’s block shape. All bands
must have the same block/stripe structure
Return type
Window
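Rounding a window outward to block boundaries can be sketched with integer arithmetic. The function name and its tuple-based window are hypothetical, and the sketch does not clip the result to the raster's height and width.

```python
def round_to_blocks(col_off, row_off, width, height, block_height, block_width):
    """Expand an integer window so that it covers whole blocks."""
    # Floor the start to the previous block boundary.
    row_start = (row_off // block_height) * block_height
    col_start = (col_off // block_width) * block_width
    # Ceil the stop to the next block boundary.
    row_stop = -(-(row_off + height) // block_height) * block_height
    col_stop = -(-(col_off + width) // block_width) * block_width
    return col_start, row_start, col_stop - col_start, row_stop - row_start


# A window touching one block column and two block rows of 256 x 256 blocks.
rounded = round_to_blocks(300, 100, 50, 200, 256, 256)
```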
rasterio.windows.shape(window, height=-1, width=-1)
The shape of a window.
height and width arguments are optional if there are no negative values in the window.
Parameters
• window (Window) – The input window.
• height (int, optional) – The number of rows in the array that the window targets.
• width (int, optional) – The number of columns in the array that the window targets.
Returns
The number of rows and columns of the window.
Return type
num_rows, num_cols
rasterio.windows.toranges(window)
Normalize Windows to range tuples
rasterio.windows.transform(window, transform)
Construct an affine transform matrix relative to a window.
Parameters
• window (Window) – The input window.
• transform (Affine) – an affine transform matrix.
Returns
The affine transform matrix for the given window
Return type
Affine
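A window's transform keeps the pixel size and rotation of the source transform and moves the origin to the window's upper left corner. The sketch below shows this with a plain coefficient tuple (a, b, c, d, e, f), where x = a * col + b * row + c and y = d * col + e * row + f; the tuple layout and the function name are assumptions made for the example.

```python
def window_transform(col_off, row_off, coeffs):
    """Return the coefficients of the transform relative to a window."""
    a, b, c, d, e, f = coeffs
    # The new origin is the source transform applied to the window offsets.
    new_c = a * col_off + b * row_off + c
    new_f = d * col_off + e * row_off + f
    return (a, b, new_c, d, e, new_f)


# North-up grid: 10-unit pixels, upper left corner at (1000, 2000).
grid = (10.0, 0.0, 1000.0, 0.0, -10.0, 2000.0)
shifted = window_transform(2, 1, grid)
```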
rasterio.windows.union(*windows)
Union windows and return the outermost extent they cover.
Parameters
windows (sequence) – One or more Windows.
Return type
Window
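The intersection and union functions above can be illustrated with windows written as (col_off, row_off, width, height) tuples. Both helpers are hypothetical sketches; in particular, the sketch raises ValueError where the library documents WindowError.

```python
def window_union(*windows):
    """Outermost extent covered by the given windows."""
    col_start = min(w[0] for w in windows)
    row_start = min(w[1] for w in windows)
    col_stop = max(w[0] + w[2] for w in windows)
    row_stop = max(w[1] + w[3] for w in windows)
    return col_start, row_start, col_stop - col_start, row_stop - row_start


def window_intersection(*windows):
    """Innermost extent shared by all of the given windows."""
    col_start = max(w[0] for w in windows)
    row_start = max(w[1] for w in windows)
    col_stop = min(w[0] + w[2] for w in windows)
    row_stop = min(w[1] + w[3] for w in windows)
    if col_stop <= col_start or row_stop <= row_start:
        raise ValueError("windows do not intersect")
    return col_start, row_start, col_stop - col_start, row_stop - row_start


first = (0, 0, 4, 4)
second = (2, 2, 4, 4)
outer = window_union(first, second)
inner = window_intersection(first, second)
```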
rasterio.windows.validate_length_value(instance, attribute, value)
Rasterio
class rasterio.Band(ds, bidx, dtype, shape)
Bases: tuple
Band of a Dataset.
Parameters
• ds (dataset object) – An opened rasterio dataset object.
• bidx (int or sequence of ints) – Band number(s), index starting at 1.
• dtype (str) – rasterio data type of the data.
• shape (tuple) – Width, height of band.
bidx
Alias for field number 1
ds
Alias for field number 0
dtype
Alias for field number 2
shape
Alias for field number 3
rasterio.band(ds, bidx)
A dataset and one or more of its bands
Parameters
• ds (dataset object) – An opened rasterio dataset object.
• bidx (int or sequence of ints) – Band number(s), index starting at 1.
Return type
Band
rasterio.open(fp, mode='r', driver=None, width=None, height=None, count=None, crs=None, transform=None,
dtype=None, nodata=None, sharing=False, opener=None, **kwargs)
Open a dataset for reading or writing.
The dataset may be located in a local file, in a resource located by a URL, or contained within a stream of bytes.
This function accepts different types of fp parameters. However, it is almost always best to pass a string that has
a dataset name as its value. These are passed directly to GDAL protocol and format handlers. A path to a zipfile
is more efficiently used by GDAL than a Python ZipFile object, for example.
In read (‘r’) or read/write (‘r+’) mode, no keyword arguments are required: these attributes are supplied by the
opened dataset.
In write (‘w’ or ‘w+’) mode, the driver, width, height, count, and dtype keywords are strictly required.
Parameters
• fp (str, os.PathLike, file-like, or rasterio.io.MemoryFile) – A filename
or URL, a file object opened in binary (‘rb’) mode, a Path object, or one of the rasterio classes
that provides the dataset-opening interface (has an open method that returns a dataset). Use
a string when possible: GDAL can more efficiently access a dataset if it opens it natively.
• mode (str, optional) – ‘r’ (read, the default), ‘r+’ (read/write), ‘w’ (write), or ‘w+’
(write/read).
• driver (str, optional) – A short format driver name (e.g. “GTiff” or “JPEG”) or a
list of such names (see GDAL docs at https://gdal.org/drivers/raster/index.html). In ‘w’ or
‘w+’ modes a single name is required. In ‘r’ or ‘r+’ modes the driver can usually be omitted.
Registered drivers will be tried sequentially until a match is found. When multiple drivers are
available for a format such as JPEG2000, one of them can be selected by using this keyword
argument.
• width (int, optional) – The number of columns of the raster dataset. Required in ‘w’
or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.
• height (int, optional) – The number of rows of the raster dataset. Required in ‘w’ or
‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.
• count (int, optional) – The count of dataset bands. Required in ‘w’ or ‘w+’ modes, it
is ignored in ‘r’ or ‘r+’ modes.
• crs (str, dict, or CRS, optional) – The coordinate reference system. Required in
‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.
• transform (affine.Affine, optional) – Affine transformation mapping the pixel
space to geographic space. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.
• dtype (str or numpy.dtype, optional) – The data type for bands. For example:
‘uint8’ or rasterio.uint16. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.
• nodata (int, float, or nan, optional) – Defines the pixel value to be interpreted
as not valid data. Required in ‘w’ or ‘w+’ modes, it is ignored in ‘r’ or ‘r+’ modes.
• sharing (bool, optional) – To reduce overhead and prevent programs from running out
of file descriptors, rasterio maintains a pool of shared low level dataset handles. If True this
function will use a shared handle if one is available. Multithreaded programs must avoid
sharing and should set sharing to False.
• opener (callable, optional) – A custom dataset opener which can serve GDAL’s virtual
filesystem machinery via Python file-like objects. The underlying file-like object is obtained
by calling opener with (fp, mode) or (fp, mode + “b”) depending on the format driver’s
native mode. opener must return a Python file-like object that provides read, seek, tell, and
close methods.
• kwargs (optional) – These are passed to format drivers as directives for creating or
interpreting datasets. For example: in ‘w’ or ‘w+’ modes a tiled=True keyword argument will
direct the GeoTIFF format driver to create a tiled, rather than striped, TIFF.
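As a rough illustration of what the transform parameter encodes, the sketch below applies the six affine coefficients by hand in plain Python instead of going through affine.Affine. The helper name pixel_to_geo and the global 0.5 degree grid are assumptions chosen for illustration only.

```python
def pixel_to_geo(coeffs, col, row):
    """Map a (col, row) pixel corner to (x, y) using six affine coefficients.

    coeffs is (a, b, c, d, e, f) such that
        x = a * col + b * row + c
        y = d * col + e * row + f
    """
    a, b, c, d, e, f = coeffs
    return (a * col + b * row + c, d * col + e * row + f)


# Hypothetical north-up grid: global extent at 0.5 degree resolution.
west, north, res = -180.0, 90.0, 0.5
coeffs = (res, 0.0, west, 0.0, -res, north)  # negative row step: y decreases downward

width = round(360 / res)   # number of columns
height = round(180 / res)  # number of rows

print(width, height)                         # -> 720 360
print(pixel_to_geo(coeffs, 0, 0))            # -> (-180.0, 90.0), upper left corner
print(pixel_to_geo(coeffs, width, height))   # -> (180.0, -90.0), lower right corner
```

The width and height keywords must agree with the transform in this way for the dataset to cover the intended extent.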
Returns
• rasterio.io.DatasetReader – If mode is “r”.
• rasterio.io.DatasetWriter – If mode is “r+”, “w”, or “w+”.
Raises
• TypeError – If arguments are of the wrong Python type.
• rasterio.errors.RasterioIOError – If the dataset cannot be opened, such as when
there is no dataset with the given name.
• rasterio.errors.DriverCapabilityError – If the detected format driver does not
support the requested opening mode.
Examples
To open a local GeoTIFF dataset for reading using standard driver discovery and no directives:
To create a new 8-band, 16-bit unsigned, tiled, and LZW-compressed GeoTIFF with a global extent and 0.5
degree resolution:
Notes
SEVEN
CONTRIBUTING
First of all: the Rasterio project has a code of conduct. Please read the CODE_OF_CONDUCT.txt file, it’s important
to all of us.
7.2 Rights
Rasterio’s API is both similar to and different from GDAL’s API and this is intentional.
• Rasterio is a library for reading and writing raster datasets. Rasterio uses GDAL but is not a “Python binding for
GDAL.”
• Rasterio aims to hide, or at least contain, the complexity of GDAL.
• Rasterio always prefers Python’s built-in protocols and types or Numpy protocols and types over concepts from
GDAL’s data model.
• Rasterio keeps I/O separate from other operations. rasterio.open() is the only library function that operates
on filenames and URIs. dataset.read(), dataset.write(), and their mask counterparts are the methods
that perform I/O.
• Rasterio methods and functions should be free of side-effects and hidden inputs. This is challenging in practice
because GDAL embraces global variables.
• Rasterio leans on analogies to other familiar Python APIs.
Our term for the kind of object that allows read and write access to raster data is dataset object. A dataset object
might be an instance of DatasetReader or DatasetWriter. The canonical way to create a dataset object is by using the
rasterio.open() function.
This is analogous to Python’s use of file object.
A path object specifies the name and address of a dataset within some space (filesystem, internet, cloud) along with
optional parameters. The first positional argument of rasterio.open() is a path. Some path objects also have an
open method which can be used to create a dataset object.
Unlike GDAL’s original data model, rasterio has no band objects. In this way it’s more like GDAL’s
multidimensional API. A dataset’s read() method returns N-D arrays.
GDAL depends on some global context: a format driver registry, dataset connection pool, a raster block cache, a file
header cache. Rasterio depends on this, too, but unlike GDAL’s official Python bindings, delays initializing this context
as long as possible and abstracts it with the help of a Python context manager.
We use a variant of the centralized workflow described in the Git Book. Since Rasterio 1.0 we tag and release versions of
the form x.y.z from the maint-x.y branch.
Work on features in a new branch of the mapbox/rasterio repo or in a branch on a fork. Create a GitHub pull request when
the changes are ready for review. We recommend creating a pull request as early as possible to give other developers a
heads up and to provide an opportunity for valuable early feedback.
7.10 Conventions
The rasterio namespace contains both Python and C extension modules. All C extension modules are written using
Cython. The Cython language is a superset of Python. Cython files end with .pyx and .pxd and are where we keep
all the code that calls GDAL’s C functions.
Rasterio works with Python versions 3.6 through 3.9.
We strongly prefer code adhering to PEP8.
Tests are mandatory for new code. We use pytest. Use pytest’s parameterization feature.
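For contributors new to pytest’s parameterization feature, a sketch of the pattern is shown below. The function under test, grid_shape, is a hypothetical helper written for this example and is not part of rasterio.

```python
import pytest


def grid_shape(left, bottom, right, top, res):
    """Hypothetical helper: (rows, columns) of a grid covering the bounds."""
    return (round((top - bottom) / res), round((right - left) / res))


# One test function, run once per (res, expected) pair.
@pytest.mark.parametrize(
    "res, expected",
    [
        (1.0, (180, 360)),
        (0.5, (360, 720)),
        (0.25, (720, 1440)),
    ],
)
def test_global_grid_shape(res, expected):
    assert grid_shape(-180.0, -90.0, 180.0, 90.0, res) == expected
```

Each tuple is reported as a separate test case, so a failure identifies the exact input that broke.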
We aspire to 100% coverage for Python modules but coverage of the Cython code is a future aspiration (#515).
Use darker to reformat code as you change it. We aren’t going to run black on everything all at once.
Type hints are welcome as a part of refactoring work or new feature development. We aren’t going to make a large
initiative about adding hints to everything.
Changes should be noted in CHANGES.txt. New entries go above older entries.
Rasterio has a new Dockerfile that can be used to create images and containers for exploring or testing the package.
The command make dockertest will build a Docker image based on one of the official GDAL images, start a
container that mounts the working directory, and run python setup.py develop && python -m pytest in the
container.
If you prefer not to use the new development environment you may install rasterio’s dependencies directly onto your
computer.
Developing Rasterio requires Python 3.6 or any later final release, including 3.10. We prefer developing with the
most recent version of Python but recognize this is not possible for all contributors. A C compiler is also required to
leverage existing protocols for extending Python with C or C++. See the Windows install instructions in the readme
for more information about building on Windows.
Development should occur within a virtual environment to better isolate development work from custom environments.
In some cases installing a library with an accompanying executable inside a virtual environment causes the shell to
initially look outside the environment for the executable. If this occurs try deactivating and reactivating the environment.
The GDAL library and its headers are required to build Rasterio. We do not currently have guidance for any
platforms other than Linux and OS X.
On Linux, GDAL and its headers should be available through your distro’s package manager. For Ubuntu the commands
are:
Provision a virtualenv with Rasterio’s build requirements. Rasterio’s setup.py script will not run unless Cython and
Numpy are installed, so do this first from the Rasterio repo directory.
Linux users may need to install some additional Numpy dependencies:
then:
Rasterio, its Cython extensions, normal dependencies, and dev dependencies can be installed with $ pip. Installing
Rasterio in editable mode while developing is very convenient but only affects the Python files. Specifying the [test]
extra in the command below tells $ pip to also install Rasterio’s dev dependencies.
Any time a Cython (.pyx or .pxd) file is edited the extension modules need to be recompiled, which is most easily
achieved with:
$ pip install -e .
When switching between Python versions the extension modules must be recompiled, which can be forced with $
touch rasterio/*.pyx and then re-installing with the command above. If this is not done, an error claiming that an
object “has the wrong size, try recompiling” is raised.
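On systems without a touch command, a rough Python equivalent of that step could look like the following. The function name touch_pyx_sources is an invention for this sketch; it only updates modification times and does not rebuild anything by itself.

```python
from pathlib import Path


def touch_pyx_sources(package_dir):
    """Update the modification time of each .pyx file directly inside package_dir.

    Returns the names of the files that were touched, in sorted order.
    """
    touched = []
    for source in sorted(Path(package_dir).glob("*.pyx")):
        source.touch()
        touched.append(source.name)
    return touched
```

Called as touch_pyx_sources("rasterio") from the repository root, it would be followed by the same editable re-install described above.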
The dependencies required to build the docs can be installed with:
Rasterio’s tests live in the tests/ directory and generally match the main package layout.
To run the entire suite and the code coverage report:
Note: rasterio must be installed in editable mode in order to run tests.
A single test:
EIGHT
This exception is raised when none of rasterio’s format drivers can successfully open the specified dataset. In some
cases it might be because the path is malformed, or the file is corrupted. Often, it is because your installation of rasterio
does not provide the format driver. ECW, for example, is an optional format driver and is not provided by the rasterio
wheels in the Python Package Index. We’d like to keep the size of wheels to < ~20MB, and that means some GDAL
features and format drivers must be left out. Other distribution channels for rasterio, such as conda-forge, may have
different and larger sets of format drivers.
To see a list of the format drivers provided by your rasterio installation, run $ rio env --formats in your shell.
The full message is “ERROR 4: Unable to open EPSG support file gcs.csv. Try setting the GDAL_DATA environment
variable to point to the directory containing EPSG csv files.” The GDAL/OGR library prints this text to your process’s
stdout stream when it can not find the gcs.csv data file it needs to interpret spatial reference system information stored
with a dataset. If you’ve never seen this before, you can summon this message by setting GDAL_DATA to a bogus value
in your shell and running a command like ogrinfo:
If you’re using GDAL software installed by a package management system like apt or yum, or Homebrew, or if you’ve
built and installed it using configure; make; make install, you don’t need to set the GDAL_DATA environment
variable. That software has the right directory path built in. If you see this error, it’s likely a sign that GDAL_DATA is
set to a bogus value. Unset GDAL_DATA if it exists and see if that eliminates the error condition and the message.
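A quick way to check for a bogus value from Python is sketched below using only the standard library. The helper name data_dir_status and its three return values are assumptions made for this example.

```python
import os


def data_dir_status(environ, name="GDAL_DATA"):
    """Classify an environment variable that should name a data directory.

    Returns "unset" if the variable is absent, "present" if it names an
    existing directory, and "missing" if it names a path that does not exist.
    """
    value = environ.get(name)
    if value is None:
        return "unset"
    return "present" if os.path.isdir(value) else "missing"


print(data_dir_status(os.environ))
```

A result of "missing" suggests the variable is set to a bogus value and should be unset; "unset" is the normal state for the package-manager installations described above.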
Important: Activate your conda environments. The GDAL conda package will set GDAL_DATA to the proper value
if you activate your conda environment. If you don’t activate your conda environment, you are likely to see the error
message shown above.
8.3 Why can’t rasterio find proj.db (rasterio versions < 1.2.0)?
Important: Activate your conda environments. The PROJ conda package will set PROJ_LIB (PROJ < 9.1) or
PROJ_DATA (PROJ 9.1+) to the proper value if you activate your conda environment. If you don’t activate your conda
environment, you are likely to see the exception shown above.
8.4 Why can’t rasterio find proj.db (rasterio from PyPI versions >= 1.2.0)?
Starting with version 1.2.0, rasterio wheels on PyPI include PROJ 7.x and GDAL 3.x. The libraries and modules in
these wheels are incompatible with older versions of PROJ that may be installed on your system. If PROJ_LIB (PROJ
< 9.1) or PROJ_DATA (PROJ 9.1+) is set in your program’s environment and points to an older version of PROJ, you
must unset this variable. Rasterio will then use the version of PROJ contained in the wheel.
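When the variable cannot easily be unset in the shell, one option is to launch the program with a cleaned copy of the environment. The helper without_proj_overrides below is a hypothetical name for this sketch, which uses only the standard library.

```python
import os

# Variables that can redirect PROJ to a different data directory.
PROJ_OVERRIDES = ("PROJ_LIB", "PROJ_DATA")


def without_proj_overrides(environ):
    """Return a copy of environ with the PROJ data-directory variables removed."""
    return {key: value for key, value in environ.items() if key not in PROJ_OVERRIDES}


clean_env = without_proj_overrides(os.environ)
print(sorted(set(os.environ) - set(clean_env)))  # names that were dropped, if any
```

The resulting mapping can be passed as the env argument of subprocess.run() so that the child process falls back to the PROJ data bundled in the wheel.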
Python Module Index
rasterio
rasterio.control
rasterio.coords
rasterio.crs
rasterio.drivers
rasterio.dtypes
rasterio.enums
rasterio.env
rasterio.errors
rasterio.features
rasterio.fill
rasterio.io
rasterio.mask
rasterio.merge
rasterio.path
rasterio.plot
rasterio.profiles
rasterio.rpc
rasterio.sample
rasterio.session
rasterio.shutil
rasterio.tools
rasterio.transform
rasterio.vrt
rasterio.warp
rasterio.windows