
JOURNAL OF PLANKTON RESEARCH j VOLUME 32 j NUMBER 3 j PAGES 285 – 303 j 2010

Digital zooplankton image analysis using the ZooScan integrated system

GABY GORSKY1,2*, MARK D. OHMAN3, MARC PICHERAL1,2, STÉPHANE GASPARINI1,2, LARS STEMMANN1,2, JEAN-BAPTISTE ROMAGNAN3, ALISON CAWOOD3, STÉPHANE PESANT4, CARMEN GARCÍA-COMAS1,2,5 AND FRANCK PREJGER1,2

1 UPMC University of Paris 06, UMR 7093, LOV, Observatoire Océanographique, F-06234 Villefranche/Mer, France; 2 CNRS, UMR 7093, LOV, Observatoire Océanographique, F-06234 Villefranche/Mer, France; 3 California Current Ecosystem LTER site, Scripps Institution of Oceanography, La Jolla, CA 92093-0218, USA; 4 MARUM, Institute for Marine Environmental Sciences, University Bremen, Leobener Strasse, POP 330 440, 28359 Bremen, Germany; 5 Stazione Zoologica Anton Dohrn, Villa Comunale, 80121 Napoli, Italy

*Corresponding author: gorsky@obs-vlfr.fr

Received July 14, 2009; accepted in principle October 26, 2009; accepted for publication November 14, 2009

Corresponding editor: Roger Harris

ZooScan with ZooProcess and Plankton Identifier (PkID) software is an integrated analysis system for acquisition and classification of digital zooplankton images from preserved zooplankton samples. Zooplankton samples are digitized by the ZooScan and processed by ZooProcess and PkID in order to detect, enumerate, measure and classify the digitized objects. Here we present a semi-automatic approach that entails automated classification of images followed by manual validation, which allows rapid and accurate classification of zooplankton and abiotic objects. We demonstrate this approach with a biweekly zooplankton time series from the Bay of Villefranche-sur-mer, France. The classification approach proposed here provides a practical compromise between a fully automatic method with varying degrees of bias and a manual but accurate classification of zooplankton. We also evaluate the appropriate number of images to include in digital learning sets and compare the accuracy of six classification algorithms. We evaluate the accuracy of the ZooScan for automated measurements of body size and present relationships between machine measures of size and C and N content of selected zooplankton taxa. We demonstrate that the ZooScan system can produce useful measures of zooplankton abundance, biomass and size spectra, for a variety of ecological studies.

INTRODUCTION

Historically, zooplankton have been sampled primarily by surveys that use nets, pumps or water bottles to collect specimens for quantifying distributional patterns. While such surveys provide invaluable information on species and life stages, their temporal and spatial resolution is usually limited, owing to the time and resources required for sample analysis by trained microscopists. This limited resolution of zooplankton data sets reduces our ability to understand processes controlling pelagic ecosystem dynamics on multiple time and space scales.

Recent advances in image processing and pattern recognition of plankton have made it possible to automatically or semi-automatically identify and quantify the composition of plankton assemblages at a relatively coarse taxonomic level (Benfield et al., 2007). The importance of this approach was recognized by the Scientific Committee on Oceanic Research (SCOR), who created an international working group to evaluate the state of Automatic Visual Plankton Identification (http://www.scor-wg130.net). The hope is that the advent of digital imaging technology, combined with better algorithms for machine learning and increased computer capacity, will facilitate much

doi:10.1093/plankt/fbp124, available online at www.plankt.oxfordjournals.org.


© The Author 2009. Published by Oxford University Press. All rights reserved. For permissions, please email: journals.permissions@oxfordjournals.org

more rapid means for characterizing plankton distributions assessed from a variety of different sampling methods.

Early attempts to use optical bench-top methods for treatment of plankton samples were undertaken by Ortner et al. (Ortner et al., 1979) who used silhouette photography to record the contents of a plankton sample. Silhouette imaging of plankton samples on photographic film or video imaging and a limited digitization of plankton samples followed by automatic identification was further developed in the 1980s (Jeffries et al., 1980, 1984; Rolke and Lenz, 1984; Gorsky et al., 1989; Berman, 1990). A variety of bench-top methods is now under development (e.g. Benfield et al., 2007). In addition to developments in the ocean sciences, automated image analysis is commonly applied in other fields of biology and medical sciences. Within the geosciences, machine learning is often applied to quantify the morphology of fossils (Kennett, 1968; Bollmann et al., 2004).

A wide range of image analysis and treatment software exists from these various fields. Most can be adapted for enumeration and measurement of particles, but zooplankton pattern recognition is a much more challenging goal. Most zooplankton taxa display high shape variability. Other difficulties include the diversity of body orientations relative to the imaging plane, differences in extension of appendages, damaged individuals and variable quantities of amorphous organic aggregates that must be distinguished by automated recognition methods. With these challenges, it is not surprising that recent papers show relatively low automated zooplankton classification efficiency (Bell and Hopcroft, 2008; Irigoien et al., 2009).

Progress in scanner technology has made it feasible to digitize good quality images of large numbers of plankton individuals simultaneously. The hardware presented here is not the only system based on scanner technology that can be used for zooplankton image treatment (e.g. Wiebe et al., 2004; Bell and Hopcroft, 2008; Irigoien et al., 2009). We have in the past used such systems by adapting commercial scanners (Grosjean et al., 2004). However, a series of problems led us to build an industrialized, rugged, water-resistant ZooScan suitable for organisms ranging in size from 200 μm to several centimeters, together with dedicated imaging software we call ZooProcess and Plankton Identifier (PkID). ZooScans can be calibrated so that different ZooScan units produce normalized images of identical optical characteristics that can be intercompared among laboratories, facilitating cooperative sample analysis. Such a network of calibrated ZooScan instruments currently exists in the Mediterranean region in the framework of the CIESM Zooplankton Indicators program (http://www.ciesm.org/marine/programs/zooplankton.htm).

In this paper, we first describe the overall approach used, including ZooScan hardware together with ZooProcess and PkID software. We discuss building and validating training sets, the selection of classification algorithms and the accuracy of body size and biomass estimations that can be derived from the ZooScan system. We propose standards for long-term archiving and sharing of raw and processed images and output files. We demonstrate a semiautomatic classification approach based on human validation of automated zooplankton image analysis that provides highly reliable results that are appropriate for quantitative ecological studies. Second, we illustrate the procedures for sample and data analysis through specific application of the ZooScan system to an annual time series of zooplankton samples from the Bay of Villefranche-sur-mer.

METHOD

Sequential steps for sample preparation and scanning with ZooScan hardware, together with image processing with ZooProcess and PkID software, are explained below. Appendix 1 lists a glossary of terms related to image analysis.

Building learning sets

In experiments to determine the optimal number of objects to sort into each category when constructing a learning set (see Appendix 1), we selected eight categories of organisms scanned from Villefranche-sur-mer, each with more than 950 vignettes. We randomly extracted subsets of 10, 20, 30, . . . 900 vignettes from each of the eight categories. These subsets were considered independent learning sets and we ran a classifier on each to assess the recall (percent true positives) and contamination (percent false positives) as a function of the size of the learning set.

Morphometric measurements and biomass

ZooScan analyses provide sensitive measures of body size, which can be converted to size spectra. To calculate the biovolume of an object from its cross-sectional area, it is necessary to know the geometric shape of the object, the ratio of its major and minor axes and its orientation relative to the illumination system. Copepods can be represented as ellipsoids (Herman, 1992). The ZooScan provides estimates of body length (here major axis of the best fitting ellipse) and width (here minor axis). We evaluated the accuracy of


ZooScan measurements of body length and cross-sectional area. Automated measurements of preserved zooplankton as recorded by ImageJ in ZooProcess were compared with manual measurements of several zooplankton taxa (appendicularians, chaetognaths, copepods, euphausiids, ostracods and thecosome pteropods) collected in the California Current on CalCOFI (California Cooperative Oceanic Fisheries Investigations) cruises. Specimens were collected along CalCOFI line 80 between February 2006 and August 2008 and preserved in formaldehyde buffered with sodium tetraborate. Manual measurements were made using a calibrated on-screen measuring tool, and compared with machine-measured feret diameter, major elliptical axis, minor elliptical axis and equivalent circular diameter (ECD) of the same individuals, identified manually. ECD was determined from the variable "area excluded," which excludes clear regions in the interior of an organism from the cross-sectional area of the organism. Manual length measurements of curved organisms (e.g. chaetognaths and appendicularians) were made by summing a series of line segments along the central axis of the organism.

For C and N relationships, live zooplankton (copepods, chaetognaths and euphausiids) from the California Current were anaesthetized with carbonated water (diluted 1:4 with seawater), scanned, manually identified and individually measured. Multiple species were included in each higher taxon analyzed, in order to obtain group-specific relationships. The cross-sectional area of chaetognaths and euphausiids was measured manually on-screen using multiple rectangles drawn within the outline of each organism. The area of copepods was measured using two ellipses, one defining the prosome and the other the urosome. These manual measurements were compared with machine measurements of the same individuals. Organisms were dried overnight at 60°C and the carbon and nitrogen content of the individual organisms determined at the analytical facility of the Scripps Institution of Oceanography using an elemental analyzer (Costech Analytical Technologies model 4010) calibrated with acetanilide.

Case study: Bay of Villefranche-sur-mer

To illustrate application of the ZooScan system (www.zooscan.com), we analyzed a series of samples describing annual variation of zooplankton from Pt. B (43°41.10'N, 7°18.94'E) in the Bay of Villefranche-sur-mer, France. Zooplankton were sampled with a 57 cm diameter WP2 net with a mesh size of 200 μm retrieved vertically at 1 m s⁻¹ from a depth of 75 m to the surface, and fixed in 4% v/v formaldehyde buffered with sodium tetraborate. Thirty vertical hauls were made between 22 August 2007 and 8 October 2008. These samples were scanned by ZooScan in two size fractions, <1 mm and >1 mm, leading to a set of 60 scans. For classification, we began with a learning set of 13 categories (10 zooplankton + 3 non-zooplankton) that we had created previously. This can be downloaded at http://www.obs-vlfr.fr/LOV/ZooPart/ZooScan/Training_Set_Villefranche/esmeraldo_learning_set.zip.

Description of the ZooScan system

System overview

Hardware. The ZooScan (http://www.zooscan.com) is composed of two main waterproof elements that allow safe processing of liquid samples. The hinged base contains a high resolution imaging device and a drainage channel that is used for sample recovery (Fig. 1). The top cover generates even illumination and houses an optical density (OD) reference cell. Although the ZooScan permits scanning at higher resolution than 2400 dpi, the optical pathway through two successive interfaces (air to water and water to glass) presently limits the working resolution to this value. With a pixel resolution of 10.6 μm, the ZooScan is well suited for organisms larger than 200 μm.

The imaging area of the ZooScan is defined by the choice of one of two transparent frames (11 × 24 cm or 15 × 24 cm) inserted inside the scanning cell. Both frames have a 5 mm step; water is added above this step to avoid forming a meniscus on the periphery of the image. Both frames permit the acquisition and processing of scans as a single image, avoiding biases that may occur when an image is divided into multiple cells.

Fig. 1. Sample recovery from the ZooScan, illustrating the top cover, hinged base and sample recovery tray.
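To make the biovolume calculation described under "Morphometric measurements and biomass" concrete, the sketch below treats an organism as a prolate ellipsoid whose major and minor axes come from the best-fitting ellipse, assuming the third (depth) axis equals the minor axis. The function name and the example axis values are hypothetical illustrations, not part of ZooProcess.

```python
import math

def ellipsoid_biovolume(major_mm, minor_mm):
    """Biovolume (mm^3) of a prolate ellipsoid with the given major
    axis (body length) and minor axis (body width), assuming the
    unseen depth axis equals the minor axis."""
    return (4.0 / 3.0) * math.pi * (major_mm / 2) * (minor_mm / 2) ** 2

# Hypothetical example: a copepod 1.2 mm long and 0.4 mm wide
v = ellipsoid_biovolume(1.2, 0.4)
```

For a sphere (major = minor = 2r) the formula reduces to the familiar 4πr³/3, which is a quick sanity check on the implementation.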

Software. The sequence followed in scanning and analysis of zooplankton samples is shown schematically in Fig. 2. The initial steps are completed using ZooProcess software: (i) scan and process a blank background image, (ii) scan the sample to acquire a high quality raw image, linked to associated metadata, (iii) normalize the raw image and convert to full grey scale range, (iv) process images by subtracting the background and removing frame edges, (v) extract and measure individual objects. Subsequent analysis steps are done with ZooProcess in combination with PkID: (vi) create a learning set comprised of representative images from each category of organisms or objects that will be identified, (vii) build a classifier to optimize the capability to accurately recognize the desired categories, (viii) create a confusion matrix (CM) to verify the classifier, (ix) apply the classifier to the suite of unidentified objects and (x) manually inspect the classified objects and move any misidentified objects to the appropriate category. These steps are explained more fully below.

Fig. 2. Schematic illustration of the primary steps in the scanning and analysis of zooplankton samples with the ZooScan/ZooProcess/Plankton Identifier system.

Sample treatment. The aliquot volume of a plankton sample to be analyzed is determined by the abundance and size distribution of the organisms. It is important to minimize coincidence of overlapping animals on the optical surface. At present, a maximum of approximately 1000–1500 objects is scanned in the larger frame, although this value can be exceeded. Because the abundance of organisms usually decreases with increasing body size, it is preferable to scan two (or more) separate size fractions of each sample. One fraction contains larger individuals that are less abundant, obtained from a larger sample aliquot, and the other includes the more numerous smaller individuals, from a smaller aliquot. A mesh size of 1000 μm is efficient for separating large and small size fractions of mesozooplankton.

Only immobile organisms (i.e. preserved or anaesthetized) can be scanned, because they must remain still for 150 s. Prior to sample processing, the fixative is removed and replaced with either filtered sea water or tap water. Water should be at room temperature in order to avoid air bubble formation. We do not stain samples, in order to maintain them unaltered for future comparative studies. Although ZooProcess software provides a tool to separate overlapping organisms once the sample has been scanned, it is important to physically separate touching organisms in the scanning frame and separate them from the frame edges prior to digitizing the sample. Manual separation takes 10 min per sample.

Detailed description

Zooprocess. ZooProcess software is based on the ImageJ macro language (Abramoff et al., 2004; Rasband, 2005). In addition to guiding the primary steps of scanning, normalization and object detection, ZooProcess provides tools for quality control and is linked to PkID software. Results presented here are based on ZooProcess default parameters. ZooProcess records true raw 16 bit grey images from the ZooScan charge-coupled device and creates blank images to be subtracted from normalized images.

Grey level normalization. Full grey level normalization of scanned images allows the exchange of images, training sets or data between different ZooScan units. Normalization is done on both the sample image and the background blank image, which is subtracted later. Grey level and size are among the most important variables used in automated plankton recognition (e.g. Hu and Davis, 2005).

The 16 bit raw image is converted to an 8 bit source image after determination of both the white point [Wp, equation (1)] and the black point [Bp, equation (2)] from the median grey level (Mg). The OD range that can be resolved by the ZooScan is above 1.8. The sharpness of the background allows setting the white point close to the median grey level independent of the number and size of the organisms in the image.

    Wp = Mg · 1.15                 (1)

    Bp = Mg / (1.15 · log(OD))     (2)

ZooProcess provides a tool to check the efficiency of the procedure by scanning standard reference disks (diameter 5.6 mm) with ODs of 0.3 and 0.9. The average grey level values are 150 and 73 (±10%) for the two disks, respectively, for all ZooScans tested to date after processing of the images using the default parameters (see Appendix 2). The normalization parameters and the median grey level of both the raw 16 bit image and the final 8 bit image are archived in the image log file.

Image processing. ZooProcess provides two methods for removing a heterogeneous background. A daily scan of the cell filled with filtered water is recommended, because the background image provides a blank and also records instrument stability over time. A background scan is faster to process and requires less computer memory than the second option, the rolling ball method (Sternberg, 1983), which requires no blank image to be scanned. A lower setting of the rolling ball diameter parameter will clean the background, but may create artifacts in zones of uneven contrast on the bodies of larger organisms. Apart from artifacts, measurements made on the same image processed using the two background subtraction methods differ by less than 1%, thus the rolling ball can be used as an alternative when no blank image is available.

ZooProcess next measures the grey level in the OD reference disk area. This measurement is compared with the theoretical value calibrated in the factory. ZooProcess then detects the limits of the transparent frame and discards the irrelevant parts of the image. Objects touching the sides are automatically removed from the data set. This image is used for object detection and the extraction of measurement variables from each detected object.

Extraction of vignettes and attributes. The final image is segmented at a default level of 243, thus keeping 243 grey levels for characterizing organisms. Objects having an ECD >0.3 mm (default) are detected and processed. More than 40 attributes (variables) are extracted from every object (Santos Filho et al., submitted for publication). All metadata (Appendix 3), the log file and the variables measured (Appendix 4) are stored in a text file called a PID file (Plankton Identifier file). All parameters used during the imaging process are recorded in the log file. Some examples of extracted vignettes and some measurements are illustrated in Appendix 5. After extraction of the vignette (Region of Interest, or ROI), the values of each measurement variable are associated with that vignette.

Additional quality control. Segmented black and white images and the objects' outlines are recorded for quality control. The segmented image is checked for background subtraction and correct object contours. A 2D "dot cloud" graph allows selection and visualization of vignettes by clicking on a dot or on a selected zone, and the sample image can be visualized with the full set of recognized outlines superimposed.

Plankton Identifier (PkID). PkID permits automatic and semi-automatic classification of plankton images. For the ZooScan application, PkID is interfaced with ZooProcess,


but it can also be used standalone. It has been developed in DELPHI (Borland), because the source code can be compiled. For supervised learning, PkID works in conjunction with Tanagra (Rakotomalala, 2005; http://eric.univ-lyon2.fr/~ricco/tanagra/en/tanagra.html), also developed in DELPHI. Source code for PkID can be obtained on request for customization (see Gasparini, 2007).

Three successive steps are followed in applying PkID: (i) "Learning" creates training files that link measurements from groups of similar objects (vignettes); (ii) "Data analysis" permits construction and optimization of classifiers; (iii) "Show results" displays final identifications and statistical reports (see: http://www.obs-vlfr.fr/~gaspari/Plankton_Identifier/userguide.html).

Learning file creation. In the "learning" section, objects are grouped into categories by simple visual drag and drop of object vignettes.

Number of categories. The number of categories of objects selected is a trade-off between the number of retained categories and the level of acceptable error (Culverhouse et al., 2003; Fernandes et al., 2009). Typically numerous categories are created, each containing objects of similar visual appearance. Then, from results of classifier performance evaluation, categories showing high cross-contamination are merged. A new performance evaluation is then conducted and the process is applied iteratively until acceptable levels of error rates are reached. For the case of semi-automated classification presented in this paper, a minimal learning set composed of a few dominant zooplankton categories may be sufficient.

Data analysis: algorithm selection. Several supervised learning algorithms are available in the "data analysis" section of version 1.2.6 of PkID (Table I).

Selection of measurement variables. It has been shown that irrelevant variables strongly affect the performance and accuracy of supervised learning methods (Guyon and Elisseeff, 2003). In PkID, the user can include different combinations of variables and immediately test their suitability for a particular classification task using cross validation.

Data analysis and performance evaluations. Evaluation of classifier performance requires the examination of a CM, which is a contingency table crossing true (manually validated) and predicted (assigned by the classifier) identifications of objects. Object counts on the matrix diagonal represent correct identifications, and the sum of counts off the diagonal divided by the total number of objects gives the overall error rate (the complement of accuracy) of the classifier. Correct interpretation of the CM requires the examination of each category separately, including the rate of true positives (number of objects correctly predicted/total number of actual objects) as well as false positives (number of objects falsely assigned to a category/total number of predicted objects).

There are three ways to build a CM, all available in PkID. The first, the re-substitution CM, involves validation of the classification procedure on the same data set used to compute the classification functions. Re-substitution CMs systematically underestimate error rates and even give no errors when algorithms such as Random Forest are used. The second entails random partitioning of the initial data set into n equal fractions, n − 1 fractions being used to compute the classification model and one to validate it; this process is repeated m times to fill up the CM. This procedure, called cross-validation, requires more computation time than the re-substitution CM, but usually gives better error evaluations. However, since the data used still originate from the same data set, error levels usually remain underestimated. The third method (Dundar et al., 2004) uses two equivalent and independent learning files describing the same categories with different objects. One is used to build the model and the other to validate it. This procedure, called test, gives good error estimation but requires twice the effort of learning file creation. Moreover, it cannot easily be applied during the learning file optimization procedure unless two learning file optimizations are conducted in parallel.

Table I: The different classifiers in Plankton Identifier (Gasparini, 2007) analyzed in the present study
Name Short description Reference

5-NN k-nearest neighbor using heterogeneous value difference metric Peters et al. (2002)
S-SVC linear Support Vector Machine from LIBSVM library, using linear functions Chang and Lin (2001)
S-SVC RBF Support Vector Machine from LIBSVM library, using radial basis activation functions Chang and Lin (2001)
Random Forest Bagging, decision tree algorithm Breiman (2001)
C 4.5 Decision tree algorithm Quinlan (1993)
Multilayer Perceptron Multilayer Perceptron neural network Simpson et al. (1992)
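To make the first entry in Table I concrete, here is a minimal k-nearest-neighbor classifier with k = 5 in plain Python. It uses Euclidean distance on toy 2-D feature vectors rather than the heterogeneous value difference metric of Peters et al. (2002), and all data below are invented stand-ins for real vignette measurements:

```python
from collections import Counter
import math

def knn_predict(train, query, k=5):
    """Classify `query` by majority vote among its k nearest training
    points. `train` is a list of (feature_tuple, label) pairs."""
    nearest = sorted(train, key=lambda fl: math.dist(fl[0], query))
    votes = Counter(label for _, label in nearest[:k])
    return votes.most_common(1)[0][0]

# Invented toy data: two well-separated clusters of "shape features"
train = [((0.1, 0.2), "copepod"), ((0.2, 0.1), "copepod"),
         ((0.0, 0.3), "copepod"), ((0.3, 0.0), "copepod"),
         ((0.2, 0.2), "copepod"),
         ((2.1, 2.0), "chaetognath"), ((2.0, 2.2), "chaetognath"),
         ((1.9, 2.1), "chaetognath"), ((2.2, 1.9), "chaetognath"),
         ((2.0, 2.0), "chaetognath")]
label = knn_predict(train, (0.15, 0.15))  # → "copepod"
```

The other algorithms in Table I (SVMs, Random Forest, C4.5, multilayer perceptron) differ in how they draw decision boundaries, but all consume the same attribute vectors extracted by ZooProcess.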


Data management. Here we recommend appropriate practices for archiving ZooScan data and metadata. ZooScan data include: (i) raw images of zooplankton samples or sub-samples, (ii) raw background images from the system's hardware, (iii) digital images of individual objects (i.e. vignettes), (iv) measurements made by ZooProcess software on individual objects, (v) classification results determined automatically or semi-automatically using PkID and (vi) computed abundances, biovolumes and biomass. ZooScan metadata include: (i) information about sampling and measured variables, (ii) image scan and grey level normalization, (iii) algorithm selection and measured variables, (iv) learning sets and (v) confusion matrices. One of the best practices in data management is to keep data and metadata together and, as much as possible, in the same file. While the latter two types of metadata generated by the ZooScan system come as complementary files, the first three types are included in either the PID files or the log files, along with the data.

Safeguarding ZooScan data and metadata requires that these be published in digital libraries such as National and/or World Data Centres (NODCs and/or WDCs) that have the capacity to archive and distribute images and their associated metadata. NODCs such as US-NODC in the USA, SISMER in France and BODC in the UK are designated by the International Oceanographic Data Exchange programme (IODE) of the UNESCO Intergovernmental Oceanographic Commission (IOC), while World Data Centers (WDCs) such as WDC-MARE in Europe and WDC-Oceanography in the USA, Russia, China and Japan are designated by the International Council for Science (ICSU). Part of the data from the annual time series of zooplankton from the Bay of Villefranche-sur-mer, which is presented in the Results section, has been safeguarded at the WDC-MARE and made available online through the PANGAEA information system (doi:10.1594/PANGAEA.724540). Access to raw images, log files and PID files is password protected, whereas low resolution images and key variables such as abundances and biovolumes of copepods and total plankton are publicly available. With respect to ZooScan data, it is essential that different instruments are inter-calibrated and that software configurations are known.

RESULTS

We first present our results illustrating general characteristics of the ZooScan/ZooProcess/PkID system, including construction of learning sets, selection of classifier algorithms and validation of morphometric and biomass measurements. Then we present a brief case study from the Bay of Villefranche, in order to illustrate the sequential processes involved in sample and data analysis.

Learning set creation

After scanning, normalization, background subtraction and extraction of vignettes, the first step is to create a preliminary learning set or to use an existing learning set to classify ("predict") a small number of dominant groups. Our experiments to determine the optimal number of objects to sort into each category for construction of a learning set showed that sufficiently high recall (true positives) and low contamination (false positives) are achieved when approximately 200–300 objects are sorted per category (Fig. 3), with relatively small additional gains beyond this number. Therefore, we recommend sorting 200–300 vignettes per category of object to be identified.

Choice of the classifier and number of predicted categories

We compared the performance of six classifiers (see Table I) for numbers of categories of objects ranging from 35 to 5 categories (Fig. 4). The categories were balanced in number of vignettes (300 vignettes from each). Each time the number of categories was reduced, a new group of 300 vignettes from each newly combined category was selected for the learning set. The results show, first, that the Random Forest algorithm consistently had the highest recall and nearly always the lowest contamination, regardless of the number of categories predicted (Fig. 4). The Support Vector Machine using linear functions had the second best performance. All further analyses were carried out with the Random Forest algorithm. The results also demonstrated that machine classifications improved when a smaller number of categories was predicted (Fig. 4).

Morphometric and biomass measurements

Comparisons of ZooProcess automatic measurements of digitized zooplankton images with manual measurements of the same images revealed linear relationships between ZooProcess feret diameter and manually measured total length (Fig. 5). There was greater scatter in the case of appendicularians than in other taxa, and a slope <1.0, because the appendicularian tails were often curved, affecting the automated measurements of feret diameter but not the manual measurements. Other automated measurements, including major and minor elliptical axes, were also correlated with manual


Fig. 3. Dependence of (A) recall (true positives) and (B) contamination (false positives) rate on the number of vignettes sorted for a learning set.
Curves are illustrated for eight categories of organisms or objects, and the overall mean.

Fig. 4. Dependence of (A) recall (true positives) and (B) contamination (false positives) rate on the number of categories predicted by the
classifier, using different classifier algorithms (see Table I).
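The learning-set size effect summarized in Fig. 3 can be illustrated with a minimal sketch. A toy nearest-centroid classifier on synthetic two-feature objects stands in for the Random Forest applied to ZooProcess features; the category names, feature values and training sizes below are invented for illustration and are not ZooScan data.

```python
import random

# Toy illustration of the Fig. 3 learning curve: recall improves as more
# vignettes are added to the learning set. Nearest-centroid classification on
# synthetic 2-D "features" stands in for Random Forest on ZooProcess features.

def centroid(vectors):
    return [sum(col) / len(col) for col in zip(*vectors)]

def classify(centroids, x):
    # assign x to the category whose centroid is nearest (squared Euclidean distance)
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(centroids[c], x)))

random.seed(1)
means = {"copepod": (1.0, 1.0), "fiber": (-1.0, -1.0)}  # invented category means
cloud = lambda mx, my, n: [[random.gauss(mx, 0.5), random.gauss(my, 0.5)] for _ in range(n)]
test_set = {lab: cloud(mx, my, 50) for lab, (mx, my) in means.items()}

recalls = []
for n_train in (2, 5, 20):  # vignettes per category in the learning set
    cents = {lab: centroid(cloud(mx, my, n_train)) for lab, (mx, my) in means.items()}
    hits = sum(classify(cents, x) == lab for lab, xs in test_set.items() for x in xs)
    recalls.append(hits / 100)
print(recalls)
```

With a fixed test set, recall tends to climb toward a plateau as the per-category learning set grows, which is the qualitative behaviour reported for the real categories in Fig. 3.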

length measurements (data not shown), although feret diameter typically showed the best relationship.

Comparison of automated measurements of surface area (as area excluded) with manual measurements of the same individuals was carried out for three taxa (copepods, euphausiids and chaetognaths: Fig. 6). In all cases, there was a linear relationship between manual and automated measurements. The automated measurements were somewhat higher for copepods and euphausiids, but lower for chaetognaths. These results suggest that automated measurements are consistent and reproducible, although their values may differ somewhat from manually determined values.

The relationships between C and N content and automated measurements of linear or areal dimensions were well described by power curves (Fig. 7). Much of the scatter in the relationships shown in Fig. 7 is attributable to the mixture of different species included in these analyses. The exponents for C and N were similar to each other, implying relatively constant C:N ratios. In the case of both copepods and chaetognaths, the exponents relating C or N content to linear dimensions (feret diameter) were close to



Fig. 5. Relationship between automated measurements of feret diameter and manual measurements of total length, for (A) copepods,
(B) euphausiids, (C) appendicularians (tail length), (D) chaetognaths, (E) ostracods and (F) thecosome pteropods from the California Current.

Fig. 6. Relationship between automated measurements of area excluded and manual measurements of projected area for (A) copepods,
(B) euphausiids, (C) chaetognaths from the California Current.
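The linear relationships of Figs 5 and 6 come from regressions of automated against manual measurements. A minimal ordinary-least-squares sketch; the paired values below are hypothetical, not measurements from the study.

```python
# Ordinary least squares: regress automated feret diameter on manually
# measured total length, as in the comparisons behind Fig. 5.
def linfit(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) \
            / sum((xi - mx) ** 2 for xi in x)
    return slope, my - slope * mx  # (slope, intercept)

manual_mm = [0.5, 1.0, 1.5, 2.0]      # hypothetical manual total lengths
feret_mm  = [0.55, 1.05, 1.55, 2.05]  # hypothetical automated feret diameters
slope, intercept = linfit(manual_mm, feret_mm)
print(slope, intercept)
```

A slope near 1 with a small intercept indicates that the automated measurement tracks the manual one, while a slope below 1 (as for curved appendicularian tails) indicates systematic shortening.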

3 and the exponents in relation to areal measurements (area excluded) were close to 2. However, for euphausiids, the exponents were close to 2 and 1, respectively. These differences are consistent with the changing body shapes with ontogeny of euphausiids, as the cephalothorax width and depth of euphausiids tend to increase in proportion


Fig. 7. Relationship between carbon and nitrogen content and automated measurements of body size for copepods (A–D), euphausiids (E–H)
and chaetognaths (I–L) from the California Current. Carbon content plotted vs. linear dimensions (feret diameter; A, E, I) or areal dimensions
(area excluded; C, G, K). Nitrogen content plotted vs. linear dimensions (feret diameter; B, F, J) or areal dimensions (area excluded; D, H, L).
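The power curves of Fig. 7 have the form C = a * L**b and can be fitted by linear regression in log-log space. A sketch with noise-free synthetic data chosen so the exponent is exactly 3, mirroring the copepod and chaetognath result for linear dimensions; the numbers are illustrative, not data from the paper.

```python
import math

# Fit y = a * x**b by least squares on (log x, log y), as is standard for
# allometric relationships such as carbon content vs. feret diameter.
def power_fit(x, y):
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    b = sum((u - mx) * (v - my) for u, v in zip(lx, ly)) \
        / sum((u - mx) ** 2 for u in lx)
    a = math.exp(my - b * mx)
    return a, b

L = [0.5, 1.0, 2.0, 4.0]          # synthetic feret diameters (mm)
C = [2.0 * l ** 3 for l in L]     # synthetic carbon contents scaling as L**3
a, b = power_fit(L, C)
print(a, b)
```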

to total body length through their development. In the case of copepods and euphausiids, area excluded was a slightly better predictor of C or N content than feret diameter, although for all three taxa the results indicate that automated measurements of either linear or areal dimensions of vignettes can be related in a useful manner to the biomass of these organisms.

Results from the Bay of Villefranche-sur-mer

Learning set optimization and application

To create our initial learning set for the Villefranche case study, we utilized a pre-existing learning set (see Methods) to predict 5000 objects from the Villefranche time series. We then manually validated the prediction into 30 categories (which took ca. 4 h). We included categories for "bad focus" objects, artifacts, bubbles and fibers. To improve the classifier, we then randomly selected a fixed number of vignettes drawn from each of these 30 categories from the Villefranche time series and created a new learning set. This second learning set was tested by cross validation in PkID using the Random Forest algorithm. Categories containing only a few objects with low accuracy of detection were not retained (they were left to contaminate the prediction). The accuracy of the prediction for this second learning set was much better than for the first iteration, and the subsequent manual validation was done faster.

As samples were analyzed from different seasons in the Villefranche time series, newly encountered taxonomic categories were added into the learning set when they became sufficiently numerous, provided that confusion with other dominant categories remained low. This occurred, for example, with cladocerans that bloomed only in autumn and were nearly absent during other time periods. Sometimes categories with relatively high contamination were maintained in the learning set because of their ecological value. For example, the Limacina category showed a 34.5% error rate (Table II). Nevertheless, it was maintained as a separate group because subsequent manual validation was rapid and the seasonal development of this taxonomic group was important. After the prediction results did not improve significantly with additional iterations, we considered the learning set satisfactory. It contained 14 zooplankton and 6 other categories (Table II) and was applied to the rest of the samples.
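The pruning rule described above (categories retained only when sufficiently numerous and recognized well enough) can be sketched as a simple filter. The thresholds and the Heteropoda figures are illustrative inventions; the Copepoda_small and Limacina counts and recalls are taken from Table II.

```python
# Keep a category in the learning set only if it is common enough and predicted
# well enough. The paper applied this judgment qualitatively; min_count and
# min_recall here are illustrative choices, not values from the study.
def retained_categories(counts, recalls, min_count=50, min_recall=0.6):
    return sorted(c for c in counts
                  if counts[c] >= min_count and recalls[c] >= min_recall)

counts  = {"Copepoda_small": 1465, "Limacina": 855, "Heteropoda": 12}
recalls = {"Copepoda_small": 0.78, "Limacina": 0.65, "Heteropoda": 0.20}
print(retained_categories(counts, recalls))
```

Under these thresholds Limacina is kept despite its contamination, matching the ecological argument made above, while a rare, poorly recognized category is left out to contaminate the prediction.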

Table II: Confusion matrix for the 20 categories in the learning set used for machine classification of the 2007-2008 time series by the Random Forest algorithm. Rows are true categories and columns 1-20 are predicted categories, numbered in the same order as the rows, so correct identifications lie on the diagonal (row number = column number). "%" is the percentage of the learning set in each category; the final columns give the row total, recall (true positive rate) and 1-Precision (contamination).

   Category             %      1    2    3    4   5   6    7    8    9   10  11  12   13  14   15  16  17  18  19  20   Total  Recall  1-Precision
 1 Aggregates         7.2    683   27   80   67   0   0   77   99   53  134   8  16   26   9   37   1   1  12   0  25    1355    0.50   0.42
 2 Aggregates_dark    5.7      3  671    0    0  39   0   18   38    0    4   0  27    0   0    0   0 193   0   8  43    1064    0.63   0.40
 3 Appendicularia     7.7     85    0 1243    0   0  22    0    2    2    0   5   0   73   0   18   1   0   0   2   2    1455    0.85   0.20
 4 Bad focus          7.5     44    0    5 1230   0   0    5   27    2   12   0   1    5   0   47   0   0  20   0  12    1410    0.87   0.10
 5 Bubbles            4.5      2   30    0    0 786   0    2    0    0    0   0   7    0   0    0   0  10   0   0   3     840    0.94   0.06
 6 Chaetognatha       2.3      0    0   51    0   0 356    0    0    0    0   1   0   20   0    0   0   0   0   0   0     428    0.83   0.11
 7 Cladocera          6.2     27   19    2    1   0   0 1028   22    0    2   0   5    0   0    1   0   1   0   0  62    1170    0.88   0.16
 8 Copepoda_other    11.3     74   32    4    8   0   0   14 1608  119  237  24   1    0   0    4   0   0   0   6   2    2133    0.75   0.24
 9 Oithona            7.5     24    0   28    0   0   0    0   61 1256   38   0   0    9   0    0   0   0   0   4   0    1420    0.88   0.17
10 Copepoda_small     7.8    130    3    0    6   0   0    3  139   43 1141   0   0    0   0    0   0   0   0   0   0    1465    0.78   0.28
11 Decapoda_large     2.4      2    0    7    0   0   0    0   23    0    0 410   0    0   0    1   0   0   0   2   0     445    0.92   0.12
12 Egg-like           1.9     12   26    3    7   4   0   12    3    0    0   0 259    0   1    0   0   8   0   0  15     350    0.74   0.20
13 Fibers             6.4     10    0   84    0   0  21    0    7   14    0   0   0 1034   0    0   0   0  12  28   0    1210    0.85   0.13
14 Medusae            0.7     16    0    0    2   0   0    0    2    0    0   0   2    0 100   18   0   0   0   0   0     140    0.71   0.19
15 Nectophores        6.1     26    0   21   32   0   1    3    5    0    0   0   1    1  10 1029  20   0   1   0   5    1155    0.89   0.19
16 Thaliacea          1.4      8    0    5    6   0   2    0    0    0    0   0   0    0   3  108 123   0   0   0   0     255    0.48   0.17
17 Limacina           4.5      2  267    0    0   8   0    9    0    0    0   0   4    0   0    0   0 558   0   0   7     855    0.65   0.28
18 Scratch            1.6      2    0    0    5   0   0    0    1    0    0   0   0    6   0    2   4   0 285   0   0     305    0.93   0.14
19 Pteropoda_other    2.3      9    0    2    1   0   0    3   55   23   23  20   0   11   0    0   0   0   0 291   2     440    0.66   0.15
20 Radiolaria         5.1     23   45   11    8   0   0   45   12    1    3   0   1    0   0    3   0   3   0   0 800     955    0.84   0.18
   Total            100.0   1182 1120 1546 1373 837 402 1219 2124 1513 1594 468 324 1185 123 1268 149 774 330 341 978   18850    0.78   0.19

Correct identifications (shown in bold in the original) are on the diagonal.
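Recall and 1-precision (contamination) in Table II follow directly from the diagonal entry and the row and column totals of the confusion matrix. A check against the Aggregates category:

```python
# Recall: diagonal count over the row total (all objects truly in the category).
# Contamination (1 - precision): one minus the diagonal count over the column
# total (all objects predicted as the category). Definitions as in Table II.
def recall(diagonal, row_total):
    return diagonal / row_total

def contamination(diagonal, column_total):
    return 1 - diagonal / column_total

# Aggregates in Table II: 683 correctly identified, 1355 true Aggregates,
# 1182 objects predicted as Aggregates.
print(round(recall(683, 1355), 2), round(contamination(683, 1182), 2))
```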



Table III: Final categories used for classifying the 2007-2008 time series in the Bay of Villefranche, after initial machine classification, followed by manual validation, then manual subdivision into additional categories.

Categories used for classifying the Villefranche time series:

Aggregates                    Decapoda_large
Aggregates_dark               Decapoda_other
Algae                         Echinodermata
Amphipoda                     Egg like
Annelids                      Fiber
Appendicularia                Fish
Bad focus                     Heteropoda
Bivalves                      Medusae_ephyrae
Bubbles                       Medusae_other
Chaetognatha                  Multiple
Cladocera                     Nauplii
Copepoda_Acartia              Ostracoda
Copepoda_Centropages          Other
Copepoda_Euterpina            Pteropoda_Limacina
Copepoda_Harpacticoida        Pteropoda_other
Copepoda_Oithona              Radiolaria
Copepoda_Poecilostomatoida    Scratch
Copepoda_Temora               Siphonophora_eudoxid
Copepoda_other                Siphonophora_nectophores
Copepoda_other_multiples      Siphonophora_other
Copepoda_other_small          Thaliacea

The CM (Table II) of this learning set shows that most of the groups have a recall (rate of true positives) of about 80% and a contamination rate (false positives) smaller than 20%. Thus, this classifier performed moderately well, although not accurately enough for ecological studies. The classifications of vignettes were then manually validated by sorting all vignettes into appropriate categories, a process which was facilitated by the prior machine classification. Many users may wish to use the classifications obtained at this point (i.e. in this case, to 14 categories).

Analysis of seasonal variations

For the present study, while manually validating image assignments to appropriate categories we chose to create additional categories beyond those in Table II. Several additional taxonomic categories could be reliably distinguished manually, although not by the machine classifiers. The result was a total of 42 categories, 33 of them zooplankton (see Table III), all verified with essentially 100% accuracy. The use of the automated classifier greatly facilitated manual validation; it was simple to subsequently drag and drop misclassified vignettes into the correct categories. Representative vignettes of some of the identified taxa may be seen in Fig. 8.

Our semi-automated analysis of an annual cycle of zooplankton variation in the Bay of Villefranche revealed pronounced seasonal variation in abundance, with substantial changes in the composition of the mesozooplankton (Fig. 9). Calanoid copepods were the numerically dominant organisms at all times of year, increasing from 75% before the peak of the bloom to a maximum of 95% at the peak and declining to 55% afterwards, as cladocerans, appendicularians and other taxa increased in relative importance (Figs 9 and 10). Poecilostome and oithonid copepods were abundant prior to the peak (16 and 8%, respectively). The community appears to be more diverse in summer.

In Fig. 10, we compare time series of individual major taxa both before and after manual validation of the sorted vignettes. While automated classification ("unvalidated") shows very good agreement with the manually validated time series for total copepods, this was not the case for other categories of organisms. For 4 of the 5 other groups of organisms in Fig. 10 (i.e. Appendicularia, chaetognaths, Cladocera, Oithona), the typical error was an overestimate, with moderate to high contamination by other organisms (false positives). For the sixth group (Decapoda), the usual error was underestimation (i.e. false negatives). This result underscores the importance of manual validation, even for classifiers that seem to have an overall acceptable error rate.

Sensitive ZooScan size measurements make it possible to readily reconstruct size spectra of organisms. For example, Fig. 11 illustrates the overall size spectrum for all copepods combined, as well as spectra for some of the dominant genera, including the smaller-bodied Oithona, intermediate-sized Acartia and larger-bodied Centropages.

DISCUSSION

The ZooScan-ZooProcess-PkID system is an end-to-end approach for digital imaging of preserved zooplankton, segmentation and feature extraction, and design and application of machine learning classifiers. The results from ZooScan analyses lead readily to numerical abundances as well as construction of size and biomass spectra. The calibration of each digital scan with reference to an OD standard makes it possible to directly compare images from ZooScans used in different laboratories.

Many existing zooplankton sampling programs have archived large numbers of plankton samples that have yet to be fully analyzed. Analysis of such samples is recognized as a high priority (Perry et al., 2004), but it is an expensive task when carried out by trained microscopists. Complete analysis has awaited the development of machine learning or automated molecular methods.
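The size spectra discussed above (Fig. 11) can be reconstructed from ZooProcess area measurements by converting each object's area to an equivalent circular diameter (ECD) and counting objects in logarithmic size classes. The pixel pitch and bin edges below are illustrative assumptions, not the values used in the paper.

```python
import math

# Sketch of rebuilding a size spectrum from object areas measured in pixels.
PIXEL_MM = 0.0106  # assumed pixel pitch in mm (about 2400 dpi); adjust per scan

def ecd_mm(area_px):
    # equivalent circular diameter: diameter of a circle with the same area
    return 2 * math.sqrt(area_px / math.pi) * PIXEL_MM

def size_spectrum(areas_px, edges_mm):
    counts = [0] * (len(edges_mm) - 1)
    for a in areas_px:
        d = ecd_mm(a)
        for i in range(len(edges_mm) - 1):
            if edges_mm[i] <= d < edges_mm[i + 1]:
                counts[i] += 1
                break          # objects outside the edges are silently dropped
    return counts

areas = [2000, 2500, 9000, 40000]   # hypothetical object areas in pixels
edges = [0.25, 0.5, 1.0, 2.0, 4.0]  # log2-spaced ECD classes in mm
print(size_spectrum(areas, edges))
```

Dividing the counts by the filtered volume and the bin width would turn these raw counts into a normalized abundance size spectrum.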



Fig. 8. Examples of vignettes of organisms from ZooScan analysis of the 2007-2008 time series in the Bay of Villefranche-sur-mer (scale bar = 1 mm). (A) Copepods, (B) Centropages, (C) Harpacticoida, (D) Poecilostomatoida, (E) Temora, (F) Oithona, (G) Cladocera, (H) Ostracoda, (I) Radiolaria, (J) eggs, (K) Limacina, (L) Pteropoda, (M) Appendicularia, (N) medusae, (O) Siphonophora, (P) Thaliacea, (Q) Decapoda, (R) Chaetognatha.

Fig. 9. Total abundance of mesozooplankton from 2007 to 2008, and the proportion of primary mesozooplankton categories (inset pie
diagrams) before, during and after the 2008 spring bloom in the Bay of Villefranche. All classifications were validated manually.

Equally important for such sample collections is the archiving of digital representations of the samples, to facilitate permanent records of their contents as a complement to the conservation of the physical samples themselves. Such digital images permit automatic or semiautomatic image analysis, rapid measurement of organisms and a permanent record of the sample contents that can be revisited in the future. The ZooScan system fulfils many of these objectives. It permits relatively rapid analysis of zooplankton samples combining automated classification and manual validation, digital archiving of images in databases accessible to the scientific community [for example, the Villefranche time series ZooScan images are stored in PANGAEA® (Publishing Network for Geoscientific and Environmental Data)], standardization of images from different ZooScans allowing the construction of combined learning sets, non-destructive analysis so the samples can be used for other purposes and safe laboratory operation with aqueous samples.

Fig. 10. Abundance of six major groups of mesozooplankton from 2007 to 2008 in the Bay of Villefranche. Time series of each category are illustrated as classified automatically by the Random Forest algorithm without manual validation (dotted line) and after manual validation (solid line).

Fig. 11. Size spectrum of total copepods and three different copepod genera (Acartia, Centropages and Oithona), from all sampling dates in the Bay of Villefranche. All classifications were validated manually.

Several classification algorithms have already been tested in the plankton recognition literature (Culverhouse et al., 1996; Grosjean et al., 2004; Blaschko et al., 2005; Hu and Davis, 2005). The Random Forest algorithm seems to be one of the most promising (Grosjean et al., 2004; Bell and Hopcroft, 2008). However, care is needed in the design and testing of learning sets. Bell and Hopcroft (Bell and Hopcroft, 2008) built a learning set of 63 categories, but reduced this to two categories and correctly identified copepods 67.8% of the time and euphausiid eggs with even lower accuracy. Following automated classification in Irigoien et al. (Irigoien et al., 2009), only four categories (two size categories of copepods, the euphausiids and mysids category, and chaetognaths) out of 17 had an acceptable error level. Hu and Davis (Hu and Davis, 2006) proposed use of a sequential dual classifier, using first a shape-based feature set and a neural network classifier, followed by a texture-based feature set and a support vector machine classifier.


Here we endorse a practical semi-automated method that may help biologists obtain taxonomically more detailed data sets with sufficient accuracy. Comparison between machine-predicted and manually validated classifications showed that for dominant taxa such as copepods, automatic recognition was sufficiently accurate. However, for less abundant taxa such as appendicularians and chaetognaths, automatic recognition generally overestimated true abundances (but underestimated the abundance of decapods). Fully automated classification would have resulted in inaccurate descriptions of seasonal cycles of key zooplankton taxa and produced biased size spectra. Such biases result from contamination by other abundant groups, especially in winter/spring in the present study, when copepods strongly dominate. There are no simple conversion factors that could be used here because the error is not constant through the seasonal cycle. The total time required for classification with manual validation is only slightly longer than with a fully automated classifier, because there is no need to construct a detailed learning set. Moreover, the results are significantly improved over an automated method alone. Proper design of the initial classifier makes the subsequent manual validation step proceed relatively quickly. The initial classifier will then facilitate subsequent subdivision into categories that are easily classified manually.

It is important to keep in mind when classifying the sample automatically that all types of objects encountered in a sample, including artifacts, must have a corresponding category in the learning set. If not, they will systematically contaminate other categories, leading to lower recognition performance.

Our results are encouraging for the estimation of zooplankton size and biomass spectra from ZooScan analyses. Many ecological traits (including metabolic rates, population abundance, growth rate and productivity, spatial habitat, trophic relationships) are correlated with body size (e.g. Gillooly et al., 2002; Brown et al., 2004). Hence, because body size captures so many aspects of ecosystem function, it can be used to synthesize a suite of co-varying traits into a single dimension (Woodward et al., 2005). However, with some automated measurement methods for reconstructing size spectra from in situ measurements, all the in situ objects are treated as living plankton, though it has been shown that a significant proportion of objects can be marine snow (Heath et al., 1999; González-Quirós and Checkley, 2006; Checkley et al., 2008). The ZooScan imaging system provides an efficient means to reconstruct plankton size spectra from taxonomically well-characterized zooplankton samples. In addition, automated measurements of either linear or areal dimensions of digitized organisms can be related to their biomass, applied on a taxon-specific basis.

The classification method proposed here allows a relatively detailed taxonomic characterization of zooplankton samples and provides a practical compromise between fully automatic but less accurate classification and accurate manual classification of zooplankton. Useful size and biomass estimations may be rapidly obtained for ecologically oriented studies. Results from different ZooScan data sets can be combined using PANGAEA®'s data warehouse, thus encouraging cooperative, networked studies over broad geographic scales.

ACKNOWLEDGEMENTS

We thank Todd Langland and Corinne Desnos for assistance with measurements.

FUNDING

The ZooScan development was funded by the CNRS/INSU ZOOPNEC program, by the UPMC and the EU FP6 programs SESAME and EUR-OCEANS under contracts GOCE-2006-036949 and 511106, respectively. C.G.-C. was supported by an EUR-OCEANS PhD fellowship. This work was stimulated by SCOR WG130 and was supported by the Mediterranean Scientific Commission (CIESM) Zooplankton Indicator program, and by the US National Science Foundation via the California Current Ecosystem LTER program.

REFERENCES

Abramoff, M. D., Magelhaes, P. J. and Ram, S. J. (2004) Image processing with ImageJ. Biophotonics Int., 11, 36–42.

Bell, J. L. and Hopcroft, R. R. (2008) Assessment of ZooImage as a tool for the classification of zooplankton. J. Plankton Res., 30, 1351–1367.

Benfield et al. (2007) RAPID: Research on Automated Plankton Identification. Oceanography, 20, 172–187.

Berman, M. S. (1990) Enhanced zooplankton processing with image analysis technology. Int. Counc. Explor. Sea Comm. Meet., 1990/L:20, 5.

Berube, D. and Jebrak, M. (1999) High precision boundary fractal analysis for shape characterization. Comp. Geosci., 25, 1059–1071.

Blaschko, M. B., Holness, G., Mattar, M. A. et al. (2005) Automatic in situ identification of plankton. Seventh IEEE Workshops on Application of Computer Vision (WACV/MOTION'05), 1, 79–86.

Bollmann, J., Quinn, P. S., Vela, M. et al. (2004) Automated particle analysis: calcareous microfossils. In Francus, P. (ed.), Image Analysis, Sediments and Paleoenvironments. Kluwer Academic Publishers, Dordrecht, The Netherlands, pp. 229–252.


Breiman, L. (2001) Random forests. Mach. Learn., 45, 5–32.

Brown, J. H., Gillooly, J. F., Allen, A. P. et al. (2004) Toward a metabolic theory of ecology. Ecology, 85, 1771–1789.

Chang, C.-C. and Lin, J. (2001) Training nu-support vector classifiers: theory and algorithms. Neural Comp., 13, 2119–2147.

Checkley, D. M., Jr, Davis, R. E., Herman, A. W. et al. (2008) Assessing plankton and other particles in situ with the SOLOPC. Limnol. Oceanogr., 53, 2123–2126.

Culverhouse, P. F., Williams, R., Reguera, B. et al. (1996) Automatic categorisation of 23 species of Dinoflagellate by artificial neural network. Mar. Ecol. Prog. Ser., 139, 281–287.

Culverhouse, P. F., Williams, R., Reguera, B. et al. (2003) Do experts make mistakes? A comparison of human and machine identification of dinoflagellates. Mar. Ecol. Prog. Ser., 247, 17–25.

Dundar, M., Fung, G., Bogoni, L. et al. (2004) A methodology for training and validating a CAD system and potential pitfalls. CARS, July, pp. 1010–1014.

Fernandes, J. A., Irigoien, X., Boyra, G. et al. (2009) Optimizing the number of classes in automated zooplankton classification. J. Plankton Res., 31, 19–29.

Gasparini, S. (2007) PLANKTON IDENTIFIER: a software for automatic recognition of planktonic organisms. http://www.obs-vlfr.fr/~gaspari/Plankton_Identifier/index.php.

Gillooly, J. F., Charnov, E. L., West, G. B. et al. (2002) Effects of size and temperature on developmental time. Nature, 417, 70–73.

González-Quirós, R. and Checkley, D. M., Jr (2006) Occurrence of fragile particles inferred from optical plankton counters used in situ and to analyze net samples collected simultaneously. J. Geophys. Res., 111, C05S06. doi:10.1029/2005JC003084.

Gorsky, G., Guilbert, P. and Valenta, E. (1989) The autonomous image analyzer: enumeration, measurement and identification of marine phytoplankton. Mar. Ecol. Prog. Ser., 58, 133–142.

Grosjean, P., Picheral, M., Warembourg, C. et al. (2004) Enumeration, measurement, and identification of net zooplankton samples using the ZOOSCAN digital imaging system. ICES J. Mar. Sci., 61, 518–525.

Guyon, I. and Elisseeff, A. (2003) An introduction to variable and feature selection. J. Mach. Learn. Res., 3, 1157–1182.

Heath, M. R., Dunn, J., Fraser, J. G. et al. (1999) Field calibration of the optical plankton counter with respect to Calanus finmarchicus. Fish. Oceanogr., 8(Suppl. 1), 13–24.

Herman, A. W. (1992) Design and calibration of a new optical plankton counter capable of sizing small zooplankton. Deep-Sea Res., 39, 395–415.

Hu, Q. and Davis, C. (2005) Automatic plankton image recognition with co-occurrence matrices and support vector machine. Mar. Ecol. Prog. Ser., 295, 21–31.

Hu, Q. and Davis, C. (2006) Accurate automatic quantification of taxa-specific plankton abundance using dual classification with correction. Mar. Ecol. Prog. Ser., 306, 51–61.

Irigoien, X., Fernandes, J. A., Grosjean, Ph. et al. (2009) Spring zooplankton distribution in the Bay of Biscay from 1998 to 2006 in relation with anchovy recruitment. J. Plankton Res., 31, 1–17.

Jeffries, H. P., Sherman, K., Maurer, R. et al. (1980) Computer processing of zooplankton samples. In Kennedy, V. (ed.), Estuarine Perspectives. Academic Press, New York, pp. 303–316.

Jeffries, H. P., Berman, M. S., Poularikas, A. D. et al. (1984) Automated sizing, counting and identification of zooplankton by pattern recognition. Mar. Biol., 78, 329–334.

Kennett, J. P. (1968) Globorotalia truncatulinoides as a paleo-oceanographic index. Science, 159, 1461–1463.

Ortner, P. B., Cummings, S. R., Aftring, R. P. et al. (1979) Silhouette photography of oceanic zooplankton. Nature, 277, 50–51.

Perry, R. I., Batchelder, H. P., Mackas, D. L. et al. (2004) Identifying global synchronies in marine zooplankton populations: issues and opportunities. ICES J. Mar. Sci., 61, 445–456.

Peters, A., Hothorn, T. and Lausen, B. (2002) ipred: improved predictors. R News, 2, 33–36.

Quinlan, J. R. (1993) C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco.

Rakotomalala, R. (2005) TANAGRA: une plate-forme d'expérimentation pour la fouille de données. MODULAD, 32, 71–85.

Rasband, W. S. (2005) ImageJ. U.S. National Institutes of Health, Bethesda, MD, USA. http://rsb.info.nih.gov/ij/.

Rolke, M. and Lenz, J. (1984) Size structure analysis of zooplankton samples by means of an automated image analyzing system. J. Plankton Res., 6, 637–645.

Santos Filho, E., Sun, X., Picheral, M. et al. Implementation and evaluation of new features for zooplankton identification: ZooScan case study. J. Plankton Res. (submitted for publication).

Simpson, R., Williams, R., Ellis, R. et al. (1992) Biological pattern recognition by neural networks. Mar. Ecol. Prog. Ser., 79, 303–308.

Sternberg, S. R. (1983) Biomedical image processing. Computer, 16, 22–34.

Wiebe, P. H., Gallager, S. M., Davis, C. S. et al. (2004) Using a high-powered strobe light to increase the catch of Antarctic krill. Mar. Biol., 144, 493–502.

Woodward, G., Ebenman, B., Emmerson, M. et al. (2005) Body-size in ecological networks. Trends Ecol. Evol., 20, 402–409.

APPENDIX 1. GLOSSARY OF TERMINOLOGY USED

Accuracy: The proportion of the total number of classified objects that is correctly classified.

Category: A taxon or group of taxa used in the learning set and confusion matrix.

Classifier: A supervised learning algorithm applied to automated classification of objects. Classifiers are developed from a suite of characteristics extracted from each object.

Confusion matrix (CM): A matrix illustrating both predicted (from the classifier) and true classifications of all object categories.

Contamination: See false positive rate.

dat1.txt file: The PID file completed with the predicted and validated categories, if a validation has been performed.

Error rate: Proportion of mispredicted organisms (to be manually corrected in order to obtain a fully validated data set).

False positive rate: The proportion of objects that is incorrectly classified as belonging to a category of interest; also called contamination.

Learning set: A set of vignettes of organisms sorted into categories by an expert and used in a supervised learning model; also called training set.

Log file: A text file containing details concerning the analyzed sample image.

.pid file: A data file resulting from image analysis by ZooProcess. It includes the LOG file above the data section. Each identified object occupies one row, with all the variables extracted from that object in columns.

Plankton Identifier: Software for automatic recognition of plankton.

Precision: The proportion of predicted positive objects that was correctly assigned.

Recall: See true positive rate.

True positive rate: The proportion of objects that is correctly classified as belonging to a category of interest; also called recall.

Validation: Manual sorting of vignettes into the correct category, following initial automated classification of vignettes.

Variable: Attributes extracted from every detected object (see the list of extracted variables in Appendix 4).

Vignette: An image of a single detected object; also called ROI (region of interest).

ZooProcess: Software for image acquisition, treatment and analysis built for the ZooScan system.

APPENDIX 2.

Grey level control of 17 ZooScan units using two different optical density calibration disks (OD 0.3 and 0.9). The variability among different instruments is lower than the variability within the same category of objects in one image (not shown).

APPENDIX 3. THE METADATA WINDOW IN ZOOPROCESS WITH THE DETAILS RECORDED WITH EACH SCAN

Sample Id: Incorporating the sample date and time in the filename assists with subsequent file retrieval.
ZooScan operator
Ship
Scientific program
Station Id: Name of sampling location.
Sampling date: Date and time of sample collection.
Latitude: Coordinates of the station, degrees.minutes.
Longitude: Coordinates of the station, degrees.minutes.
Bottom depth (m): Bottom depth of the station.
CTD reference filename: Permits CTD data to be associated with plankton results.
Other reference: Can be used to record the name of the collector.
Number of tows in the same sample: Useful where samples are pooled.
Tow type
Net type
Net mesh (cod end): Cod-end mesh size (if different from the net mesh size, this information should be recorded elsewhere in the remarks field).
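A .pid file, as defined in the glossary above, pairs a log/header section with a data section holding one object per row. A sketch of a reader; the "[Data]" marker and semicolon separators are assumptions for illustration, so check them against a real ZooProcess file.

```python
# Minimal .pid-style reader: split header lines from data rows and return the
# rows as dicts keyed by column name. The section marker and delimiter below
# are illustrative assumptions, not a ZooProcess specification.
def parse_pid(text):
    header, rows, fields = [], [], None
    in_data = False
    for line in text.splitlines():
        if line.strip() == "[Data]":
            in_data = True
        elif not in_data:
            header.append(line)          # log/metadata section
        elif fields is None:
            fields = line.split(";")     # first data line: column names
        else:
            rows.append(dict(zip(fields, line.split(";"))))
    return header, rows

sample = "scan_id=vlfr_2008\n[Data]\nItem;Area;Feret\n1;2000;80\n2;350;30"
head, objs = parse_pid(sample)
print(len(objs), objs[0]["Area"])
```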


Net opening surface (m²): Area of net mouth
Maximum depth (m): Maximum depth reached by the net when collecting the sample
Minimum depth (m): Minimum depth reached by the net when collecting the sample
Filtered volume (m³): Flowmeter readout (alternatively, derived from mouth area and tow length)
Fraction id: Identifies the fraction name when the sample has been sieved in different size categories (e.g. D1 for fraction >1 mm and D2 for fraction between 200 µm and 1 mm)
Fraction min mesh (µm): Lower mesh size for sieving the sample
Fraction max mesh (µm): Upper mesh size for sieving the sample
Fraction splitting ratio: Ratio of total sample volume to volume of aliquot scanned
Remarks: Free text field
Submethod: Method used to subsample the original sample

APPENDIX 4. LIST OF VARIABLES RECORDED IN THE DATA SECTION OF THE PID FILES

Standard ImageJ variables

Angle: Angle between the primary axis and a line parallel to the x-axis of the image
BX: X coordinate of the top left point of the smallest rectangle enclosing the object
BY: Y coordinate of the top left point of the smallest rectangle enclosing the object
Height: Height of the smallest rectangle enclosing the object
Width: Width of the smallest rectangle enclosing the object
X: X position of the center of gravity of the object
XM: X position of the center of gravity of the object's grey level
XMg5: X position of the center of gravity of the object, using a gamma value of 51
XStart: X coordinate of the top left point of the image
Y: Y position of the center of gravity of the object
YM: Y position of the center of gravity of the object's grey level
YMg5: Y position of the center of gravity of the object, using a gamma value of 51
YStart: Y coordinate of the top left point of the image
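The geometric descriptors above (BX, BY, Width, Height, X, XM) follow directly from the pixel coordinates and grey values of a detected object. The following minimal sketch illustrates those definitions; the helper name and the pixel-tuple representation are hypothetical, not part of ZooProcess or ImageJ:

```python
def bounding_box_and_centroids(pixels):
    """Illustrate ImageJ-style descriptors for one detected object.

    pixels: list of (x, y, grey) tuples belonging to the object.
    Returns BX, BY, Width, Height (smallest enclosing rectangle),
    X (centre of gravity) and XM (grey-level-weighted centre of
    gravity), following the Appendix 4 definitions.
    """
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    greys = [p[2] for p in pixels]
    bx, by = min(xs), min(ys)
    width = max(xs) - bx + 1          # Width of enclosing rectangle
    height = max(ys) - by + 1         # Height of enclosing rectangle
    x = sum(xs) / len(xs)             # X: unweighted centre of gravity
    # XM: centre of gravity weighted by each pixel's grey value
    xm = sum(px * g for px, _, g in pixels) / sum(greys)
    return {"BX": bx, "BY": by, "Width": width, "Height": height,
            "X": x, "XM": xm}
```

For a three-pixel row with a bright pixel on the right, XM shifts toward the bright side while X stays at the geometric centre.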
Other variables

Area: Surface area of the object in square pixels
Mean: Average grey value within the object; sum of the grey values of all pixels in the object divided by the number of pixels
StdDev: Standard deviation of the grey values used to generate the mean grey value
Mode: Modal grey value within the object
Min: Minimum grey value within the object (0 = black)
Max: Maximum grey value within the object (255 = white)
Slope: Slope of the grey level normalized cumulative histogram
Histcum1: Grey level value at 25% of the normalized cumulative histogram of grey levels
Histcum2: Grey level value at 50% of the normalized cumulative histogram of grey levels
Histcum3: Grey level value at 75% of the normalized cumulative histogram of grey levels
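The Histcum variables can be read off the normalized cumulative histogram of an object's grey levels. A minimal sketch of that idea follows; it is illustrative only, not the ZooProcess implementation:

```python
def histcum(greys, fraction):
    """Grey level at a given fraction (e.g. 0.25, 0.5, 0.75) of the
    normalized cumulative histogram of 8-bit grey levels (0-255).

    greys: iterable of integer grey values for one object.
    """
    hist = [0] * 256
    for g in greys:
        hist[g] += 1
    total = len(hist_greys := list(greys)) if False else len(list(hist))  # placeholder, replaced below
    total = sum(hist)
    cum = 0
    for level, count in enumerate(hist):
        cum += count
        # Return the first grey level at which the normalized
        # cumulative histogram reaches the requested fraction
        if cum / total >= fraction:
            return level
    return 255

# Histcum1, Histcum2 and Histcum3 correspond to fractions
# 0.25, 0.5 and 0.75 of the cumulative histogram, respectively.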
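The Histcum variables can be read off the normalized cumulative histogram of an object's grey levels. A minimal sketch of that idea follows; it is illustrative only, not the ZooProcess implementation:

```python
def histcum(greys, fraction):
    """Grey level at a given fraction (e.g. 0.25, 0.5, 0.75) of the
    normalized cumulative histogram of 8-bit grey levels (0-255).

    greys: iterable of integer grey values for one object.
    """
    hist = [0] * 256
    for g in greys:
        hist[g] += 1
    total = sum(hist)
    cum = 0
    for level, count in enumerate(hist):
        cum += count
        # Return the first grey level at which the normalized
        # cumulative histogram reaches the requested fraction
        if cum / total >= fraction:
            return level
    return 255
```

Histcum1, Histcum2 and Histcum3 then correspond to fractions 0.25, 0.5 and 0.75, respectively.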
Perim: The length of the outside boundary of the object
Major: Primary axis of the best fitting ellipse for the object
Minor: Secondary axis of the best fitting ellipse for the object
Circ: Circularity = (4 * Pi * Area) / Perim²; a value of 1 indicates a perfect circle, a value approaching 0 indicates an increasingly elongated polygon
Feret: Maximum feret diameter, i.e. the longest distance between any two points along the object boundary
IntDen: Integrated density. The sum of the grey values of the pixels in the object (i.e. = Area * Mean)
Median: Median grey value within the object
Skew: Skewness of the histogram of grey level values
Kurt: Kurtosis of the histogram of grey level values
%area: Percentage of the object's surface area that is comprised of holes, defined as the background grey level
Area_exc: Surface area of the object excluding holes, in square pixels (= Area * (1 − (%area/100)))
Mean_exc: Average grey value excluding holes within the object (= IntDen / Area_exc)
Fractal: Fractal dimension of the object boundary (Berube and Jebrak, 1999)
Skelarea: Surface area of the skeleton in pixels. In a binary image, the skeleton is obtained by repeatedly removing pixels from the edges of objects until they are reduced to the width of a single pixel

APPENDIX 5.
Examples of extracted vignettes and measurements. Vignettes of (a) an appendicularian, (b and c) copepods with antennules in different orientations and (d) a chaetognath. Feret diameter (grey line), major and minor elliptical axes (black lines) and the smallest rectangle enclosing the object are delineated on the leftmost image. Silhouettes illustrate the surface area of each organism when the contiguous regions of background pixels are excluded ("area excluded," center image) and the total surface area (rightmost image). Scale bar = 1 mm.
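The derived descriptors in Appendix 4 (Circ, Area_exc, Mean_exc) follow directly from their definitions. A minimal sketch, assuming the Appendix 4 formulas; the function name is illustrative, not a ZooProcess routine:

```python
import math

def derived_metrics(area, perim, int_den, pct_area):
    """Derived shape/grey descriptors per the Appendix 4 definitions:

    Circ     = (4 * pi * Area) / Perim^2
    Area_exc = Area * (1 - %area / 100)
    Mean_exc = IntDen / Area_exc
    """
    circ = 4 * math.pi * area / perim ** 2
    area_exc = area * (1 - pct_area / 100)
    mean_exc = int_den / area_exc
    return circ, area_exc, mean_exc
```

As a sanity check, a perfect circle (Area = pi * r², Perim = 2 * pi * r) gives Circ = 1 exactly.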

