Comparison of five modelling techniques to predict the spatial distribution and abundance of seabirds

Steffen Oppel


Knowledge about the spatial distribution of seabirds at sea is important for conservation. During marine conservation planning, logistical constraints preclude seabird surveys covering the complete area of interest and spatial distribution of seabirds is frequently inferred from predictive statistical models. Increasingly complex models are available to relate the distribution and abundance of pelagic seabirds to environmental variables, but a comparison of their usefulness for delineating protected areas for seabirds is lacking. Here we compare the performance of five modelling techniques (generalised linear models, generalised additive models, Random Forest, boosted regression trees, and maximum entropy) to predict the distribution of Balearic Shearwaters (Puffinus mauretanicus) along the coast of the western Iberian Peninsula. We used ship transect data from 2004 to 2009 and 13 environmental variables to predict occurrence and density, and evaluated predictive performance of all models using spatially segregated test data. Predicted distribution varied among the different models, although predictive performance varied little. An ensemble prediction that combined results from all five techniques was robust and confirmed the existence of marine important bird areas for Balearic Shearwaters in Portugal and Spain. Our predictions suggested additional areas that would be of high priority for conservation and could be proposed as protected areas. Abundance data were extremely difficult to predict, and none of the five modelling techniques provided a reliable prediction of spatial patterns. We advocate the use of ensemble modelling that combines the output of several methods to predict the spatial distribution of seabirds, and the use of these predictions to target separate surveys assessing the abundance of seabirds in areas of regular use.


Biological Conservation 156 (2012) 94–104

Steffen Oppel a,⇑, Ana Meirinho b, Iván Ramírez b, Beth Gardner c, Allan F. O’Connell d, Peter I. Miller e, Maite Louzao f,g

a Royal Society for the Protection of Birds, The Lodge, Sandy, Bedfordshire SG19 2DL, UK
b Sociedade Portuguesa para o Estudo das Aves, Avenida João Crisóstomo N18 4D, 1000-179 Lisboa, Portugal
c Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC 27695, USA
d US Geological Survey, Patuxent Wildlife Research Center, Laurel, MD 20708, USA
e Plymouth Marine Laboratory, Prospect Place, Plymouth PL1 3DH, UK
f UFZ – Helmholtz Centre for Environmental Research, Perlmoserstrasse 15, 04318 Leipzig, Germany
g Centre d’Etudes Biologiques de Chizé, CNRS UPR 1934, 79369 Villiers en Bois, France

Article history: Available online 30 December 2011

Keywords: Species distribution model; Machine learning; Marine protected area; Important bird area (IBA); Shearwater; Portugal

1. Introduction

Marine biodiversity is under increasing human pressure and many species of marine vertebrates have declined over the past decades (Halpern et al., 2008). The decline of many seabird species is directly linked to high mortality at sea due to fisheries bycatch (Croxall and Rothery, 1991; Oro et al., 2004; Weimerskirch, 2002). Reducing mortality of seabirds and other marine biodiversity may be achieved by the designation and enforcement of marine protected areas (Game et al., 2009).
⇑ Corresponding author. Tel.: +44 1767 693452; fax: +44 1767 693211. E-mail addresses: steffen.oppel@rspb.org.uk, steffen.oppel@gmail.com (S. Oppel).
0006-3207/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.biocon.2011.11.013

Identifying marine areas that are suitable for protection to benefit seabirds requires a thorough understanding of the spatial distribution of seabirds at sea (Louzao et al., 2006a; Lovvorn et al., 2009; Piatt et al., 2006). During the breeding season all seabirds are central place foragers, and the locations of important foraging areas at sea are influenced by factors such as prey availability, foraging ranges and colony sizes of each species (Grecian et al., 2012; Huettmann and Diamond, 2001; Thaxter et al., 2012). Outside the breeding season, however, many species roam widely or migrate long distances, and use distinct areas for stopover (Guilford et al., 2009). Because these foraging areas along migratory routes are equally important for the conservation of seabirds, their identification and protection is a key priority for seabird conservation (BirdLife International, 2010b; Oppel et al., 2009; Piatt et al., 2006).

Determining the spatial distribution of seabirds during the non-breeding period is difficult because logistical constraints generally limit surveys to subsets of the area of interest. The increasing availability of large-scale remote sensing data, which can be used as environmental predictor variables, makes it possible to use statistical models to predict species distributions over large areas (Elith and Leathwick, 2009; Tremblay et al., 2009). Several statistical modelling techniques have been used to predict the occurrence and abundance of seabirds at sea (Huettmann and Diamond, 2006; Louzao et al., 2009; Nur et al., 2011; Tremblay et al., 2009; Yen et al., 2004).
Currently, however, it is not clear how modelling methods differ in their ability to predict species distributions (Elith and Graham, 2009), and which approaches yield the most reliable predictions for seabirds.

The number and complexity of modelling techniques used to predict species distributions have increased substantially over the past decades (Hegel et al., 2010), and several comparisons of model performance have been carried out for terrestrial species (e.g., Elith and Graham, 2009; Elith et al., 2006; Segurado and Araújo, 2004). In contrast, the marine environment is less studied and more challenging given its dynamic nature (Ready et al., 2010; Robinson et al., 2011; Wakefield et al., 2009). Furthermore, seabirds are highly mobile species, and their presence at certain locations varies temporally depending on whether an area is used during the breeding season, as a migration stopover, or as moult refuge (Tremblay et al., 2009). To our knowledge, the performance of different models that predict distributions and abundances of seabirds from shipboard survey data has been compared for only one coastal species (Yen et al., 2004), yet the burgeoning interest in the identification of pelagic marine protected areas warrants a comparison of newer distribution modelling techniques.

Here we compare the performance of five modelling techniques to predict the occurrence and abundance of a migratory seabird species outside of the breeding season. The Balearic Shearwater (Puffinus mauretanicus) is a critically endangered species that breeds only at the Balearic archipelago in the western Mediterranean, and migrates to the North-East Atlantic after the breeding season (Brooke, 2004). The species suffers from high adult mortality at sea (Oro et al., 2004), and most research efforts have focused on understanding foraging ecology and distribution during the breeding season in the Mediterranean (Bartumeus et al., 2010; Louzao et al., 2006a, 2006b).
Marine protected areas are needed for Balearic Shearwaters throughout their range, and although both Spain and Portugal have delineated marine important bird areas (IBAs, Arcos et al., 2009; Ramírez et al., 2008), most of these areas were still not legally protected as of October 2011 (BirdLife International, 2010a).

Our model comparison aims to inform seabird conservation managers about the performance of modelling techniques that can be used to predict the spatial distribution and abundance of seabirds for the identification of marine IBAs or protected areas. We tested model predictions against independent data and compared predicted distributions with the locations of existing marine IBAs to evaluate whether our model results agree with IBAs that were identified with a variety of different methods (Ramírez et al., 2008). Thus, we provide information on which modelling techniques are useful for seabirds, and identify areas that may warrant protection to benefit the Balearic Shearwater.

2. Methods

2.1. Data collection

Between December 2004 and April 2009, we conducted shipboard surveys off the coast of Portugal and western Andalusia (Spain) between 36°N and 42°N, and 6°W and 10°W, covering an area of 3497 km² (Fig. 1). Most survey effort was carried out between March and November each year, with fewer surveys from December through February. A total of 1590 h of observations provided data for our analysis; 84% of the survey effort was in Portuguese waters and 16% in Spanish waters.

We used standard European Seabirds at Sea protocols for data collection (Camphuysen and Garthe, 2004; Tasker et al., 1984) on board four similar research vessels. All seabirds in contact with water within 300 m of the survey transect were counted on one side of the ship. All flying seabirds were counted using the ‘snapshot method’, and bird observations were summed over 5 min periods.
Based on recorded vessel speed and the nominal width of the survey transect we then calculated the area surveyed (km²), and the density of birds as the total number of observed birds divided by the area covered (birds km⁻²).

2.2. Data processing and exploratory analysis

We first binned all observation data into a spatial grid with a cell size of 4 × 4 km to match the spatial resolution of remotely sensed environmental data. For each grid cell, we added the number of observed shearwaters during each season and year, and divided this number by the total area surveyed in this cell (hereafter ‘effort’). We were not able to correct density estimates for detectability, as >70% of birds were recorded in flight without an estimated distance to the transect line. Because it is likely that detectability varied as a function of environmental conditions (e.g., sea state, cloud cover), our data provide an index of density rather than a robust estimate of absolute density. Although we recognise that detection of seabirds is imperfect (Ronconi and Burger, 2009), our objective was to evaluate a suite of modelling techniques, all of which used the same data set and therefore suffered from equal bias due to incomplete detection. Every grid cell that had a calculated density >0 received an additional binary detection/non-detection value of ‘1’ (hereafter referred to as ‘presence’), whereas grid cells that were surveyed, but where no Balearic Shearwaters were observed, were coded as ‘0’ (hereafter referred to as ‘absence’).

We defined ‘seasons’ based on the known phenology of Balearic Shearwaters in the North-East Atlantic (Mouriño et al., 2003; Ruiz and Martí, 2004; Yésou, 2003), and phenological data collected during this project (Ramírez et al., 2008). Balearic Shearwaters breed from late February to late June, and migrate west out of the Mediterranean and then north in the Atlantic between May and July.
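The gridding, density index and presence coding described earlier in Section 2.2 can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes observation positions are already projected to kilometre coordinates, and the record layout and function names (`grid_index`, `summarise_cells`) are hypothetical.

```python
from collections import defaultdict

CELL_KM = 4.0  # grid resolution stated in the text (4 x 4 km)


def grid_index(x_km, y_km, cell_km=CELL_KM):
    """Return the (column, row) index of the grid cell containing a point.

    Assumes projected coordinates in kilometres (an assumption of this sketch).
    """
    return (int(x_km // cell_km), int(y_km // cell_km))


def summarise_cells(observations, cell_km=CELL_KM):
    """Aggregate survey records per grid cell, season and year.

    Each record is a tuple:
    (x_km, y_km, season, year, birds_counted, area_surveyed_km2).
    Returns a dict keyed by (cell, season, year) holding the effort,
    the density index (birds per km2) and the binary presence code.
    """
    counts = defaultdict(float)
    effort = defaultdict(float)
    for x, y, season, year, birds, area in observations:
        key = (grid_index(x, y, cell_km), season, year)
        counts[key] += birds
        effort[key] += area

    summary = {}
    for key, area in effort.items():
        if area <= 0:
            continue  # a cell without effort was not surveyed
        density = counts[key] / area
        summary[key] = {
            "effort": area,
            "density": density,
            "presence": 1 if density > 0 else 0,
        }
    return summary
```

Cells with survey effort but no birds receive a density of 0 and a presence code of 0, which is what distinguishes surveyed absences from cells that were never visited.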
They moult in the Northeast Atlantic from June to August, and migrate back to the Mediterranean between September and December (Arcos et al., 2009; Mouriño et al., 2003; Yésou, 2003). We therefore considered three seasons: ‘spring’ (January through April, representing mostly non-breeding birds), ‘summer’ (May through August, corresponding to northward migration and moult), and ‘autumn’ (September through December, corresponding to southward migration).

Spatial autocorrelation is frequently encountered in ecological data, and not properly accounting for spatial correlation can influence the statistical inference of species distribution models (Dormann, 2007; Dormann et al., 2007; Lichstein et al., 2002). We explored whether there was spatial autocorrelation in Balearic Shearwater distribution and density by calculating Moran’s I (Moran, 1950) and Geary’s C (Geary, 1954) for the 50 nearest neighbouring grid cells using the functions ‘moran.test’ and ‘geary.test’, respectively, in the R-package ‘spdep’. Moran’s I ranges from −1 (perfect dispersion) to +1 (perfect correlation), with values around zero indicative of a random spatial pattern. Geary’s C ranges from 0 to 2, with values < 1 indicative of positive spatial autocorrelation. We found little evidence for spatial autocorrelation in both the distribution (Moran’s I = 0.07 ± 0.05 standard deviation; Geary’s C = 0.92 ± 0.05 s.d.) and the density (Moran’s I = 0.02 ± 0.09 s.d.; Geary’s C = 0.98 ± 0.05 s.d.) of Balearic Shearwaters at the spatial scale of our grid cells (16 km²). Hence, we did not specifically incorporate measures to account for spatial autocorrelation in our models, but note that methods to incorporate spatial autocorrelation have recently become available (Hothorn et al., 2011). Our models used latitude and longitude as predictor variables and therefore implicitly included some spatial structure.

Fig. 1. General map of the study region showing the number of Balearic Shearwaters observed during shipboard surveys between 2005 and 2009. Polygons delimit marine important bird areas identified for Balearic Shearwaters (Ramírez et al., 2008; Arcos et al., 2009). Dashed contour line indicates the edge of the continental shelf (500 m depth contour). Data from the dark grey shaded area served as test data to evaluate the predictive models.

2.3. Environmental data

To model Balearic Shearwater occurrence and density, we used 13 environmental variables (Table S1) that are either known, or suspected, to be correlated with seabird distribution and abundance (Louzao et al., 2006a; Tremblay et al., 2009; Wakefield et al., 2009). Physical variables (distance to coast, mean bathymetry, and bathymetry gradient) were extracted from global bathymetric data (www.ngdc.noaa.gov/mgg/gdas/gd_designagrid.html?dbase=GRDET2) using the cell value nearest to the centroid of each grid cell and were considered invariant throughout the period of our study. Dynamic oceanographic data (sea surface temperature, SST; chlorophyll a concentration, CHL; and sea surface height, SSH) were extracted as monthly averages from Aqua MODIS and Pathfinder AVHRR satellite imagery via NOAA’s BloomWatch data portal (http://coastwatch.pfel.noaa.gov/coastwatch/CWBrowserWW360.jsp), and varied among seasons and years in our study (Table S2). Composite front metrics (density of ocean fronts, and mean distance to nearest major ocean front) were derived from AVHRR satellite imagery (Miller, 2009).

Seabird distribution is sometimes uncoupled from current oceanographic conditions measured by variables such as SST and CHL due to time lags between variables that can be measured and the factors (e.g. food availability) that actually attract seabirds (Wakefield et al., 2009).
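As an aside on the autocorrelation check in Section 2.2, the two statistics can be computed directly from their textbook definitions. The sketch below is not the ‘spdep’ implementation: it uses binary k-nearest-neighbour weights (the text used k = 50), returns only the point estimates rather than a significance test, and all function names are hypothetical.

```python
import math


def knn_neighbours(coords, k):
    """For each point, list the indices of its k nearest neighbours.

    Ties in distance are broken by index order (an arbitrary choice here).
    """
    neighbours = []
    for i, (xi, yi) in enumerate(coords):
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def morans_i(values, neighbours):
    """Moran's I with binary weights w_ij = 1 if j is a neighbour of i."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(len(nb) for nb in neighbours)  # sum of all weights
    cross = sum(dev[i] * dev[j] for i, nb in enumerate(neighbours) for j in nb)
    ss = sum(d * d for d in dev)
    return (n / s0) * cross / ss


def gearys_c(values, neighbours):
    """Geary's C with the same binary neighbour weights."""
    n = len(values)
    mean = sum(values) / n
    s0 = sum(len(nb) for nb in neighbours)
    sq_diff = sum(
        (values[i] - values[j]) ** 2 for i, nb in enumerate(neighbours) for j in nb
    )
    ss = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2 * s0)) * sq_diff / ss
```

In use, `coords` would hold the grid-cell centroids and `values` either the presence codes or the density index, with `knn_neighbours(coords, 50)` supplying the neighbour lists.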
To account for these productivity time lags we integrated SST, CHL, and SSH over a period of three months prior to each season (Louzao et al., 2009; Table S1). We also included the temporal change in SST and CHL as predictor variables, calculated (shown here for SST) as (maximum SST − minimum SST) × 100/maximum SST, with maximum being the highest and minimum the lowest monthly mean SST value in each season. To account for annual anomalies, we included the SST anomaly for each season, calculated as the difference between the average value for a given season and year and the average for that season over a 20-year period in that grid cell.

Because seabirds may respond to spatial gradients of oceanographic variables (Louzao et al., 2006a; Tremblay et al., 2009; Wakefield et al., 2009), we also calculated spatial SST and CHL gradients as (maximum value − minimum value) × 100/maximum value, with maximum being the highest and minimum the lowest seasonal mean SST or CHL value over a moving 5 × 5 grid cell window, thus representing 400 km². This spatial scale was chosen because it provided an excellent fit to observed fronts in the Mediterranean in surveys for Balearic Shearwaters during the breeding season (Louzao et al., 2009; Louzao et al., 2006a). Finally, we used the SSH deviation and composite metrics of ocean fronts in each season as indicators of mesoscale structures that mix additional nutrients up into the surface layer. These structures can sustain higher phytoplankton and zooplankton productivity than surrounding water, and can provide foraging opportunities for seabirds. We did not aim to explore the functional relationships between Balearic Shearwater distribution and environmental variables, but this information is presented in the online supplement (Figs. S1–S17).

2.4. Model construction

For marine IBAs, identification criteria depend not only on the presence, but also on the abundance of birds in an area (BirdLife International, 2010b).
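For reference, the range-based metric defined in Section 2.3 for temporal change and spatial gradients can be written as a small function and applied over a moving window. This is a sketch under stated assumptions: the source does not say how window edges or missing cells (land, cloud) were handled, so the truncated edge windows and the `None` convention below are illustrative choices, and the function names are hypothetical.

```python
def percent_range(values):
    """Return (max - min) * 100 / max, the metric given in the text."""
    hi, lo = max(values), min(values)
    if hi == 0:
        raise ValueError("maximum is zero; the metric is undefined")
    return (hi - lo) * 100.0 / hi


def spatial_gradient(grid, window=5):
    """Apply percent_range over a moving window centred on each cell.

    grid is a list of equal-length rows of seasonal mean values; None marks
    a missing cell. Windows are truncated at the grid edge (an assumption).
    """
    half = window // 2
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[None] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            cells = [
                grid[rr][cc]
                for rr in range(max(0, r - half), min(n_rows, r + half + 1))
                for cc in range(max(0, c - half), min(n_cols, c + half + 1))
                if grid[rr][cc] is not None
            ]
            if cells:
                out[r][c] = percent_range(cells)
    return out
```

The same `percent_range` function covers the temporal-change variable when it is given the monthly mean values of one season for a single grid cell. With 4 km cells, the default `window=5` spans 20 × 20 km, the 400 km² stated in the text.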
Hence, useful models must predict both occurrence and abundance. Because of the high mobility of seabirds and imperfect detection at sea, shipboard survey data generally have highly skewed distributions with frequent non-detections (zeros). Such data are difficult to incorporate into standard parametric models (Martin et al., 2005; Sileshi et al., 2009; Warton, 2005). An efficient way to overcome these difficulties is to fit models in a hierarchical fashion (e.g., a ‘hurdle model’), including a component that estimates the occurrence probability, and a subsequent component that estimates the number of individuals given that the species is present (Millar, 2009; Potts and Elith, 2006; Wenger and Freeman, 2008). We adopted that strategy by constructing two separate sets of models, one to predict the presence of Balearic Shearwaters, and one to predict the density of Balearic Shearwaters given their presence in an area. For comparison, we also included a zero-inflated modelling approach to estimate density, which accounts for the large number of non-detections by incorporating the two components described above into one framework (Martin et al., 2005; Potts and Elith, 2006; Warton, 2005).

For the occurrence models, we compared five different modelling techniques (for detailed information comparing these methods see Elith et al., 2006; Hegel et al., 2010; Prasad et al., 2006): generalised linear models (GLM; McCullagh and Nelder, 1989), generalised additive models (GAM; Hastie et al., 2001; Wood and Augustin, 2002), and three machine-learning approaches: Random Forest (RF; Breiman, 2001; Cutler et al., 2007), boosted regression trees (BRT; Elith et al., 2008; Friedman, 2002), and maximum entropy (Maxent; Elith et al., 2011; Phillips et al., 2006; Phillips and Dudik, 2008). All models were constructed in R 2.13.0 with the packages ‘mgcv’, ‘gbm’, ‘randomForest’, and ‘dismo’ interfaced with the standalone MaxEnt program v. 3.3.3e.
(http://www.cs.princeton.edu/~schapire/maxent/). Maximum entropy is a presence-only modelling approach that uses background samples of the environment rather than absence locations to estimate environmental relationships. We only used those grid cells that were surveyed in a given season and year as background data to facilitate a valid comparison with other models (Elith et al., 2011). Model specifications and software code used to construct the models are available in an online supplement (Appendix S2).

We also estimated probability of presence based on an ensemble of all models, as such predictions are often more robust than predictions derived from a single model (Araújo and New, 2007; Marmion et al., 2009; Thuiller et al., 2009). Ensemble predictions were calculated as weighted averages of single-model predictions, with weights assigned to each modelling technique based on its discriminatory power as measured by the area under the receiver operating characteristic curve (Appendix S2; Araújo and New, 2007; Marmion et al., 2009; Thuiller et al., 2009).

For the density model, we used the same modelling techniques as for presence, except for maximum entropy, which is currently not available for modelling non-binary response data. We used the Poisson distribution for the parametric models (the GLM density model, the GAM density model, and the zero-inflated GLM model), because it resulted in better predictions than the negative binomial distribution. Ensemble predictions of density were calculated as above for distribution models, but models were weighted by the Pearson correlation coefficient between observed and predicted values. To avoid the influence of extreme observations, we excluded one outlying record of 702 Balearic Shearwaters in a grid cell. We included all environmental predictor variables in each of the modelling approaches.
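The weighted averaging described above can be illustrated with a minimal sketch. The exact weighting scheme is given only in Appendix S2, so using the raw AUC of each technique as its weight is an assumption made here for illustration, and the function and variable names are hypothetical.

```python
import numpy as np


def auc_weighted_ensemble(predictions, aucs):
    """Weighted average of single-model probabilities of presence.

    predictions: dict mapping model name -> sequence of predicted probabilities
                 (one value per grid cell, same order for every model).
    aucs:        dict mapping model name -> AUC of that model.
    ASSUMPTION: the raw AUC is used directly as the weight; the paper's
    Appendix S2 may rescale or transform AUC before weighting.
    """
    if set(predictions) != set(aucs):
        raise ValueError("predictions and aucs must cover the same models")
    models = sorted(predictions)
    weights = np.array([aucs[m] for m in models], dtype=float)
    stacked = np.vstack([np.asarray(predictions[m], dtype=float) for m in models])
    # np.average normalises the weights so that they sum to one
    return np.average(stacked, axis=0, weights=weights)


# Toy example with two hypothetical models and two grid cells
toy_predictions = {"glm": [0.2, 0.8], "rf": [0.4, 0.6]}
toy_aucs = {"glm": 0.75, "rf": 0.25}
ensemble = auc_weighted_ensemble(toy_predictions, toy_aucs)
```

The same function could be reused for the density ensemble by passing correlation coefficients in place of AUC values, provided all weights are positive.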
Because inclusion of a large number of variables may lead to overfitting in parametric models, we reduced GLM complexity by sequentially eliminating variables from a full model with all predictors until a minimum AIC was reached. In GAMs, we used the automatic term selection procedure, which imposes a penalty on smooth functions and can thus effectively remove terms from the model (Wood and Augustin, 2002). Machine learning approaches are generally robust to the inclusion of a large number of correlated variables (Archer and Kimes, 2008), and we therefore did not reduce the number of variables in these models. To predict the distribution and density for each season and year, we included ‘season’ and ‘year’ as factor variables in each model. In addition, we included latitude and longitude in all models, and the survey effort (in km²) per grid cell in all density models.

2.5. Model evaluation and calibration

We divided the survey data into training and test data by setting aside approximately 30% of the surveyed area for spatial evaluation of the models (Araújo and Guisan, 2006; Austin, 2007). All areas north of 39.7°N and south of 38°N were used to construct models (training data), whereas the data in the intermediate sector were used to evaluate the predictive performance of the models (test data, shaded grey in Fig. 1). This division provided sufficient presence data for all seasons in each of the three sectors (>20 grid cells each), and thus allowed for robust model testing. We assessed the performance of distribution models based on the accuracy of predictions for both the training and the independent test data, and report the area under the receiver operating characteristic curve (AUC) as the discrimination performance criterion. AUC ranges from 0 to 1, with values below 0.6 indicating a performance no better than random, values of 0.7–0.9 considered useful, and values >0.9 excellent. All model evaluation statistics were calculated using the package ‘PresenceAbsence’ in R 2.11.1 (Appendix S2).

Spatial distribution models need to be well calibrated to be useful for predicting beyond the original spatial extent of input observations (Phillips and Elith, 2010; Reineking and Schröder, 2006). In addition to discrimination, which measures a model’s ability to discriminate between presence and absence locations, calibration measures how well the frequency of observations in test data agrees with predicted probabilities of occurrence. To test calibration, we regressed the relative frequency of observed presences on the predicted probability of presence across ten bins of predicted probability, implemented as a binned calibration plot using a custom-made function in R (Phillips and Elith, 2010; Appendix S2). The slope and the intercept of this regression indicate the calibration and the bias of the model, respectively (Phillips and Elith, 2010). In addition, we calculated the point biserial correlation between predicted and observed values, which is sensitive to both discrimination and calibration. For density models, we used the Pearson correlation coefficient and the slope and intercept of a major axis regression of observed over predicted values to evaluate the bias and consistency of model predictions (Pineiro et al., 2008; Potts and Elith, 2006).

2.6. Identification of priority areas for conservation of Balearic Shearwaters

To identify priority areas for conservation of Balearic Shearwaters along the western Iberian coast, we used our predicted probabilities of occurrence and the spatial prioritization algorithm ‘Zonation’ (Moilanen, 2007; Moilanen et al., 2005), which has been used successfully in large-scale marine applications (Leathwick et al., 2008). The ‘Zonation’ algorithm ranks areas according to their priority for conservation and is thus ideally suited for conservation planning.
The ranking is achieved by sequentially removing grid cells from the study area that have low predicted probabilities of occurrence, and thus the lowest conservation value. The sequential removal also considers proximity of cells to areas of high conservation priority and thus results in a spatially constrained set of priority areas most relevant for conservation (Moilanen, 2007; Moilanen et al., 2005). The approach is designed for use with multiple species, and marine reserve designation generally requires consideration of multiple species (Ainley et al., 2009; Nur et al., 2011). Here, we only had predicted probabilities of distribution for one species, but we used the predicted distribution in each of our 15 periods (three seasons in each of 5 years) in a manner analogous to identifying priority areas for multiple species. We used a simple core-area prioritization in Zonation 2.0 to guarantee the retention of high-quality areas identified in any particular season. We ran the algorithm without boundary quality penalties and with a boundary length penalty of 0.1 to retain a few contiguous areas rather than many small dissociated cells, which would be impractical to designate as protected areas. We then compared the most important 10% of the study area retained by the prioritisation algorithm to existing IBAs in Portugal (Ramírez et al., 2008) and Spain (Arcos et al., 2009). The IBAs in Portugal were delineated based on a subset of the data used here, whereas the IBAs in Spain were delineated based on independent data and thus provided a test of the performance of our models.

3. Results

3.1. At-sea surveys

We observed 5737 Balearic Shearwaters in 8174 grid cells that were surveyed over the 5 years of study (Fig. 1). On average, 0.7 (±9.8 s.d.) individuals were recorded per km² of survey effort, but in 91% of grid cells no Balearic Shearwaters were observed.
In a further 8% of grid cells only 1–10 birds were observed, and congregations of >100 Balearic Shearwaters were observed on only six occasions throughout the study period (0.07% of surveyed grid cells).

3.2. Performance of distribution models

All five modelling techniques had a reasonable ability to discriminate between areas where Balearic Shearwaters were present and absent (all AUC > 0.75, Tables 1 and S3). We found little spatial autocorrelation in model residuals for all models (Moran’s I = 0.03 ± 0.05 s.d., Geary’s C = 0.97 ± 0.05 s.d.). For the training data used to construct the models, the machine-learning approaches RF and BRT provided the best discrimination between areas of presence and absence (highest AUC, Table S3). Maxent was the best-calibrated model, and BRT had the smallest bias. The performance of GLM and GAM was acceptable, but poorer than for the machine-learning approaches when predicting to the data used for model construction (Table S3). By contrast, prediction of independent test data was very similar among the five modelling techniques, with AUC ranging between 0.76 and 0.81 (Table 1). The correlation between predicted presences and observed data was also similar among models (Table 1). The BRT model had the poorest calibration and highest bias, while Maxent showed the best calibration and the GLM had the lowest bias of the five techniques (Table 1). The ensemble prediction showed the best combination of predictive performance and calibration (Table 1).

3.3. Performance of density models

As with the occurrence models, performance of density models was very different between training and independent test data. When evaluated for the training data, RF explained 68% of the variation in Balearic Shearwater density, while none of the other models explained >35% of the variation (Table S4).
The RF model also showed the highest correlation between observed and predicted density, while the Poisson GLM and GAM showed the best calibration and lowest bias (Table S4). Both GAM and BRT showed relatively high correlation between observed and predicted density, but BRT suffered from poor calibration and relatively large bias. The zero-inflated Poisson and the Poisson GLM explained the least amount of variation (<5%) in the density of Balearic Shearwaters in the training data.

Table 1. Model evaluation and calibration statistics of five modelling techniques and an ensemble predicting the distribution of Balearic Shearwaters in an independent test area. AUC = area under the receiver operating characteristic curve; COR = point biserial correlation coefficient between observed and predicted values; calibration = slope of regression of observed vs. predicted values; bias = intercept of regression of observed vs. predicted values. See text for description of modelling techniques.

Model      AUC    COR    Calibration   Bias
BRT        0.80   0.25   0.03           0.18
RF         0.81   0.31   0.25           0.09
GLM        0.78   0.28   0.73           0.02
GAM        0.76   0.24   0.36           0.07
MAX        0.77   0.26   0.80          −0.15
Ensemble   0.80   0.30   0.56           0.02

None of the models we constructed were able to accurately predict the density of Balearic Shearwaters in the independent test data (Fig. 2, Table 2). In this evaluation, the model performance depended not only on the density model itself, but also on the predictive ability of the occurrence model of the same technique, because predicted density was the product of predicted probability of presence and predicted density of Balearic Shearwaters for all models except the zero-inflated Poisson model. None of the models explained >8% of the variation in recorded density data of Balearic Shearwaters, and the predictive performance of all five models was similarly poor (Table 2).
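The two-part (hurdle) prediction described above, in which predicted density is the product of the probability of presence and the density given presence, can be sketched as follows. This is only an illustration of the arithmetic; the function name and the input validation are assumptions, not the code from Appendix S2.

```python
import numpy as np


def hurdle_expected_density(p_presence, density_given_presence):
    """Combine the two components of a hurdle model for each grid cell.

    p_presence:             predicted probability of presence (0..1)
    density_given_presence: predicted density (birds per km^2) conditional
                            on the species being present
    Returns the unconditional predicted density, i.e. their product.
    """
    p = np.asarray(p_presence, dtype=float)
    d = np.asarray(density_given_presence, dtype=float)
    if np.any((p < 0) | (p > 1)):
        raise ValueError("probabilities of presence must lie in [0, 1]")
    if np.any(d < 0):
        raise ValueError("conditional densities must be non-negative")
    return p * d


# Toy example: a cell with low occurrence probability but high conditional
# density can receive a lower prediction than a cell that is used regularly.
expected = hurdle_expected_density([0.1, 0.5], [10.0, 4.0])
```

Because the two components are multiplied, errors in the occurrence model propagate directly into the density prediction, which is why the evaluation of density on test data depends on both components.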
The parametric models (GLM, GAM, and zero-inflated Poisson) had the lowest correlation and explained the least variation. RF had the highest correlation, but was the worst calibrated model with very large bias (Table 2). The ensemble prediction achieved the same correlation as the RF model but with substantially smaller bias. In summary, predicting the density of Balearic Shearwaters based on physical and oceanographic data alone was highly unreliable outside the breeding season (Fig. 2).

Fig. 2. Relationship between observed and predicted density of Balearic Shearwaters in independent test data along the coast of Portugal. Each panel plots observed log density against predicted log density (birds km⁻²) for one model. BRT: boosted regression trees; RF: Random Forest; ZIP: zero-inflated Poisson GLM; POIS: Poisson GLM; GAM: generalised additive model; Ensemble: ensemble prediction weighting different models based on the correlation between observed and predicted data. Note that densities were log transformed to enhance display.

Table 2. Model calibration statistics of five modelling techniques and an ensemble predicting the density (birds/km²) of Balearic Shearwaters in an independent test area. COR = Pearson correlation coefficient between observed and predicted values; rho = Spearman’s rank correlation coefficient; R², calibration, and bias = coefficient of determination, slope, and intercept of regression of log-transformed observed vs. log-transformed predicted values, respectively. See text for description of modelling techniques.

Model      COR    Rho    R²     Calibration   Bias
BRT        0.12   0.26   0.08   10.5           −0.5
RF         0.14   0.25   0.06   14.8           −7.7
ZIP        0.09   0.17   0.02    6.7           −0.5
GLM        0.03   0.17   0.02    5.1           −2.3
GAM        0.01   0.17   0.02    0.3          −11.7
Ensemble   0.13   0.25   0.07   14.4           −0.3

3.4. Priority areas for conservation of Balearic Shearwaters

Balearic Shearwater distribution varied among seasons, and seasonal differences were most prominent in the Gulf of Cádiz, with higher predicted probabilities in autumn than in summer (Figs. 3 and 4). Despite their similar performance criteria, the five techniques predicted different distributions of Balearic Shearwaters over the 5 years of our study. Consequently, the areas identified as highest conservation priority for Balearic Shearwaters also differed among techniques (Fig. 5). The areas north of the Spanish–Portuguese border, around Porto, Figueira da Foz, and in the Gulf of Cádiz (Spain) were consistently retained in the most important 10% of the study area. Three of these areas have been identified as marine IBAs by Portugal and Spain (Fig. 1), but an additional area around Porto has not been previously identified as IBA. The GLM was the only method that identified most of the Spanish IBA ‘Entorno marino de las Rías Baixas’ as an area of highest priority for conservation, but in turn considered the two Portuguese IBAs of much lower priority.

4. Discussion

4.1. Utility of models to identify marine IBAs

Our study shows that the choice of modelling method may influence the identification of marine areas for the protection of seabirds. None of the five techniques tested provided superior predictions in all performance criteria, a finding that is consistent with other model comparison studies (Elith and Graham, 2009; Segurado and Araújo, 2004; Syphard and Franklin, 2009). Despite similar predictive performance, the nature of predicted distributions can vary due to different emphasis on and modelled relationships with environmental variables (Elith and Graham, 2009; Ready et al., 2010).
In our study, for example, the Maxent and the GLM model had similar predictive performance (Table 1), but would have selected very different areas for the conservation of Balearic Shearwaters (Fig. 5). Because of the uncertainty in choosing a single appropriate technique to identify important areas for seabirds, using a variety of different models and combining them in an ensemble can improve overall prediction (Araújo and New, 2007; Coetzee et al., 2009; Jones-Farrand et al., 2011; Marmion et al., 2009). Our ensemble of five techniques successfully predicted areas of importance for Balearic Shearwaters, including one that had been identified in a different IBA identification project (Arcos et al., 2009). The Spanish IBA ‘Entorno marino de las Rías Baixas’ lies north of the area covered by ship surveys in this project, and its identification thus indicates that our models predicted reliably to an area beyond the sampling region. Additional independent evidence of the importance and validity of the areas we identified is provided by the recent tracking of Balearic Shearwaters with geolocators and satellite transmitters (T. Guilford & M. Louzao, unpubl. data). In fact, most of the 29 birds tracked with geolocators used at least one of the important areas mentioned above during the non-breeding period, and the finer-scale satellite-based tracking data highlight the importance of the IBA Figueira da Foz for a non-breeder. We are therefore confident that the ensemble predictions are robust and we recommend that conservation managers rely on a suite of modelling techniques when trying to identify marine protected areas. Due to recent advances in freely available software, computational challenges for ensemble predictions have decreased considerably (Thuiller, 2003; Thuiller et al., 2009).

In contrast to the occurrence models, our density models performed poorly on independent test data (Fig. 2), and we do not recommend their use to estimate whether an area meets the numerical threshold to qualify for a marine IBA (BirdLife International, 2010b). The identification of marine IBAs may benefit from a two-step approach, where the spatial distribution models are used to delineate potential areas where a species occurs regularly, and specific surveys are then conducted in a second step to assess the abundance of target species in those areas.

Fig. 3. Predicted distribution of Balearic Shearwater during summer (May–August) 2005–2009 along the coast of southwest Iberia based on five different distribution models and an ensemble prediction across all five models. BRT: boosted regression trees, RF: Random Forest, GLM: generalised linear model, GAM: generalised additive model, MaxEnt: maximum entropy, Ensemble: ensemble prediction across all five models.

4.2. Important areas for the conservation of Balearic Shearwaters

Our modelling exercise indicated that Balearic Shearwaters use slightly different regions during their northward (May–July) and southward migration (September–December). During summer, shearwaters were present in four marine areas over the continental shelf north of Lisbon (Porto, Figueira da Foz, south of the Berlengas Islands and north of the Spanish–Portuguese border), whereas the Gulf of Cádiz was only used during the southward migration in autumn. The majority of Balearic Shearwaters migrate to the North-East Atlantic before moulting, but some individuals might also moult and stay from June to October over the northern Portuguese continental shelf and in the IBA ‘Rías Baixas’ (Arcos et al., 2009; Mouriño et al., 2003).
During autumn migration, birds might forage in additional, seasonally highly productive areas such as the Gulf of Cádiz (García Lafuente and Ruiz, 2007). Our results confirm the high importance of the IBAs ‘Figueira da Foz’, ‘Berlengas’ and ‘Rías Baixas’ (Fig. 1), but suggest that in addition the near-shore waters around Porto are regularly used by Balearic Shearwaters both in summer and in autumn.

Fig. 4. Predicted distribution of Balearic Shearwater during autumn (September–December) 2005–2009 along the coast of southwest Iberia based on five different distribution models and an ensemble prediction across all five models. BRT: boosted regression trees, RF: Random Forest, GLM: generalised linear model, GAM: generalised additive model, MaxEnt: maximum entropy, Ensemble: ensemble prediction across all five models.

Low adult survival rates at sea are the main cause for the decline of the Balearic Shearwater population (Louzao et al., 2006b; Oro et al., 2004), and areas that are reliably used by the species during moult and on migration require legal protection to reduce mortality. Our study suggests that the Atlantic coast of Portugal north of Lisbon and extending into Spain serves as an important migratory stopover and/or moulting habitat for Balearic Shearwaters, where protected areas that reduce accidental mortality may benefit a significant proportion of the species.
Further surveys to assess whether the number of Balearic Shearwaters in the nearshore waters around Porto meets marine IBA criteria (BirdLife International, 2010b) would be needed given the socioeconomic costs that may be incurred when marine areas are created for the conservation of seabirds (Adams et al., 2011; Balmford et al., 2004). In addition, the designation of marine reserves would benefit from information on the spatial distribution of multiple species (Ainley et al., 2009; Nur et al., 2011). We therefore recommend integrating the information presented here with similar data for other species, and other stakeholder interests to designate effective marine protected areas for seabirds (Smith et al., 2009).

4.3. Advancing seabird distribution models

The vast majority of the spatial distribution modelling literature predicts the distribution of stationary plants or animals in more temporally stable environments. Novel approaches are emerging to model mobile, migratory, or range-shifting species (Elith et al., 2010; Fink et al., 2010; Hothorn et al., 2011; Zurell et al., 2009), but significant challenges still exist in the marine environment, where conditions at a given location are constantly changing (Tremblay et al., 2009; Zipkin et al., 2010). Several unresolved issues exist in seabird distribution models regarding heterogeneity in spatiotemporal scales that may be informed by advances in other environments (Robinson et al., 2011; Schröder, 2008). For example, what time lag exists between the easily measured proxies of primary productivity (e.g., sea surface temperature and chlorophyll a concentration) and foraging conditions that actually attract seabirds?
Similarly, is it more useful to model the distribution of pelagic birds separately for each year, using contemporary local environmental measurements, or will predictions that pool observations over several years and use average environmental conditions at a given location provide more robust predictions (Tuanmu et al., 2011)? There may be no single best approach to these issues, and simulations would be useful to thoroughly test which combination of temporal aggregation and resolution is most reliable for seabirds at sea.

Fig. 5. Ranking of marine areas along the coast of southwest Iberia for the conservation of Balearic Shearwaters based on five different distribution models and an ensemble prediction across all five models. BRT: boosted regression trees, RF: Random Forest, GLM: generalised linear model, GAM: generalised additive model, MaxEnt: maximum entropy, Ensemble: ensemble prediction across all five models. Areas were identified using the ‘Zonation’ algorithm, and the colour of shading reflects the priority for conservation (in % of total study area: top 1%, top 5%, top 10%).

Similar questions exist regarding the choice of environmental predictor variables and modelling techniques. We did not examine the contribution of different variables in this manuscript, and relied on variables commonly used in seabird studies (Tremblay et al., 2009; Wakefield et al., 2009). Although it is well known that different modelling techniques will provide different predictions of the distribution of a species, the causes for these differences are still poorly understood.
The most likely reason for differences in the mapped predictions of occurrence between models is that each technique fits different functions (Elith and Graham, 2009). Different predictor variables and varying levels of complexity in the fitted functions (Figs. S1–S17) are likely to explain differences between the machine-learning and the parametric modelling approaches in our study (Dormann et al., 2008; Syphard and Franklin, 2009), but a detailed analysis of the specific model differences is beyond the scope of this contribution.

The machine-learning methods RF, BRT, and Maxent provided excellent discrimination between areas where Balearic Shearwaters were present and absent. However, when used on spatially independent test data, the predictive performance of those three models was only marginally better than that of the GLM and GAM approaches, suggesting that the machine-learning methods suffer proportionally more from overfitting than parametric models (Dormann et al., 2008; Hastie et al., 2001). A similar pattern emerged for the density models, where RF showed good performance on the training data, but performed as poorly as the other models when predicting shearwater density in independent test data.

Many seabird surveys yield only brief temporal windows into the spatial distribution of seabirds, and many recorded absences may therefore be considered as false absences, because seabirds may have been present in that area at a time when the surveyors were not. Such methodological absences reduce the power of spatial distribution and abundance models (Lobo et al., 2010; Martin et al., 2005), and create uncertainty when models are evaluated. Therefore, Lobo et al. (2010) suggested removing absences that are identical or close to observed presences in environmental space from the training data to remove the potential effect of false absences.
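One way to picture the filtering idea attributed above to Lobo et al. (2010) is the following minimal sketch. It was not part of this study, and the details are assumptions made for illustration: environmental variables are standardised, closeness is measured as Euclidean distance, and the threshold and all names are hypothetical.

```python
import numpy as np


def keep_mask_without_ambiguous_absences(env, present, threshold):
    """Flag training cells to keep after dropping absences that lie close
    to an observed presence in environmental space.

    env:       (n_cells, n_variables) array of environmental predictors
    present:   boolean array, True where the species was observed
    threshold: maximum standardised Euclidean distance at which an absence
               is treated as a potential false absence (ASSUMED criterion)
    """
    env = np.asarray(env, dtype=float)
    present = np.asarray(present, dtype=bool)
    # Standardise each variable so that units do not dominate the distance
    sd = env.std(axis=0)
    sd[sd == 0] = 1.0
    z = (env - env.mean(axis=0)) / sd
    keep = np.ones(len(present), dtype=bool)
    absence_idx = np.flatnonzero(~present)
    if present.any() and absence_idx.size > 0:
        # Distance from every absence to its nearest presence
        diff = z[absence_idx][:, None, :] - z[present][None, :, :]
        nearest = np.sqrt((diff ** 2).sum(axis=2)).min(axis=1)
        keep[absence_idx] = nearest > threshold
    return keep


# Toy example with one variable: the absence at 0.1 is nearly identical to
# the presence at 0.0 and is dropped; the absence at 10.0 is retained.
toy_env = [[0.0], [0.1], [10.0]]
toy_present = [True, False, False]
toy_keep = keep_mask_without_ambiguous_absences(toy_env, toy_present, threshold=0.5)
```

The choice of threshold controls the trade-off between removing potential false absences and discarding informative true absences, and would need to be justified for any real application.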
False absences may also have contributed to the poor performance of our density models, as the hurdle model approach that we employed assumes that zero observations reflect true absence (Risk et al., 2011). However, even when predicted to only the presence fraction of independent test data our density models showed weak correlation with observed densities. More sophisticated models that simultaneously model the observation and process uncertainty and can thus account for imperfect detection may provide better estimates of spatial abundance patterns in the future (Dail and Madsen, 2011; Hutchinson et al., 2011; Warton and Shepherd, 2010).

Acknowledgements

Data collection for this project started with LIFE 04NAT/PT/000213 funding by the European Union, and we thank IPIMAR (Instituto de Investigação das Pescas e do Mar) and the Instituto Hidrográfico Português for subsequent support in data collection. We thank all research institutions that allowed observers onboard the survey vessels, and all observers and volunteers of the LIFE project. M.L. was funded by a Marie Curie Individual Fellowship (PIEFGA-2008-220063). G. Buchanan, A. Marcer and R. Hijmans assisted with maximum entropy modelling. The manuscript benefited from thoughtful comments by B. Best, G. Humphries, J. M. Arcos, R. A. Ronconi, E. Owen, N. Nur, and two anonymous reviewers.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.biocon.2011.11.013.

References

Adams, V.M., Mills, M., Jupiter, S.D., Pressey, R.L., 2011. Improving social acceptability of marine protected area networks: a method for estimating opportunity costs to multiple gear types in both fished and currently unfished areas. Biological Conservation 144, 350–361.
Ainley, D.G., Dugger, K.D., Ford, R.G., Pierce, S.D., Reese, D.C., Brodeur, R.D., Tynan, C.T., Barth, J.A., 2009.
Association of predators and prey at frontal features in the California current: competition, facilitation, and co-occurrence. Marine Ecology Progress Series 389, 271–294.
Araújo, M.B., Guisan, A., 2006. Five (or so) challenges for species distribution modelling. Journal of Biogeography 33, 1677–1688.
Araújo, M.B., New, M., 2007. Ensemble forecasting of species distributions. Trends in Ecology & Evolution 22, 42–47.
Archer, K.J., Kimes, R.V., 2008. Empirical characterization of random forest variable importance measures. Computational Statistics and Data Analysis 52, 2249–2260.
Arcos, J.M., Bécares, J., Rodríguez, B., Ruiz, A., 2009. Áreas Importantes para la Conservación de las Aves marinas en España. Sociedad Española de Ornitología (SEO/BirdLife), Madrid, Spain.
Austin, M., 2007. Species distribution models and ecological theory: a critical assessment and some possible new approaches. Ecological Modelling 200, 1–19.
Balmford, A., Gravestock, P., Hockley, N., McClean, C.J., Roberts, C.M., 2004. The worldwide costs of marine protected areas. Proceedings of the National Academy of Sciences of the United States of America 101, 9694–9697.
Bartumeus, F., Giuggioli, L., Louzao, M., Bretagnolle, V., Oro, D., Levin, S.A., 2010. Fishery discards impact on seabird movement patterns at regional scales. Current Biology 20, 215–222.
BirdLife International, 2010a. Marine IBAs in the European Union. BirdLife International, Brussels, Belgium.
BirdLife International, 2010b. Marine Important Bird Areas Toolkit: Standardised Techniques for Identifying Priority Sites for the Conservation of Seabirds At-Sea. BirdLife International, Cambridge, UK.
Breiman, L., 2001. Random forests. Machine Learning 45, 5–32.
Brooke, M., 2004. Albatrosses and Petrels Across the World. Oxford University Press, USA.
Camphuysen, K.C.J., Garthe, S., 2004. Recording foraging seabirds at sea: standardised recording and coding of foraging behaviour and multi-species foraging associations.
Atlantic Seabirds 6, 1–32. Coetzee, B.W.T., Robertson, M.P., Erasmus, B.F.N., van Rensburg, B.J., Thuiller, W., 2009. Ensemble models predict Important Bird Areas in southern Africa will become less effective for conserving endemic birds under climate change. Global Ecology and Biogeography 18, 701–710. Croxall, J., Rothery, P., 1991. Population regulation of seabirds: implications of their demography for conservation. In: Perrins, C.M., Lebreton, J.D., Hirons, G.M. (Eds.), Bird Population Studies, Relevance to Conservation and Management. Oxford University Press, Oxford, UK, pp. 272–296. Cutler, D.R., Edwards, T.C., Beard, K.H., Cutler, A., Hess, K.T., Gibson, J., Lawler, J.J., 2007. Random forests for classiﬁcation in ecology. Ecology 88, 2783–2792. Dail, D., Madsen, L., 2011. Models for estimating abundance from repeated counts of an open metapopulation. Biometrics 67, 577–578. Dormann, C.F., 2007. Effects of incorporating spatial autocorrelation into the analysis of species distribution data. Global Ecology and Biogeography 16, 129– 138. Dormann, C.F., McPherson, J.M., Araújo, M.B., Bivand, R., Bolliger, J., Carl, G., Davies, R.G., Hirzel, A., Jetz, W., Kissling, W.D., 2007. Methods to account for spatial autocorrelation in the analysis of species distributional data: a review. Ecography 30, 609–628. Dormann, C.F., Purschke, O., Marquez, J.R.G., Lautenbach, S., Schröder, B., 2008. Components of uncertainty in species distribution analysis: a case study of the Great Grey Shrike. Ecology 89, 3371–3386. Elith, J., Graham, C.H., 2009. Do they? How do they? WHY do they differ? On ﬁnding reasons for differing performances of species distribution models. Ecography 32, 66–77. 
Elith, J., Graham, C.H., Anderson, R.P., Dudik, M., Ferrier, S., Guisan, A., Hijmans, R.J., Huettmann, F., Leathwick, J.R., Lehmann, A., Li, J., Lohmann, L.G., Loiselle, B.A., Manion, G., Moritz, C., Nakamura, M., Nakazawa, Y., Overton, J.M., Peterson, A.T., Phillips, S.J., Richardson, K., Scachetti-Pereira, R., Schapire, R.E., Soberon, J., Williams, S., Wisz, M.S., Zimmermann, N.E., 2006. Novel methods improve prediction of species’ distributions from occurrence data. Ecography 29, 129–151.
Elith, J., Kearney, M., Phillips, S., 2010. The art of modelling range-shifting species. Methods in Ecology and Evolution 1, 330–342.
Elith, J., Leathwick, J.R., 2009. Species distribution models: ecological explanation and prediction across space and time. Annual Review of Ecology, Evolution, and Systematics 40, 677–697.
Elith, J., Leathwick, J.R., Hastie, T., 2008. A working guide to boosted regression trees. Journal of Animal Ecology 77, 802–813.
Elith, J., Phillips, S.J., Hastie, T., Dudík, M., Chee, Y.E., Yates, C.J., 2011. A statistical explanation of MaxEnt for ecologists. Diversity and Distributions 17, 43–57.
Fink, D., Hochachka, W.M., Zuckerberg, B., Winkler, D.W., Shaby, B., Munson, M.A., Hooker, G., Riedewald, M., Sheldon, D., Kelling, S., 2010. Spatiotemporal exploratory models for broad-scale survey data. Ecological Applications 20, 2131–2147.
Friedman, J.H., 2002. Stochastic gradient boosting. Computational Statistics & Data Analysis 38, 367–378.
Game, E.T., Grantham, H.S., Hobday, A.J., Pressey, R.L., Lombard, A.T., Beckley, L.E., Gjerde, K., Bustamante, R., Possingham, H.P., Richardson, A.J., 2009. Pelagic protected areas: the missing dimension in ocean conservation. Trends in Ecology & Evolution 24, 360–369.
García Lafuente, J., Ruiz, J., 2007. The Gulf of Cádiz pelagic ecosystem: a review. Progress in Oceanography 74, 228–251.
Geary, R.C., 1954. The contiguity ratio and statistical mapping. Incorporated Statistician 5, 115–146.
Grecian, W.J., Witt, M.J., Attrill, M.J., Bearhop, S., Godley, B.J., Grémillet, D., Hamer, K.C., Votier, S.C., 2012. A novel projection technique to identify important at-sea areas for seabird conservation: an example using Northern gannets breeding in the North East Atlantic. Biological Conservation, this issue.
Guilford, T., Meade, J., Willis, J., Phillips, R.A., Boyle, D., Roberts, S., Collett, M., Freeman, R., Perrins, C.M., 2009. Migration and stopover in a small pelagic seabird, the Manx shearwater Puffinus puffinus: insights from machine learning. Proceedings of the Royal Society B 276, 1215–1223.
Halpern, B.S., Walbridge, S., Selkoe, K.A., Kappel, C.V., Micheli, F., D’Agrosa, C., Bruno, J.F., Casey, K.S., Ebert, C., Fox, H.E., Fujita, R., Heinemann, D., Lenihan, H.S., Madin, E.M.P., Perry, M.T., Selig, E.R., Spalding, M., Steneck, R., Watson, R., 2008. A global map of human impact on marine ecosystems. Science 319, 948–952.
Hastie, T., Tibshirani, R., Friedman, J.H., 2001. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, New York, NY.
Hegel, T.M., Cushman, S.A., Evans, J., Huettmann, F., 2010. Current state of the art for statistical modelling of species distributions. In: Cushman, S., Huettmann, F. (Eds.), Spatial Complexity, Informatics, and Wildlife Conservation. Springer, Tokyo, pp. 273–311.
Hothorn, T., Müller, J., Schröder, B., Kneib, T., Brandl, R., 2011. Decomposing environmental, spatial, and spatiotemporal components of species distributions. Ecological Monographs 81, 329–347.
Huettmann, F., Diamond, A., 2006. Large-scale effects on the spatial distribution of seabirds in the Northwest Atlantic. Landscape Ecology 21, 1089–1108.
Huettmann, F., Diamond, A.W., 2001. Seabird colony locations and environmental determination of seabird distribution: a spatially explicit breeding seabird model for the Northwest Atlantic. Ecological Modelling 141, 261–298.
Hutchinson, R.A., Liu, L.-P., Dietterich, T.G., 2011. Incorporating boosted regression trees into ecological latent variable models. In: 25th AAAI Conference on Artificial Intelligence. Association for the Advancement of Artificial Intelligence, San Francisco, CA.
Jones-Farrand, D.T., Fearer, T.M., Thogmartin, W.E., Iii, F.R.T., Nelson, M.D., Tirpak, J.M., 2011. Comparison of statistical and theoretical habitat models for conservation planning: the benefit of ensemble prediction. Ecological Applications 21, 2269–2282.
Leathwick, J., Moilanen, A., Francis, M., Elith, J., Taylor, P., Julian, K., Hastie, T., Duffy, C., 2008. Novel methods for the design and evaluation of marine protected areas in offshore waters. Conservation Letters 1, 91–102.
Lichstein, J., Simons, T., Shriner, S., Franzreb, K., 2002. Spatial autocorrelation and autoregressive models in ecology. Ecological Monographs 72, 445–463.
Lobo, J.M., Jiménez-Valverde, A., Hortal, J., 2010. The uncertain nature of absences and their importance in species distribution modelling. Ecography 33, 103–114.
Louzao, M., Becares, J., Rodriguez, B., Hyrenbach, K.D., Ruiz, A., Arcos, J.M., 2009. Combining vessel-based surveys and tracking data to identify key marine areas for seabirds. Marine Ecology Progress Series 391, 183–197.
Louzao, M., Hyrenbach, K., Arcos, J., Abelló, P., de Sola, L., Oro, D., 2006a. Oceanographic habitat of an endangered Mediterranean procellariiform: implications for marine protected areas. Ecological Applications 16, 1683–1695.
Louzao, M., Igual, J., McMinn, M., Aguilar, J., Triay, R., Oro, D., 2006b. Small pelagic fish, trawling discards and breeding performance of the critically endangered Balearic shearwater: improving conservation diagnosis. Marine Ecology Progress Series 318, 247–254.
Lovvorn, J.R., Grebmeier, J.M., Cooper, L.W., Bump, J.K., Richman, S.E., 2009. Modeling marine protected areas for threatened eiders in a climatically changing Bering Sea. Ecological Applications 19, 1596–1613.
Marmion, M., Parviainen, M., Luoto, M., Heikkinen, R.K., Thuiller, W., 2009. Evaluation of consensus methods in predictive species distribution modelling. Diversity and Distributions 15, 59–69.
Martin, T.G., Wintle, B.A., Rhodes, J.R., Kuhnert, P.M., Field, S.A., Low-Choy, S.J., Tyre, A.J., Possingham, H.P., 2005. Zero tolerance ecology: improving ecological inference by modelling the source of zero observations. Ecology Letters 8, 1235–1246.
McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models. Chapman and Hall, Washington, DC, USA.
Millar, R.B., 2009. Comparison of hierarchical Bayesian models for overdispersed count data using DIC and Bayes’ factors. Biometrics 65, 962–969.
Miller, P., 2009. Composite front maps for improved visibility of dynamic sea-surface features on cloudy SeaWiFS and AVHRR data. Journal of Marine Systems 78, 327–336.
Moilanen, A., 2007. Landscape zonation, benefit functions and target-based planning: unifying reserve selection strategies. Biological Conservation 134, 571–579.
Moilanen, A., Franco, A.M.A., Early, R.I., Fox, R., Wintle, B., Thomas, C.D., 2005. Prioritizing multiple-use landscapes for conservation: methods for large multispecies planning problems. Proceedings of the Royal Society B: Biological Sciences 272, 1885–1891.
Moran, P.A.P., 1950. Notes on continuous stochastic phenomena. Biometrika 37, 17.
Mouriño, J., Arcos, F., Salvadores, R., Sandoval, A., Vidal, C., 2003. Status of the Balearic shearwater (Puffinus mauretanicus) on Galician coast (NW Iberian Peninsula). Scientia Marina 67, 135–142.
Nur, N., Jahncke, J., Herzog, M.P., Howar, J., Hyrenbach, K.D., Zamon, J.E., Ainley, D.G., Wiens, J.A., Morgan, K., Ballance, L.T., Stralberg, D., 2011. Where the wild things are: predicting hotspots of seabird aggregations in the California current system. Ecological Applications 21, 2241–2257.
Oppel, S., Dickson, D.L., Powell, A.N., 2009. International importance of the eastern Chukchi Sea as a staging area for migrating King Eiders. Polar Biology 32, 775–783.
Oro, D., Aguilar, J.S., Igual, J.M., Louzao, M., 2004. Modelling demography and extinction risk in the endangered Balearic shearwater. Biological Conservation 116, 93–102.
Phillips, S.J., Anderson, R.P., Schapire, R.E., 2006. Maximum entropy modeling of species geographic distributions. Ecological Modelling 190, 231–259.
Phillips, S.J., Dudik, M., 2008. Modeling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography 31, 161–175.
Phillips, S.J., Elith, J., 2010. POC plots: calibrating species distribution models with presence-only data. Ecology 91, 2476–2484.
Piatt, J.F., Wetzel, J., Bell, K., DeGange, A.R., Balogh, G.R., Drew, G.S., Geernaert, T., Ladd, C., Byrd, G.V., 2006. Predictable hotspots and foraging habitat of the endangered short-tailed albatross (Phoebastria albatrus) in the North Pacific: Implications for conservation. Deep Sea Research Part II 53, 387–398.
Pineiro, G., Perelman, S., Guerschman, J.P., Paruelo, J.M., 2008. How to evaluate models: observed vs. predicted or predicted vs. observed? Ecological Modelling 216, 316–322.
Potts, J.M., Elith, J., 2006. Comparing species abundance models. Ecological Modelling 199, 153–163.
Prasad, A.M., Iverson, L.R., Liaw, A., 2006. Newer classification and regression tree techniques: bagging and random forests for ecological prediction. Ecosystems 9, 181–199.
Ramírez, I., Geraldes, P., Meirinho, A., Amorim, P., Paiva, V.H., 2008. Areas Importantes Para as Aves Marinhas em Portugal. Sociedade Portuguesa para o Estudo das Aves, Lisbon, Portugal.
Ready, J., Kaschner, K., South, A.B., Eastwood, P.D., Rees, T., Rius, J., Agbayani, E., Kullander, S., Froese, R., 2010. Predicting the distributions of marine organisms at the global scale. Ecological Modelling 221, 467–478.
Reineking, B., Schröder, B., 2006. Constrain to perform: regularization of habitat models. Ecological Modelling 193, 675–690.
Risk, B.B., de Valpine, P., Beissinger, S.R., 2011. A robust-design formulation of the incidence function model of metapopulation dynamics applied to two species of rails. Ecology 92, 462–474.
Robinson, L.M., Elith, J., Hobday, A.J., Pearson, R.G., Kendall, B.E., Possingham, H.P., Richardson, A.J., 2011. Pushing the limits in marine species distribution modelling: lessons from the land present challenges and opportunities. Global Ecology and Biogeography 20, 789–802.
Ronconi, R.A., Burger, A.E., 2009. Estimating seabird densities from vessel transects: distance sampling and implications for strip transects. Aquatic Biology 4, 297–309.
Ruiz, A., Martí, R. (Eds.), 2004. La pardela Balear. SEO/BirdLife-Conselleria de Medi Ambient del Govern de les Iles Balears, Madrid, Spain.
Schröder, B., 2008. Challenges of species distribution modeling belowground. Journal of Plant Nutrition and Soil Science 171, 325–337.
Segurado, P., Araújo, M.B., 2004. An evaluation of methods for modelling species distributions. Journal of Biogeography 31, 1555–1568.
Sileshi, G., Hailu, G., Nyadzi, G.I., 2009. Traditional occupancy-abundance models are inadequate for zero-inflated ecological count data. Ecological Modelling 220, 1764–1775.
Smith, R.J., Eastwood, P.D., Ota, Y., Rogers, S.I., 2009. Developing best practice for using Marxan to locate marine protected areas in European waters. ICES Journal of Marine Science 66, 188–194.
Syphard, A.D., Franklin, J., 2009. Differences in spatial predictions among species distribution modeling methods vary with species traits and environmental predictors. Ecography 32, 907–918.
Tasker, M.L., Jones, P.H., Dixon, T., Blake, B.F., 1984. Counting seabirds at sea from ships: a review of methods employed and a suggestion for a standardized approach. Auk 101, 567–577.
Thaxter, C., Lascelles, B., Sugar, K., Cook, A.S., Roos, S., Bolton, M., Langston, R., Burton, N.H., 2012. Seabird foraging ranges as a preliminary tool for identifying candidate Marine Protected Areas. Biological Conservation, this issue.
Thuiller, W., 2003. BIOMOD – optimizing predictions of species distributions and projecting potential future shifts under global change. Global Change Biology 9, 1353–1362.
Thuiller, W., Lafourcade, B., Engler, R., Araujo, M.B., 2009. BIOMOD – a platform for ensemble forecasting of species distributions. Ecography 32, 369–373.
Tremblay, Y., Bertrand, S., Henry, R.W., Kappes, M.A., Costa, D.P., Shaffer, S.A., 2009. Analytical approaches to investigating seabird-environment interactions: a review. Marine Ecology Progress Series 391, 153–163.
Tuanmu, M.-N., Viña, A., Roloff, G.J., Liu, W., Ouyang, Z., Zhang, H., Liu, J., 2011. Temporal transferability of wildlife habitat models: implications for habitat monitoring. Journal of Biogeography 38, 1510–1523.
Wakefield, E.D., Phillips, R.A., Matthiopoulos, J., 2009. Quantifying habitat use and preferences of pelagic seabirds using individual movement data: a review. Marine Ecology Progress Series 391, 165–182.
Warton, D.I., 2005. Many zeros does not mean zero inflation: comparing the goodness-of-fit of parametric models to multivariate abundance data. Environmetrics 16, 275–289.
Warton, D.I., Shepherd, L.C., 2010. Poisson point process models solve the “pseudo-absence problem” for presence-only data in ecology. Annals of Applied Statistics.
Weimerskirch, H., 2002. Seabird demography and its relationship with the marine environment. In: Schreiber, E.A., Burger, J. (Eds.), Biology of Marine Birds. CRC Press, Boca Raton, Florida, pp. 115–135.
Wenger, S.J., Freeman, M.C., 2008. Estimating species occurrence, abundance, and detection probability using zero-inflated distributions. Ecology 89, 2953–2959.
Wood, S.N., Augustin, N.H., 2002. GAMs with integrated model selection using penalized regression splines and applications to environmental modelling. Ecological Modelling 157, 157–177.
Yen, P.P.W., Huettmann, F., Cooke, F., 2004. A large-scale model for the at-sea distribution and abundance of Marbled Murrelets (Brachyramphus marmoratus) during the breeding season in coastal British Columbia, Canada. Ecological Modelling 171, 395–413.
Yésou, P., 2003. Recent changes in the summer distribution of the Balearic shearwater Puffinus mauretanicus off western France. Scientia Marina, 67.
Zipkin, E., Gardner, B., Gilbert, A., O’Connell, A., Royle, J., Silverman, E., 2010. Distribution patterns of wintering sea ducks in relation to the North Atlantic Oscillation and local environmental characteristics. Oecologia 163, 893–902.
Zurell, D., Jeltsch, F., Dormann, C.F., Schröder, B., 2009. Static species distribution models in dynamically changing systems: how good can predictions really be? Ecography 32, 733–744.
