
Rodents in the arena: a critical evaluation of methods measuring personality traits

2018, Ethology Ecology & Evolution

AI-generated Abstract

The study evaluates methods of measuring personality traits in Eurasian red squirrels and Eastern grey squirrels through the Open Field Test (OFT) and Mirror Image Stimulation (MIS). It assesses the validity of three analytical approaches: Principal Component Analysis (PCA), Factor Analysis (FA), and an expert-based method (EB). Results indicate that all methods provide valid measurements of behavior, with high repeatabilities observed in certain traits. Shorter tests (OFT of 4 min and MIS of 3 min) are recommended to reduce stress on animals, maintain accuracy, and enhance field efficiency. Testing sequence impacts outcomes, suggesting a limit of two tests per individual per season.

INTRODUCTION

A growing number of studies on many animal taxa has shown that individual animals exhibit differences in behaviour that persist over time and across contexts (Sih et al. 2004; Réale et al. 2007). This phenomenon is known as animal personality, temperament or coping style. Behaviours that are part of an individual's personality, called personality traits, show within-individual consistency (repeatability) (Réale et al. 2000; Dingemanse et al. 2002; Bell et al. 2009) and are, to some extent, heritable (Drent et al. 2003; Bester-Meredith & Marler 2007; Brown et al. 2007). Personality traits are often classified along five axes proposed in a review by Réale et al. (2007): (1) shyness-boldness (response to risky situations), (2) exploration-avoidance (response to a new situation), (3) activity (general locomotor activity), (4) aggressiveness (tendency to respond with agonistic behaviours towards conspecifics), and (5) sociability (any non-agonistic responses to conspecifics).

Differences in personality among individuals have been observed in a wide variety of taxa (Sih et al. 2012): invertebrates (e.g. Sinn et al. 2006; Kortet & Hedrick 2007), fish (Budaev et al. 1999; Bierbach et al. 2015), reptiles (e.g. López et al. 2005; Cote & Clobert 2007), birds (e.g. Carere & van Oers 2004; Both et al. 2005; Dingemanse et al. 2012) and mammals (e.g. Svartberg et al. 2005; Dochtermann & Jenkins 2007). Personality can have consequences for space use (Wilson & McLaughlin 2007; Martin & Réale 2008), dispersal (Fraser et al. 2001; Dingemanse et al. 2003; Cote et al. 2010), invasiveness (Rehage & Sih 2004; Malange et al. 2016), mating and/or reproductive success (Réale et al. 2000; Both et al. 2005), parental care (Budaev et al. 1999; Both et al. 2005) and survival (Boon et al. 2008; Haage et al. 2017), and thus for an individual's fitness (Sinn et al. 2006; Réale et al. 2007; Smith & Blumstein 2008). Moreover, animal personality may also be relevant to applied wildlife conservation research, since taking an individual's behavioural profile into account may lead to more effective management, conservation and recovery of populations (Haage et al. 2017; Merrick & Koprowski 2017).

To advance the study of animal personality, the methods used to measure an individual's personality are of overriding importance. In behavioural ecology, two of the most widely used tests for directly measuring personality traits are the open field test (OFT) (Walsh & Cummins 1976) and the mirror image stimulation test (MIS) (Svendsen & Armitage 1973). The first is applied to quantify activity, exploration and stress-related responses in a novel environment; the second to assess aggressiveness and sociability towards conspecifics. Both tests have been used across animal taxa (e.g. Armitage 1986; Dammhahn 2012; Bierbach et al. 2015; Haage et al. 2017), but only a few papers underline the importance of analysing test validity (Martin & Réale 2008; Montiglio et al. 2010; Carter et al. 2013). Carter et al. (2013) reviewed the definitions and methods used in personality studies in behavioural ecology, underlining the risk of misclassifying traits without strict and detailed test validation. It is essential that each personality trait can be operationally defined and measured by a set of correlated behaviours (Réale et al. 2007). Such a set of behaviours can be derived either from statistical inference or from an expert-based approach; the latter classifies behaviours into groups, each reflecting a personality trait, based on the researchers' previous knowledge (experience and literature). Hence, to obtain reliable estimates of personality traits, not only the validity of the OFT and/or MIS but also that of the different data analysis methods should be explored (Réale et al. 2007; Carter et al. 2013). Most studies on rodent personality have used Principal Component Analysis (PCA) to reduce the behavioural variables defined in ethograms to a smaller number of synthetic personality traits. Carter et al. (2013) suggested that a factor analytic approach (Factor Analysis, FA) may be more appropriate in behavioural ecology, both to establish independent factors/axes of correlated personality traits and to investigate how these orthogonal axes relate to the ecological factors under study (see also Budaev 2010). Moreover, a factorial approach would reduce the problems associated with so-called jingle-jangle fallacies (Carter et al. 2013). Nevertheless, none of the studies on the personality of wild rodents published after 2013 used that approach (see Table 1). Besides the analysis of behavioural data, the arena test protocol itself might affect personality measures. Repeating the arena test on the same individual can lead to a decrease in the intensity of activity/exploration behaviours over trials, as found in many studies on birds and small mammals (Archer 1973; Dingemanse et al. 2002; Boon et al. 2007, 2008; Martin & Réale 2008; Boyer et al. 2010; Montiglio et al. 2010; Taylor et al. 2012). Moreover, long tests may also result in habituation to the arena; hence test duration, in addition to the number of test repetitions, is another important factor that can affect the measurement of personality (Montiglio et al. 2010).

Table 1

Papers published since 1970 on personality in free-ranging rodents using arena tests (OFT = Open Field Test; MIS = Mirror Image Stimulation test; AT = other types of arena test; STAT = statistical methods used to analyse personality-trait data).
Ethogram for open field and mirror-image stimulation tests: description of the single behaviours and indication of the expert-based grouping into categories that represent personality traits (see Methods for the definition of the expert-based method).
Aggressiveness with squirrel identity as random intercept.
Number of test repetitions per animal, capture event (new capture or recapture).

In this study we explore the performance of direct methods for measuring individual variation in personality traits in free-living rodents. We compare how PCA and FA classify behaviours into personality traits, and then compare both with our expert-based classification. We used data collected with the open field test (OFT) and mirror image stimulation test (MIS) on Eurasian red squirrels (Sciurus vulgaris) and Eastern grey squirrels (Sciurus carolinensis) to explore the following hypotheses.

(1) Our direct indices of traits are consistent over time (significant repeatability) for the individual squirrel, indicating they reflect its personality. (2) Different techniques for the analysis of the behaviours measured during arena tests (Principal Component Analysis, Factor Analysis, expert-based method) result in comparable discrimination of personality traits. (3) They produce consistent patterns of variation among individuals; in other words, the values of personality trait indices (scores for an individual squirrel from a given arena test using either PCA, FA or expert-based method) should be positively correlated. Finally, we discuss how factors related to novelty (first versus subsequent capture of an animal), repetition and duration of the arena test can affect its results in measuring personality.

MATERIALS AND METHODS

Study system

Squirrels show marked individual differences in behaviour and personality that can affect important ecological interactions (e.g. host-parasite interactions, Boyer et al. 2010) and influence individual variation in fitness (e.g. Boon et al. 2008; Le Coeur et al. 2015). Arena tests have commonly been used to measure personality traits and study these relationships in squirrels (Table 1), but often without critical consideration of test parameters (e.g. duration, number of repeated tests). Squirrels are therefore good models for validating methods that measure personality.

Red squirrels were studied at three sites in the Stelvio National Park (Lombardy, Italy; for details see Salmaso et al. 2009; Rodrigues et al. 2010), and grey squirrels at two sites in the Po Plain in Piedmont. Red squirrels were monitored during two capture sessions per site, each of at least 5 days, one in May-June 2016 and the other in September-October 2016. Grey squirrels were monitored for at least 5 days over three capture sessions per site: December 2015-April 2016 at one site and November 2016-January 2017 at the other.

Trapping and handling squirrels

Single-capture traps (model 202, Tomahawk Live Trap Co., Tomahawk, WI, USA) were fixed on tree trunks or placed on the ground. Prebaiting started 1 week before each trapping session, with hazelnuts placed inside the blocked traps. Activated traps were checked twice a day (morning and late afternoon), matching the periods of increased squirrel activity and reducing the time squirrels were confined in traps. Each trapped squirrel was flushed into a cotton handling bag (zipper-tube, Wauters & Dhondt 1989) and weighed to the nearest 5 g with a Pesola spring balance, and its species, sex and reproductive status were recorded (Wauters et al. 2000).

Measuring personality

Once a marked squirrel had been caught and handled, we put it inside the arena by opening a sliding door (28 × 15 cm, internal opening 12 × 12 cm) and allowing the animal to move from the handling bag into the arena. The arena is a white extruded polycarbonate box of 50 × 51 × 51 cm; its floor consists of a panel with four blind holes (7 cm diameter × 4 cm deep) that make it possible to differentiate between exploration and activity behaviours (hole board test, Martin & Réale 2008). The wall opposite the entrance has a sliding panel that can be removed to reveal a mirror (24 × 51 cm). A web camera (Drift, Professional HD Action Camera, model: FD9960, Ghost S), fitted into a 5 cm diameter hole in the lid of the arena, recorded the animal's behaviour. To quantify individual personality, we performed two different experiments inside the arena: the Open Field Test (OFT) to estimate activity and exploration levels in a novel environment (Walsh & Cummins 1976; Martin & Réale 2008) and Mirror Image Stimulation (MIS) to test aggressiveness, sociability or avoidance towards conspecifics (Svendsen & Armitage 1973). The Open Field Test, introduced in the early decades of the 20th century (Hall & Ballechey 1932), is still one of the most widely used instruments for measuring personality traits. The Mirror Image Stimulation test is a technique for studying aggressive and social patterns in a wide variety of animals that respond to their reflection in a mirror (Gallup 1968; Svendsen & Armitage 1973). The two tests were performed in the same testing session, with the OFT also serving as habituation time before the MIS. The arena was placed on the ground near the trap where the squirrel was caught, and recording (OFT) started before we released the animal inside the arena. After 6 min we uncovered the mirror and began the MIS test, which lasted another 4 min. At the end of the MIS the squirrel was released by opening the sliding door.
After each experiment the arena was cleaned with 90% ethyl alcohol to remove urine and faecal pellets, when present, and to eliminate any effect of the previous squirrel's scent on the behaviour of the next animal.

We performed arena tests on each individual only once per capture session to reduce stress and habituation (minimum time between tests on the same individual: 28 days). In addition, to check the assumption that personality traits are repeatable, we repeated both experiments (OFT and MIS) in different capture sessions, aiming for at least two arena tests for most individuals.

We analysed the digital videos of OFT and MIS with CowLog 3.0.2 software (Hänninen & Pastell 2009), using ethograms similar to those of Boon et al. (2007) (Table 1); for each experiment, the software calculates the time an individual spent in each behaviour.
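The conversion from coded video to time budgets can be sketched as follows. This is a minimal Python illustration, not CowLog's own code: it assumes a hypothetical export in which each behavioural state is logged as an (onset in seconds, behaviour) pair and lasts until the next onset. The function name `time_budget` and the example events are invented. The optional `window_end` argument mimics re-coding only the first minutes of a test, as in the shorter-test comparison described under Data analysis.

```python
from collections import defaultdict


def time_budget(events, test_end, window_end=None):
    """Proportion of coded time spent in each mutually exclusive behavioural state.

    events     -- list of (onset_s, behaviour) tuples sorted by onset; each state
                  lasts until the next onset (hypothetical export format).
    test_end   -- time (s) at which coding of the test stopped.
    window_end -- optional earlier cut-off (s) to emulate a shorter test.
    """
    end = test_end if window_end is None else min(window_end, test_end)
    durations = defaultdict(float)
    for i, (onset, behaviour) in enumerate(events):
        if onset >= end:
            break  # states starting after the cut-off are ignored
        next_onset = events[i + 1][0] if i + 1 < len(events) else test_end
        durations[behaviour] += min(next_onset, end) - onset
    total = sum(durations.values())
    return {b: d / total for b, d in durations.items()}


# Invented example: a 6-min OFT coded as state onsets in seconds.
events = [(0, "locomotion"), (60, "immobile"), (150, "rise"), (300, "locomotion")]
full_test = time_budget(events, test_end=360)                   # whole 6-min OFT
short_test = time_budget(events, test_end=360, window_end=240)  # first 4 min only
print(full_test)
print(short_test)
```

With these invented events, the 4-min window gives locomotion 0.25, immobile 0.375 and rise 0.375 of the coded time, whereas the full 6 min gives locomotion one third of the time.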

Data analysis

We first converted the times calculated by CowLog 3.0.2 into the percentage of time each squirrel spent in a given behavioural state. Because PCA has sometimes given poor results in previous studies (low % of variance explained by the first two or three principal components, e.g. Martin & Réale 2008; Boyer et al. 2010; Montiglio et al. 2012), we also tested an "expert-based" (EB) method to reduce the number of variables and create behavioural groups representing personality traits, relying on ethological knowledge (Wauters & Dhondt 1987, 1989, 1992; Wauters et al. 2001) and video observations (see Table 2 for the classification of OFT and MIS behaviours into personality traits). With the EB approach, the researcher defines groups of behaviours, each related to a specific personality trait, and sums the percentages of the single behaviours in a group to obtain a personality-trait score. In contrast, with PCA or FA the loadings of the different behaviours are estimated from the data, and the resulting scores (along the first few principal components or factors) are interpreted as measures of personality traits; consequently, the loadings (and hence the scores) change every time new animals are added to the dataset.
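A minimal sketch of the EB scoring step, assuming the percentage of time per behaviour has already been computed for one animal. The grouping in `EB_GROUPS` is hypothetical and only reuses behaviour names mentioned in this paper; the actual assignment of behaviours to traits is the one given in Table 2. The function name is also illustrative.

```python
# Hypothetical grouping for illustration only; the paper's actual
# assignment of behaviours to traits is given in its Table 2.
EB_GROUPS = {
    "activity": ["locomotion", "rise"],
    "exploration": ["sniff", "dip"],
    "shyness": ["immobile"],
}


def eb_scores(percentages, groups=EB_GROUPS):
    """Sum one animal's behaviour percentages within each expert-defined group.

    The score depends only on this animal's own percentages, so it does not
    change when other animals are added to or removed from the dataset.
    Behaviours missing from `percentages` count as 0 %.
    """
    return {
        trait: sum(percentages.get(b, 0.0) for b in behaviours)
        for trait, behaviours in groups.items()
    }


# Invented percentages of time for one squirrel in one OFT.
squirrel = {"locomotion": 30.0, "rise": 10.0, "sniff": 15.0, "dip": 5.0, "immobile": 40.0}
print(eb_scores(squirrel))
```

Because `eb_scores` reads only one animal's percentages, adding or removing other animals cannot change its result; this is the property contrasted above with PCA and FA loadings.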

Table 2

Aggressiveness was measured as the number of attacks on the mirror during MIS. All data analyses were performed in R 3.4.2 (R Development Core Team 2016). To reduce the number of variables statistically, we ran a Principal Component Analysis (PCA) and a Factor Analysis (FA) separately for OFT and MIS. Only the principal components and factors explaining the greater part of the total variance were retained, using the Kaiser-Guttman criterion (eigenvalue > 1; Martin & Réale 2008). Next, we compared each individual's EB personality-trait values (square-root transformed) with its PC and FA scores using Pearson correlation coefficients.
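The PCA step with the Kaiser-Guttman rule, followed by the Pearson comparison with an EB trait, can be sketched in Python with NumPy. This is an illustration under stated assumptions, not the R code used in the study: the data are randomly generated proportions of time (square-root transformed), the EB trait is a hypothetical sum of two behaviours, and Factor Analysis is not shown. The sign of each principal component is arbitrary, so a negative correlation with an EB trait can simply reflect the orientation of the axis.

```python
import numpy as np


def pca_kaiser(X):
    """PCA on the correlation matrix of X (rows = arena tests, columns = behaviours).

    Keeps components whose eigenvalue exceeds 1 (Kaiser-Guttman criterion) and
    returns loadings, scores, the fraction of variance they explain, and all
    eigenvalues in decreasing order.
    """
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # standardise each behaviour
    R = np.corrcoef(X, rowvar=False)                    # behaviour x behaviour correlations
    eigvals, eigvecs = np.linalg.eigh(R)                # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    scores = Z @ eigvecs[:, keep]                       # sign of each axis is arbitrary
    explained = eigvals[keep].sum() / eigvals.sum()
    return loadings, scores, explained, eigvals


# Invented data: 40 tests x 5 behaviours, proportions of time, square-root transformed.
rng = np.random.default_rng(42)
X = np.sqrt(rng.dirichlet(np.ones(5), size=40))
loadings, scores, explained, eigvals = pca_kaiser(X)

# Hypothetical EB trait: proportions of behaviours 0 and 1 summed, then square-root transformed.
eb_trait = np.sqrt(X[:, 0] ** 2 + X[:, 1] ** 2)
r = np.corrcoef(eb_trait, scores[:, 0])[0, 1]
print(f"{scores.shape[1]} components kept, {explained:.0%} of variance, r(EB, PC1) = {r:.2f}")
```

Because the correlation matrix has a trace equal to the number of behaviours, the eigenvalues sum to that number, and the "% variance explained" reported for retained components is simply their eigenvalues divided by it.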

To assess the individual consistency of behaviours in both experiments, we used a restricted sample of individuals (20 red squirrels, 38 grey squirrels) that were caught in at least two different trapping sessions. We estimated repeatability, also called the intra-class correlation coefficient (ICC), with linear mixed models (LMM; lme4 package, Bates et al. 2014a, 2014b). PC scores, FA scores or EB personality traits were the dependent variable; squirrel ID was the random intercept, to account for repeated tests on the same individuals; and sex, study site and test number (first, second or third test of the same animal) were included as fixed effects. For each final model, we estimated the repeatability of personality traits, PC scores and FA scores as $R = \frac{100 \times V_i}{V_i + V_r}$, where $V_i$ is the variance associated with the individual random effect and $V_r$ is the residual variance of the model. Repeatability thus represents the percentage of the total phenotypic variance explained by among-individual variance (Lessells & Boag 1987). Repeatability was estimated using the R package rptR v 0.9.2 (CI = 95%, number of parametric bootstraps for interval estimation = 5000, number of permutations used when calculating asymptotic P-values = 1000) (Bohn et al. 2017; Stoffel et al. 2017).
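Once $V_i$ and $V_r$ are available, the repeatability formula is a one-liner. Fitting the LMM is not reproduced here; instead, as a deliberately simpler stand-in, the sketch below also implements the one-way ANOVA estimator of Lessells & Boag (1987). That estimator partitions variance into among- and within-individual components but, unlike the models described above, ignores fixed effects such as sex, site and test number. Function names are hypothetical.

```python
def repeatability_pct(v_i, v_r):
    """R = 100 * V_i / (V_i + V_r): share of total variance due to among-individual differences."""
    return 100.0 * v_i / (v_i + v_r)


def anova_repeatability(groups):
    """One-way ANOVA repeatability (Lessells & Boag 1987) for unbalanced designs.

    groups -- dict mapping animal ID to a list of its scores (at least 2 animals).
    Returns R as a proportion; a negative among-individual variance estimate is set to 0.
    """
    values = [v for vs in groups.values() for v in vs]
    n_total, a = len(values), len(groups)
    grand_mean = sum(values) / n_total
    ss_among = sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2 for vs in groups.values())
    ss_within = sum((v - sum(vs) / len(vs)) ** 2 for vs in groups.values() for v in vs)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (n_total - a)
    # n0: corrected mean number of tests per animal for unequal group sizes
    n0 = (n_total - sum(len(vs) ** 2 for vs in groups.values()) / n_total) / (a - 1)
    var_among = max(0.0, (ms_among - ms_within) / n0)
    return var_among / (var_among + ms_within)


# Invented scores for two squirrels tested twice each.
print(repeatability_pct(1.0, 1.0))                                # equal variances -> 50 %
print(anova_repeatability({"A": [1.0, 3.0], "B": [5.0, 7.0]}))    # 7/9
```

In the second example the among-individual variance is 7 and the within-individual variance is 2, so $R = 7/9 \approx 0.78$.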

In a final step, we critically evaluated the effects of some methodological parameters (duration of the trial; animal tested at first capture or during a recapture; results of first against second trial) on personality-trait measurements.

To investigate the effect of the duration of the OFT and MIS on personality-trait estimates, we restricted the behavioural coding to the first 4 min of the OFT and the first 3 min of the MIS, and recalculated, for these shorter tests, the proportion of time each squirrel spent in each behaviour. This allowed us to determine whether a shorter arena test would define personality in our study species without losing information, and thus without altering our results (see also Montiglio et al. 2010). Using the square-root transformed proportions of time spent in the different behaviours, we reran PCA, FA and the EB grouping (shyness, exploration and activity for OFT; sociability, avoidance, alert and other for MIS) for the shorter OFT and MIS. We then used linear mixed models (LMM) with the short-test personality-trait estimates as the response variable, the long-test estimates as the explanatory variable and squirrel ID as a random intercept, to explore whether short and long arena tests (OFT 4 min vs OFT 6 min; MIS 3 min vs MIS 4 min) gave comparable results.
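The short-versus-long comparison rests on whether the slope relating the two estimates is close to 1. As a simplified sketch, ordinary least squares can stand in for the LMM; note that this ignores the squirrel-ID random intercept used in the actual models. The function name and trait values are invented.

```python
def ols_fit(x, y):
    """Ordinary least-squares slope and intercept of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented trait values for four tests: long (OFT 6 min) vs short (first 4 min).
long_scores = [0.20, 0.35, 0.50, 0.65]
short_scores = [0.21, 0.34, 0.52, 0.64]
slope, intercept = ols_fit(long_scores, short_scores)
print(f"slope = {slope:.2f}, intercept = {intercept:.3f}")
```

A slope near 1 (about 0.98 with these invented values) indicates that the short-test estimates track the long-test estimates on the same scale.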

In a second step, because a shorter duration did not affect the results (see Results), we used the short-test personality-trait estimates as the response variable in a linear mixed model (LMM), with sex and body mass included as fixed effects. Sex and body mass were added because personality may be related to social dominance (heavier squirrels tend to be older, more dominant animals) and to sex-specific spacing patterns (Wauters & Dhondt 1989, 1992; Koprowski 1993; Gurnell et al. 2001).

RESULTS

Personality of red squirrels

We performed 76 arena tests on 58 different red squirrels (43 males and 15 females); 20 animals were tested more than once (41 arena tests). The proportions of time red squirrels spent in the different behaviours are reported in Table S1 (Supplemental Material).

Comparison of analytical methods

For OFT, we retained the first three Principal Components (PCs) and two Factors of FA; for MIS the first three PCs and FA factors. The retained PCA components explained 78% and 66% of the total variance in OFT and MIS respectively, while the retained factors in FA explained 61% in OFT and 58% in MIS (Table 3). The first principal component and FA factor from the open field (OF1) had high loadings for behaviours related to exploration and activity. OF2 was characterised primarily by behaviours of locomotion and immobility ("hang" and "locomotion" in PC2 and "locomotion" and "immobile" in FA2). OF3 in PCA was composed of behaviours related to both activity ("locomotion" and "rise") and immobility ("hang") of the animal. In the mirror test, MIS1 in both PCA and FA had high loadings for "locomotion" and exploration behaviours, MIS2 for behaviours related to the sociability of the animal ("back", "front", "no aggressive"), while MIS3 in PCA was related to the behaviours "hang" and "watch" (Table 3).

Table 3

Principal component analysis and factor analysis for OFT (6 min) and MIS (4 min) of red squirrels. Bold type indicates behaviours that contributed importantly to a component.

In red squirrels, there was good correspondence between the grouping of behaviours from our EB method and that obtained with PCA or FA (Tables S2 and S3, Supplemental Material). For OFT, the personality trait shyness was negatively correlated with PC1 (r = −0.90, P < 0.0001), FA1 (r = −0.70, P < 0.0001) and FA2 (r = −0.73, P < 0.0001) scores. Both exploration and activity were positively related to PC1 (exploration: r = 0.82, P < 0.0001; activity: r = 0.86, P < 0.0001) and FA1 (exploration: r = 0.83, P < 0.0001; activity: r = 0.71, P < 0.0001). Activity was also positively correlated with FA2 (r = 0.70, P < 0.0001). Hence, the first PC and FA factor defined an activity-shyness continuum, but they were also strongly correlated with the EB personality trait exploration. For MIS, PC2 scores were positively associated with the personality trait avoidance (r = 0.75, P < 0.0001) and negatively with sociability (r = −0.90; P < 0.0001), while FA2 was negatively associated with avoidance (r = −0.95, P < 0.0001) and positively with sociability (r = 0.92; P < 0.0001). In this case, the second component and factor defined the sociability axis (Table S3, Supplemental Material). We recorded no scratching in either experiment and no attacks on the mirror during MIS.

For grey squirrels, two principal components (PC1 and PC2) and two factors (FA1 and FA2) were retained in OFT, and two principal components and three factors in MIS (Table 4). The retained PCA components explained 68% and 50% of the total variance in OFT and MIS, respectively; the retained FA factors explained 61% of the total variance in OFT and 54% in MIS (Table 4). The first two PCs and FA factors of OFT (OFT1-2) had high loadings for behaviours related to exploration and activity. MIS1 of both PCA and FA was characterised by behaviours related to activity and exploration, while MIS2 separated animals that spent time at the front of the arena from those that stayed at the back, far from the mirror. MIS3 in FA had high loadings for the behaviours "hang" and "back" (Table 4).

Table 4

Principal component analysis and factor analysis for OFT (6 min) and MIS (4 min) of grey squirrels. Bold type indicates behaviours that contributed importantly to a component.

In grey squirrels, PC and FA scores also correlated significantly with the EB personality traits. For OFT, shyness was negatively correlated with PC1 (r = −0.88, P < 0.0001) and FA1 (r = −0.89, P < 0.0001) scores. Both exploration and activity were positively related to PC1 (exploration: r = 0.81, P < 0.0001; activity: r = 0.80, P < 0.0001) and FA1 (exploration: r = 0.82, P < 0.0001; activity: r = 0.78, P < 0.0001). Hence, as in red squirrels, the first PC and factor defined an activity-shyness continuum but could not separate activity from exploration (Table S2, Supplemental Material).

For MIS, PC1 scores were positively associated with the personality trait avoidance (r = 0.60, P < 0.0001) and negatively with sociability (r = −0.56; P < 0.0001), and PC2 was also negatively associated with sociability (r = −0.53, P < 0.0001). FA2 was negatively associated with avoidance (r = −0.79, P < 0.0001) and positively with sociability (r = 0.97; P < 0.0001). Hence, the first component and the second factor defined the sociability axis (Table S3, Supplemental Material). The trait alert was weakly correlated with PC1, PC2, FA1 and FA3 (all P ≤ 0.03). We recorded no attacks on the mirror during MIS.

Repeatability

Using the restricted red squirrel data (animals with more than one arena test, n = 20), we found significant within-individual repeatability for several PCA components and FA factors and for some EB personality traits. In OFT, both PC1 (R = 47%, LRT = 3.86, df = 1, P = 0.02) and PC2 scores (R = 50%, LRT = 4.59, df = 1, P = 0.01) were significantly consistent within individuals. FA1 also had a highly significant repeatability (R = 55%, LRT = 5.63, df = 1; P = 0.009). PC3 and the other two FA factors had low, non-significant repeatability (Table S4, Supplemental Material). In MIS, only PC1 and FA1 were repeatable (PC1: R = 57%, LRT = 6.53, df = 1; P = 0.005; FA1: R = 46%, LRT = 3.65, df = 1; P = 0.03). Among the EB personality traits, repeatability was 55% for activity in OFT (LRT = 5.75, df = 1; P = 0.008) and 48% for the group "other" in MIS (LRT = 4.02, df = 1; P = 0.02). All other traits had lower repeatability (Table S4, Supplemental Material).
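The P-values above can be related to the LRT statistics. Assuming each test compares models with and without the among-individual variance, that variance is tested on the boundary of its parameter space (it cannot be negative), and a common correction halves the upper tail of a chi-square distribution with 1 df. The sketch below implements that rule with the Python standard library only, using the identity $P(\chi^2_1 > x) = \operatorname{erfc}(\sqrt{x/2})$. Under this assumption it gives P ≈ 0.025 for LRT = 3.86 and P ≈ 0.009 for LRT = 5.63, close to the values reported here. The function name is hypothetical.

```python
import math


def boundary_lrt_p(lrt):
    """P-value for an LRT of a single variance component tested at zero.

    Uses half of the chi-square(1) upper tail; P(chi2_1 > x) = erfc(sqrt(x / 2)).
    """
    return 0.5 * math.erfc(math.sqrt(lrt / 2.0))


# LRT statistics reported above for red squirrels (OFT PC1, OFT FA1, MIS PC1).
for lrt in (3.86, 5.63, 6.53):
    print(f"LRT = {lrt:.2f} -> P = {boundary_lrt_p(lrt):.4f}")
```

Without the halving, the same statistics would give P-values twice as large, which matters for borderline results such as LRT = 3.65.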

For grey squirrels, using the restricted data (animals with more than one arena test, n = 38), we found significant within-individual repeatability for the first PCA component and FA factor and for three EB personality traits. In OFT, PC1 (R = 42%, LRT = 6.47, df = 1, P = 0.005) and FA1 (R = 38%, LRT = 4.97, df = 1, P = 0.02) were significantly consistent within individuals. In MIS, only PC1 and FA1 were repeatable (PC1: R = 49%, LRT = 10.5, df = 1; P = 0.0005; FA1: R = 40%, LRT = 6.78, df = 1; P = 0.005).

Among the EB personality traits, repeatability in OFT was 45% for activity (LRT = 8.6, df = 1; P = 0.002). In MIS, it was 35% for sociability (LRT = 4.44, df = 1; P = 0.02) and 49% for the group "other" (LRT = 10.3, df = 1; P = 0.0007). All other traits had non-significant repeatability (Table S4, Supplemental Material).

Personality of grey squirrels

For grey squirrels, we performed 128 arena tests on 83 individuals (36 males and 47 females); 38 grey squirrels were tested more than once (79 arena tests). The proportions of time grey squirrels spent in the different behaviours are reported in Table S1 (Supplemental Material).

Methodological factors affecting personality measures using arena tests

Comparing the personality indices derived from shorter (OFT 4 min, MIS 3 min) and longer arena tests (OFT 6 min, MIS 4 min), the LMMs for both tests (OFT and MIS) were statistically significant for all personality traits, PC scores and FA scores (Table S5, Supplemental Material). All slope estimates were highly significant and close to 1.0, showing that shortening the arena test did not alter the proportion of time spent in the different behaviours (or the PCA/FA scores). In other words, the shorter and longer arena tests yielded nearly identical measures of an individual's personality traits.

Table 5

The number of test repetitions per animal had a significant effect. Considering only the consistent (hence repeatable) expert-based traits for each species (activity in OFT for red and grey squirrels, and sociability in MIS for grey squirrels), squirrels of both species were less active in the second and third trials than in the first (Table S6, Supplemental Material). Sociability scores of grey squirrels did not differ significantly among subsequent tests. Moreover, in neither species did activity or sociability differ between arena tests carried out when an individual was trapped for the first time and those carried out at recapture. Neither sex nor body mass was significantly related to the measured personality traits (Table S6, Supplemental Material).

Table 6

DISCUSSION

In this study we investigated whether individual variation in the behaviour of Eurasian red squirrels and Eastern grey squirrels recorded during the Open Field Test (OFT) and the Mirror Image Stimulation test (MIS) could be used to determine their personality. We compared the performance of the commonly used Principal Component Analysis and Factor Analysis in measuring personality traits, and then compared both with an EB classification of behaviours into groups reflecting personality traits. In red squirrels, behaviour in the OFT yielded one multivariate variable representing activity and exploration and one representing shyness, while behaviour during MIS yielded one component reflecting activity and/or exploration and a second reflecting sociability. PCA and FA produced comparable results. In grey squirrels, the OFT yielded the same pattern as in red squirrels. However, for MIS, FA discriminated behaviours better than PCA: only FA yielded two clearly separated multivariate factors, the first representing activity and exploration and the second the sociability-avoidance axis. For red squirrels, using the OFT, the personality traits activity and exploration showed moderate to high repeatability with all three methods (PCA, FA and EB), while shyness did so only with the PCA and EB approaches. In this species, sociability measured during MIS had low repeatability. For grey squirrels, PCA and FA yielded high repeatabilities for the multivariate component/factor representing activity and exploration during both OFT and MIS. During OFT, the EB approach performed less well (high repeatability for activity but low for exploration), while during MIS moderate to high repeatabilities were found for activity-exploration and sociability with EB and FA. Overall, therefore, our EB classification of personality traits produced results similar to those of the two analytical methods.
There were some differences in performance between PCA and FA, with FA slightly better at discriminating personality traits.

However, our analyses also revealed limitations of arena tests for the two squirrel species: repeatability of measures, an essential characteristic for a behaviour to be considered a personality trait (Bell et al. 2009; Carter et al. 2013), was moderate to high for activity/exploration in both species, but for sociability only in grey squirrels. Also, PCA and FA did not distinguish between activity-related and exploration behaviours, grouping them in the same component or factor.

Use and duration of arena tests

In recent years, personality has begun to receive theoretical and empirical attention from ecologists, and both OFT and MIS are generally considered reliable techniques for measuring personality traits that are consistent over time and under different environmental conditions (Dingemanse et al. 2002; Bierbach et al. 2015; Haage et al. 2017).

Our review of previous studies on the personality of free-ranging rodents showed that arena tests have been used to study the relationship between personality and various ecological parameters, as well as personality itself, its heritability and its ontogeny. Remarkably, there is no agreement among studies on a standard test duration: OFT durations ranged from 1.5 to 10 min and MIS durations from 5 to 15 min (Table 1). None of the papers explained why a specific duration was chosen (but see Montiglio et al. 2010), and sometimes different authors used different durations even for the same species. Yet test duration can affect the results (and hence their reliability): tests that are too short may lose information (e.g. fail to record infrequent but potentially important behaviours), while longer tests may stress the animal or result in habituation to the arena. Indeed, if individuals express different temporal patterns of behaviour within a single OFT (or MIS), the reliability of cumulative proportions of a behaviour type may depend on test duration (see fig. 1 in Montiglio et al. 2010).

Figure 1

Here, we showed that reducing the duration of OFT and MIS (from 6 + 4 min to 4 + 3 min) did not change an individual squirrel's personality-trait measures. This is important because shorter arena tests are likely to reduce physiological stress in animals (Dosmann et al. 2015; Dantzer et al. 2016) and the risk of habituation to the experimental conditions (Montiglio et al. 2010).

Arena test reliability: comparing statistical analyses and expert-based methods

Our EB method, which classified behaviours on the basis of ethological knowledge of the species and operator experience, yielded personality traits that correlated strongly with the scores derived from the components and factors retained by traditional PCA and FA. One advantage of the EB approach is that the trait values (the sum of the percentages of the behaviours grouped together in a trait) remain fixed regardless of data-management operations (adding new data, using restricted data sets). With PCA and FA, in contrast, the loadings, and hence an individual animal's scores on a component or factor, change every time a new data set is analysed (or animals are added or removed). Moreover, in reviewing previous studies of rodent personality, we found that the first two or three PCA components did not always explain a large proportion of the total variance (e.g. Martin & Réale 2008; Montiglio et al. 2012; Merrick & Koprowski 2017). Because PCA loadings vary among data sets, different behaviours can be associated with the same personality trait in different studies. For example, for the American red squirrel Tamiasciurus hudsonicus, Boon et al. (2007) defined "activity" as locomotion + sniff + dip + rise (variable names have been changed to match our classification for readability, but the behaviours described are the same), whereas Taylor et al. (2012) defined it as locomotion + dip and Merrick and Koprowski (2017) as locomotion + scratch/chew. These different interpretations, which arise mainly from analytical properties inherent to PCA, lack a clear ecological meaning; in our opinion, such results are difficult to compare and hinder the development of a general theory of the effects of personality on a species' life history. The expert-based method we propose avoids these problems, but may introduce others, related to differences in how researchers interpret certain behaviours. For example, Haigh et al. (2017) also grouped behaviours into general categories in their study of variation in red squirrel behaviour at different densities. However, in contrast with our study and previous studies on sciurids, they classified slow approach and touching the mirror as aggressive behaviours, together with actually attacking it ("bang"; see Table 1 in Haigh et al. 2017), whereas we considered the former two behaviours part of social behaviour ("sociability"). Observations of red squirrels in the wild indicate that aggressive interactions occur when one animal rapidly approaches another and chases it through the tree canopy, along the trunk or on the ground, whereas a slow approach is very rarely followed by an aggressive interaction (Wauters & Dhondt 1987, 1989; Wauters & Gurnell 1999).
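The contrast between fixed EB scores and data-dependent PCA scores can be illustrated with a small numerical sketch. The Python code below is a hypothetical example, not the analysis used in this study: the two behaviours, the percentage values and the function names are invented, and PCA is computed on the covariance matrix by power iteration (many behavioural studies use the correlation matrix instead; the instability argument holds either way). Adding two animals to the data set changes the PC1 loadings and the first animal's PC1 score, while its EB score is unchanged.

```python
"""EB trait scores stay fixed while PCA loadings and scores shift as animals are added.

Hypothetical data: each row is one animal; columns are the percentage of
test time spent in two behaviours, [locomotion, sniff]. Values are invented.
"""
from statistics import mean


def eb_score(row, columns):
    # Expert-based trait: sum of the percentages of the behaviours grouped
    # into the trait. It depends only on this animal's own record.
    return sum(row[c] for c in columns)


def first_pc(data, iterations=500):
    """Return (loadings, scores) for the first principal component of the
    covariance matrix, found by power iteration."""
    n, p = len(data), len(data[0])
    means = [mean(col) for col in zip(*data)]
    centred = [[x - m for x, m in zip(row, means)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centred) / (n - 1) for j in range(p)]
           for i in range(p)]
    v = [1.0] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    # The sign of an eigenvector is arbitrary; make the largest loading positive.
    if max(v, key=abs) < 0:
        v = [-x for x in v]
    # Each animal's score is its centred record projected onto the loadings.
    scores = [sum(x * l for x, l in zip(row, v)) for row in centred]
    return v, scores


sample = [[40, 10], [60, 12], [50, 11], [70, 13]]
extended = sample + [[55, 40], [52, 5]]  # two newly tested animals

loadings_a, scores_a = first_pc(sample)
loadings_b, scores_b = first_pc(extended)
print("PC1 loadings, 4 animals:", [round(x, 3) for x in loadings_a])
print("PC1 loadings, 6 animals:", [round(x, 3) for x in loadings_b])
print("Animal 1 PC1 score:", round(scores_a[0], 2), "->", round(scores_b[0], 2))
print("Animal 1 EB score:", eb_score(sample[0], [0, 1]), "->",
      eb_score(extended[0], [0, 1]))
```

With the four original animals PC1 is dominated by locomotion; once the two new animals are added it is dominated by sniffing, so the same animal's PC1 score, and the behaviour the component would be labelled with, both change.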

MIS and problems with measuring reaction to a conspecific

Surprisingly, we did not record any attack on the mirror in either red or grey squirrels. This might be related to the nature of the MIS test, in which the animal's own actions control the image. In other words, the reflected animal is exactly as active as the animal in the arena, and the image can never present a submissive or aggressive gesture unless the real animal initiates it (Svendsen & Armitage 1973). This could explain why neither species interacted aggressively with the mirror in our tests (but see Haigh et al. 2017 for red squirrels). Aggressiveness towards the mirror is common in territorial North American red squirrels (Boon et al. 2008; Shonfield et al. 2012; Kelley et al. 2015). In contrast, grey squirrels form female-kin groups, with a dominance hierarchy among resident adults that is related to environmental cues (important resources within the animal's home range; Koprowski 1993). In the absence of these cues, MIS succeeded in measuring an individual's tendency to behave sociably towards, or to avoid, a conspecific, but not in recording aggressiveness. Eurasian red squirrels have overlapping home ranges, but adult females defend exclusive core areas against other females (intra-sexual territoriality; Wauters & Dhondt 1992). Hence we expected some degree of aggressiveness, at least in adult females and dominant males (Wauters & Dhondt 1989). Aggressive behaviour in the wild may instead be conditional on the intensity of intruder pressure, and thus on the density of same-sex conspecifics. Densities in the populations where aggressive interactions were documented (Belgium) were higher than at the Italian sites where we conducted arena tests (Belgium: 0.8-1.5 ind/ha; Italy: 0.1-0.6 ind/ha; Wauters et al. 2004, 2008; Rodrigues et al. 2010). A lack of aggressive response to MIS has also been documented in fox squirrels (Sciurus niger) and cliff chipmunks (Tamias dorsalis), which have a social organisation similar to that of Eurasian red squirrels or engage in social nesting (Kilanowski 2015; Krause et al. 2015).

Repeatability: being critical about your results

For behaviours to be considered part of an individual's personality, they should be consistent over time and across contexts (Bell et al. 2009). However, recent studies have shown that the repeatability of personality traits, in particular exploration speed and/or tendency, can vary over time (with age), and that the behaviour may increase over successive tests (Carere et al. 2005; Dingemanse et al. 2012; Kanda et al. 2012). Such variation may reflect a truly flexible component of personality, but it may also result from a changing response to the artificial conditions of the arena test. In our study, both squirrel species showed a consistent activity-exploration pattern in OFT, but only grey squirrels showed a repeatable sociability trait in MIS, probably reflecting the social organisation of this species described above. The other EB traits had low or, at best, moderate repeatability. Non-significant repeatability has also been reported for PC scores in other studies: in Boon et al. (2007), PC2 in MIS was associated with avoidance, and in Kilanowski (2015), PC1 and PC2 were associated with sociality and image engagement, but none of these components was repeatable. A lack of repeatability precludes measuring relationships between personality and ecological or fitness parameters. Indeed, a lack of repeatability (lack of consistency over time) suggests either that the measured traits were not part of the individual's personality but were rather flexible phenotypic traits (Boyer et al. 2010), or that their validity as measures of a personality trait was low (Carter et al. 2013).
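In its simplest form, repeatability is an intraclass correlation: the proportion of total variance explained by differences among individuals, R = s²_A / (s²_A + s²_W), where s²_A is the among-individual and s²_W the within-individual variance. The Python sketch below implements the one-way ANOVA estimator of Lessells & Boag (1987) on invented scores; it is an illustration only and not necessarily the estimator behind the values reported here (mixed-model estimators, for example those in the R package rptR, are common alternatives). The function name `repeatability` is hypothetical.

```python
"""ANOVA-based repeatability (intraclass correlation), after Lessells & Boag (1987).

Hypothetical data: one inner list of trait scores per individual,
one score per repeated arena test.
"""


def repeatability(groups):
    a = len(groups)                          # number of individuals
    sizes = [len(g) for g in groups]
    N = sum(sizes)                           # total number of tests
    grand = sum(sum(g) for g in groups) / N
    group_means = [sum(g) / len(g) for g in groups]
    # Among- and within-individual sums of squares and mean squares.
    ss_among = sum(n * (m - grand) ** 2 for n, m in zip(sizes, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ms_among = ss_among / (a - 1)
    ms_within = ss_within / (N - a)
    # n0: effective number of tests per individual (allows unequal sizes).
    n0 = (N - sum(n * n for n in sizes) / N) / (a - 1)
    var_among = (ms_among - ms_within) / n0
    # Can be negative when within-individual variation exceeds among-individual variation.
    return var_among / (var_among + ms_within)


scores = [[10, 12], [20, 22], [30, 28]]      # 3 animals, 2 tests each
print(round(repeatability(scores), 3))
```

Here the animals differ strongly from each other but little between their own tests, so R is close to 1; a trait with low or non-significant R would fall near 0.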

Other parameters to consider

Previous studies have shown that the number of times arena tests are repeated may affect personality measures (e.g. Dingemanse et al. 2012 for birds). We found that individual red and grey squirrels tended to be less active in subsequent tests than in the first one, suggesting some degree of habituation to the arena that reduced activity during OFT. Such a decrease in the intensity of activity/exploration behaviours over trials has been found in many studies on birds and small mammals (Archer 1973; Dingemanse et al. 2002; Boon et al. 2007, 2008; Martin & Réale 2008; Boyer et al. 2010; Montiglio et al. 2010; Taylor et al. 2012). Carter et al. (2013) suggested that the context of the open-field test (free vs forced test context) may bias the behaviours recorded: free open-field tests are more likely to measure voluntary exploration/curiosity and information-gathering behaviour, whereas forced open-field tests may also measure fear and/or anxiety (Misslin & Cigrang 1986). In this study, we did not record any difference in personality between squirrels trapped for the first time and recaptured squirrels. Animals trapped for the first time experience a stressful situation they have never encountered before, yet this did not affect their activity or sociability, indicating that the tests measured their personality rather than their fearfulness. To further rule out the possibility that OFT measures a mixture of personality traits simultaneously, in future studies we will compare direct arena-test measures of personality traits with indirect measures derived from standardised capture-mark-recapture data (convergent validity; see also Boon et al. 2008; Boyer et al. 2010; Carter et al. 2013).
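A decline in activity from the first to a later test, as described above, can be checked with a simple paired comparison. The Python sketch below applies an exact two-sided sign test to hypothetical first- and second-test activity scores; the data, the function name and the choice of test are illustrative assumptions, not the analysis reported in this study. A mixed model with test sequence as a fixed effect would use the data more fully.

```python
"""Exact two-sided sign test for a change in activity between two tests.

Hypothetical data: one activity score per animal for the first test and
for a later test of the same animal.
"""
from math import comb


def sign_test_decline(first, later):
    """Return (n_decreased, n_increased, two_sided_p), ignoring ties."""
    diffs = [b - a for a, b in zip(first, later)]
    n_dec = sum(d < 0 for d in diffs)
    n_inc = sum(d > 0 for d in diffs)
    n = n_dec + n_inc                 # ties (d == 0) carry no sign information
    k = min(n_dec, n_inc)
    # Under H0 each non-tied difference is negative with probability 0.5,
    # so the count of the rarer sign follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return n_dec, n_inc, min(1.0, 2 * tail)


first_test = [62, 55, 70, 48, 66, 59, 73, 51]
second_test = [50, 49, 58, 50, 60, 47, 65, 44]
print(sign_test_decline(first_test, second_test))
```

The sign test only uses the direction of each within-individual change, which keeps it assumption-light but gives it low power with the small samples typical of arena studies.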

CONCLUSIONS

The arena test allows researchers to study animal personality in a controlled environment, identifying personality traits that are repeatable within individuals and comparable among them. We suggest that the use of arena tests should be preceded by studying the behaviour of the animals in the wild, to facilitate defining ethograms and classifying single behaviours into groups linked to personality traits. Using different approaches (EB and FA) to measure a single personality trait will help evaluate whether an EB classification is also supported statistically. For rodent personality research, we recommend first testing a small group of animals for a longer duration (no more than 7 min per test) and then evaluating which shorter duration still validly measures inter-individual differences in personality without loss of information. Thereafter, animals can be tested for the shorter duration only, reducing stress, habituation and operator time. The number and frequency of test replicates will depend on several factors: the research question, the type of repeatability of interest (long-term or short-term) and/or the species' longevity. For tree squirrels, considering within-year repeatability, we suggest testing an individual no more than twice per season (e.g. breeding season) or year, allowing at least 2 months between repetitions. Where personality is expected to have both a fixed and a variable component (Carere et al. 2005; Dingemanse et al. 2012), personality traits should be examined each year to study potential variation linked to, for example, age, dominance status or breeding experience.
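The duration-selection step recommended above can be sketched as follows: score each animal on a trait using only the first k minutes of a long test, correlate these truncated scores with the full-length scores, and keep the shortest k whose correlation reaches a chosen threshold. In this Python sketch the per-minute data, the 0.9 threshold, the use of Pearson correlation and the function names are all hypothetical assumptions; the validation criteria used in this study may differ.

```python
"""Pick the shortest test duration whose scores track the full-length scores.

Hypothetical data: seconds of locomotion per minute for five animals
over a 7-min open field test.
"""


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")           # correlation undefined without variation
    return sxy / (sxx * syy) ** 0.5


def truncated_scores(per_minute, minutes):
    # Trait score from the first `minutes` minutes of each animal's test.
    return [sum(row[:minutes]) for row in per_minute]


def shortest_valid_duration(per_minute, threshold=0.9):
    total = len(per_minute[0])
    full = truncated_scores(per_minute, total)
    for k in range(1, total + 1):
        r = pearson(truncated_scores(per_minute, k), full)
        if r >= threshold:            # NaN comparisons are False, so skipped
            return k, r
    return total, 1.0


per_minute = [
    [5, 30, 30, 30, 30, 30, 30],
    [30, 10, 10, 10, 10, 10, 10],
    [20, 20, 20, 20, 20, 20, 20],
    [10, 25, 25, 25, 25, 25, 25],
    [25, 5, 5, 5, 5, 5, 5],
]
k, r = shortest_valid_duration(per_minute, threshold=0.9)
print(f"shortest valid duration: {k} min (r = {r:.3f})")
```

Because a truncated score is part of the full score, this correlation is inflated by construction (a part-whole correlation); comparing truncated scores with scores from the remaining minutes, or with a repeat test, avoids that bias.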