A graphical exploration of the relationship between parasite aggregation indices
Abstract.
The level of aggregation in parasite populations is frequently incorporated into ecological studies. It is measured in various ways, including the variance-to-mean ratio, mean crowding, the parameter $k$ of the negative binomial distribution, and indices based on the Lorenz curve such as the Gini index (Poulin's D) and the Hoover index. Assuming the frequency distributions follow a negative binomial, we use contour plots to clarify the relationships between aggregation indices, mean abundance and prevalence. The contour plots highlight the nonlinear nature of the relationships between these measures and suggest that correlations are not a suitable summary of these relationships.
Key words and phrases:
Aggregation, Gini index, Hoover index, Lorenz order, Negative binomial distribution, Prevalence.
1. Introduction
Investigations into parasite population dynamics frequently require an indicator of the level of aggregation in the parasite population (Tinsley et al, 2020, Kura et al, 2022). As the concept of aggregation in parasites is poorly defined (Pielou, 1977, McVinish and Lester, 2020), aggregation has been measured in various ways. Commonly used indices include prevalence, the Variance-to-Mean Ratio (VMR), and the parameter $k$ of the negative binomial distribution. Closely related to VMR and $k$ are mean crowding and patchiness (Lloyd, 1967), which can be seen as more direct measures of the competitive experience of parasites within a host (Wade et al, 2018). Two other indices are derived from the Lorenz curve (Lorenz, 1905), the most widely accepted quantification of inequality. Poulin (1993) proposed using the Gini index (Gini, 1914), which has since become widely used in parasitology (Rodríguez-Hernández et al, 2021, Bezerra and Bocchiglieri, 2023, Matos et al, 2023). The Hoover index (also known as the Pietra index) has more recently been proposed to measure parasite aggregation (McVinish and Lester, 2020, Lester and Blomberg, 2021).
This paper clarifies and extends our previous work on aggregation. It was stimulated by a recent paper by Morrill et al (2023) which correlated aggregation indices with mean abundance and prevalence using simulated data. We present a more accurate representation using "contour plots", calculated directly from the parameters of the negative binomial distributions. The plots provide a simple and more insightful way to comprehend the relationships.
2. Contour plots
The contours show combinations of two indices, specified on the vertical and horizontal axes, that give rise to similar values of the third index. Contour plots, developed in the 16th century (Morato-Moreno, 2017), are widely used in other disciplines but rarely in parasitology (e.g. Kura et al (2022)).
Our analysis assumed that parasite burden is adequately modelled by a negative binomial distribution (Crofton, 1971, Shaw et al, 1998, Poulin, 2011, Morrill et al, 2023). Following the typical practice in parasitology, we parameterised the negative binomial distribution in terms of mean abundance, $m$, and the parameter $k$, which controls the shape of the distribution. We did not make any assumption on the distribution of $m$ and $k$. We used the range of values for $m$ and $k$ suggested by the extensive data of Shaw and Dobson (1995). Their values for $m$, $k$ and prevalence are superimposed on several of the contour plots as dot points.
To construct a contour plot of an aggregation index against $m$ and $k$, we expressed the aggregation index as a function of $m$ and $k$. The population values of several indices can be expressed simply in terms of $m$ and $k$:
$$\text{prevalence} = 1 - \left(1 + \frac{m}{k}\right)^{-k}, \qquad \text{VMR} = 1 + \frac{m}{k}, \qquad \text{and} \qquad \text{mean crowding} = m + \frac{m}{k}.$$
The Gini and Hoover indices lack simple expressions in terms of $m$ and $k$; however, they can still be evaluated numerically. The Hoover index can be expressed in terms of $m$ and $k$ by applying Arnold and Sarabia (2018, Lemma 5.3.3),
$$H = F_{m,k}(m) - F_{m(1+1/k),\,k+1}(m-1),$$
where $F_{m,k}(x)$ is the cumulative distribution function of the negative binomial distribution with parameter $k$ and mean $m$ evaluated at $x$. Further details are given in the Appendix. The cumulative distribution function of the negative binomial distribution, $F_{m,k}$, is available in statistical packages such as R (R Core Team, 2023). The Gini index can be expressed as
$$G = \left(1 + \frac{m}{k}\right)\, {}_2F_1\!\left(k+1, \tfrac{1}{2};\, 2;\, -\frac{4m}{k}\left(1 + \frac{m}{k}\right)\right),$$
where ${}_2F_1$ is the Gaussian hypergeometric function (Ramasubban, 1958, equation 2.12). This can be evaluated in R using the hypergeo package (Hankin, 2015). Calculating indices directly from the parameters of the negative binomial distribution rather than using simulated data obviates the need to consider the uncertainty of estimates and the effects of different sample sizes.
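As a rough illustration of evaluating indices directly from the parameters, the following Python sketch (not the authors' R code) assumes the standard negative binomial expressions prevalence = 1 - (1 + m/k)^(-k), VMR = 1 + m/k and mean crowding = m + m/k, and evaluates the Hoover index as a difference of two negative binomial cumulative distribution functions. All function names are hypothetical, and the final function is only a cross-check based on the mean-deviation form of the Hoover index.

```python
import math


def nb_pmf(x, m, k):
    """P(X = x) for a negative binomial with mean m > 0 and shape parameter k > 0."""
    if x < 0:
        return 0.0
    log_p = (math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
             + x * math.log(m / (m + k)) + k * math.log(k / (m + k)))
    return math.exp(log_p)


def nb_cdf(x, m, k):
    """P(X <= x), obtained by summing the mass function up to floor(x)."""
    if x < 0:
        return 0.0
    return min(1.0, sum(nb_pmf(j, m, k) for j in range(math.floor(x) + 1)))


def prevalence(m, k):
    return 1.0 - (1.0 + m / k) ** (-k)


def vmr(m, k):
    return 1.0 + m / k


def mean_crowding(m, k):
    return m + m / k


def hoover_index(m, k):
    # Assumed form: H = F_{m,k}(m) - F_{m(1+1/k), k+1}(m - 1)
    return nb_cdf(m, m, k) - nb_cdf(m - 1.0, m * (1.0 + 1.0 / k), k + 1.0)


def hoover_by_definition(m, k, n_terms=5000):
    # Cross-check: mean absolute deviation divided by twice the mean,
    # with the infinite sum truncated at n_terms (an assumption of this sketch).
    return sum(abs(x - m) * nb_pmf(x, m, k) for x in range(n_terms)) / (2.0 * m)
```

Evaluating these functions over a grid of $m$ and $k$ values (on a log scale) would give the surface from which contour bands could be drawn; the truncation length in the cross-check would need to grow for very small $k$ or very large $m$.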
We also employed contour plots to examine the relationship between aggregation indices, $m$, and prevalence. This required first solving the equation
$$\text{prevalence} = 1 - \left(1 + \frac{m}{k}\right)^{-k}$$
in terms of $k$ for each pair of $m$ and prevalence in the contour plot. This equation has a unique solution if $\text{prevalence} < 1 - e^{-m}$. On the other hand, if $\text{prevalence} \geq 1 - e^{-m}$, there is no solution to the equation. The solution was found numerically using the uniroot function in R. The expressions for the aggregation indices in terms of $m$ and $k$ were then used to construct the contour plot. Regions of $m$ and prevalence that are inconsistent with a negative binomial distribution are represented as white in the contour plot.
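The root-finding step can be sketched as follows. This is a Python stand-in for the uniroot call, using plain bisection on the log scale of $k$; it assumes the negative binomial expression prevalence = 1 - (1 + m/k)^(-k), which increases with $k$ towards the Poisson limit 1 - exp(-m). The function names and the search bracket are hypothetical choices made for this example.

```python
import math


def prevalence(m, k):
    """Prevalence 1 - (1 + m/k)^(-k), written to stay accurate for extreme k."""
    return -math.expm1(-k * math.log1p(m / k))


def solve_k(m, prev, log10_lo=-12.0, log10_hi=12.0, tol=1e-12):
    """Return k with prevalence(m, k) == prev, or None when no such k exists."""
    # No solution unless 0 < prev < 1 - exp(-m), the Poisson limit as k grows.
    if not 0.0 < prev < -math.expm1(-m):
        return None
    lo, hi = log10_lo, log10_hi
    # The bracket [10^lo, 10^hi] is an assumption; give up if the root lies outside it.
    if prevalence(m, 10.0 ** lo) > prev or prevalence(m, 10.0 ** hi) < prev:
        return None
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prevalence(m, 10.0 ** mid) < prev:
            lo = mid  # prevalence too small, so k must be larger
        else:
            hi = mid
    return 10.0 ** (0.5 * (lo + hi))
```

Pairs of $m$ and prevalence for which `solve_k` returns `None` correspond to the white regions of the contour plot.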
All contour plots were produced in R using the ggplot2 package (Wickham, 2016). The values for $m$ and $k$ reported in Shaw and Dobson (1995) were heavily skewed and spanned several orders of magnitude, with $m$ ranging between 0.1 and 5200 and $k$ ranging between 0.001 and 16.5. To make the plots clearer, log scaling has been applied to these variables.
3. Relationship between mean abundance, $k$, and prevalence
The relationship between $m$, $k$, and prevalence in wild parasite populations has been examined by several authors with conflicting results (Pennycuick, 1971, Scott, 1987, Poulin, 1993, Shaw and Dobson, 1995, Kura et al, 2022). While the expression of prevalence in terms of $m$ and $k$ is sufficiently simple to analyse, it is still instructive to construct the contour plot (Fig. 1 left). In it, each colour represents a region of values of $m$ and $k$ that give rise to similar values of prevalence.
We see that prevalence is increasing in both $m$ and $k$, leading to contours that are roughly L-shaped on the range of $m$ and $k$ plotted: prevalence is small when either $m$ or $k$ is small, and prevalence is large when both $m$ and $k$ are large. The contours also show that there is a non-linear relationship between $m$ and $k$ when prevalence is considered fixed. The contours become almost parallel to the horizontal axis as $k$ increases, a consequence of $\lim_{k \to \infty} \text{prevalence} = 1 - e^{-m}$. On the other hand, the contours continue to move left as $m$ increases, a consequence of $\lim_{m \to \infty} \text{prevalence} = 1$. The contour plot shows that the rate at which prevalence approaches one as $m$ increases is slow when $k$ is small.
If we restrict our attention to a single-coloured band, i.e. those values of $m$ and $k$ giving rise to similar values of prevalence, we see that, after controlling for prevalence, there is a negative relationship between $m$ and $k$. This relationship is forced by the negative binomial distribution, so it will hold true in natural systems to the extent that those systems are well modelled by the negative binomial distribution. The different widths of the contour bands show the non-linearity of the relationship between $m$, $k$ and prevalence.
The dot points represent estimates of $m$ and $k$ from the 269 parasite-host systems reported in Shaw and Dobson (1995). Although several parasite-host systems lie in a region of very high prevalence (both $m$ and $k$ large) or very small prevalence (either $m$ or $k$ small), many others occupy a region of the parameter space where a moderate change in the parameter values would result in a large change in prevalence, assuming a negative binomial distribution.
As Shaw and Dobson (1995) reported prevalences in their review, it is possible to compare these with the prevalence values implied by the negative binomial distribution (Fig. 1 right). In general, there is good agreement, with most points within a given contour band having the same colour as the band. This demonstrates the accuracy of the contour plots for interpreting relationships in real-life situations. The few points where the observed prevalence does not agree with that determined by the negative binomial may arise because these distributions did not conform to a negative binomial.
4. Relationship of Hoover & Gini indices with mean abundance, $k$, and prevalence
Contour plots of the Hoover index and Gini index as functions of $m$ and $k$ are shown in Fig. 2 left and right. The contour plots are qualitatively very similar and share some similarities with the contour plot of prevalence (Fig. 1). Both Hoover and Gini indices decrease in both $m$ and $k$, taking values close to one when either $m$ or $k$ is small, and taking values close to zero when both $m$ and $k$ are large. The contours are L-shaped, becoming almost parallel to the horizontal axis as $k$ increases and almost parallel to the vertical axis as $m$ increases.
The plots show that both indices display some stability over a wide range of $m$ and $k$. Restricting our attention to the Hoover index (Fig. 2 left), we see that for the value of the index is largely determined by the size of $m$. For the value is less affected by $m$ but more affected by $k$, as indicated by the number of contours crossed as $k$ decreases. For example, starting from $k = 1$ and $m = 6$, as $k$ decreases the value of the index increases quickly, crossing several contours from 0.4 to 1. On the other hand, when $m$ increases from the same point (1,6) the index stays in the same colour band and there is little change in the Hoover value (0.4 to 0.5). For many of the parasite-host systems reported in Shaw and Dobson (1995), shown on the figure as dot points, an increase in $m$, that is moving the points vertically on the contour plot, does not appear to impact the Hoover index since the point would remain in the same-coloured region. On the other hand, in many of the samples, a moderate change in $k$, that is moving the point horizontally, has a large impact on the Hoover index. Similar behaviour is observed in the contour plot of the Gini index (Fig. 2 right), with the Gini index appearing to be even less affected by changes in $m$.
There are two noticeable differences between the contour plots for the Hoover and Gini indices (Fig. 2). First, the Gini index is always larger than the Hoover index (Taguchi, 1968; Arnold and Sarabia, 2018, Section 5.7). This causes the Gini index to have a smaller range over the region of values for $m$ and $k$ observed in wild populations. Specifically, for the values of $m$ and $k$ reported in Shaw and Dobson (1995), the Gini index exceeds 0.9 in 42% (113/269) of cases, whereas the Hoover index exceeds 0.9 in 20% (54/269) of cases. Second, the contours of the Hoover index are not smooth, unlike those of the Gini index. The bumps on the contours of the Hoover index occur at integer values of the mean, the most prominent occurring when the mean is 1. These bumps quickly become much less noticeable as the mean increases.
The contour plots of the Gini and Hoover indices exhibit greater differences when considered as functions of $m$ and prevalence (Fig. 3). First, unlike the Gini index, the contour lines of the Hoover index are parallel to the vertical axis when $m$ is less than one. As noted by Morrill et al (2023), when all infected hosts harbour infrapopulations larger than or equal to the overall mean, the Hoover index is equal to one minus prevalence. For the negative binomial distribution, this implies the Hoover index is equal to one minus prevalence when the mean is less than or equal to one. Second, there is less variability in the widths of the contours for the Hoover index compared to the Gini index. This suggests the dependence of the Hoover index on prevalence is more regular. At a given $m$, a change of 0.1 in the prevalence will have roughly the same effect on the value of the Hoover index, regardless of the initial value of prevalence. In contrast, much of the contour plot of the Gini index is coloured yellow, corresponding to values greater than 0.9. Values of the Gini index less than 0.6 are restricted to a small region of the plot, indicating that small changes in prevalence in that region will result in a large change in the Gini index.
The parasite data from Shaw and Dobson (1995) were taken from five taxonomic groups. The data, divided into taxa, were superimposed on the plots of $m$ vs $k$ with contour lines of prevalence and Hoover index. They did not show any obvious grouping.
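The identity between the Hoover index and one minus prevalence noted above can be checked with a short calculation. The sketch below assumes the standard mean-deviation form of the Hoover index, $H = \mathbb{E}|X - m| / (2m)$, which is equivalent to the Lorenz-curve definition; the notation $p_x = P(X = x)$ is introduced here for illustration only.

```latex
% Assumption: every infected host carries at least m parasites (x >= m for all x >= 1),
% which holds automatically for integer counts when m <= 1.
\begin{align*}
\mathbb{E}|X - m| &= m\,p_0 + \sum_{x \geq 1} (x - m)\,p_x \\
                  &= m\,p_0 + m - m\,(1 - p_0) \\
                  &= 2\,m\,p_0, \\
H = \frac{\mathbb{E}|X - m|}{2m} &= p_0 = 1 - \text{prevalence}.
\end{align*}
```

The second line uses $\sum_{x \geq 1} x\,p_x = m$ and $\sum_{x \geq 1} p_x = 1 - p_0$, so the argument does not depend on the negative binomial form beyond the condition on infected hosts.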
5. Lorenz order and the negative binomial distribution
Both the Hoover and Gini indices are seen in Figure 2 to be decreasing functions of $m$ and $k$, as is 1 - prevalence (Figure 1). This behaviour is due to how these indices relate to the Lorenz curve and how the parameters $m$ and $k$ affect the Lorenz curve of the negative binomial distribution.
The Lorenz curve of a distribution with cumulative distribution function $F$ is given by
$$L(u) = \frac{1}{\mu} \int_0^u F^{-1}(t)\, dt, \qquad u \in [0,1],$$
where $\mu$ is the mean of the distribution and $F^{-1}(t) = \inf\{x : F(x) \geq t\}$ for $t \in (0,1)$ (Gastwirth, 1971). In our context, $L(u)$ is the proportion of the parasite population harboured by the least-infected proportion $u$ of the host population. When all hosts have the same parasite burden, the Lorenz curve is given by $L(u) = u$ for all $u$ in [0,1]. This is called the egalitarian line. Several indices can be defined in terms of the Lorenz curve. Specifically, the Gini index is twice the area between the Lorenz curve and the egalitarian line, and the Hoover index is the greatest vertical distance between the Lorenz curve and the egalitarian line. Even 1 - prevalence can be viewed as the largest value of $u$ such that $L(u) = 0$. The Lorenz curve induces a partial ordering of distributions. Assume $F$ and $G$ are two distribution functions with finite means and Lorenz curves $L_F$ and $L_G$. If $L_F(u) \geq L_G(u)$ for all $u \in [0,1]$, then we say that $F$ is smaller than $G$ in the Lorenz order and write $F \leq_{\mathrm{L}} G$. This ordering corresponds to the notion of aggregation put forward by Poulin (1993) and McVinish and Lester (2020). From their connections with the Lorenz curve, we see that if $F \leq_{\mathrm{L}} G$, then the Gini and Hoover indices as well as 1 - prevalence will be smaller for $F$ than for $G$.
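The three Lorenz-curve readings described above can be made concrete with a small numerical sketch. The Python code below (illustrative only, with hypothetical function names) builds the piecewise-linear Lorenz curve of a negative binomial distribution with mean m and parameter k from its probability mass function, truncating the support at a fixed number of terms, and then reads off the Gini index, the Hoover index and 1 - prevalence.

```python
import math


def nb_pmf(x, m, k):
    """P(X = x) for a negative binomial with mean m > 0 and shape parameter k > 0."""
    log_p = (math.lgamma(k + x) - math.lgamma(k) - math.lgamma(x + 1)
             + x * math.log(m / (m + k)) + k * math.log(k / (m + k)))
    return math.exp(log_p)


def lorenz_points(m, k, n_terms=5000):
    """Vertices (u, L(u)) of the Lorenz curve, with the support truncated at n_terms."""
    points = [(0.0, 0.0)]
    cum_hosts, cum_parasites = 0.0, 0.0
    for x in range(n_terms):
        p = nb_pmf(x, m, k)
        cum_hosts += p                # cumulative share of hosts with burden <= x
        cum_parasites += x * p / m    # cumulative share of parasites they carry
        points.append((min(cum_hosts, 1.0), min(cum_parasites, 1.0)))
    points.append((1.0, 1.0))  # close the curve; the neglected tail is assumed negligible
    return points


def gini_index(points):
    # Twice the area between the egalitarian line and the Lorenz curve (exact trapezoids).
    area_under = sum((u1 - u0) * (l0 + l1) / 2.0
                     for (u0, l0), (u1, l1) in zip(points, points[1:]))
    return 1.0 - 2.0 * area_under


def hoover_index(points):
    # Greatest vertical distance between the egalitarian line and the Lorenz curve;
    # for a piecewise-linear curve this is attained at a vertex.
    return max(u - l for u, l in points)


def zero_share(points):
    # Largest u with L(u) = 0, i.e. the share of uninfected hosts (1 - prevalence).
    return max(u for u, l in points if l == 0.0)
```

For strongly aggregated cases (very small $k$ or very large $m$) the default truncation would be too short, so `n_terms` should be increased until the last computed vertex is close to (1, 1).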
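The following result shows that the negative binomial distribution decreases in the Lorenz order as $m$ increases and as $k$ increases.
Theorem 1.
Let $\mathrm{NB}(m,k)$ denote the negative binomial distribution with parameters $m$ and $k$.
(a) If $m_1 < m_2$, then $\mathrm{NB}(m_2,k) \leq_{\mathrm{L}} \mathrm{NB}(m_1,k)$.
(b) If $k_1 < k_2$, then $\mathrm{NB}(m,k_2) \leq_{\mathrm{L}} \mathrm{NB}(m,k_1)$.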
The proof is provided in the Appendix.
The above result explains why the Gini and Hoover indices and 1 - prevalence are all decreasing functions of $m$ and $k$. Figure 2 also shows that the contours of both the Gini and Hoover indices become parallel with the axes. This is due to the limiting behaviour of the negative binomial distribution. Depending on how the parameters are allowed to vary, it is known that the negative binomial distribution will converge to either a Poisson distribution or a gamma distribution (Adell and De La Cal, 1994). Fixing $m$ and letting $k$ increase, the negative binomial distribution converges to a Poisson distribution with mean $m$. This causes the contour lines to become parallel with the horizontal axis as $k$ increases. Similarly, fixing $k$ and letting $m$ increase, an appropriately scaled negative binomial distribution converges to a Gamma distribution with shape and rate parameters both equal to $k$. Since the Gini and Hoover indices are scale invariant (Arnold and Sarabia, 2018, Section 3.1), these indices approach their respective values for a Gamma($k$, $k$) distribution as $m$ increases. This causes the contour lines to become parallel with the vertical axis as $m$ increases.
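As a concrete instance of the gamma limit, the case $k = 1$ can be worked by hand. The sketch below is a standard calculation rather than one taken from the text; it assumes the mean-difference form of the Gini index, $G = \mathbb{E}|X - Y| / (2m)$ for independent copies $X$ and $Y$, and introduces $q = m/(m+1)$ for convenience.

```latex
% NB(m, 1) is geometric: P(X = x) = (1 - q) q^x, with q = m / (m + 1).
\begin{align*}
\mathbb{E}|X - Y| &= 2 \sum_{n \geq 1} P(X - Y \geq n)
                   = 2 \sum_{n \geq 1} \frac{(1 - q)\, q^{n}}{1 - q^{2}}
                   = \frac{2q}{1 - q^{2}}, \\
G &= \frac{\mathbb{E}|X - Y|}{2m}
   = \frac{1 - q}{2q} \cdot \frac{2q}{1 - q^{2}}
   = \frac{1}{1 + q}
   = \frac{m + 1}{2m + 1}.
\end{align*}
```

This expression decreases in $m$, consistent with Theorem 1, and tends to $1/2$ as $m \to \infty$, which is the Gini index of the Gamma(1, 1) (exponential) distribution.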
6. Discussion
In choosing an index to measure aggregation, those based on the Lorenz curve, such as the Hoover and Gini indices, seem to be favoured. The Gini index returns closer values over a wider range of means and prevalences than the Hoover index, making differences less discernible. The Hoover index has a biological interpretation and may be easier to calculate. When mean abundance is below one, the Hoover index takes restricted values whereas the Gini index has no such restriction, suggesting that the Gini index may be preferred in such a situation. Nevertheless, both indices provide a figure that seems to measure the same phenomenon, a phenomenon that is still undefined.
The contour graphs provide an easily interpreted demonstration of the effects of the various parameters on the Hoover and Gini indices. These could be deduced by an analysis of the formulae used to calculate the indices but this is not straightforward; indices do not correlate with a particular parameter. When applying an index to compare aggregation between samples or species, it is useful to know which parameter is having the greatest effect on the index. The contour graphs provide the answer.
In producing the graphs we calculated indices directly from the parameters of the negative binomial distribution rather than using simulated data as done by Morrill et al (2023). This obviated the need to consider the uncertainty of estimates and the effects of different sample sizes. Our results demonstrated the deterministic functional relationships between the aggregation indices and the parameters, mean abundance and prevalence. The relationships were not linear, indicating that correlation and principal components analysis may not be the best methods to analyse the relationships (Morrill et al, 2023).
Listing the advantages and disadvantages of Hoover and Gini indices, Morrill et al (2023, Table 2) describe them as having the disadvantages of being "strongly negatively correlated with prevalence" and "weakly negatively correlated with mean abundance." In contrast, the parameter $k$ of the negative binomial distribution and patchiness are described as having the advantages of being "not necessarily correlated with mean abundance" and "only weakly correlated with prevalence." These comments ignore the fact that the negative binomial distribution, and hence any index computed on that distribution, is completely specified by the mean and prevalence. In other words, the dependence of any index on mean and prevalence is perfectly deterministic. In fact, the dependence on any pair of quantities that can be used to parameterise the negative binomial distribution, like $m$ and $k$, is perfectly deterministic.
Morrill et al (2023) argue that the Gini index is to be preferred over the Hoover index on the basis that the Hoover index equals one minus prevalence when the mean is less than or equal to one whereas the Gini index has no such restriction. To decide between the Hoover and Gini indices, if one must choose, the relationship between these indices and $m$, $k$ and prevalence needs to be considered more closely. Our contour plots (Fig. 2 & 3) have shown other differences in the behaviour of the Gini and Hoover indices. Compared to the Gini index, the Hoover index has a greater range over the region of values for $m$ and $k$ (or prevalence) observed in wild populations and has more regular dependence on prevalence. Given these properties and the Hoover index's clear biological interpretation, we argue that the Hoover index should be preferred over the Gini index, at least when $m$ is greater than one.
Our analysis has used contour plots to examine how the Gini and Hoover indices are affected by changes in $m$, $k$, and prevalence. This approach could, in principle, be applied to construct contour plots from any three indices, provided two of these can be used to parameterise the negative binomial distribution. For example, one could construct a contour plot of the Gini index as a function of VMR and mean crowding as both $m$ and $k$ can be expressed in terms of VMR and mean crowding:
$$m = \text{mean crowding} - \text{VMR} + 1$$
and
$$k = \frac{\text{mean crowding} - \text{VMR} + 1}{\text{VMR} - 1}.$$
The contour plot could then be constructed using the expression for the Gini index in terms of $m$ and $k$ given in Section 2. Further application of contour plots may unravel other complex relationships in ecological parasitology.
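The change of parameterisation can be sketched in a few lines of Python. The code assumes the standard negative binomial relations VMR = 1 + m/k and mean crowding = m + m/k; the function names are hypothetical and chosen only for this illustration.

```python
def vmr_and_crowding(m, k):
    """Forward map: (m, k) to (VMR, mean crowding) under a negative binomial."""
    return 1.0 + m / k, m + m / k


def m_k_from_vmr_and_crowding(vmr, crowding):
    """Inverse map: recover (m, k) from VMR and mean crowding."""
    if vmr <= 1.0:
        raise ValueError("a negative binomial distribution requires VMR > 1")
    m = crowding - vmr + 1.0
    if m <= 0.0:
        raise ValueError("mean crowding must exceed VMR - 1 for a positive mean")
    k = m / (vmr - 1.0)
    return m, k
```

A contour plot over a grid of VMR and mean crowding values would first apply the inverse map at each grid point and then evaluate the index of interest at the resulting $m$ and $k$; grid points that raise an error are inconsistent with a negative binomial distribution.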
Appendix A Hoover index of the negative binomial distribution
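In parasitology the negative binomial distribution is usually parameterised in terms of the mean $m$ and $k$. The probability mass function is then
$$p_{m,k}(x) = \frac{\Gamma(k+x)}{\Gamma(k)\, x!} \left(\frac{m}{m+k}\right)^{x} \left(\frac{k}{m+k}\right)^{k}, \qquad x = 0, 1, 2, \ldots,$$
and we write $X \sim \mathrm{NB}(m,k)$. Let $F_{m,k}$ denote the cumulative distribution function of the $\mathrm{NB}(m,k)$ distribution. The first moment distribution of the $\mathrm{NB}(m,k)$ distribution, $F^{(1)}_{m,k}$, is
$$F^{(1)}_{m,k}(x) = \frac{1}{m} \sum_{y=0}^{\lfloor x \rfloor} y\, p_{m,k}(y).$$
For any non-negative integer $y$ (with the right-hand side taken to be zero when $y = 0$),
$$\frac{y\, p_{m,k}(y)}{m} = \frac{\Gamma(k+y)}{\Gamma(k+1)\,(y-1)!} \left(\frac{m}{m+k}\right)^{y-1} \left(\frac{k}{m+k}\right)^{k+1},$$
which is the probability mass function of the $\mathrm{NB}(m(1+1/k),\, k+1)$ distribution evaluated at $y-1$. Hence,
$$F^{(1)}_{m,k}(x) = F_{m(1+1/k),\,k+1}(x-1).$$
Arnold and Sarabia (2018, Lemma 5.3.3) states that the Hoover index can be expressed as
$$H = F_{m,k}(m) - F^{(1)}_{m,k}(m).$$
Hence,
$$H = F_{m,k}(m) - F_{m(1+1/k),\,k+1}(m-1).$$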
Appendix B Proof of Theorem 1
We first recall the definition of convex order, which is closely related to the Lorenz order (Shaked and Shanthikumar, 2007, subsection 3.A.1).
Definition: Let $X$ and $Y$ be random variables such that $\mathbb{E}\,\phi(X) \leq \mathbb{E}\,\phi(Y)$ for all convex functions $\phi$ for which the expectations exist. Then we say that $X$ is smaller than $Y$ in the convex order, denoted $X \leq_{\mathrm{cx}} Y$.
The convex order relates to the Lorenz order in the sense that
$$X \leq_{\mathrm{L}} Y$$
if and only if $X / \mathbb{E}X \leq_{\mathrm{cx}} Y / \mathbb{E}Y$, provided the expectations exist (Shaked and Shanthikumar, 2007, equation 3.A.33; Arnold and Sarabia, 2018, Corollary 3.2.1).
Proof of Theorem 1.
For part (a), let . Conditional on , let . Then . As and , (Arnold and Sarabia, 2018, Theorem 3.4) implies . Since the Lorenz order is invariant under a change of scale, .
For part (b), standard conditioning arguments show that if is a standard Poisson process and (Gamma distribution with shape parameter and rate parameter ), then . Let .
It is known that for every convex function , is a convex it (Schweder, 1982, Proposition 2). If we can show that , then the result will follow from Shaked and Shanthikumar (2007, Theorem 3.A.21).
By construction . Let be the probability density function of . Then if exhibits exactly two sign changes in the sequence +, -, + (Shaked and Shanthikumar, 2007, Theorem 3.A.44). As the function is increasing, has the same sequence of sign changes as . Then
where is depends on and but not . There must be at least one sign change since both and integrate to one. For this function is concave so there must be two sign changes. As this function is positive for and when we have shown . This completes the proof. β
References
- Adell and De La Cal (1994) J.A. Adell and J. De La Cal. Approximating gamma distributions by normalized negative binomial distributions. Journal of Applied Probability, 31:391–400, 1994.
- Arnold and Sarabia (2018) B.C. Arnold and J.M. Sarabia. Majorization and the Lorenz order with applications in applied mathematics and economics. Springer, New York, 2018.
- Bezerra and Bocchiglieri (2023) R.H.S. Bezerra and A. Bocchiglieri. Ectoparasitic flies of bats (Mammalia: Chiroptera) in urban green areas of northeastern Brazil. Parasitol. Res., 122:117–126, 2023.
- Crofton (1971) H.D. Crofton. A quantitative approach to parasitism. Parasitology, 62:179–193, 1971.
- Gastwirth (1971) J.L. Gastwirth. A general definition of the Lorenz curve. Econometrica, 39:1037–1039, 1971.
- Gini (1914) C. Gini. Sulla misura della concentrazione e della variabilità dei caratteri. Atti del R. Istituto Veneto di Scienze, Lettere ed Arti, 73:1203–1248, 1914.
- Hankin (2015) R.K.S. Hankin. Numerical Evaluation of the Gauss Hypergeometric Function with the hypergeo Package. The R Journal, 7:81–88, 2015.
- Kura et al (2022) K. Kura, J.E. Truscott, B.S. Collyer, A. Phillips, A. Garba and R.M. Anderson. The observed relationship between the degree of parasite aggregation and the prevalence of infection within human host populations for soil-transmitted helminth and schistosome infections. Trans. R. Soc. Trop. Med. Hyg., 116:1226–1229, 2022.
- Lester and Blomberg (2021) R.J.G. Lester and S.P. Blomberg. Three methods to measure parasite aggregation using examples from Australian fish parasites. Methods Ecol. Evol., 12:1999–2007, 2021.
- Lloyd (1967) M. Lloyd. Mean crowding. J. Anim. Ecol., 36:1–30, 1967.
- Lorenz (1905) M.O. Lorenz. Methods of measuring the concentration of wealth. Publications of the American Statistical Association, 9:209–219, 1905.
- Matos et al (2023) I. Matos, D. Silva, J. Oliveira, C. Gonçalves, R. Alves, N. Pereira, F. Catarino, O.M.C.C. Ameixa, J.A. Sousa, L.F. Rangel, M.J. Santos and C. Ayra-Pardo. Body size-dependent effects on the distribution patterns of phoretic mite species assemblages on Rhynchophorus ferrugineus (Olivier, 1790). Ecology and Evolution, 13:e10338, 2023.
- McVinish and Lester (2020) R. McVinish and R.J.G. Lester. Measuring aggregation in parasite populations. J. R. Soc. Interface, 17:20190886, 2020.
- Morato-Moreno (2017) M. Morato-Moreno. Origins of the two-dimensional relief representation on some Spanish American maps in the sixteenth century. Boletín de la Asociación de Geógrafos Españoles, 73:493–499, 2017.
- Morrill et al (2023) A. Morrill, R. Poulin and M.R. Forbes. Interrelationships and properties of parasite aggregation measures: a user's guide. Int. J. Parasitol., 53:763–776, 2023.
- Pennycuick (1971) P. Pennycuick. Frequency distributions of parasites in a population of three-spined sticklebacks, Gasterosteus aculeatus L., with particular reference to the negative binomial distribution. Parasitology, 63:389–406, 1971.
- Pielou (1977) E.C. Pielou. The measurement of aggregation. In: Mathematical Ecology. Wiley Interscience, New York, 1977.
- Poulin (1993) R. Poulin. The disparity between observed and uniform distributions: A new look at parasite aggregation. Int. J. Parasitol., 23:937–944, 1993.
- Poulin (2011) R. Poulin. Evolutionary Ecology of Parasites (Second Edition). Princeton University Press, 2011.
- R Core Team (2023) R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria, 2023.
- Ramasubban (1958) T.A. Ramasubban. The mean difference and the mean deviation of some discontinuous distributions. Biometrika, 45:549–556, 1958.
- Rodríguez-Hernández et al (2021) K. Rodríguez-Hernández, P. Álvarez-Mendizábal, P. Chapa-Vargas, F. Escobar, F. González-García and D. Santiago-Alarcon. Haemosporidian prevalence, parasitaemia and aggregation in relation to avian assemblage life history traits at different elevations. Int. J. Parasitol., 51:365–378, 2021.
- Schweder (1982) T. Schweder. On the dispersion of mixtures. Scandinavian Journal of Statistics, 9:165–169, 1982.
- Scott (1987) M.E. Scott. Temporal changes in aggregation: a laboratory study. Parasitology, 94:583–595, 1987.
- Shaked and Shanthikumar (2007) M. Shaked and J.G. Shanthikumar. Stochastic orders. Springer, New York, 2007.
- Shaw and Dobson (1995) D.J. Shaw and A.P. Dobson. Patterns of macroparasite abundance and aggregation in wildlife populations: A quantitative review. Parasitology, 111:S111–S133, 1995.
- Shaw et al (1998) D.J. Shaw, B.T. Grenfell and A.P. Dobson. Patterns of macroparasite aggregation in wildlife host populations. Parasitology, 117:597–610, 1998.
- Taguchi (1968) T. Taguchi. Concentration-curve methods and structures of skew populations. Annals of the Institute of Statistical Mathematics, 20:107–141, 1968.
- Tinsley et al (2020) R.C. Tinsley, H.R. Vineer, R. Grainger-Wood and E.R. Morgan. Heterogeneity in helminth infections: factors influencing aggregation in a simple host-parasite system. Parasitology, 147:65–77, 2020.
- Wade et al (2018) M.J. Wade, C.L. Fitzpatrick and C.M. Lively. 50-year anniversary of Lloyd's "mean crowding": Ideas on patchy distributions. J. Anim. Ecol., 87:1221–1226, 2018.
- Wickham (2016) H. Wickham. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag, New York, 2016.