Visual Exploration on the Genetic and Landscape Mechanisms for Morphological Shape Variation

Jürgen Symanzik

XiaoTian Dai∗, Jürgen Symanzik†, Abbass Sharif‡, Guifang Fu§

∗ Utah State University, Department of Mathematics and Statistics, 3900 Old Main Hill, Logan, UT 84322–3900, USA. Fax: 435 797 1822, E-mail: xiaotian.dai@aggiemail.usu.edu
† Utah State University, Department of Mathematics and Statistics, 3900 Old Main Hill, Logan, UT 84322–3900, USA. Phone: 435 797 0696, Fax: 435 797 1822, E-mail: symanzik@math.usu.edu
‡ University of Southern California, USC Marshall School of Business, Los Angeles, CA 90089–0808, USA. Phone: 435 757 3431, E-mail: asharif@marshall.usc.edu
§ Utah State University, Department of Mathematics and Statistics, 3900 Old Main Hill, Logan, UT 84322–3900, USA. Phone: 435 797 0749, Fax: 435 797 1822, E-mail: guifang.fu@.usu.edu

Abstract

Morphological shape has long been a focus of many disciplines. Searching for structure in shape curves opens unprecedented possibilities for investigating the complex features of morphological shape and helps to identify the genetic and landscape mechanisms that affect shape. Visualization studies of the interplay among landscape measurements, biological markers, and shape images can characterize important genetic and environmental effects on shape and reveal their relative importance. In this article, we adapt five visual data mining methods to visualize the shape patterns under three different genotypes (aa, Aa, and AA), three location factors (longitude, latitude, and elevation), and two principal components (PC1 and PC3). Our study is motivated by Quantitative Trait Loci (QTL) shape mapping research on real leaf shape data from a natural population of poplar, Populus szechuanica var tibetica. Based on the genotype information estimated by the Linkage Disequilibrium (LD) model, we illustrate several noticeable characteristics that were not recognized before.
Key Words: Radius-Centroid-Contour; Quantitative Trait Loci; Morphological Shape; Conditioned Choropleth Map; Data Enveloping; Density-based Plots; Heatmap; Sorted Cumulative Sum Plot.

1. Introduction

Tremendous variation in morphological shape exists in every living organism, from microbes and plants to animals and humans. Shape can be used to predict the structural and functional relationships implicated in changing environments, and it is also important for yield, taste, biological functions, and so on. Although shape is prominent in our daily life, the comprehensive underlying mechanisms that affect shape are still unknown. Genes have already been recognized to play an important role in controlling phenotypic variation in shape (Van der Knapp et al., 2002; Klingenberg, 2010; Scarpella et al., 2010). In addition, the plasticity of morphology, i.e., the environmental or landscape factors that influence adaptive genetic diversity in shape variation, has also been noticed (Myers et al., 2006; Debat et al., 2009; Gomez et al., 2009).

The majority of shape-based genetic studies in the literature quantified the high-dimensional image by either one single number (the ratio of length over width) or a few biologically meaningful landmarks (Rohlf and Marcus, 1993; Klingenberg, 2003, 2010; Langlade et al., 2005; Leamy et al., 2008). These summaries are inaccurate, and even infeasible, when the shapes have abruptly curved boundaries. Fu et al. (2013) quantified leaf shape using the Radius-Centroid-Contour (RCC) approach and represented each shape accurately by a 360-dimensional RCC curve. As a drawback, such a very high-dimensional curve makes it challenging to understand the genetic effect on the phenotype. Moreover, the complicated interactions between environmental and genetic effects make it extremely difficult to understand how these important mechanisms characterize shape.
The objectives of this article are to characterize, from a visualization perspective, the environmental and genetic mechanisms that determine morphological shape and, hence, to shed light on shape mapping. To be more specific, we work on the same real leaf shape data of a natural population of poplar, Populus szechuanica var tibetica, as Fu et al. (2013), in order to detect characteristics that were not found previously. Fu et al. (2013) reported that the most significant QTL is linked with marker GCPM 1063, with a genotype effect from lanceolate (AA) to ovate-orbicular (Aa) to ovate (aa) for the first principal component (PC1) and a subtle genotype effect for PC3. Using the estimated QTL genotype information of marker GCPM 1063, we apply visual data mining methods for further exploration.

A variety of visualization methods have been investigated. The conditioned choropleth maps (Figure 2 and Figure 3) are well suited for showing a large amount of information simultaneously on one plot, including landscape measurements (i.e., longitude, latitude, and elevation) and three QTL genotypes (aa, Aa, and AA). Data enveloping (Figure 4 and Figure 5) is mainly used for evaluating how the detailed features of shape are regulated by each QTL genotype. Figure 5 presents similar information in the image domain and gives a clearer picture of how leaf shape varies under each QTL genotype. The density-based plot (Figure 6) provides more knowledge about the distribution of the RCC curves within each QTL genotype group. Heatmaps (Figure 7) not only show the pattern of feature changes along each curve, but also capture the individual feature variation for all subjects. The sorted cumulative sum plot (Figure 8) measures how far a shape is from a circle, which gives a different way to judge whether a shape is lanceolate or ovate.

2. Data Visualizations for Genotype Effects on Shape Trait

2.1 Shape Representation and Genetic Mapping

In order to quantify the leaf shape for our research purposes, we use a Radius-Centroid-Contour (RCC) curve to describe a shape. In the current literature, the majority of genetic studies on shape use landmarks to represent a shape. Landmarks are a set of points on the boundary assigned by either a geometrical property (such as high curvature), an extremum point, or a specific biological meaning (Belongie et al., 2002). We selected 360 points on the boundaries of the leaves with the same angle increment and measured the radii from the centroid to the contour for each of the corresponding points. This gives an accurate and robust description of shape, which can be described as RCC values (Belongie et al., 2002). Figure 1 demonstrates the procedure of extracting shape (i.e., boundary) information from a leaf photo (see Figure 1(A) to (C)). After the radii from the centroid to the selected 360 points are measured, we can sketch an RCC curve for one individual leaf, as shown in Figure 1(D). The shape information of the leaves is thus transformed into functional data sequences.

Hence, we have 360 dimensions to describe the shape of one individual leaf (i.e., one observation), but we only have 106 leaves (observations) in our dataset. Fu et al. (2013) use principal component analysis (PCA) to reduce the dimensions of the RCC data by removing redundant information through mapping the high-dimensional data to the subspace that best accounts for the distribution of the original pattern. The main idea behind PCA is to find a small number of orthogonal axes, called principal components (PCs), that capture as much of the variation as possible; the number of PCs is much smaller than the original number of variables.
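The RCC construction described in Section 2.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure of Fu et al. (2013): the function name `rcc_curve` is hypothetical, the centroid is taken as the mean of the boundary points, the boundary is assumed to be densely sampled, and the radius at each target angle is taken from the boundary point whose direction from the centroid is closest to that angle.

```python
import math


def _angular_gap(a, b):
    """Smallest absolute difference between two angles in radians."""
    d = abs(a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def rcc_curve(contour, n_angles=360):
    """Radii from the centroid to the contour at equally spaced angles.

    contour: list of (x, y) boundary points (assumed densely sampled).
    Assumption: the centroid is the mean of the boundary points.
    """
    cx = sum(x for x, _ in contour) / len(contour)
    cy = sum(y for _, y in contour) / len(contour)
    # Polar coordinates (angle, radius) of every boundary point.
    polar = [
        (math.atan2(y - cy, x - cx) % (2 * math.pi), math.hypot(x - cx, y - cy))
        for x, y in contour
    ]
    step = 2 * math.pi / n_angles
    radii = []
    for k in range(n_angles):
        target = k * step
        # Nearest boundary point in angle; its radius is the RCC value.
        _, radius = min(polar, key=lambda p: _angular_gap(p[0], target))
        radii.append(radius)
    return radii


# Usage: a circle of radius 2 gives a flat RCC curve.
circle = [
    (1 + 2 * math.cos(2 * math.pi * i / 720), 1 + 2 * math.sin(2 * math.pi * i / 720))
    for i in range(720)
]
flat = rcc_curve(circle)
```

For an elongated shape such as an ellipse, the resulting curve rises and falls twice over the 360 angles, which is the kind of fluctuation that the later plots compare across genotypes.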
For our RCC dataset, it was found from PCA that the first six orthogonal axes (PCs) could explain 88.1% of the variation among the samples. Ordered according to the percentages of variance they explained, these are PC1, 47.3%; PC2, 23.2%; PC3, 6.7%; PC4, 5.1%; PC5, 3.5%; and PC6, 2.3% (Fu et al., 2013). These PCs can describe each leaf shape by capturing different aspects of leaf shape variability, both global and local. Next, we will apply the Linkage Disequilibrium (LD) model to detect QTLs using these PC values.

Figure 1: The procedure of quantifying the shape of a leaf, shown in panels (A) to (D). The vector of RCC values in Figure 1(D) is expressed as a curve, which is a function of radial angle. Panel (D) plots RCC against Angle (0 to 360).

To map the QTLs that affect leaf shape, the PC values were associated with 29 microsatellite markers. Some markers may be associated with different types of PC axes, suggesting that the same QTLs have a pleiotropic effect on different features of a leaf shape. For example, marker GCPM 1063 is significantly associated with PC1 and PC3. In general, the QTLs detected by PC1 control overall leaf shape variation, whereas the QTLs detected by the other PCs are responsible for local subtle leaf variation. The QTL detected by marker GCPM 1063 alters the leaf shape from lanceolate (AA) to ovate-orbicular (Aa) to ovate (aa) through PC1 and PC3. We can estimate the genotype of each leaf based on its PC1 or PC3 value after applying the LD model. It is common that the estimated genotype labels of the same leaf are different for different PC axes.
2.2 Spatial Distribution of Genotypes

Before looking at the actual RCC values with respect to the PCs, it is worthwhile to take a first look at the spatial distribution of the locations where the leaves were collected and at the genotype recorded at each location. Elevation is available as an additional variable. Figures 2 and 3 show so-called conditioned choropleth (CC) maps (Carr et al., 2000, 2005). CC maps contain a series of small maps of the same region (arranged in a matrix layout) where the spatial information shown in each of the maps is conditioned on two conditioning variables.

Figure 2: Conditioned choropleth maps based on PC1 genotypes. Rows: Elevation (1/3, 2/3, 3/3 quantile); columns: PC1 Genotype (aa, Aa, AA).

In Figure 2, the genotypes are estimated using PC1 of the RCC values. The three columns of the map matrix correspond to the genotypes aa, Aa, and AA, respectively, and the three rows of the map matrix correspond to the one-third, two-thirds, and three-thirds quantile of the elevation values, respectively. Using CC maps, we can explore possible interactive effects between genotypes and landscapes on shape. In contrast to Figure 2, Figure 3 uses PC3 genotypes to produce the CC maps for the leaves dataset.

Figure 3: Conditioned choropleth maps based on PC3 genotypes. Rows: Elevation (1/3, 2/3, 3/3 quantile); columns: PC3 Genotype (aa, Aa, AA).

As shown in these two figures, most of the leaves are labeled as genotype aa by PC1, while most of the leaves are labeled as genotype AA by PC3. Elevation does not seem to have any significant effect because the samples are clustered randomly for each row (corresponding to each 1/3 of the Elevation quantile). This is confirmed in Table 1. Latitude does not seem to have any significant effect either because all the leaves are clustered in the middle when looking along the vertical direction. However, it is worth mentioning that longitude does have a noticeable effect interacting with the genotype.
To be more specific, most shapes are located on the left (West) and in the middle for the three-thirds quantile of elevation, most shapes are located in the middle for the two-thirds quantile of elevation, and most shapes are located in the middle and on the right (East) for the one-third quantile of elevation. These longitude effects hold for both PC1 and PC3. By comparing the first row and the third row of Figure 2 and Figure 3, one can see the difference in the longitudinal direction. This indicates that there likely exist different environmental conditions (such as temperature, sunlight, and precipitation) in these regions that could have some additional effects on the leaf shape.

We also conducted Pearson's Chi-square tests on the counts of the leaves in the nine cells for the two map matrices. Table 1 (left) shows the counts of leaves in the nine cells with respect to PC1 genotypes (Figure 2), and Table 1 (right) shows the counts of leaves in the nine cells with respect to PC3 genotypes (Figure 3). Neither of them gives a significant p-value (about 0.27 for PC1 and about 0.53 for PC3). We will further explore how the two PCs explain the variation of the shape of leaves in the following sections.

PC1 genotypes (left)
Elevation        aa    Aa    AA
1/3 Quantile     18    16     1
2/3 Quantile     24     9     2
3/3 Quantile     18    14     4
p-value: 0.2684

PC3 genotypes (right)
Elevation        aa    Aa    AA
1/3 Quantile      2     6    27
2/3 Quantile      2    12    21
3/3 Quantile      1    10    25
p-value: 0.5299

Table 1: Pearson Chi-square test for PC1 (left) and PC3 (right) genotypes.

2.3 Data Enveloping for RCC Values

Data enveloping is the process of subsetting the dataset into different classes, drawing bands around each class of data observations, and then filling each band with a different color. This technique was initially introduced for parallel coordinate plots (Inselberg et al., 1987) and it was later refined by Moustafa et al. (2011). Sharif and Symanzik (2012) applied data enveloping techniques to functional data.
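As a side note to Table 1 in Section 2.2, the two reported p-values can be checked from the nine cell counts alone. The following sketch is an illustration, not the authors' code: the function names are hypothetical, and the tail probability uses the closed form that is valid only for an even number of degrees of freedom (a 3 by 3 table has 4).

```python
import math


def pearson_chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


def chi2_upper_tail_even_df(x, df):
    """P(X > x) for a chi-square variable; closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


# Counts from Table 1 (rows: elevation thirds; columns: aa, Aa, AA).
pc1_counts = [[18, 16, 1], [24, 9, 2], [18, 14, 4]]
pc3_counts = [[2, 6, 27], [2, 12, 21], [1, 10, 25]]

stat1, df1 = pearson_chi_square(pc1_counts)
stat3, df3 = pearson_chi_square(pc3_counts)
p1 = chi2_upper_tail_even_df(stat1, df1)
p3 = chi2_upper_tail_even_df(stat3, df3)
```

Several expected counts in the rarest genotype column are small, so the chi-square approximation should be read with some caution for these tables.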
The bands often range from the minimum observation to the maximum observation for every variable. However, the extreme observations might be outliers that can cause heavy overlapping of the bands. In order to overcome this problem, two modifications are possible: (1) the bands could be drawn with a narrower range, for example, from the 25th percentile to the 75th percentile of a given class; and (2) alpha blending techniques (Porter and Duff, 1984) could be used for the colors to create transparency effects so that the hidden parts of overlapping bands become visible. Often, both of these modifications are used simultaneously. Data enveloping techniques are helpful to clearly see patterns for each class and to visually validate cluster analysis results (Sharif and Symanzik, 2012).

In our case, we want to group the RCC curves of the same QTL genotype explained by PC1 and PC3, and then use this visual tool to explore the differences between the genotypes. We are particularly interested in PC1 since it accounts for the majority of the shape variation. Figure 4 (A & C) shows three bands ranging from the minimum observation to the maximum observation at every angle for the three genotypes. For PC1 genotypes, the three bands in Figure 4 (A) overlap each other. However, we can still see that most of the RCC values of genotype AA are higher than those of genotype aa at angles around 90 degrees, and most of the RCC values of genotype AA are lower than those of genotype aa at angles around 180 degrees. The band of genotype Aa lies between the other two bands. This confirms the additive effect of the QTL. Nevertheless, some of the extreme observations may be outliers for each genotype and cause the heavy overlapping among the classes. In order to achieve a better visualization, the bands could be drawn with a narrower range from the 25th to the 75th percentile of the RCC values at every angle for the three genotypes, as shown in Figure 4 (B & D).
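The two band definitions used for Figure 4 (minimum to maximum, and 25th to 75th percentile) can be sketched as follows. This is a minimal illustration, not the implementation behind the figures: the function names are hypothetical, and the percentile rule (linear interpolation between order statistics) is an assumption, since the text does not state which rule was used.

```python
def percentile(values, q):
    """q-th percentile (0 to 100) by linear interpolation between order statistics."""
    ordered = sorted(values)
    if len(ordered) == 1:
        return ordered[0]
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (position - lower)


def envelopes(curves, labels, lower_q=0, upper_q=100):
    """Per-class bands: {label: (lower_band, upper_band)}, one value per angle.

    curves: list of equal-length curves (e.g., 360 RCC values each).
    labels: class label (e.g., genotype) of each curve.
    """
    groups = {}
    for curve, label in zip(curves, labels):
        groups.setdefault(label, []).append(curve)
    bands = {}
    for label, members in groups.items():
        columns = list(zip(*members))  # all values of this class at each angle
        bands[label] = (
            [percentile(col, lower_q) for col in columns],
            [percentile(col, upper_q) for col in columns],
        )
    return bands


# Usage with toy curves of length 2 (real RCC curves have 360 values).
toy_curves = [[1, 10], [2, 20], [3, 30], [4, 40], [5, 50], [0, 0]]
toy_labels = ["aa", "aa", "aa", "aa", "aa", "AA"]
full_bands = envelopes(toy_curves, toy_labels)            # minimum to maximum
narrow_bands = envelopes(toy_curves, toy_labels, 25, 75)  # 25th to 75th percentile
```

Each band would then be drawn as a filled region between its lower and upper curve, with a semi-transparent fill color if alpha blending is desired.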
As we can see in Figure 4 (B), the RCC values of the three genotypes differ mostly at angles around 90, 180, and 360 degrees. In order to explore the overall difference in their shapes, we transfer the RCC curves to a polar coordinate system (i.e., transfer from the vector space to the image domain) in order to "recover" the shape of the leaves.

Figure 4: Data envelopes for RCC data curves grouped by three genotypes of PC1 and PC3. Panels: (A) RCC (Minimum − Maximum) for PC1; (B) RCC (25th − 75th Percentile) for PC1; (C) RCC (Minimum − Maximum) for PC3; (D) RCC (25th − 75th Percentile) for PC3. Axes: Angle (0 to 360) and RCC; legend: Genotype (aa, Aa, AA).

The four plots in Figure 5 show the three genotype bands ranging from the minimum to the maximum RCC values and from the 25th to the 75th percentile of the RCC values for both PC1 and PC3. Based on the PC1 genotype labels, the plots show that the leaves of genotype AA tend to be longer (vertical direction) and narrower (horizontal direction) than the leaves of genotype aa, and the leaves of genotype Aa are between those of AA and aa with respect to length and width. For PC3 genotype effects, subtle differences are observed near the tip of the leaves.

2.4 Density-based Plots for RCC Values

Although we can use data enveloping to explore the overall pattern of each genotype, we still cannot visualize the variability and distribution of RCC values within each envelope. The density-based plots are created based on Jackson's density strip plots (Jackson, 2008). The density strip is a shaded monochrome strip whose darkness at a point is proportional to the probability density of the quantity at that point, darkest at points of highest probability density, and white at points of zero density.
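To make the density strip definition concrete, the sketch below computes the shading of one strip, i.e., the RCC values of all leaves at a single angle. It is an illustration under assumptions that the text does not specify: the density is estimated with a Gaussian kernel, the bandwidth is chosen by hand, and the function name `density_strip` is hypothetical. Darkness is returned on a 0 to 1 scale, where 1 is the darkest shade (highest density) and 0 is white.

```python
import math


def density_strip(values, grid, bandwidth):
    """Darkness in [0, 1] at each grid point, proportional to estimated density.

    values: observations at one angle (e.g., RCC values of all leaves).
    grid: points along the strip at which the shade is evaluated.
    bandwidth: Gaussian kernel standard deviation (assumed, hand-chosen).
    """
    norm = len(values) * bandwidth * math.sqrt(2 * math.pi)
    density = [
        sum(math.exp(-0.5 * ((g - v) / bandwidth) ** 2) for v in values) / norm
        for g in grid
    ]
    peak = max(density)
    return [d / peak for d in density]


# Usage: three leaves share an RCC value of 0.05, one leaf has 0.09.
shades = density_strip([0.05, 0.05, 0.05, 0.09], [0.03, 0.05, 0.07, 0.09], 0.005)
```

Repeating this for each of the 360 angles and placing the strips side by side gives a density-based view of a whole set of RCC curves.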
The density strip plots can provide a general visualization of the distribution of the RCC values at different angles, while data envelopes can only show the minimum, maximum, or a specific percentile of the RCC values. Figure 6 (bottom) shows a density strip plot for all the RCC curves. It is obvious that the RCC values are not uniformly distributed at each angle and most of the RCC curves are clustered in the center of the envelope.

Figure 5: Data envelopes for RCC data curves grouped by three genotypes of PC1 and PC3 in a polar coordinate system. Panels: (A) RCC (Minimum − Maximum) for PC1; (B) RCC (25th − 75th Percentile) for PC1; (C) RCC (Minimum − Maximum) for PC3; (D) RCC (25th − 75th Percentile) for PC3. Angular axis: Angle; radial axis: RCC; legend: Genotype (aa, Aa, AA).

The top six graphs in Figure 6 show the RCC curves in three separate plots, based on the genotypes of their corresponding leaves. The left column relates to the genotypes with respect to PC1, and the right column relates to the genotypes with respect to PC3. Figure 6 (left) shows that the distribution of the overall curve dynamic for AA is dramatically different from that of aa, no matter whether for PC1 or PC3. Looking at the three distributions of PC3, we notice that the majority of the differences are around 90 degrees, which corresponds to the tip of the leaves. Figure 6 (right) shows that RCC values (based on PC3) exhibit much less variation among the three genotype groups.

2.5 Heatmaps for RCC Curves

There are 106 RCC curves in our dataset, representing 106 different leaves. If we want to compare each individual RCC curve, the traditional way is to stack all curves, but it sometimes becomes difficult to fit all plots on one page or a computer screen.
Instead of simply stacking all the curves, the data can be drawn as a heatmap (Eisen et al., 1998). Other terms for heatmaps that can be found in the literature are "colored histograms" (Wegman, 1990) and "data images" (Minnotte and West, 1998; Morphet and Symanzik, 2010). The idea underlying a heatmap is simple: First we break the range of the curve into a small number, say 10, of non-overlapping intervals (I1, I2, ..., I10). Each interval is assigned a specific color. We typically use sequential color schemes (where I1 is related to the darkest tone of a specific color and I10 is related to the brightest tone of that color, or vice versa) or divergent color schemes (where I1 and I10 are related to different colors, say blue and red, that fade through some neutral colors that are related to I5 and I6). Each numeric value of each variable of the RCC is translated into a color, based on the interval Ij into which this value falls. In a heatmap, the colors for each observation (here the leaves) for each variable (here the angles) are drawn according to this color assignment.

Figure 6: Density-based plots for RCC values of all genotypes. Panels: AA, Aa, and aa genotypes for PC1 (left column) and PC3 (right column), with All Genotypes at the bottom. Axes: Angle (0 to 360) and RCC.

Figure 7: Heatmaps for PC1 (left) and PC3 (right) genotype labels. Heatmap axes: Angle (0 to 360) and Leaves, grouped by genotype (AA, Aa, aa); the graphs at the bottom plot RCC against Angle for the three genotypes.
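The discretization step behind the heatmap can be sketched as follows. This is an illustration rather than the code used for Figure 7: the function names are hypothetical, and the intervals are assumed to have equal width over the overall range of the data, which the text does not state explicitly. The returned indices are 0-based, so index j corresponds to interval I(j+1) and can be used to look up a color in any sequential or divergent palette with 10 entries.

```python
def interval_index(value, low, high, n_intervals=10):
    """0-based index of the equal-width interval of [low, high] containing value."""
    if high <= low:
        raise ValueError("high must be greater than low")
    j = int((value - low) / (high - low) * n_intervals)
    # The maximum value falls on the upper edge; keep it in the last interval.
    return max(0, min(j, n_intervals - 1))


def discretize(curves, n_intervals=10):
    """Translate every value of every curve into an interval index.

    Rows of the result are observations (leaves) and columns are
    variables (angles), matching the usual heatmap layout.
    """
    low = min(min(curve) for curve in curves)
    high = max(max(curve) for curve in curves)
    return [
        [interval_index(v, low, high, n_intervals) for v in curve]
        for curve in curves
    ]


# Usage with two toy curves of length 3 (real RCC curves have 360 values).
cells = discretize([[0.0, 0.5, 1.0], [0.25, 0.75, 0.99]])
```

Using one common range for all curves keeps the colors comparable across leaves, which is what allows the fluctuation patterns of different genotype groups to be compared in one picture.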
Typically, observations make up the rows while variables make up the columns. Rows and columns of a heatmap may or may not be sorted. This kind of categorization or discretization makes it possible to visualize the variation of all leaf shapes in one big picture (Sharif and Symanzik, 2012), and to stack a large number of RCC curves.

Figure 7 (left) shows the heatmap for all the RCC curves based on PC1 genotype labels, and Figure 7 (right) shows the heatmap for all the RCC curves based on PC3 genotype labels. There are 106 horizontal image strips in each plot, divided into three groups that are separated by bold horizontal black lines and that correspond to the three genotypes AA, Aa, and aa. Within each genotype group, the image strips are re-sorted and clustered using a k-means algorithm. The 360 columns in this plot represent the variables, in this case, the 360 different angles. The colors of the image range from orange, representing the highest magnitude, to dark blue, representing the lowest magnitude.

Figure 7 (left) shows that the RCC curves of genotype aa have much more dramatic fluctuation patterns than those of the other two groups because the highest magnitudes (near 90 and 270 degrees) and the lowest magnitudes (near 180 and 360 degrees) alternate with each other. Compared to aa, the fluctuation patterns of genotype Aa are relatively flat, with smooth color shifts except for the high magnitude around 270 degrees. Genotype AA has the smallest number of sample leaves. Figure 7 (right) shows the opposite pattern, both in how the fluctuation varies and in the number of samples within each genotype. We also notice that some of the RCC curves are quite similar across different genotypes. Recalling the heavy overlapping of the three raw data bands in Figure 4 (A & C), we can also conclude that many of the RCC curves cannot be distinguished well using the QTL genotype information obtained from PC1 alone.
Figure 7 (bottom) shows the three genotypic curves (mean curves) at each angle for PC1 and PC3, respectively. These two graphs provide some summary information and allow some general comparison among the three genotypes.

2.6 Sorted Cumulative Sum Plots for RCC Curves

Sorted cumulative sum plots are another way to compare the different genotypes. In a cumulative sum plot (not shown here), we simply accumulate the individual values of the RCC curve, in their sequential order, for each observation. Mathematically, this is similar to the relationship between probability mass functions (pmf's) and cumulative distribution functions (cdf's). In a sorted cumulative sum plot, we do not sum up the individual values in sequential order, but, rather, we sort the values first from smallest to largest and then sum up the ordered values. A sorted cumulative sum plot is closely related to a Lorenz curve (Lorenz, 1905). For a leaf that has a perfectly circular shape, all (sorted) values would be exactly the same and, thus, the resulting sorted cumulative sum plot would show a straight line for such a leaf. The more elliptical or irregularly shaped a leaf is, the further the resulting cumulative sum plot would deviate from a straight line. In fact, even though not explored here, a measure similar to the Gini coefficient for a Lorenz curve could be calculated that numerically quantifies the departure of a leaf from a perfectly circular shape. A brief summary of Gini's earlier work, originally published in Italian, can be found in Gini (1921). A more detailed discussion of the relationship between the Lorenz curve and the Gini coefficient has been given in Gastwirth (1972), and a possible application for plant size has been introduced in Damgaard and Weiner (2000).

Figure 8 shows the sorted cumulative sum plots for PC1 (A & B) and PC3 (C & D). The two top graphs (A & C) show the envelopes based on the minimum and maximum for each genotype. For PC3 (C), hardly any difference in the curves can be observed.
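The construction of a sorted cumulative sum, as defined in Section 2.6, can be sketched as follows, together with the Gini-style measure that the text mentions but does not explore. Both function names are hypothetical. The second function applies the standard discrete Gini coefficient formula to the RCC values, which is one possible choice for such a measure and not a result of this paper.

```python
def sorted_cumulative_sum(rcc):
    """Cumulative sums of the RCC values after sorting them from smallest to largest."""
    running = 0.0
    sums = []
    for value in sorted(rcc):
        running += value
        sums.append(running)
    return sums


def gini_like_departure(rcc):
    """Gini-style departure from a circle: 0 when all radii are equal.

    Uses the discrete Gini formula G = (n + 1 - 2 * sum(S_k) / S_n) / n,
    where S_k are the sorted cumulative sums. This choice of measure is an
    assumption for illustration only.
    """
    sums = sorted_cumulative_sum(rcc)
    n = len(sums)
    total = sums[-1]
    return (n + 1 - 2 * sum(sums) / total) / n


# Usage: a circle (constant radius) versus a toy curve with unequal radii.
circle_score = gini_like_departure([2.0] * 360)
toy_sums = sorted_cumulative_sum([3, 1, 2])
toy_score = gini_like_departure([0, 0, 0, 1])
```

For a constant-radius curve the sorted cumulative sums lie on a straight line and the score is zero; the more the radii differ, the further the curve bends below that line and the larger the score becomes.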
The min–max envelope for AA almost completely covers the min–max envelopes for Aa and aa. In contrast, there is some separation for PC1 (A). Leaves with genotype aa are the most circular and leaves with genotype AA are the least circular, which matches the progression from lanceolate (AA) to ovate-orbicular (Aa) to ovate (aa) reported by Fu et al. (2013). The two bottom graphs (B & D) show the envelopes based on the 25th percentile to the 75th percentile. For PC3 (D), the envelope for Aa (in red) is covered by the envelopes for aa and AA, hence only two envelopes are observable. This indicates the dominant effect of the QTL detected by PC3. For PC1 (B), there is now a clear separation between the three bands, further supporting that aa leaves are more circular than Aa leaves and AA leaves, which indicates the additive effect of the QTL detected by PC1.

Figure 8: Sorted cumulative sum plots for PC1 (left) and PC3 (right) genotype labels. Shown on top are the min–max envelopes and shown at the bottom are the envelopes based on the 25th percentile to the 75th percentile. Panels: (A) PC1 (Minimum − Maximum); (B) PC1 (25th − 75th Percentile); (C) PC3 (Minimum − Maximum); (D) PC3 (25th − 75th Percentile). Axes: Order (0 to 360) and Sum (RCC); legend: Genotype (aa, Aa, AA).

3. Conclusions

The morphological shape measured by a camera and saved in a jpg file can be accurately described using Radius-Centroid-Contour curves. After shape analysis, each 360-dimensional RCC curve can uniquely represent the shape of an object. After alignment, the shape variations of all subjects caused by position, scale, and rotation are removed. Then, the genetic effect on shape can be estimated by a Linkage Disequilibrium (LD) model (Fu et al., 2013).
In this paper, we presented visualization techniques for exploring the RCC curves as a function of angle. In particular, we looked into ways of characterizing the shape feature variation caused by genetics and environment. Five visual tools have been adapted for shape analysis and they give different information from different aspects, including conditioned choropleth maps, data enveloping, density-based plots, heatmaps, and sorted cumulative sum plots.

References

Belongie, S., Malik, J. and Puzicha, J. (2002), 'Shape Matching and Object Recognition Using Shape Contexts', Pattern Analysis and Machine Intelligence, IEEE Transactions on 24(4), 509–522.
Carr, D. B., Wallin, J. F. and Carr, D. A. (2000), 'Two New Templates for Epidemiology Applications: Linked Micromap Plots and Conditioned Choropleth Maps', Statistics in Medicine 19(17–18), 2521–2538.
Carr, D. B., White, D. and MacEachren, A. M. (2005), 'Conditioned Choropleth Maps and Hypothesis Generation', Annals of the Association of American Geographers 95(1), 32–53.
Damgaard, C. and Weiner, J. (2000), 'Describing Inequality in Plant Size or Fecundity', Ecology 81(4), 1139–1142.
Debat, V., Debelle, A. and Dworkin, I. (2009), 'Plasticity, Canalization, and Developmental Stability of the Drosophila Wing: Joint Effects of Mutations and Developmental Temperature', Evolution 63(11), 2864–2876.
Eisen, M. B., Spellman, P., Brown, P. O. and Botstein, D. (1998), 'Cluster Analysis and Display of Genome–Wide Expression Patterns', Proceedings of the National Academy of Sciences of the United States of America 95(25), 14863–14868.
Fu, G., Bo, W., Pang, X., Wang, Z., Chen, L., Song, Y., Zhang, Z., Li, J. and Wu, R. (2013), 'Mapping Shape Quantitative Trait Loci Using a Radius–Centroid–Contour Model', Heredity 110(6), 511–519.
Gastwirth, J. L. (1972), 'The Estimation of the Lorenz Curve and Gini Index', The Review of Economics and Statistics 54(3), 306–316.
Gini, C. (1921), 'Measurement of Inequality of Incomes', The Economic Journal 31(121), 124–126.
Gomez, J. M., Abdelaziz, M., Pajares, J. M. and Perfectti, F. (2009), 'Heritability and Genetic Correlation of Corolla Shape and Size in Erysimum mediohispanicum', Evolution 63, 1820–1831.
Inselberg, A., Reif, M. and Chomut, T. (1987), 'Convexity Algorithms in Parallel Coordinates', Journal of the ACM (JACM) 34(4), 765–801.
Jackson, C. H. (2008), 'Displaying Uncertainty with Shading', The American Statistician 62(4), 340–347.
Klingenberg, C. P. (2003), 'Quantitative Genetics of Geometric Shape: Heritability and the Pitfalls of the Univariate Approach', Evolution 57, 191–195.
Klingenberg, C. P. (2010), 'Evolution and Development of Shape: Integrating Quantitative Approaches', Nature Reviews Genetics 11, 623–635.
Langlade, N. B., Feng, X. Z., Dransfield, T., Copsey, L., Hanna, A. I., Thebaud, C., Bangham, A., Hudson, A. and Coen, E. (2005), 'Evolution Through Genetically Controlled Allometry Space', Proceedings of the National Academy of Sciences 102, 10221–10226.
Leamy, L. J., Klingenberg, C. P., Sherratt, E., Wolf, J. B. and Cheverud, J. M. (2008), 'A Search for Quantitative Trait Loci Exhibiting Imprinting Effects on Mouse Mandible Size and Shape', Heredity 101, 518–526.
Lorenz, M. O. (1905), 'Methods of Measuring the Concentration of Wealth', Publications of the American Statistical Association 9(70), 209–219.
Minnotte, M. C. and West, R. W. (1998), The Data Image: A Tool for Exploring High Dimensional Data Sets, in '1998 Proceedings of the Section on Statistical Graphics', American Statistical Association, Alexandria, VA, pp. 25–33.
Morphet, W. J. and Symanzik, J. (2010), 'The Circular Dataimage, a Graph for High–Resolution Circular–Spatial Data', International Journal of Digital Earth 3(1), 47–71.
Moustafa, R. I., Hadi, A. S. and Symanzik, J. (2011), 'Multi–Class Data Exploration Using Space Transformed Visualization Plots', Journal of Computational and Graphical Statistics 20(2), 298–315.
Myers, E. M., Janzen, F. J., Adams, D. C. and Tucker, J. K. (2006), 'Quantitative Genetics of Plastron Shape in Slider Turtles', Evolution 60, 563–572.
Porter, T. and Duff, T. (1984), 'Compositing Digital Images', SIGGRAPH Computer Graphics 18(3), 253–259.
Rohlf, F. J. and Marcus, L. F. (1993), 'A Revolution in Morphometrics', Trends in Ecology & Evolution 8, 129–132.
Scarpella, E., Barkoulas, M. and Tsiantis, M. (2010), 'Control of Leaf and Vein Development by Auxin', Cold Spring Harbor Perspectives in Biology 2, a001511.
Sharif, A. and Symanzik, J. (2012), Graphical Representation of Clustered Functional Actigraphy Data, in '2012 JSM Proceedings', American Statistical Association, Alexandria, VA. (CD).
Van der Knapp, E., Lippman, Z. B. and Tanksley, S. D. (2002), 'Extremely Elongated Tomato Fruit Controlled by Four Quantitative Trait Loci with Epistatic Interactions', Theoretical and Applied Genetics 104, 241–247.
Wegman, E. J. (1990), 'Hyperdimensional Data Analysis Using Parallel Coordinates', Journal of the American Statistical Association 85(411), 664–675.
