Received: 25 December 2020 | Revised: 30 May 2021 | Accepted: 13 June 2021
DOI: 10.1002/oa.3014

RESEARCH ARTICLE

Best practice for osteological sexing in forensics and bioarchaeology: The utility of combining metric and morphological traits from different anatomical regions

Denise U. Navitainuck1 | Werner Vach1,2 | Kurt W. Alt1,3 | Jörg Schibler1

1 Integrative Prehistory and Archeological Science (IPAS), Department of Environmental Sciences, University of Basel, Basel, Switzerland
2 Basel Academy for Quality and Research in Medicine, Basel, Switzerland
3 Center of Natural and Cultural Human History, Danube Private University, Krems, Austria

Correspondence
Denise U. Navitainuck, Integrative Prehistory and Archeological Science (IPAS), Department of Environmental Sciences, University of Basel, Spalenring 145, Basel CH-4055, Switzerland. Email: denise.navitainuck@unibas.ch

Funding information
The study was funded by a scholarship of Studienstiftung des Deutschen Volkes.

Abstract
This paper aims to systematically investigate the value of combining traits from different anatomical regions in osteological sexing by contrasting the utility of single traits and established scores with those of ensembles of traits from single or multiple anatomical regions, allowing metric and morphological traits to be combined. The utility was defined as the fraction of the population for whom we could reach a posterior probability above 95% of being male or female. A total of 675 adult individuals from the sixth to eighth century AD cemetery of Mannheim Bösfeld, Germany, were assessed, and 27 postcranial metric traits and 41 morphological traits from the pelvis, mandible, and cranium were used. In addition, 13 metric and 3 morphological scores were considered. Linear discriminant analysis (LDA) was used to construct rules and cross validation to determine accuracy and utility.
These parameters were determined for single traits and scores, trait groups defined by anatomical regions and/or previously considered in the literature, and ensembles of traits defined by selecting several promising traits from different anatomical regions. Accuracy of single traits ranged from 0.76 to 0.94, with scores even reaching 0.97, but utility remained around 0.2–0.4 for metric traits and up to 0.6 for morphological traits. Only scores and ensembles combining traits from different anatomical regions reached a utility above 0.7; that is, sex could be estimated in more than 70% of the individuals with a posterior probability above 95%. When selecting a limited number of traits for systematic sexing in a human skeletal series, it is advisable to select traits from different anatomical regions to obtain a reasonably reliable result in as many individuals as possible. Large scale investigations covering all relevant anatomical regions and involving a wide range of populations are required for more precise recommendations.

KEYWORDS
latent class analysis (LCA), metric traits, morphological traits, sex estimation

This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
© 2021 The Authors. International Journal of Osteoarchaeology published by John Wiley & Sons Ltd.
NAVITAINUCK ET AL. Int J Osteoarchaeol. 2021;1–14. wileyonlinelibrary.com/journal/oa

1 | BACKGROUND

Biological sex, alongside age at death, is among the most essential information required about an individual in both a forensic and bioarchaeological context. It is also a key building block for population studies. However, an accurate assessment of these parameters is highly dependent on the preservation and integrity of the human remains. Both metric and morphological traits have been used previously for skeletal sexing (e.g., Buikstra & Ubelaker, 1994; Novotny, 1975; Rösing et al., 2007; Stloukal, 1982). Traditionally, metric traits have been used to construct binary decision rules based on the application of a discriminant function to one or several metric traits (e.g., Albanese et al., 2005; Alt et al., 1995; Christensen et al., 2014; Garcia-Campos et al., 2018; Henke, 1977). Morphological traits have typically been used to arrive at a 3- (female, indeterminate, and male) or 5-point (female, probable female, indeterminate, probable male, and male) ordinal scale.

A fundamental step forward was the work by Murail et al. (2005) in developing the diagnose sexuelle probabiliste (DSP) and DSP2 (Bruzek et al., 2017) tools (based on classical discriminant analysis). For any subset of at least 4 out of 10 metric pelvic traits, these tools determine the (posterior) probability of a skeletal individual representing a male or a female subject, respectively. The approach behind these tools combines two basic principles: flexibility and cautiousness. It is flexible in that it allows the sex estimation of any subject with a sufficient number of traits, regardless of how many and which traits are measurable. This is in contrast to alternative published approaches using multiple traits, as they typically require all traits to be measurable. It is cautious in that Murail et al. (2005) require a posterior probability of at least 95% to be male, or of at least 95% to be female, in order to arrive at a final sex estimation, an approach also used in other studies (e.g., Hora & Sladek, 2018; Santos et al., 2014). In this way, these authors overcame the tradition of strictly binary decision rules for metric traits and introduced an intermediate group for which the sex remains undetermined.

However, this flexible and cautious approach can be applied to any group of traits, including morphological ones. For any given group of traits, we can arrive at a posterior probability for any given subgroup of these traits measurable in an individual using discriminant analysis. This allows us to compare different trait groups not only with respect to their accuracy (as done in many published studies) but also with respect to their utility for sex estimation. We define the utility of a trait group as the fraction of skeletal individuals for whom we can arrive at a final sex estimation, that is, at a posterior probability of at least 95% of being either male or female. A high utility of a group of traits requires both a high accuracy and a sufficiently high rate of measurable traits.

The aim of this paper is to systematically investigate the increase in utility when combining several traits using this flexible and cautious approach, based on a variety of traits that have already been established for sex estimation in general. Both metric and morphological traits were considered, and several anatomical regions were covered. The focus was to combine traits across several anatomical regions, including combining morphological with metric traits.

The starting point of the investigation was a list of publications on sex estimation considering various groups of traits. The traits identified in these publications were determined, when possible, in all adult skeletal individuals of an early medieval series. We then determined the utility of each single trait, and the previously suggested combinations of traits, before considering the utility of trait groups defined by anatomical regions and new ensembles of traits covering several anatomical regions, including both morphological and metric traits. The utility of a trait group is determined by developing a discriminant score specific for the series considered here; that is, we make no use of published sex estimation approaches.

In the Supporting Information for this article, we systematically compared the distribution of single traits in our population with those from the selected publications and investigated the distribution of the posterior probabilities obtained by the DSP2 tool in our sample.

2 | MATERIAL AND METHODS

2.1 | The skeletal sample

The basis of this study was the skeletal sample from the early medieval cemetery of Mannheim-Seckenheim (“Hermsheimer Bösfeld”), Germany, which has been previously studied (e.g., Hansen & Alt, 2012; Meyer et al., 2014; Meyer & Alt, 2012; Navitainuck et al., 2013). Given the number of graves and grave goods recovered from this cemetery, dating from the sixth to the eighth century AD, the skeletal sample represented a promising cohort for both osteological and archeological research. More than 900 individuals were found at this site, and a total of 675 adult individuals form the basis of this study. Age estimation was carried out according to international standards (Lovejoy et al., 1985; Meindl & Lovejoy, 1985; Miles, 1963; Nemeskéri et al., 1960; Szilvassy, 1988; Todd, 1920) as part of the general anthropological study of this cemetery (Meyer & Alt, 2012), separating subadult from adult individuals (20 years and older).

The skeletal sample from the Bösfeld site is assumed to represent Franconian retinue warriors and their families (Koch, 2013; Koch & Wirth, 2005). According to the archeological findings, the connection between the Merovingian cemetery and the settlement indicates continuous use. Burials with weapons as grave goods are assumed to be warriors, with a total of 10 fully armed warriors and about 80 burials, per generation, having been identified (Koch, 2013). Together with the historical background, this forms a well-founded picture of the composition of the sample (Koch, 2004, 2007, 2013; Link, 2003; Wirth et al., 2007).

2.2 | Traits and trait groups selected from the literature

Ten publications on metric traits for sex estimation and five publications on morphological traits form the basis of our investigation (cf. Table 1). From these publications, which covered recent (e.g., Gapert et al., 2009) as well as traditionally used approaches (e.g., Phenice, 1969), we selected 27 main metric traits that are easily measurable and have a high probability of preservation. The anatomical regions covered are os occipitalis, scapula, os coxae, patella, talus, and calcaneus. All 41 morphological traits from the publications were selected, covering os coxae, mandibula, and cranium. In addition, the literature was checked for the definition of scores, that is, explicit formulas to combine the information from several traits into one number. This way, 13 scores involving only the selected metric traits and 3 scores based on morphological traits were identified and included in the analysis. Tables 1–4 give an overview of the traits and scores identified in the selected publications.

TABLE 1 Metric traits considered in this study stratified by the source publication

Source and trait | Laterality
Gapert et al. (2009) (Basicranium)
MLC: maximum length of condyle | Bilateral
MWC: maximum width of condyle | Bilateral
EHC: external hypoglossal canal openings distance | Unilateral
MxID: maximum interior distance between condyles | Unilateral
BCB: maximum bicondylar breadth | Unilateral
MnD: minimum distance between condyles | Unilateral
Helmuth and Rempe (1968) (Atlas and Axis)
Atlasap: anterior–posterior distance | Unilateral
Axisheight: anterior full height | Unilateral
Özer et al. (2006) (Scapula)
GCH: glenoid cavity height | Bilateral
GCB: glenoid cavity breadth | Bilateral
Murail et al. (2005) (Os coxae, pelvic)
PUM: acetabulo-symphyseal pubic length | Bilateral
SPU: cotylo-pubic width | Bilateral
DCOX: innominate or coxal length | Bilateral
IIMT: greater sciatic notch height | Bilateral
ISMM: ischium post-acetabular length | Bilateral
SCOX: iliac or coxal breadth | Bilateral
SS: spino-sciatic length | Bilateral
SA: spino-auricular length | Bilateral
SIS: cotylo-sciatic breadth | Bilateral
VEAC: vertical acetabular diameter | Bilateral
Bidmos et al. (2005) (Patella)
MAXH: maximum height | Bilateral
MAXB: maximum breadth | Bilateral
Bidmos and Dayal (2003) (Talus)
TL: talar length | Bilateral
TH: talar height | Bilateral
TW: talar width | Bilateral
Introna et al. (1997) (Calcaneus)
MAXL: maximum length | Bilateral
BH: body height | Bilateral

Note: For each trait, it is indicated whether it is expressed unilateral or bilateral.

TABLE 2 Scores based on metric traits considered stratified by source publication

Traits involved | Name | Formula
Gapert et al. (2009) (Basicranium)
MLC left, MWC right, MnD | Cranium a | 0.274 * MLC left + 0.286 * MWC right + 0.273 * MnD - 15.548
BCB, MnD | Cranium b | 0.224 * BCB + 0.191 * MnD - 15.039
MLC, MWC, BCB, MxID, MnD | Cranium c | 0.238 * MLC right + 0.288 * MWC right + 0.114 * MLC left + 0.032 * MWC left - 0.049 * BCB + 0.016 * MxID + 0.277 * MnD - 16.004
MLC left, MWC right, BCB, MnD | Cranium d | 0.244 * MLC left + 0.253 * MWC right + 0.042 * BCB + 0.253 * MnD
BCB, MxID, MnD | Cranium e | 0.198 * BCB + 0.040 * MxID + 0.193 * MnD - 15.235
MnD, MxID | Cranium f | 0.278 * MnD + 0.170 * MxID - 11.705
Özer et al. (2006) (Scapula)
GCH, GCB | Scapula a [a] | (0.143 * GCH) + (0.367 * GCB) - 14.340
Bidmos et al. (2005) (Patella)
MAXH, MAXB | Patella a [a] | 0.223 * MAXH + 0.133 * MAXB - 14.581
Bidmos and Dayal (2003) (Talus)
TL, TH, TW | Talus a [a] | 0.256 * TL + 0.122 * TW + 0.044 * TH - 20.015
Bidmos and Dayal (2004) (Talus)
TL, TH, TW | Talus b [a] | 0.158 * TL + 0.204 * TH + 0.159 * TW - 20.103
Introna et al. (1997) (Calcaneus)
MAXL, BH | Calcaneus a [a] | 1.96 * MAXL + 2.42 * BH
Steele (1976) (Calcaneus and Talus)
MAXLTal (=TL), MAXWTal (=TW) | Talus c [a] | 0.42002 * TL + 0.41096 * TW
BHCalc (=BH), MAXLTal (=TL), MAXWTal (=TW) | CalTal a [a] | 0.23126 * BHCalc + 0.31859 * MAXLTal + 0.51311 * MAXWTal - 16.110

Note: For each score, the traits involved, a name used in this publication, and the formula published to sum up the measurements are given.
[a] Scores that can be applied separately for the left and right side.

TABLE 3 Morphological traits considered stratified by publication

Trait | Female expression | Male expression
Ferembach et al. (1980) (cranium) 5-point scale
Glabella (3) | Smooth | Marked to massive, prominent
Arcus superciliaris (2) | Smooth | (Very) marked, arched
Tubera frontalia and parietalia (2) | Marked | Indistinct to missing
Inclinatio frontale (1) | Vertical | (Strongly) inclined
Processus mastoideus (3) | (Very) small | (Very) large
Relief of Planum nuchale (3) | Smooth | Nuchal lines and occipital crest (with rough surface)
Protuberantia occipitalis externa (2) | Smooth | (Very) marked
Processus zygomaticus (3) | (Very) thin, low | (Very) thick, high
Os zygomaticum (2) | (Very) low, smooth surface | (Very) high, irregular surface
Crista supramastoidea (2) | (Very) slight-present | (Very) marked
Margo supraorbitale, Forma orbitae (1) | Round, (very) sharp border | Quadrangular, (very) rounded
Ferembach et al. (1980) (Mandibula) 5-point scale
Overall aspect (3) (Mandibula) | (Very) gracile | (Very) robust
Mentum (2) | Small, rounded | (Very) prominent
Angulus mandibulae (2) | Smooth | (Strongly) marked eminences
Margo (1) | Thin | Thick
Kemkes-Grottenthaler et al. (2002) (Mandibula) 5-point scale
Ramus flexure | Straight edge | Flexure at the level of the occlusal surface of the molars
Ferembach et al. (1980) (os coxae, pelvis) 5-point scale
Sulcus praeauricularis (3) | Deep, (well) delimited | Slight-present/absent
Incisura ischiadica major (3) | (Very) wide, U-shaped | (Narrow), (very) V-formed
Angulus pubis [a] (2) | (Strongly) obtuse to right angled, rounded | (Strongly) acute angled, A-form
Arc composé (2) | Double curve | Single curve
Os coxae (2) | Low, broad, with expanding ala ossis and slight muscle relief | High, narrow with stronger muscle relief
Foramen obturatum (2) | Triangular, (with sharp rims) | Oval, (rounded rim)
Corpus ossis ischii (2) | (Very) narrow, with less conspicuous Tuber ischiadicum | (Very) broad, marked Tuber ischiadicum
Crista iliaca (1) | (Very) flat, S-formed | Accented S-form
Fossa iliaca (1) | (Very) low, broad | (Very) high, narrow
Pelvis major (1) [a] | (Very) broad, oval | (Very) narrow, heart shaped
Bruzek (2002) (os coxae, pelvis) 3-point scale
Development of negative relief (preauricular surface 1) | Deep depression well-limited (pits) | Relief smooth or very slightly negative relief
Aspect of grooves or pitting (preauricular surface 2) | Pits or groove with closed circumference | Depression with open circumference
Development of positive relief on preauricular surface (preauricular surface 3) | Lack of tubercle | Tubercle present or clear protuberance
Proportion of length of sciatic notch chords (sciatic notch 1) | Posterior chord segment longer than or equal to anterior chord | Posterior chord shorter than anterior chord
Form of contour notch chords (sciatic notch 2) | Symmetry relative to depth in basal portion of sciatic notch | Asymmetry relative to depth of sciatic notch
Contour of posterior notch chord relative to line from point a to sciatic notch breadth (sciatic notch 3) | Outline (contour) of posterior chord does not cross perpendicular line | Contour of posterior chord crosses perpendicular line
Composite arch, relation between outline of sciatic notch and outline of auricular surface | Double curve | Single curve
Characterization of margo inferior ossis coxae (1) | External eversion | Direct course of medial part
Absence or presence of the phallic ridge (inferior ossis coxae 2) | Lack of the phallic ridge or presence of only little mound | Clear presence of the phallic ridge
Ischiopubic ramus aspect (inferior ossis coxae 3) | Gracile aspect | Robust aspect
Relation between pubis and ischium lengths (ischiopubic proportion) | Pubis longer than ischium | Ischium longer than pubis
Phenice (1969) (os pubis) 3-point scale
Ventral arc | Ventral arc on the ventral surface of the pubis | Slight ridge on ventral aspect of the pubis
Subpubic concavity | Subpubic concavity/dorsal aspect shows lateral recurve in the ischiopubic ramus | No concavity
Ischiopubic ramus | Medial aspect of the ischiopubic ramus: ridge | Broad surface
Dar and Hershkovitz (2006) (os sacrum/ilium) 4-point scale [b]
SIB: sacro-iliac joint bridging |  | Sacro-iliac joint bridging

Note: For each trait, the anatomical name or a short description (with the name used in this paper in parentheses) and the typical expression for males and females are given. For the publications, the reference, the bone/anatomical region involved, and the scoring scale are given; 3-point scale refers to a coding conceptually corresponding to male–indefinite–female and 5-point scale to a coding conceptually corresponding to male–probably male–indefinite–probably female–female. The papers may not explicitly use these labels.
[a] These two traits were not included in our analyses, as they could be assessed in 1 or 15 cases only.
[b] The 4-point scale referred conceptually to the states m, m?, 0, 0?. The trait does not allow the decision that a subject is female.

TABLE 4 Scores based on morphological traits considered stratified by source publication

Traits involved | Scoring rule | Name
All traits from Ferembach et al. (1980), cf. Table 3 | The scoring rule is based only on the traits discernible. Each single trait is scored as -2, -1, 0, 1, or 2. The overall score is defined as the weighted sum of the trait specific scores (weights shown in Table 3) divided by the sum of the weights of the discernible traits. | Ferembach score
All traits from Bruzek (2002), cf. Table 3 | For the three traits, preauricular surface, sciatic notch, and inferior ossis coxae, the three subtraits are reduced to a male (1)–indefinite (0)–female (-1) decision based on a simple majority rule. Then a summary score is built over these three traits, composite arch and ischiopubic proportion. | Bruzek score
Phenice-defined traits ventral arc, subpubic concavity, ischiopubic ramus | The three traits are each scored as -1, 0, and 1. The paper does not include an explicit score but suggests a more informal majority rule. In the case of three traits scored on a 3-point scale, this is however close to considering the sum (or average) over the three traits. | Phenice score

Note: For each score, the traits involved, a description of the scoring rule, and a name used in this paper are given.

2.3 | Data assessment

All the trait measurements were taken according to the description in the cited literature using a digital sliding caliper (measuring accuracy of 0.02–0.04 mm), though for some of the pelvic measurements, a friction divider was used (see Bruzek et al., 2017 and instructions including pictures in the DSP2 tool). Bilateral traits were measured on both sides. All measured values were entered into a Microsoft Access database, except for the metric pelvic measurements, which were entered into the DSP2 software provided by Bruzek et al. (2017). All data were assessed by one observer (DN).

2.4 | Reference standards

To determine the utility and accuracy of a trait group, we needed a reference standard for the sex of the individuals. In this paper, we considered three different basic approaches to obtain such a standard:

1. Using subjects with a posterior probability for male or female above 0.95 based on DSP2. However, as this tool does not allow the combining of traits from the left and the right side, we actually obtained two reference standards.
2. Using latent class analysis (LCA) applied to all available traits in order to obtain a posterior probability for each subject, transforming the probabilities into a reference standard using a specific threshold probability. LCA is a popular method to find different phenotypes in multivariable data. It is based on the simple assumption that there are classes in the population of interest that differ in the distribution of variables. It allows the inclusion of continuous as well as ordinal variables affected by missing values and is therefore well suited to construct a data-dependent reference standard for sex. It has previously been used for this purpose (e.g., Passalacqua et al., 2013).
3. Using the archeological sex provided by C. Meyer and M. Stecher at the University of Mainz based on sex-specific grave goods typically observed in early medieval (Franconian) burials (i.e., Brather, 2008; Koch, 1977). In this way, a sex was assigned to 267 individuals with multiple, nonconflicting pieces of information.

We, therefore, had four reference standards, none of which was optimal. When using DSP2 as a reference to evaluate traits already included in the DSP2, or correlated with traits included in the DSP2, we may have been too optimistic. With respect to LCA, this applies to all traits used; indeed, the overoptimism may even be increased as the construction of the reference standard is already based on our data. However, LCA is based on many traits, so that the effect for a single trait may be negligible. The archeological reference standard does not suffer from this problem, as it is completely based on external information (grave goods). Conversely, DSP2 and the archeological standard provided a sex estimation in less than 50% of all individuals in our series, whereas LCA has the potential to define a reference standard for a larger number of subjects.

Consequently, we defined our final reference standard by combining all four reference standards. As each assigns a sex to only a subset of all individuals, which of the four reference standards is actually applicable to a particular individual must be taken into account. Therefore, the overall reference standard is “male” if all reference standards applicable to an individual indicate “male,” and “female” is defined in an analogous manner. The reference standard remains undefined if none of the four reference standards is applicable or if they result in contradicting classifications.

2.5 | Computation of utilities and accuracies

For any group of traits, the utility is determined in the following way: for each subgroup of traits appearing as measurable traits in at least one individual, linear discriminant analysis (LDA) is applied to determine a new discriminant score, that is, a weighted sum of the measurements. Based on this score, the posterior probability of each individual (with this pattern of measurable traits) to be male is computed. We combine this with cross validation by only using the data of the remaining individuals to determine the posterior probability for each single individual. In this way, we obtain a posterior probability for any individual for whom at least one trait is measurable, that is, a flexible approach. The utility is then defined by the fraction of individuals reaching a posterior probability above 95% or below 5%, reflecting a cautious approach to sex estimation.

The utility of a trait group depends on both the accuracy of the traits involved and their measurable frequency. The second aspect is depicted by reporting the number of individuals for which at least one trait is measurable and the average number of measurable traits within the individuals with at least one measurable trait, which we refer to as the “effective number of traits.” The first aspect is depicted by computing an accuracy in the traditional sense: a subject is regarded as male if the cross-validated posterior probability is above 50%; otherwise, it is female. This decision is then compared with the reference standard, and the rate of agreement is reported as accuracy.

In addition, the rate of misclassified subjects among those with a posterior probability above 95% is reported. This number serves two purposes. First, the misclassification rate can be much less than 5%, if there are many subjects with a posterior probability higher than 95%. The misclassification rate can therefore inform us about the additional potential of a trait group. Conversely, the computation of a posterior probability depends on the assumption of a normal distribution of the discriminant score. The actual distribution may differ from a normal distribution. For example, metric traits with skewed distributions or affected by outliers may result in a distribution with heavier tails, or the distribution may be very discrete in the case of a single or only a few morphological traits. Then the misclassification rate may be above 5%, informing us about a violation of this assumption and the potential invalidity of the posterior probabilities. Misclassification rates are only reliable if they can be based on a sufficient number of subjects; therefore, the computation is restricted to trait groups with at least 100 individuals with a posterior probability above 95%.

2.6 | Analytic strategy

First, we checked the validity of the final reference standard by comparing the agreement between the four basic reference standards. We then considered the utility of single traits and scores identified in the selected literature, with the central investigation considering the utility of trait groups. Trait groups defined by an anatomical region or previously considered in the literature were investigated before we finally considered ensembles of traits by selecting a few promising traits from different anatomical regions, allowing us also to combine morphological with metric traits.
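The combination rule for the final reference standard (Section 2.4) and the summary measures of Section 2.5 can be illustrated with a minimal Python sketch. It is not the authors' implementation (the study used Stata 15.1), and it assumes that cross-validated posterior probabilities of being male are already available. The function names, the use of all supplied individuals as the denominator of the utility, and the application of the 100-individual limit to individuals who also have a reference standard are illustrative assumptions.

```python
from typing import Dict, Optional, Sequence

MALE, FEMALE = "male", "female"


def combined_reference(standards: Sequence[Optional[str]]) -> Optional[str]:
    """Consensus of the applicable basic reference standards.

    Each entry is "male", "female", or None (standard not applicable).
    Returns None if no standard applies or the applicable ones disagree.
    """
    applicable = {s for s in standards if s is not None}
    return next(iter(applicable)) if len(applicable) == 1 else None


def summarize_trait_group(
    posterior_male: Sequence[Optional[float]],
    reference: Sequence[Optional[str]],
    cutoff: float = 0.95,
    min_decided: int = 100,
) -> Dict[str, Optional[float]]:
    """Utility, accuracy, and misclassification rate for one trait group.

    posterior_male: cross-validated P(male) per individual; None if no trait
        of the group is measurable in that individual.
    reference: final reference standard per individual; None if undefined.
    """
    n_total = len(posterior_male)  # assumption: denominator is everyone passed in
    n_decided = 0
    n_compared = n_agree = 0
    n_decided_ref = n_decided_wrong = 0

    for p, ref in zip(posterior_male, reference):
        if p is None:
            continue
        # "Cautious" decision: posterior above 95% or below 5%.
        decided = p > cutoff or p < 1.0 - cutoff
        if decided:
            n_decided += 1
        if ref is None:
            continue
        # Traditional binary call at the 50% threshold.
        call = MALE if p > 0.5 else FEMALE
        n_compared += 1
        if call == ref:
            n_agree += 1
        if decided:
            n_decided_ref += 1
            if call != ref:
                n_decided_wrong += 1

    utility = n_decided / n_total if n_total else None
    accuracy = n_agree / n_compared if n_compared else None
    # Report the misclassification rate only if enough individuals are decided.
    enough = n_decided_ref >= max(min_decided, 1)
    misclassification = n_decided_wrong / n_decided_ref if enough else None
    return {
        "utility": utility,
        "accuracy": accuracy,
        "misclassification": misclassification,
    }
```

With the default `min_decided=100`, the misclassification rate is returned as `None` for small samples, mirroring the restriction stated in Section 2.5; a toy data set therefore needs a lower value to produce a number.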
Potential differences in population distributions when comparing our study population and the populations used in the selected publications are investigated in Appendix S1 for all traits. To evaluate the possible impact of differences in population distributions on sex estimation rules based on requiring a 95% posterior probability, the distribution of the posterior probabilities computed by the DSP2 tool in our population is depicted in Appendix S2.

2.7 | Details of statistical methods

All computations were performed in Stata 15.1.

2.7.1 | Use of LCA to construct a reference standard

All unilateral and bilateral metric traits and all morphological traits were used as input to the LCA. Because LCA assumes conditional independence of the input variables given the true sex, all bilateral traits entered this analysis once, using the average of the two measurements if both were available. Nevertheless, the resulting posterior probabilities may be too optimistic, as we still have to expect correlations among the traits beyond what we can explain by sex. To take this into account, threshold probabilities of 0.999 and 0.001, respectively, were used. The two classes obtained by LCA were compared with the archeological reference standard, and the class with the higher frequency of males (according to the archeological sexing) obtained the label “male.”

2.7.2 | Linear discriminant analysis

Metric traits entered the LDA with their actual measurements. Morphological traits entered the LDA after recoding to -2, -1, 0, 1, 2 or -1, 0, 1. Bilateral traits entered with both measurements if they were available. In computing correlation matrices as input to the LDA, the available pair approach is used; that is, each single correlation between two variables is based on all subjects for whom both variables are measured. Correlation matrices that failed to be positive semidefinite were replaced by the least-squares positive-semidefinite approximation.

We have used LDA to develop a rule for sex estimation for a given ensemble of traits due to its popularity in this field. In fact, a recent systematic comparison also found no distinct advantages in using alternative approaches (Nikita & Nikitas, 2020).

2.7.3 | Cross validation

A 10-fold cross validation was used.

3 | RESULTS

3.1 | Measurements

Out of the 675 adult individuals, at least one of the metric traits was measurable in 534, and at least one morphological trait was measurable in 597. In 621 individuals, at least one metric or one morphological trait was measurable. Information on the distribution of the single traits can be found in Tables S4 and S5.

3.2 | Construction of reference standards

The application of LCA with the number of classes fixed to two resulted in estimated class prevalences of 49.7% and 50.3%, corresponding with the expectation that the classes should represent sex. A total of 473 individuals obtained a posterior probability above 0.999 or below 0.001, respectively. Table 5 shows the distribution of the two sexes according to the different basic reference standards and the agreement between the four basic reference standards. The agreement was nearly perfect. The final overall reference standard identified 253 males and 263 females (cf. Table S6). DSP2 identified more males than females, whereas the archeological sexing identified more females than males. The first reflects the higher population variation in males than in females in the reference sample used to develop the DSP2 tool. This implies that large measurements are more reliable in pointing to males than small measurements in pointing to females. The second probably indicates that females tend to more often have a distinct, sex-specific pattern in their grave goods than males.

TABLE 5 The number of males and females identified according to the different reference standards (left part) and the agreement between the reference standards (right part)

Number of individuals
Reference standard | Females | Males
DSP right | 50 | 113
DSP left | 58 | 88
LCA | 238 | 235
Archeological | 160 | 107

Agreement
 | DSP right | DSP left | LCA | Archeological
DSP right |  | 100% | 98.1% | 98.5%
DSP left | 92/92 |  | 99.3% | 100%
LCA | 158/161 | 144/145 |  | 99.6%
Archeological | 66/67 | 62/62 | 222/223 |

Note: The agreement is expressed as a percentage in the upper triangle and as the absolute number of agreeing individuals related to the absolute number of individuals for whom both reference standards could be applied in the lower triangle.
Abbreviations: DSP, diagnose sexuelle probabiliste; LCA, latent class analysis.

3.3 | The utility of previously considered traits and scores

A complete list of the utilities for all traits is provided in Table S1. None of the unilateral traits reached an accuracy above 0.8, and all the utilities stayed below 0.05, reflecting the at most moderate accuracy and/or the low number of individuals in which the trait was measurable. The highest utility of 0.04 was reached for the height of the second cervical vertebra (Axisheight), which was measurable in more than a third of the adult individuals with an accuracy of 0.78.

In contrast, several bilateral traits with accuracies above 0.8 were identified, and in three cases, a utility above 0.2 was reached; that is, the sex was estimated in more than 20% of the individuals with a posterior probability above 95% (see Table 6). These are two measurements from the scapula (glenoid cavity height [GCH] and glenoid cavity breadth [GCB]) and one pelvic measurement, the vertical acetabular diameter (VEAC). These more favorable results for bilateral traits reflect the higher degree of accuracy, as well as the necessity for only one measurement from a single side to perform a sex estimation.

Table 7 shows the results for the metric trait scores. In spite of accuracies close to or above 0.8, we always observed rather low utilities. All these scores required several traits to be measurable, and for most scores, this was fulfilled in less than 100 individuals. The most prominent exception is the score Scapula a, applicable in nearly 200 subjects and reaching a utility above 0.2.

Table 8 shows the results for those single morphological traits that reached an accuracy of 0.8 and above, with some reaching a utility above 0.4. Many of the traits are part of either the Bruzek (2002) approach or the Ferembach et al. (1980) score. However, we also observed that the misclassification rates were often above 0.05. This reflects the discrete nature of these traits, which implies a poor normal approximation to the discriminant score distribution. To evaluate the utility of these traits in a more specific manner, Table 9 presents the exact posterior probabilities for those traits with an accuracy above 0.8 and a utility of at least 0.2, according to Table 7. All traits assigned a posterior probability above 95% to some individuals, but only for incisura ischiadica major was this the case for more than 250 individuals. Two further traits (Arcus superciliaris, sciatic notch 3) assigned a posterior probability above 90% to more than 250 individuals, and this holds for three further traits (Arc composé, Glabella, sciatic notch 2) when using a threshold of 85%. This underlines the high utility of these traits, at least when lowering the limit of a 95% posterior probability, which is very demanding for a single, discrete trait.

Table 10 shows the results for the three existing morphological scores. All three scores have a high accuracy, and two have a remarkably high utility above 0.4. The high utility of the Ferembach and Bruzek scores reflects that these do not require all traits to be measurable; they simply use the information available.

3.4 | The utility of trait groups

Table 11 shows the utility of some trait groups. We started by considering the different single skeletal regions covered by our traits and observed the highest utility for the morphological traits of the cranium (0.62) and the pelvis (0.59) and for the metric traits of the pelvis (0.47). These trait groups showed accuracies above 0.90; however, the high utilities are additionally due to the fact that we were able to determine at least some of the traits of each group in many individuals. This point is nicely illustrated by the four scapula traits, which made it possible to define a rule with an accuracy of 0.92. However, it was
TABLE 6 Utilities of all metric traits with an accuracy over 0.8

Trait | Utility | N | Accuracy | Misclassification rate
GCH | 0.23 | 265 | 0.91 | 0.01
GCB | 0.24 | 249 | 0.91 | 0.03
SPU | 0.14 | 143 | 0.94 | 0.01
DCOX | 0.03 | 75 | 0.85 | -
ISMM | 0.17 | 164 | 0.93 | 0.03
VEAC | 0.28 | 349 | 0.91 | 0.02
MAXB | 0.05 | 138 | 0.83 | -
TL | 0.13 | 295 | 0.87 | 0.04
TH | 0.01 | 89 | 0.81 | -

Note: See Table S1 for complete table of results. N: number of individuals for whom the trait was measurable.

TABLE 7 Utilities of all metric scores

Score | Utility | N | Accuracy | Misclassification rate
Unilateral scores
Cranium a | 0.01 | 72 | 0.76 | -
Cranium b | 0.02 | 69 | 0.77 | -
Cranium c | 0.01 | 38 | 0.84 | -
Cranium d | 0.01 | 57 | 0.81 | -
Cranium e | 0.02 | 53 | 0.87 | -
Cranium f | 0.02 | 57 | 0.85 | -
Bilateral scores
Scapula a | 0.21 | 191 | 0.93 | 0.03
Patella a | 0.02 | 63 | 0.78 | -
Talus a | 0.02 | 59 | 0.86 | -
Talus b | 0.02 | 59 | 0.79 | -
Talus c | 0.05 | 116 | 0.81 | -
Calcaneus a | 0.04 | 92 | 0.80 | -
CalTal a | 0.02 | 63 | 0.85 | -

Note: N: number of individuals for whom the score could be computed.

TABLE 8 Utilities of morphological traits with an accuracy above 0.8

Trait | Utility | N | Accuracy | Misclassification rate
Glabella | 0.31 | 358 | 0.87 | 0.06
Arcus superciliaris | 0.60 | 412 | 0.92 | 0.08
Forma orbitae | 0.15 | 360 | 0.83 | 0.12
Ventral arc | 0.20 | 137 | 0.91 | 0.09
Sciatic notch 1 | 0.42 | 288 | 0.89 | 0.11
Sciatic notch 2 | 0.42 | 288 | 0.90 | 0.10
Sciatic notch 3 | 0.42 | 286 | 0.91 | 0.09
Composite arch | 0.03 | 428 | 0.85 | -
Sulcus praeauricularis | 0.40 | 406 | 0.85 | 0.06
Incisura ischiadica major | 0.40 | 389 | 0.90 | 0.03
Corpus ossis ischia | 0.27 | 182 | 0.91 | 0.08
Arc composé | 0.23 | 427 | 0.85 | 0.05
Preauricular surface 1 | 0.25 | 409 | 0.86 | 0.05
Preauricular surface 2 | 0.20 | 396 | 0.85 | 0.03
Preauricular surface 3 | 0.13 | 271 | 0.81 | 0.01

Note: See Table S2 for complete list of all traits. N: number of individuals for whom the trait was measurable.

TABLE 9 Exact frequencies of the traits shown in Table 6 and the associated empirical posterior probabilities

Trait | Result | Absolute frequency | PP (female) | Result | Absolute frequency | PP (male)
Glabella | f | 62 | 0.98 | m | 18 | 0.95
Glabella | f/f? | 164 | 0.85 | m/m? | 120 | 0.92
Arcus superciliaris | f | 45 | 0.98 | m | 33 | 1.00
Arcus superciliaris | f/f? | 184 | 0.91 | m/m? | 156 | 0.93
Corpus ossis ischia | f | 9 | 1.00 | m | 8 | 1.00
Corpus ossis ischia | f/f? | 71 | 0.91 | m/m? | 92 | 0.92
Arc composé | f | 95 | 0.97 | m | 52 | 0.93
Arc composé | f/f? | 177 | 0.85 | m/m? | 165 | 0.85
Incisura ischiadica major | f | 99 | 0.98 | m | 33 | 1.00
Incisura ischiadica major | f/f? | 182 | 0.86 | m/m? | 155 | 0.96
Sciatic notch 1 | f | 120 | 0.89 | m | 128 | 0.88
Sciatic notch 2 | f | 120 | 0.92 | m | 133 | 0.89
Sciatic notch 3 | f | 125 | 0.91 | m | 126 | 0.91
Preauricular surface 1 | f | 152 | 0.95 | m | 182 | 0.78
Preauricular surface 2 | f | 128 | 0.97 | m | 179 | 0.77
Sulcus praeauricularis | f | 47 | 1.00 | m | 96 | 0.94
Sulcus praeauricularis | f/f? | 151 | 0.94 | m/m? | 179 | 0.79

Note: “Result” refers to the sex assigned according to the expression of the trait. The posterior probabilities are the fractions of individuals with the correct given sex among those with the result shown.
Abbreviation: PP, posterior probability.

TABLE 10 Utilities of all morphological scores

Score | Utility | N | Number of traits | Effective number of traits | Accuracy | Misclassification rate
Ferembach et al. (1980) | 0.73 | 597 | 25 | 9.8 | 0.95 | 0.01
Bruzek (2002) | 0.42 | 283 | 11 | 4.0 | 0.97 | 0.03
Phenice (1969) | 0.18 | 121 | 3 | 0.6 | 0.97 | 0.03

Note: N: number of individuals for whom the score was computable.
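The two quantities that recur in Tables 6 to 10 can be illustrated with a minimal Python sketch. It assumes that utility is computed as described in this paper, that is, as the fraction of the whole series for whom a posterior probability above the threshold of being male or female is reached, and that the empirical posterior probability follows the note to Table 9. The function names `utility` and `empirical_posterior`, the use of `None` for individuals in whom nothing was measurable, and all toy values are hypothetical and chosen for illustration only; they are not the study data or the authors' software.

```python
from typing import Optional, Sequence


def utility(posteriors_male: Sequence[Optional[float]], threshold: float = 0.95) -> float:
    """Fraction of ALL individuals with P(male) or P(female) above `threshold`.

    Individuals without any measurable trait are passed as None (an assumption
    of this sketch): they count in the denominator but never in the numerator.
    """
    if not posteriors_male:
        raise ValueError("at least one individual is required")
    decided = sum(
        1
        for p in posteriors_male
        if p is not None and (p > threshold or (1.0 - p) > threshold)
    )
    return decided / len(posteriors_male)


def empirical_posterior(
    results: Sequence[str], reference: Sequence[str], result: str, sex: str
) -> float:
    """Fraction of individuals with reference sex `sex` among those showing `result`."""
    matched = [ref for res, ref in zip(results, reference) if res == result]
    if not matched:
        raise ValueError("no individual shows this result")
    return sum(1 for ref in matched if ref == sex) / len(matched)


# Hypothetical toy series of 10 individuals: P(male) per individual.
toy = [0.99, 0.97, 0.60, 0.02, 0.01, None, None, 0.50, 0.96, 0.90]
print(utility(toy))

# Hypothetical trait results ("f"/"m") and reference sexes ("F"/"M").
results = ["f", "f", "m", "f", "m"]
reference = ["F", "F", "M", "M", "M"]
print(empirical_posterior(results, reference, "f", "F"))
```

In the toy series, five of the ten individuals pass the 95% threshold, so the sketch returns a utility of 0.5 even though the rule is confident for five of the eight measurable individuals. This mirrors the point made in the text that a trait can be accurate and still have a low utility when it is measurable in few individuals.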
TABLE 11 Utilities of metric and morphological trait groups and trait ensembles

Trait groups | Utility | N | Number of traits | Effective number of traits | Accuracy | Misclassification rate
Region specific trait groups
Cranium (morph) | 0.62 | 530 | 11 | 6.8 | 0.90 | 0.05
Os occipitale (metric) | 0.08 | 358 | 8 | 3.7 | 0.66 | 0.12
Mandibula (morph) | 0.01 | 484 | 5 | 3.5 | 0.71 | -
Cervical vertebra (metric) | 0.07 | 290 | 2 | 1.5 | 0.80 | 0.04
Scapula (metric) | 0.30 | 315 | 4 | 2.3 | 0.92 | 0.01
Pelvis (metric) | 0.47 | 415 | 20 | 7.4 | 0.91 | 0.01
Pelvis (morph) | 0.58 | 455 | 22 | 10 | 0.92 | 0.04
Patella (metric) | 0.08 | 155 | 4 | 1.8 | 0.86 | -
Talus (metric) | 0.14 | 308 | 6 | 2.3 | 0.85 | 0.05
Calcaneus (metric) | 0.08 | 203 | 4 | 2.0 | 0.77 | -
Subsets of morphological pelvis traits
Ferembach | 0.55 | 450 | 8 | 3.8 | 0.92 | 0.03
Bruzek | 0.53 | 443 | 11 | 6.0 | 0.89 | 0.05
Phenice | 0.22 | 149 | 3 | 2.6 | 0.94 | 0.06
Ensembles of selected traits from different regions
Cranium (morph) + Pelvis (morph) + Mandibula (morph) (Original proposal of Ferembach et al., 1980) | 0.72 | 598 | 24 | 11.7 | 0.95 | 0.02
Cranium (morph) + Pelvis (morph) | 0.70 | 544 | 6 | 4.0 | 0.95 | 0.02
Cranium (morph) + Pelvis (morph) + Mandibula (morph) | 0.73 | 597 | 16 | 9.9 | 0.95 | 0.01
Cranium (morph) + Pelvis (morph) + Pelvis (metric) | 0.71 | 556 | 12 | 7.0 | 0.95 | 0.01
Cranium (morph) + Pelvis (morph) + Scapula (metric) | 0.74 | 561 | 14 | 7.3 | 0.95 | 0.02
Cranium (morph) + Pelvis (morph) + Pelvis (metric) + Scapula (metric) | 0.74 | 563 | 16 | 8.2 | 0.96 | 0.01
Cranium (morph) + Pelvis (morph) + Scapula (metric) + Talus (metric) | 0.74 | 574 | 16 | 7.9 | 0.95 | 0.01
Cranium (morph) + Pelvis (morph) + Mandibula (morph) + Scapula (metric) | 0.74 | 583 | 16 | 8.5 | 0.95 | 0.01
Cranium (morph) + Pelvis (morph) + Pelvis (metric) + Scapula (metric) + Talus (metric) | 0.72 | 568 | 19 | 8.1 | 0.97 | 0.01
Cranium (morph) + Mandibula (morph) + Pelvis (morph) + Pelvis (metric) + Scapula (metric) | 0.75 | 590 | 17 | 8.8 | 0.96 | 0.01
Large ensembles
All morphological traits | 0.73 | 598 | 29 | 12.9 | 0.94 | 0.02
All metric traits | 0.56 | 534 | 48 | 13.0 | 0.93 | 0.02
All metric and morphological traits | 0.76 | 616 | 77 | 23.8 | 0.98 | 0.01

Note: See Table S3 for the traits included in each trait
group. N: number of individuals for whom at least one trait of the trait group was measurable.

applicable in only about 50% of all subjects, resulting in a utility of just 0.30.

The trait groups with a high utility already consist of 10 or more traits; therefore, it is of interest to find subsets with fewer traits but still with a high utility. Bruzek (2002), Ferembach et al. (1980), and Phenice (1969) included subsets of pelvic traits in their proposal. The first subset allowed us to still reach a utility of 0.52 involving only 11 traits, and the second 0.55 based on eight traits. The three traits considered by Phenice are not sufficient to reach a high utility.

An essential step, however, may be to consider rules combining different regions, as we can then increase the likelihood of measuring at least some traits in numerous individuals. Ferembach et al. (1980) already suggested combining morphological traits from the pelvis, the cranium, and the mandible within an ensemble. Indeed, for this ensemble, we obtained a utility of 0.72 based on 24 traits. We tried to extend this idea to various combinations of metric and morphological traits from different anatomical regions, with the traits within each anatomical region being selected based on their accuracy. In these ensembles, we observed utilities from 0.70 up to 0.75. (The traits selected for each ensemble are listed in Table S3.)

Finally, to explore the maximum utility to be reached, we considered all the metric, all the morphological, and all the traits together as ensembles, giving us utility totals of 0.76 (all traits), 0.73 (all morphological traits), and 0.55 (all metric traits).

The maximal utility of 0.76 implies that for a quarter of the skeletal individuals, no sex estimation was possible. It is interesting to note that among the corresponding 160 individuals, all traits were immeasurable in only 59. A total of 43 were measurable in traits from only one of the 10 anatomical regions considered in Table 11 and 17 in two regions. However, as many as 41 were measurable in traits from at least three different regions and therefore had a sound basis for a sex diagnosis.

To summarize, we observed that when combining between 12 and 19 traits from three or more anatomical regions in one ensemble, we came very close to the optimal utility reachable when considering all traits.

4 | DISCUSSION

To the best of our knowledge, this is the first paper presenting a systematic investigation of the utility of combining metric and morphological traits from different anatomical regions. In order to assess the utility (or usefulness) of an ensemble of traits, we considered a simple measure: the probability of arriving at a sex estimation with a posterior probability above 95%. In our investigation, we initially studied the utility of a variety of morphological and metric traits and several scores suggested in the literature before we started investigating ensembles of traits built across different anatomical regions and/or combining morphological and metric traits.

4.1 | The benefits of using traits from different anatomical regions

Our investigation corroborates the simple expectation that combining traits from different anatomical regions increases the chance of obtaining a definitive sex estimation: If several regions are involved, the likelihood of observing a sufficient number of traits in an individual increases. In our specific situation, we already achieved the maximally possible utility of 0.76 when combining traits from three regions. Single traits rarely reach such high utilities. This also holds for many published scores combining several traits. When considering traits from single anatomical regions, the maximum utility was 0.47 for metric traits and 0.62 for morphological traits. Overall, these results can be seen as an extension of a recent finding by Nikita and Nikitas (2020), who observed a substantial increase in accuracy when combining pelvic and cranial traits.

Making efficient use of combining traits from different regions requires flexible rules that assign a sex estimate even if some of the traits are not measurable. This is the basic limitation of several published scores based on several traits, as they require that each single trait is measurable. Consequently, they may have a high accuracy but nevertheless a limited utility.

4.2 | Generalizability and limitations

Our results on the utility are based on one series of archeological skeletons from a fully excavated early medieval site dating from the sixth to eighth century AD and the computation of posterior probabilities using data from this site. We can expect similar results if we consider other series, if the traits in such a series have a similar accuracy and a similar rate of being measurable, and if the series is large enough to develop a series-specific reference standard, for example, based on an LCA.

With respect to determining the utility of trait groups, we made use of a rather simple statistical approach to compute posterior probabilities, namely, LDA. It cannot be excluded that, in particular in the case of large trait groups, further improvements are possible by using more sophisticated techniques. For the single traits, we did not investigate aspects like the intra- and interobserver variability or the influence of age at death in detail, which would be essential when discussing the value of single traits for sex estimation. Instead, we focused on the utility of their combined use in daily practice.

The numerical values of the utility depend on the threshold used for the posterior probability, in our case 95%. Other choices will lead to different values, but it is unlikely that this will change the overall results. Recently, Jerkovic et al. (2020) suggested working with lower thresholds to increase the number of individuals with a sex estimation and to aim for an overall accuracy of 95% at a population level, instead of at an individual level. This would imply a different definition of utility resulting in higher numerical values.

4.3 | Using rules developed in external reference populations

Nonetheless, if we are interested in single individuals or small series, we are still confronted with the question of how well a rule developed in one series will perform in another series (Bašic et al., 2017; Sierp & Henneberg, 2015). This question, regarding external validity (cf. Bruzek & Murail, 2006), is not addressed by our considerations on the utility of ensembles of traits; however, our additional investigations presented in Appendices S1 and S2 do shed some light on this question.

First, we systematically investigated population differences between our population and those used in the original publications. For some metric and morphological traits, we observed a nearly perfect match. However, the most frequent pattern was a general shift in the population distribution, that is, a difference in the mean values affecting both sexes to a similar degree. The direction of the shift was, in the majority of cases, more towards “male” phenotypes compared with the original population. This may point to a generally more robust anatomy of the Mannheim population when measured against populations in the original publications, which were mainly from Europe or had at least a European ancestry (South African Whites). Alongside population specificity of sexual dimorphism, which depends on environmental factors (Bejdová et al., 2013), temporal consistency is a crucial factor, because secular trends strongly affect the skeletal morphology (e.g., Guyomarc'h et al., 2016; Walker, 2008).

Such population differences, however, need not reflect true differences in phenotypes as they may be due to differences in the application of the measurement methods. For example, the observer of this study (DN) tended to regard fewer individuals as indefinite for many of the morphological traits than in the original publications.

Second, we investigated the robustness of the posterior probabilities computed by DSP2. In contrast to our study, DSP2 was developed based on a population pooled from many different samples, also taking into account interpopulation differences. Consequently, the standard deviations used should be larger than those in any single population. We could corroborate this expectation with respect to our population: for all of the 16 single traits covered by DSP2, the population standard deviation in our sample was smaller than in the original sample, and the average reduction was 18.3%. The combination of a cautious rule with such artificially increased population standard deviations should limit the likelihood of achieving false sex estimations in spite of population differences for the single traits. Indeed, we observed this robustness for the DSP2: in spite of population differences (all in the same direction) for four out of the eight bilateral traits covered by DSP2, we observed few misclassifications. However, whether this property of DSP2 holds in general requires further investigations.

4.4 | Which traits should be used?

The desirable features of traits to be used for sex estimation like adequate population variability, low intra- and interobserver variation, validity over a wide age-at-death range, and nondestructive and objective assessment have been widely discussed in the literature (Boldsen et al., 2015; Bruzek & Murail, 2006; Ferembach et al., 1980; Inskip et al., 2019; Krüger et al., 2017; Rösing et al., 2007, etc.). By considering the traits proposed in previous publications, we implicitly assumed that they fulfilled such basic requirements. Indeed, the focus of our paper was less on single traits and more on the utility of combining traits from different anatomical regions. However, our investigation is limited by the a priori selection of specific anatomical regions, which do not cover all regions with an established value for sex estimation. For example, some traditional measurements, such as the joints of the long bones (e.g., femur), were intentionally not covered in order to focus on less frequently used features. Within the metric traits considered, the pelvis and the scapula showed the highest utility. Within the morphological traits considered, the cranium and the pelvis showed the highest utility.

One remarkable result of this investigation was the higher utility of morphological compared with metric traits. Though the value of morphological traits for sex estimation is widely recognized among anthropologists, they are often regarded as less valuable than metric traits as their assessment is more subjective, potentially counterbalancing the higher utility (cf. Santos et al., 2019; Sierp & Henneberg, 2015). Nevertheless, our population comparisons revealed no basic difference between metric and morphological traits with respect to the probability of arriving at a match between the actual and original population. Because we considered a skeletal population with a relatively good state of preservation, we may, however, have overestimated the value of morphological traits. Metric traits are often seen as extending the possible range of estimation in poorly preserved individuals, but this was not necessary in many of the individuals in our population.

4.5 | Can we estimate the sex of all skeletal individuals?

A very poor state of preservation will always limit the possibility of performing sex estimations with a high degree of confidence. That said, our investigations suggest that even for a small fraction of individuals (here about 40 out of 675), sex estimation may remain uncertain, in spite of the sound basis provided by the availability of traits from several anatomical regions.

Of course, the number of uncertain individuals also depends on the threshold used for posterior probability. The alternative approaches mentioned above will hence lead to different values.

4.6 | Outlook

We have provided clear evidence that combining traits from different anatomical regions and allowing the combination of morphological and metric traits is a useful strategy to allow sex estimation in as many individuals as possible with high confidence and limited efforts; however, the question remains about which regions should be preferred and how we can arrive at sex estimation methods applicable across a wide range of populations. The ranking of anatomical regions and the ranking of traits within regions would be helpful to transfer the insights from this investigation into concrete guidance for future practice. In the absence of a large population enabling the development of population-specific discriminant scores, there is a need to determine the most suitable traits for sex estimation across populations and to develop tools similar to DSP2 to allow posterior probabilities for any measurable subset of these traits to be computed. These tasks require access to large-scale anthropological databases covering multiple populations and a broad range of traits. As shown in this investigation, it might not be necessary for the sex of the individuals to be known.

5 | CONCLUSION

Assessing biological sex in human skeletons, one of the most important pieces of information about an individual, is a crucial step within forensic and bioarchaeological research. When selecting a limited number of traits for systematic sex estimation in a series of skeletal remains, it is advisable to select traits from different anatomical regions in order to obtain a reasonably sure result in as many individuals as possible. In our approach, the maximally possible utility of 0.76 was reached when combining traits from three anatomical regions, whereas single traits never reach such high utilities. The development of more precise recommendations about the choice of regions and the applicability of rules developed in external reference populations requires large scale investigations, covering all relevant anatomical regions and involving a wide range of populations.

ACKNOWLEDGMENTS

The study was funded by a scholarship of Studienstiftung des Deutschen Volkes. The authors thank the reviewers for their helpful comments.

CONFLICT OF INTEREST

The authors declare that there is no conflict of interest.

DATA AVAILABILITY STATEMENT

Data available on request from the authors.

ORCID

Denise U. Navitainuck https://orcid.org/0000-0002-6082-0778
Werner Vach https://orcid.org/0000-0003-1865-8399
Kurt W. Alt https://orcid.org/0000-0001-6938-643X
Jörg Schibler https://orcid.org/0000-0003-2290-3553

REFERENCES

Albanese, J., Cardoso, H. F. V., & Saunders, S. R. (2005). Universal methodology for developing univariate sample-specific sex determination methods: An example using the epicondylar breadth of the humerus. Journal of Archaeological Science, 32, 143–152.
Alt, K. W., Rieger, S., Vach, W., & Krekeler, G. (1995). Odontometrische Geschlechtsbestimmung. Evaluierung frühmittelalterlicher Bestattungen. Zeitschrift für Rechtsmedizin, 5(3), 82–87.
Bašic, Ž., Kružic, I., Jerkovic, I., Anđelinovic, D., & Anđelinovic, Š. (2017). Sex estimation standards for medieval and contemporary Croats. Croatian Medical Journal, 58(3), 222–230.
https://doi.org/10.3325/cmj.2017.58.222
Bejdová, Š., Krajíček, V., Velemínska, J., Horák, M., & Velemínský, P. (2013). Changes in the sexual dimorphism of the human mandible during the last 1200 years in Central Europe. Homo, 64, 437–453.
Bidmos, M. A., & Dayal, M. R. (2003). Sex determination from the Talus of South African Whites by discriminant function analysis. American Journal of Forensic Medicine and Pathology, 24, 322–328.
Bidmos, M. A., & Dayal, M. R. (2004). Further evidence to show population specificity of discriminant function equations for sex determination using the talus of South African Blacks. Journal of Forensic Sciences, 49, 1165–1170.
Bidmos, M. A., Steinberg, N., & Kuykendall, K. L. (2005). Patella measurements of South African Whites as sex assessors. Homo, 56, 69–74.
Boldsen, J. L., Milner, G. R., & Boldsen, S. K. (2015). Sex estimation from modern American humeri and femora, accounting for sample variance structure. American Journal of Physical Anthropology, 158(4), 745–750. https://doi.org/10.1002/ajpa.22812
Brather, S. (2008). Kleidung, Bestattung, Identität. Die Präsentation sozialer Rollen im frühen Mittelalter. In S. Brather (Ed.), Zwischen Spätantike und Frühmittelalter. Archäologie des 4. bis 7. Jahrhunderts im Westen (pp. 237–273). Berlin; New York: Walter de Gruyter.
Bruzek, J. (2002). A method for visual determination of sex, using the human hip bone. American Journal of Physical Anthropology, 117, 157–168.
Bruzek, J., & Murail, P. (2006). Methodology and reliability of sex determination from the skeleton. In Forensic Anthropology and Medicine (pp. 225–242). Totowa, NJ: Humana Press.
Bruzek, J., Santos, F., Dutailly, B., Murail, P., & Cunha, E. (2017). Validation and reliability of the sex estimation of the human os coxae using freely available DSP2 software for bioarchaeology and forensic anthropology. American Journal of Physical Anthropology, 164, 440–449. https://doi.org/10.1002/ajpa.23282
Buikstra, J. E., & Ubelaker, D. H. (Eds.).
(1994). Standards for data collection from human skeletal remains. Arkansas Archeological Survey Research Series 44.
Christensen, A. M., Passalacqua, N. V., & Bartelink, E. J. (2014). Sex estimation. In Forensic Anthropology. Current Methods and Practice (pp. 199–222). Academic Press.
Dar, G., & Hershkovitz, I. (2006). Sacroiliac joint bridging: Simple and reliable criteria for sexing the skeleton. Journal of Forensic Sciences, 51, 480–483.
Ferembach, D., Schwidetzky, I., & Stloukal, M. (1980). Recommendation for age and sex diagnoses of skeletons. Journal of Human Evolution, 9, 517–549.
Gapert, R., Black, S., & Last, J. (2009). Sex determination from the occipital condyle: Discriminant function analysis in an eighteenth and nineteenth century British sample. American Journal of Physical Anthropology, 138, 384–394.
Garcia-Campos, C., Martinón-Torres, M., Martín-Francés, L., Martínez de Pinillos, M., Modesto-Mata, M., Perea-Pérez, B., Zanolli, C., Labajo González, E., Sanchez Sanchez, J. A., Ruiz Mediavilla, E., Tuniz, C., & Bermúdez de Castro, J. M. (2018). Contribution of dental tissues to sex determination in modern human populations. American Journal of Physical Anthropology, 166, 459–472.
Guyomarc'h, P., Velemínska, J., Sedlak, P., Dobisíkova, M., Svenkrtova, I., & Bruzek, J. (2016). Impact of secular trends on sex assessment evaluates through femoral dimensions of the Czech population. Forensic Science International, 262, 284.e1–284.e6.
Hansen, J. L., & Alt, K. W. (2012). An exceptional case of dental calculus in a Merovingian skeleton from Mannheim-Seckenheim. Bulletin of the International Association for Paleodontology, 6, 70–75.
Helmuth, H., & Rempe, U. (1968). Über den Geschlechtsdimorphismus des Epistropheus beim Menschen. Zeitschrift für Morphologie und Anthropologie, 59, 300–321.
Henke, W. (1977). On the method of discriminate function analysis for sex determination of the skull. Journal of Human Evolution, 6, 95–100.
Hora, M., & Sladek, V. (2018).
Population specificity of sex estimation from vertebrae. Forensic Science International, 291, 279.e1–279.e12.
Inskip, S., Scheib, C. L., Wohns, A. W., Ge, X., Kivisild, T., & Robb, J. (2019). Evaluating macroscopic sex estimation methods using genetically sexed archaeological material: The medieval skeletal collection from St John's Divinity School, Cambridge. American Journal of Physical Anthropology, 168(2), 340–351. https://doi.org/10.1002/ajpa.23753
Introna, F., di Vella, G., Campobasso, C. P., & Dragone, M. (1997). Sex determination by discriminant analysis of Calcanei measurements. Journal of Forensic Sciences, 42, 725–728.
Jerkovic, I., Bašic, Ž., Anđelinovic, Š., & Kružic, I. (2020). Adjusting posterior probabilities to meet predefined accuracy criteria: A proposal for a novel approach to osteometric sex estimation. Forensic Science International, 311, 110273.
Kemkes-Grottenthaler, A., Löbig, F., & Stock, F. (2002). Mandibular ramus flexure and gonial eversion as morphologic indicators of sex. Homo, 53, 97–111.
Koch, U. (1977). Das Reihengräberfeld bei Schretzheim (1st ed.). Berlin: Mann.
Koch, U. (2004). Das merowingerzeitliche Gräberfeld im Hermsheimer Bösfeld, Mannheim-Seckenheim. Archäologische Ausgrabungen in Baden-Württemberg, 2003, 155–157.
Koch, U. (2007). Mannheim unter fränkischer Herrschaft. Die merowingerzeitlichen Grabfunde aus dem Stadtgebiet. Die rechtsrheinischen Gebiete im fränkischen Merowingerreich. In Probst (Ed.), Die Frankenzeit: Der archäologische Befund. Teil 1. Aus der Mannheimer Namenkunde (Vol. 2) (pp. 10–15). Pustet (Mannheim vor der Stadtgründung): Regensburg, Verlag Friedrich Pustet.
Koch, U. (2013). Das merowingerzeitliche Gräberfeld auf dem Hermsheimer Bösfeld. Chancen und Aufgaben. In S. Brather & D. L. Krausse (Eds.), Fundmassen. Innovative Strategien zur Auswertung frühmittelalterlicher Quellenbestände (pp. 51–64). Baden-Württemberg 97, Darmstadt: Materialh. Arch.
Koch, U., & Wirth, K. (2005).
Gefolgschaftskrieger des fränkischen Königs – das Gräberfeld auf dem Hermsheimer Bösfeld in Mannheim-Seckenheim. Archäologische Ausgrabungen in Baden-Württemberg, 2004, 199–202.
Krüger, G. C., L'Abbé, E. N., & Stull, K. E. (2017). Sex estimation from the long bones of modern South Africans. International Journal of Legal Medicine, 131(1), 275–285. https://doi.org/10.1007/s00414-016-1488-z
Link, T. (2003). Zwischen Adlern und Hamstern: fränkische Gräber im Hermsheimer Bösfeld, Mannheim-Seckenheim. Archäologische Ausgrabungen in Baden-Württemberg, 2002, 163–165.
Lovejoy, C. O., Meindl, R. S., Pryzbeck, T. R., & Mensforth, R. P. (1985). Chronological metamorphosis of the auricular surface of the ilium: A new method for the determination of adult skeletal age at death. American Journal of Physical Anthropology, 68, 15–28.
Meindl, R. S., & Lovejoy, C. O. (1985). Ectocranial suture closure: A revised method for the determination of skeletal age at death based on the lateral-anterior sutures. American Journal of Physical Anthropology, 68, 57–66.
Meyer, C., & Alt, K. W. (2012). Die Steinkistengräber vom Hermsheimer Bösfeld, Mannheim-Seckenheim: Bioarchäologische Charakterisierung der menschlichen Skelettfunde eines frühmittelalterlichen Gräberfeldes. In Krohn und Koch. (Ed.), Grosso Modo. Quellen und Funde aus Spätantike und Mittelalter; Festschrift für Gerhard Fingerlin (Vol. 6) (pp. 165–179). 1. Aufl. Weinstadt: Greiner, Bernhard A.
Meyer, C., Wirth, K., & Alt, K. W. (2014). Gold, Gewalt und Gebrechen. Die Beziehung zwischen sozialem Status und traumatischem Skelettbefund im frühen Mittelalter am Beispiel des Hermsheimer Bösfelds, Mannheim-Seckenheim. In T. Link & H. Peter-Röcher (Eds.), Gewalt und Gesellschaft. Dimensionen der Gewalt in ur- und frühgeschichtlicher Zeit. Internationale Tagung vom 14.–16. März 2013 an der Julius-Maximilians-Universität Würzburg. Universitätsforschungen zur Prähistorischen Archäologie (Vol. 259) (pp. 65–79).
Miles, A. E. W. (1963). The dentition in the assessment of individual age in skeletal material. In Dental Anthropology (Brothwell ed.) (pp. 191–209). New York: Pergamon Press.
Murail, P., Bruzek, J., Houët, F., & Cunha, E. (2005). DSP: A tool for probabilistic sex diagnosis using worldwide variability in hip-bone measurements. Bulletins et Mémoires de la Société d'Anthropologie de Paris, 17, 167–176.
Navitainuck, D., Meyer, C., & Alt, K. W. (2013). Degenerative alterations of the spine in an Early Mediaeval population from Mannheim-Seckenheim, Germany. Homo, 64, 179–189.
Nemeskéri, J., Harsanyi, L., & Acsadi, G. (1960). Methoden zur Diagnose des Lebensalters von Skelettfunden. Anthropologischer Anzeiger, 24(1), 70–95.
Nikita, E., & Nikitas, P. (2020). Sex estimation: a comparison of techniques based on binary logistic, probit and cumulative probit regression, linear and quadratic discriminant analysis, neural networks, and naïve Bayes classification using ordinal variables. International Journal of Legal Medicine, 134, 1213–1225. https://doi.org/10.1007/s00414-019-02148-4
Novotny, V. (1975). Diskriminanzanalyse der Geschlechtmerkmale auf dem Os coxae beim Menschen. XIII Czechoslovakian Anthropological Congress Brno, 23.
Özer, I., Katayama, K., Sağir, M., & Güleç, E. (2006). Sex determination using the scapula in medieval skeletons from East Anatolia. Collegium Antropologicum, 30, 415–419.
Passalacqua, N. V., Zhang, Z., & Pierce, S. J. (2013). Sex determination of human skeletal populations using latent profile analysis. American Journal of Physical Anthropology, 151, 538–543.
Phenice, T. W. (1969). A newly developed visual method of sexing the os pubis. American Journal of Physical Anthropology, 30, 297–302.
Rösing, F. W., Graw, M., Marré, B., Ritz-Timme, S., Rothschild, M. A., Rötzscher, K., Schmeling, A., Schröder, I., & Geserick, G. (2007). Recommendations for the forensic diagnosis of sex and age from skeletons. Homo, 58(1), 75–89.
https://doi.org/10.1016/j.jchb.2005.07.002
Santos, F., Guyomarc'h, P., & Bruzek, J. (2014). Statistical sex determination from craniometrics: Comparison of linear discriminant analysis, logistic regression, and support vector machines. Forensic Science International, 245, 204.e1–204.e8.
Santos, F., Guyomarc'h, P., Rmoutilova, R., & Bruzek, J. (2019). A method of sexing the human os coxae based on logistic regressions and Bruzek's nonmetric traits. American Journal of Physical Anthropology, 169, 435–447.
Sierp, I., & Henneberg, M. (2015). The difficulty of sexing skeletons from unknown populations. Journal of Anthropology, 908535 (13 pp.). https://doi.org/10.1155/2015/908535
Steele, D. G. (1976). The estimation of sex on the basis of the talus and calcaneus. American Journal of Physical Anthropology, 45, 581–588.
Stloukal, M. (1982). Probleme der paläodemographischen Analyse unter besonderer Berücksichtigung der Alters- und Geschlechtsbestimmung am Skelett. Jahrbuch des Römisch-Germanischen Zentralmuseums Mainz, 29, 1–12.
Szilvássy, J. (1988). Altersdiagnose am Skelett. In Knußmann (Ed.), Anthropologie. Handbuch der vergleichenden Biologie des Menschen (pp. 421–443). Stuttgart: Fischer.
Todd, T. W. (1920). Age changes in the pubic bone. I. The male White pubis. American Journal of Physical Anthropology, 3, 285–334.
Ubelaker, D. H., & Volk, C. G. (2002). A test of the Phenice method for the estimation of sex. Journal of Forensic Sciences, 47, 19–24.
Walker, P. L. (2008). Sexing skulls using discriminant function analysis of visually assessed traits. American Journal of Physical Anthropology, 136, 39–50.
Wirth, K., Koch, U., & Rosendahl, W. (2007). Tatort Bösfeld. Die Entdeckung eines sensationellen Gräberfeldes. Badische Heimat, 87, 166–176.
SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of this article.
How to cite this article: Navitainuck, D. U., Vach, W., Alt, K. W., & Schibler, J.
(2021). Best practice for osteological sexing in forensics and bioarchaeology: The utility of combining metric and morphological traits from different anatomical regions. International Journal of Osteoarchaeology, 1–14. https://doi.org/10.1002/oa.3014