Towards Development of Clustering Applications for Large-Scale Comparative Genotyping and Kinship Analysis Using Y-Short Tandem Repeats

Ali Seman; Azizian Mohd Sapawi; Mohd Zaki Salleh

doi:10.1089/omi.2014.0136

OMICS. 2015 Jun;19(6):361-7. doi: 10.1089/omi.2014.0136. Epub 2015 May 6.

Authors

Ali Seman¹,², Azizian Mohd Sapawi², Mohd Zaki Salleh¹

Affiliations

¹ Integrative Pharmacogenomics Institute (iPROMISE), Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), Shah Alam, Selangor, Malaysia.
² Center for Computer Science Studies, Faculty of Computer and Mathematical Sciences, Universiti Teknologi MARA (UiTM), Shah Alam, Selangor, Malaysia.

Abstract

Y-chromosome short tandem repeats (Y-STRs) are genetic markers with practical applications in human identification. However, where mass identification is required (e.g., in the aftermath of disasters with significant fatalities), the efficiency of the process could be improved with new statistical approaches. Clustering applications are relatively new tools for large-scale comparative genotyping, and the k-Approximate Modal Haplotype (k-AMH), an efficient algorithm for clustering large-scale Y-STR data, represents a promising method for developing these tools. In this study we improved the k-AMH and produced three new algorithms: the Nk-AMH I (including a new initial cluster center selection), the Nk-AMH II (including a new dominant weighting value), and the Nk-AMH III (combining I and II). The Nk-AMH III was the superior algorithm, with mean clustering accuracy that increased in four out of six datasets and remained at 100% in the other two. Additionally, the Nk-AMH III achieved a 2% higher overall mean clustering accuracy score than the k-AMH, as well as optimal accuracy for all datasets (0.84-1.00). With inclusion of the two new methods, the Nk-AMH III produced an optimal solution for clustering Y-STR data; thus, the algorithm has potential for further development towards fully automatic clustering of any large-scale genotypic data.
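
The abstract reports results as clustering accuracy scores between 0 and 1 (for example, 0.84-1.00) but does not define the metric. As a rough illustration only, the sketch below computes one common definition: the fraction of objects that fall in the majority class of their assigned cluster. This definition, the function name `clustering_accuracy`, and the toy labels are assumptions made for illustration. They are not taken from the paper and do not reproduce the k-AMH or Nk-AMH algorithms.

```python
from collections import Counter


def clustering_accuracy(true_labels, cluster_ids):
    """Fraction of objects that share the majority class of their cluster.

    Assumed definition (not confirmed by the abstract): each cluster is
    mapped to its most frequent true class, and every object carrying
    that class counts as correctly clustered.
    """
    if len(true_labels) != len(cluster_ids):
        raise ValueError("true_labels and cluster_ids must have equal length")
    if not true_labels:
        raise ValueError("at least one object is required")

    # Count how often each true class occurs inside each cluster.
    class_counts = {}
    for label, cluster in zip(true_labels, cluster_ids):
        class_counts.setdefault(cluster, Counter())[label] += 1

    # Sum the size of the majority class over all clusters.
    matched = sum(counts.most_common(1)[0][1] for counts in class_counts.values())
    return matched / len(true_labels)


# Hypothetical toy data: six individuals from three known groups.
true_groups = ["A", "A", "A", "B", "B", "C"]
assigned_clusters = [0, 0, 1, 1, 1, 2]

accuracy = clustering_accuracy(true_groups, assigned_clusters)
print(f"clustering accuracy: {accuracy:.2f}")
```

Under this definition, a score of 1.00 means every cluster contains members of a single known group. Because two clusters may map to the same class, a stricter one-to-one matching between clusters and classes could give a lower score.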

Publication types

Research Support, Non-U.S. Gov't

MeSH terms

Algorithms
Genotype
Haplotypes / genetics
Humans
Microsatellite Repeats / genetics*
Pedigree