An accurate inference of orthologs is essential in many research fields such as comparative genomics, molecular evolution, and genome annotation. Existing methods for genome-scale orthology inference are mostly based on all-versus-all similarity searches that scale quadratically with the number of species. This limits their application to the increasing number of available large-scale datasets. Here, we present Hieranoid, a new orthology inference method using a hierarchical approach. Hieranoid performs pairwise orthology analysis using InParanoid at each node in a guide tree as it progresses from its leaves to the root. This concept reduces the total runtime complexity from a quadratic to a linear function of the number of species. The tree hierarchy provides a natural structure in multi-species ortholog groups, and the aggregation of multiple sequences allows for multiple alignment similarity searching techniques, which can yield more accurate ortholog groups. Using the recently published orthobench benchmark, Hieranoid showed the overall best performance. Our progressive approach presents a new way to infer orthologs that combines efficient graph-based methodology with aspects of compute-intensive tree-based methods. The linear scaling with the number of species is a major advantage for large-scale applications and makes Hieranoid well suited to cope with vast amounts of sequenced genomes in the future. Hieranoid is an open source and can be downloaded at Hieranoid.sbc.su.se.
Copyright © 2013 Elsevier Ltd. All rights reserved.