Population geneticists have long sought to understand the selection traces left in the goat genome by natural selection and by human-driven selection and breeding practices. To pinpoint selection signals in the Pakistani Dera-Din-Panah (DDP) goat breed, whole-genome pooled sequencing (n=12) was performed and 618,236,192 clean paired-end reads were mapped against the ARS1 reference goat assembly. Five selection-signal statistics were applied: four Site-Frequency Spectrum (SFS) methods (Tajima's D (TD), Fay & Wu's H (H), Zeng's E (E) and Pool-HMM) and one Reduced Local-Variability approach, pooled heterozygosity (Hp). Regions under selection were annotated at significance thresholds of -ZTD≥4.7, -ZH≥6, -ZE≥2.5, Pool-HMM≥12 and -ZHp≥5, which yielded a cumulative 364 candidate gene hits. The highest signals were observed on Chr. 4, 6, 10, 12, 15, 16, 18, 20 and 27, which harbor ADAMTS6, CWC27 genes associated with body height, RELN, MYCBP2,...
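The pooled-heterozygosity scan named above can be illustrated with a minimal Python sketch. It assumes the commonly used window definition Hp = 2·Σn_MAJ·Σn_MIN / (Σn_MAJ + Σn_MIN)², followed by a Z-transformation across windows; the abstract does not state the exact formula or window size, and the read counts below are hypothetical.

```python
from statistics import mean, pstdev


def pooled_heterozygosity(major_counts, minor_counts):
    """Hp of one window from per-SNP major and minor allele read counts."""
    n_maj = sum(major_counts)
    n_min = sum(minor_counts)
    total = n_maj + n_min
    if total == 0:
        raise ValueError("window contains no allele counts")
    return 2 * n_maj * n_min / total ** 2


def z_transform(values):
    """Standardise values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population SD; a sample SD is also plausible
    if sigma == 0:
        raise ValueError("all windows have identical Hp")
    return [(v - mu) / sigma for v in values]


# Hypothetical windows: (major allele counts, minor allele counts) per SNP.
windows = [
    ([30, 28, 25], [20, 22, 25]),
    ([50, 49, 50], [0, 1, 0]),   # nearly fixed window: very low Hp
    ([27, 26, 30], [23, 24, 20]),
    ([35, 25, 28], [15, 25, 22]),
]

hp = [pooled_heterozygosity(maj, mnr) for maj, mnr in windows]
neg_z = [-z for z in z_transform(hp)]  # large -ZHp suggests reduced local variability

for index, (h, z) in enumerate(zip(hp, neg_z)):
    print(f"window {index}: Hp={h:.4f}  -ZHp={z:.2f}")
```

In a real scan, windows whose -ZHp exceeds the chosen cut-off (5 in the abstract) would be carried forward to annotation; four toy windows cannot reach such a value.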
Pashmina and Barbari are two well-known goat breeds found across the Indo-Pak region. Pashmina is famous for its long hair-fiber (Cashmere) production, while Barbari has not been selected for this trait. mRNA expression profiling of skin samples from both breeds is therefore an attractive approach for detecting putative genes involved in this valued trait. Here, we performed differential gene expression analysis on publicly available RNA-Seq data from both breeds. Of the 44,617,994 filtered Pashmina reads and 55,995,999 filtered Barbari reads, 76.48% and 73.69%, respectively, mapped to the ARS1 reference transcriptome assembly. A pairwise comparison of the two breeds resulted in 47,159 normalized expressed transcripts, of which 8,414 are differentially expressed above the significance threshold. Among these, 4,788 are upregulated in Pashmina and 3,626 are upregulated in Barbari. Fifty-nine transcripts harbor 57 genes including 32 LOC genes ...
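The split of differentially expressed transcripts into the two directions can be sketched as below. This is an illustration only: the abstract does not give the significance threshold, so the `alpha`, `min_abs_log2fc` and `pseudocount` values are hypothetical, as is the function name.

```python
import math


def classify_transcript(mean_pashmina, mean_barbari, padj,
                        alpha=0.05, min_abs_log2fc=1.0, pseudocount=1.0):
    """Label one transcript by direction of change (thresholds are assumptions)."""
    log2fc = math.log2((mean_pashmina + pseudocount) / (mean_barbari + pseudocount))
    if padj >= alpha or abs(log2fc) < min_abs_log2fc:
        return "not significant"
    return "up in Pashmina" if log2fc > 0 else "up in Barbari"


# Counts reported in the abstract: the two directions add up to the total.
up_pashmina, up_barbari, total_de = 4788, 3626, 8414
print(up_pashmina + up_barbari == total_de)
print(classify_transcript(100.0, 10.0, 0.001))
```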
Background: The Galaxy web-based platform for Next Generation Sequencing (NGS) data analysis provides opportunities to characterize, analyze and computationally visualize genomic landscapes with limited resources. We explored an NGS data-analysis pipeline on the Galaxy platform, chosen for its relative accessibility, reproducibility, transparency and scalability. Methods: Variant calling and associated workflows were executed on NGS pooled-seq data of 12 Pakistani Teddy goats. The tools used in this pipeline are FastQC for quality checks, Trimmomatic for trimming data, SAM/BAM tools for file-format conversion, Picard tools for marking duplicates, VCFtools/FreeBayes for genomic variant detection and SnpSift for annotating the variants. Results: Filtering retained 43,712 highly associated, functionally non-trivial loci carrying 87,510 alleles. In addition, 1,548 variants comprising 1,134 SNPs, 23 mixed variants, 76 MNPs, 183 insertions and 132 deletions were...
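The variant classes listed in the results can be told apart from the REF and ALT alleles of a VCF record. The sketch below is a simplified illustration, not the logic of FreeBayes or SnpSift; in particular, treating a site whose alternate alleles fall into different classes as "MIXED" is an assumption made here.

```python
def variant_type(ref, alts):
    """Classify a VCF site from its REF allele and list of ALT alleles."""
    def single(ref_allele, alt_allele):
        if len(ref_allele) == len(alt_allele):
            return "SNP" if len(ref_allele) == 1 else "MNP"
        return "INS" if len(alt_allele) > len(ref_allele) else "DEL"

    kinds = {single(ref, alt) for alt in alts}
    return kinds.pop() if len(kinds) == 1 else "MIXED"


# Class counts reported in the abstract; they sum to the stated 1,548 variants.
reported = {"SNP": 1134, "MIXED": 23, "MNP": 76, "INS": 183, "DEL": 132}
print(sum(reported.values()))
print(variant_type("A", ["G"]), variant_type("ATT", ["A"]))
```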
Background: R is a widely used open-source programming language developed by the scientific community to compute, analyze and visualize large datasets from any field, including biomedical research and bioinformatics applications. Methods: Here, we outline R packages and affiliated bioinformatics infrastructures, e.g. Bioconductor and CRAN. The basic concepts of factors, vectors and data matrices are discussed, and whole-transcriptome RNA-Seq data are analyzed. In particular, a differential expression workflow was performed on simulated prostate cancer RNA-Seq data, covering experimental design, data normalization, hypothesis testing and downstream investigations with the edgeR package. A few genes with ectopic expression were retrieved, and an approach to gene-enrichment pathway analysis using available online tools is highlighted. Results: A (4×3) data matrix was constructed, and a complex data matrix from Golub et al. was analyzed through χ² statistics by generating...
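As an illustration of a χ² test of independence on a small 4×3 data matrix, here is a minimal sketch. It is written in plain Python rather than R, the matrix values are hypothetical, and it assumes every expected cell count is non-zero.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical 4x3 matrix (4 rows, 3 columns).
matrix = [
    [12, 7, 9],
    [5, 14, 6],
    [8, 8, 15],
    [10, 6, 4],
]
statistic, dof = chi_square(matrix)
print(f"chi-square = {statistic:.3f}, degrees of freedom = {dof}")
```

A 4×3 table has (4-1)×(3-1) = 6 degrees of freedom; the p-value would then come from the χ² distribution with that many degrees of freedom.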
Mitochondrial Encephalohepatopathy (MEH) is an autosomal recessive neurodevelopmental disorder usually accompanied by microcephaly, white matter changes, and cardiac and hepatic failure. Here, we applied a whole-exome sequencing (WES) framework to trio family data comprising unaffected non-consanguineous parents and a proband (neonate girl) with this inherited disorder. A total of 2,928,402 variants were observed: 2,613,746 SNPs, 112,336 multiple nucleotide polymorphisms (MNPs), 72,610 insertions, 113,207 deletions and 16,503 mixed variants. These variations are responsible for 82,813,631 effects on various genomic regions. Our pipeline screened these variants for candidate gene mutations and retained 5,277 variants harboring 3,598 genes, of which 8 genes code for non-coding RNA and 178 genes carry variants of high impact severity. Among these 178 variants, 125 are de novo variants not previously reported in the ClinVar database. Consistent with previous studi...
Advances in next generation sequencing (NGS) technologies, their cost-effectiveness and well-developed pipelines built on computational tools have allowed researchers to make ground-breaking discoveries in multi-omics data analysis. However, the massive upsurge in parallel tools makes it difficult to choose a best-practice pipeline for expression profiling of RNA sequencing (RNA-seq) data. Here, we detail an optimized pipeline that runs quickly and accurately on a personal computer rather than on cloud or high-performance computing (HPC) clusters. The steps include quality checking, base filtration, quasi-mapping, quantification of samples, estimation and counting of transcript/gene expression abundances, identification and clustering of differentially expressed features, and visualization of the data. The tools FastQC, Trimmomatic, Salmon and some scripts from the Trinity toolkit were applied to two paired-end datasets. An extensi...
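The abundance-estimation step can be illustrated with the standard transcripts-per-million (TPM) normalisation. This is a simplified sketch: quantifiers such as Salmon estimate effective transcript lengths and resolve multi-mapping reads, whereas the read counts and lengths below are hypothetical and taken as given.

```python
def tpm(read_counts, lengths):
    """Transcripts per million from per-transcript read counts and lengths (bp)."""
    if len(read_counts) != len(lengths):
        raise ValueError("counts and lengths must have the same number of entries")
    rates = [count / length for count, length in zip(read_counts, lengths)]
    total_rate = sum(rates)
    if total_rate == 0:
        raise ValueError("no reads assigned to any transcript")
    return [rate / total_rate * 1e6 for rate in rates]


# Hypothetical transcripts: read counts and lengths in base pairs.
counts = [500, 1000, 250]
lengths = [1000, 4000, 500]
for name, value in zip(["tx1", "tx2", "tx3"], tpm(counts, lengths)):
    print(f"{name}: {value:.1f} TPM")
```

Dividing by length before scaling means a long transcript does not look more highly expressed merely because it collects more reads.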
SNP chip-based genome-wide association studies (GWAS) are a fast scanning method for mapping variations within the genome and associating them with specific diseases or traits. This association information has improved the prospects for disease diagnosis, for locating causative variants and for strategies to identify the associated genes. GWAS have laid the foundation of an era in which both personalized medicine and pharmacogenomics will be reinforced, along with a better understanding of the functional genomics aspects of modern molecular genetics. Since the advent of the first GWAS in 2002, thousands of genome-wide association studies have been published, establishing GWAS as a successful methodology for identifying significant variants in disease/trait association, but applying GWAS outcomes in clinical settings demands further evaluation of their validity. Here, we have divided the GWAS approach into various aspects including history, development, analysis strategies, ap...
An in-silico WES approach using the Galaxy platform was adopted in the current study to predict the genetic basis of Premature Ovarian Failure (POF) in a Saudi Arabian family of seven, in which three affected patients were found to be associated with X-linked recessive mutations. The analysis discovered 518,054 variants using the FreeBayes variant caller, which had 1,461,864 effects on variable sites in the genome as revealed by the SnpEff software. The causal genetic mutations were filtered and annotated with the ClinVar database using the GEMINI tool, which retained 369 pathogenic mutations harboring 130 genes. Of these, 268 variants on 69 genes are shared by all three affected individuals, 61 variants on 23 genes are shared by any two of the affected individuals, and 40 variants on 38 genes are present in only one of the affected samples. Two mutations in POF1B, a gene already associated with POF, were also observed, namely (i) g.84563135T>A; p.M349L and (ii) g.84563194...
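The partition of retained variants by how many affected individuals carry them can be sketched as follows. The sample names and variant identifiers are hypothetical; only the reported totals come from the abstract.

```python
from collections import Counter


def sharing_counts(carriers_by_variant, affected):
    """Count variants by the number of affected individuals that carry them."""
    affected_set = set(affected)
    counts = Counter()
    for carriers in carriers_by_variant.values():
        shared_by = len(affected_set & set(carriers))
        if shared_by:
            counts[shared_by] += 1
    return counts


# Hypothetical carriers per variant; "U1" is an unaffected relative.
affected = ["P1", "P2", "P3"]
carriers_by_variant = {
    "var_a": ["P1", "P2", "P3"],
    "var_b": ["P1", "P3", "U1"],
    "var_c": ["P2"],
    "var_d": ["U1"],
}
print(dict(sharing_counts(carriers_by_variant, affected)))

# Totals reported in the abstract: the three groups add up to 369 variants and 130 genes.
print(268 + 61 + 40 == 369, 69 + 23 + 38 == 130)
```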
A set of 784 recently submitted SARS-nCoV2 whole genome sequences from the NCBI Virus database was used to construct a phylogenetic tree and examine their similarities. In the resulting cladogram, the Pakistani strain MT240479 (Gilgit1-Pak) was found in close proximity to MT184913 (CruiseA-USA), while the second Pakistani strain MT262993 (Manga-Pak) neighbored the MT039887 (WI-USA) strain. Four whole genome SARS-nCoV2 sequences, those that appeared as nearest relatives in the cladogram constructed a week earlier, were then taken for variant calling analysis. The two Pakistani strains, each of 29,836 bases, were compared against MT263429 (WI-USA) of 29,889 bases and MT259229 (Wuhan-China) of 29,864 bases. We identified 31 variants in both Pakistani strains, (Manga-Pak vs USA=2del+7SNPs, Manga-Pak vs Chinese=2del+2SNPs, Gilgit1-Pak vs USA=10SNPs, Gilgit1-Pak vs Chinese=8SNPs), which caused alteration in ORF1ab, ORF1a and N genes with having function...
The COVID-19 pandemic had resulted in 8,578,283 total cases and 456,286 deaths worldwide as of June 19, 2020. We previously analysed genomic variants in two Northern Pakistani SARS-nCoV2 strains against USA and Chinese reference strains, and hypothesized a putative role of the observed variants in the low severity of COVID-19 in Pakistan. Given the high variation rate of this virus, we further analysed the whole genome of the Southern Pakistani SARS-nCoV2 strain MT500122 (Karachi-Pak) against NC_045512 (Wuhan1-China) and observed 4 variants (3 SNPs, 1 deletion): a deletion at g.1604 (del ND447N) and SNPs at g.1912 (p.=), g.10582 (p.=) and g.26022 (p.=), located in the ORF1ab and ORF3a genes. ORF1ab encodes a polyprotein that is processed into 16 non-structural proteins (nsp1-16) and plays a role in viral replication. The codon-changing deletion in its sequence (as observed in MT500122) might have caused conformational alterations, particularly in the nsp2 and nsp5 structures, which may obstruct their effectiveness. ORF3a is unique to SARS-n...
Whole genome pooled sequence data of 12 Pakistani Teddy goats are analyzed for positive selection signatures underlying their breed-defining characteristics. Selection imprints left in the Teddy genome are unveiled by genomic differentiation after the successful paired-end alignment of 635,357,043 reads to the ARS1 reference genome assembly. Pooled heterozygosity (Hp) and Tajima's D (TD) are applied to validate and refine the selection signals, while pairwise FST statistics are computed for Teddy vs. Bezoar (the wild goat ancestor) to assess genomic differentiation. Annotation of the regions under positive selection reveals 59 genes underlying production and adaptive traits. A pooled-heterozygosity score ≥ 5 detected six windows with the highest scores on Chr. 29, 9, 25, 15 and 14, which harbor the HRASLS5, LACE1 and AXIN1 genes, candidates for embryonic development, lactation and body height. Secondly, a TD value of ≤ -2.2 identified 4 windows with very strong hits on Chr. 5 & 9, which harbor the STIM1 and ADM genes related to body ...
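For orientation, the classical Tajima's D statistic can be computed as in the sketch below. It uses the standard formula for individually sequenced samples; pooled data normally require additional corrections, so this is not necessarily the computation used in the study, and the input values are hypothetical.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Classical Tajima's D from sample size n, segregating sites S and pairwise diversity pi."""
    if n < 4:
        raise ValueError("the variance term is zero for n < 4")
    if seg_sites <= 0:
        raise ValueError("at least one segregating site is required")
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Hypothetical window: 10 sequences, 10 segregating sites, low pairwise diversity.
print(round(tajimas_d(10, 10, 1.5), 3))
```

A strongly negative D indicates an excess of rare variants relative to the neutral expectation, which is why the abstract uses a low cut-off (TD ≤ -2.2) to flag candidate windows.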