Tapash Ghosh


    SARS-CoV-2 has caused a global pandemic that has cost an enormous number of human lives in the recent past. The present study investigates viral codon adaptation, ORF stability and tRNA co-adaptation with humans. We observed that, with respect to codon usage bias in the viral ssRNA, the ORFs have folding free energies and codon adaptation index (CAI) values close to those of the mRNAs of human housekeeping gene (HKG) CDSs. However, the correlation between ORF stability and CAI in the ssRNA is stronger than that between mRNA stability and CAI in HKGs, suggesting a greater expression capacity of SARS-CoV-2. Mutational analysis reveals polymorphism in the ORF1ab, surface glycoprotein and nucleocapsid phosphoprotein ORFs. Non-synonymous mutations showed non-polar substitutions. Of the twelve mutations, nine are toward a higher tRNA copy number. Viruses in general have high mutation rates. To understand the chances of survival of the mutated SARS-CoV-2, we simulated synonymous mutations. It resulted...
    Codon usage bias (CUB) and mRNA structural stability are important intrinsic features of mRNA that correlate positively with mRNA expression level. However, it remains unclear whether the mRNA expression level can be regulated by adjusting these two parameters and thereby influencing the mRNA structure. Here we explored the influence of CUB and mRNA structural stability on mRNA expression levels in Saccharomyces cerevisiae, using both wild-type and computationally mutated mRNAs. Although both CUB and mRNA stability positively regulate the mRNA expression level in the wild type, any deviation from the natural situation breaks this equilibrium. The naturally occurring codon composition is responsible for optimizing mRNA expression, and under this composition the mRNA structure with the highest stability is selected by nature.
    The effective number of codons (Nc) and its variant N'c (effective number of codons prime) are two widely used measures of the unequal usage of synonymous codons in coding sequences, known as codon usage bias (CUB). The mathematical formula used to calculate Nc and N'c gives inaccurate measures of CUB when amino acids occur at low abundance. In addition, the magnitude of the error varies with codon degeneracy. In this study, a modified formula for Nc and N'c has been developed to measure CUB more accurately. Online implementations of the modified formula are available in the web portal at http://agnigarh.tezu.ernet.in/~ssankar/cub.php.
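To illustrate why low amino acid abundance can be a problem, the following is a minimal sketch of the per-amino-acid homozygosity F from Wright's original Nc formulation, F = (n * sum(p_i^2) - 1) / (n - 1). It does not reproduce the modified formula described above; the function name and the toy codon counts are assumptions made for illustration only.

```python
from typing import Sequence


def homozygosity(counts: Sequence[int]) -> float:
    """Wright's codon homozygosity F for one amino acid.

    counts holds the observed count of each synonymous codon.
    F = (n * sum(p_i^2) - 1) / (n - 1), where n is the total count
    and p_i is the relative frequency of codon i.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("F is undefined when the amino acid occurs fewer than twice")
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return (n * sum_p2 - 1) / (n - 1)


# Hypothetical counts for a two-fold degenerate amino acid (not data from the study).
print(homozygosity([50, 50]))  # well sampled, both codons used equally
print(homozygosity([1, 1]))    # same equal usage, but only two occurrences
print(homozygosity([2, 0]))    # two occurrences, one codon only
```

With only two occurrences, F can take only the extreme values 0 or 1, so a rare amino acid contributes a very coarse estimate to the averaged F that Nc inverts. This is one plausible reading of the low-abundance issue, not a statement of the authors' derivation.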
    Compositional distributions at the three codon positions, as well as codon usage biases, of all available DNA sequences of the Buchnera aphidicola genome have been analyzed. The GC levels at the three codon positions follow the order I > II > III, as observed in other extremely AT-rich organisms. B. aphidicola, being an AT-rich organism, is expected to have A and/or T at the third positions of codons. Overall codon usage analyses indicate that A- and/or T-ending codons are predominant in this organism and that some particular amino acids are abundant in the coding regions of genes. However, multivariate statistical analysis indicates two major trends in the codon usage variation among the genes: one is strongly correlated with the GC content at the third synonymous positions of codons, and the other is associated with the expression level of genes. Moreover, codon usage biases of the highly expressed genes are almost identical with the overall codon usage biases of all the...
    Pseudogenes, the nonfunctional homologs of functional genes and thus described as 'genomic fossils', provide intriguing snapshots of the evolutionary history of the human genome. These defunct copies generally arise by retrotransposition or duplication followed by various genetic disablements. In this study, focusing on human pseudogenes and their functional homologs, we describe their characteristic features and relevance to protein sequence evolution. We recapitulate that pseudogenes harbor disease-causing degenerative sequence variations in conjunction with the immense disease gene association of their progenitors. Furthermore, we discuss the issue of functional resurrection and the potential observed in some pseudogenes to regulate their functional counterparts.
    To date, numerous studies have attempted to determine the extent of variation in evolutionary rates between human disease and nondisease (ND) genes. In the present study, we considered human autosomal monogenic (Mendelian) disease genes, which were classified into two groups according to the number of phenotypic defects: specific disease (SPD) genes (one gene: one defect) and shared disease (SHD) genes (one gene: multiple defects). We compared the evolutionary rates of these two groups, SPD genes and SHD genes, with respect to ND genes. We observed that the average evolutionary rates are slow in the SHD group, intermediate in the SPD group, and fast in the ND group. Group-to-group evolutionary rate differences remain statistically significant regardless of gene expression level and number of defects. We demonstrated that disease genes are under strong selective constraint if they emerge through edgetic perturbation or drug-induced perturbation...
    A compositional analysis of a set of human genes classified into several functional classes was performed. We found that the GC3, i.e. the GC level at the third codon positions, of the genes involved in cellular metabolism was significantly higher than that of the genes involved in information storage and processing. Analyses of human/Xenopus orthologous genes showed that: (i) the GC3 increment of the genes involved in cellular metabolism was significantly higher than that of the genes involved in information storage and processing; and (ii) a strong correlation between the GC3 and the corresponding GCi, i.e. the GC level of introns, was found in each functional class. The non-randomness of the GC increments favours the selective hypothesis of gene/genome evolution.
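The GC3 measure defined above can be sketched in a few lines. This is a minimal illustration that assumes an in-frame coding sequence and counts every third codon position; the function name and the example sequences are hypothetical, and refinements such as excluding stop codons or non-degenerate codons are deliberately left out.

```python
def gc3(cds: str) -> float:
    """Fraction of G or C bases at third codon positions of an in-frame CDS."""
    seq = cds.upper()
    if len(seq) == 0 or len(seq) % 3 != 0:
        raise ValueError("expected a non-empty sequence whose length is a multiple of 3")
    third_positions = seq[2::3]  # every third base, starting at index 2
    gc_count = sum(1 for base in third_positions if base in "GC")
    return gc_count / len(third_positions)


# Hypothetical sequences, for illustration only.
print(gc3("ATGGCC"))     # codons ATG, GCC: third positions G and C
print(gc3("ATGAAATTT"))  # codons ATG, AAA, TTT: third positions G, A, T
```

The same slicing idea applied to an intron sequence without the step of 3 would give a GCi-style value, since GCi is simply the GC level over all intronic positions.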
    Background: Pseudogenes, the nonfunctional homologues of functional genes, are now coming to light as important resources for the study of human protein evolution. Processed pseudogenes, which arise by reverse transcription and reinsertion, can provide a molecular record of the dynamics and evolution of genomes. Research on the progenitors of human processed pseudogenes has revealed that they are highly expressed and evolutionarily conserved. They are reported to be short and GC-poor, indicating their high efficiency for retrotransposition. In this article we focused on their high expressivity and explored the factors contributing to it and their relevance in the milieu of protein sequence evolution. Results: Here, we analyzed the high expressivity of these genes configuring processed or retropseudogenes by their immense connectivity in protein-protein interaction network, an inclination towards alternative splicing mechanism, a lower rate of mRNA disintegration and a slower evolutiona...
    Primary metabolism is essential to plants for growth and development, and secondary metabolism helps plants to interact with the environment. Many plant metabolites are industrially important. These metabolites are produced by plants through complex metabolic pathways. Lack of knowledge about these pathways hinders successful breeding for these metabolites. For a better knowledge of plant metabolism as a whole, understanding the evolutionary rate variation of primary and secondary metabolic pathway genes is a prerequisite. In this study, the evolutionary rate variation of primary and secondary metabolic pathway genes has been analyzed in the model plant Arabidopsis thaliana. Primary metabolic pathway genes were found to be more conserved than secondary metabolic pathway genes. Gene structure, expression level, tissue specificity, multifunctionality, and domain number are the key factors behind this evolutionary rate variation. This study will help to bet...
    When a cationic dye is added to a mixture of different anionic polymers, the dye will be distributed amongst them according to the binding strengths of the polymers. In such a competitive binding, DNA uses its intercalation ability to win the competition, provided the ligand can be intercalated within the DNA base pairs. For poorly or non‐intercalating ligands like 1,9‐dimethyl‐methylene blue or pinacyanol, the effective mode of binding is only electrostatic. For such ligands the strong polyelectrolyte poly(styrene sulfonate) wins the competition for cationic dyes, as shown by spectroscopic and dichroic probes. The results of the competitive binding of pinacyanol between the two weak chromotropes teichoic acid and DNA are not distinctly indicative, but the dichroic probe hints at a relatively stronger chromotropic ability of DNA.
    Figure S1. The differences between human small-scale and whole-genome duplicate pairs using the closest paralogs. Figure S2. Functional similarity between human small-scale duplicates with different sequence identity thresholds and whole-genome duplicate pairs. Figure S3. Subcellular co-localization between human small-scale and whole-genome duplicate pairs. Figure S4. Differences in gene expression correlation between human small-scale and whole-genome duplicate pairs. (PDF 662 kb)
    Background: Arabidopsis thaliana and Brassica rapa share a common evolutionary clade, but Brassica species experienced an extra whole genome triplication (WGT) event compared with the model plant A. thaliana. This extra round of WGT makes B. rapa more resistant to abiotic stress. The study aims to unravel how the consequences of whole genome duplication steer the variation in stress adaptation competency between the two species. Result: Comparing the duplication status of abiotic stress resistant (ASR) genes in the two species, a significantly higher number of paralogs was found in ASR genes of B. rapa than in those of A. thaliana. Investigation of the proteomic features suggests that the ohnolog pairs in both species are more enriched with intrinsically disordered residues (IDRs) than other duplicated pairs, but only in B. rapa do IDRs show a significant positive correlation with functional divergence between the duplicated pairs. The functional divergence helps to mediate more str...
    To reveal the factors influencing the architecture of protein-coding genes in staphylococcal phages, relative synonymous codon usage variation has been investigated in 920 protein-coding genes of 16 staphylococcal phages. As expected for AT-rich genomes, A- and T-ending codons predominate in all 16 phages. Both the Nc plot and correspondence analysis on relative synonymous codon usage indicate that mutation bias influences codon usage variation in the 16 phages. Correspondence analysis also suggests that translational selection and gene length influence the codon usage variation in the phages to some extent, and that codon usage in staphylococcal phages is phage-specific but not S. aureus-specific. Further analysis indicates that among the 16 staphylococcal phages, 44AHJD, P68 and K may be extremely virulent in nature, as most of their genes have high translation efficiency. If this is true, then these three phages may be useful for curing staphylococcal infections.
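Relative synonymous codon usage (RSCU) is conventionally the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below computes it for a single synonymous family; the helper names, the alanine codon list passed in, and the toy sequence are assumptions for illustration and are not taken from the phage data set.

```python
from collections import Counter
from typing import Dict, Sequence


def count_codons(cds: str) -> Counter:
    """Count in-frame codons of a coding sequence (trailing partial codon ignored)."""
    seq = cds.upper()
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))


def rscu(codon_counts: Dict[str, int], family: Sequence[str]) -> Dict[str, float]:
    """RSCU for one synonymous codon family.

    RSCU = observed count / (family total / number of synonymous codons).
    Returns an empty dict when the amino acid is absent, because the
    ratio is undefined in that case.
    """
    total = sum(codon_counts.get(codon, 0) for codon in family)
    if total == 0:
        return {}
    expected = total / len(family)
    return {codon: codon_counts.get(codon, 0) / expected for codon in family}


# Hypothetical example: four alanine codons in a toy sequence.
alanine = ["GCT", "GCC", "GCA", "GCG"]
counts = count_codons("GCTGCTGCAGCC")
print(rscu(counts, alanine))
```

A value above 1 marks a codon used more often than expected under equal usage, and the values within a family always sum to the number of synonymous codons. When an amino acid is very rare, a handful of codons determines the whole family's RSCU, which is worth keeping in mind when such values feed a correspondence analysis.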
    In humans, highly expressed genes contain shorter and fewer introns, which has been attributed to selection for economy in transcription and translation. In plants, on the other hand, highly expressed genes have been shown to be longer than lowly expressed genes. In this study, we analyzed the compositional influence on genome organization in both rice and human. We demonstrated that, among GC-rich rice genes, highly expressed genes are less compact than lowly expressed genes. In the GC-poor class, there is no difference in gene compactness between highly and lowly expressed genes. The scenario is different for human, however, where GC composition has no influence on how gene compactness varies with expression level. We also report that highly expressed GC-rich rice pre-mRNAs tend to form less stable secondary structures than those of lowly expressed genes. However, on removing intronic sequences, highly expressed mRNAs form more stable secondary structures than those of lowly expressed GC-rich genes. We suggest that in GC-rich rice genes, long introns are under selection for enhancing transcriptional efficiency by modulating pre-mRNA secondary structural stability. Thus the evolutionary mechanisms behind genome organization differ between these two genomes (human and rice).
    In this study, the codon usage bias of all experimentally known genes of Lactococcus lactis has been analyzed. Since Lactococcus lactis is an AT-rich organism, A and/or T is expected to occur at the third position of codons, and detailed analysis of overall codon usage data indicates that A- and/or T-ending codons are predominant in this organism. However, multivariate statistical analyses based both on codon counts and on relative synonymous codon usage (RSCU) show that a large number of genes presumed to be highly expressed cluster at one end of the first major axis, while the majority of the putatively lowly expressed genes cluster at the other end. C- and T-ending codons are significantly more frequent in the highly expressed genes than in the lowly expressed genes; C-ending codons are predominant in the duets of highly expressed genes, whereas T-ending codons are abundant in the quartets. The abundance of C- and T-ending codons in the highly expressed genes suggests that, besides compositional bias, translational selection also operates in shaping the codon usage variation among the genes in this organism, as observed in other compositionally skewed organisms. The second major axis generated by correspondence analysis on simple codon counts differentiates the genes into two distinct groups according to their hydrophobicity values, but the same analysis computed with RSCU values could not discriminate the genes according to their hydropathy values. This suggests that amino acid composition exerts constraints on codon usage in this organism. On the other hand, the second major axis produced by correspondence analysis on RSCU values differentiates the genes into two groups according to the synonymous codon usage for cysteine residues (the rarest amino acid in this organism), which is nothing but an artifactual effect induced by the RSCU values. Other factors, such as the length of the genes and the positions of the genes on the leading and lagging strands of replication, have practically no influence on the codon usage variation among the genes in this organism.
    Synonymous codon and amino acid usage biases have been investigated in 903 Mimivirus protein-coding genes in order to understand the architecture and evolution of the Mimivirus genome. As expected for an AT-rich genome, third codon positions of the synonymous codons of Mimivirus carry mostly A or T bases. It was found that codon usage bias in Mimivirus genes is dictated both by mutational pressure and by translational selection. Evidence shows that four factors, namely mean molecular weight (MMW), hydropathy, aromaticity and cysteine content, are mostly responsible for the variation of amino acid usage in Mimivirus proteins. Based on our observations, we suggest that genes involved in translation, DNA repair, protein folding, etc., were laterally transferred to Mimivirus long ago from living organisms and that over time these genes acquired the codon usage pattern of other Mimivirus genes under selection pressure.
    Biased usage of synonymous codons has long been explained from the perspective of cellular tRNA abundance. Taking advantage of publicly available gene expression data for Saccharomyces cerevisiae, a systematic analysis of codon and amino acid usage in two different coding regions, corresponding to the regular (helix and strand) and the irregular (coil) protein secondary structures, has been performed. Our analyses suggest that, apart from tRNA abundance, mRNA folding stability is another major evolutionary force shaping the codon and amino acid usage differences between the highly and lowly expressed genes in the S. cerevisiae genome and, surprisingly, that it depends on the coding regions corresponding to the secondary structures of the encoded proteins. This is a new paradigm in understanding codon usage in S. cerevisiae. Differential amino acid usage between highly and lowly expressed genes in the regions coding for the irregular protein secondary structure in S. cerevisiae is explained by the stability of the mRNA folded structure. Irrespective of the protein secondary structural type, the highly expressed genes always tend to encode cheaper amino acids in order to reduce the overall biosynthetic cost of producing the corresponding protein. This study supports the hypothesis that tRNA abundance is a consequence of, and not a reason for, the biased usage of amino acids between highly and lowly expressed genes.
    Regarding the existence of any specific correlation between optimal growth temperature and genomic GC levels, Musto et al. [FEBS Lett. 573 (2004) 73] recently analyzed 20 prokaryotic families and showed that in most of the families there is a positive correlation between these two parameters. On the basis of these results they claimed that optimal growth temperature is one of the factors that influence genomic GC composition in prokaryotes. In a subsequent article, Marashi and Ghalanbor [Biochem. Biophys. Res. Commun. 325 (2004) 381] demonstrated that the correlation values change substantially when a very few points in some of the families are excluded from the data set of Musto et al. [FEBS Lett. 573 (2004) 73]. However, Marashi and Ghalanbor did not provide any reason for this. The points they excluded are actually the outliers in the data set, which strongly affect the correlation coefficients. In a large data set, by contrast, the presence of outliers has hardly any effect on the correlation values. Marashi and Ghalanbor excluded points only from those families that have small sample sizes and observed a substantial change in the correlation coefficients. Therefore, we argue that any conclusion drawn from a small sample containing outliers is always questionable. Although the approach of Musto et al. is novel, one needs to be careful about flaws in the data set before making any generalization.
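The sensitivity of a correlation coefficient to a single outlier in a small sample can be shown with a short sketch. The numbers below are made-up toy values in arbitrary units, not data from Musto et al. or from Marashi and Ghalanbor, and the function name is an assumption; the Pearson coefficient is computed directly from its textbook definition.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient computed from its definition."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


# Toy small sample (hypothetical values): the last point is an outlier.
x = [1, 2, 3, 4, 10]
y = [2, 1, 2, 1, 10]

r_with_outlier = pearson_r(x, y)
r_without_outlier = pearson_r(x[:-1], y[:-1])
print(r_with_outlier, r_without_outlier)
```

In this toy sample the single extreme point turns a weakly negative relationship into a strongly positive one, which is the kind of behaviour that makes conclusions from small families with outliers hard to trust.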