Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

AIJRSTEM19-

American International Journal of Research in Science, Technology, Engineering & Mathematics Available online at http://www.iasir.net ISSN (Print): 2328-3491, ISSN (Online): 2328-3580, ISSN (CD-ROM): 2328-3629 AIJRSTEM is a refereed, indexed, peer-reviewed, multidisciplinary and open access journal published by I. Innovation and Research (IASIR), USA International Association of Scientific (An Association Unifying the Sciences, Engineering, and Applied Research) Modelling of an offline and online software for normalization of microarray data of gene expression by Perl, Bioperl and PerlTkand PerlCGI Gaurav Kumar Srivastava1, Dr. Santosh Kumar2, Dr. Himanshu Pandey3 Research Scholar, Maharishi University of Information Technology, Lucknow, Uttar Pradesh, INDIA. 2 Assosiate Professor, Maharishi University of Information Technology, Lucknow, Uttar Pradesh, INDIA. 3 Assistant Professor, BBDNIIT, Lucknow, Uttar Pradesh, INDIA. 1 ABSTRACT: B-Chip Reverence is an online database which isfreely accessible for microarray redundancy removal & normalizationand various data analysis techniques are applied on the data. This software accurately handle the massive amount of data.The growing use of DNA microarrays in biomedical research has led to the proliferation of analysis tools. These software programs address different aspects of analysis (e.g. normalization and clustering within and across individual arrays) as well as extended analysis methods (e.g. clustering, annotation and mining of multiple datasets).After studying all the terms and problems related to Microarray technique, we tried to make an-open and user friendly software to deal with all the problems and to run all the steps of this technique, so that we used Perl & Perl-cgi.perl-cgi stands for Common Gateway Interface, is a standard programming interface between Web servers and external programs.perl-cgi executes external programs on the webserver. I. INTRODUCTION A microarray database is a repository containing microarray gene expression data. The key uses of a microarray database are to store the measurement data, manage a searchable index, and make the data available to other applications for analysis and interpretation. The concept behind this, a microarray is a pattern of ssDNA probes which are immobilized on surface of chip or a slide. The probe sequences are designed and placed on an array in regular patter of spots. The chip or slide is usually made of glass or nylon and is manufactured using technologies developed for silicon computer chips. Each microarray chip is arranged as a checkerboard of 105 or 106 spots or features, each spot containing millions of copies of a unique DNA probe (often 25 nt long).Microarray technology allows the monitoring of expression levels for thousands of genes simultaneously. Even in replicated experiment, some variations are commonly observed. Normalization is the process of removing some sources of variation which affect the measured gene expression levels. In gene expression microarray data analysis, selecting a small number of discriminative genes from thousands of genes is an important problem for accurate classification of diseases or phenotypes.The ability of microarray chip to capture the expressional level of thousands of genes in one snapshot becomes a major attraction for biologists. By performing parallel microarray experiments under different conditions, biologists seek useful information of the underlying biological process that lies in the hundreds of thousands of data points obtained. The first step of task is classifies data through a single step partition. In such task,cluster the genes into biological meaningful groups according to their pattern of expression, based on the assumption that expressional similarity of genes implies their functional similarity. The clustering methods which are used for this include conventional clustering methods (such as k-means clustering, and self-organizing maps).k-means clustering is a simple and a divisive approach. In this method, data are partitioned into k-clusters, which are prespecified at the outset. Self-Organizing Maps is pattern recognition algorithm employs neural networks and based on the machine-learning method. B chip reverence database dealing the removing microarray redenduncy and normalization, for that using bioperl (kmean and SOM executed by cpan module) dreamweaver8 (for designing web pages on website) photoshop (for logo designing) PERL CGI (for programs to interface with information servers such as HTTP (web servers).  II. AIM & OBJECTIVES To remove duplicates repetitive and blank genes from our raw data. After removing redundancy, normalize the datasets using the PERL-CGI. AIJRSTEM 19-235; © 2019, AIJRSTEM All Rights Reserved Page 188 Srivastava et al., American International Journal of Research in Science, Technology, Engineering & Mathematics,26(1), March-May 2019, pp. 188-193  To make user friendly, open and easily accessible interactive CGI interface for database and various tools for analysis of clustering (k-mean and SOM) for microarray using PERL-CGI algorithms.         III. MATERIALS & METHODS Perl & Perl-cgi Perl is a programming language developed by Larry Wall, designed for text processing. ThoughPerl is not officially an acronym but many times it is used as it stands for Practical Extraction and Report Language. It runs on platforms like Windows, Mac OS, and UNIX. Perl CGI is the Common Gateway Interface, a standard for programs to interface with information servers such as HTTP (web) servers. CGI allows the HTTP server to run an executable program or script in response to a user request, and generate output on the fly. This allows web developers to create dynamic and interactive web pages. Perl is a very common language for CGI programming as it is largely platform independent and the language’s features make it very easy to write powerful applications. Bio Perl BioPerl is a collection of Perl modules that facilitate the development of Perl scripts for bioinformatics applications. It has played an integral role in the Human Genome Project. BioPerl is an active open source software project supported by the Open Bioinformatics Foundation. Self-Organizing Maps (SOM) and K-Means Clustering (KMC) As a machine-learning method, a SOM belongs to the category of neural networks. It provides a technique to visualize the HD input data on an output map of neurons. The map is often presented in a 2D grid of neurons.KMC is a simple and widely used partitioning method for data analysis. It’s helpfulness in discovering group of co-expressed genes has been demonstrated. Dreamweaver8 and WampServer Dreamweaver8 allows creating professional web pages and also quickly adding objects and functionality to pages without having to program the HTML code manually. Wamp Server is a windows web development environment for Apachey, MySQL, PHP databases. It’s also virtual server for windows platform, allows it user to manage Website and its components. IV. TECHNIQUES/DATABASES USED We studied information about genes through the method of Gene Ontology by Gene Cards, next we did Microarray data retrieval from NCBI Geo profiles of SMAD7. Data Normalization & Redundancy Removal of Gene Expression does with the help of Microsoft office excel. Next we studied Perl elementary and their different algorithms and logics in Perl includes:a) Regular Expression A regular expression is a string of characters that defines the pattern or patterns you are viewing. The syntax of regular expressions in Perl is very similar to what you will find within other regular expression.There are three regular expression operators within Perl.  Substitute Regular Expression - s///  Transliterate Regular Expression - tr///  Match Regular Expression - m// b) File Handling A file handle is a named internal Perl structure that associates a physical file with a name. All file handles are capable of read/write access, so you can read from and update any file or device associated with a file handle. However, when you associate a file handle, you can specify the mode in which the file handle is opened. Three basic file handles are - STDIN, STDOUT, and STDERR, which represent standard input, standard output and standard error devices respectively. c) Sub-Routines A Perl subroutine or function is a group of statements that together performs a task. You can divide up your code into separate subroutines. How you divide up your code among different subroutines is up to you, but logically the division usually is so each function performs a specific task. d) Parsing Parsing is the process of analyzing an input sequence (read from a file or a keyboard, for example) in order to determine its grammatical structure with respect to a given formal grammar. It is formally named syntax analysis. A parser is a computer program that carries out this task. And Normalization& Redundancy removal (to make data sorted and fine/removal of repeated genes and blank genes). AIJRSTEM 19-235; © 2019, AIJRSTEM All Rights Reserved Page 189 Srivastava et al., American International Journal of Research in Science, Technology, Engineering & Mathematics,26(1), March-May 2019, pp. 188-193    Next we did the major/main part of the Microarray technique i.e. clustering: the clustering problem of microarray data only as an analysis to find genes that behave similarly over the experimental conditions. The first generation of clustering techniques includes hierarchical clustering, K-Means clustering and Self-Organizing Maps, we used K-means clustering (KMC): helpful in discovering group of coexpressed genes,& Self-organization map (SOM): provides a technique to visualize the HD input data on an output map of neurons. Beside all the steps which we did, we also studied CGI, HTML, Perl toolkit packages etc. in which we used Perl-cgi to develop the online web server B-Chip reverence software for users to perform Microarray Technique computationally. We also used BioPerl, we correlated K-Means Clustering and Self-Organizing Maps (SOM) algorithms to perform clustering with Bio Perl. Bio Perl is a collection of Perl modules and it facilitates the development of Perl scripts for bioinformaticsapplications. V. RESULT Figure 1: Display page of Pre-Clustering Figure 2: Pre-Clustering Results and analysis example AIJRSTEM 19-235; © 2019, AIJRSTEM All Rights Reserved Page 190 Srivastava et al., American International Journal of Research in Science, Technology, Engineering & Mathematics,26(1), March-May 2019, pp. 188-193 Figure 3: KMeans-Clustering Results and analysis example Figure 4: SOM-Clustering Results and analysis example AIJRSTEM 19-235; © 2019, AIJRSTEM All Rights Reserved Page 191 Srivastava et al., American International Journal of Research in Science, Technology, Engineering & Mathematics,26(1), March-May 2019, pp. 188-193 Figure 5: Common Gene of SOM and KMC Clustering VI. CONCLUSION B-Chip Reverence is a database, specially design for in silico analysis of microarray. In the field of Bioinformatics, B-chip Reverence fulfills the basic needs during the online microarray analysis.hence is one of a kind of its innovative database which provides web server for microarray redundancy removal & normalizationand various data analysis techniques are applied on the data. VII. SUMMARY B-Chip Reverence is a online database which is freely accessible for microarray redundancy removal & normalization and various data analysis techniques are applied on the data. This software accurately handle the massive amount of data. The growing use of DNA microarrays in biomedical research has led to the proliferation of analysis tools. These software programs address different aspects of analysis (e.g. normalization and clustering within and across individual arrays) as well as extended analysis methods (e.g. clustering, annotation and mining of multiple datasets). After studying all the terms and problems related to Microarray technique, we tried to make an-open and user friendly software to deal with all the problems and to run all the steps of this technique, so that we used Perl & Perl-cgi. perl-cgi stands for Common Gateway Interface, is a standard programming interface between Web servers and external programs. perl-cgi executes external programs on the web server. We also used BioPerl, we correlated K-Means Clustering and Self-Organizing Maps (SOM) algorithms to perform clustering with Bio Perl. Bio Perl is a collection of Perl modules and it facilitates the development of Perl scripts for bioinformatics applications. And Dreamweaver8 used to create professional web pages and also quickly add objects and functionality to pages without having to program the HTML code manually. B-Chip Reverence is a database, specially design for in silico analysis of microarray. In the field of Bioinformatics, B-chip Reverence fulfills the basic needs during the online microarray analysis.hence is one of a kind of its innovative database which provides web server for microarray redundancy removal & normalization and various data analysis techniques are applied on the data. References [1] [2] [3] [4] [5] Vroh Bi I, McMullen MD, Sanchez-Villeda H, Schroeder S, Gardiner J, Polacco M, Soderlund C, Wing R, Fang Z, Coe EH., Jr Single nucleotide polymorphisms and insertion-deletions for genetic markers and anchoring the maize fingerprint contig physical map. Crop Sci. 2006;46:12–21. doi: 10.2135/cropsci2004.0706. [Cross Ref] Wright SI, Vroh Bi I, Schroeder SG, Yamasaki M, Doebley JF, McMullen MD, Gaut BS. The effects of artificial selection on the maize genome. Science. 2005;308:1310–1314. doi: 10.1126/science.1107891. [PubMed] [Cross Ref] Schwartz AS, Pachter L. Multiple alignment by sequence annealing. Bioinformatics. 2006;23:e24–e29. doi: 10.1093/bioinformatics/btl311. [PubMed] [Cross Ref] Ewing B, Green P. Basecalling of automated sequencer traces using phred. II. Error probabilities. Genome Res. 1998;8:186– 194. [PubMed] Ewing B, Hillier L, Wendl M, Green P. Basecalling of automated sequencer traces using phred. I. Accuracy assessment. Genome Res. 1998;8:175–185. [PubMed] AIJRSTEM 19-235; © 2019, AIJRSTEM All Rights Reserved Page 192 Srivastava et al., American International Journal of Research in Science, Technology, Engineering & Mathematics,26(1), March-May 2019, pp. 188-193 [6] [7] [8] [9] [10] [11] [12] [13] [14] [15] [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30] [31] Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucl Acids Res. 1994;22:4673–4680. doi: 10.1093/nar/22.22.4673. [PMC free article][PubMed] [Cross Ref] Edgar RC. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl Acids Res. 2004;32:1792–1797. doi: 10.1093/nar/gkh340. [PMC free article] [PubMed] [Cross Ref] Page RDM. TREEVIEW: An application to display phylogenetic trees on personal computers. Comp ApplBiosci. 1996;12:357– 358. [PubMed] Rozas J, Sánchez-DelBarrio SJ, Messeguer X, Rozas R. DnaSP, DNA polymorphism analyses by the coalescent and other methods. Bioinformatics. 2003;19:2496–2497. doi: 10.1093/bioinformatics/btg359. [PubMed] [Cross Ref] Hall TA. BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acids Symp Ser. 1999;41:95–98. Clamp M, Cuff J, Searle SM, Barton JG. The Jalview Java alignment editor. Bioinformatics. 2004;20:426–427. doi: 10.1093/bioinformatics/btg430. [PubMed] [Cross Ref] Pible O, Imbert G, Pellequer J-L. INTERALIGN: interactive alignment editor for distantly related protein sequences. Bioinformatics. 2005;21:3166–3167. doi: 10.1093/bioinformatics/bti474.[PubMed] [Cross Ref] Yamasaki M, Tenaillon MI, Vroh Bi I, Schroeder SG, Sanchez-Villeda H, Doebley JF, Gaut BS, McMullen MD. A large-scale screen for artificial selection in maize identifies candidate agronomic loci for domestication and crop improvement. Plant Cell. 2005;17:2859–2872. doi: 10.1105/tpc.105.037242. [PMC free article] [PubMed] [Cross Ref] Canaran P, Stein L, Ware D. Look-Align: An interactive web-based multiple sequence alignment viewer with polymorphism analysis support. Bioinformatics. 2006;22:885–886. doi: 10.1093/bioinformatics/btl028. [PubMed] [Cross Ref] Skolnick J, Zhang Y, Arakaki AK, Kolinski A, Boniecki M, Szilagyi A, Kihara D, (2003) TOUCHSTONE: A unified approach to protein structure prediction, Proteins (53 Suppl 6):469–479. Stark A., Sunyaev S., Russell R.B. (2003) A model for statistical significance of local similarities in structure. J. Mol. Biol. 2003;326:1307–1316. Sonnhammer, E., Eddy, S., and Durbin, R. (1997). Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins 28, 405–420. Tatusov, R., Galperin, M., Natale, D., and Koonin, E.V. (2000). The COG database: a tool for genome-scale analysis of protein functions and evolution. Nucleic Acids Res 28, 33–36. Pandey. H, Darbari. M and Singh. V.K, “Coalescence of Evolutionary Multi-Objective Decision making approach and Genetic Programming for Selection of Software Quality Parameter”, International Journal of Applied Information System (IJAIS), Foundation of Computer Science, New York, USA, Volume 7, No. 11, PP. ISSN: 2249-0868, Nov. 2014. Bansal. S and Pandey. H, “Develop Framework for selecting best Software Development Methodology”, International Journal of Scientific and Engineering Research, Volume 5, Issue 4, PP. 1067-1070, ISSN: 2229-5518, Apr. 2014. Srivastava. M and Pandey. H, “A Literature Review of E- Learning Model Based on Semantic Web Technology”, International Journal of Scientific and Engineering Research” Volume 5, Issue 10, PP. 174-178, ISSN: 2229-5518, Oct. 2014. Pandey. H and Singh. V.K, “A New NFA Reduction Algorithm for State Minimization Problem”, International Journal of Applied Information Systems (IJAIS), Foundation of Computer Science FCS, New York, USA, Volume 8, No.3, PP. 27-30, ISSN: 22490868, Feb. 2015. Pandey. H and Singh. V.K, “LR Rotation rule for creating Minimal NFA”, International Journal of Applied Information Systems (IJAIS), Foundation of Computer Science FCS, New York, USA, Volume 8, No.6, PP. 1-4, ISSN: 2249-0868, Apr. 2015. Pandey. H and Darbari. M, “Estimation of Software Quality Parameters Using Combination of Quality Function Deployment and Messy Genetic Algorithm”, Grenze Scientific Society, Associate publisher: McGraw-Hill Education, ISBN:978-93-392-2169-0. Feb. 2015. Darbari. M, Srivastava. G and Pandey. H, “New Assumption of Cognitive Model for Information Foraging on Web”, International Journal of Advances in Engineering & Technology, Volume 8, Issue 2, PP. 163-169, ISSN: 22311963, Apr. 2015. Pandey. H and Singh. V.K, “A Fuzzy Logic based Recommender System for E-Learning System with Multi-Agent Framework”, International Journal of Computer Applications, Foundation of Computer Science FCS, New York, USA, Volume 122, No.17, PP. 18-21, ISSN: 0975 – 8887, July. 2015. Rai. V and Pandey.H, “Estimation of Maintainability in Object Oriented Design Phase: State of the art”, International Journal of Scientific and Engineering Research” Volume 6, Issue 9, PP. 25-35, ISSN: 2229-5518, Sept. 2015. Tomita M, Hashimoto K, Takahashi K, Shimizu TS, Matsuzaki Y et. al.: E-Cell: software environment for whole-cell simulation. Bioinformatics 1999, 15:72-84. Von Dassow G, Meir E, Munro EM, Odell GM: The segment polarity network is a robust developmental module. Nature 2000, 406:188-92. Wilson CA, Kreychman J, Gerstein M (2000). Assessing annotation transfer for genomics: Quantifying the relations between protein sequence, structure and function through traditional and probabilistic scores, J MolBiol297(1):233–249. Wu, C., Huang, H., Yeh, L., and Barker, W. (2003).Protein family classification and functional annotation.CompBiolChem 27, 37–47. AIJRSTEM 19-235; © 2019, AIJRSTEM All Rights Reserved Page 193