Lab Report 1 Bioinformatics

The document is a lab report summarizing an experiment exploring biological databases with reference to NCBI. It provides background on biological databases and describes an aim to view and use various databases available online. The procedure involves exploring specific databases to analyze information on 7 different proteins from various organisms. For each protein, the report lists the number of search results, relevant accession numbers, FASTA sequences, and screenshots of database outputs.

SCHOOL OF PHARMACY

LAB REPORT 1

BIOLOGICAL DATABASES WITH REFERENCE TO NCBI

BIOINFORMATICS
(PBI20201P)
NAME: RABIATUL ADAWIYAH BINTI HASBULLAH
STUDENT ID: 012020091691

PROGRAMME:
BACHELOR OF PHARMACEUTICAL
TECHNOLOGY (BPHT)

LECTURERS:
AP DR SANTOSH FATTEPUR AND DR ALICIA NG

DATE OF SUBMISSION:
2nd DECEMBER 2020
Practical 1: Biological Databases with Reference to NCBI

Biological data are highly complex and interrelated. Vast amounts of biological information need
to be stored, organized and indexed so that the information can be retrieved and used. There
are five major types of biological databases: nucleotide databases, protein databases,
protein structure databases, metabolic pathway databases and bibliographic databases.

Introduction:

A database is a collection of biological data held on a computer that can be manipulated to
appear in varying arrangements and subsets. Biological information is stored in many
different databases, each with its own website and its own navigation tools. Biological
databases are, in general, publicly available, and can be broadly divided into two
categories: primary databases and secondary databases.

Primary databases, also called archival databases, are populated with experimentally derived
data such as nucleotide sequences, protein sequences, or macromolecular structures.
Experimental results are submitted directly into the database by researchers, and the data
are essentially archival in nature. Once given a database accession number, the data in
primary databases are never changed: they form part of the scientific record. Examples
include ENA, GenBank and DDBJ (nucleotide sequences), ArrayExpress Archive and GEO
(functional genomics data), and the Protein Data Bank (PDB; coordinates of
three-dimensional macromolecular structures).

Secondary databases contain data derived from the results of analysing primary data. They
often draw upon information from numerous sources, including other databases (primary and
secondary), controlled vocabularies, and the scientific literature. They are highly curated,
often using a complex combination of computational algorithms and manual analysis and
interpretation to derive new knowledge from the public record of science. Examples include
InterPro (protein families, motifs and domains), UniProt Knowledgebase (sequence and
functional information on proteins) and Ensembl (variation, function, regulation and more
layered onto whole genome sequences).

Biological databases are important for three reasons. First, they store and organize data so
that it can be retrieved easily using a variety of search criteria. Second, they permit
knowledge discovery, that is, the identification of connections between pieces of information
that were not known when the information was first entered. This facilitates the discovery of
new biological insights from raw data. Lastly, they handle cases where many users need to
access the same entries.

NCBI is now a leading source of public biomedical databases, software tools for analysing
molecular and genomic data, and research in computational biology. Today NCBI creates and
maintains over 40 integrated databases for the medical and scientific communities as well as
the general public. It provides a wide variety of data analysis tools, spanning literature,
health, genomes, genes, proteins, and chemicals, that allow users to manipulate, align,
visualize, and evaluate biological data.

Aim

To view and use the various biological databases available on the World Wide Web.
Procedure:

1. Open your web browser and type the web address of the required database.
2. Explore the database and analyze the various information available in the database.
3. Use the tools provided by the databases.
4. Take a screenshot of your output and paste it into MS Word.

You are required to explore the database and analyze the information available in it for the
following SEVEN (7) proteins:

i. Cystatin C (Organism: Human)
ii. G-protein (Organism: Daphnia magna)
iii. Keratin (Organism: Mus musculus)
iv. Albumin (Organism: Theobroma cacao)
v. Hemoglobin (Organism: Staphylococcus aureus)
vi. Collagen (Organism: Human)
vii. Myosin (Organism: Zea mays)
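
The browser searches in this practical can also be written as requests to the NCBI E-utilities. The sketch below is illustrative only and is not part of the original procedure. It builds an `esearch` URL against the `protein` database for each of the seven protein/organism pairs. The helper name `build_esearch_url`, the `retmax` value, the choice to restrict only the organism with the `[Organism]` field tag, and the use of "Homo sapiens" for "Human" are all assumptions made for the example. No request is sent.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities search endpoint.
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

# The seven protein/organism pairs from the practical.
# "Human" is written as the scientific name (assumption for this sketch).
PROTEINS = [
    ("Cystatin C", "Homo sapiens"),
    ("G-protein", "Daphnia magna"),
    ("Keratin", "Mus musculus"),
    ("Albumin", "Theobroma cacao"),
    ("Hemoglobin", "Staphylococcus aureus"),
    ("Collagen", "Homo sapiens"),
    ("Myosin", "Zea mays"),
]


def build_esearch_url(protein, organism, retmax=20):
    """Return an esearch URL for a free-text protein name limited to one organism.

    Hypothetical helper: it only builds the URL and does not contact NCBI.
    """
    term = f'{protein} AND "{organism}"[Organism]'
    params = {"db": "protein", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{ESEARCH}?{urlencode(params)}"


urls = [build_esearch_url(protein, organism) for protein, organism in PROTEINS]
for url in urls:
    print(url)
```

If one of these URLs is opened, the JSON response should report the total number of matching records in a `count` field. That figure may differ from the number shown on the website at the time of the practical, because the databases grow continuously.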
i. Cystatin C (Organism: Human)
• Total number of search items

• Number of search items based on top organisms

• Accession number

ACCESSION CAA29096

• Sequence in FASTA format

>CAA29096.1 cystatin C [Homo sapiens]
MAGPLRAPLLLLAILAVALAVSPAAGSSPGKPPRLV
GGPMDASVEEEGVRRALDFAVGEYNKASNDMYHSRALQVVRARKQIVAGVNYFLDVELG
RTTCTKTQPNLDNCPFHDQPHLKRKAFCSFQIYAVPWQGTMTLSKSTCQDA

• Graphics
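
A FASTA record consists of a header line beginning with ">" followed by one or more sequence lines. As a minimal sketch of how such a record can be read programmatically, the code below splits the header into accession and description and joins the sequence lines. The function names `parse_fasta` and `_make_record` are hypothetical, and the example uses the cystatin C record (accession CAA29096.1) with the header written on its own line, as the format requires.

```python
def _make_record(header, chunks):
    """Split a header into (accession, description) and join the sequence lines."""
    accession, _, description = header.partition(" ")
    return accession, description, "".join(chunks)


def parse_fasta(text):
    """Parse FASTA text into a list of (accession, description, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines left over from copy and paste
        if line.startswith(">"):
            if header is not None:
                records.append(_make_record(header, chunks))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append(_make_record(header, chunks))
    return records


CYSTATIN_C = """>CAA29096.1 cystatin C [Homo sapiens]
MAGPLRAPLLLLAILAVALAVSPAAGSSPGKPPRLV
GGPMDASVEEEGVRRALDFAVGEYNKASNDMYHSRALQVVRARKQIVAGVNYFLDVELG
RTTCTKTQPNLDNCPFHDQPHLKRKAFCSFQIYAVPWQGTMTLSKSTCQDA
"""

accession, description, sequence = parse_fasta(CYSTATIN_C)[0]
print(accession, description, len(sequence))
```

The same function should work for the other six records in this report once each header sits on a single line of its own.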
ii. G-protein (Organism: Daphnia magna)

• Total number of search items

• Number of search items based on top organisms

• Accession number

ACCESSION KZS12823

• Sequence in FASTA format

>KZS12823.1 putative G-protein coupled receptor Mth 5 [Daphnia magna]
MSEWIATQLLLLIIQLASSTALENLESDGPVVGRSSYWIHVQKCCPEAHMMVEVASRTTATT
HNTGSKFECQLQNDTTFLWAPDFLDEQNKLQAFEGSLDAESVDSDFNITKFCIPNGPCISAI
VGKPQCEPYRAWPIFTYAGFQEDLQLRPFGVLRHVVDKLAQVPRHHEYALDQYCVDGVRL
LGSVSHYGHGLPDGAENTSSAPLSPPSDGPVYYALICEPGPLDESDFFEMFVQVFYPIGLG
VGLLVLLTLTGIHLVLKELRDLSGCMLISLVVSMIVTLTSNLILSAADSRPSPYLNLLFLESVVH
GSDVAVHFWLSAIGHRAWTAVRFPRKEAMRPEGKRYVFYSLYSWGSAVCVTGLAVLVHFF
MEDQSLGTTSPHSFFTWYRIGWLGLALFCSTCVFLFLVNIYVYFATRSTLNNQTNYGRTFH
RTKGYFRAFTRLFLIVETVWIIQTFSWLQYPILVGLRILADITLPFLIFWAALKGRRVMHLLKIR
LQMSQCWLCRRCFPPKTRDRGHYYGEEMMALGTPI

• Graphics
iii. Keratin (Organism: Mus musculus)

• Total number of search items

• Number of search items based on top organisms

• Accession number

ACCESSION AAA39370

• Sequence in FASTA format

>AAA39370.1 keratin, partial [Mus musculus]
EVVKKQCIGVQDSIADAEQHGEHAIKDARGKLT
DLEEALQQCREDLARLLRDYQELMNTKLSLDVEIATYRKLLEGEECRMSGDFSDNVSVSITS
STISSSMASKTGFGSGGQSSGGRGSYGGRGGGGGGGSSYGSGGRSSGSRGSGSGSGG
GGYSSGGGSRGGSGGGYGSGGGSRGGSGGGYGSGGGSGSGGGYSSGGGSRGGSGG
GGASSGGGSRGGSSSGGGSRGGSSSGGGGYSSGGGSRGGSSSGGQDLALKREVLGQG
KVVAQV

• Graphics
iv. Albumin (Organism: Theobroma cacao)

• Total number of search items

• Number of search items based on top organisms

• Accession number

ACCESSION EOY34699

• Sequence in FASTA format

>EOY34699.1 Bifunctional inhibitor/lipid-transfer protein/seed storage 2S albumin superfamily protein, putative [Theobroma cacao]
MPTILLMENSSSPTTRNCPGPTHALYKKARASPIRGSHN
KLSGKNMPKVGLMLLLVSMLVAAVPRHVNAAITCEEVTYYLIPCIGYGVFGGTVAPSCCTGI
KTLDAAAKTTEDRREKCNCVKEGAARIPGLNYTRVNEIPGLRGTTCPYKVTPDVDCSKVN

• Graphics
v. Hemoglobin (Organism: Staphylococcus aureus)

• Total number of search items

• Number of search items based on top organisms

• Accession number

ACCESSION EFE26080

• Sequence in FASTA format

>EFE26080.1 hemoglobin [Staphylococcus aureus subsp. aureus 58-424]
MTTTPYDIIGKEALYDMIDYFYTLVEKDERLNHLFPGDFAETSRKQKQFLTQFLGGPNIYTEE
HGHPMLRKRHMAFTITEFERDAWLENMQTAINRAAFPQGVGDYLFERLRLTANHMVNS

• Graphics
vi. Collagen (Organism: Human)

• Total number of search items

• Number of search items based on top organisms

• Accession number

ACCESSION BAA04809

• Sequence in FASTA format

>BAA04809.1 collagen [Homo sapiens]
MHPGLWLLLVTLCLTEELAAAGEKSYGKPCGGQDCSGSCQCFPEKGARGRPGPIGIQGPT
GPQGFTGSTGLSGLKGERGFPGLLGPYGPKGDKGPMGVPGFLGINGIPGHPGQPGPRGP
PGLDGCNGTQGAVGFPGPDGYPGLLGPPGLPGQKGSKGDPVLAPGSFKGMKGDPGLPG
LDGITGPQGAPGFPGAVGPAGPPGLQGPPGPPGPLGPDGNMGLGFQGEKGVKGDVGLP
GPAGPPPSTGELEFMGFPKGKKGSKGEPGPKGFPGISGPPGFPGLGTTGEKGEKGEKGIP
GLPGPRGPMGSEGVQGPPGQQGKKGTLGFPGLNGFQGIEGQKGDIGLPGPDVFIDIDGAV
ISGNPGDPGVPGLPGLKGDEGIQGLRGPSGVPGLPALSGVPGALGPQGFPGLKGDQGNP
GRTTIGAAGLPGRDGLPGPPGPPGPPSPEFETETLHNKESGFPGLRGEQGPKGNLGLKGI
KGDSGFCACDGGVPNTGPPGEPGPPGPWGLIGLPGLKGARGDRGSGGAQGPAGAPGLV
GPLGPSGPKGKKGEPILSTIQGMPGDRGDSGSQGFRGVIGEPGKDGVPGLPGLPGLPGD
GGQGFPGEKGLPGLPGEKGHPGPPGLPGNGLPGLPGPRGLPGDKGKDGLPGQQGLPGS
KGITLPCIIPGSYGPSGFPGTPGFPGPKGSRGLPGTPGQPGSSGSKGEPGSPGLVHLPELP
GFPGPRGEKGLPGFPGLPGKDGLPGMIGSPGLPGSKGATGDIFGAENGAPGEQGLQGLT
GHKGFLGDSGLPGLKGVHGKPGLLGPKGERGSPGTPGQVGQPGTPGSSGPYGIKGKSGL
PGAPGFPGISGHPGKKGTRGKKGPPGSIVKKGLPGLKGLPGNPGLVGLKGSPGSPGVAGL
PALSGPKGEKGSVGFVGFPGIPGLPGISGTRGLKGIPGSTGKMGPSGRAGTPGEKGDRGN
PGPVGIPSPRRPMSNLWLKGDKGSQGSAGSNGFPGPRGDKGEAGRPGPPGLPGAPGLP
GIIKGVSGKPGPPGFMGIRGLPGLKGSSGITGFPGMPGESGSQGIRGSPGLPGASGLPGLK
GDNGQTVEISGSPGPKGQPGESGFKGTKGRDGLIGNIGFPGNKGEDGKVGVSGDVGLPG
APGFPGVAGMRGEPGLPGSSGHQGAIGPLGSPGLIGPKGFPGFPGLHGLNGLPGTKGTH
GTPGPSITGVPGPAGLPGPKGEKGYPGIGIGAPGKPGLRGQKGDRGFPGLQGPAGLPGAP
GISLPSLIAGQPGDPGRPGLDGERGRPGPAGPPGPPGPSSNQGDTGDPGFPGIPGFSGLP
GELGLKGMRGEPGFMGTPGKVGPPGDPGFPGMKGKAGARGSSGLQGDPGQTPTAEAV
QVPPGPLGLPGIDGIPGLTGDPGAQGPVGLQGSKGLPGIPGKDGPSGLPGPPGALGDPGL
PGLQGPPGFEGAPGQQGPFGMPGMPGQSMRVGYTLVKHSQSEQVPPCPIGMSQLWVG
YSLLFVEGQEKAHNQDLGFAGSCLPRFSTMPFIYCNINEVCHYARRNDKSYWLSTTAPIPM
MPVSQTQIPQYISRCSVCEAPSQAIAVHSQDITIPQCPLGWRSLWIGYSFLMHTAAGAEGG
GQSLVSPGSCLEDFRATPFIECSGARGTCHYFANKYSFWLTTVEERQQFGELPVSETLKAG
QLHTRVSRCQVCMKSL

• Graphics
vii. Myosin (Organism: Zea mays)

• Total number of search items

• Number of search items based on top organisms

• Accession number

ACCESSION AHI45153

• Sequence in FASTA format

>AHI45153.1 myosin [Zea mays]
MASMLNIVIGSHVWVEDKDLSWVDGEVFRIDGQNAHVHTTKGKTVIANISNIHPKDTEAPPD
GVDDMTRLSYLHEPGVLDNLAVRYAKNIIYTYTGNILIAINPFQRLPSLVDALTMEKYKGANL
GDLDPHVFAIADAAYRQMINEGKSNSVLVSGESGAGKTETTKLLMRYLAFLGGRSGTGERT
VEQQVLESNPVLEAFGNAKTVRNNNSSRFGKFVEIQFDKSGKISGAAIRTYLLERSRVCQIN
SPERNYHCFYFLCAAPSEDLKKYKLGDPSSFHYLNQSACIQVDGINDAEEYLATRNAMDTV
GITDQEQEAIFRVVAAVLHLGNINFAKGREVDSSIIKDDKSRFHLKTAGELLMCDCEKLENALI
KREINTPEGVITTTVGPNSATISRDGLAKQIYSRLFEWLVNRINASIGQDPDSNKLIGVLDIYG
FESFKTNSFEQLCINFTNEKLQQHFNQNVFKMEQEEYTREQINWSYIEFVDNQ
DVLDLIEKKPGGIIALLDEACMFPKSTHETLSQKLYEKFKTHKRFTKPKLSRTAFTIQHYAGD
VTYQSDQFLDKNKDYVVAEHQELLNGSKCSFVSGLFPPATEENTKSSKSSIATRFKMQLHE
LMETLSSTEPHYIRCIKPNSVLKPGIFENTNVLQQLRCSGVLEAIRISCAGYPTRKQFHDFLH
RFRVLAPEILKEKNDEKVSCQKILDKMGLQGYQIGRTKVFLRAGQMAELDGRRTEMRNSAA
RGLQSQFRTHVAREQFLVLRDTSICLQSFVRARLACKQHELLRQQTAVLRIQKNARWYFA
WKTYYQMRLSAITLQAGLRAMAARNEFTFRKRNKASVHIQSQWRCHRDYSNFMKLKRAAL
TYQCAWRRSVARKELRKLKMAARDTQALKVAKEKLEEHVEELKSCLGREKKLRADFEKSK
AEEVSKLKEALYEMEQQVEEVKAMQEQESAKKAVEEALTQEREKISLLTTEIEGLNALLVAE
REENDVMKKVHANALETNEELNKKISDADEKIKQFSDTVQRLEGTVSEASLLTEKEQNASTL
KLLTEAQLRIEELIKKLEGSNRKSDSLLDTITRLEQDVSAKEVLLLTEKQAHEATRKTLTEVQE
KSEELLKKIHDNDKHILQLQFTIQRLEETTVANENLLLREREQNDTTTKAHNESQEKYEELLK
KFIDVDRKIDLLQGTIERFGENTTAKDSLLLSERHEKAAIKKALTEAEEKNEELLMKVEDANE
KIGHLQTKINMLEDNVAAKDVSLEAAIKENDATRKSLTEAQERNGELLKKISDSDYRIHLLQD
TVQKLQVDAISRLSSFVMEKQESDIAKRAVTEAHERNEDLLKRNEDLLKRNDDLIKKIEDSSK
IVTQLQEALQRLEGKASNLEVENQVLRQHATSTPPSTAKSPASRSKISRIHVKSEENGHILN
GDIRQTEMKPSTGTSAAITSVVNVPDLGDQKEFEHGEKLQRIPKQKYQPSHNQQPQDDQQ
WLVTCISQYLGFSGSKPVAALLIYQCLLHWKSFEAMKTGVFDSILHAINSATEVQNDMRTLA
YWLSNLSTLTVFLQRSFKTTRTTISTPQRRRFSSERMFGNQTSNAGLAYLSGQSVGSAGLL
QVEAKYPALLFKQQLVDLIEKVYGMISDSVKKELNPLLELCIQDPQTSQSSIAKGNLNGMGQ
QNQLTHWLGIVKILTSYLDVLRENHVPSILVHKLFTQMFSLIDVQLFNRLLLRRECCSFSNGE
YIRAGLAELKHWSDNATREFAGSAWEALRHIRQAVDFLVISLKPMRTLREIRTDVCPALSIQ
QLERIVSMYWDDVNGTNTISAEFTSSLKSAIRDESNMATNFSILLDDDSSIPFSLDDITKTLPV
IELADDDFLPFVHENPSFAFLLQRGE

• Graphics
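
The seven accession numbers recorded above are enough to retrieve each record again without repeating the search. As a hedged sketch, the code below builds an NCBI E-utilities `efetch` URL that requests each protein record in FASTA format. The helper name `build_efetch_url` is hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities fetch endpoint.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Accession numbers recorded in this report, keyed by protein name.
ACCESSIONS = {
    "Cystatin C": "CAA29096",
    "G-protein": "KZS12823",
    "Keratin": "AAA39370",
    "Albumin": "EOY34699",
    "Hemoglobin": "EFE26080",
    "Collagen": "BAA04809",
    "Myosin": "AHI45153",
}


def build_efetch_url(accession):
    """Return an efetch URL asking for one protein record as plain-text FASTA.

    Hypothetical helper: it only builds the URL and does not contact NCBI.
    """
    params = {"db": "protein", "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


fetch_urls = {name: build_efetch_url(acc) for name, acc in ACCESSIONS.items()}
for name, url in fetch_urls.items():
    print(f"{name}: {url}")
```

Because primary database records are archival, fetching by accession should return the same sequence that was copied into this report.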
Conclusion

Biological databases play a central role in bioinformatics. They offer researchers the
opportunity to access a wide variety of biologically relevant data, including the genomic
sequences of an increasingly broad range of organisms. This unit gives a brief overview of major
sequence databases and portals, such as GenBank, the UCSC Genome Browser, and
Ensembl. Model organism databases, including WormBase, The Arabidopsis Information
Resource (TAIR), and those made available through the Mouse Genome Informatics (MGI) resource,
are also covered. Non-sequence-centric databases, such as Online Mendelian Inheritance in Man
(OMIM), the Protein Data Bank (PDB), MetaCyc, and the Kyoto Encyclopedia of Genes and
Genomes (KEGG), are also discussed.
