Quiz on bioinformatics
1 Chapter 1
1.1 Introduction
1.2 Bioinformatics
Bioinformatics is generally defined as a field that uses advanced information and computational techniques to solve complex problems in molecular biology. It applies these techniques to manage and extract useful information from the DNA, RNA and protein sequence data being generated and stored in large databases. Some methods for analysing genome and protein data are extremely computationally intensive, which creates the need for powerful computers.
In summary, bioinformatics involves the creation of databases, algorithms and computational techniques for solving problems in molecular biology, and it has many practical applications (Luscombe et al. n.d.).
1.2.1 Bioinformatics Research Areas
Bioinformatics is a multidisciplinary field of study that draws on computer science, statistics and mathematics to develop algorithms and systems capable of solving molecular biology problems. Its primary goal is to understand and solve complex molecular biology problems. This goal can be achieved by developing and applying computational techniques and information storage, such as data mining, HPC algorithms and database creation. These techniques support multiple areas of scientific research, including:
Sequence analysis: biological sequences such as DNA, RNA and protein sequences are the most fundamental objects of a biological system at the molecular level (Rhee & Dickerson 2006). Tools are available to extract useful information from these sequences and make it meaningful.
Genome annotation: the process of attaching biological information to DNA sequences. T...
... middle of paper ...
...g was the solution to all of the problems. Below is a set of objectives that the thesis will address.
Objective 1 – Investigate the evolution of sequence alignment algorithms.
Objective 2 – Design a timeline for the sequence alignment algorithms.
Objective 3 – Investigate high performance computing techniques.
Objective 4 – Design and construct a cluster for evaluating the sequence alignment algorithms.
Objective 5 – Test the performance of the cluster environment.
1.7 Expected results
The main goal of this thesis is to address the need for high performance computing in the field of bioinformatics. A detailed timeline of the sequence alignment algorithms will be presented. The need for high performance computing will be shown by a series of experiments comparing the performance of a personal computer against that of a cluster computer.
Proteogenomics is a field of science that combines proteomics and genomics. Proteomics deals with protein sequence information and genomics with genome sequence information. Proteogenomics is used to annotate whole genomes and protein-coding genes. Proteomic data supports genome analysis by confirming genome annotations with peptides obtained from expressed proteins, and it can be used to correct annotated coding regions. Identifying protein-coding regions in terms of function and sequence is given priority over other nucleotide sequences because protein-coding genes carry out more functions in a cell. The genome annotation process includes both experimental and computational stages. These stages can include identifying a gene, determining its function and structure, and locating its coding regions. To carry out these steps, ab initio gene prediction methods can be used to predict exons and splice sites. Annotating protein-coding genes is very time consuming, so gene prediction methods are used for genome annotation. Web resources such as NCBI and Ensembl provide these genome annotations. These tools display sequenced genomes and give more accurate gene annotations. However, they may not confirm that a protein is actually present. The main idea of proteogenomic methods is to identify peptides in samples by using these tools together with mass spectrometry. The mass spectrometry data is searched against a translation of the genome sequence rather than against a protein database. This approach can also be used to annotate protein-protein interactions. Searching MS/MS data against a translation of the genome can identify peptide sequences. Genome data can thus be interpreted by combining genomic and transcriptomic information with these proteogenomic methods and tools. Much proteomic information can also be obtained from gene prediction algorithms, cDNA sequences and comparative genomics.
Large proteomic datasets for proteogenomics can be obtained by peptide mass spectrometry, because proteogenomics uses proteomic data to annotate the genome. Proteogenomic tools can be used whenever genome sequence data is available for an organism or for closely related genomes. The resulting proteogenomic data can be compared across many related species and shows homology relationships among their proteins, which allows annotations to be made with high accuracy. In these studies, proteogenomic data reveals frameshift regions, gene start sites, exon and intron boundaries, alternative splicing sites, proteolytic sites in proteins, predicted genes and post-translational modification sites.
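The idea of searching peptides against a translation of the genome, rather than against a protein database, can be sketched in a few lines. This is only a minimal illustration: the function names, the toy genome and the peptide are hypothetical, the standard genetic code is assumed, and a real proteogenomic search matches MS/MS spectra with dedicated search engines rather than exact substrings.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon, codons enumerated in TCAG order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna):
    """Return the opposite strand, read 5' to 3'."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Translate three reading frames on each strand (+1..+3 and -1..-3)."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


def find_peptide(dna, peptide):
    """List the reading frames whose translation contains the peptide."""
    return [name for name, protein in six_frame_translations(dna).items() if peptide in protein]


toy_genome = "CCATGAAAGTTTAA"          # hypothetical sequence, not real data
print(find_peptide(toy_genome, "MKV"))  # frames in which the peptide MKV is found
```

A peptide that matches a frame where no gene is annotated is the kind of evidence that can be used to propose or correct a coding region.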
... similarities between proteins exhibiting homology, and inspecting the AFP nucleic acid sequence in comparison with proteins showing similarities.
A genome is all of the DNA in a given organism. Within that DNA are stretches of nucleotides called genes. A gene typically contains the information for the production of a protein. The human genome was once thought to have over 100,000 genes, but sequencing lowered the estimate to around 30,000, and later counts put the number of protein-coding genes closer to 20,000. The proteins produced by the genes, which include enzymes and many hormones, determine many characteristics of the organism, such as hair color, the ability to fight infection and some aspects of behavior. Genes are passed down from generation to generation.
An important biological consideration that dictates which bioinformatics tools should be used is whether sequence data is taken from a prokaryotic or eukaryotic organism. Many tools will have options to select what classification your sequence comes from and some will only work with a certain classification. This is because there are major differences in the organisation and processing of genetic information between prokaryotes and eukaryotes.
Genomic sequencing is the process of determining the precise order of nucleotides within a DNA molecule. It includes any method or technology that is used to determine the order of the four DNA bases (thymine, adenine, guanine and cytosine) in a strand of DNA (NHGRI, 2011). In each organism, these bases are arranged in a unique and specific sequence, and it is this sequence that carries the genetic information of the organism. Genomic sequencing has had an impact on nearly every field of biological research, including human genetics and genomics, plants and agriculture, microbes, medicine, viruses and infectious diseases, environmental genetics and evolutionary biology. By first examining the development of gene sequencing technology we will be able to view its role in evolutionary biology, its contribution to phylogenetics, and how it has changed our understanding of the biological tree of life.
Modern techniques map the DNA within the gene itself rather than producing a gene map: the positions of short "marker" sequences are used as signposts along the chromosomes. Once a gene is discovered, its base sequence must be determined before its function can be studied. Sequencing has become easier with the development of methods for cloning DNA, which produce large amounts of identical fragments. In the most widely used DNA sequencing method, the double helix is denatured into single strands. These are then used as templates for DNA synthesis, but in such a way that replication stops when the growing strand reaches a particular base on the template. In addition to DNA polymerase and the four bases A, G, C and T, small amounts of dideoxynucleotide versions of these bases are supplied. A dideoxynucleotide is incorporated into the growing strand like a normal base but prevents the chain from being extended further. The fragments are then separated by gel electrophoresis and the base seq...
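The chain-termination logic described above can be illustrated with a small sketch. It is a deliberate simplification: instead of simulating random incorporation of dideoxynucleotides, it lists every length at which a fragment could terminate in each of the four reactions, then reads the sequence back by sorting fragments by length, as a gel does. All function names and the example template are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def synthesized_strand(template):
    """Strand built by the polymerase, written 5' to 3' (reverse complement of the template)."""
    return template.translate(COMPLEMENT)[::-1]


def terminated_fragment_lengths(template):
    """For each dideoxy base, list the fragment lengths at which synthesis can stop."""
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(synthesized_strand(template), start=1):
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the new strand from shortest to longest fragment across the four lanes."""
    base_at_length = {length: base for base, lengths in lanes.items() for length in lengths}
    return "".join(base_at_length[length] for length in sorted(base_at_length))


template = "TTAGC"                      # hypothetical template strand
lanes = terminated_fragment_lengths(template)
print(lanes)                            # fragment lengths per dideoxy reaction
print(read_gel(lanes))                  # sequence of the newly synthesized strand
```

Because each fragment length occurs in exactly one lane, sorting by length recovers the synthesized strand one base at a time.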
Genomics is developing rapidly, moving from the analysis, mapping and sequencing of genomes to the study of genome function [Hieter and Boguski, 1997]. Genomics deals with the analysis of DNA sequences, whilst functional genomics is used to understand the relationship between genes and proteins [Fields et al., 1999]. The analysis of genomes has more recently been divided into two groups: functional and structural genomics. Structural genomics is the first phase of genome analysis, which produces an organism's genetic, transcript and physical maps [Hieter and Boguski, 1997]. The purpose of structural genomics is the assignment of three-dimensional structures to proteomes, which has given a new viewpoint on protein families and folds, and on domain structures within gene sequences [Teichmann et al., 1999].
The use of genetic sequencing in the medical field has innumerable possibilities; genomic medicine, as this new field is now called, will enable the human race to make immense advances in understanding how our genetic heredity makes us susceptible to some illnesses and immune to others. The detection of diseases with a high rate of heredity is just one facet of the gem that is genomics; once researchers are able to map out all of the vital components and rare alleles that sometimes play a large part in disease, it will be possible to target these specific gene combinations, functional elements and alleles. Because proteins, produced by our cells' ribosomes, affect the pathways that help express our inherited traits, it is important that we understand the relationship between DNA and protein, and how this affects the phenotype of an individual's genetic attributes. For example, sickle-cell anemia is caused by a change in a single nitrogenous base in the DNA sequence. This change is transcribed into RNA and then translated into a different amino acid, which determines the phenotype that the subject will have. The discrepancy in something as minute as a nitrogenous base and one amino acid makes the difference between a healthy, normal life and a life ...
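The path from a single-base change to a changed amino acid can be traced with a short sketch. The passage does not give the codon involved; the example assumes the commonly described substitution in the beta-globin gene, in which the codon GAG (glutamic acid) becomes GTG (valine). The three-codon sequences and the reduced codon table below are illustrative only.

```python
# Reduced codon table: only the codons used in this illustration.
RNA_CODONS = {"CCU": "Pro", "GAG": "Glu", "GUG": "Val"}


def transcribe(coding_dna):
    """Write the mRNA that corresponds to a DNA coding strand (T becomes U)."""
    return coding_dna.replace("T", "U")


def translate(mrna):
    """Translate an mRNA made only of codons present in RNA_CODONS."""
    return [RNA_CODONS[mrna[i:i + 3]] for i in range(0, len(mrna), 3)]


def count_differences(seq_a, seq_b):
    """Count positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(seq_a, seq_b))


normal_dna = "CCTGAGGAG"   # assumed illustrative stretch: Pro-Glu-Glu
sickle_dna = "CCTGTGGAG"   # same stretch with a single A-to-T change

print(count_differences(normal_dna, sickle_dna))  # number of changed bases
print(translate(transcribe(normal_dna)))          # amino acids from the normal stretch
print(translate(transcribe(sickle_dna)))          # amino acids from the altered stretch
```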
DNA sequencing is a way of identifying genes based on their DNA sequences. It is important when comparing the base sequences of different organisms to determine the relationship between them. The polymerase chain reaction is the process of making many copies of a segment of DNA. When picking a DNA sequence to copy, it is important to choose something universal so that the sequences can be accurately compared. A good choice is the cytochrome c oxidase subunit I (COI) gene found in the mitochondria. It is found in almost all living organisms because COI helps make ATP and living things need energy to live. The sequencing of COI can lead to major leaps in biotechnology in the area of identifying unknown species on a regular and consistent basis.
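Comparing the same gene region across organisms can be reduced, in its simplest form, to a percent identity between aligned sequences. The sketch below assumes the sequences are already aligned and of equal length; the function names, species labels and sequences are invented for illustration and are not real COI data.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions at which two equal-length sequences match."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


def closest_reference(query, references):
    """Return the name of the reference sequence most similar to the query."""
    return max(references, key=lambda name: percent_identity(query, references[name]))


# Hypothetical reference fragments keyed by made-up species labels.
references = {
    "species_a": "ACGTACGTAC",
    "species_b": "ACGTTTTTAC",
}
unknown_sample = "ACGTACGAAC"

for name, sequence in references.items():
    print(name, percent_identity(unknown_sample, sequence))
print(closest_reference(unknown_sample, references))
```

Real barcode identification aligns the sequences first and compares against curated reference libraries; the exact-position comparison here only shows the principle.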
Sections of DNA contain sequences of bases that repeat several times (Saferstein 44). Genes contain the code for making proteins and arrange them int...
DNA computing is a form of bio-molecular computation (BMC) that uses biological methods to perform massively parallel computation. This can be much quicker than a conventional silicon-chip computer, which needs large quantities of hardware to perform parallel computation. DNA computers not only exploit massively parallel computation but also offer ultra-compact information storage, in which large amounts of information can be stored far more compactly than in conventional electronic media. A single gram of DNA comprises about 10^21 DNA bases, which equals about 10^8 terabytes, so a few grams of DNA could possibly hold all the data stored in the world.
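The storage figure can be checked with simple arithmetic, reading the quoted numbers as 10^21 bases per gram and 10^8 terabytes. The sketch assumes that each base carries 2 bits (four possible bases) and that a terabyte is 10^12 bytes; neither assumption is stated in the passage.

```python
BITS_PER_BASE = 2             # assumption: four bases (A, C, G, T) encode 2 bits each
BITS_PER_BYTE = 8
BYTES_PER_TERABYTE = 10**12   # assumption: decimal terabyte


def dna_capacity_terabytes(num_bases):
    """Raw information capacity of a number of DNA bases, in whole terabytes."""
    total_bits = num_bases * BITS_PER_BASE
    total_bytes = total_bits // BITS_PER_BYTE
    return total_bytes // BYTES_PER_TERABYTE


bases_per_gram = 10**21       # order-of-magnitude figure quoted in the text
print(dna_capacity_terabytes(bases_per_gram))  # 250000000, i.e. 2.5 x 10^8 TB
```

Under these assumptions one gram corresponds to about 2.5 x 10^8 terabytes, which agrees with the quoted order of magnitude of 10^8 terabytes.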
VMD can be useful to a range of audiences. Molecular structural data obtained from VMD can be integrated with bioinformatics to provide useful information to researchers of biological systems. Beyond those working in the biological sciences, theoretical and experimental researchers in the chemical sciences can use the information to scrutinize the chemical structure of molecules. It can also be used in educational institutions to display molecules in a very descriptive manner, giving students a broader idea of structure and function.
Every cell in every living organism contains DNA, or deoxyribonucleic acid. DNA is wound up around proteins to form chromosomes, and along these chromosomes are sections, known as genes, which code for different traits in the organism. Thus the program of genetics is written in the language of DNA (Steitz undated). Chromosomes can contain thousands of genes, each having a specific sequence of nucleotides which codes for a specific trait in the organism or a function within each cell. These features could include the eye or hair colour of a human, or a specific protein or enzyme which helps produce an organism's inherited traits (Steitz undated).
I have always been fascinated by Biology and Computer Science, which led me to take up my undergraduate studies in the field of Bioinformatics. As part of my undergraduate curriculum, I have been exposed to a variety of subjects such as “Introduction to Algorithms”, “System Biology”, “PERL for Bioinformatics”, “Python”, “Structure and Molecular Modeling” and “Genomics and Proteomics”, which sparked my interest in areas such as docking algorithms, protein structure prediction, the practical aspects of setting up and running simulations, and gene expression prediction through computational analysis. These fields have both a strong computational flavour and the potential for research, which is what attracts me to them.