First bioinformatics primer for undergraduates, co-authored by a biologist and a computer scientist. Focuses on fundamentally important algorithms at the core of bioinformatics. Integrates biology and computer science in an accessible way for biology students with limited experience in computer programming.
I. MOLECULAR BIOLOGY AND BIOLOGICAL CHEMISTRY. The genetic material. Nucleotides. Orientation. Base pairing. The central dogma of molecular biology. Gene structure and information content. Promoter sequences. The genetic code. Open reading frames. Introns and exons. Protein structure and function. Primary structure. Secondary, tertiary and quaternary structure. The nature of chemical bonds. Anatomy of an atom. Valence. Electronegativity. Hydrophilicity and hydrophobicity. Molecular biology tools. Restriction enzymes. Gel electrophoresis. Blotting, hybridization and microarrays. Cloning. Polymerase chain reaction (PCR). DNA sequencing. Genomic information content. C value paradox. Reassociation kinetics.
II. DATA SEARCHES AND PAIRWISE ALIGNMENTS. Dot plots. Simple alignments. Scoring. Gaps. Simple gap penalties. Origination and length penalties. Scoring matrices. Dynamic programming: The Needleman and Wunsch algorithm. Local and global alignments. Global and semi-global alignments. The Smith-Waterman algorithm. Database searches. BLAST and its relatives. Other algorithms. Multiple sequence alignments.
III. SUBSTITUTION PATTERNS. Patterns of substitutions within genes. Mutation rates. Functional constraint. Synonymous vs. nonsynonymous changes. Indels and pseudogenes. Substitutions vs. mutations. Fixation. Estimating substitution numbers. Jukes/Cantor model. Transitions and transversions. Kimura's two-parameter model. Models with even more parameters. Substitutions between protein sequences. Variations in substitution rates between genes. Molecular clocks. Relative rate tests. Causes of rate variation in lineages. Evolution in organelles.
IV. DISTANCE-BASED METHODS OF PHYLOGENETICS. History of molecular phylogenetics. Advantages to molecular phylogenies. Phylogenetic trees. Terminology of tree reconstruction. Rooted and unrooted trees. Gene vs. species trees. Character and distance data. Distance matrix methods. UPGMA. Estimation of branch lengths. Transformed distance method. Neighbor's relation method. Neighbor-joining methods. Maximum likelihood approaches. Multiple sequence alignments.
V. CHARACTER-BASED APPROACHES TO PHYLOGENETICS. Parsimony. Informative and uninformative sites. Unweighted parsimony. Weighted parsimony. Inferred ancestral sequences. Strategies for faster searches. Branch and bound. Heuristic. Consensus trees. Tree confidence. Bootstrapping. Parametric tests. Comparison of phylogenetic methods. Molecular phylogenies. The tree of life. Human origins.
VI. GENOMICS AND GENE RECOGNITION. Prokaryotic genomes. Prokaryotic gene structure. Promoter elements. Open reading frames. Conceptual translation. Termination sequences. GC-content. Prokaryotic gene density. Eukaryotic genomes. Eukaryotic gene structure. Promoter elements. Regulatory protein binding sites. Open reading frames. Introns and exons. Alternative splicing. CpG islands. GC-content. Isochores. Codon usage bias. Gene expression. cDNAs and ESTs. Serial analysis of gene expression (SAGE). Microarrays. Transposition. Repetitive elements. Eukaryotic gene density.
VII. PROTEIN FOLDING. Polypeptide composition. Amino acids. Backbone flexibility, phi and psi. Secondary structure. Accuracy of predictions. Chou-Fasman/GOR method. Tertiary and quaternary structure. Hydrophobicity. Disulfide bonds. Active structures vs. most stable structures. Protein folding. Lattice models. Off-lattice models. Energy functions and optimization. Structure prediction. Comparative modeling. Threading: Reverse protein folding. Predicting RNA secondary structures.
VIII. PROTEOMICS. From genomes to proteomes. Protein classification. Enzyme nomenclature. Families and superfamilies. Folds. Experimental techniques. 2D electrophoresis. Mass spectrometry. Protein microarrays. Inhibitors and drug design. Ligand screening. Docking. Database screening. X-ray crystal structures. Empirical methods and prediction techniques. Posttranslational modification prediction. Protein sorting. Proteolytic cleavage. Glycosylation. Phosphorylation and sulfation.
Appendix 1: A gentle introduction to programming and data structures. Introduction. The basics. Creating and compiling computer programs. Variables and values. Data typing. Basic operations. Program control. Statements and blocks. Conditional execution. Loops. Readability. Structured programming. Comments. Descriptive variable names. Data structures. Arrays. Pointers and dynamic memory allocation. Strings in PERL. Input and output.
Appendix 2: Enzyme kinetics. Enzymes as biological catalysts. The Henri-Michaelis-Menten equation. Vmax and Km. Direct plot. Lineweaver-Burk reciprocal plot. Eadie-Hofstee plot. Simple inhibition systems. Competitive inhibition. Noncompetitive inhibition. Reversible and irreversible inhibition. Effects of pH and temperature.
Appendix 3: Sample programs in PERL and worksets. Conceptual translation. Dot matrix. Relative rate test. UPGMA. Common ancestor. Splice junction recognition. Hydrophobicity calculator. DNA binding domains. Lineweaver-Burk plot.