# the human genome project fostered development of faster, less expensive sequencing techniques
...
genomics
approach for studying whole sets of genes and their interactions, made possible by sequencing the genome of a species
bioinformatics
application of computational methods to store and analyze biological data, organizing the data from genomics
human genome project
project for sequencing the entire human genome; publicly funded, 20 large sequencing centers in 6 countries + smaller labs for smaller projects
in the human genome project, individuals’ dna was sequenced; scientists reviewed the results and agreed on a ___
reference genome
reference genome
a full sequence that best represents the genome of a species
goal in mapping genome
determine complete nucleotide seq of each chromosome
human genome mapping accomplished by
sequencing machines, dideoxy (ddNTP) chain termination
whole genome shotgun approach
alternative approach to sequencing the human genome, introduced by j craig venter: 1.) clone + seq random dna fragments (from randomly cut dna) 2.) computer assembles the overlapping short seqs into one continuous seq
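Step 2 of the shotgun approach (merging overlapping short seqs) can be sketched with a toy greedy assembler. This is a minimal illustration only, not how real assemblers work: it assumes error-free reads from one strand, and the names `overlap`, `greedy_assemble` and the `min_len` cutoff are made up for this note.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the two fragments with the longest overlap."""
    frags = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # drop reads that sit entirely inside another read
    frags = [f for f in frags if not any(f != g and f in g for g in frags)]
    while True:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            return frags  # nothing left to merge: each string is a contig
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)


# four overlapping reads cut from the made-up sequence ATGGCGTACGTTAGC
reads = ["TACGTTA", "ATGGCGT", "GTTAGC", "GCGTACG"]
print(greedy_assemble(reads))
```

The greedy choice (always merge the longest overlap first) is the simplest option to read; it can pick the wrong merge when the same short seq occurs in several places.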
metagenomics
helped by tech advances (meta = beyond); dna from a community of species is collected from an environmental sample, then sequenced. Computer sorts the partial seqs + assembles them into the individual species’ genomes. Advantage: can seq dna of a mixed microbial population, no need to culture each species in lab
# scientists use bioinformatics to analyze genomes and their functions
## centralized resources for analyzing genome sequences
ncbi
national center for biotechnology information; maintained by the national library of medicine (nlm), part of the national institutes of health (nih). Databases, software…
genbank
ncbi sequences database
blast
software available on ncbi (basic local alignment search tool); compare a dna sequence w every sequence in genbank
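To get a feel for what "compare a sequence w every sequence in a database" means, here is a toy local alignment search. It is not BLAST's actual algorithm (BLAST uses faster heuristics); it swaps in plain Smith-Waterman dynamic programming. The scoring values (`match=2`, `mismatch=-1`, `gap=-2`) and the tiny `toy_db` are assumptions made up for this note.

```python
def local_alignment_score(a: str, b: str, match=2, mismatch=-1, gap=-2) -> int:
    """Best Smith-Waterman local alignment score between a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # a local alignment may restart anywhere, so scores never drop below 0
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


def best_hit(query: str, db: dict):
    """Score the query against every database entry, return (name, score) of the top hit."""
    scores = {name: local_alignment_score(query, seq) for name, seq in db.items()}
    name = max(scores, key=scores.get)
    return name, scores[name]


toy_db = {"seq1": "CCGATTACACC", "seq2": "TTTTTTTT"}  # hypothetical entries
print(best_hit("GATTACA", toy_db))  # -> ('seq1', 14)
```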
Conserved domains
common stretches of amino acids
## identifying protein coding genes and understanding their functions
Goal: dna seq → id protein coding gene → id function
Gene annotation
uses 3 lines of evidence to identify a gene. Search for patterns that indicate gene presence
Gene presence indicator
transcriptional and translational start + stop signals, rna splicing sites, other telltale signs of protein coding genes (e.g. promoter sequences), short seqs that specify known mRNAs (ESTs)
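One of these indicators, translational start + stop signals, is easy to scan for in code. The sketch below looks for open reading frames (ATG followed in-frame by a stop codon). It is a simplification, not real annotation software: forward strand only, no introns or splice sites, and `find_orfs` / `min_codons` are names made up for this note.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 3):
    """Return (start, end, sequence) for each ATG...stop stretch in the 3 forward frames.

    min_codons counts codons from the ATG up to (not including) the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # translational start signal
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # look for the next start in this frame
    return orfs


print(find_orfs("CCATGGCTGAATTTTAGGG"))  # -> [(2, 17, 'ATGGCTGAATTTTAG')]
```

Note the TGA at positions 7-9 is ignored because it is out of frame with the ATG.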
ESTs
expressed sequence tags::short seqs that specify known mRNAs
knocking out
blocking or disabling a gene to see how the phenotype is affected. E.g. crispr-cas9 system = technique used to block gene function
## understanding genes and gene expression at the systems level
epigenome
epigenetic features of the genome in hundreds of human cell types + tissues
### systems bio
proteomics
approach of studying sets of proteins and their properties (abundance, modifications, interactions)
proteome
entire set of proteins expressed by a cell or group of cells
systems biology
aims to model the dynamic behavior of both genes and proteins through the interactions among the system’s parts. Gene and protein interaction networks in saccharomyces cerevisiae (yeast): knock out pairs of genes to make doubly mutant cells. Fitness is compared to that predicted from the two single mutants; if it matches, the genes didn't interact; if it doesn't, the genes interacted
### application of systems bio + medicine
metastatic tumors
tumors that have dispersed from primary tumors and invaded organs far away in the body
# genomes vary in size, number of genes, gene density
Systematic difference in genome size (Mb = million base pairs) between prokaryotes and eukaryotes, but not amongst eukaryotes (japanese canopy plant has 149000 Mb, human 3000; no systematic relationship between genome size and phenotype)
## number of genes
How can humans (vertebrate) have nearly the same number of genes as nematodes?::alternative splicing of rna transcripts, multiple proteins made from one gene; some genes have hundreds of alternative forms, others only 2
## gene density and noncoding dna
Humans have far more base pairs than bacteria but fewer genes per Mb (more noncoding dna; alternative splicing gets more proteins out of each gene), so gene density is lower than in bacteria
# multicellular eukaryotes have a lot of noncoding dna and many multigene families
pseudogenes
former genes that have accumulated mutations over a long time and no longer produce functional proteins; a type of unique noncoding dna
most of the DNA between functional genes is
repetitive DNA
repetitive DNA
consists of sequences present in multiple copies in the genome
## transposable elements and related sequences
transposable elements
stretch of dna in both prokaryotes and eukaryotes that can move from one location to another within the genome. During transposition, these genetic elements move from one site in a cell’s dna to a different target site by a recombination process. They never detach from the dna: the original and new dna sites are brought very close together by enzymes and other proteins that bend the dna. Two types: transposon + retrotransposon
### transposon and retrotransposon movement
transposons
transposable element; moves within a genome by means of a dna intermediate. Can cut and paste (removing the element from the original site) or copy and paste (leaving a copy behind at the original site). Both mechanisms require transposase, encoded by the transposon
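The difference between the two mechanisms is just whether the original site keeps its copy. A toy model makes that concrete; it treats the genome as a list of labels, which is purely an assumption for illustration (`cut_and_paste`, `copy_and_paste` and the labels are made up, and nothing here models transposase itself).

```python
def cut_and_paste(genome, element, target):
    """Remove the element from its original site, then insert it at index target."""
    new = list(genome)
    new.remove(element)          # original site loses the element
    new.insert(target, element)  # target is an index in the genome after removal
    return new


def copy_and_paste(genome, element, target):
    """Insert a new copy at index target and leave the original in place."""
    new = list(genome)
    new.insert(target, element)
    return new


genome = ["geneA", "Tn", "geneB", "geneC"]
print(cut_and_paste(genome, "Tn", 2))   # -> ['geneA', 'geneB', 'Tn', 'geneC']
print(copy_and_paste(genome, "Tn", 3))  # -> ['geneA', 'Tn', 'geneB', 'Tn', 'geneC']
```

Copy and paste increases the number of copies in the genome; cut and paste leaves it unchanged.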
Transposase
enzyme required for transposon movement (both cut and paste and copy and paste); encoded by the transposon
retrotransposon
transposable element; moves by means of an rna intermediate, a transcript of the retrotransposon dna. Always leaves a copy at the original site during transposition. Steps: 1.) synthesis of a single-stranded rna intermediate from the retrotransposon 2.) reverse transcriptase synthesizes a dna strand complementary to the rna strand 3.) reverse transcriptase synthesizes the second dna strand, complementary to the dna strand made in 2 4.) mobile copy of the retrotransposon is inserted (insertion) at a new dna site
### sequences related to transposable elements
alu elements
shorter than most transposable elements; don't code for proteins but are transcribed into rna, some of which is thought to help regulate gene expression
line1/L1
type of retrotransposon, longer than alu elements, low rate of transposition; transcription of these retrotransposons = crucial for development of early embryos. Some transposable elements can encode proteins, but these proteins don't carry out normal cellular functions, so they're still counted as noncoding
## other repetitive dna (e.g. simple sequence dna)
repetitive dna not related to transposable elements probably arose from ___
mistakes during dna replication/recombination. E.g. duplications of long stretches of dna, simple sequence dna
simple sequence dna
stretches of dna containing many copies of tandemly repeated short seqs (repeating units of 2-500 nucleotides)
short tandem repeat (STR)
series of repeats in simple sequence dna where the repeated unit is 2-5 nucleotides long
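A quick way to see what counts as an STR is to scan a sequence for 2-5 nucleotide units repeated back to back. The sketch below uses a regular expression for that; the function name `find_strs` and the cutoff of at least 3 copies are assumptions for this note, not a standard definition.

```python
import re


def find_strs(dna: str, min_copies: int = 3):
    """Return (start, unit, copies) for tandem repeats of a 2-5 nucleotide unit."""
    # ([ACGT]{2,5}?) captures the shortest candidate unit,
    # \1{n,} requires it to repeat at least n more times right after itself
    pattern = re.compile(rf"([ACGT]{{2,5}}?)\1{{{min_copies - 1},}}")
    hits = []
    for m in pattern.finditer(dna.upper()):
        unit = m.group(1)
        if len(set(unit)) > 1:  # skip single-base runs like AAAAAA
            hits.append((m.start(), unit, len(m.group(0)) // len(unit)))
    return hits


print(find_strs("CCTGGATAGATAGATAGATACCTT"))  # -> [(4, 'GATA', 4)]
```

The number of copies at an STR site is the part that varies between individuals, which is why the function reports it.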
STRs provide challenges for
whole genome shotgun sequencing b/c short repeats hinder accurate fragment reassembly by computers; this leads to some sequences staying permanent drafts
## genes and multigene families
multigene families
collections of two or more identical or very similar genes that arose by duplication of the same ancestral gene, like the globin family
# duplication, rearrangement, mutation of dna contribute to genome evolution
## duplication of entire chromosome sets
what facilitates evolution of genes
polyploidy, usually thru an accident in meiosis (such as failure to separate homologs in meiosis 1). One set of genes can keep providing essential functions for the organism while the genes in the extra set accumulate mutations and diverge. Related to plant speciation
## alterations of chromosome structure
## duplication and divergence of gene-sized regions of dna
lysozymes
enzyme helping protect animals against bacterial infections by hydrolyzing bacterial cell walls
alpha-lactalbumin
nonenzymatic protein playing a role in milk production in mammals
only ___ is present in birds, while mammals have both. ___ (protein associated with the key mammalian function of milk production) is an evolved version of ___
lysozyme, alpha-lactalbumin, lysozyme
___ may have promoted evolution of new proteins by ___
introns; facilitating the duplication or shuffling of exons
## rearrangements of parts of genes: exon duplication and shuffling
exon shuffling
occasional mixing and matching of different exons within a gene or between two different (nonallelic) genes. Could lead to new proteins with novel combinations of functions
## how transposable elements contribute to genome evolution
if a transposable element inserts within a regulatory sequence, the transposition may lead to
increased or decreased production of one or more proteins
transposable elements are thought to contribute to genome evolution in 3 ways
promote recombination, disrupt cellular genes/control elements, carry entire genes/individual exons to new locations
# comparing genome sequences provides clues to evolution and development
## comparing genomes
genes that differentiate humans from chimpanzees
code for transcription factors
FOXP2
transcription factor coding gene involved in speech acquisition in humans
copy number variants (cnv)
loci where some individuals have one or multiple copies of a particular gene or genetic region rather than the standard 2 copies (one on each homolog). Result from duplications or deletions that are inconsistent across the population. Play a part in complex diseases and disorders; more likely to have phenotypic consequences b/c they span longer stretches of dna
## widespread conservation of developmental genes among animals
evo-devo
evolutionary developmental biology; comparison of developmental processes of different multicellular organisms
homeotic gene
gene encoding a transcription factor that regulates gene expression to specify the identity of body segments; all include a homeobox
homeobox
specific dna sequence, ~180 nucleotides long, that codes for the 60 amino acid homeodomain in the encoded proteins
homeodomain
60 amino acid domain in an encoded protein; part of the protein that binds to DNA when the protein functions as a transcription factor
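The 180 nucleotide and 60 amino acid figures on the homeobox and homeodomain cards are tied together by the triplet code (3 nucleotides per amino acid). A tiny check, with `encoded_amino_acids` being a made-up helper name:

```python
NUCLEOTIDES_PER_CODON = 3  # each amino acid is specified by one triplet codon


def encoded_amino_acids(n_nucleotides: int) -> int:
    """Number of amino acids encoded by a coding stretch of the given length."""
    if n_nucleotides % NUCLEOTIDES_PER_CODON != 0:
        raise ValueError("length is not a whole number of codons")
    return n_nucleotides // NUCLEOTIDES_PER_CODON


print(encoded_amino_acids(180))  # -> 60, the length of the homeodomain
```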