Replication is the copying of a genome so that a cell can divide. It begins at an origin of replication, an AT-rich region that requires less energy to open because A-T pairs are held together by only two hydrogen bonds. Where the two strands separate they form a Y-shaped structure called the replication fork. A collection of enzymes and factors called the replisome assembles at the replication fork and adds complementary nucleotides in a 5’ to 3’ direction. DNA polymerase can only read a template in the 3’ to 5’ direction and only synthesise in the 5’ to 3’ direction, so only one new strand, the leading strand, can be synthesised continuously towards the fork. The other template runs the opposite way, so its new strand never has a free 3’ end pointing towards the fork. We call this the lagging strand; it is synthesised away from the fork in short pieces called Okazaki fragments, with the sliding clamp releasing each finished fragment and re-attaching at the start of the next. Afterwards, DNA ligase joins the Okazaki fragments by linking the 3’-OH and 5’-PO4 groups of adjacent nucleotides to form a phosphodiester bond. Each daughter molecule ends up with one old strand and one newly made strand, which is why this is called semiconservative replication. The new strand is methylated afterwards so that its methylation pattern matches that of the old strand.
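The base-pairing rule behind this copying can be sketched in a few lines of Python. This is only an illustration and not from the textbook; the function name `new_strand` and the example template are made up here.

```python
# Watson-Crick pairing used when a template strand is copied.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}

def new_strand(template_3_to_5):
    """Return the new strand, written 5'->3', for a template written 3'->5'.

    The polymerase reads the template 3'->5' and adds each complementary
    base to the growing 3' end, so the new strand grows 5'->3'.
    """
    return "".join(PAIR[base] for base in template_3_to_5)

template = "TACGGATC"        # hypothetical template, written 3'->5'
print(new_strand(template))  # ATGCCTAG, written 5'->3'
```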
DNA needs to be opened from its supercoiled state before it can replicate. DNA gyrase first removes the supercoiling, then DNA helicase unwinds the double helix by breaking the hydrogen bonds between the strands. Single-stranded binding (SSB) proteins then bind to the single strands to prevent them from reannealing.
Circular chromosomes may become catenated (interlinked like two links of a chain) after replication. Topoisomerase IV separates the two circles by cutting one of the chromosomes, passing the other through the break and re-joining it.
DNA polymerase cannot start a new strand from nothing: it can only add nucleotides to an existing 3’-OH end. An RNA primer of 11-12 bases is therefore made first. The leading strand requires only one RNA primer, but the lagging strand requires one RNA primer for every Okazaki fragment. PriA removes SSB proteins so that primase (DnaG), which is an RNA polymerase, can synthesise a short RNA primer directly on the DNA. The leading and lagging strands then each get their own molecule of DNA polymerase III (PolIII), which binds to the 3’-OH end of a primer and begins synthesis of new DNA.
PolIII is the main bacterial DNA polymerase. It is held on the DNA by a sliding clamp, which is a dimer of DnaN proteins (the beta subunits). Clamp loader enzymes place the clamps so that each one encircles the DNA on one of the two strands being copied. The clamps then bind the two core enzymes of PolIII, each of which is made of DnaE (which joins nucleotides), DnaQ (which proofreads) and HolE (which stabilises the other two subunits). PolIII can synthesise DNA at about 1000 bp/s.
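The difference in primer demand between the two strands is simple arithmetic. The sketch below assumes an Okazaki fragment length of 1000 nucleotides, which is a round number chosen for illustration and not a figure from these notes; `primers_needed` is a made-up name.

```python
import math

def primers_needed(template_length, okazaki_length):
    """Return (leading, lagging) primer counts for one replication fork.

    The leading strand needs a single primer; the lagging strand needs
    one primer per Okazaki fragment.
    """
    leading = 1
    lagging = math.ceil(template_length / okazaki_length)
    return leading, lagging

# Assumed values: 100,000 nt of template, 1000 nt per Okazaki fragment.
print(primers_needed(100_000, 1_000))  # (1, 100)
```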
The lagging strand
The lagging strand still has many RNA primers attached to it from primase, and there is a nick between the end of each Okazaki fragment and the primer of the next one. PolI removes the RNA primers (with its exonuclease activity) and then fills the gaps with DNA nucleotides (with its polymerase activity). Finally, DNA ligase seals the nicks left where the primers used to be. Primer removal can also be carried out by RNase H.
Mismatched bases cause a bulge in the double helix at the error site. The cell always assumes that the base on the new strand is the incorrect one. In E. coli the repair system is called MutSHL, and it can tell which strand is old and which is new from their methylation. Immediately after replication the DNA is hemimethylated, that is, only one strand (the old one) is methylated. DNA adenine methylase (Dam) and DNA cytosine methylase (Dcm) add methyl groups to the new strand at their recognition sites. They are slow enzymes, so the repair system is able to fix errors before methylation is complete. In E. coli the three genes for repair are mutS, which codes for a protein that finds bulges and distortions in the sequence; mutH, which codes for a protein that finds GATC sites and nicks the non-methylated strand; and mutL, which codes for a protein that holds the MutS-mismatch and MutH-GATC complexes together. The DNA between the nick and the mismatch is degraded and PolIII resynthesises it. (See fig 4.7)
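To picture how a GATC site relates to a mismatch, here is a small Python sketch. It is a simplification: it picks the GATC site closest to the mismatch, which is an assumption made for illustration and not a claim about how MutH chooses. The function names and the example sequence are made up.

```python
def gatc_sites(seq):
    """Return the 0-based start positions of every GATC site in seq."""
    sites = []
    start = seq.find("GATC")
    while start != -1:
        sites.append(start)
        start = seq.find("GATC", start + 1)
    return sites

def nearest_gatc(seq, mismatch_pos):
    """Return the GATC site closest to the mismatch, or None if there is none."""
    sites = gatc_sites(seq)
    if not sites:
        return None
    return min(sites, key=lambda site: abs(site - mismatch_pos))

new_strand = "TTGATCAAACCCGGGTTTAAAGATCCC"  # hypothetical unmethylated strand
print(gatc_sites(new_strand))        # [2, 21]
print(nearest_gatc(new_strand, 10))  # 2
```

The stretch between the chosen site and the mismatch is the part that would be degraded and resynthesised.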
Different types of replication
The main differences between replication in prokaryotes and eukaryotes arise from the fact that prokaryotes have circular genomes and eukaryotes have linear genomes.
Bacteria have one origin of replication, called oriC. Replication starts here and proceeds in both directions around the circle, with the two forks meeting on the other side at the terminus (terC). This is called theta replication. Many plasmids also use this type of replication.
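A rough estimate of how long theta replication takes follows from the PolIII rate of 1000 bp/s and the fact that two forks work at once. The genome size of 4.6 million bp used below is a commonly quoted figure for E. coli and is an assumption here, not a number from these notes; the function name is made up.

```python
def theta_replication_seconds(genome_bp, fork_rate_bp_per_s, forks=2):
    """Time for the forks leaving a single origin to copy a circular genome."""
    return genome_bp / (fork_rate_bp_per_s * forks)

# Assumed genome size of 4.6 Mbp; 1000 bp/s per fork.
seconds = theta_replication_seconds(4_600_000, 1000)
print(seconds, "seconds, or about", round(seconds / 60), "minutes")
```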
Some plasmids use rolling circle replication instead. Only one strand is nicked at the origin of replication, and that strand is unrolled from the circle as a new strand is synthesised in its place. The dangling strand is then cut off and ligated to form a single-stranded circle, and DNA synthesis begins at its single-stranded origin to make it double stranded again. This type of replication is also used for viral DNA. Some viruses continuously unroll and replicate, making a very long dangling strand (made up of several genome-length pieces) which is then cut into individual genomes and packaged into viral particles. These can be left linear or can be circularised.
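The cutting of a long rolling-circle product into genome-length pieces can be sketched as plain string slicing. The toy genome and the name `cut_concatemer` are made up for this example.

```python
def cut_concatemer(concatemer, genome_length):
    """Cut a long rolled-off strand into genome-length pieces."""
    return [concatemer[i:i + genome_length]
            for i in range(0, len(concatemer), genome_length)]

genome = "ATGCCGTA"       # hypothetical 8-base genome
concatemer = genome * 3   # three genome copies rolled off in one strand
print(cut_concatemer(concatemer, len(genome)))
```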
Eukaryotes do not have a dual-purpose PolI enzyme, so primers are removed by MF1 and DNA polymerase delta fills in the gaps. Linear chromosomes get shorter with each round of replication, so telomeres, which are tandem repeats, are found at the ends of chromosomes to absorb this loss without affecting genes. Telomerase can lengthen the telomeres by using an RNA template to synthesise new tandem repeats. Not all cells can do this, though: once the telomeres have shortened to nothing the cell kills itself.
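The way telomeres act as a buffer can be illustrated with a toy countdown. It assumes a fixed loss per division and no telomerase activity; both numbers below are hypothetical and chosen only to make the arithmetic easy.

```python
def divisions_until_exhausted(telomere_bp, loss_per_division_bp):
    """Count divisions before the telomere can no longer absorb the loss."""
    divisions = 0
    while telomere_bp >= loss_per_division_bp:
        telomere_bp -= loss_per_division_bp
        divisions += 1
    return divisions

# Hypothetical values: 10,000 bp of telomere, 100 bp lost per division.
print(divisions_until_exhausted(10_000, 100))  # 100
```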
Recall that eukaryotic replication only occurs at around 50 base pairs per second. Chromosomes are extremely long, so eukaryotes have multiple origins of replication; each one initiates replication in both directions, which shortens the time required, and neighbouring forks eventually meet each other. Eukaryotes have a cell cycle. During the G1 phase the cell is resting; then, during the S phase (S for synthesis), the genome is replicated. This takes 40 minutes for yeast. The G2 phase is another rest phase before mitosis in the M phase, when one copy of the genome is moved to each side of the cell. During mitosis the nuclear membrane has to be dissolved, and the chromosomes are pulled apart by spindle fibres, which attach via the kinetochore to a section of the chromosome called the centromere. Cytokinesis then occurs: the cell wall and membrane move to partition the cell, and the two cells fully divide.
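The benefit of multiple origins can be put into numbers with a short calculation. It assumes that origins are evenly spaced and all start at the same moment, which is a simplification, and the chromosome length of 1,000,000 bp is a hypothetical value.

```python
def replication_seconds(chromosome_bp, origins, fork_rate_bp_per_s=50):
    """Time to copy a linear chromosome from evenly spaced origins.

    Each origin covers an equal share of the chromosome, and its two
    forks move outwards in opposite directions at the same rate.
    """
    span_per_origin = chromosome_bp / origins
    return span_per_origin / (2 * fork_rate_bp_per_s)

for n in (1, 10):
    minutes = replication_seconds(1_000_000, n) / 60
    print(n, "origin(s):", round(minutes, 1), "minutes")
```

With these assumed numbers a single origin would need close to three hours, while ten origins bring the time down to under twenty minutes.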
In Vitro DNA synthesis
In the lab, DNA is unwound using high heat instead of enzymes. The heat breaks the hydrogen bonds, forming ssDNA. RNA primers are not used either, as RNA degrades quickly. Instead DNA primers are used: short single-stranded oligos. DNA polymerase doesn’t care whether the primer is DNA or RNA as long as there’s a free 3’-OH. Some of the sequence must be known so that a primer can be designed to anneal to it. DNA polymerase (typically Taq polymerase) and a pool of dNTPs are mixed with the DNA to be copied. Starting from dsDNA, repeated thermal cycling amplifies the DNA. This is PCR.
Chemical DNA synthesis
Chemical DNA synthesis is typically used to make primers or oligos. Chemical synthesis is done in the 3’->5’ direction, the opposite of in vivo synthesis. The first nucleotide is attached to a bead made of controlled pore glass (CPG) by means of a spacer molecule which binds to the 3’ end of the nucleotide. A chemical blocking group (DMT) is added to the 5’ end so that only the 3’ end is available to react. In each cycle the DMT on the growing chain is removed and the next nucleotide, carrying its own DMT, is coupled on, so the chain elongates from the 5’ end. Nucleotides are added as phosphoramidites, which are nucleotides with the 3’ phosphate blocked from reacting.
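Because chemical synthesis runs 3’->5’, the bases of an oligo are coupled in the reverse of the order in which the sequence is normally written. A minimal sketch, with a made-up function name and example oligo:

```python
def coupling_order(oligo_5_to_3):
    """Return the bases in the order they are used in chemical synthesis.

    The first base listed is the 3' base attached to the bead; the
    others are coupled one at a time towards the 5' end.
    """
    return list(reversed(oligo_5_to_3))

print(coupling_order("ATGC"))  # ['C', 'G', 'T', 'A']
```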
Template DNA is typically dsDNA taken from an organism. DNA primers are oligos that anneal to a specific region on each strand of the target, so that together they bracket the sequence to be amplified. Taq polymerase is the enzyme used in the polymerase chain reaction (PCR). First the target DNA is heated to a near-boiling temperature (94 °C), causing the two strands to melt apart. The temperature is lowered (50-60 °C) and the primers anneal to the individual strands. The temperature is then raised to 72 °C so that Taq polymerase can bind to the 3’ ends of the primers and begin synthesis of the DNA. The cycle then repeats: the temperature is raised so that the newly synthesised DNA melts, lowered so that primers re-attach, and raised to 72 °C again to synthesise more DNA. This is repeated over and over again in a thermal cycler. Because each cycle doubles the number of copies, a single piece of DNA can be amplified to about 1 billion copies within 30 cycles.
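The figure of 1 billion after 30 cycles follows from doubling. The sketch below also takes an `efficiency` parameter, which is an addition made here to show what happens when a cycle does not copy every molecule; it is not something these notes discuss.

```python
def pcr_copies(start_copies, cycles, efficiency=1.0):
    """Copies after a number of cycles; efficiency 1.0 means perfect doubling."""
    return start_copies * (1 + efficiency) ** cycles

print(pcr_copies(1, 30))                 # 1073741824.0
print(round(pcr_copies(1, 30, 0.9)))     # fewer copies at 90% efficiency
```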
One can amplify an unknown sequence by cloning it into a vector and amplifying that, because part of the vector sequence is known. Primers can be designed to anneal to lengths of DNA flanking the unknown sequence.
Degenerate primers can be made to account for wobble. Adding wobble bases when making primers produces a collection of primers that will bind to the target perfectly, with mismatches, or not at all. This gives some flexibility in annealing if one wants to amplify genes from different organisms, where the genes may be similar but not exactly the same. The trade-off is reduced specificity of the PCR.
Inverse PCR is used when the sequence is known for only one side of the target, that is, primers can be designed for one known region only. Restriction enzymes cut at sites to the left and the right of the known sequence, and the resulting fragment is ligated to itself to form a circular piece of DNA. One then designs two primers that anneal within the known sequence but point outwards, away from each other, around the circle. Amplification gives a linear product in which the known sequence is split in half, with one half at each end, and PCR can continue as normal (figure 4.20).
RT-PCR uses reverse transcriptase to make cDNA copies from mRNA. This is used when trying to amplify a gene from a eukaryote without the associated introns/non-coding DNA. The primers are made of repeating Ts so that they anneal to the poly(A) tail of the mRNA. Reverse transcriptase binds to the 3’ end of a primer and copies the RNA, producing a DNA copy instead of RNA. The cDNA is then amplified using regular PCR.
PCR primers can be used to create new restriction enzyme sites at the ends of target sequences, which makes them easier to clone into vectors (figure 4.22). The primers are synthesised with 5’ overhangs that contain the restriction enzyme cut and ligation site: the 3’ part of the primer anneals to the target and the 5’ end doesn’t. This means that the product is longer than the target by the length of the primer overhangs.
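A degenerate primer is really a shorthand for a pool of ordinary primers. The sketch below expands one using the standard IUPAC ambiguity codes; the function name and the example primer are made up for illustration.

```python
from itertools import product

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}

def expand_degenerate(primer):
    """List every plain sequence that a degenerate primer stands for."""
    choices = [IUPAC[base] for base in primer]
    return ["".join(combo) for combo in product(*choices)]

print(expand_degenerate("ATRY"))  # ['ATAC', 'ATAT', 'ATGC', 'ATGT']
```

Each extra ambiguous position multiplies the size of the pool, which is one way to see why specificity drops.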
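The reverse transcription step amounts to writing the reverse complement of the mRNA in DNA letters. In this sketch the mRNA sequence and the function name are hypothetical; note how the poly(A) tail becomes a run of Ts at the 5’ end of the cDNA, which is where the T-rich primer sits.

```python
# Pairing of RNA template bases with the DNA bases added opposite them.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}

def first_strand_cdna(mrna_5_to_3):
    """Return the first-strand cDNA, written 5'->3', for an mRNA written 5'->3'."""
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna_5_to_3))

mrna = "AUGGCCUAAAAAAA"         # hypothetical mRNA ending in a short poly(A) tail
print(first_strand_cdna(mrna))  # TTTTTTTAGGCCAT
```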
Alternatively one can use TA cloning to clone any PCR product. This is because Taq polymerase will add a single adenine overhang to the ends of the products it creates with its terminal transferase activity. Then one can clone these products into vectors with T overhangs (figure 4.23).
One can also make overlap primers which can join two different segments of DNA together. One needs three primers here: one complementary to the beginning of the first gene, one to the end of the second, and one which is complementary to both. PCR then fuses the two DNA pieces together and amplifies them (figure 4.24).
DNA sequencing can be done with PCR using chain termination sequencing. PCR is run as normal, but synthesis is stopped at newly added bases to create many fragments of different sizes. Because of the large number of molecules, the mixture contains fragments that differ in length by only one base. These can be separated using gel electrophoresis to create a fragment ladder. The final nucleotide of each fragment, once known, can be read directly from the gel, and these are put together in order to give the sequence. The final base is known because a percentage of the nucleotides in the PCR have no 3’-OH, meaning that elongation cannot continue after one of them has been added. These are called dideoxynucleotides. One must use polyacrylamide gel electrophoresis because the fragments differ by only one base. In cycle sequencing the reaction is done with all four dideoxynucleotides in solution with the target and Taq polymerase, and each of the four dideoxynucleotides has its own unique fluorophore. The sample is amplified to create a mixture of different-length fragments, each ending with either A, T, G or C. The fragments are then separated by gel electrophoresis and four different colours can be seen, each representing the terminal nucleotide of that fragment. Reading upwards then generates the sequence. Automatic DNA sequencers do this using capillary tubes filled with gel to separate the DNA automatically; each tube has a fluorescence activator and detector to read the fragments as they come through the gel.
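The logic of reading a sequence from a fragment ladder can be sketched as follows. The example assumes an idealised reaction in which termination happens once at every position; the sequence and function names are made up.

```python
import random

def terminated_fragments(new_strand):
    """Every prefix of the new strand, as if a dideoxynucleotide ended it there."""
    return [new_strand[:i] for i in range(1, len(new_strand) + 1)]

def read_ladder(fragments):
    """Sort fragments shortest first (as on the gel) and read each final base."""
    return "".join(fragment[-1] for fragment in sorted(fragments, key=len))

fragments = terminated_fragments("ATGCCGTA")  # hypothetical new strand
random.shuffle(fragments)                     # the tube holds them in no order
print(read_ladder(fragments))                 # ATGCCGTA
```

Sorting by length plays the role of the gel, and the last base of each fragment plays the role of the fluorescent label.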
Next generation sequencing
Next generation sequencing, also called parallel sequencing, covers technologies that can hold millions of DNA fragments in different locations and sequence them at the same time, multiplexing the process to make it cheaper and faster. This means that different DNA fragments need to be barcoded by having index sequences added to them, so that after a genome is fragmented the sequences can be pieced back together after multiplexed sequencing. For example, Illumina sequencing uses glass nano-wells with deposited DNA primers in them to capture DNA fragments. Once a fragment is in a nano-well, rapid amplification occurs so that each nano-well holds only one type of DNA fragment, making it a pure sample. In 454 sequencing this is done with beads and emulsion PCR to amplify DNA on the beads. The DNA is next denatured into ssDNA, nucleotides are added one by one to the ssDNA, and each nucleotide addition is recorded. This is sequencing by synthesis. The beads are put into picotitre plates, or in the case of Illumina sequencing the flow cell acts as one of these plates, and one of the four nucleotides is flowed across the plate along with DNA polymerase. If the nucleotide is complementary to the next template base and is added to the 3’ end of the growing strand, a flash of light is released, because the picotitre plate contains luciferase and sulfurylase, which react with the side products of the base addition. (This light-based detection is the 454 chemistry; Illumina instead reads a fluorescent label on each added nucleotide.) Only the wells where the next base is added generate a flash of light, so separate detectors for each well can record which bases are added. Thanks to barcoding, the single reads can be compiled into one long genomic sequence. There is also significant overlap between reads, so there is redundancy in the sequencing and errors can be corrected.
References: these notes are made from, and follow the structure of, my course textbook, Biotechnology, 2nd edition, by David P. Clark.
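The one-nucleotide-at-a-time flow can be simulated for a single well. The sketch below assumes a repeating flow order of T, A, C, G and treats the light signal as a simple count of bases added in that flow; both are simplifications chosen for illustration, and the function names are made up.

```python
def flowgram(new_strand, flow_order="TACG", max_flows=100):
    """Simulate the flows for one well.

    Returns a list of (base, count) pairs, where count is how many bases
    were added to the growing strand during that flow (0 means no flash).
    """
    signals = []
    position = 0
    flow = 0
    while position < len(new_strand) and flow < max_flows:
        base = flow_order[flow % len(flow_order)]
        count = 0
        # A run of identical bases is added within a single flow.
        while position < len(new_strand) and new_strand[position] == base:
            count += 1
            position += 1
        signals.append((base, count))
        flow += 1
    return signals

def call_bases(signals):
    """Rebuild the sequence from the recorded signals."""
    return "".join(base * count for base, count in signals)

signals = flowgram("TTAGC")  # hypothetical strand being synthesised
print(signals)               # [('T', 2), ('A', 1), ('C', 0), ('G', 1), ('T', 0), ('A', 0), ('C', 1)]
print(call_bases(signals))   # TTAGC
```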