[Lecture Notes] Fundamentals of Biotechnology - L10: Protein Engineering

Most industrial enzymes are hydrolases that must function under harsh, oxidising conditions. These conditions are not typical of the inside of a living cell, so the enzymes have been engineered to tolerate them.

Disulphide bond engineering

Disulphide bonds are among the strongest bonds formed between protein residues and help to maintain the 3D structure of the protein. Engineering in additional disulphide bonds is relatively easy and increases the harshness of the operating conditions that the protein can tolerate before it loses function. This is done by introducing pairs of cysteines into the protein’s structure; under oxidising conditions the two cysteines bond with one another, holding the 3D shape of the protein in place. This only works if the protein already folds in a way that brings the two residues close together. A rule of thumb is that the further apart the two cysteine residues are in the amino acid sequence, the more stability the bond adds, so long as it does not also introduce bond strain.
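
As a rough illustration of the two conditions above (the residues must already sit close together in the folded structure, and a longer separation along the sequence tends to give more stabilisation), the sketch below ranks candidate residue pairs. The coordinates, the `max_distance` cut-off and the function name are all illustrative assumptions, not values from the course text, and the sketch ignores bond strain entirely.

```python
import math
from itertools import combinations

def candidate_pairs(coords, max_distance=6.0):
    """Rank residue pairs as hypothetical disulphide engineering candidates.

    coords maps residue number -> (x, y, z) position in angstroms.
    max_distance is an assumed placeholder cut-off, not a validated value.
    """
    pairs = []
    for (i, a), (j, b) in combinations(sorted(coords.items()), 2):
        distance = math.dist(a, b)
        if distance <= max_distance:
            # Store sequence separation first so sorting ranks by it.
            pairs.append((j - i, i, j, round(distance, 2)))
    # Longest sequence separation first (the rule of thumb above).
    return sorted(pairs, reverse=True)

# Made-up coordinates for illustration only.
toy_coords = {
    3: (0.0, 0.0, 0.0),
    9: (20.0, 0.0, 0.0),
    40: (21.0, 3.0, 0.0),
    120: (1.0, 4.0, 2.0),
}
for separation, i, j, distance in candidate_pairs(toy_coords):
    print(f"residues {i} and {j}: {separation} apart in sequence, {distance} A apart in space")
```

In practice each candidate would still need to be built and tested, for example by measuring melting temperature and activity.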

A case study in disulphide bond engineering is T4 bacteriophage lysozyme. It has two cysteine residues, but they do not form a disulphide bridge, so one of them was changed to Thr and new cysteines were introduced at three other candidate locations, allowing a pair of cysteines to bond with one another. All three locations were tested by measuring the melting temperature of the protein. The variant with the highest melting temperature is considered the most stable, although higher stability often does not mean that activity is retained.

Stability improvements

The larger the number of ways to arrange a system, the higher the entropy of that system. Entropically, it therefore makes sense to improve the stability of a protein by decreasing the number of misfolded or unfolded conformations available to it. Replacing glycine, which has only -H as its R group, with proline, whose R group forms a ring, reduces the conformational freedom of the protein. One can make similar substitutions, taking care not to change the actual structure of the protein. One may also fill holes in the hydrophobic core of a protein with larger hydrophobic residues, again so long as this does not change the structure of the protein.
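
The entropy argument can be put into numbers with Boltzmann's relation, S = R ln W per mole, where W is the number of available arrangements. The following is a minimal sketch, assuming a hypothetical substitution that halves the number of conformations open to the unfolded chain and leaves the folded state untouched. The halving factor, the temperature and the function names are assumptions for illustration, not measured values.

```python
import math

R = 8.314  # gas constant, J/(mol*K)

def unfolded_entropy_change(fold_reduction):
    """Molar entropy change of the unfolded state, from S = R ln W.

    fold_reduction is W_before / W_after for the unfolded ensemble.
    """
    return -R * math.log(fold_reduction)

def stabilisation_energy(fold_reduction, temperature=298.0):
    """Free energy by which unfolding becomes less favourable (J/mol).

    Assumes the folded state is unaffected by the substitution.
    """
    return -temperature * unfolded_entropy_change(fold_reduction)

delta_s = unfolded_entropy_change(2.0)  # hypothetical halving of W
delta_g = stabilisation_energy(2.0)
print(f"dS(unfolded) = {delta_s:.2f} J/(mol*K)")
print(f"extra stability = {delta_g / 1000:.2f} kJ/mol at 298 K")
```

Under these assumptions a single substitution contributes a little under 2 kJ/mol, which shows why several such changes are often combined.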

Another way of promoting stability is to place a negatively charged residue close to the N terminus of an alpha helix, because the helix carries a slight net positive charge at that end. This stabilises the alpha helix and is commonly found in nature.

One can also try to avoid asparagine and glutamine, because at high temperatures they are converted to aspartic acid and glutamic acid respectively. When these acidic side chains lose their proton, the negative charge left on the residue may cause conformational changes in the protein. Replacing asparagine and glutamine with other uncharged residues that cannot undergo this conversion therefore helps to increase stability.
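
A first step in applying this rule is simply to list where asparagine and glutamine occur. A minimal sketch, assuming the protein is given as a one-letter amino acid string; the sequence and the function name are made up for illustration.

```python
DEAMIDATION_PRONE = {"N": "asparagine", "Q": "glutamine"}

def find_deamidation_sites(sequence):
    """Return (position, residue name) for every Asn or Gln, 1-based."""
    return [
        (position, DEAMIDATION_PRONE[residue])
        for position, residue in enumerate(sequence.upper(), start=1)
        if residue in DEAMIDATION_PRONE
    ]

toy_sequence = "MKNLQAGTNV"  # made-up sequence
print(find_deamidation_sites(toy_sequence))
# [(3, 'asparagine'), (5, 'glutamine'), (9, 'asparagine')]
```

Whether a flagged residue can actually be replaced still depends on its role in the structure and the active site.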

Binding site engineering

One can alter the specificity of an active site without necessarily altering the catalytic mechanism of the enzyme. This is done by changing either substrate or cofactor specificity. Typically enzymes use either NAD or NADP, with dehydrogenases preferring NAD and biosynthetic enzymes preferring NADP. NADP carries an additional negatively charged phosphate group, so NADP-preferring enzymes have a positively charged residue inside the cofactor binding site, whereas NAD-preferring enzymes have a negatively charged residue there instead.

Lactate dehydrogenase has been engineered by replacing its negatively charged aspartate residue with a neutral one, so that it can bind NADP as well as NAD. If the aspartate is instead replaced with a positive residue such as lysine or arginine, the preference can be switched entirely to NADP. Changing the size of the substrate binding site can also change the length of the carbon chain of the substrates that the enzyme will accept.

Scaffold engineering

The binding sites and active site of an enzyme involve very few amino acid residues; most of the protein forms the scaffold around these sites that allows them to keep their shapes. The idea behind engineering the scaffold is to reduce its size, as it is thought that much of the scaffold is unnecessary. The smaller the protein, the more of it you can make per amount of media supplied to your organism. This is similar to the nanobody concept in antibody engineering. It is sometimes possible to take the binding site from a large protein, fuse it to a smaller protein and retain the binding site function.

Directed evolution

This technology won the Nobel Prize in Chemistry in 2018. The gene for an enzyme is randomly mutated to create a library of slightly different enzymes. These enzymes are screened for function and the best performers are isolated. The sequences of these enzymes are then mutated further to create another library, which undergoes further screening. Repeating this cycle over and over directs the changes in the enzyme towards increased function. The random mutations are introduced by error-prone PCR, which uses a low-fidelity Taq polymerase together with manganese chloride, which stabilises base mismatches. On average, some of the resulting mutations will fall within the active site of the enzyme, and in some cases these increase its efficiency.
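
The mutate, screen and select cycle can be sketched as a toy simulation. Everything here is an assumption for illustration: the `screen` function scores a variant by how many positions match a hidden optimum, which stands in for a laboratory assay (a real experiment has no known target), and the mutation rate, library size and sequences are arbitrary.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def mutate(sequence, rng, rate=0.1):
    """Randomly substitute residues, standing in for error-prone PCR."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else residue
        for residue in sequence
    )

def screen(sequence, target):
    """Toy activity score: number of positions matching a hidden optimum."""
    return sum(a == b for a, b in zip(sequence, target))

def directed_evolution(parent, target, rounds=10, library_size=50, seed=0):
    """Repeat mutate -> screen -> select; return the best variant and scores."""
    rng = random.Random(seed)
    best, history = parent, []
    for _ in range(rounds):
        # The current best stays in the pool, so the score never gets worse.
        library = [best] + [mutate(best, rng) for _ in range(library_size)]
        best = max(library, key=lambda variant: screen(variant, target))
        history.append(screen(best, target))
    return best, history

best_variant, scores = directed_evolution("AAAAAAAAAA", "MKTWYELLQG")
print(best_variant, scores)
```

The list of scores shows the stepwise improvement that gives the method its name: each round starts from the best variant found so far.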

One can perform site-directed random mutagenesis by looking at the genetic differences between two different versions of the same enzyme. Differences in activity may be due to the differences found in the gene sequence, and one can therefore direct mutagenesis towards those specific parts of the amino acid sequence.
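
Finding the positions worth targeting amounts to comparing the two versions residue by residue. A minimal sketch, assuming the two sequences are already aligned and of equal length; the sequences and the function name are made up for illustration.

```python
def differing_positions(variant_a, variant_b):
    """List 1-based positions where two pre-aligned sequences differ."""
    if len(variant_a) != len(variant_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (position, a, b)
        for position, (a, b) in enumerate(zip(variant_a, variant_b), start=1)
        if a != b
    ]

# Made-up aligned sequences for two versions of one enzyme.
print(differing_positions("MKTAYIAK", "MKSAYLAK"))
# [(3, 'T', 'S'), (6, 'I', 'L')]
```

The positions returned are the candidates at which to concentrate random mutagenesis.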

Another form of directed evolution involves the shuffling of domains from different enzymes together to create random sequences based on these enzymes.

Recombining domains

Examples of this have been shown where a non-specific endonuclease is fused to a specific DNA binding domain in order to create novel restriction endonucleases. These binding domains may come from proteins such as Gal4, which has a transcription activation domain and a DNA binding domain. Removing the transcription activation domain from Gal4 and fusing the remaining binding domain to FokI can create an enzyme which cuts at the Gal4 recognition site.

One can create novel restriction enzymes which recognise longer sequences using the zinc finger domain. Zinc fingers are short segments of protein folded around a zinc ion, and each zinc finger motif recognises 3 base pairs. Joining three motifs therefore creates a 9 base pair recognition sequence. Fusing these motifs to a FokI nuclease domain creates a restriction enzyme with a specificity of 9 base pairs, meaning that it cuts only rarely, and in a small genome perhaps only once.
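
The arithmetic behind the 9 base pair figure, and how rarely such a site should occur, can be checked with a short sketch. It assumes a random sequence with equal base frequencies and counts one strand only, which real genomes do not strictly satisfy; the genome length used is a hypothetical round number.

```python
def recognition_length(n_fingers, bp_per_finger=3):
    """Base pairs recognised by an array of zinc finger motifs."""
    return n_fingers * bp_per_finger

def expected_sites(genome_length_bp, site_length_bp):
    """Expected matches on one strand of a random, equal-frequency sequence."""
    return genome_length_bp / 4 ** site_length_bp

site = recognition_length(3)
print(site, 4 ** site)  # 9 262144
print(round(expected_sites(1_000_000, site), 2))  # hypothetical 1 Mbp genome
```

Each additional finger adds 3 base pairs and so makes a matching site 64 times rarer under the same assumptions.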

DNA shuffling

Another approach to directed evolution is DNA shuffling, whereby the gene of interest is cut into segments 100-300 bp in length, which are randomly reassembled by overlap PCR to create a library. These segments can be mutated using either site-directed or random mutagenesis. Alternatively, one can start with many variants of the same gene carrying different mutations, cut them up, mix them and recombine the segments to form new gene sequences containing segments from the different variants. The overlap between segments ensures that they are reassembled in their original order.
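
The idea of mixing variants while keeping the segments in order can be sketched as a toy model. This is a simplification under stated assumptions: the parents are equal in length and are cut at the same fixed boundaries, whereas real fragments are random and are reassembled through their overlapping ends. The sequences, segment length and function name are made up for illustration.

```python
import random

def shuffle_genes(parents, segment_length, rng):
    """Build one chimeric gene from equal-length parent genes.

    Each segment is taken from a randomly chosen parent, but segments keep
    their original position, so the overall gene order is preserved.
    """
    length = len(parents[0])
    if any(len(parent) != length for parent in parents):
        raise ValueError("this toy model needs parents of equal length")
    chimera = []
    for start in range(0, length, segment_length):
        donor = rng.choice(parents)
        chimera.append(donor[start:start + segment_length])
    return "".join(chimera)

rng = random.Random(1)
parents = ["AAAAAAAAAAAA", "CCCCCCCCCCCC", "GGGGGGGGGGGG"]  # made-up variants
library = [shuffle_genes(parents, 4, rng) for _ in range(5)]
print(library)
```

Using single-letter parents makes it easy to see which parent donated each segment of a chimera.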

Combinatorial protein libraries

This technique can be used to create new proteins with totally new chemical functions. Known DNA modules, each about 25 codons long, are taken from existing proteins and randomly shuffled and combined using PCR. These modules may encode alpha helices, beta sheets, metal ion binding domains, etc. Randomised joining or specific ordering may be used to generate novel gene sequences, which can be expressed and screened for activities of interest based on the modules used. To ensure correct expression, the experiment is constrained so that the start and end modules are a promoter and a terminator sequence.
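
The size of such a library grows quickly with the number of modules and positions. A minimal sketch that enumerates every ordered combination, assuming three hypothetical module labels and three positions between the fixed start and end modules; the labels and the function name are illustrative only.

```python
from itertools import product

def build_library(modules, n_slots):
    """Enumerate every ordered combination of modules across n_slots positions."""
    return ["-".join(combo) for combo in product(modules, repeat=n_slots)]

modules = ["helix", "sheet", "metal_site"]  # hypothetical module labels
library = build_library(modules, n_slots=3)
print(len(library))  # 27, i.e. 3 ** 3
print(library[:3])
```

With m modules and n positions the library holds m ** n designs, which is why screening capacity soon becomes the limiting factor.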

Exon shuffling builds on the idea that individual exons are naturally occurring modules, and it is theorised that many enzymes evolved through the random shuffling of exons.

De novo proteins

These are totally new polypeptide chains forming proteins which have never existed before, with totally novel functions. Random DNA sequences are generated and expressed in cells, and the resulting polypeptide chains may be screened for function. Most expressed proteins will be totally useless and will not fold into any meaningful shape. In practice, random DNA sequences are combined with sequences for known motifs such as alpha helices and beta-pleated sheets, which increases the probability that a novel protein will be found. Certain combinations of these motifs are known to fold in certain ways, so introducing further randomness and studying the result allows scientists to begin learning how to predict de novo protein structures.

These de novo proteins can begin to be screened using microarrays to test their binding to thousands of different small molecule targets.

Expanding the genetic code: new amino acids

E. coli can be modified to produce tRNAs charged with non-natural amino acids. Tyrosyl-tRNA synthetase has been modified to incorporate the non-natural amino acid pBpa (p-benzoyl-L-phenylalanine) at UAG codons. The mutated tyrosyl-tRNA synthetase does not recognise any of the naturally occurring tRNAs inside E. coli. The gene for this mutant synthetase and the gene for a tRNA modified to recognise UAG are both supplied to the cell. When both are expressed, a tRNA that recognises the UAG stop codon and carries the non-natural amino acid is produced. A side effect of this orthogonal pair is that, at the cell’s own UAG stop codons, pBpa will occasionally be inserted and translation will continue instead of ending. One can prevent this by removing all of the clashing UAG stop codons from the genome and replacing them with other stop codons.
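
The effect of reassigning UAG can be sketched with a toy translation function. This is a simplified model under stated assumptions: suppression is treated as fully efficient, whereas in the cell it competes with normal termination, and the letter 'X' is an ad hoc symbol for pBpa. The message and the function name are made up for illustration.

```python
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code, built from the conventional UCAG ordering.
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(mrna, amber_suppressor=False):
    """Translate an mRNA; with the orthogonal pair, UAG is read as pBpa ('X')."""
    protein = []
    for start in range(0, len(mrna) - 2, 3):
        codon = mrna[start:start + 3]
        if codon == "UAG" and amber_suppressor:
            protein.append("X")  # 'X' is an ad hoc symbol for pBpa here
            continue
        residue = CODON_TABLE[codon]
        if residue == "*":
            break  # an ordinary stop codon ends translation
        protein.append(residue)
    return "".join(protein)

mrna = "AUGGCUUAGAAAUAA"  # made-up message: Met-Ala-UAG-Lys-stop
print(translate(mrna))                         # MA
print(translate(mrna, amber_suppressor=True))  # MAXK
```

The second call also shows the read-through problem: any native gene ending in UAG would be extended in the same way until the next UAA or UGA.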

New amino acids can have functions such as allowing for UV activated cross-linking, infrared probing, and isotopic labelling.

References: these notes are based on, and follow the structure of, my course textbook, Biotechnology, 2nd edition, by David P. Clark.