[Lecture Notes] Fundamentals of Biotechnology - L9: Recombinant Proteins

Expression of eukaryotic proteins in prokaryotes

Expression vectors are specialised plasmids that increase protein expression: they provide a strong promoter to drive transcription of the cloned gene, and they carry an antibiotic resistance gene so that bacteria containing the plasmid can be selected. When cloning eukaryotic genes into prokaryotes, cDNA is used because it provides a continuous coding region without introns, which bacteria cannot splice out. cDNA is also used when cloning into eukaryotes, because it minimises the amount of DNA that has to be handled.

The mRNA transcribed from a cloned gene may cause problems: it may bind the ribosome poorly, it may form secondary structures such as hairpins, it may be unstable, and its codon usage may be poorly suited to the host organism (codon bias).

Cloning of recombinant proteins example: Insulin

Insulin is made of an A-chain and a B-chain held together by disulphide bonds. Insulin is encoded by a single gene, and is therefore translated as a single polypeptide chain in which the A-chain and the B-chain are connected by a stretch of peptide called the C-chain. Post-translational modification and processing turn this preproinsulin into functioning insulin. The N-terminal sequence is a secretion signal and is removed by a peptidase, leaving proinsulin. Proinsulin is then processed by endopeptidases which cut out the C-chain. The terminal arginine and lysine residues are removed by carboxypeptidase-H to form insulin.
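The processing steps above can be sketched as string operations on the polypeptide. This is only a toy model: the human preproinsulin sequence is written from memory and should be checked against a sequence database, and the function names are made up for illustration.

```python
# Human preproinsulin written as its segments. ASSUMPTION: sequence recalled
# from memory; verify against a database entry before relying on it.
SIGNAL = "MALWMRLLPLLALLALWGPDPAAA"            # N-terminal secretion signal
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
C_CHAIN = "EAEDLQVGQVELGGGPGAGSLQPLALEGSLQ"
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"
# The C-chain is flanked by the basic pairs Arg-Arg (RR) and Lys-Arg (KR).
PREPROINSULIN = SIGNAL + B_CHAIN + "RR" + C_CHAIN + "KR" + A_CHAIN


def remove_signal(preproinsulin):
    """Peptidase step: drop the N-terminal secretion signal, giving proinsulin."""
    return preproinsulin[len(SIGNAL):]


def cut_out_c_chain(proinsulin):
    """Endopeptidase step: cut after the RR pair and after the KR pair."""
    b_end = proinsulin.index("RR") + 2
    c_end = proinsulin.index("KR", b_end) + 2
    return proinsulin[:b_end], proinsulin[b_end:c_end], proinsulin[c_end:]


def trim_basic_tail(fragment):
    """Carboxypeptidase step: remove C-terminal Arg/Lys residues."""
    return fragment.rstrip("RK")


def process(preproinsulin):
    """Return the mature (B-chain, A-chain) pair."""
    b_extended, _c_extended, a_chain = cut_out_c_chain(remove_signal(preproinsulin))
    return trim_basic_tail(b_extended), a_chain


b_chain, a_chain = process(PREPROINSULIN)
print(len(b_chain), len(a_chain))
```

The disulphide bonds between the chains are not modelled here; the sketch only tracks which residues each step removes.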

It can therefore be seen that if the insulin gene were cloned into bacteria, they would simply make preproinsulin, because they lack most of the post-translational modification machinery of eukaryotes. Moreover, conditions in the E. coli cytoplasm are unfavourable for disulphide bond formation, which is more likely in the periplasmic space. Simple proteins with few disulphide bonds, such as HCG, may fold correctly in such bacteria, but because E. coli has low levels of Dsb (disulphide bond forming) proteins, more complex proteins such as insulin will not fold properly. Progress has been made on this issue by overexpressing the Dsb proteins.

It is too complex to produce preproinsulin and all the accompanying modification enzymes in one organism and expect to get functional human insulin. Instead we clone two artificial mini genes, one for the A-chain and one for the B-chain, into two different bacterial cultures so that the chains are produced separately (each is often expressed as a fusion protein of the chain plus beta-galactosidase, which is then cleaved by cyanogen bromide). The chains are then mixed and chemically treated to induce disulphide bridge formation.
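Cyanogen bromide cuts a polypeptide after methionine residues, so a chain with no internal methionine can be released intact from a fusion protein. A minimal sketch of that idea follows. The carrier sequence is a short hypothetical stand-in (not real beta-galactosidase), and the A-chain sequence is written from memory and should be checked before reuse.

```python
def cnbr_cleave(protein):
    """Split a protein after every methionine (M), as cyanogen bromide does."""
    fragments, current = [], ""
    for residue in protein:
        current += residue
        if residue == "M":
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments


CARRIER = "MKTAYMAKQ"                  # HYPOTHETICAL carrier, for illustration only
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"      # insulin A-chain (assumed; contains no M)
FUSION = CARRIER + "M" + A_CHAIN       # a Met placed directly before the chain

fragments = cnbr_cleave(FUSION)
print(fragments[-1])                   # the released A-chain
```

The carrier is broken into several pieces because it contains methionine itself, while the last fragment is the intact chain.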

Insulin, however, tends to clump. At high concentration it forms hexamers, which are not an active form of the protein. It takes time for the hexamer to dissociate fully in the body, meaning injected insulin may take hours to stabilise a patient’s blood glucose levels. Insulin produced in vivo is secreted from cells and carried off in such a way that it does not form hexamers.

On the surface of the insulin chains where they touch each other when forming hexamers, there is a proline residue, which is uncharged. Engineering the insulin so that this proline is replaced with an aspartic acid residue, which is negatively charged, means that the chains now repel one another, thus preventing the clumping. This makes the insulin fast acting, which is preferable for type I diabetics, but the slow-acting hexamer form is better for type II diabetics.

Translational expression vectors

Recall that the closer the ribosome binding site sequence is to the Shine-Dalgarno consensus sequence, the more strongly the ribosome will bind and initiate translation. Ordinary expression vectors are concerned with increasing expression through transcription, but there are also translational expression vectors, which seek to increase expression by improving translation efficiency. To design a translational expression vector one needs the consensus Shine-Dalgarno sequence with the start codon (ATG) at the optimal 8 base pairs downstream of it. The restriction enzyme NcoI is used because its recognition site (CCATGG) includes ATG, so the gene can be cloned in directly overlapping the start codon. Like other expression vectors, translational expression vectors carry the usual antibiotic resistance gene and promoter. If base pairing occurs within the mRNA in the region around the ribosome binding site, the resulting secondary structure interferes with ribosome binding and lowers expression. This can be alleviated by exploiting wobble: changing the third base of a few codons removes the base pairing without altering the protein sequence.
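As a rough illustration of the design rules above, the sketch below checks a stretch of vector DNA for a Shine-Dalgarno motif, measures the spacing to the start codon and tests whether the ATG sits inside an NcoI site. The motif AGGAGG, the way spacing is counted (bases between the end of the motif and the A of ATG) and the example sequence are all assumptions made for illustration.

```python
SD_MOTIF = "AGGAGG"      # ASSUMED core Shine-Dalgarno motif, written as DNA
NCOI_SITE = "CCATGG"     # NcoI recognition site, which contains ATG
OPTIMAL_SPACING = 8      # spacing quoted in these notes


def describe_rbs(region):
    """Report the spacing between the SD motif and the next ATG, or None."""
    region = region.upper()
    sd_start = region.find(SD_MOTIF)
    if sd_start == -1:
        return None
    sd_end = sd_start + len(SD_MOTIF)
    atg = region.find("ATG", sd_end)
    if atg == -1:
        return None
    spacing = atg - sd_end
    return {
        "spacing": spacing,
        "optimal": spacing == OPTIMAL_SPACING,
        # True when the start codon lies within an NcoI site (CC ATG G).
        "ncoi_at_start": region[atg - 2:atg + 4] == NCOI_SITE,
    }


# HYPOTHETICAL vector fragment: SD motif, 8-base spacer, NcoI site at the start codon.
example = "TTAGGAGG" + "AATAAACC" + "ATGG" + "CTAGC"
result = describe_rbs(example)
print(result)
```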

Codon bias

Because the genetic code is redundant, different organisms use synonymous codons at different frequencies. This bias has to be accounted for when cloning a gene from one organism into another. If the gene uses a codon that is common in the source organism but rare in the host, translation will slow down at those codons because the host has a lower concentration of the matching tRNA. This is fixed by changing codons, usually at the third base, to better match the distribution of tRNAs in the host. Codon optimisation may lead to 10-fold increases in expression levels. Changing every affected codon in a gene sequence can be labour intensive, so alternatively one can clone genes for the rare tRNAs into the host. An example is argU, the gene for the tRNA that reads AGA and AGG: supplying this gene raises the level of that tRNA, making translation more efficient.
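A minimal sketch of codon optimisation is shown below. The replacement table is illustrative only: it maps a handful of codons assumed to be rare in E. coli onto synonymous codons assumed to be preferred, whereas real work would use a measured codon-usage table for the host.

```python
# ILLUSTRATIVE table: codon assumed rare in E. coli -> synonymous codon
# assumed to be preferred. Each pair encodes the same amino acid.
PREFERRED = {
    "CTA": "CTG",  # Leu, third-base change
    "ATA": "ATT",  # Ile, third-base change
    "CCC": "CCG",  # Pro, third-base change
    "GGA": "GGT",  # Gly, third-base change
    "AGA": "CGT",  # Arg, needs more than a third-base change
    "AGG": "CGT",  # Arg, needs more than a third-base change
}


def optimise(cds):
    """Swap rare codons for synonymous preferred ones, codon by codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    return "".join(PREFERRED.get(codon, codon) for codon in codons)


original = "ATGAGACTAGGATAA"       # HYPOTHETICAL gene: Met-Arg-Leu-Gly-stop
optimised = optimise(original)
print(optimised)                   # ATGCGTCTGGGTTAA
```

The arginine codons AGA and AGG are the ones read by the argU tRNA, so supplying argU is the alternative to rewriting them.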

There is, however, an evolutionary reason for some codon bias within a gene. Some proteins only fold properly if translation slows down at certain locations, giving the protein time to reach the right conformation during co-translational folding. Rare codons at those locations can provide this slowdown so that the protein folds correctly.

Protein overexpression and inclusion bodies

Overexpressed protein often accumulates as inclusion bodies, which are large clumps of incorrectly folded protein; these can harm or kill the cell producing them. pET and pBAD vectors allow protein expression to be regulated and so are used to avoid this problem. pET works in the classic way by using components of the lac operon. The vector carries lacI and a lac operator alongside a T7 promoter and terminator flanking the cloned gene, and it is used in a host strain whose T7 RNA polymerase gene is itself under lac control. When IPTG is added, it binds the LacI repressor and releases it from the operator. The host can then produce T7 RNA polymerase, which transcribes the cloned gene from the T7 promoter to give the mRNA required for protein production.

One can reduce inclusion body formation by inducing the production of chaperone proteins, which bind the protein during translation, prevent premature folding, and allow folding to proceed in a controlled manner once translation is complete. The protein can also be delivered to the periplasmic space or secreted extracellularly. Chaperones are costly for the cell to produce and so are not used that often. Instead one may allow inclusion bodies (IBs) to form, solubilise them with chaotropic agents such as urea or guanidine, and then refold the protein under controlled conditions in refolding tanks. This technique gives a lower yield but is often still profitable, so it is done anyway.

Protein stability

The N-terminus of a protein can change its half-life, as can PEST sequences (regions rich in proline, glutamate, serine and threonine). One can therefore extend a protein’s half-life, protecting it from degradation by proteolytic enzymes, by changing the N-terminus, which largely has no effect on protein function, whereas changing PEST sequences within the protein would.

Protein secretion

Secretion signals are sequences at the N-terminus of proteins which allow the secretion machinery to recognise whether a protein is to stay in the cytoplasm, be moved to the periplasmic space or be secreted outside the cell. A goal of biotechnology is to secrete stable proteins out of the cell because these are inherently easier to purify.

We can add signal sequences to the N-terminus of proteins so that they are moved to the periplasmic space, and the outer membrane can then be made permeable so as to harvest the protein. An issue with this is that proteins carrying the signal sequence can build up within the cytoplasm and form inclusion bodies. Overexpressing the sec genes, which increases production of the secretory system’s proteins, can help with this.

Alternatively one can create a fusion of the desired protein with a protein which is normally exported to the periplasmic space, and then use protease cleavage to separate the two proteins. This is generally most effective for short proteins.

Both of the above options can instead be carried out in gram-positive bacteria, which lack the outer membrane and so have no periplasmic space for secreted proteins to enter. Proteins are therefore secreted out of the cell directly. An example of bacteria which can do this is Bacillus sp., and animal cells are often used as well. Otherwise one may induce secretion directly into the extracellular space in E. coli by using highly specialised secretion systems.

Fusion proteins

Fusion proteins are made when two coding sequences are put side by side so that a single long polypeptide is made. These may be simple peptide tags on proteins, or much more complicated fusion proteins. As mentioned previously, one can fuse a normally secreted protein onto the N-terminus of another protein, causing the whole thing to be secreted. Small eukaryotic proteins such as hormones are unstable in bacterial cytoplasm and may not be long enough to fold, so expressing them as a fusion with a native bacterial protein can increase their stability.

Protein fusion vectors contain the genes for both the carrier protein and the protein of interest. A good carrier protein is exported from the cell and relatively easy to collect as a pure product. It can be made even better by engineering the consensus Shine-Dalgarno sequence ahead of it to induce strong ribosome binding.

MalE is the maltose-binding protein of E. coli, which is exported to the periplasm where it takes part in maltose uptake. The gene of interest is cloned downstream of the malE gene, with a protease cleavage site cloned in between them so that the two proteins can later be cleaved apart. The fusion gene is given a strong promoter to increase production. Once a sufficient amount of fusion protein has been collected, it is passed over a column which binds the MalE portion, and protease is then passed over the column to cleave off the protein of interest.
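The cleavage step can be sketched on a protein string. The notes do not name the protease, so the example assumes Factor Xa, which recognises Ile-Glu-Gly-Arg (IEGR) and cuts after the arginine; the carrier and target sequences are short hypothetical stand-ins.

```python
FACTOR_XA_SITE = "IEGR"   # ASSUMED protease site; cleavage occurs after the R


def cleave_fusion(fusion, site=FACTOR_XA_SITE):
    """Split a fusion protein after the first protease site.

    Returns (carrier plus site, protein of interest).
    """
    head, found, tail = fusion.partition(site)
    if not found:
        raise ValueError("no protease site in fusion protein")
    return head + found, tail


CARRIER = "MKIEEGKLVIW"   # HYPOTHETICAL stand-in for the carrier protein
TARGET = "ACDEFGHK"       # HYPOTHETICAL protein of interest
fusion = CARRIER + FACTOR_XA_SITE + TARGET

bound_part, released = cleave_fusion(fusion)
print(released)           # stays free while the carrier part remains on the column
```

Because the site sits at the C-terminal end of the carrier, the released protein carries no extra residues from the site.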


Glycosylation

Glycosylation is the addition of short sugar chains to the protein after translation. It is one example of a post-translational modification. Glycosylation is important for the function of many eukaryotic proteins, and without it eukaryotic recombinant proteins expressed in bacteria do not function exactly as required. Where bacteria glycosylate proteins, it is mostly on side-chain oxygen atoms (O-linked), whereas many eukaryotic proteins are glycosylated on nitrogen atoms (N-linked). An N-linked glycosylation pathway was discovered in Campylobacter, and cloning this into E. coli allows some eukaryotic proteins to be correctly glycosylated when expressed.
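In eukaryotes, N-linked sugars are attached to asparagine within the motif Asn-X-Ser/Thr, where X is any residue except proline. The sketch below scans a protein for such candidate sites. The example sequence is hypothetical, and a matching motif only marks a possible site, not a guaranteed one.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The lookahead lets
# overlapping motifs be reported separately.
SEQUON = re.compile(r"N(?=[^P][ST])")


def n_glycosylation_sites(protein):
    """Return 1-based positions of asparagines in N-X-S/T motifs (X is not P)."""
    return [match.start() + 1 for match in SEQUON.finditer(protein.upper())]


example = "MNGTANPSQNASNKT"     # HYPOTHETICAL sequence for illustration
sites = n_glycosylation_sites(example)
print(sites)                    # [2, 10, 13]
```

The asparagine at position 6 is skipped because it is followed by proline.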

Otherwise we typically use insect cells, which can partially glycosylate proteins, although there are now insect cell lines engineered to fully glycosylate proteins as mammals do.

Glycosylation affects the half-life of proteins within the body. This is why mAbs produced in bacteria have a low biological half-life in humans. Increased glycosylation of erythropoietin is associated with lower receptor binding affinity, but because its biological half-life is increased, its overall activity in the body goes up. Other hormones could therefore have their activity and biological half-life regulated by over- or under-glycosylating them.

Eukaryotic expression systems

Bacterial expression systems have disadvantages: they do not properly glycosylate proteins or carry out other post-translational modifications which activate the protein (see insulin); their interiors are poor environments for disulphide bridge formation; the proteins may misfold or form IBs; the protein may be inactive; and contamination with bacterial components is a problem, as it may be toxic or raise an immune response. Eukaryotic expression systems are therefore often preferred for expressing eukaryotic proteins. Eukaryotes tend to possess most if not all of the enzymes required for post-translational modification. These systems usually require shuttle vectors, where the genetic engineering is done in bacteria and the same vector is then transfected into eukaryotic cells for expression.


Yeast

Yeast is used due to its scalability and safety. It also secretes so few proteins that, if it is engineered to secrete a specific protein, one can be confident that the product will be quite pure. It is also very capable of many post-translational modifications. Vectors, such as those based on the 2 micron circle, can be introduced by electroporation. The signal sequence from mating factor alpha (which is secreted) is cloned upstream of the protein gene. Between them a Lys-Arg site is cloned in; this is recognised by a yeast endoprotease, which cleaves off the signal before secretion so that the protein is released in its desired form. The disadvantage of plasmid vectors based on the 2 micron circle is that plasmids are usually lost from the cells over time. Integrating the cloned gene into a chromosome is more stable, but then only a single copy of the gene is present (copy number 1), which lowers expression of the protein. YACs are still not convenient, and multiple copies of the gene on one 2 micron vector can cause crossing over.

Insulin and hepatitis treatments are currently produced in yeast; however, low yields (discussed previously), over-glycosylation, and proteins becoming trapped between the cell membrane and cell wall are common problems.

Insect cells

Insect cells require less stringent media than mammalian cells and are more robust, making them less susceptible to shear forces in bioreactors. They can also provide the much needed post-translational modifications. As with some mammalian systems, the vectors are derived from viruses which infect the cells. The baculovirus used forms polyhedrons, which are packages containing virus particles that can spread to other cells. The particles are protected by polyhedrin, a protein which is broken down once ingested by another insect, allowing the polyhedrons to release the virus particles. It is the promoter of the polyhedrin gene which is exploited in the insect cell cloning system, because it is particularly strong and causes its gene to be expressed in large amounts. We remove the polyhedrin gene and replace it with the cDNA version of the gene for the protein of interest. The most commonly used virus is MNPV.

Firstly, a transfer vector is prepared in E. coli: this is an E. coli plasmid containing part of the virus’s DNA, in which the polyhedrin gene has been replaced with a multiple cloning site (MCS). The gene of interest is then cloned into the MCS, and the whole construct is recombined into the baculovirus. Another method uses bacmids, which are shuttle vectors: they behave as plasmids in E. coli (i.e. they have a bacterial ori) but replicate as viruses when transfected into insect cells.

Mammalian cells

Again, one uses shuttle vectors with multiple origins of replication to clone genes for recombinant proteins into mammalian systems. The origin of replication for the mammalian cells is typically taken from a virus which infects the cell; a common choice is simian virus 40 (SV40). Once again, viral promoters are exploited because they are very strong, or promoters of strongly expressed genes can be used, such as the promoter for somatotropin.

Cloned cDNA needs to include the poly-A tail signal, and once cells are transfected and grown, those with high protein expression must be selected for. Geneticin is an antibiotic which kills animal cells and can therefore be used to select successfully transfected cells. The npt gene, which codes for neomycin phosphotransferase, is cloned into the vector; this enzyme adds a phosphate group to geneticin and so blocks its toxicity.

Another selection method is to use mutant cells in which a certain enzyme has been knocked out. Knocking out the DHFR gene stops the production of dihydrofolate reductase, which kills the cell because no tetrahydrofolate, the active form of folic acid, can be produced. A working DHFR gene can be cloned into a shuttle vector and transfected into the mutant cells, and cells carrying the vector are then selected because they survive.

If a protein of interest is made of multiple subunits which combine to form a quaternary structure, then either a separate vector can be used for each subunit, or a single vector can carry the gene for each subunit under its own promoter. Alternatively, a single vector can carry both genes on one transcript to produce polycistronic mRNA. Elements taken from certain animal viruses, called internal ribosomal entry sites (IRESs), are used to ensure that multiple coding sequences on a single mRNA are translated. Two ribosomes then bind the mRNA, one at the start of the first gene and one at the start of the second gene. In this way two independent proteins are produced by different ribosomes, unlike in bacteria, where one ribosome produces all the proteins in the operon.

References: my notes are made from, and follow the structure of, my course textbook, Biotechnology (2nd edition) by David P. Clark.