Peptide Structure Analysis Essay
Proteins perform most of the work of living cells. This versatile class of macromolecule is involved in virtually every cellular process: proteins replicate and transcribe DNA, and produce, process, and secrete other proteins. They control cell division, metabolism, and the flow of materials and information into and out of the cell. Understanding how cells work requires understanding how proteins function.
The question of what a protein does inside a living cell is not a simple one to answer. Imagine isolating an uncharacterized protein and discovering that its structure and amino acid sequence suggest that it acts as a protein kinase. Simply knowing that the protein can add a phosphate group to serine residues, for example, does not reveal how it functions in a living organism. Additional information is required to understand the context in which the biochemical activity is used. Where is this kinase located in the cell and what are its protein targets? In which tissues is it active? Which pathways does it influence? What role does it have in the growth or development of the organism?
In this section, we discuss the methods currently used to characterize protein structure and function. We begin with an examination of the techniques used to determine the three-dimensional structure of purified proteins. We then discuss methods that are used to predict how a protein functions, based on its homology to other known proteins and its location inside the cell. Finally, because most proteins act in concert with other proteins, we present techniques for detecting protein-protein interactions. But these approaches only begin to define how a protein might work inside a cell. In the last section of this chapter, we discuss how genetic approaches are used to dissect and analyze the biological processes in which a given protein functions.
The Diffraction of X-rays by Protein Crystals Can Reveal a Protein's Exact Structure
Starting with the amino acid sequence of a protein, one can often predict which secondary structural elements, such as membrane-spanning α helices, will be present in the protein. It is presently not possible, however, to deduce reliably the three-dimensional folded structure of a protein from its amino acid sequence unless its amino acid sequence is very similar to that of a protein whose three-dimensional structure is already known. The main technique that has been used to discover the three-dimensional structure of molecules, including proteins, at atomic resolution is x-ray crystallography.
X-rays, like light, are a form of electromagnetic radiation, but they have a much shorter wavelength, typically around 0.1 nm (the diameter of a hydrogen atom). If a narrow parallel beam of x-rays is directed at a sample of a pure protein, most of the x-rays pass straight through it. A small fraction, however, is scattered by the atoms in the sample. If the sample is a well-ordered crystal, the scattered waves reinforce one another at certain points and appear as diffraction spots when the x-rays are recorded by a suitable detector (Figure 8-45).
Figure 8-45. X-ray crystallography. (A) A narrow parallel beam of x-rays is directed at a well-ordered crystal (B). Shown here is a protein crystal of ribulose bisphosphate carboxylase, an enzyme with a central role in CO2 fixation during photosynthesis. Some of the ...
The position and intensity of each spot in the x-ray diffraction pattern contain information about the locations of the atoms in the crystal that gave rise to it. Deducing the three-dimensional structure of a large molecule from the diffraction pattern of its crystal is a complex task and was not achieved for a protein molecule until 1960. But in recent years x-ray diffraction analysis has become increasingly automated, and now the slowest step is likely to be the generation of suitable protein crystals. This requires large amounts of very pure protein and often involves years of trial and error, searching for the proper crystallization conditions. There are still many proteins, especially membrane proteins, that have so far resisted all attempts to crystallize them.
Analysis of the resulting diffraction pattern produces a complex three-dimensional electron-density map. Interpreting this map—translating its contours into a three-dimensional structure—is a complicated procedure that requires knowledge of the amino acid sequence of the protein. Largely by trial and error, the sequence and the electron-density map are correlated by computer to give the best possible fit. The reliability of the final atomic model depends on the resolution of the original crystallographic data: 0.5 nm resolution might produce a low-resolution map of the polypeptide backbone, whereas a resolution of 0.15 nm allows all of the non-hydrogen atoms in the molecule to be reliably positioned.
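The connection between diffraction geometry and resolution can be made concrete with Bragg's law, n·λ = 2·d·sin θ, a standard relation that is not stated in the text above. The following Python sketch assumes the 0.1 nm wavelength quoted earlier and uses a hypothetical helper named bragg_angle_deg. It estimates the angle θ at which planes of spacing d reflect; finer detail (smaller d) appears at wider angles, which is why high-resolution information comes from spots far from the center of the diffraction pattern.

```python
import math


def bragg_angle_deg(spacing_nm, wavelength_nm=0.1, order=1):
    """Bragg angle (degrees) for lattice planes separated by spacing_nm.

    Uses n * wavelength = 2 * d * sin(theta). The default wavelength of
    0.1 nm is the typical value quoted in the text; the function name
    and signature are illustrative assumptions.
    """
    sin_theta = order * wavelength_nm / (2.0 * spacing_nm)
    if not 0.0 < sin_theta <= 1.0:
        raise ValueError("no diffraction possible for this spacing and wavelength")
    return math.degrees(math.asin(sin_theta))


if __name__ == "__main__":
    # The two resolutions mentioned in the text.
    for d in (0.5, 0.15):
        print(f"d = {d} nm -> theta = {bragg_angle_deg(d):.1f} degrees")
```

Under these assumptions, a spacing of 0.5 nm diffracts at roughly 5.7 degrees and a spacing of 0.15 nm at roughly 19.5 degrees.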
A complete atomic model is often too complex to appreciate directly, but simplified versions that show a protein's essential structural features can be readily derived from it (see Panel 3-2, pp. 138–139). The three-dimensional structures of about 10,000 different proteins have now been determined by x-ray crystallography or by NMR spectroscopy (see below)—enough to begin to see families of common structures emerging. These structures or protein folds often seem to be more conserved in evolution than are the amino acid sequences that form them (see Figure 3-15).
X-ray crystallographic techniques can also be applied to the study of macromolecular complexes. In a recent triumph, the method was used to solve the structure of the ribosome, a large and complex cellular machine made of several RNAs and more than 50 proteins (see Figure 6-64). The determination required the use of a synchrotron, a radiation source that generates x-rays with the intensity needed to analyze the crystals of such large macromolecular complexes.
Molecular Structure Can Also Be Determined Using Nuclear Magnetic Resonance (NMR) Spectroscopy
Nuclear magnetic resonance (NMR) spectroscopy has been widely used for many years to analyze the structure of small molecules. This technique is now also increasingly applied to the study of small proteins or protein domains. Unlike x-ray crystallography, NMR does not depend on having a crystalline sample; it simply requires a small volume of concentrated protein solution that is placed in a strong magnetic field.
Certain atomic nuclei, and in particular those of hydrogen, have a magnetic moment or spin: that is, they have an intrinsic magnetization, like a bar magnet. The spin aligns along the strong magnetic field, but it can be changed to a misaligned, excited state in response to applied radiofrequency (RF) pulses of electromagnetic radiation. When the excited hydrogen nuclei return to their aligned state, they emit RF radiation, which can be measured and displayed as a spectrum. The nature of the emitted radiation depends on the environment of each hydrogen nucleus, and if one nucleus is excited, it influences the absorption and emission of radiation by other nuclei that lie close to it. It is consequently possible, by an ingenious elaboration of the basic NMR technique known as two-dimensional NMR, to distinguish the signals from hydrogen nuclei in different amino acid residues and to identify and measure the small shifts in these signals that occur when these hydrogen nuclei lie close enough together to interact: the size of such a shift reveals the distance between the interacting pair of hydrogen atoms. In this way NMR can give information about the distances between the parts of the protein molecule. By combining this information with a knowledge of the amino acid sequence, it is possible in principle to compute the three-dimensional structure of the protein (Figure 8-46).
Figure 8-46. NMR spectroscopy. (A) An example of the data from an NMR machine. This two-dimensional NMR spectrum is derived from the C-terminal domain of the enzyme cellulase. The spots represent interactions between hydrogen atoms that are near neighbors in the protein ...
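The way an interaction signal is turned into a distance can be illustrated with a simple calibration. The sketch below assumes the commonly used isolated spin-pair approximation, in which the signal between two hydrogen nuclei falls off as the inverse sixth power of their separation; this relation is not spelled out in the text. The function name and the example numbers are hypothetical.

```python
def noe_distance_nm(intensity, ref_intensity, ref_distance_nm):
    """Estimate the distance between two hydrogen nuclei from signal strength.

    Assumes intensity is proportional to r**-6 and calibrates against a
    reference pair of known separation (for example, two hydrogens held
    at a fixed distance by covalent geometry).
    """
    if intensity <= 0 or ref_intensity <= 0 or ref_distance_nm <= 0:
        raise ValueError("intensities and reference distance must be positive")
    return ref_distance_nm * (ref_intensity / intensity) ** (1.0 / 6.0)


if __name__ == "__main__":
    # Made-up values: a signal 64 times weaker than the reference
    # corresponds to a pair twice as far apart.
    print(noe_distance_nm(intensity=1.0, ref_intensity=64.0, ref_distance_nm=0.25))
```

Because of the steep sixth-power dependence, only nuclei that lie close together give a measurable signal. In practice, structure calculations generally treat such estimates as approximate distance ranges and combine many of them rather than relying on any single value.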
For technical reasons the structure of small proteins of about 20,000 daltons or less can readily be determined by NMR spectroscopy. Resolution is lost as the size of a macromolecule increases. But recent technical advances have now pushed the limit to about 100,000 daltons, thereby making the majority of proteins accessible for structural analysis by NMR.
The NMR method is especially useful when a protein of interest has resisted attempts at crystallization, a common problem for many membrane proteins. Because NMR studies are performed in solution, this method also offers a convenient means of monitoring changes in protein structure, for example during protein folding or when a substrate binds to the protein. NMR is also used widely to investigate molecules other than proteins and is valuable, for example, as a method to determine the three-dimensional structures of RNA molecules and the complex carbohydrate side chains of glycoproteins.
Some landmarks in the development of x-ray crystallography and NMR are listed in Table 8-8.
Table 8-8. Landmarks in the Development of X-ray Crystallography and NMR and Their Application to Biological Molecules.
Sequence Similarity Can Provide Clues About Protein Function
Thanks to the proliferation of protein and nucleic acid sequences that are catalogued in genome databases, the function of a gene—and its encoded protein—can often be predicted by simply comparing its sequence with those of previously characterized genes. Because amino acid sequence determines protein structure and structure dictates biochemical function, proteins that share a similar amino acid sequence usually perform similar biochemical functions, even when they are found in distantly related organisms. At present, determining what a newly discovered protein does therefore usually begins with a search for previously identified proteins that are similar in their amino acid sequences.
Searching a collection of known sequences for homologous genes or proteins is typically done over the World-Wide Web, and it simply involves selecting a database and entering the desired sequence. A sequence alignment program—the most popular are BLAST and FASTA—scans the database for similar sequences by sliding the submitted sequence along the archived sequences until a cluster of residues falls into full or partial alignment (Figure 8-47). The results of even a complex search—which can be performed on either a nucleotide or an amino acid sequence—are returned within minutes. Such comparisons can be used to predict the functions of individual proteins, families of proteins, or even the entire protein complement of a newly sequenced organism.
Figure 8-47. Results of a BLAST search. Sequence databases can be searched to find similar amino acid or nucleic acid sequences. Here a search for proteins similar to the human cell-cycle regulatory protein cdc2 (Query) locates maize cdc2 (Subject), which is 68% identical ...
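The idea of sliding one sequence along another until residues fall into alignment can be sketched in a few lines of Python. This is a deliberately simplified illustration and not how BLAST or FASTA are implemented: those programs use fast heuristics, scoring matrices, and gaps, none of which appear here. The function name and the short sequences are hypothetical.

```python
def best_ungapped_alignment(query, subject):
    """Slide query along subject and return (offset, matches, overlap).

    offset is the position in subject that lines up with query[0];
    matches counts identical residues and overlap counts compared pairs.
    No gaps and no substitution scoring are used in this toy version.
    """
    best = (0, 0, 0)
    for offset in range(-(len(query) - 1), len(subject)):
        matches = 0
        overlap = 0
        for i, residue in enumerate(query):
            j = i + offset
            if 0 <= j < len(subject):
                overlap += 1
                if residue == subject[j]:
                    matches += 1
        if matches > best[1]:
            best = (offset, matches, overlap)
    return best


if __name__ == "__main__":
    # Invented one-letter amino acid strings, for illustration only.
    offset, matches, overlap = best_ungapped_alignment("KVEK", "MEDYTKVEKIG")
    print(offset, matches, f"{100.0 * matches / overlap:.0f}% identical")
```

The percentage of identical residues over the aligned region is the kind of figure reported as "% identical" in a search result.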
In the end, however, the predictions that emerge from sequence analysis are often only a tool to direct further experimental investigations.
Fusion Proteins Can Be Used to Analyze Protein Function and to Track Proteins in Living Cells
The location of a protein within the cell often suggests something about its function. Proteins that travel from the cytoplasm to the nucleus when a cell is exposed to a growth factor, for example, may have a role in regulating gene expression in response to that factor. A protein often contains short amino acid sequences that determine its location in a cell. Most nuclear proteins, for example, contain one or more specific short sequences of amino acids that serve as signals for their import into the nucleus after their synthesis in the cytosol (discussed in Chapter 12). These special regions of the protein can be identified by fusing them to an easily detectable protein that lacks such regions and then following the behavior of this surrogate protein in a cell. Such fusion proteins can be readily produced by the recombinant DNA techniques discussed previously.
Another common strategy used both to follow proteins in cells and to purify them rapidly is epitope tagging. In this case, a fusion protein is produced that contains the entire protein being analyzed plus a short peptide of 8 to 12 amino acids (an “epitope”) that can be recognized by a commercially available antibody. The fusion protein can therefore be specifically detected, even in the presence of a large excess of the normal protein, using the anti-epitope antibody and a labeled secondary antibody that can be monitored by light or electron microscopy (Figure 8-48).
Figure 8-48. Epitope tagging allows the localization or purification of proteins. Using standard genetic engineering techniques, a short epitope tag can be added to a protein of interest. The resulting protein contains the protein being analyzed plus a short peptide ...
Today large numbers of proteins are being tracked in living cells by using a fluorescent marker called green fluorescent protein (GFP). Tagging proteins with GFP is as simple as attaching the gene for GFP to one end of the gene that encodes a protein of interest. In most cases, the resulting GFP fusion protein behaves in the same way as the original protein, and its movement can be monitored by following its fluorescence inside the cell by fluorescence microscopy. The GFP fusion protein strategy has become a standard way to determine the distribution and dynamics of any protein of interest in living cells. We discuss its use further in Chapter 9.
GFP, and its derivatives of different color, can also be used to monitor protein-protein interactions. In this application, two proteins of interest are each labeled with a different fluorochrome, such that the emission spectrum of one fluorochrome overlaps the absorption spectrum of the second fluorochrome. If the two proteins—and their attached fluorochromes—come very close to each other (within about 1–10 nm), the energy of the absorbed light will be transferred from one fluorochrome to the other. The energy transfer, called fluorescence resonance energy transfer (FRET), is determined by illuminating the first fluorochrome and measuring emission from the second (Figure 8-49). By using two different spectral variants of GFP as the fluorochromes in such studies, one can monitor the interaction of any two protein molecules inside a living cell.
Figure 8-49. Fluorescence resonance energy transfer (FRET). To determine whether (and when) two proteins interact inside the cell, the proteins are first produced as fusion proteins attached to different variants of GFP. (A) In this example, protein X is coupled to ...
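The steep distance dependence that makes FRET useful as a proximity sensor can be illustrated with the Förster relation, E = 1 / (1 + (r/R0)^6), where R0 is the separation at which half of the absorbed energy is transferred. The relation is standard but is not given in the text above; the value of 5 nm used for R0 below is an illustrative assumption, as is the function name.

```python
def fret_efficiency(distance_nm, forster_radius_nm=5.0):
    """Fraction of donor excitation energy transferred to the acceptor.

    Implements E = 1 / (1 + (r / R0)**6). The default R0 of 5 nm is an
    assumed, illustrative value; real fluorochrome pairs differ.
    """
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and R0 must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


if __name__ == "__main__":
    for r in (2.5, 5.0, 10.0):
        print(f"r = {r} nm -> E = {fret_efficiency(r):.3f}")
```

With these assumed numbers, transfer is nearly complete at 2.5 nm, half-maximal at 5 nm, and below 2% at 10 nm, which is consistent with a signal that appears only when the two tagged proteins come within a few nanometers of each other.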
Affinity Chromatography and Immunoprecipitation Allow Identification of Associated Proteins
Because most proteins in the cell function as part of a complex with other proteins, an important way to begin to characterize their biological roles is to identify their binding partners. If an uncharacterized protein binds to a protein whose role in the cell is understood, its function is likely to be related. For example, if a protein is found to be part of the proteasome complex, it is likely to be involved somehow in degrading damaged or misfolded proteins.
Protein affinity chromatography is one method that can be used to isolate and identify proteins that interact physically. To capture interacting proteins, a target protein is attached to polymer beads that are packed into a column. Cellular proteins are washed through the column and those proteins that interact with the target adhere to the affinity matrix (see Figure 8-11C). These proteins can then be eluted and their identity determined by mass spectrometry or another suitable method.
Perhaps the simplest method for identifying proteins that bind to one another tightly is co-immunoprecipitation. In this case, an antibody is used to recognize a specific target protein; affinity reagents that bind to the antibody and are coupled to a solid matrix are then used to drag the complex out of solution to the bottom of a test tube. If this protein is associated tightly enough with another protein when it is captured by the antibody, the partner precipitates as well. This method is useful for identifying proteins that are part of a complex inside cells, including those that interact only transiently—for example when cells are stimulated by signal molecules (discussed in Chapter 15).
Co-immunoprecipitation techniques require having a highly specific antibody against a known cellular protein target, which is not always available. One way to overcome this requirement is to use recombinant DNA techniques to add an epitope tag (see Figure 8-48) or to fuse the target protein to a well-characterized marker protein, such as the small enzyme glutathione S-transferase (GST). Commercially available antibodies directed against the epitope tag or the marker protein can then be used to precipitate the whole fusion protein, including any cellular proteins associated with the protein of interest. If the protein is fused to GST, antibodies may not be needed at all: the hybrid and its binding partners can be readily selected on beads coated with glutathione (Figure 8-50).
Figure 8-50. Purification of protein complexes using a GST-tagged fusion protein. GST fusion proteins, generated by standard recombinant DNA techniques, can be captured on an affinity column containing beads coated with glutathione. To look for proteins that bind ...
In addition to capturing protein complexes on columns or in test tubes, researchers are also developing high-density protein arrays for investigating protein function and protein interactions. These arrays, which contain thousands of different proteins or antibodies spotted onto glass slides or immobilized in tiny wells, allow one to examine the biochemical activities and binding profiles of a large number of proteins at once. To examine protein interactions with such an array, one incubates a labeled protein with each of the target proteins immobilized on the slide and then determines to which of the many proteins the labeled molecule binds.
Protein-Protein Interactions Can Be Identified by Use of the Two-Hybrid System
Methods such as co-immunoprecipitation and affinity chromatography allow the physical isolation of interacting proteins. A successful isolation yields a protein whose identity must then be ascertained by mass spectrometry, and whose gene must be retrieved and cloned before further studies characterizing its activity—or the nature of the protein-protein interaction—can be performed.
Other techniques allow the simultaneous isolation of interacting proteins along with the genes that encode them. The first method we discuss, called the two-hybrid system, uses a reporter gene to detect the physical interaction of a pair of proteins inside a yeast cell nucleus. This system has been designed so that when a target protein binds to another protein in the cell, their interaction brings together two halves of a transcriptional activator, which is then able to switch on the expression of the reporter gene.
The technique takes advantage of the modular nature of gene activator proteins (see Figure 7-42). These proteins both bind to DNA and activate transcription—activities that are often performed by two separate protein domains. Using recombinant DNA techniques, the DNA sequence that codes for a target protein is fused with DNA that encodes the DNA-binding domain of a gene activator protein. When this construct is introduced into yeast, the cells produce the target protein attached to this DNA-binding domain (Figure 8-51). This protein binds to the regulatory region of a reporter gene, where it serves as “bait” to fish for proteins that interact with the target protein inside a yeast cell. To prepare a set of potential binding partners, DNA encoding the activation domain of a gene activator protein is ligated to a large mixture of DNA fragments from a cDNA library. Members of this collection of genes—the “prey”—are introduced individually into yeast cells containing the bait. If the yeast cell has received a DNA clone that expresses a prey partner for the bait protein, the two halves of a transcriptional activator are united, switching on the reporter gene (see Figure 8-51). Cells that express this reporter are selected and grown, and the gene (or gene fragment) encoding the prey protein is retrieved and identified through nucleotide sequencing.
Figure 8-51. The yeast two-hybrid system for detecting protein-protein interactions. The target protein is fused to a DNA-binding domain that localizes it to the regulatory region of a reporter gene as “bait.” When this target protein binds to another ...
Although it sounds complex, the two-hybrid system is relatively simple to use in the laboratory. Moreover, even though the protein-protein interactions occur in the yeast cell nucleus, proteins from every part of the cell and from any organism can be studied in this way. Of the thousands of protein-protein interactions that have been catalogued in yeast, half have been discovered with such two-hybrid screens.
The two-hybrid system can be scaled up to map the interactions that occur among all of the proteins produced by an organism. In this case, a set of bait fusions is produced for every cellular protein, and each of these constructs is introduced into a separate yeast cell. These cells are then mated to yeast containing the prey library. Those rare cells that are positive for a protein-protein interaction are then characterized. In this way a protein linkage map has been generated for most of the 6,000 proteins in yeast (see Figure 3-78), and similar projects are underway to catalog the protein interactions in C. elegans and Drosophila.
A related technique, called a reverse two-hybrid system, can be used to identify mutations—or chemical compounds—that are able to disrupt specific protein-protein interactions. In this case the reporter gene can be replaced by a gene that kills cells in which the bait and prey proteins interact. Only those cells in which the proteins no longer bind—because an engineered mutation or a test compound prevents them from doing so—can survive. Like knocking out a gene (which we discuss shortly), eliminating a particular molecular interaction can reveal something about the role of the participating proteins in the cell. In addition, compounds that selectively interrupt protein interactions can be medically useful: a drug that prevents a virus from binding to its receptor protein on human cells could help people to avoid infections, for example.
Phage Display Methods Also Detect Protein Interactions
Another powerful method for detecting protein-protein interactions involves introducing genes into a virus that infects the E. coli bacterium (a bacteriophage, or “phage”). In this case the DNA encoding the protein of interest (or a smaller peptide fragment of this protein) is fused with a gene encoding one of the proteins that forms the viral coat. When this virus infects E. coli, it replicates, producing phage particles that display the hybrid protein on the outside of their coats (Figure 8-52A). This bacteriophage can then be used to fish for binding partners in a large pool of potential target proteins.
Figure 8-52. The phage display method for investigating protein interactions. (A) Preparation of the bacteriophage. DNA encoding the desired peptide is ligated into the phage vector, fused with the gene encoding the viral protein coat. The engineered phage are then ...
However, the most powerful use of this phage display method allows one to screen large collections of proteins or peptides for binding to selected targets. This approach requires first generating a library of fusion proteins, much like the prey library in the two-hybrid system. This collection of phage is then screened for binding to a purified protein of interest. For example, the phage library can be passed through an affinity column containing an immobilized target protein. Viruses that display a protein or peptide that binds tightly to the target are captured on the column and can be eluted with excess target protein. Those phage containing a DNA fragment that encodes an interacting protein or peptide are collected and allowed to replicate in E. coli. The DNA from each phage can then be recovered and its nucleotide sequence determined to identify the protein or peptide partner that bound to the target protein. A similar technique has been used to isolate peptides that bind specifically to the inside of the blood vessels associated with human tumors. These peptides are presently being tested as agents for delivering therapeutic anti-cancer compounds directly to such tumors (Figure 8-52B).
Phage display has also been used to generate monoclonal antibodies that recognize a specific target molecule or cell. In this case, a library of phage expressing the appropriate parts of antibody molecules is screened for those phage that bind to a target antigen.
Protein Interactions Can Be Monitored in Real Time Using Surface Plasmon Resonance
Once two proteins—or a protein and a small molecule—are known to associate, it becomes important to characterize their interaction in more detail. Proteins can bind to one another permanently, or engage in transient encounters in which proteins remain associated only temporarily. These dynamic interactions are often regulated through reversible modifications (such as phosphorylation), through ligand binding, or through the presence or absence of other proteins that compete for the same binding site.
To begin to understand these intricacies, one must determine how tightly two proteins associate, how slowly or rapidly molecular complexes assemble and break down over time, and how outside influences can affect these parameters. As we have seen in this chapter, there are many different techniques available to study protein-protein interactions, each with its individual advantages and disadvantages. One particularly useful method for monitoring the dynamics of protein association is called surface plasmon resonance (SPR). The SPR method has been used to characterize a wide variety of molecular interactions, including antibody-antigen binding, ligand-receptor coupling, and the binding of proteins to DNA, carbohydrates, small molecules, and other proteins.
SPR detects binding interactions by monitoring the reflection of a beam of light off the interface between an aqueous solution of potential binding molecules and a biosensor surface carrying immobilized bait protein. The bait protein is attached to a very thin layer of metal that coats one side of a glass prism (Figure 8-53). A light beam is passed through the prism; at a certain angle, called the resonance angle, some of the energy from the light interacts with the cloud of electrons in the metal film, generating a plasmon—an oscillation of the electrons at right angles to the plane of the film, bouncing up and down between its upper and lower surfaces like a weight on a spring. The plasmon, in turn, generates an electrical field that extends a short distance—about the wavelength of the light—above and below the metal surface. Any change in the composition of the environment within the range of the electrical field causes a measurable change in the resonance angle.
Figure 8-53. Surface plasmon resonance. (A) SPR can detect binding interactions by monitoring the reflection of a beam of light off the interface between an aqueous solution of potential binding molecules (green) and a biosensor surface coated with an immobilized ...
To measure binding, a solution containing proteins (or other molecules) that might interact with the immobilized bait protein is allowed to flow past the biosensor surface. When proteins bind to the bait, the composition of the molecular complexes on the metal surface changes, causing a change in the resonance angle (see Figure 8-53). The changes in the resonance angle are monitored in real time and reflect the kinetics of the association, or dissociation, of molecules with the bait protein. The association rate constant (kon) is measured as the molecules interact, and the dissociation rate constant (koff) is determined as buffer washes the bound molecules from the sensor surface. The equilibrium dissociation constant (Kd) is calculated by dividing koff by kon; the smaller its value, the tighter the binding. In addition to determining the kinetics, SPR can be used to determine the number of molecules that are bound in each complex: the magnitude of the SPR signal change is proportional to the mass of the immobilized complex.
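The relationship between the two rate constants and the measured signal can be sketched with the simplest kinetic picture, a one-to-one binding model in which one soluble molecule binds one immobilized bait molecule. This model is an assumption made for illustration and is not specified in the text; the function names, units, and numerical values below are likewise hypothetical.

```python
import math


def dissociation_constant(k_on, k_off):
    """Ratio k_off / k_on, in molar units when k_on is in 1/(M*s) and k_off in 1/s."""
    return k_off / k_on


def association_response(t, conc, k_on, k_off, r_max):
    """Signal at time t (s) while analyte at concentration conc (M) flows over the bait.

    Assumes 1:1 binding: the signal rises exponentially toward an
    equilibrium level with observed rate k_on * conc + k_off.
    r_max is the signal expected if every bait molecule were occupied.
    """
    k_obs = k_on * conc + k_off
    r_eq = r_max * k_on * conc / k_obs
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r_start, k_off):
    """Signal at time t (s) after switching to buffer, starting from r_start."""
    return r_start * math.exp(-k_off * t)


if __name__ == "__main__":
    k_on, k_off = 1e5, 1e-3  # invented rate constants
    k_d = dissociation_constant(k_on, k_off)
    print(f"k_off / k_on = {k_d:.1e} M")
    print(association_response(t=600, conc=k_d, k_on=k_on, k_off=k_off, r_max=100.0))
```

In this model, flowing the analyte at a concentration equal to koff/kon eventually fills half of the bait sites, and during the buffer wash the signal halves every ln(2)/koff seconds.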
The SPR method is particularly useful because it requires only small amounts of proteins, the proteins do not have to be labeled in any way, and protein-protein interactions can be monitored in real time.
DNA Footprinting Reveals the Sites Where Proteins Bind on a DNA Molecule
So far we have concentrated on examining protein-protein interactions. But some proteins act by binding to DNA. Most of these proteins have a central role in determining which genes are active in a particular cell by binding to regulatory DNA sequences, which are usually located outside the coding regions of a gene.
In analyzing how such a protein functions, it is important to identify the specific nucleotide sequences to which it binds. A method used for this purpose is called DNA footprinting. First, a pure DNA fragment that is labeled at one end with 32P is isolated (see Figure 8-24B); this molecule is then cleaved with a nuclease or a chemical that makes random single-stranded cuts in the DNA. After the DNA molecule is denatured to separate its two strands, the resultant fragments from the labeled strand are separated on a gel and detected by autoradiography. The pattern of bands from DNA cut in the presence of a DNA-binding protein is compared with that from DNA cut in its absence. When the protein is present, it covers the nucleotides at its binding site and protects their phosphodiester bonds from cleavage. As a result, the labeled fragments that terminate in the binding site are missing, leaving a gap in the gel pattern called a “footprint” (Figure 8-54). Similar methods can be used to determine the binding sites of proteins on RNA.
Figure 8-54. The DNA footprinting technique. (A) This technique requires a DNA molecule that has been labeled at one end (see Figure 8-24B). The protein shown binds tightly to a specific DNA sequence that is seven nucleotides long, thereby protecting these seven nucleotides ...
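The comparison of the two gel lanes can be expressed as a small computation. In the sketch below, each lane is represented as a list of labeled fragment lengths (in nucleotides) that produced a band; a footprint is a run of consecutive lengths present without protein but absent with it. This all-or-nothing representation is a simplifying assumption, since real gels show reduced band intensity rather than a strict presence or absence, and the function name and data are hypothetical.

```python
def find_footprints(control_bands, protein_bands):
    """Return a list of (first, last) fragment lengths missing when protein is bound.

    control_bands: fragment lengths seen when DNA is cut without protein.
    protein_bands: fragment lengths seen when DNA is cut with protein bound.
    Consecutive missing lengths are merged into a single protected region.
    """
    missing = sorted(set(control_bands) - set(protein_bands))
    footprints = []
    for length in missing:
        if footprints and length == footprints[-1][1] + 1:
            # Extend the current protected region by one nucleotide.
            footprints[-1] = (footprints[-1][0], length)
        else:
            footprints.append((length, length))
    return footprints


if __name__ == "__main__":
    control = list(range(1, 31))                         # cuts at every position
    bound = [n for n in control if not 12 <= n <= 18]    # seven positions protected
    print(find_footprints(control, bound))
```

Because the DNA is labeled at one end only, each missing fragment length maps directly to a protected position measured from that end.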
Summary
Many powerful techniques are used to study the structure and function of a protein. To determine the three-dimensional structure of a protein at atomic resolution, large proteins have to be crystallized and studied by x-ray diffraction. The structure of small proteins in solution can be determined by nuclear magnetic resonance analysis. Because proteins with similar structures often have similar functions, the biochemical activity of a protein can sometimes be predicted by searching for known proteins that are similar in their amino acid sequences.
Further clues to the function of a protein can be derived from examining its subcellular distribution. Fusion of the protein with a molecular tag, such as the green fluorescent protein (GFP), allows one to track its movement inside the cell. Proteins that enter the nucleus and bind to DNA can be further characterized by footprint analysis, a technique used to determine which regulatory sequences the protein binds to as it controls gene transcription.
All proteins function by binding to other proteins or molecules, and many methods exist for studying protein-protein interactions and identifying potential protein partners. Either protein affinity chromatography or co-immunoprecipitation by antibodies directed against a target protein will allow physical isolation of interacting proteins. Other techniques, such as the two-hybrid system or phage display, permit the simultaneous isolation of interacting proteins and the genes that encode them. The identity of the proteins recovered from any of these approaches is then ascertained by determining the sequence of the protein or its corresponding gene.
Structure analysis of the cell-penetrating peptide transportan 10 (TP10) revealed an exemplary range of different conformations in the membrane-bound state. The bipartite peptide (derived N-terminally from galanin and C-terminally from mastoparan) was found to exhibit prominent characteristics of (i) amphiphilic α-helices, (ii) intrinsically disordered peptides, as well as (iii) β-pleated amyloid fibrils, and these conformational states become interconverted as a function of concentration. We used a complementary approach of solid-state 19F-NMR and circular dichroism in oriented membrane samples to characterize the structural and dynamical behaviour of TP10 in its monomeric and aggregated forms. Nine different positions in the peptide were selectively substituted with either the L- or D-enantiomer of 3-(trifluoromethyl)-bicyclopent-[1.1.1]-1-ylglycine (CF3-Bpg) as a reporter group for 19F-NMR. Using the L-epimeric analogs, a comprehensive three-dimensional structure analysis was carried out in lipid bilayers at low peptide concentration, where TP10 is monomeric. While the N-terminal region is flexible and intrinsically unstructured within the plane of the lipid bilayer, the C-terminal α-helix is embedded in the membrane with an oblique tilt angle of ∼55° and in accordance with its amphiphilic profile. Incorporation of the sterically obstructive D-CF3-Bpg reporter group into the helical region leads to a local unfolding of the membrane-bound peptide. At high concentration, these helix-destabilizing C-terminal substitutions promote aggregation into immobile β-sheets, which resemble amyloid fibrils. On the other hand, the obstructive D-CF3-Bpg substitutions can be accommodated in the flexible N-terminus of TP10 where they do not promote aggregation at high concentration. 
The cross-talk between the two regions of TP10 thus exerts a delicate balance on its conformational switch, as the presence of the α-helix counteracts the tendency of the unfolded N-terminus to self-assemble into β-pleated fibrils.
Membrane-active peptides are often used as cell-penetrating carriers and antimicrobial agents, but their actual structural behavior in the membrane-bound state remains elusive and makes rational design difficult. Interactions with the lipid bilayer tend to induce specific peptide conformations or trigger structural changes, such as folding, oligomerization, or aggregation into amyloid-like fibrils. Here, we have characterized a representative cell-penetrating peptide, transportan 10 (TP10), and observed a remarkable structural complexity and conformational equilibrium. The detailed insights gained into the stabilities and interconversions of its α-helical, unfolded and β-pleated states reveal some general principles on peptide folding and aggregation in membranes.
Cell-penetrating peptides (CPPs) are used to deliver hydrophilic cargo into cells without disrupting the plasma membrane. Different classes of CPPs have been described, such as the arginine-rich transporters represented by oligo-arginine and the well-known TAT peptide on the one hand, or the cationic amphiphilic peptides exemplified by penetratin and the transportan family on the other hand. CPPs differ in their mechanism of uptake, as transportan peptides tend to be more membrane-perturbing than arginine-rich ones. In fact, transportan shares many characteristics with pore-forming α-helical antimicrobial peptides and toxins, such as magainin 2 and melittin. Given their structural similarities, some CPPs have been shown to act like antimicrobial peptides, and vice versa. Another notable feature of many man-made peptides is their tendency to aggregate at high concentration, a characteristic shared with natural fibril-forming peptides such as the toxic Aβ peptide, α-synuclein, IAPP, or calcitonin. In many of these systems, aggregation is surface-induced or at least enhanced upon binding to lipid membranes.
Peptides are often classified as cell penetrating, membrane permeabilizing, lytic or fusogenic, and all of them induce some kind of perturbation in the lipid bilayer. It is generally not possible to correlate any of these events with a particular peptide structure, as many peptides (i) are multifunctional, (ii) have non-trivial membrane-bound conformations, and (iii) can interconvert between several different structures. Here, we set out to understand the diverse ways in which a single representative peptide can interact with membranes. Such a comprehensive view of the conformational transitions and aggregation behavior should then be highly valuable for the wider area of membrane-active peptides in general.
We focused on the membrane-bound structures and the conformational transitions of TP10, a shorter analog of the original transportan (TP) peptide with reduced toxicity. This chimeric family consists N-terminally of a sequence derived from the neuropeptide galanin, and C-terminally of a sequence from the wasp venom peptide mastoparan. The two parts are linked by an extra Lys residue, and the hybrid peptides are highly cationic. They have been successfully used to transport a variety of biologically relevant cargoes into living cells, though the detailed steps of internalization are still controversial. The structure of TP in a membrane-mimetic environment has been resolved using liquid-state NMR in SDS micelles and in phospholipid bicelles. A well-defined α-helix is reported for the C-terminal mastoparan part, while the N-terminal galanin region is more disordered. The hinge between the two segments is located around Asn15 (equivalent to Asn9 in TP10). However, this picture reveals neither the interaction of the peptide with a planar lipid membrane nor its alignment within it. Furthermore, TP has a tendency to aggregate, like many other cell-penetrating peptides in the membrane-bound state. To understand and control the membrane interactions of this representative cell-penetrating peptide, an analysis of its detailed conformation and conformational transitions in contact with the lipid bilayer is required. Such insight is a prerequisite for optimizing any peptide sequences that are associated with cell uptake or applied to disrupt membranes.
Our strategy for investigating the membrane-bound peptide is based on two complementary techniques, namely solid-state NMR and oriented circular dichroism (OCD). Both methods make use of macroscopically aligned lipid bilayer samples, in which the peptide can be studied under quasi-native conditions, i.e., at ambient temperature, adequate hydration, and with a well-defined lipid composition and peptide-to-lipid ratio. OCD provides rapid qualitative information about the conformation and alignment of the peptide, while solid-state NMR can yield a full structure with quasi-atomic resolution. In particular, 19F-NMR analysis of selectively 19F-labeled peptides is a highly sensitive approach to obtain site-specific information, similar to an alanine or cysteine scan used in molecular genetics. By introducing a single CF3-labeled amino acid into successive positions along the peptide backbone, a three-dimensional picture of the molecule in the lipid bilayer can be obtained. Further information on local and global peptide dynamics can be extracted from the effects of motional averaging. For several peptides it has already been possible to describe various concentration-dependent effects, such as the re-alignment of α-helices or the aggregation into β-sheets, from which mechanistic insights could be deduced. With a typical length of 10 to 30 amino acids, all peptides studied so far have exhibited just one type of secondary structure in any particular membrane-bound state. Interestingly, we find here that the 21-mer TP10 possesses a distinct bipartite structure, in which the N- and C-terminal regions adopt different conformations, and perturbations in these two regions elicit a differential sensitivity towards aggregation.
The solid-state 19F-NMR approach relies on the designer-made 19F-labeled amino acid 3-(trifluoromethyl)-bicyclopent-[1.1.1]-1-ylglycine (CF3-Bpg), which has a stiff and sterically restrictive side chain. The L-enantiomer has been recently established as a selective NMR label in studies of membrane-bound peptides with simple α-helical or β-pleated conformations. By incorporating D-amino acids into peptides, it has furthermore been possible to modulate their ability to aggregate as β-sheets. Now that 19F-NMR has been established as a reliable method, it is possible to address the bipartite structure and complex conformational transitions of the functionally interesting peptide TP10 in detail. This example shows for the first time how a hybrid peptide is embedded in a membrane with two different conformational elements, and how a local destabilization in the α-helical region promotes extensive aggregation into β-sheeted amyloid-like fibrils. By incorporating the sterically constrained CF3-Bpg into specific regions of TP10, we could thus observe and control its conformational transitions.
Materials and Methods
All amino acids were purchased from Novabiochem (Läufelfingen, Switzerland) or IRIS Biotech (Marktredwitz, Germany), and the coupling reagents 2-(1H-benzotriazol-1-yl)-1,1,3,3-tetramethyluronium hexafluorophosphate (HBTU) and 1-hydroxybenzotriazole (HOBt) from Biosolve (Valkenswaard, Netherlands). The 19F-labeled amino acid CF3-Bpg was prepared in the enantiomerically pure L- and D-forms by Enamine Ltd. (Kyiv, Ukraine). The lipids 1,2-dimyristoyl-sn-glycero-3-phosphocholine (DMPC) and 1,2-dimyristoyl-sn-glycero-3-[phospho-rac-(1-glycerol)] (DMPG) were obtained from Avanti Polar Lipids (Alabaster, AL, USA). RPMI 1640 and fetal calf serum (FCS) were from PAN Biotech (Aidenbach, Germany).
Solid-phase Peptide Synthesis
All peptides were synthesized on an automated Syro II multiple peptide synthesizer (MultiSynTech, Witten, Germany) with standard Fmoc solid-phase peptide synthesis protocols and HOBt/HBTU as coupling reagents. The 19F-labeled amino acid CF3-Bpg (see Scheme S1 in File S1) was coupled manually for 2 h as previously described. The peptides were cleaved off the resin using a mixture of trifluoroacetic acid (TFA) (93.5%), triisopropylsilane (TIS) (4%) and H2O (2.5%), precipitated with diethyl ether and lyophilized. The crude peptides were purified by high-performance liquid chromatography (HPLC) on a preparative C18 column (22 mm×250 mm) (Vydac, Hesperia, CA, USA) using acetonitrile/water gradients supplemented with 5 mM HCl. The identity of all peptides was confirmed by mass spectrometry. The purity of the peptides was found to be over 95%.
Carboxyfluorescein (CF)-labeling of Peptides
Peptides were N-terminally coupled to CF before being cleaved off the resin. Diisopropylcarbodiimide, HOBt and 5(6)-CF in a molar ratio of 1∶1∶1 were dissolved in dimethylformamide (DMF), mixed with the peptide on the resin in a molar ratio of 5∶1, and coupled for 12 h. After washing with DMF, dichloromethane (DCM), methanol (MeOH) and diethyl ether, piperidine (20% v/v in DMF) was added for 30 min. Afterwards the resin was washed with DMF, DCM and MeOH, dried under reduced pressure, and the peptide was cleaved off the resin and purified using HPLC.
Cell Uptake Assay with Fluorescence Microscopy
HeLa cells were maintained in RPMI 1640 supplemented with 10% FCS, and incubated at 37°C in a 5% CO2-containing humidified incubator. Cells were passaged every 2 to 3 days. The concentrations of the CF-labeled peptides were determined by measuring A492 in Tris-HCl buffer (pH 8.8), assuming a molar extinction coefficient of 75,000 M−1 cm−1. Confocal laser scanning microscopy was performed on a TCS SP5 confocal microscope (Leica Microsystems, Mannheim, Germany) equipped with an HCX PL APO 63×N.A. 1.2 water immersion lens. HeLa cells were maintained at 37°C on a temperature-controlled microscope stage. The cells were seeded at a density of 10,000 cells/well three days before the experiment in 8-well microscopy chambers (Nunc, Wiesbaden, Germany) and grown to 75% confluence. Cells were incubated with 2 or 10 µM of the CF-labeled TP10 wild type (WT) or the analogs in RPMI 1640 supplemented with 10% FCS for 30 min at 37°C. Cells were washed twice after the incubation, and living cells were analyzed immediately by confocal microscopy.
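The concentration determination from A492 follows the Beer-Lambert law, c = A / (ε·l). The short sketch below illustrates the arithmetic with the extinction coefficient quoted above; the 1 cm path length and the example absorbance are assumptions made for illustration and are not taken from the text.

```python
def concentration_from_absorbance(absorbance, epsilon_per_m_cm=75_000.0, path_cm=1.0):
    """Beer-Lambert law: concentration (mol/L) = A / (epsilon * path length).

    epsilon_per_m_cm: molar extinction coefficient in M^-1 cm^-1
                      (75,000 for carboxyfluorescein at 492 nm, as quoted above).
    path_cm:          optical path length in cm (1 cm is an assumed default).
    """
    if epsilon_per_m_cm <= 0 or path_cm <= 0:
        raise ValueError("extinction coefficient and path length must be positive")
    return absorbance / (epsilon_per_m_cm * path_cm)


# Hypothetical reading: A492 = 0.75 in a 1 cm cuvette.
stock_molar = concentration_from_absorbance(0.75)
print(f"{stock_molar * 1e6:.1f} µM")  # 10.0 µM
```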
Circular Dichroism Spectroscopy
Circular dichroism (CD) experiments were performed on a J-815 spectropolarimeter (Jasco, Tokyo, Japan) over the range from 260 to 180 nm at 0.1 nm intervals, using a quartz glass cuvette of 1 mm optical path length (Suprasil, Hellma Optik, Jena, Germany). The spectra were recorded using a scan rate of 10 nm/min, 8 s response time and 1 nm bandwidth. Three scans were averaged and the baseline spectrum of pure lipids was subtracted. The spectra of the peptides in the presence of lipids were recorded at 30°C (i.e. above the lipid phase transition temperature). The appropriate amounts of peptides were dissolved in phosphate buffer (10 mM, pH 7) to yield a stock solution with a concentration of 0.5 mg/ml. Appropriate amounts of DMPC and DMPG in a molar ratio of 3∶1 were co-dissolved in CHCl3/MeOH, dried in a stream of nitrogen followed by drying overnight under reduced pressure, and suspended in 10 mM phosphate buffer, pH 7.0. The lipid dispersion was homogenized by vigorously vortexing for 10×1 min and by 10 freeze-thaw cycles. Afterwards, small unilamellar vesicles (SUVs) were generated by sonication of the multilamellar vesicles for 1 min in a strong ultrasonic bath (UTR 200, Hielscher, Germany). The sonication procedure was repeated 4 times, after cooling the water of the ultrasonic bath down to room temperature with ice, to avoid overheating of the samples. Three different peptide-to-lipid molar ratios (P/L = 1∶50, 1∶100, 1∶200) were tested with a lipid concentration of 1.5 mg/ml and corresponding peptide concentrations of 0.1 mg/ml for P/L of 1∶50, 0.05 mg/ml for a P/L of 1∶100, and 0.025 mg/ml for a P/L of 1∶200.
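The peptide concentrations quoted for each P/L ratio can be cross-checked from the lipid concentration. The sketch below does this under stated assumptions: the molecular weights (about 677.9 g/mol for DMPC, about 688.9 g/mol for DMPG sodium salt, and about 2182 g/mol for unlabeled TP10) are approximate values that are not given in the text, so the results should agree with the quoted figures only to within rounding.

```python
# Approximate molecular weights in g/mol (assumed, not taken from the text).
MW_DMPC = 677.9
MW_DMPG = 688.9
MW_TP10 = 2182.0


def mean_lipid_mw(fraction_dmpc=0.75):
    """Mole-fraction-weighted mean molecular weight of a DMPC/DMPG mixture (3:1 by default)."""
    return fraction_dmpc * MW_DMPC + (1.0 - fraction_dmpc) * MW_DMPG


def peptide_mg_per_ml(lipid_mg_per_ml, lipids_per_peptide, peptide_mw=MW_TP10):
    """Peptide concentration (mg/ml) for a molar P/L ratio of 1:lipids_per_peptide."""
    lipid_molar = lipid_mg_per_ml / mean_lipid_mw()  # (g/L) / (g/mol) = mol/L
    peptide_molar = lipid_molar / lipids_per_peptide
    return peptide_molar * peptide_mw                # mol/L * g/mol = g/L = mg/ml


for ratio in (50, 100, 200):
    print(f"P/L = 1:{ratio}: {peptide_mg_per_ml(1.5, ratio):.3f} mg/ml")
```

With these assumed molecular weights the three values come out close to the 0.1, 0.05 and 0.025 mg/ml stated in the protocol.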
For secondary structure analysis of TP10, the CD spectrum of the WT peptide at a P/L = 1∶50 in DMPC/DMPG vesicles (see Figure S1 A/B in File S1) was converted to mean residue ellipticities by using the weighed-in peptide amount and the volume of the sample for concentration determination. A reliable UV concentration determination at 280 nm from the absorption of the single Tyr residue contained in the sequence was not possible due to the low absorption values and strong background scattering in the final vesicle sample used for CD. Secondary structure analyses were performed using three different algorithms: the CDSSTR program with the implemented SVD (singular value decomposition) algorithm, the CONTIN-LL program, which is based on the ridge regression algorithm, and the SELCON-3 program, which incorporates the self-consistent method together with the SVD algorithm to assign protein secondary structure. The three algorithms, as well as the protein CD reference data set #7 that was used, are provided by the DICHROWEB on-line server. The quality of the fit between the experimental spectrum and the spectrum back-calculated from the derived secondary structure fractions was assessed from the normalized root mean square deviation (NRMSD), with a value <0.1 (CONTIN-LL, CDSSTR) or <0.25 (SELCON-3) considered a good fit.
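The goodness-of-fit criterion can be made concrete with a small sketch. It assumes the commonly used definition NRMSD = sqrt(Σ(θexp − θcalc)² / Σθexp²), summed over all wavelengths. The spectra below are invented numbers, and the helper names are hypothetical rather than part of any CD analysis package.

```python
import math

# Thresholds for a "good fit" as stated in the text above.
GOOD_FIT_THRESHOLD = {"CONTIN-LL": 0.1, "CDSSTR": 0.1, "SELCON-3": 0.25}


def nrmsd(experimental, calculated):
    """Normalized root mean square deviation between two spectra sampled
    at the same wavelengths."""
    if len(experimental) != len(calculated):
        raise ValueError("spectra must have the same number of points")
    numerator = sum((e - c) ** 2 for e, c in zip(experimental, calculated))
    denominator = sum(e ** 2 for e in experimental)
    if denominator == 0:
        raise ValueError("experimental spectrum is identically zero")
    return math.sqrt(numerator / denominator)


def is_good_fit(value, program):
    """Apply the program-specific NRMSD threshold."""
    return value < GOOD_FIT_THRESHOLD[program]


# Invented mean residue ellipticities (arbitrary units) at four wavelengths.
theta_exp = [10.0, -20.0, -15.0, -5.0]
theta_calc = [9.0, -19.0, -16.0, -5.0]

score = nrmsd(theta_exp, theta_calc)
print(round(score, 4), is_good_fit(score, "CDSSTR"))  # 0.0632 True
```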
Oriented Circular Dichroism Spectroscopy (OCD)
OCD experiments on macroscopically oriented samples were performed with a designated OCD sample holder built in-house. Lipids were dissolved in CHCl3 and peptides in MeOH, and appropriate amounts were mixed and spread onto a 12 mm diameter quartz glass plate. The final amount of peptide deposited on a quartz glass plate was 7.5×10−6 mmol, and the amount of lipid was adjusted to obtain the desired P/L. After drying under reduced pressure, the samples were hydrated overnight at 40°C in 96–97% relative humidity. The spectra were recorded in the range from 260 to 180 nm using a scan rate of 20 nm/min, 8 s response time, 1 nm bandwidth, at eight different rotations of the cell, and referenced by subtracting the background signal that was recorded with a sample containing the same amount of lipids without peptides.
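The processing of the OCD data can be sketched as follows. The sketch assumes that the spectra recorded at the different cell rotations are averaged point by point before the lipid-only reference is subtracted. The averaging step is an assumption about the workflow, and the numbers are placeholders (only two rotations with three points each, for brevity).

```python
def average_spectra(spectra):
    """Point-by-point mean of several spectra sampled at the same wavelengths."""
    if not spectra:
        raise ValueError("at least one spectrum is required")
    n_points = len(spectra[0])
    if any(len(s) != n_points for s in spectra):
        raise ValueError("all spectra must have the same number of points")
    return [sum(values) / len(spectra) for values in zip(*spectra)]


def referenced_spectrum(sample_rotations, lipid_reference):
    """Average the rotation spectra, then subtract the lipid-only background."""
    averaged = average_spectra(sample_rotations)
    if len(averaged) != len(lipid_reference):
        raise ValueError("reference must match the sample spectra in length")
    return [s - r for s, r in zip(averaged, lipid_reference)]


# Placeholder data: two rotations, three wavelength points each.
rotations = [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]]
background = [0.5, 0.5, 0.5]

corrected = referenced_spectrum(rotations, background)
print(corrected)  # [1.5, 2.5, 3.5]
```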
Solid-state 19F-NMR Spectroscopy
All experiments were performed on a Bruker Avance 500 MHz NMR spectrometer (Bruker BioSpin, Rheinstetten, Germany) at 40°C. 31P-NMR spectra were acquired at a frequency of 202.5 MHz using a Hahn echo sequence with a 90° pulse of 5 µs, 30 µs echo time, a sweep width of 200 kHz, 4096 data points and 28 kHz proton decoupling with SPINAL64 or TPPM. Usually 128 scans were recorded with a relaxation delay of 1 s. 19F-NMR was performed at a frequency of 470.6 MHz using an anti-ringing sequence with a 90° pulse of 3.25 µs, a sweep width of 500 kHz, 4096 data points, and 24 kHz proton decoupling with SPINAL64. Usually, depending on the amount of peptide, between 2,000 (560 µg peptide, P/L = 1∶50) and 20,000 (130 µg peptide, P/L = 1∶400) scans were acquired with a relaxation delay of 1 s.
For preparing the oriented NMR samples, 14–15 mg (8 mg for a P/L of 1∶50) of a DMPC/DMPG mixture in a molar ratio of 3∶1 were dissolved in chloroform and an appropriate amount of peptide depending on the P/L (560 µg for P/L = 1∶50, 260 µg for P/L = 1∶200 and 130 µg for P/L = 1∶400) was dissolved in MeOH. Both solutions were thoroughly mixed and equally distributed on 18 glass plates (15 mm×7.5 mm×0.08 mm) (Marienfeld Laboratory Glassware, Lauda-Königshofen, Germany). The glass plates were stacked and hydrated at 48°C in 96% relative humidity for 24 h after drying overnight under reduced pressure. The hydrated samples were wrapped in parafilm and plastic foil before the NMR experiments.
NMR Structure Analysis
The orientation of the peptide in the bilayer was calculated based on the experimentally determined dipolar couplings (within the experimental error of 0.5 kHz) of the CF3-Bpg labeled TP10 analogs. According to our CD analysis, the backbone was modeled as an ideal α-helix. The alignment of the helix in a bilayer is described by the tilt angle (τ) with respect to the membrane normal, and by the azimuthal rotation angle (ρ) around the helix axis. The overall effect of motional averaging is taken into account by the Gaussian distribution parameters στ and σρ as previously described. The orientation of the Cα−Cβ bond vector of the 19F-labeled side chain is described by the angles α = 121.1° and β = 53.2°. The orientational parameters τ, ρ, στ and σρ were determined by a least-squares fit. The sum of squared deviations to the experimentally determined dipolar couplings was minimized to find the best-fit parameters.
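To illustrate the idea of such a least-squares fit, the sketch below runs a grid search over τ and ρ for a deliberately simplified model. It is not the authors' procedure: motional averaging (στ, σρ) is omitted, the way α and β enter the geometry is one plausible convention chosen here, the maximum coupling `max_coupling_khz` is a placeholder, and the "observed" couplings are synthetic values generated from the model itself.

```python
import math

HELIX_PITCH_DEG = 100.0  # ideal alpha-helix: 100 degrees of rotation per residue


def predicted_splitting(tau_deg, rho_deg, residue, alpha_deg=121.1, beta_deg=53.2,
                        max_coupling_khz=16.0):
    """Dipolar splitting (kHz) of a CF3 label in a rigid, ideal helix.

    Assumed convention: beta is the polar angle of the C-alpha/C-beta bond
    relative to the helix axis, alpha an azimuthal offset; max_coupling_khz
    is a placeholder value.
    """
    phi = math.radians(rho_deg + alpha_deg + (residue - 1) * HELIX_PITCH_DEG)
    tau, beta = math.radians(tau_deg), math.radians(beta_deg)
    # Cosine of the angle between the bond vector and the membrane normal.
    cos_theta = math.cos(tau) * math.cos(beta) - math.sin(tau) * math.sin(beta) * math.cos(phi)
    return max_coupling_khz * (3.0 * cos_theta ** 2 - 1.0) / 2.0


def fit_orientation(observed, step_deg=1):
    """Grid search minimizing the sum of squared deviations.

    observed: dict mapping residue number to measured splitting (kHz).
    Returns (tau, rho, rmsd)."""
    best = None
    for tau in range(0, 181, step_deg):
        for rho in range(0, 360, step_deg):
            ssd = sum((predicted_splitting(tau, rho, res) - value) ** 2
                      for res, value in observed.items())
            if best is None or ssd < best[0]:
                best = (ssd, tau, rho)
    ssd, tau, rho = best
    return tau, rho, math.sqrt(ssd / len(observed))


# Synthetic "data" generated from the model at tau = 55, rho = 120.
synthetic = {res: predicted_splitting(55, 120, res) for res in (10, 13, 16, 20, 21)}
tau_fit, rho_fit, rmsd_fit = fit_orientation(synthetic)
print(tau_fit, rho_fit, rmsd_fit)
```

Because the splitting depends on cos²θ, the pair (180° − τ, ρ + 180°) describes the same data equally well. A production fit would typically use a continuous optimizer and include the dynamics parameters rather than a coarse grid.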
Transmission Electron Microscopy (TEM)
The peptides were dissolved in water in a concentration of 2 mM in the absence of lipids. The peptide solution was sprayed on the carbon-coated grids and subsequently stained with uranyl acetate (5% aqueous solution with 30% ethanol) to enhance the contrast. Transmission electron microscopy was performed with a ZEISS 922 microscope at 200 keV electron energy with an in-column Omega filter. The images were taken in the bright-field mode applying zero-loss filtering for contrast enhancement.
To resolve the structural details and study the aggregation behavior of TP10 in the membrane-bound state, a single 19F-NMR reporter group was introduced in the peptide sequence as an L- or D-enantiomer of CF3-Bpg at position Gly2, Leu4, Leu5, Ile8, Leu10, Leu13, Leu16, Ile20, or Leu21. All analogs were also synthesized with a CF-label, yielding a total of 36 analogs (L-epimers are shown in Table S1 in File S1). The influence of the 19F-labeled amino acid on the secondary structure of the peptide was examined by circular dichroism (CD) spectroscopy in solution, and the influence on the biological cell uptake activity was assessed with another set of carboxyfluorescein-19F-labeled analogs using fluorescence microscopy on HeLa cells. We then analyzed the L-epimers using solid-state 19F-NMR to determine the conformation and orientation of monomeric TP10 at low peptide concentration in oriented DMPC/DMPG (3∶1) bilayers. Thereafter, the D-epimers were measured by 19F-NMR and OCD at high peptide concentration to examine their aggregation behavior. The morphology of the TP10 aggregates was characterized by transmission electron microscopy (TEM).
Characterization of the 19F-labeled TP10 Analogs by CD and Cell Uptake Assays
To examine the influence of 19F-labeling on the secondary structure of TP10, we studied the wild-type (WT) peptide and the 18 CF3-Bpg labeled analogs by CD spectroscopy in solution. The peptide was added to small unilamellar lipid vesicles composed of DMPC/DMPG (molar ratio of 3∶1) at three different peptide-to-lipid molar ratios (P/L), namely 1∶50, 1∶100 and 1∶200. In these freshly prepared samples, all TP10 analogs have an α-helical conformation similar to the WT peptide at all concentrations tested. CD spectra with a P/L of 1∶200 are shown in Figure 1A and 1B for the L- and D-epimers, respectively. (Data on P/L of 1∶50 and 1∶100 are presented in Figure S1 in File S1). The L-epimers showed no signs of perturbation, neither when the label was incorporated in the N-terminal galanin part, which is colored in green in this and subsequent figures (including residue Ile8, as will become obvious from the data below), nor in the C-terminal mastoparan part (colored in red). We may thus conclude that the TP10 analogs labeled with L-CF3-Bpg are essentially unperturbed and can be used for the subsequent 19F-NMR structure analysis. A rough secondary structure estimation for the WT peptide by deconvolution of the CD data revealed a helix fraction of around 56%, as shown in Table 1. This corresponds to a stretch of 12 amino acids that are helically folded, while the remaining part of the sequence is predominantly disordered.
CD spectra of the CF3-Bpg labeled TP10 analogs.
Secondary structure of TP10-WT bound to DMPC/DMPG vesicles evaluated from the CD spectrum (P/L = 1∶50, see Figure S1 A/B in File S1 for details).
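The step from a helix fraction of about 56% to a stretch of 12 residues is simple arithmetic for the 21-residue peptide, as the following sketch shows. The comparison with a segment running from residue 10 to residue 21 is an illustrative assumption about where the helix begins.

```python
SEQUENCE_LENGTH = 21  # TP10 is a 21-mer


def helical_residue_count(helix_fraction, sequence_length=SEQUENCE_LENGTH):
    """Number of residues implied by a fractional helicity (not rounded)."""
    if not 0.0 <= helix_fraction <= 1.0:
        raise ValueError("helix fraction must lie between 0 and 1")
    return helix_fraction * sequence_length


estimated = helical_residue_count(0.56)
print(round(estimated))  # 12

# Length of a segment spanning residues 10..21 inclusive (assumed helix start at 10).
segment_length = 21 - 10 + 1
print(segment_length)  # 12
```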
The D-epimers showed slight deviations from the WT spectrum, especially when the CF3-Bpg label was incorporated in the C-terminal mastoparan region (see red lines in Figure 1B, and in Figures S1B, S1D, and S2B in File S1). It is not surprising that the configuration of D-CF3-Bpg leads to moderate structural perturbations due to the inversion of the natural stereochemistry at the backbone.
The additional set of 18 N-terminal carboxyfluorescein-19F-labeled TP10 analogs that were prepared for the cell uptake experiments were also characterized by CD, giving essentially the same lineshapes as without the fluorophore (spectra are presented in Figure S2 in File S1). We note at this point that the 19F-NMR spectra of the L-CF3-Bpg labeled peptides with and without carboxyfluorescein likewise showed no significant differences (Figure S3 and Table S2 in File S1), so the fluorescent analogs prepared for the cell-uptake assays can be assumed to represent those used in the structure analysis.
The cell uptake of the carboxyfluorescein-labeled L-CF3-Bpg TP10 analogs was investigated by confocal fluorescence microscopy and compared to the WT peptide. The fluorescence was predominantly localized in punctate structures in the cytoplasm of HeLa cells after incubation for 30 min at 37°C (Figure 2 and Figure S4 in File S1). Since all L-epimers of TP10 showed an activity similar to the WT, they are functionally active and suitable for the subsequent 19F-NMR structure analysis.
Structure Analysis of Monomeric TP10, based on the L-epimers at Low Peptide Concentration
To determine the three-dimensional structure of membrane-bound TP10, we carried out solid-state 19F-NMR experiments in oriented DMPC/DMPG (3∶1) bilayers. The L-epimeric peptides were used for this analysis, as L-CF3-Bpg has the same configuration as natural amino acids. This side chain does not usually perturb the peptide conformation when substituted for a bulky hydrophobic residue, as in the case of the labeled positions Leu4, Leu5, Ile8, Leu10, Leu13, Leu16, Ile20, and Leu21 (plus another label at Gly2). The structural compatibility of L-CF3-Bpg has been demonstrated above by CD, and was previously shown also for other membrane-active peptides. For the NMR structure analysis we first had to determine and stay below the concentration threshold at which the L-epimeric TP10 analogs would start to aggregate. The unique sensitivity of 19F-NMR makes it possible to detect even very low peptide concentrations, allowing us to readily screen P/L ratios of 1∶50, 1∶200, and 1∶400 (Figure 3A). Only at 1∶400 did we obtain sharp 19F-NMR spectra with no signs of aggregation (as explained below) for all TP10 L-epimers (Figure 3C), even after prolonged sample storage. Under these conditions the 31P-NMR spectra of the phospholipids also confirmed a high quality in terms of the degree of lipid orientation in the samples before and after the 19F-NMR measurements, with around 85% of the lipids being well-oriented (Figure 3B).
Solid-state NMR spectra of TP10:
The structure analysis was thus carried out at P/L = 1∶400, where the well-resolved 19F-NMR spectra yielded distinct dipolar splittings, and where all 19F-labeled analogs can be safely assumed to reflect the monomeric structure of TP10. The absence of aggregation could be further confirmed by showing that the peptides are mobile on the millisecond timescale of the NMR experiment due to free rotational diffusion in the bilayer plane. To examine mobility, each oriented sample was measured twice: with the bilayer normal aligned parallel (0°) and perpendicular (90°) to the direction of the external magnetic field B0. As the dipolar splittings at 0° were found to be scaled by a factor of −1/2 upon changing the orientation to 90°, this means that the peptides are engaged in fast rotation around the bilayer normal, which is indicative of monomers or very small oligomers at most. The NMR data of all TP10 L-epimers showed this behavior at low peptide concentration (P/L = 1∶400), confirming that no aggregation had taken place. In some of the NMR spectra at higher peptide concentration (P/L = 1∶50) an emerging powder pattern suggested the onset of immobilization. Yet, even in those cases the sharp dipolar couplings of the mobile component did not change with peptide concentration (Figure S5 and Table S3 in File S1). It can thus be concluded that TP10 has a well-defined structure at low concentration and does not undergo any concentration-dependent re-alignment or gradual conformational change before the onset of aggregation. All NMR spectra at P/L = 1∶50 and 1∶200 are given in Supporting Information (Figure S5 in File S1), and the dipolar couplings at P/L = 1∶50 and 1∶200 are listed in Table S3 in File S1.
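The factor of −1/2 follows from the second Legendre polynomial, P2(cos θ) = (3cos²θ − 1)/2, which scales a motionally averaged coupling when the averaging axis (here the bilayer normal) is tilted by θ away from B0. The sketch below simply evaluates this function; the function name is chosen here for illustration.

```python
import math


def p2_scaling(angle_deg):
    """Second Legendre polynomial of cos(angle): (3*cos^2 - 1) / 2."""
    c = math.cos(math.radians(angle_deg))
    return (3.0 * c * c - 1.0) / 2.0


print(p2_scaling(0.0))             # 1.0
print(round(p2_scaling(90.0), 6))  # -0.5

# At the "magic angle" (about 54.74 degrees) the scaling factor vanishes.
magic_angle = math.degrees(math.acos(1.0 / math.sqrt(3.0)))
print(round(magic_angle, 2))       # 54.74
```

A splitting measured with the bilayer normal parallel to the field is therefore expected to reappear with half the magnitude and opposite sign in the perpendicular orientation, but only if the molecule rotates quickly about the normal.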
The well-resolved dipolar splittings of the CF3-groups in the nine L-epimers were used as orientational constraints to calculate the three-dimensional structure of monomeric TP10 in the membrane. Details of the corresponding dipolar wave analysis have been described elsewhere. Briefly, if a regular secondary structure such as an α-helix can be presumed, a least-squares fit will yield the helix tilt angle (τ) and the azimuthal rotation angle (ρ), plus the mobility parameters στ and σρ. In the case of TP10, there is ample evidence for an α-helical conformation, based on the CD analysis above and on the liquid-state 1H-NMR structure of the parent peptide transportan. However, when we attempted to fit the data of all nine L-CF3-Bpg labels to a straight α-helix, the quality of the fit was insufficient, with a very large root-mean-square deviation (RMSD) (Figures 4A and 4B). On the other hand, it is known that the NMR structure of transportan in SDS micelles and phospholipid bicelles has a kink around Asn15 (equivalent to Asn9 in TP10) and a more flexible N-terminus. Therefore, assuming a helical structure as evidenced by CD spectroscopy, we carried out a segmental analysis of the individual N- and C-terminal parts separately and obtained a very good fit for the C-terminal region with a low RMSD (Figures 4C and 4D).
These data suggest that the C-terminal region of TP10, which was derived from mastoparan, has a well-defined α-helical fold in the membrane-bound state. This finding supports our observation by CD that membrane-bound TP10 exhibits roughly 56% helicity (Table 1), which corresponds well to the 12 C-terminal residues. Based on the dipolar couplings of the labeled positions Leu10, Leu13, Leu16, Ile20, and Leu21, the data analysis yielded a helix tilt angle of τ≈55° and an azimuthal rotation angle of ρ≈120°, with a moderate wobble (σρ) around the long axis. The alignment of the α-helix in the lipid bilayer is illustrated in Figure 5A. The obtained azimuthal rotation angle is in full agreement with the amphiphilic character of the helix. The helical wheel projection of the membrane-embedded C-terminal region in Figure 5B shows that the hydrophobic residues face the bilayer interior, while the positively charged Lys residues point towards the aqueous layer.
Membrane-bound structure of TP10, as derived by solid-state 19F
In contrast, the dipolar splittings of the N-terminal part of TP10 could not be fitted to an α-helix, or to any other regular secondary structure element. Instead, all dipolar couplings of the N-terminal labels (positions Gly2, Leu4, Leu5, Ile8) are close to +7 kHz, a value that has been reported as characteristic for an unstructured peptide backbone that moves about flexibly in the membrane. A previous study on isolated galanin peptides had also shown that this glycine-rich sequence is largely unstructured in various environments, including lipid bilayers and detergent micelles. We can thus refer to the galanin-derived segment of TP10 as a region of the peptide that is intrinsically unstructured within the two-dimensional membrane plane, analogous to the concept of intrinsically unstructured proteins in aqueous solution. We thus conclude that the galanin-derived N-terminus of TP10 is essentially unstructured within the 2D plane of the membrane, while being anchored to the bilayer surface by its hydrophobic residues. The detailed structure derived here by 19F-NMR is in good agreement with the qualitative picture obtained above by CD lineshape deconvolution (see Table 1), for which the estimated 56% helical fold can now be attributed to the C-terminal mastoparan-derived segment.
Analysis of the Aggregation Behavior of TP10, based on the D-epimers
Next, the dipolar splittings were measured for the series of TP10 epimers labeled with D-CF3-Bpg, where the Cα configuration is stereochemically inverted. At a low peptide concentration of P/L = 1∶400, the splittings along the entire sequence are all rather close to +7 kHz (Table S4 in File S1). The C-terminal part of TP10 can clearly no longer be fitted to an α-helix. Instead, the characteristic range of dipolar splittings suggests that the helix has now become unfolded within the plane of the membrane, just like the N-terminus. It thus appears that the introduction of a single bulky D-CF3-Bpg residue in the C-terminal segment leads to local destabilization around the D-amino acid and possibly of the entire α-helical part. On the other hand, incorporation of D-CF3-Bpg in the N-terminal region does not have any significant structural consequences, as this part of the peptide is intrinsically flexible anyway.
Having found that D-CF3-Bpg destabilizes the helical region of TP10 and leads to unfolding in the membrane at low concentration, we also examined the D-epimers at a high P/L ratio of 1∶50. The 19F-NMR spectra of freshly prepared samples are shown in Figure 6, and the splittings are listed in Table S4 in File S1. It appears that all analogs with D-substituents in the C-terminal region of TP10 (at positions Leu10, Leu13, Leu16, Ile20, or Leu21) are partly or completely aggregated. This is seen from the static powder components in the lineshapes, with dipolar splittings of −8 kHz (as illustrated by the boxes in Figure 6). Upon changing the sample alignment from 0° to 90° these splittings are no longer reduced by a factor of −1/2, which is a sign that the peptide molecules are completely immobilized. On the other hand, introduction of D-CF3-Bpg into the N-terminal region of TP10 (at position Gly2, Leu4, Leu5, or Ile8) did not lead to aggregation. In these cases, all dipolar splittings remained close to +7 kHz as seen for P/L = 1∶400, and they were reduced by a factor of −1/2 when the sample alignment was changed from 0° to 90° (see Figure 6, and Table S4 in File S1). It may thus be concluded that a destabilization of the α-helical mastoparan part of TP10 by the sterically restrictive D-CF3-Bpg leads to unfolding at low peptide concentration and an enhanced aggregation at high concentration.
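The mobility criterion used here can be written as a small check: a label counts as mobile if its splitting at 90° equals −1/2 times the splitting at 0°, within a tolerance. The sketch below is illustrative only. The 0.5 kHz tolerance borrows the experimental error quoted in the methods, the 90° values are invented, and only the +7 kHz and −8 kHz splittings come from the text.

```python
def is_mobile(splitting_0deg_khz, splitting_90deg_khz, tolerance_khz=0.5):
    """True if the 90-degree splitting is -1/2 times the 0-degree splitting
    (within the tolerance), as expected for fast rotation about the bilayer normal."""
    expected_90 = -0.5 * splitting_0deg_khz
    return abs(splitting_90deg_khz - expected_90) <= tolerance_khz


# Illustrative pairs (0-degree, 90-degree) in kHz; the 90-degree values are invented.
examples = {
    "N-terminal label, unstructured and mobile": (7.0, -3.5),
    "C-terminal label, static powder component": (-8.0, -8.0),
}

for name, (at_0, at_90) in examples.items():
    state = "mobile" if is_mobile(at_0, at_90) else "immobilized"
    print(f"{name}: {state}")
```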
D-amino acid “scan” to identify aggregation-prone regions in TP10.
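The factor of −1/2 used above as a mobility criterion follows from the second Legendre polynomial, P2(cos θ) = (3cos²θ − 1)/2, which scales a motionally averaged splitting when the membrane normal is tilted by an angle θ relative to the magnetic field. As a minimal sketch of this check (the function names, the tolerance, and the example values other than +7 kHz are illustrative assumptions, not part of the original analysis), one could write:

```python
import math


def alignment_scaling(theta_deg: float) -> float:
    """Second Legendre polynomial P2(cos θ) = (3cos²θ − 1)/2.

    theta_deg is the angle between the membrane normal and the
    magnetic field, in degrees.
    """
    c = math.cos(math.radians(theta_deg))
    return 0.5 * (3.0 * c * c - 1.0)


def is_rotationally_averaged(splitting_0: float,
                             splitting_90: float,
                             tol: float = 0.15) -> bool:
    """Return True if the 90° splitting is about −1/2 of the 0° splitting.

    A ratio near −1/2 indicates fast rotation about the membrane normal;
    an unchanged splitting points to an immobilized (powder-like) species.
    The tolerance `tol` is an arbitrary, hypothetical choice.
    """
    if splitting_0 == 0:
        return False
    ratio = splitting_90 / splitting_0
    return abs(ratio - alignment_scaling(90.0)) <= tol


# Illustrative values in kHz: a mobile peptide with +7 kHz at 0° is
# expected to show about -3.5 kHz at 90°; a powder component keeps
# the same splitting at both alignments.
mobile = is_rotationally_averaged(7.0, -3.5)
immobile = is_rotationally_averaged(-8.0, -8.0)
```

Here `mobile` evaluates to true and `immobile` to false, mirroring the distinction drawn between the N-terminal and C-terminal D-epimers at P/L = 1∶50.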
Strictly speaking, the appearance of a static 19F-NMR powder pattern does not prove aggregation per se, but it shows that the molecules are no longer well-oriented in the membrane and are immobilized on the millisecond time-scale of the NMR experiment. It is impossible to deduce any structural parameters from these powder spectra, because all orientational order has been lost. We thus wanted to find out whether the immobilized peptides (P/L = 1∶50; positions Leu10, Leu13, Leu16, Ile20, Leu21 in Table S4 in File S1) have taken on a characteristic β-sheet conformation, as often implied for aggregation and amyloid formation. To this end, we employed oriented circular dichroism (OCD) to determine the conformation of the TP10 analogs in the same type of oriented DMPC/DMPG (3∶1) samples and at the same P/L ratio of 1∶50 as in 19F-NMR. To pick up any slow kinetics of aggregation, ageing studies were performed over 1, 5 and 8 days of incubation at 48°C under a fully hydrated atmosphere, yielding the characteristic OCD spectra illustrated in Figure 7. The complete set of OCD spectra for the WT peptide and all the D-epimers is presented in Figure S6 in File S1. The observed decrease in spectral intensity with time is attributed to lateral phase segregation of the membrane-bound helical peptides into peptide-rich domains, which reduces the intensity of the CD signal.
OCD analysis demonstrated that all TP10 analogs with a D-CF3-Bpg substitution in the C-terminal region gave typical β-sheet lineshapes (Figure 7B). On the other hand, substitutions in the N-terminal part had no impact on the predominantly α-helical conformation of the D-epimeric peptides (Figure 7A), which behaved like the TP10 WT. We thus conclude that incorporation of the bulky D-amino acid into the flexible N-terminal region did not disturb the peptide structure and preserved the folded helical C-terminus. In contrast, D-CF3-Bpg in the C-terminal region led to an unfolding of the helix and made the peptide more disordered at low concentration, which permitted its aggregation at higher concentration into β-sheets. Overall, we have thus identified by OCD the same pattern of aggregation as seen by solid-state 19F-NMR under the same conditions in the same type of samples.
Transportan, like many other membrane-active peptides, has a known tendency to aggregate at high concentration. In the case of TP10, we recall that several L-epimeric TP10 analogs had already shown signs of aggregation at P/L = 1∶50 in fresh 19F-NMR samples (Figure S5 in File S1). The aggregation process is typically surface-induced and accelerated in the membrane-bound state, as reported for various systems, where all peptide molecules are confined within the two-dimensional plane of the bilayer with an increased local concentration. Aggregation is often associated with the transition of an intrinsically unstructured peptide into oligomeric β-sheets in a concentration- and time-dependent manner. We have recently used the model peptide [KIGAKI]3 to examine this process and characterize its typical 19F-NMR signature. Self-assembly of this system in membranes leads to formation of fibrils that are similar to amyloid consisting of hydrogen-bonded cross-β-strands.
It is intrinsically difficult to prove the formation of genuine amyloid fibrils in the presence of membranes. For example, the thioflavin-T assay does not work, because the positive charge on the thioflavin-T micelles interferes with binding to the cationic peptides due to electrostatic repulsion. Electron microscopy in the presence of membranes is problematic, too, because mixtures of lipids with self-assembling peptides can give rise to unusual morphologies in which it is hard to discriminate regular fibrils. Therefore, we decided to use TEM to directly observe the peptide aggregates in the absence of any lipid. The L- and D-epimers labeled at position Leu16 were incubated in water at 2 mM concentration at 20°C. After 24 hours the sample with the L-epimer was opaque and the D-epimer even more so. The representative TEM images in Figure 8 reveal a network of short fibrils in both samples, reminiscent of amyloid fibrils, thus supporting the OCD observations above (for OCD of L-epimer see Figure S7 in File S1). Assuming that peptide aggregation is an intrinsic property of a given sequence and that its immediate environment influences mainly the kinetics of aggregation but not the local morphology, it is reasonable to argue that the observed cross-β-sheet fibrils in the absence of lipids should be similar to those in the presence of membranes.
We have shown that TP10 can assume three distinctly different conformations in the membrane-bound state (Figure 9). The monomeric structure has a well-defined α-helix in the C-terminal region, and a flexible N-terminus that may be regarded as intrinsically unstructured in the plane of the membrane (Figure 9A). This picture is consistent with an earlier NMR analysis in detergent micelles, and with a more recent MD study of TP10 in a POPC bilayer. If the helix is perturbed, the peptide unfolds completely and starts to aggregate as amyloid-like fibrils in a concentration- and time-dependent manner. While the incorporation of L-CF3-Bpg as a 19F-NMR label does not significantly perturb the peptide, the sterically obstructive D-epimer can be used to dramatically shift the equilibrium along the transition partially α-helical monomer ↔ unfolded monomer ↔ β-pleated aggregate. Remarkably, the tendency to aggregate depends distinctly on the position of the substitution. When D-CF3-Bpg is incorporated into the intrinsically flexible N-terminal region of TP10, the peptide maintains its usual partially α-helical structure (Figure 9B). A substitution in the C-terminal region, on the other hand, leads to unfolding and subsequent aggregation as β-sheets at high concentration (Figure 9C).
Structure and aggregating behavior of TP10.
The effect of D-CF3-Bpg on TP10 resembles the reported destabilization of the α-helical region in the Aβ peptide by D-amino acids or proline, which subsequently leads to enhanced aggregation. At the same time, and seemingly contradictory at first sight, it has been reported that D-amino acids can also be used to prevent the aggregation of peptides. This effect is not attributed to an equilibrium between helix and unfolded state (as in TP10 or Aβ), but instead to the self-assembly of already unfolded peptides (from the β-stranded KIGAKI family) into aggregated fibrils. The resulting β-sheets are destabilized, because the sterically obstructive side chain cannot be readily accommodated in the ordered core of the amyloid. In the case of a sensitive helix-to-sheet equilibrium, the decisive aspect is whether the α-helix or the β-sheet is more strongly perturbed by the D-amino acid. Along the same lines, it has been reported that the systematic replacement of amino acid pairs with D-ProGly could accelerate Aβ fibril formation, while pairs of D-amino acids, or even single proline residues, have been used as amyloid breakers. It was thus interesting to see that not only the L- but also the D-epimers of TP10 ended up as amyloid-like fibrils. To reconcile these observations, it must be noted that β-sheets can actually tolerate the odd misfit. For example, the β-pleated model peptide [KIGAKI]3 can assemble as β-sheets even with one third of its amino acids exchanged to the D-form, meaning that only the D-amino acid itself and its closest neighbors are excluded from the ordered array of hydrogen bonds in a typical sheet. It is therefore perfectly reasonable that a bulky D-amino acid can have either effect, preventing or promoting amyloid formation. The observed outcome depends on the relative destabilization of the original (α-helical) conformation compared to the resulting β-sheet structure.
In the case of TP10, which already contains an unstructured N-terminus, the aggregation equilibrium is shifted to the right when the destabilizing D-CF3-Bpg is incorporated into the C-terminus. In the Model Amphiphilic Peptide MAP [KLALKLALKALKAALKLA-NH2], on the other hand, the equilibrium is shifted to the left and aggregation is prevented by a D-amino acid. That is because MAP can engage in favorable interactions with the membrane only as an amphiphilic helix. Likewise, in [KIGAKI]3, bulky D-amino acids shift the aggregation threshold to higher peptide concentration, as the unfolded peptide has no reason to become α-helical.
Based on the TEM images and the OCD spectra of aggregated TP10 with D-CF3-Bpg, which show an essentially complete β-structure, we expect that the entire peptide may assume a regular β-sheet conformation in the presence as well as absence of membranes, apart from the local position that is labeled with D-CF3