Kurt L. Krause, Feb 1998
Multiple Isomorphous Replacement and Heavy Atom Refinement
The most common method of solving new protein structures using X-ray crystallography is called multiple isomorphous replacement, or MIR for short. In this method, crystals of the wild type protein whose structure is sought are grown in the usual manner, but after reaching maturity they are soaked in solutions of heavy atom compounds. The goal is to obtain derivative crystals in which heavy atoms bind specifically and consistently to each protein molecule in the unit cell. After soaking, the positions of the heavy atoms are determined using difference Pattersons. For this step to be successful it is important that only a few heavy atoms bind in each asymmetric unit. Once the initial heavy atom locations have been found, the coordinates, occupancy and temperature factors of each heavy atom are refined. At least two isomorphous derivatives are needed for successful MIR refinement, and for MIRAS phasing, one isomorphous derivative plus anomalous scattering data is needed. In practice, data from several derivatives are combined for the refinement of heavy atom parameters and for the calculation of an MIR or MIRAS phased Fourier map that is suitable for building the initial structure.
Outline of an MIR structure determination
1. Harvest native protein crystals
2. Collect and interpret native data
3. Soak crystals in heavy atom solution
4. Collect data set from soaked crystals
5. Generate difference Patterson maps
6. Interpret Patterson and locate heavy atom sites
7. Refine heavy atom parameters
8. Generate heavy atom phased Fourier map
9. Solvent flattening and averaging; then repeat steps 7,8,9
10. Build initial model of native protein
In this exercise we will focus on steps 5, 6, 7 and 8 using the PHASES package written by William Furey at the University of Pittsburgh. This package contains a complete set of programs needed to complete an MIR structure determination, and much more. It is easy to learn and it will run on both UNIX and OpenVMS platforms. For more information on how to receive PHASES, contact William Furey. The version of PHASES used here is described in the March 1995 write-up; later versions may differ in how they operate. On-line documentation of PHASES is available.
The protein we will be solving is FRP flavin reductase from Vibrio harveyi. This protein is studied in the laboratory of one of our collaborators, Prof. Shao-Chun Tu. It is a homodimeric flavoprotein composed of two 240 residue monomers. Within the bacterial cell it functions to reduce FMN to FMNH2, thus the name, flavin reductase. The reduced flavin is thought to be transferred to bacterial luciferase for use in chemiluminescence. This structure was solved by a recent postdoctoral fellow from our laboratory, Jack Tanner, with assistance from a current postdoctoral fellow, Mitch Miller.
MIR exercise
For this exercise you will need to download several files needed by PHASES. First, and most important, are the native and heavy atom derivative data sets, labeled native.hkl and lead_iso.hkl respectively. A derivative data set with anomalous differences, lead_anom.hkl, is also included. Next, download the PHASES parameter file (fmn.pam). The parameter file contains information specific to this project, including the unit cell, space group, lattice and symmetry equivalent positions. It is used by most of the programs in the PHASES package. Finally, download all of the remaining input files (.inp) because they will be needed in later parts of this exercise.
Download these files:
1. native data (native.hkl)
2. derivative isomorphous data (lead_iso.hkl)
3. derivative anomalous data (lead_anom.hkl)
4. parameter file (fmn.pam)
5. Patterson input file (diffpatt.inp)
6. Peak search input file (psrch.inp)
7. Heavy atom refinement file (phasit.inp)
8. Native Fourier input file (miras_map.inp)
Setting up PHASES
To run PHASES more easily on a UNIX machine, find the location of the PHASES executable files and type two simple commands before beginning the exercise. Let's assume that PHASES is located on your machine in /usr/local/phases/bin. If so, then type (C shell syntax):
set path = ($path /usr/local/phases/bin)
setenv phases /usr/local/phases/bin
After typing these commands programs can be run by simply typing their name, with the appropriate input files. For example, to run PSRCH you would type:
psrch < psrch.inp
1. Calculation of a difference Patterson
Difference Pattersons are calculated by passing data through three procedures: 1) scaling and combining native and derivative data, 2) deletion of outliers and 3) Fourier generation. The flow of data through this process is shown in Figure 1.

First run CMBISO interactively. Input the parameter file and the native and derivative data sets. In our example the derivative data set was collected from a trimethyllead-soaked crystal. This compound was initially described by Hazel Holden and Ivan Rayment at the University of Wisconsin and, in our experience, it has often been successful in producing good derivatives. The CMBISO output information includes scaling statistics and various R-factors. Give the output file the name frp_iso.scl so that you know it contains merged, scaled isomorphous derivative data. CMBANO is run in an analogous manner.
Next, run TOPDEL and input the .scl file you created in the last step. TOPDEL asks what percent of the difference data you want to keep and how many of the biggest differences, if any, you wish to discard. The assumption here is that very large differences could be due to measurement error. For this exercise choose 100% and delete the top 10 differences. Try different values here and see if your peak heights are affected. TOPDEL will create an input coefficient file for Fourier generation. Choose diffpatt.coef for the output file name.
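The outlier rejection idea can be illustrated with a short Python sketch. This is not TOPDEL's code: the function name, the tuple layout, and the assumption that the percentage keeps the largest remaining differences are all hypothetical, so consult the PHASES manual for the actual selection rule.

```python
def reject_largest_differences(reflections, n_reject=10, keep_percent=100.0):
    """Illustrative outlier filter (hypothetical, not TOPDEL itself).

    reflections: list of (h, k, l, f_native, f_derivative) tuples,
    with both amplitudes already on a common scale.
    """
    # Rank by absolute isomorphous difference, largest first.
    ranked = sorted(reflections, key=lambda r: abs(r[4] - r[3]), reverse=True)
    # Discard the n_reject largest differences as suspected measurement errors.
    kept = ranked[n_reject:]
    # ASSUMPTION: keep_percent retains the largest remaining differences.
    n_keep = int(round(len(kept) * keep_percent / 100.0))
    return kept[:n_keep]
```

With the settings used in this exercise (100% and 10 rejections), such a filter would simply drop the ten largest differences and pass everything else on.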
Next generate a difference Patterson map using the program FSFOUR. This program requires an input file that contains information needed by FSFOUR, such as the type of map being calculated, the input coefficient file, and the output map file name. This file is included in the course materials and is named diffpatt.inp. A number of other flags are set within this .inp file. They have been set properly for a difference Patterson calculation. To learn more about them consult the PHASES manual. Verify that diffpatt.map is being used for the name of the map file being generated. Run the program by typing fsfour < diffpatt.inp
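For readers who want to see what a difference Patterson is numerically, here is a minimal Python sketch. It assumes the standard isomorphous difference Patterson coefficients, (|F_PH| - |F_P|)^2 with zero phase, and evaluates the unnormalized cosine sum at a single point. The function names and tuple layout are illustrative only and have nothing to do with the FSFOUR file formats.

```python
import math

def difference_patterson_coefficients(reflections):
    """Return (h, k, l, coefficient) with coefficient = (|F_PH| - |F_P|)^2.

    reflections: list of (h, k, l, f_native, f_derivative) tuples on a common scale.
    """
    return [(h, k, l, (f_ph - f_p) ** 2) for h, k, l, f_p, f_ph in reflections]

def patterson_value(coefficients, u, v, w):
    """Unnormalized Patterson sum at fractional coordinates (u, v, w).

    The 1/V scale factor and symmetry expansion of the data are omitted.
    """
    return sum(c * math.cos(2.0 * math.pi * (h * u + k * v + l * w))
               for h, k, l, c in coefficients)
```

Because every coefficient is non-negative and carries zero phase, the largest value is always at the origin, and heavy atom vectors show up as the next strongest peaks.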
Viewing the map is done with the interactive program MAPVIEW. Activate the program, choose y for FSFOUR style map and n for masking option. Then input 0,1 for the x,y and z parameters. Choose 2 for xz sections and contour section 45 out of 90. Good minimum, maximum and contour intervals are 150, 800, 100. At this point a difference Patterson with strong peaks should appear on your screen (Figure 2).

2. Solving the Patterson
Solving the Patterson requires identification of a consistent set of Harker peaks and verification of the resulting heavy atom locations using the cross vectors. In space groups with low symmetry or very high symmetry the process can become more difficult. In our example, the crystals grow in space group P21 with equivalent positions x,y,z and -x, y+1/2, -z. Subtracting these two positions gives 2x, 1/2, 2z as the Patterson coordinates (u,v,w) of any Harker peak, so all Harker peaks lie on the section v=1/2. The y-coordinate of the heavy atom is arbitrary in this space group, but some authors recommend setting y=1/2 to remind the crystallographer that single site derivatives in this space group have a centrosymmetric distribution.
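The Harker relation for P21 can be checked with a few lines of Python. This is a small sketch using exact fractions; the function names are illustrative.

```python
from fractions import Fraction

def p21_mate(x, y, z):
    """Second equivalent position in P21: (-x, y + 1/2, -z)."""
    return (-x, y + Fraction(1, 2), -z)

def harker_vector(x, y, z):
    """Vector between (x, y, z) and its P21 mate, reduced into the unit cell."""
    mx, my, mz = p21_mate(x, y, z)
    # (x, y, z) - (-x, y + 1/2, -z) = (2x, -1/2, 2z), and -1/2 is equivalent to 1/2.
    return ((x - mx) % 1, (y - my) % 1, (z - mz) % 1)
```

For example, harker_vector(Fraction(1, 10), Fraction(1, 4), Fraction(1, 5)) gives (1/5, 1/2, 2/5). The v component is 1/2 whatever value of y is supplied, which is why the Harker peaks carry no information about y.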
Locate the peaks in diffpatt.map by running PSRCH. The responses can be entered interactively, but essentially the program requests a parameter file, a map file, and the number of peaks desired. In our example PSRCH identifies the two strong peaks on the Harker section as follows:
site 1 (u,v,w) = 0.1323 0.5000 0.2525
site 2 (u,v,w) = 0.2107 1.0000 0.3300
Based on the Harker equation, the first heavy atom site is:
site 1 (x,y,z) = 0.06511 0.25000 0.12703
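The conversion from a P21 Harker peak to a trial heavy atom position is simple enough to write out. The sketch below assumes u = 2x and w = 2z, with y chosen freely (0.25 here, matching the listing above). The function name is illustrative.

```python
def site_from_harker_peak(u, w, y=0.25):
    """Trial heavy atom position from a P21 Harker peak at (u, 1/2, w).

    Halving u and w gives one of four equivalent choices; adding 1/2 to
    x and/or z gives the others. y is arbitrary for the first site.
    """
    return (u / 2.0, y, w / 2.0)

x, y, z = site_from_harker_peak(0.1323, 0.2525)
print(round(x, 5), y, round(z, 5))
```

Simple halving of the site 1 peak gives approximately (0.066, 0.25, 0.126), close to the coordinates listed above.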
The second "Harker" peak would ordinarily yield a set of atomic coordinates for site 2, but it turns out that this peak is not a Harker peak but actually a cross peak that relates a weak second site to the strong site 1 identified above. Because this weak site has a y-coordinate almost identical to the y-coordinate of site 1, the cross peak comes very close to occurring on the Harker section. In practice one would discover that the second peak on the Harker section was not a true Harker peak by solving for its location and noticing that no cross peak between site 1 and site 2 was present. You might then phase a difference Fourier with the first site to locate the minor site that was causing the cross peak.
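To see why a cross peak can land near the Harker section, it helps to compute the cross vectors between two sites explicitly. The following Python sketch does this for P21; the function name and the example coordinates in the usage note are hypothetical and are not the FRP sites.

```python
from fractions import Fraction

HALF = Fraction(1, 2)

def p21_cross_vectors(site1, site2):
    """Two independent Patterson cross vectors between two sites in P21.

    Each site is an (x, y, z) tuple of fractional coordinates. The Patterson
    also contains the inverse (-u, -v, -w) of each vector returned here.
    """
    x1, y1, z1 = site1
    x2, y2, z2 = site2
    # Vector between the two sites as given.
    direct = ((x1 - x2) % 1, (y1 - y2) % 1, (z1 - z2) % 1)
    # Vector from the screw-axis mate of site 2, (-x2, y2 + 1/2, -z2), to site 1.
    screw = ((x1 + x2) % 1, (y1 - y2 - HALF) % 1, (z1 + z2) % 1)
    return direct, screw
```

If the two sites have equal y, for instance (1/10, 1/4, 1/5) and (3/10, 1/4, 1/20), the screw-related cross vector is (2/5, 1/2, 1/4): it falls exactly on the v = 1/2 section and can be mistaken for a Harker peak.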
One other point to consider involves the choice of origin in derivatives with more than one site. Since multiple symmetry operators exist in real space, it is possible that the two sites are expressed relative to different origins. In our P21 example you can find a consistent origin by considering the four possible solutions to the second site:
Solution 1 (u,1/2,w)
Solution 2 (u+1, 1/2,w)
Solution 3 (u, 1/2, w+1)
Solution 4 (u+1, 1/2, w+1)
These solutions correspond to the four possible origin choices in this example, and the correct choice can be determined by looking for cross peaks at the proper location between this site and site 1. In real space the four solutions correspond to adding 0 or 1/2 to the x and z coordinates of the site.
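The four origin choices can be enumerated mechanically. The Python sketch below lists the candidate (x, z) pairs for a site with Harker peak (u, 1/2, w) and the (u, w) components of the cross vectors each candidate would produce with a known site. Names are illustrative. The y component is left out because it must be determined separately from the cross peak itself.

```python
from fractions import Fraction
from itertools import product

def origin_candidates(u, w):
    """Four (x, z) pairs consistent with a Harker peak at (u, 1/2, w).

    Returned in the order of Solutions 1 to 4:
    (u, w), (u+1, w), (u, w+1), (u+1, w+1).
    """
    # Replacing u by u + 1 shifts x = u/2 by 1/2; likewise for w and z.
    return [((u + du) / 2 % 1, (w + dw) / 2 % 1)
            for dw, du in product((0, 1), repeat=2)]

def predicted_cross_uw(site1_xz, site2_xz):
    """(u, w) components of the direct and screw-related cross vectors."""
    x1, z1 = site1_xz
    x2, z2 = site2_xz
    direct = ((x1 - x2) % 1, (z1 - z2) % 1)
    screw = ((x1 + x2) % 1, (z1 + z2) % 1)
    return direct, screw
```

In use, one would compute predicted_cross_uw for each of the four candidates and keep the candidate whose predicted vectors coincide with observed peaks in the difference Patterson.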
3. Refining the Heavy Atom parameters
Within PHASES it is possible to perform single derivative (SIR) refinement using GREF and MIR refinement using PHASIT. Each of these programs will refine heavy atom coordinates, occupancies and, if desired, temperature factors. In practice you might use the SIR phases to calculate difference Fouriers to use in locating minor sites, but the native Fourier coefficients would come from PHASIT. Anomalous differences can also be incorporated in the phase refinement by using the program CMBANO to scale and combine native and anomalous data sets. In the example below both isomorphous and anomalous differences for the trimethyllead derivative will be used in MIRAS refinement.
PHASIT is run by inputting the parameter file, an input file with the initial heavy atom locations and other input flags, and the scaled native-derivative data sets. Several cycles of refinement can be done with periodic recalculation of the phases. Heavy atom parameters can be refined using two methods, 1) phase refinement or 2) maximum likelihood (Figure 3). Phase refinement has been successfully used to solve structures for decades but can be tricky with irritating instabilities and weighting problems. Some authorities prefer maximum likelihood refinement which is thought to be more stable. See the PHASES manual for an excellent discussion of the vagaries of phase refinement.

Edit the phasit.inp file to verify the names of the data files and consult the PHASES manual to learn the meaning of each flag. In our example we will perform 6 cycles of refinement with 2 recalculations of the phases. We will be doing traditional phase refinement. After the last cycle PHASIT will output best phased/figure of merit weighted coefficients that we will use in the next step. Get started by typing:
phasit < phasit.inp
The parameters change very little during refinement which suggests that the locations of our sites are close to being correct.
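To make "best phase" and "figure of merit" concrete, here is a minimal Python sketch of a single-reflection phase probability calculation in the spirit of Blow and Crick (1959). It is not PHASIT's algorithm: the Gaussian lack-of-closure model, the single derivative, the function names and the parameter values in the usage note are all simplifying assumptions.

```python
import cmath
import math

def phase_probability(f_p, f_ph, f_h, lack_of_closure_e, n_steps=360):
    """Sample P(phi) for one reflection and one derivative.

    f_p, f_ph: observed native and derivative amplitudes.
    f_h: calculated heavy atom structure factor (complex).
    lack_of_closure_e: assumed r.m.s. lack-of-closure error E.
    """
    phases = [2.0 * math.pi * i / n_steps for i in range(n_steps)]
    weights = []
    for phi in phases:
        # Derivative amplitude implied by trial native phase phi.
        f_ph_calc = abs(cmath.rect(f_p, phi) + f_h)
        eps = f_ph - f_ph_calc  # lack of closure at this phase
        weights.append(math.exp(-eps * eps / (2.0 * lack_of_closure_e ** 2)))
    total = sum(weights)
    return phases, [w / total for w in weights]

def best_phase_and_fom(phases, probs):
    """Centroid of the phase distribution: (best phase in radians, figure of merit)."""
    centroid = sum(p * cmath.exp(1j * phi) for phi, p in zip(phases, probs))
    return cmath.phase(centroid), abs(centroid)
```

As an illustration with made-up numbers, f_p = 100, f_ph = 120, f_h = 20+0j and E = 5 give a single-peaked distribution centred on phase 0 with a high figure of merit. A broad or bimodal distribution would give a lower figure of merit, and combining several derivatives amounts to multiplying their distributions together before taking the centroid.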
4. Calculation of the native Fourier
In the last step of this exercise you will use two programs with which you are already acquainted, FSFOUR and MAPVIEW. To generate a native Fourier map with FSFOUR, the input flags have to be changed to reflect the new input coefficients and the desired map calculation. These changes have been made in miras_map.inp. Otherwise it is run just as before. In MAPVIEW you can take advantage of the ability to add sections on top of each other in order to see the protein-solvent boundary more clearly. A clear protein-solvent boundary in an MIR map is usually a strong indication of a successful MIR refinement. In this example, set the contour levels at one sigma and start contouring at two sigma. If you inspect sections 1, 2 and 3 added together, a clear protein-solvent boundary should be apparent (Figure 4). Good luck!
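The contouring advice can be made concrete with a short Python sketch. It reads the instruction as a contour interval of one sigma starting at two sigma above the mean, which is an interpretation rather than something stated by MAPVIEW, and it treats map sections as plain nested lists. The function names are illustrative.

```python
import math

def sum_sections(sections):
    """Add equally shaped 2D map sections point by point."""
    rows, cols = len(sections[0]), len(sections[0][0])
    return [[sum(s[r][c] for s in sections) for c in range(cols)]
            for r in range(rows)]

def sigma_contour_levels(density, start_sigma=2.0, step_sigma=1.0, n_levels=5):
    """Contour levels at mean + (start + i*step) * sigma of the given grid."""
    values = [v for row in density for v in row]
    mean = sum(values) / len(values)
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [mean + (start_sigma + i * step_sigma) * sigma for i in range(n_levels)]
```

Summing a few adjacent sections before contouring averages out noise in the solvent region, which is why the boundary tends to stand out more clearly in the combined view.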
References
(The Bricogne, G. and Jansonius, J. N. references are among my favorite
but I only have preprinted versions. If anyone has the official reference
please let me know. KLK)
Blow, D. M. and F. H. C. Crick, "The treatment of errors in the isomorphous replacement method", Acta Cryst. (1959), 12, 794-802.
Blundell, T. L. and L. N. Johnson, "Protein Crystallography", Academic Press, New York, (1976).
Bricogne, G., "Multiple isomorphous replacement: The problem of parameter refinement from acentric reflections." ???,??,223-230.
Eisenberg, D. and D. Crothers, "X-ray diffraction and the determination of molecular structure", in Physical Chemistry with Applications to the Life Sciences, Benjamin/Cummings Publishing Company, 1979.
Jansonius, J. N., "The isomorphous replacement method", ???,??,63 - 84.
Furey, W. and S. Swaminathan, "PHASES-95: A Program Package for the Processing and Analysis of Diffraction Data from Macromolecules", in Methods in Enzymology: Macromolecular Crystallography, Part B, Volume 277, Chapter 31, eds. C. Carter and R. Sweet, Academic Press, Orlando, FL (1997).
Stout, G. and L. Jensen, "X-Ray Structure Determination - A practical guide", Macmillan Publishing Co., New York.
© ALL RIGHTS RESERVED, UNIVERSITY OF HOUSTON, 1998