Author Topic: DNA Step Values for DNA Mismatches (Read 100054 times)

slaw · « **on:** January 15, 2010, 12:12:28 pm »

Hi all,

I am studying the human DNA mismatch recognition protein, MutS, which is bound to a G-T mismatch. The PDBID is 2o8b.pdb. After extracting the DNA coordinates and analyzing it with 3dna, I get the following base-pair step parameters:

Local base-pair step parameters
step Shift Slide Rise Tilt Roll Twist
1 AA/TT -0.54 -0.56 3.45 4.80 -4.35 38.73
2 AC/GT 1.04 -0.46 3.47 -1.76 2.29 34.74
3 CC/GG -0.09 -0.04 3.17 8.61 5.77 27.01
4 CG/CG -0.46 0.75 3.23 -4.45 6.89 35.38
5 GC/GC -0.20 0.24 3.45 2.85 -2.55 38.63
6 CG/TG ---- ---- ---- ---- ---- ----
7 GC/GT 3.87 0.83 3.32 -11.26 0.08 6.18
8 CG/CG 0.16 -0.00 2.69 -0.59 9.40 30.96
9 GC/GC 0.67 -0.69 3.51 4.51 7.01 29.83
10 CT/AG -0.58 -0.42 3.55 -0.60 3.16 36.13
11 TA/TA 0.06 0.40 3.32 3.45 0.91 39.35
12 AG/CT 0.63 -0.36 2.88 -6.44 5.81 20.96
13 GG/CC 0.33 -0.95 3.81 4.06 12.57 33.98

I should point out that step 6 is where the G-T mismatch is located but I don't understand why the parameter values are missing. In addition, I notice that another output file (bp_step.par) that contains the base-pair step parameters actually has values for the mismatch:

14 base-pairs
0 ***local base-pair & step parameters***
Shear Stretch Stagger Buckle Prop-Tw Opening Shift Slide Rise Tilt Roll Twist
A-T -0.30 0.18 0.82 15.93 -21.58 -1.06 0.00 0.00 0.00 0.00 0.00 0.00
A-T 0.22 0.08 0.03 7.86 -16.61 -8.39 -0.54 -0.56 3.45 4.80 -4.35 38.73
C-G 0.74 -0.12 0.19 1.74 -8.41 4.29 1.04 -0.46 3.47 -1.76 2.29 34.74
C-G -0.40 0.27 -0.29 8.49 -15.11 7.26 -0.09 -0.04 3.17 8.61 5.77 27.01
G-C -0.08 -0.04 0.27 18.38 -5.82 -3.77 -0.46 0.75 3.23 -4.45 6.89 35.38
C-G 0.19 -0.11 0.01 11.72 -21.15 -2.89 -0.20 0.24 3.45 2.85 -2.55 38.63
G-T 5.21 0.18 0.31 -41.44 12.48 -69.61 -1.53 1.35 7.27 9.79 65.43 41.15
C-G 0.17 -0.23 0.33 -20.98 -0.21 -5.76 3.87 0.83 3.32 -11.26 0.08 6.18
G-C -0.12 -0.03 0.39 9.91 -4.28 1.38 0.16 -0.00 2.69 -0.59 9.40 30.96
C-G 0.26 0.06 -0.01 2.86 -12.54 4.36 0.67 -0.69 3.51 4.51 7.01 29.83
T-A 0.70 -0.16 0.47 0.86 -11.85 -2.94 -0.58 -0.42 3.55 -0.60 3.16 36.13
A-T 0.46 -0.21 0.18 -0.01 -9.30 -1.93 0.06 0.40 3.32 3.45 0.91 39.35
G-C -0.69 -0.09 0.94 18.10 -11.18 -0.17 0.63 -0.36 2.88 -6.44 5.81 20.96
G-C -1.15 -0.22 0.05 -6.23 -37.33 13.97 0.33 -0.95 3.81 4.06 12.57 33.98

The (shift, slide, rise, tilt, roll) values are essentially identical in both cases with the exception of the missing values for the mismatch. What do the values in the latter case actually mean and why are they missing in the first case? What I am interested in is calculating the local curvature around the mismatch (and not the global curvature) but since the first set of base-pair step parameters do not have values then it is not possible to calculate accurate curvature values surrounding the mismatch (since the program that I am using requires the base-pair step parameters as input). I want to show that although the global curvature is large, the local curvature of the DNA is relatively straight (when compared to, say, straight B-DNA). I am currently using MADBEND to measure DNA curvature (I generate the necessary base-pair step parameters to be used in MADBEND) and I understand that this isn't a program that you are supporting but I just thought that this would be relevant information.

Any help would be greatly appreciated! Thank you for your time.

Sean

xiangjun · « **Reply #1 on:** January 17, 2010, 05:45:42 pm »

Hi Sean,

Thanks for the well-formulated question. I am impressed that you noticed that a set of step parameters is omitted from the main 3DNA parameters file (*.out) for 2o8b but is available in the file 'bp_step.par'. In my years of supporting 3DNA, you are the first to dig into this detail.

First, a clarification:

Quote from: "Sean"

After extracting the DNA coordinates and analyzing it with 3dna, ...

Did you know that you do not need to first perform "extracting the DNA coordinates" from the PDB file? With "find_pair", you can analyze a nucleic acid structure directly from a PDB file. See the FAQ for an example.

Now to your specific questions:

Quote from: "Sean"

Code: [Select]

5 GC/GC -0.20 0.24 3.45 2.85 -2.55 38.63
6 CG/TG ---- ---- ---- ---- ---- ----
7 GC/GT 3.87 0.83 3.32 -11.26 0.08 6.18

I should point out that step 6 is where the G-T mismatch is located but I don't understand why the parameter values are missing.

If you check carefully the output file from "find_pair" (BTW, your output file apparently lacks the first G-C base-pair, why?),

Code: [Select]

2o8b.pdb
2o8b.out
    2         # duplex
   15         # number of base-pairs
    1    1    # explicit bp numbering/hetero atoms
    1   30  0 #    1 | ....>E:...1_:[.DG]G-----C[.DC]:..30_:F<....  1.14  0.36 14.09  9.30 -0.44
    2   29  0 #    2 | ....>E:...2_:[.DA]A-----T[.DT]:..29_:F<....  0.89  0.82 26.82  9.09 -1.13
    3   28  0 #    3 | ....>E:...3_:[.DA]A-----T[.DT]:..28_:F<....  0.23  0.03 18.38  9.24 -3.78
    4   27  0 #    4 | ....>E:...4_:[.DC]C-----G[.DG]:..27_:F<....  0.78  0.19  8.59  8.93 -3.40
    5   26  0 #    5 | ....>E:...5_:[.DC]C-----G[.DG]:..26_:F<....  0.56  0.29 17.33  9.24 -3.00
    6   25  0 #    6 | ....>E:...6_:[.DG]G-----C[.DC]:..25_:F<....  0.29  0.27 19.28  9.01 -3.20
    7   24  9 #    7 x ....>E:...7_:[.DC]C-----G[.DG]:..24_:F<....  0.22  0.01 24.18  9.04 -3.55
    8   23  0 #    8 | ....>E:...8_:[.DG]G-**--T[.DT]:..23_:F<....  5.22  0.31 43.28  9.87  7.00
    9   22  0 #    9 | ....>E:...9_:[.DC]C-----G[.DG]:..22_:F<....  0.44  0.33 20.98  8.84 -2.85
   10   21  0 #   10 | ....>E:..10_:[.DG]G-----C[.DC]:..21_:F<....  0.41  0.39 10.80  9.05 -3.28
   11   20  0 #   11 | ....>E:..11_:[.DC]C-----G[.DG]:..20_:F<....  0.26  0.01 12.86  9.06 -4.07
   12   19  0 #   12 | ....>E:..12_:[.DT]T-----A[.DA]:..19_:F<....  0.85  0.47 11.88  8.95 -2.62
   13   18  0 #   13 | ....>E:..13_:[.DA]A-----T[.DT]:..18_:F<....  0.53  0.18  9.30  8.85 -3.65
   14   17  0 #   14 | ....>E:..14_:[.DG]G-----C[.DC]:..17_:F<....  1.17  0.94 21.28  8.97 -0.89
   15   16  0 #   15 | ....>E:..15_:[.DG]G-----C[.DC]:..16_:F<....  1.17  0.05 37.85  8.66 -1.83
##### Base-pair criteria used:   4.00   0.00  15.00   2.50  65.00   4.50   7.50 [ O N]
##### 1 non-Watson-Crick base-pair, and 2 helices (0 isolated bps)
##### Helix #1 (7): 1 - 7
##### Helix #2 (8): 8 - 15

you will find that the structure has been broken into two helical fragments, at the middle G-T pair. If you set the helix_break parameter in file 'misc_3dna.par' (see the FAQ) from the default 7.5 Å to a larger value, e.g., 8.5 Å as below (3DNA v2.0),

Code: [Select]

#   distance criterion for helix break
<helix_break>8.5</helix_break>

"find_pair" will take the whole double helix as a single unit, and the "analyze" output will then include the "missing" step.
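To see where "find_pair" placed the break without reading the listing by eye, a small script can scan the lines of its output. The following is only a sketch, not part of 3DNA: the function name `find_helix_breaks` is hypothetical, and it assumes that a value of 9 in the third column (the row also marked with "x" in the listing above) flags the last base pair before a helix break. That reading is inferred from the 2o8b output shown here, so check it against your own files.

```python
def find_helix_breaks(text):
    """Return (i, j) residue-index pairs flagged as the end of a helical fragment.

    Assumption: a third-column value of 9 marks a helix break, as suggested
    by the 2o8b listing (the row that carries the 'x' marker).
    """
    breaks = []
    for line in text.splitlines():
        # Only the fields before '#' are treated as data.
        data, sep, _comment = line.partition("#")
        fields = data.split()
        if not sep or len(fields) != 3:
            continue  # header lines and '#####' summary lines
        try:
            i, j, flag = (int(tok) for tok in fields)
        except ValueError:
            continue
        if flag == 9:
            breaks.append((i, j))
    return breaks


sample_find_pair = """\
    2         # duplex
   15         # number of base-pairs
    1    1    # explicit bp numbering/hetero atoms
    6   25  0 #    6 | ....>E:...6_:[.DG]G-----C[.DC]:..25_:F<....  0.29  0.27 19.28  9.01 -3.20
    7   24  9 #    7 x ....>E:...7_:[.DC]C-----G[.DG]:..24_:F<....  0.22  0.01 24.18  9.04 -3.55
    8   23  0 #    8 | ....>E:...8_:[.DG]G-**--T[.DT]:..23_:F<....  5.22  0.31 43.28  9.87  7.00
"""

print(find_helix_breaks(sample_find_pair))  # [(7, 24)]
```

For the excerpt above, the only flagged pair is C7-G24, the base pair immediately before the G-T mismatch.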

Alternatively, with default 'misc_3dna.par' parameters, you can still recover the "missing" step parameters by adding '-c' option to "analyze". Type "analyze -h" to see how it works, and yet another alternative.

Quote from: "Sean"

In addition, I notice that another output file (bp_step.par) that contains the base-pair step parameters actually has values for the mismatch:
.............................
The (shift, slide, rise, tilt, roll) values are essentially identical in both cases with the exception of the missing values for the mismatch. What do the values in the latter case actually mean and why are they missing in the first case?

No matter how many helical regions are involved in the input file to "analyze", the fixed-name output file 'bp_step.par' always includes a complete set of parameters, including the inter-helix steps. This is to ensure that "rebuild" has enough information to construct a model of the whole structure. Thus, the values in the two cases (*.out vs 'bp_step.par') mean exactly the same thing; they just serve two different purposes.
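If only the numbers are needed, the step parameters can be read straight from 'bp_step.par'. The sketch below is an illustration under stated assumptions, not a 3DNA tool: the function name `read_step_params` is hypothetical, and it assumes each data row has a base-pair label followed by twelve numbers, with the last six being Shift, Slide, Rise, Tilt, Roll and Twist, as in the listing quoted in this thread. Each row carries the step from the previous base pair to the current one, which is why the first row is all zeros and why the G-T row holds the "missing" step 6.

```python
STEP_NAMES = ("Shift", "Slide", "Rise", "Tilt", "Roll", "Twist")


def read_step_params(text):
    """Return a list of (bp_label, {step_name: value}) from bp_step.par text."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        if len(tokens) != 13:
            continue  # title and header lines have a different token count
        try:
            values = [float(tok) for tok in tokens[1:]]
        except ValueError:
            continue  # not a numeric data row
        # Columns 1-6 are base-pair parameters; columns 7-12 are step parameters.
        rows.append((tokens[0], dict(zip(STEP_NAMES, values[6:]))))
    return rows


sample_par = """\
14 base-pairs
0 ***local base-pair & step parameters***
Shear Stretch Stagger Buckle Prop-Tw Opening Shift Slide Rise Tilt Roll Twist
C-G 0.19 -0.11 0.01 11.72 -21.15 -2.89 -0.20 0.24 3.45 2.85 -2.55 38.63
G-T 5.21 0.18 0.31 -41.44 12.48 -69.61 -1.53 1.35 7.27 9.79 65.43 41.15
C-G 0.17 -0.23 0.33 -20.98 -0.21 -5.76 3.87 0.83 3.32 -11.26 0.08 6.18
"""

for label, step in read_step_params(sample_par):
    print(label, step["Rise"], step["Roll"], step["Twist"])
```

In a real run, the text would come from reading the 'bp_step.par' file written by "analyze" in the working directory.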

HTH,

Xiang-Jun

slaw · « **Reply #2 on:** January 21, 2010, 11:18:51 am »

Xiang-Jun,

Sorry that it has taken me so long to reply but I wanted to cover all of my bases before I made too many wild claims of what did/didn't work. I should first start off by saying that I was using the older version of 3DNA (v1.5 I think) so I suspected that there would be possible differences in the calculations. Thus, I installed v2.0 but found the same problem (with respect to missing the first G-C base pair, see below).

1) I know how hard it is to remain polite and professional when posting/monitoring forums/discussions especially when people want a quick answer so I always try my best to do my homework thoroughly before asking too many dumb questions. I think I dug into the bp_step.par file because I was trying to understand where certain values were coming from and why they existed in multiple places. Details, as you mentioned, are important and I definitely don't want to waste anybody's valuable time. You've written a great tool in 3DNA and the support forum is a wealth of knowledge!

2) Running find_pair directly, I will try to keep that in mind next time!

3) Originally, I had attributed the missing G-C base pair to it being the first step and glossed over that fact. After you brought it up, I went back to look at the difference between v1.5 and v2.0. When I run find_pair v1.5, I get the following screen output:

Command: find_pair 2O8B.pdb 2O8B.out

...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......
...... reading file: misc_3dna.par ......

...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......
...... reading file: baselist.dat ......
unknown residue DG 1 on chain E [#1]
Check the base and add one more item in file <baselist.dat>

Notice that it complains about the DG residue. As well, it is unable to produce the corresponding 2O8B.out file. I think that this is due to the unrecognized naming convention "DG" which should be written as "GUA" instead. This is why I had extracted the coordinates before and renamed them all to GUA, ADE, THY, and CYT. This time, to see that 2) above works, I simply made a copy of 2O8B.pdb and changed all of the DNA nucleotides while keeping all of the other parts of the structural file intact. Running this through find_pair produced:

...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......
...... reading file: misc_3dna.par ......

...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......
...... reading file: baselist.dat ......

...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......

...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......
...... reading file: misc_3dna.par ......

Time used: 0.17 seconds

In the v2.0 case, the output looks like:

handling file <2O8B.pdb>

...... /home/slaw/Desktop/Programs/X3DNA/X3DNA/config/ ......
...... reading file: misc_3dna.par ......

...... /home/slaw/Desktop/Programs/X3DNA/X3DNA/config/ ......
...... reading file: baselist.dat ......
uncommon residue ADP 936 on chain A [#1793] assigned to: a
uncommon residue ADP 202 on chain B [#1795] assigned to: a

...... /home/slaw/Desktop/Programs/X3DNA/X3DNA/config/ ......
...... reading file: atomlist.dat ......

...... /home/slaw/Desktop/Programs/X3DNA/X3DNA/config/ ......

...... /home/slaw/Desktop/Programs/X3DNA/X3DNA/config/ ......
...... reading file: atomlist.dat ......

Time used: 00:00:00:01

Instead of complaining about the DG (which I assume is "fixed" in v2.0), it complains about the ADP nucleotides which are present in the PDB file (of 2O8B.pdb, not the modified one). Now, when I compare the ".out" file from both v1.5 and v2.0:

from v1.5:

2O8B.new.pdb
2O8B.new.out
2 # duplex
14 # number of base-pairs
1 0 # explicit bp numbering/hetero atoms
2 29 0 # 1 | E:...2_:[ADE]A-----T[THY]:..29_:F 0.89 0.82 26.82 9.09 1.03
3 28 0 # 2 | E:...3_:[ADE]A-----T[THY]:..28_:F 0.23 0.03 18.38 9.24 -1.20
4 27 0 # 3 | E:...4_:[CYT]C-----G[GUA]:..27_:F 0.78 0.19 8.59 8.93 -0.33
5 26 0 # 4 | E:...5_:[CYT]C-----G[GUA]:..26_:F 0.56 0.29 17.33 9.24 -0.36
6 25 0 # 5 | E:...6_:[GUA]G-----C[CYT]:..25_:F 0.29 0.27 19.28 9.01 -0.67
7 24 9 # 6 x E:...7_:[CYT]C-----G[GUA]:..24_:F 0.22 0.01 24.18 9.04 -1.26
8 23 0 # 7 | E:...8_:[GUA]G-*---T[THY]:..23_:F 5.22 0.31 43.28 9.87 5.84
9 22 0 # 8 | E:...9_:[CYT]C-----G[GUA]:..22_:F 0.44 0.33 20.98 8.84 -0.40
10 21 0 # 9 | E:..10_:[GUA]G-----C[CYT]:..21_:F 0.41 0.39 10.80 9.05 -0.32
11 20 0 # 10 | E:..11_:[CYT]C-----G[GUA]:..20_:F 0.26 0.01 12.86 9.06 -1.21
12 19 0 # 11 | E:..12_:[THY]T-----A[ADE]:..19_:F 0.85 0.47 11.88 8.95 0.28
13 18 0 # 12 | E:..13_:[ADE]A-----T[THY]:..18_:F 0.53 0.18 9.30 8.85 -0.61
14 17 0 # 13 | E:..14_:[GUA]G-----C[CYT]:..17_:F 1.17 0.94 21.28 8.97 1.54
15 16 0 # 14 | E:..15_:[GUA]G-----C[CYT]:..16_:F 1.17 0.05 37.85 8.66 -0.22
##### Base-pair criteria used: 4.00 15.00 2.50 65.00 4.50 7.50
##### 1 non-Watson-Crick base-pair, and 2 helices (0 isolated bps)
##### Helix #1 (6): 1 - 6
##### Helix #2 (8): 7 - 14

from v2.0:

2O8B.pdb
2O8B.out
2 # duplex
14 # number of base-pairs
1 1 # explicit bp numbering/hetero atoms
2 29 0 # 1 | ....>E:...2_:[.DA]A-----T[.DT]:..29_:F<.... 0.89 0.82 26.82 9.09 -1.13
3 28 0 # 2 | ....>E:...3_:[.DA]A-----T[.DT]:..28_:F<.... 0.23 0.03 18.38 9.24 -3.78
4 27 0 # 3 | ....>E:...4_:[.DC]C-----G[.DG]:..27_:F<.... 0.78 0.19 8.59 8.93 -3.40
5 26 0 # 4 | ....>E:...5_:[.DC]C-----G[.DG]:..26_:F<.... 0.56 0.29 17.33 9.24 -3.00
6 25 0 # 5 | ....>E:...6_:[.DG]G-----C[.DC]:..25_:F<.... 0.29 0.27 19.28 9.01 -3.20
7 24 9 # 6 x ....>E:...7_:[.DC]C-----G[.DG]:..24_:F<.... 0.22 0.01 24.18 9.04 -3.55
8 23 0 # 7 | ....>E:...8_:[.DG]G-*---T[.DT]:..23_:F<.... 5.22 0.31 43.28 9.87 7.00
9 22 0 # 8 | ....>E:...9_:[.DC]C-----G[.DG]:..22_:F<.... 0.44 0.33 20.98 8.84 -2.85
10 21 0 # 9 | ....>E:..10_:[.DG]G-----C[.DC]:..21_:F<.... 0.41 0.39 10.80 9.05 -3.28
11 20 0 # 10 | ....>E:..11_:[.DC]C-----G[.DG]:..20_:F<.... 0.26 0.01 12.86 9.06 -4.07
12 19 0 # 11 | ....>E:..12_:[.DT]T-----A[.DA]:..19_:F<.... 0.85 0.47 11.88 8.95 -2.62
13 18 0 # 12 | ....>E:..13_:[.DA]A-----T[.DT]:..18_:F<.... 0.53 0.18 9.30 8.85 -3.65
14 17 0 # 13 | ....>E:..14_:[.DG]G-----C[.DC]:..17_:F<.... 1.17 0.94 21.28 8.97 -0.89
15 16 0 # 14 | ....>E:..15_:[.DG]G-----C[.DC]:..16_:F<.... 1.17 0.05 37.85 8.66 -1.83
##### Base-pair criteria used: 4.00 0.00 15.00 2.50 65.00 4.50 7.50 [ O N]
##### 1 non-Watson-Crick base-pair, and 2 helices (0 isolated bps)
##### Helix #1 (6): 1 - 6
##### Helix #2 (8): 7 - 14

I notice some key differences/similarities:

i) They both contain the same number of lines.

ii) They both still do NOT contain the first G-C base pair information (even with v2.0 using an unmodified PDB file downloaded from PDB.org).

iii) The output format for v2.0 is slightly different from v1.5 (so my parsing script written in Perl will need to be modified)

iv) The final column in each row for each base step is different (-1.13 vs. 1.03). I think I read somewhere that this value is simply being calculated differently?

v) The base-pair criteria used appears slightly different.

From this, I still can't explain why the G-C base pair is missing.

4) I will try modifying the helix break parameter as you had suggested (just for experience) but from what you said, it looks like I could just extract the pertinent information directly from the "bp_step.par" file without having to do that since it will always include a complete set of parameters. Is that correct?

Thank you for your time.

Sean

slaw · « **Reply #3 on:** January 21, 2010, 11:25:19 am »

On a side note, I stumbled across your blog page and, per your suggestion, I tested out Valgrind on a simple piece of code which had been giving me some problems previously (and was unresolved). Thanks to your suggestion, I was able to pinpoint and correctly identify the (memory-related) bug in my code!

Sean

xiangjun · « **Reply #4 on:** January 21, 2010, 10:33:57 pm »

Quote

...... reading file: baselist.dat ......
unknown residue DG 1 on chain E [#1]
Check the base and add one more item in file <baselist.dat>

Take the hint, and read FAQs on "How to fix missing (superfluous) base pairs identified by find_pair?" and "How to handle modified (uncommon) bases?" to see how to handle such cases.

Quote from: "Sean"

Notice that it complains about the DG residue. As well, it is unable to produce the corresponding 2O8B.out file. I think that this is due to the unrecognized naming convention "DG" which should be written as "GUA" instead. This is why I had extracted the coordinates before and renamed them all to GUA, ADE, THY, and CYT.

Following the suggestion above, you do not need to manipulate the PDB file at all. 3DNA v1.5 works in such situations as well, and this topic has been brought up in the forum before.

Quote

...... reading file: baselist.dat ......
uncommon residue ADP 936 on chain A [#1793] assigned to: a
uncommon residue ADP 202 on chain B [#1795] assigned to: a
................................................................
Instead of complaining about the DG (which I assume is "fixed" in v2.0), it complains about the ADP nucleotides which are present in the PDB file (of 2O8B.pdb, not the modified one).

Note that this is for information only. Again, reading and understanding the above mentioned two FAQs would help.

Quote from: "Sean"

ii) They both still do NOT contain the first G-C base pair information (even with v2.0 using an unmodified PDB file downloaded from PDB.org).

I have just checked and reproduced your result with the distributed 3DNA v2.0. The missing first G-C base-pair is due to the distortion of C30 on chain F.

Quote from: "Sean"

iii) The output format for v2.0 is slightly different from v1.5 (so my parsing script written in Perl will need to be modified)

iv) The final column in each row for each base step is different (-1.13 vs. 1.03). I think I read somewhere that this value is simply being calculated differently?

v) The base-pair criteria used appears slightly different.

Over time, there have been some internal changes in find_pair. Moreover, the contents following "#" are for information only, undocumented, and subject to change.
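One practical consequence for a parsing script is to read only the fields before the "#" and ignore the rest. The sketch below (in Python rather than Perl, with a hypothetical helper name `pair_indices`) shows that the G-T lines from the v1.5 and v2.0 listings above then give the same result, even though the text after "#" differs between versions.

```python
def pair_indices(line):
    """Return the integer fields that precede '#' on a find_pair output line."""
    data = line.split("#", 1)[0]
    return tuple(int(tok) for tok in data.split())


# The G-T mismatch line as printed by v1.5 and by v2.0 in this thread.
line_v15 = "8 23 0 # 7 | E:...8_:[GUA]G-*---T[THY]:..23_:F 5.22 0.31 43.28 9.87 5.84"
line_v20 = "8 23 0 # 7 | ....>E:...8_:[.DG]G-*---T[.DT]:..23_:F<.... 5.22 0.31 43.28 9.87 7.00"

print(pair_indices(line_v15))  # (8, 23, 0)
print(pair_indices(line_v15) == pair_indices(line_v20))  # True
```

A script written this way should not need changes when the informational part of the line is reformatted in a later release.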

Quote from: "Sean"

4) I will try modifying the helix break parameter as you had suggested (just for experience) but from what you said, it looks like I could just extract the pertinent information directly from the "bp_step.par" file without having to do that since it will always include a complete set of parameters. Is that correct?

If you are just interested in getting the numbers, yes, 'bp_step.par' contains all the base-pair and step parameters.

HTH,

Xiang-Jun
