Xiang-Jun,
Sorry that it has taken me so long to reply, but I wanted to cover all of my bases before making too many wild claims about what did or didn't work. I should start by saying that I was using the older version of 3DNA (v1.5, I think), so I suspected that there might be differences in the calculations. Thus, I installed v2.0 but found the same problem (with respect to missing the first G-C base pair, see below).
1) I know how hard it is to remain polite and professional when posting to or monitoring forums and discussions, especially when people want a quick answer, so I always try my best to do my homework thoroughly before asking too many dumb questions. I think I dug into the bp_step.par file because I was trying to understand where certain values were coming from and why they existed in multiple places. Details, as you mentioned, are important, and I definitely don't want to waste anybody's valuable time. You've written a great tool in 3DNA and the support forum is a wealth of knowledge!
2) Running find_pair directly on the original PDB file: I will try to keep that in mind next time!
3) Originally, I had attributed the missing G-C base pair to it being the first step and glossed over that fact. After you brought it up, I went back to look at the difference between v1.5 and v2.0. When I run find_pair v1.5, I get the following screen output:
Command: find_pair 2O8B.pdb 2O8B.out
...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......
...... reading file: misc_3dna.par ......
...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......
...... reading file: baselist.dat ......
unknown residue DG 1 on chain E [#1]
Check the base and add one more item in file <baselist.dat>
Notice that it complains about the DG residue and is unable to produce the corresponding 2O8B.out file. I think that this is due to the unrecognized residue name "DG", which v1.5 expects to be written as "GUA" instead. This is why I had previously extracted the coordinates and renamed the nucleotides to GUA, ADE, THY, and CYT. This time, to check that 2) above works, I simply made a copy of 2O8B.pdb and renamed all of the DNA nucleotides while keeping the rest of the structure file intact. Running this through find_pair produced:
...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......
...... reading file: misc_3dna.par ......
...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......
...... reading file: baselist.dat ......
...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......
...... /home/slaw/Desktop/Programs/X3DNAv1.5/X3DNA/BASEPARS/ ......
...... reading file: misc_3dna.par ......
Time used: 0.17 seconds
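For anyone who wants to reproduce the renaming step I described above, here is a rough sketch in Python of what I mean. It assumes standard fixed-column PDB records, where the residue name sits in columns 18-20 of ATOM/HETATM lines. The function names and the mapping table are my own for illustration (nothing here ships with 3DNA), and it only touches coordinate records, not SEQRES or other header lines.

```python
# Hypothetical helper (not part of 3DNA): rename the newer PDB DNA residue
# names (DA, DC, DG, DT) to the three-letter names used in my modified file.
RENAME = {"DA": "ADE", "DC": "CYT", "DG": "GUA", "DT": "THY"}


def rename_dna_residues(lines):
    """Return a copy of `lines` with DNA residue names rewritten.

    Assumes fixed-column PDB format: the residue name occupies
    columns 18-20 (Python slice 17:20) of ATOM/HETATM records.
    All other records are passed through unchanged.
    """
    out = []
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")) and len(line) >= 20:
            resname = line[17:20].strip()
            if resname in RENAME:
                # Replace only the three residue-name columns.
                line = line[:17] + RENAME[resname] + line[20:]
        out.append(line)
    return out


def rename_file(src, dst):
    """Read `src`, rename DNA residues, and write the result to `dst`."""
    with open(src) as fh:
        lines = fh.readlines()
    with open(dst, "w") as fh:
        fh.writelines(rename_dna_residues(lines))
```

Something like rename_file("2O8B.pdb", "2O8B.new.pdb") would then give the modified copy, with ligands such as ADP left untouched because they are not in the mapping.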
In the v2.0 case, the output looks like:
handling file <2O8B.pdb>
...... /home/slaw/Desktop/Programs/X3DNA/X3DNA/config/ ......
...... reading file: misc_3dna.par ......
...... /home/slaw/Desktop/Programs/X3DNA/X3DNA/config/ ......
...... reading file: baselist.dat ......
uncommon residue ADP 936 on chain A [#1793] assigned to: a
uncommon residue ADP 202 on chain B [#1795] assigned to: a
...... /home/slaw/Desktop/Programs/X3DNA/X3DNA/config/ ......
...... reading file: atomlist.dat ......
...... /home/slaw/Desktop/Programs/X3DNA/X3DNA/config/ ......
...... /home/slaw/Desktop/Programs/X3DNA/X3DNA/config/ ......
...... reading file: atomlist.dat ......
Time used: 00:00:00:01
Instead of complaining about the DG (which I assume is "fixed" in v2.0), it complains about the ADP nucleotides which are present in the PDB file (of 2O8B.pdb, not the modified one). Now, when I compare the ".out" file from both v1.5 and v2.0:
from v1.5:
2O8B.new.pdb
2O8B.new.out
2 # duplex
14 # number of base-pairs
1 0 # explicit bp numbering/hetero atoms
2 29 0 # 1 | E:...2_:[ADE]A-----T[THY]:..29_:F 0.89 0.82 26.82 9.09 1.03
3 28 0 # 2 | E:...3_:[ADE]A-----T[THY]:..28_:F 0.23 0.03 18.38 9.24 -1.20
4 27 0 # 3 | E:...4_:[CYT]C-----G[GUA]:..27_:F 0.78 0.19 8.59 8.93 -0.33
5 26 0 # 4 | E:...5_:[CYT]C-----G[GUA]:..26_:F 0.56 0.29 17.33 9.24 -0.36
6 25 0 # 5 | E:...6_:[GUA]G-----C[CYT]:..25_:F 0.29 0.27 19.28 9.01 -0.67
7 24 9 # 6 x E:...7_:[CYT]C-----G[GUA]:..24_:F 0.22 0.01 24.18 9.04 -1.26
8 23 0 # 7 | E:...8_:[GUA]G-*---T[THY]:..23_:F 5.22 0.31 43.28 9.87 5.84
9 22 0 # 8 | E:...9_:[CYT]C-----G[GUA]:..22_:F 0.44 0.33 20.98 8.84 -0.40
10 21 0 # 9 | E:..10_:[GUA]G-----C[CYT]:..21_:F 0.41 0.39 10.80 9.05 -0.32
11 20 0 # 10 | E:..11_:[CYT]C-----G[GUA]:..20_:F 0.26 0.01 12.86 9.06 -1.21
12 19 0 # 11 | E:..12_:[THY]T-----A[ADE]:..19_:F 0.85 0.47 11.88 8.95 0.28
13 18 0 # 12 | E:..13_:[ADE]A-----T[THY]:..18_:F 0.53 0.18 9.30 8.85 -0.61
14 17 0 # 13 | E:..14_:[GUA]G-----C[CYT]:..17_:F 1.17 0.94 21.28 8.97 1.54
15 16 0 # 14 | E:..15_:[GUA]G-----C[CYT]:..16_:F 1.17 0.05 37.85 8.66 -0.22
##### Base-pair criteria used: 4.00 15.00 2.50 65.00 4.50 7.50
##### 1 non-Watson-Crick base-pair, and 2 helices (0 isolated bps)
##### Helix #1 (6): 1 - 6
##### Helix #2 (8): 7 - 14
from v2.0:
2O8B.pdb
2O8B.out
2 # duplex
14 # number of base-pairs
1 1 # explicit bp numbering/hetero atoms
2 29 0 # 1 | ....>E:...2_:[.DA]A-----T[.DT]:..29_:F<.... 0.89 0.82 26.82 9.09 -1.13
3 28 0 # 2 | ....>E:...3_:[.DA]A-----T[.DT]:..28_:F<.... 0.23 0.03 18.38 9.24 -3.78
4 27 0 # 3 | ....>E:...4_:[.DC]C-----G[.DG]:..27_:F<.... 0.78 0.19 8.59 8.93 -3.40
5 26 0 # 4 | ....>E:...5_:[.DC]C-----G[.DG]:..26_:F<.... 0.56 0.29 17.33 9.24 -3.00
6 25 0 # 5 | ....>E:...6_:[.DG]G-----C[.DC]:..25_:F<.... 0.29 0.27 19.28 9.01 -3.20
7 24 9 # 6 x ....>E:...7_:[.DC]C-----G[.DG]:..24_:F<.... 0.22 0.01 24.18 9.04 -3.55
8 23 0 # 7 | ....>E:...8_:[.DG]G-*---T[.DT]:..23_:F<.... 5.22 0.31 43.28 9.87 7.00
9 22 0 # 8 | ....>E:...9_:[.DC]C-----G[.DG]:..22_:F<.... 0.44 0.33 20.98 8.84 -2.85
10 21 0 # 9 | ....>E:..10_:[.DG]G-----C[.DC]:..21_:F<.... 0.41 0.39 10.80 9.05 -3.28
11 20 0 # 10 | ....>E:..11_:[.DC]C-----G[.DG]:..20_:F<.... 0.26 0.01 12.86 9.06 -4.07
12 19 0 # 11 | ....>E:..12_:[.DT]T-----A[.DA]:..19_:F<.... 0.85 0.47 11.88 8.95 -2.62
13 18 0 # 12 | ....>E:..13_:[.DA]A-----T[.DT]:..18_:F<.... 0.53 0.18 9.30 8.85 -3.65
14 17 0 # 13 | ....>E:..14_:[.DG]G-----C[.DC]:..17_:F<.... 1.17 0.94 21.28 8.97 -0.89
15 16 0 # 14 | ....>E:..15_:[.DG]G-----C[.DC]:..16_:F<.... 1.17 0.05 37.85 8.66 -1.83
##### Base-pair criteria used: 4.00 0.00 15.00 2.50 65.00 4.50 7.50 [ O N]
##### 1 non-Watson-Crick base-pair, and 2 helices (0 isolated bps)
##### Helix #1 (6): 1 - 6
##### Helix #2 (8): 7 - 14
I notice some key differences/similarities:
i) They both contain the same number of lines.
ii) They both still do NOT contain the first G-C base pair information (even with v2.0 using an unmodified PDB file downloaded from PDB.org).
iii) The output format for v2.0 is slightly different from v1.5 (so my parsing script written in Perl will need to be modified)
iv) The final column in each base-pair row is different (e.g., 1.03 in v1.5 vs. -1.13 in v2.0 for the first pair). I think I read somewhere that this value is simply being calculated differently?
v) The base-pair criteria used appear slightly different (v2.0 lists an extra 0.00 value and an "[ O N]" suffix).
From this, I still can't explain why the G-C base pair is missing.
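Regarding iii) and iv), here is a minimal sketch (in Python rather than my Perl script) of a parser that tolerates both layouts. It assumes the base-pair rows are whitespace-separated exactly as pasted above and that chain IDs are never blank. The function name and dictionary keys are my own invention, and it makes no claim about what the final column actually means.

```python
import re


def parse_bp_line(line):
    """Parse one base-pair row from a find_pair .out file.

    Handles both layouts shown above: the v1.5 style (E:...2_:[ADE]A...)
    and the v2.0 style (....>E:...2_:[.DA]A...<....).
    Returns None for header, criteria, and helix summary lines.
    """
    if "#" not in line or line.lstrip().startswith("#"):
        return None
    left, right = line.split("#", 1)
    idx = left.split()       # residue index i, residue index j, flag
    fields = right.split()   # bp number, marker, descriptor, numbers...
    if len(idx) != 3 or len(fields) < 4:
        return None
    if not all(tok.isdigit() for tok in idx) or not fields[0].isdigit():
        return None
    # Residue names are the bracketed three-character fields.
    names = re.findall(r"\[(.{3})\]", fields[2])
    if len(names) != 2:
        return None
    return {
        "i": int(idx[0]),
        "j": int(idx[1]),
        "bp": int(fields[0]),
        "marker": fields[1],  # "|" or "x" in the output above
        "resnames": tuple(n.strip(". ") for n in names),
        "values": [float(v) for v in fields[3:]],
    }


def parse_out_file(path):
    """Collect all base-pair rows from one .out file."""
    with open(path) as fh:
        return [bp for bp in map(parse_bp_line, fh) if bp is not None]
```

Running parse_out_file on both files and comparing the "values" lists pair by pair should confirm that only the last number differs between the two versions, at least for the rows shown here.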
4) I will try modifying the helix break parameter as you had suggested (just for experience) but from what you said, it looks like I could just extract the pertinent information directly from the "bp_step.par" file without having to do that since it will always include a complete set of parameters. Is that correct?
Thank you for your time.
Sean