How to fix missing (superfluous) base pairs identified by find_pair?

xiangjun:
Structural analysis of nucleic acids used to be a rather tedious process, especially for irregular, complicated RNA structures and nucleic acid-protein complexes (e.g., the large ribosomal subunit 1jj2/rr0033). Without valid base-pairing information as input, analysis software will produce meaningless results. The program find_pair was originally created to solve this specific problem, by generating an input file for the 3DNA analysis routines (analyze/cehs) directly from a PDB file.

At its core, find_pair uses a purely geometric approach to identify all possible pairs (Watson-Crick or non-canonical) that actually exist in a structure, along with their H-bonding patterns and helical context. Specifically, the major criteria used are as follows:

* The distance between the origins of the two bases (as defined by their standard reference frames) must be less than a certain limit (15.0 Å by default); otherwise, they would be too far apart to be called a pair.
* The vertical separation (i.e., stagger) between the two base planes must be less than a certain limit (2.5 Å by default); otherwise, they would be stacking instead of pairing.
* The angle between the two base z-axes (i.e., their normal vectors) must be less than a cut-off (65.0° by default).
* There must be at least one pair of nitrogen/oxygen base atoms within an H-bonding cut-off distance (4.0 Å by default).

If two bases fulfill these geometric requirements, they are defined to be a pair, without taking their chemical constituents into consideration. Thus our method allows for identification of unconventional pairs as easily as the canonical ones. The program then checks for possible H-bonding patterns, whether the normal donor-acceptor (noted by '-' as in O6 - N4 for a G·C pair) or the unusual donor-donor or acceptor-acceptor (noted by '*' as in O2 * N3 for a C·C pair in urx057). The non-canonical pairs, especially those with unusual H-bonding patterns, should be checked more carefully: they could be due to errors in structure determination, or they could have some special significance that was previously unnoticed.
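
The four geometric tests above can be sketched in Python. This is only an illustration under stated assumptions, not find_pair's actual code: the function name, the input layout (an origin, a unit normal, and a list of (element, coordinates) base atoms per base), and the choice to treat the two normals as undirected when measuring the angle and the stagger are all assumptions made for this example.

```python
import math

# Default cut-offs quoted in the list above.
MAX_ORIGIN_DIST = 15.0  # Å, distance between base origins
MAX_STAGGER = 2.5       # Å, vertical separation of base planes
MAX_Z_ANGLE = 65.0      # degrees, angle between base normals
MAX_HBOND_DIST = 4.0    # Å, N/O to N/O distance


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _norm(a):
    return math.sqrt(_dot(a, a))


def is_candidate_pair(origin1, z1, atoms1, origin2, z2, atoms2):
    """Return True if two bases pass all four geometric tests.

    origin*/z*: origin and unit normal of each base reference frame.
    atoms*: list of (element, (x, y, z)) tuples for the base atoms.
    """
    d = _sub(origin2, origin1)
    if _norm(d) >= MAX_ORIGIN_DIST:
        return False

    cos_z = _dot(z1, z2)
    # Assumption: normals are treated as undirected, so anti-parallel
    # z-axes count as a small angle rather than one near 180 degrees.
    angle = math.degrees(math.acos(min(1.0, abs(cos_z))))
    if angle >= MAX_Z_ANGLE:
        return False

    # Stagger: origin displacement projected onto the mean normal
    # (z2 is flipped first if it points against z1).
    sign = 1.0 if cos_z >= 0 else -1.0
    mean = tuple(a + sign * b for a, b in zip(z1, z2))
    stagger = abs(_dot(d, mean)) / _norm(mean)
    if stagger >= MAX_STAGGER:
        return False

    # At least one N/O to N/O contact within the H-bond cut-off.
    for elem1, pos1 in atoms1:
        if elem1 not in ("N", "O"):
            continue
        for elem2, pos2 in atoms2:
            if elem2 in ("N", "O") and _norm(_sub(pos1, pos2)) <= MAX_HBOND_DIST:
                return True
    return False
```

Note that nothing in the sketch looks at base identity, which mirrors the point that canonical and non-canonical pairs are detected by the same tests.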

The default criteria mentioned above are based on a survey of the NDB structures. Generally speaking, they are fairly generous and work well in the most common cases we've encountered. However, we are aware that in special cases some of them might be too restrictive or too generous, leading find_pair to miss base pairs or to produce superfluous ones. The default settings are stored in a text file named misc_3dna.par under the directory $X3DNA/config/, which users can modify as they see fit. Changes in that directory have a global effect: wherever you run find_pair on your system, the modified values will be used. Alternatively, users can copy misc_3dna.par to their current working directory and change it there for a local effect. Note that the local setting takes precedence over the global one.
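
To see which copy of misc_3dna.par would apply in a given directory, the precedence rule just described can be mimicked with a small helper. This is a sketch only: `resolve_par_file` is a hypothetical name, and the lookup order is taken from the description above rather than from the 3DNA source code.

```python
import os
from pathlib import Path

PAR_NAME = "misc_3dna.par"


def resolve_par_file(cwd=None, x3dna_home=None):
    """Return the misc_3dna.par expected to take effect, or None.

    cwd: working directory to check (defaults to the current one).
    x3dna_home: 3DNA install directory (defaults to $X3DNA).
    """
    cwd = Path(cwd) if cwd is not None else Path.cwd()
    local = cwd / PAR_NAME
    if local.is_file():
        return local  # a local copy takes precedence over the global one

    home = x3dna_home if x3dna_home is not None else os.environ.get("X3DNA")
    if home:
        global_par = Path(home) / "config" / PAR_NAME
        if global_par.is_file():
            return global_par
    return None
```

Calling `resolve_par_file()` before running find_pair is a quick way to confirm whether a leftover local copy is overriding the global settings.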

As an example, with its default settings find_pair will miss the 127th base pair I:..53_:[.DT]T-----A[.DA]:.-53_:J in structure 1kx5/pd0287. This is because the H-bonding distance for T:N3 - A:N1 is 4.20 Å and that for T:O4 - A:N6 is 4.85 Å; both are larger than the default 4.0 Å cut-off. Increasing the H-bonding criterion in file misc_3dna.par from 4.0 Å to 5.0 Å will solve this problem. Please note that in 3DNA, users can start directly from an uncompressed PDB file, without having to extract the DNA fragment first:

* find_pair 1kx5.pdb 1kx5.inp to get input file for analyze
* analyze 1kx5.inp to get detailed structural parameters in file 1kx5.out
* The above two steps can be combined into one: find_pair 1kx5.pdb stdout | analyze stdin

In addition to (or instead of) manipulating parameters in misc_3dna.par, it is often preferable to manually edit find_pair-generated base-pair files before feeding them into analyze/cehs. This allows for maximum flexibility as to which pairs to consider in calculating 3DNA structural parameters.
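
For batch use, the two-step find_pair/analyze workflow listed above could be driven from a script. The sketch below assumes that find_pair and analyze are on the PATH; `build_commands` and `run_workflow` are hypothetical helper names, and only the command-line forms shown in the list above are used.

```python
import os
import shutil
import subprocess


def build_commands(pdb_file):
    """Return the find_pair and analyze command lines for one PDB file."""
    inp_file = os.path.splitext(pdb_file)[0] + ".inp"
    return [
        ["find_pair", pdb_file, inp_file],  # writes the base-pair file
        ["analyze", inp_file],              # writes the structural parameters
    ]


def run_workflow(pdb_file):
    """Run both steps in order; fail early if 3DNA is not installed."""
    for cmd in build_commands(pdb_file):
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH")
        subprocess.run(cmd, check=True)
```

The commands are returned separately, rather than piped together, so that the intermediate .inp file stays on disk and can be edited by hand between the two steps.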

Also worth noting is the -p option of find_pair. Without this option, find_pair locates base pairs in double-helical regions, so the Watson-Crick pairs take precedence over wobble and other non-canonical pairs. With -p, all pairs and higher-order base associations (i.e., triplets and above) are detected.

sli:
Dear Dr. Lu
I want to extract DNA structural information from a PDB file of a DNA-protein complex. When I use the 'find_pair' program, the mismatched base pair, in which one base is completely flipped out, is dropped by default. My goal is to collect the roll angle of the mismatched base pair. Because my.bps lacks the mismatched base pair, when I run 'analyze my.bps' to generate 'bp_step.par', there is no information for it. I followed the method in 'How to fix missing (superfluous) base pairs identified by find_pair?', but I still could not solve my problem. Could you give me some suggestions? Thank you!

xiangjun:
In your my.pdb structure, the DNA nucleotide D.UF2/17 is flipped out of the duplex, instead of forming a pair with C.DG12. In such cases, you could proceed as noted in the FAQ:

--- Quote ---
In addition to (or instead of) manipulating parameters in misc_3dna.par, it is often preferable to manually edit find_pair-generated base-pair files before feeding them into analyze/cehs. This allows for maximum flexibility as to which pairs to consider in calculating 3DNA structural parameters.
--- End quote ---

So you could manually edit the find_pair output: change the number of base pairs from 27 to 28, and add an extra line "209 242" (the line without a trailing comment in the listing below). Note that in such cases, the step parameters (such as slide and roll) may not make intuitive sense (the numbers look weird). Nevertheless, they can be used to rigorously "rebuild" the original base geometry. See the 2008 3DNA Nature Protocols paper for more info.

my.pdb
my.out
2 # duplex
28 # number of base-pairs
1 1 # explicit bp numbering/hetero atoms
198 253 0 # 1 | ....>C:...1_:[.DC]C-----G[.DG]:..28_:D<.... 0.09 0.00 12.58 9.13 -4.27
199 252 0 # 2 | ....>C:...2_:[.DA]A-----T[.DT]:..27_:D<.... 0.29 0.11 12.49 8.76 -3.87
200 251 0 # 3 | ....>C:...3_:[.DG]G-----C[.DC]:..26_:D<.... 0.38 0.17 11.55 8.98 -3.71
201 250 0 # 4 | ....>C:...4_:[.DC]C-----G[.DG]:..25_:D<.... 1.05 0.36 15.27 8.77 -2.47
202 249 0 # 5 | ....>C:...5_:[.DT]T-----A[.DA]:..24_:D<.... 0.18 0.01 12.00 8.93 -4.21
203 248 0 # 6 | ....>C:...6_:[.DC]C-----G[.DG]:..23_:D<.... 0.19 0.02 9.73 9.01 -4.29
204 247 0 # 7 | ....>C:...7_:[.DT]T-----A[.DA]:..22_:D<.... 0.28 0.04 8.95 9.03 -4.18
205 246 0 # 8 | ....>C:...8_:[.DG]G-----C[.DC]:..21_:D<.... 0.34 0.08 7.51 8.96 -4.12
206 245 0 # 9 | ....>C:...9_:[.DT]T-----A[.DA]:..20_:D<.... 0.20 0.12 8.73 8.88 -4.12
207 244 0 # 10 | ....>C:..10_:[.DA]A-----T[.DT]:..19_:D<.... 0.23 0.18 13.22 8.78 -3.75
208 243 0 # 11 x ....>C:..11_:[.DC]C-----G[.DG]:..18_:D<.... 0.35 0.30 19.14 8.92 -3.08
209 242 0
210 241 0 # 12 | ....>C:..13_:[.DT]T-----A[.DA]:..16_:D<.... 0.29 0.11 18.66 8.94 -3.55
211 240 0 # 13 | ....>C:..14_:[.DG]G-----C[.DC]:..15_:D<.... 0.18 0.13 10.54 9.00 -4.03
212 239 0 # 14 | ....>C:..15_:[.DA]A-----T[.DT]:..14_:D<.... 0.48 0.02 7.98 8.84 -4.09
213 238 0 # 15 | ....>C:..16_:[.DG]G-----C[.DC]:..13_:D<.... 0.29 0.21 8.86 9.02 -3.85
214 237 0 # 16 | ....>C:..17_:[.DC]C-----G[.DG]:..12_:D<.... 0.39 0.09 11.91 8.95 -3.84
215 236 0 # 17 | ....>C:..18_:[.DG]G-----C[.DC]:..11_:D<.... 0.46 0.11 16.51 8.89 -3.50
216 235 0 # 18 | ....>C:..19_:[.DA]A-----T[.DT]:..10_:D<.... 0.23 0.20 15.04 8.89 -3.62
217 234 0 # 19 | ....>C:..20_:[.DT]T-----A[.DA]:...9_:D<.... 0.12 0.02 13.95 8.86 -4.15
218 233 0 # 20 | ....>C:..21_:[.DG]G-----C[.DC]:...8_:D<.... 0.27 0.03 4.95 8.88 -4.43
219 232 0 # 21 | ....>C:..22_:[.DG]G-----C[.DC]:...7_:D<.... 0.36 0.19 14.07 8.97 -3.55
220 231 0 # 22 | ....>C:..23_:[.DA]A-----T[.DT]:...6_:D<.... 0.11 0.03 11.03 9.00 -4.27
221 230 0 # 23 | ....>C:..24_:[.DC]C-----G[.DG]:...5_:D<.... 0.26 0.01 5.16 9.01 -4.46
222 229 0 # 24 | ....>C:..25_:[.DA]A-----T[.DT]:...4_:D<.... 0.12 0.08 11.88 8.99 -4.13
223 228 0 # 25 | ....>C:..26_:[.DG]G-----C[.DC]:...3_:D<.... 0.19 0.10 12.55 9.07 -3.98
224 227 0 # 26 | ....>C:..27_:[.DC]C-----G[.DG]:...2_:D<.... 0.70 0.08 13.79 8.87 -3.45
225 226 0 # 27 | ....>C:..28_:[.DT]T-----A[.DA]:...1_:D<.... 0.32 0.23 16.73 8.85 -3.38
##### Base-pair criteria used: 4.00 0.00 15.00 2.50 65.00 4.50 7.80 [ O N]
##### 0 non-Watson-Crick base-pairs, and 2 helices (0 isolated bps)
##### Helix #1 (11): 1 - 11
##### Helix #2 (16): 12 - 27
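
The manual edit shown above (bumping the base-pair count and adding one pair line) could also be scripted. The following is a minimal sketch, assuming the whitespace-separated layout seen in this listing; `add_pair` is a hypothetical helper, not part of 3DNA, and the third column of the new line is set to 0 as in the listing.

```python
def add_pair(lines, new_pair, after_pair):
    """Insert new_pair after after_pair and bump the base-pair count.

    lines: lines of a find_pair output file, without trailing newlines.
    new_pair, after_pair: (i, j) residue serial numbers.
    """
    out = []
    bumped = inserted = False
    for line in lines:
        fields = line.split()
        if "number of base-pairs" in line and fields and fields[0].isdigit():
            # Replace only the leading count, leaving the comment intact.
            line = line.replace(fields[0], str(int(fields[0]) + 1), 1)
            bumped = True
        out.append(line)
        # Pair lines start with three integers: i, j, and a flag.
        is_pair = len(fields) >= 3 and all(f.isdigit() for f in fields[:3])
        if (is_pair and not inserted
                and (int(fields[0]), int(fields[1])) == after_pair):
            out.append(f"{new_pair[0]} {new_pair[1]} 0")
            inserted = True
    if not (bumped and inserted):
        raise ValueError("count line or anchor pair not found")
    return out
```

For the case above, `add_pair(lines, (209, 242), (208, 243))` would reproduce the hand edit. The trailing comment fields on the other lines are left untouched.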

sli:
Thank you very much! I added a line as follows:
2 # duplex
28 # number of base-pairs
1 1 # explicit bp numbering/hetero atoms
198 253 0 # 1 | ....>C:...1_:[.DC]C-----G[.DG]:..28_:D<.... 0.09 0.00 12.58 9.13 -4.27
199 252 0 # 2 | ....>C:...2_:[.DA]A-----T[.DT]:..27_:D<.... 0.29 0.11 12.49 8.76 -3.87
200 251 0 # 3 | ....>C:...3_:[.DG]G-----C[.DC]:..26_:D<.... 0.38 0.17 11.55 8.98 -3.71
201 250 0 # 4 | ....>C:...4_:[.DC]C-----G[.DG]:..25_:D<.... 1.05 0.36 15.27 8.77 -2.47
202 249 0 # 5 | ....>C:...5_:[.DT]T-----A[.DA]:..24_:D<.... 0.18 0.01 12.00 8.93 -4.21
203 248 0 # 6 | ....>C:...6_:[.DC]C-----G[.DG]:..23_:D<.... 0.19 0.02 9.73 9.01 -4.29
204 247 0 # 7 | ....>C:...7_:[.DT]T-----A[.DA]:..22_:D<.... 0.28 0.04 8.95 9.03 -4.18
205 246 0 # 8 | ....>C:...8_:[.DG]G-----C[.DC]:..21_:D<.... 0.34 0.08 7.51 8.96 -4.12
206 245 0 # 9 | ....>C:...9_:[.DT]T-----A[.DA]:..20_:D<.... 0.20 0.12 8.73 8.88 -4.12
207 244 0 # 10 | ....>C:..10_:[.DA]A-----T[.DT]:..19_:D<.... 0.23 0.18 13.22 8.78 -3.75
208 243 9 # 11 x ....>C:..11_:[.DC]C-----G[.DG]:..18_:D<.... 0.35 0.30 19.14 8.92 -3.08
209 242 0 # 12 | ....>C:..12_:[.DG]G-----UF2[.DUF2]:..17_:D<....
210 241 0 # 13 | ....>C:..13_:[.DT]T-----A[.DA]:..16_:D<.... 0.29 0.11 18.66 8.94 -3.55
211 240 0 # 14 | ....>C:..14_:[.DG]G-----C[.DC]:..15_:D<.... 0.18 0.13 10.54 9.00 -4.03
212 239 0 # 15 | ....>C:..15_:[.DA]A-----T[.DT]:..14_:D<.... 0.48 0.02 7.98 8.84 -4.09
213 238 0 # 16 | ....>C:..16_:[.DG]G-----C[.DC]:..13_:D<.... 0.29 0.21 8.86 9.02 -3.85
214 237 0 # 17 | ....>C:..17_:[.DC]C-----G[.DG]:..12_:D<.... 0.39 0.09 11.91 8.95 -3.84
215 236 0 # 18 | ....>C:..18_:[.DG]G-----C[.DC]:..11_:D<.... 0.46 0.11 16.51 8.89 -3.50
216 235 0 # 19 | ....>C:..19_:[.DA]A-----T[.DT]:..10_:D<.... 0.23 0.20 15.04 8.89 -3.62
217 234 0 # 20 | ....>C:..20_:[.DT]T-----A[.DA]:...9_:D<.... 0.12 0.02 13.95 8.86 -4.15
218 233 0 # 21 | ....>C:..21_:[.DG]G-----C[.DC]:...8_:D<.... 0.27 0.03 4.95 8.88 -4.43
219 232 0 # 22 | ....>C:..22_:[.DG]G-----C[.DC]:...7_:D<.... 0.36 0.19 14.07 8.97 -3.55
220 231 0 # 23 | ....>C:..23_:[.DA]A-----T[.DT]:...6_:D<.... 0.11 0.03 11.03 9.00 -4.27
221 230 0 # 24 | ....>C:..24_:[.DC]C-----G[.DG]:...5_:D<.... 0.26 0.01 5.16 9.01 -4.46
222 229 0 # 25 | ....>C:..25_:[.DA]A-----T[.DT]:...4_:D<.... 0.12 0.08 11.88 8.99 -4.13
223 228 0 # 26 | ....>C:..26_:[.DG]G-----C[.DC]:...3_:D<.... 0.19 0.10 12.55 9.07 -3.98
224 227 0 # 27 | ....>C:..27_:[.DC]C-----G[.DG]:...2_:D<.... 0.70 0.08 13.79 8.87 -3.45
225 226 0 # 28 | ....>C:..28_:[.DT]T-----A[.DA]:...1_:D<.... 0.32 0.23 16.73 8.85 -3.38
##### Base-pair criteria used: 4.00 0.00 15.00 2.50 65.00 4.50 7.80 [ O N]
##### 0 non-Watson-Crick base-pairs, and 2 helices (0 isolated bps)
##### Helix #1 (11): 1 - 11
##### Helix #2 (16): 12 - 27

I found that for the line I added, the output file (bp_step.par) was the same whether or not I appended the numbers (such as 0.30 0.35 18.14 9.92 -3.33). I then used the modified file (my.bps) with "analyze" and finally "rebuild" to regenerate the DNA structure. In addition, I aligned the original and the rebuilt DNA structures with PyMOL (pair_fit), and they are not exactly the same, so I am not sure that my rebuilt structure is right. I do not fully understand what you meant by: "Nevertheless, they can be used to rigorously "rebuild" the original base geometry". Does it mean that the rebuilt structure should be identical to the original one? The alignment result is shown in the attachment.

Best wishes!

xiangjun:
Hi,

Thanks for your follow-up. I think I understand the source of your confusion in general. Your case could serve as an excellent example to clarify some subtle points in 3DNA. To proceed, could you please provide details of what you did, especially how you generated the attached image?

Best regards,

Xiang-Jun

Funded by the NIH R24GM153869 grant on X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University