
Topic: How to fix missing (superfluous) base pairs identified by find_pair?

xiangjun (Administrator)

Structural analysis of nucleic acids used to be a rather tedious process, especially for irregular, complicated RNA structures and nucleic acid-protein complexes (e.g., the large ribosomal subunit 1jj2/rr0033). Without valid base-pairing information as input, analysis software will produce meaningless results. The program find_pair was originally created to solve this specific problem, by generating an input file for the 3DNA analysis routines (analyze/cehs) directly from a PDB file.

At its core, find_pair uses a purely geometric approach to identify all possible pairs (Watson-Crick or non-canonical) that actually exist in a structure, along with their H-bonding patterns and helical context. Specifically, the major criteria used are as follows:
  • The distance between the origins of the two bases (as defined by their standard reference frames) must be less than a certain limit (15.0 Å by default) - otherwise, they would be too far apart to be called a pair.
  • The vertical separation (i.e., stagger) between the two base planes must be less than a certain limit (2.5 Å by default) - otherwise, they would be stacking instead of pairing.
  • The angle between the two base z-axes (i.e., their normal vectors) must be less than a cutoff (65.0° by default).
  • There must be at least one pair of nitrogen/oxygen base atoms within an H-bonding cutoff distance (4.0 Å by default).
If two bases fulfill these geometric requirements, they are defined to be a pair, without regard to their chemical constituents. Thus our method allows for identification of unconventional pairs as easily as the canonical ones. The program then checks for possible H-bonding patterns, whether the normal donor-acceptor (noted by '-' as in O6 - N4 for a G·C pair) or the unusual donor-donor, acceptor-acceptor (noted by '*' as in O2 * N3 for a C·C pair in urx057). The non-canonical pairs, especially those with unusual H-bonding patterns, should be checked more carefully - they could be due to errors in structure determination, or they could have some special meaning/significance unnoticed previously.
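
The four criteria above can be sketched in a few lines of Python. This is only an illustration of the geometric test as described in this post, not the actual find_pair source: the BaseFrame class, the is_candidate_pair function, and the choice to measure stagger along the averaged base normal are all assumptions made for the example.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class BaseFrame:
    """Hypothetical container: reference-frame origin, unit z-axis, base N/O atoms."""
    origin: Vec
    z_axis: Vec            # assumed to be a unit vector normal to the base plane
    no_atoms: List[Vec]    # coordinates of nitrogen/oxygen base atoms


def _dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def is_candidate_pair(b1: BaseFrame, b2: BaseFrame,
                      max_origin_dist: float = 15.0,
                      max_stagger: float = 2.5,
                      max_plane_angle: float = 65.0,
                      max_hb_dist: float = 4.0) -> bool:
    # 1. Origins must be closer than the distance limit.
    if math.dist(b1.origin, b2.origin) >= max_origin_dist:
        return False

    # 2. Vertical separation, measured here along the averaged normal
    #    (assumption: z2 is flipped first if the two normals point in
    #    opposite directions, as they do in an antiparallel pair).
    cos_z = _dot(b1.z_axis, b2.z_axis)
    sign = 1.0 if cos_z >= 0 else -1.0
    mean_z = tuple(p + sign * q for p, q in zip(b1.z_axis, b2.z_axis))
    norm = math.sqrt(_dot(mean_z, mean_z))
    offset = tuple(q - p for p, q in zip(b1.origin, b2.origin))
    stagger = abs(_dot(offset, mean_z)) / norm
    if stagger >= max_stagger:
        return False

    # 3. Angle between the base normals, folded into the 0-90 degree range.
    angle = math.degrees(math.acos(min(1.0, abs(cos_z))))
    if angle >= max_plane_angle:
        return False

    # 4. At least one N/O atom pair within the H-bonding cutoff.
    return any(math.dist(a1, a2) <= max_hb_dist
               for a1 in b1.no_atoms for a2 in b2.no_atoms)
```

Note that nothing in the test looks at base identity, which is why unconventional pairs pass as easily as canonical ones.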

The default criteria mentioned above are based on a survey of the NDB structures. Generally speaking, they are pretty generous and work quite well in the most common cases we've encountered. However, we are aware of special cases where some of them might be too restrictive or too generous, causing find_pair to miss base pairs or to produce superfluous ones. The default settings are stored in a text file named misc_3dna.par under the directory $X3DNA/config/, which users can modify as they see fit. Changes in that directory have a global effect - wherever you run find_pair on your system, the modified values will be used. Alternatively, users can copy misc_3dna.par to their current working directory and change it there for local effect. Note that the local setting has precedence over the global one.
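
As a rough sketch of the precedence rule just described, the following Python helper picks the local copy of misc_3dna.par when one exists and falls back to the global one otherwise. The function name resolve_misc_par is made up for this example, and the code only mirrors the behavior stated above; it is not how 3DNA itself is implemented.

```python
import os
from pathlib import Path
from typing import Optional


def resolve_misc_par(cwd: Path, x3dna_root: Optional[Path]) -> Optional[Path]:
    """Return the misc_3dna.par that would take effect (hypothetical helper)."""
    # A copy in the current working directory wins (local effect).
    local = cwd / "misc_3dna.par"
    if local.is_file():
        return local
    # Otherwise fall back to the global file under $X3DNA/config/.
    if x3dna_root is not None:
        global_par = x3dna_root / "config" / "misc_3dna.par"
        if global_par.is_file():
            return global_par
    return None


if __name__ == "__main__":
    root = os.environ.get("X3DNA")
    print(resolve_misc_par(Path.cwd(), Path(root) if root else None))
```

Running it in a working directory is a quick way to confirm which parameter file a given find_pair run would be expected to pick up.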

As an example, with its default settings find_pair will miss the 127th base pair I:..53_:[.DT]T-----A[.DA]:.-53_:J in structure 1kx5/pd0287. This is because the H-bonding distance for T:N3 - A:N1 is 4.20 Å and that for T:O4 - A:N6 is 4.85 Å; both are larger than the default 4.0 Å cutoff. Increasing the H-bonding criterion in file misc_3dna.par from 4.0 Å to 5.0 Å will solve this problem. Please note that in 3DNA, users can start directly from an uncompressed PDB file, without having to extract the DNA fragment first:
  • find_pair 1kx5.pdb 1kx5.inp to get input file for analyze
  • analyze 1kx5.inp to get detailed structural parameters in file 1kx5.out
  • The above two steps can be combined into one: find_pair 1kx5.pdb stdout | analyze stdin
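
Returning to the 127th pair of 1kx5 discussed above, the effect of the cutoff can be checked with a few lines of Python. The two distances are the ones quoted in this post; the helper hbonds_within is a made-up name for illustration and is not part of 3DNA.

```python
# H-bond distances (in Å) quoted above for the missed T-A pair in 1kx5.
hb_distances = {"T:N3 - A:N1": 4.20, "T:O4 - A:N6": 4.85}


def hbonds_within(distances: dict, cutoff: float) -> list:
    """Return the labels of atom pairs whose distance is within the cutoff."""
    return [label for label, d in distances.items() if d <= cutoff]


for cutoff in (4.0, 5.0):
    found = hbonds_within(hb_distances, cutoff)
    status = "pair detected" if found else "pair missed"
    print(f"cutoff {cutoff:.1f} Å: {status} {found}")
```

With the default 4.0 Å neither contact qualifies, so the H-bonding criterion fails; at 5.0 Å both do.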
In addition to (or instead of) manipulating parameters in misc_3dna.par, it is often preferable to manually edit find_pair-generated base-pair files before feeding them into analyze/cehs. This allows for maximum flexibility as to which pairs to consider in calculating 3DNA structural parameters.

Also worth noting is the -p option of find_pair. Without this option, find_pair locates base pairs in double-helical regions, so the Watson-Crick pairs take precedence over the wobble and other non-canonical pairs. With -p, all pairs and higher-order base associations (i.e., triplets and above) are detected.

« Last Edit: June 21, 2017, 04:09:37 pm by xiangjun »
Dr. Xiang-Jun Lu [律祥俊]


Created and maintained by Dr. Xiang-Jun Lu [律祥俊] · Supported by the NIH grant R01GM096889 · Dr. Lu is currently a member of the Bussemaker Laboratory at the Department of Biological Sciences, Columbia University. The project is in collaboration with the Olson Laboratory at Rutgers, where 3DNA got started.