
Author Topic: How to handle modified (uncommon) bases?  (Read 43239 times)

Offline xiangjun

« on: March 20, 2012, 09:40:19 pm »
In 3DNA, modified bases are mapped to their standard counterparts, e.g., 5-iodouracil (5IU) to uracil (U) and 1-methyladenine (1MA) to adenine (A), and are designated with lowercase letters (u and a, respectively, for the two examples above). The mapping is stored in the file $X3DNA/config/baselist.dat, and looks like this:
Code: [Select]
  A     A
 DA     A
ADE     A
....
5IU     u      # I connected to C5
....
1MA     a      # C connected to N1
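
To make the two-column layout above concrete, here is a minimal Python sketch that reads such text into a lookup table. It is not part of 3DNA: the function names are hypothetical, and it assumes each entry is a whitespace-separated "residue name, one-letter code" pair with an optional trailing # comment, as in the excerpt shown.

```python
def parse_baselist(text):
    """Return {residue_name: one_letter_code} from baselist.dat-style text.

    Assumption: two whitespace-separated fields per entry, '#' starts a comment.
    """
    mapping = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0]      # drop any trailing comment
        fields = line.split()
        if len(fields) != 2:
            continue                     # skip blanks, "...." elisions, etc.
        name, code = fields
        mapping[name] = code
    return mapping


def is_modified(code):
    """Lowercase one-letter codes designate modified bases."""
    return code.islower()


sample = """\
  A     A
 DA     A
ADE     A
....
5IU     u      # I connected to C5
....
1MA     a      # C connected to N1
"""

table = parse_baselist(sample)
for name, code in table.items():
    kind = "modified" if is_modified(code) else "standard"
    print(f"{name:>3} -> {code} ({kind})")
```

The lowercase test is the only thing that separates modified from standard bases here, mirroring the u/a convention described above.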

Each mapped one-letter base (X = A/C/G/T/U for the standard nucleotides and x = a/c/g/t/u for the modified ones) has a corresponding Atomic_X.pdb (or Atomic.x.pdb) file oriented in the standard base reference frame. By default, the two sets (X and x) are identical, i.e., Atomic_A.pdb has the same content as Atomic.a.pdb. The mapping information is used in a least-squares fitting procedure to define the base reference frame for each nucleotide in a PDB file, which allows for easy analysis of unusual DNA and RNA structures.
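
The naming rule for the reference files can be written down in a few lines. The Python sketch below is only an illustration of the Atomic_X.pdb / Atomic.x.pdb convention described above; the function name is hypothetical, and the sketch does not say where 3DNA looks for these files.

```python
def reference_file(code):
    """Return the reference-frame file name for a one-letter base code.

    Uppercase (standard) codes use an underscore: Atomic_A.pdb
    Lowercase (modified) codes use a dot:         Atomic.a.pdb
    """
    if len(code) != 1 or not code.isalpha():
        raise ValueError(f"expected a one-letter base code, got {code!r}")
    separator = "_" if code.isupper() else "."
    return f"Atomic{separator}{code}.pdb"


for code in ("A", "a", "U", "u"):
    print(code, "->", reference_file(code))
```

Because the two sets are identical by default, swapping in a custom geometry for a modified base would only mean changing the lowercase file, under the assumption that the file is looked up by this name.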

As of v2.1, when 3DNA encounters a new modified base, it automatically performs the mapping and outputs the following message (using a contrived example):
Code: [Select]
Match '2MG' to 'g' for residue 2MG   10  on chain A [#1]
    check it & consider to add line '2MG     g' to file <baselist.dat>

Simply add a line containing 2MG     g to file baselist.dat, and the above info message will be gone. The example is contrived because I deliberately deleted that line from baselist.dat for this illustration.
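
For those who prefer to script the edit, the following Python sketch appends such a line only when the residue is not yet listed. It works on the file content as a string, so reading and writing baselist.dat is left to the caller. The function name is hypothetical, and the two-field layout with optional # comments is assumed from the excerpt shown earlier.

```python
def ensure_mapping(baselist_text, residue, code):
    """Return baselist text with a 'residue     code' line appended if absent."""
    for raw in baselist_text.splitlines():
        fields = raw.split("#", 1)[0].split()
        if len(fields) == 2 and fields[0] == residue:
            return baselist_text          # already mapped; leave text untouched
    if baselist_text and not baselist_text.endswith("\n"):
        baselist_text += "\n"             # keep one entry per line
    return baselist_text + f"{residue}     {code}\n"


original = "  A     A\n1MA     a      # C connected to N1\n"
updated = ensure_mapping(original, "2MG", "g")
print(updated, end="")
```

Calling the function a second time with the same residue returns the text unchanged, so it is safe to run repeatedly.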

I implemented this auto-mapping as an experimental feature at least as far back as v1.5, but did not document it for public use. My experience over the years has shown that the auto-mapping functions as designed. Now that this feature is enabled by default, processing of large datasets can be fully automated. Moreover, using find_pair, it is easy to get a complete list of modified bases in a dataset, e.g., in all the NDB entries.
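
One possible way to build such a list is to tally the "Match ... to ..." messages across many runs. The Python sketch below assumes the messages have already been captured as text, one string per structure. How they are captured is up to the user and is not shown here. The regular expression is modelled only on the message format quoted above, and the function name is hypothetical. Note that this approach sees only residues that triggered the auto-mapping message, i.e., those not already listed in baselist.dat.

```python
import re
from collections import Counter

# Modelled on: Match '2MG' to 'g' for residue 2MG   10  on chain A [#1]
MATCH_RE = re.compile(r"Match '([^']+)' to '([A-Za-z])' for residue")


def tally_auto_mapped(outputs):
    """Count (residue_name, one_letter_code) pairs over captured message texts."""
    counts = Counter()
    for text in outputs:
        for name, code in MATCH_RE.findall(text):
            counts[(name, code)] += 1
    return counts


# Hypothetical input: the example message from this post, captured from two runs.
message = (
    "Match '2MG' to 'g' for residue 2MG   10  on chain A [#1]\n"
    "    check it & consider to add line '2MG     g' to file <baselist.dat>\n"
)
counts = tally_auto_mapped([message, message])

for (name, code), n in sorted(counts.items()):
    print(f"{name}     {code}      # seen {n} time(s)")
```

Each printed line already has the "name     code" shape of a baselist.dat entry, so reviewed lines can be pasted into the file directly.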
« Last Edit: March 21, 2012, 01:04:20 pm by xiangjun »

 

Funded by the NIH R24GM153869 grant on X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University