
Topic: Datasets and scripts for reproducing Figure 5 of the 3DNA NAR03 paper

Offline ilibarra

The scheme of classifying a dinucleotide step into A-, B- or TA-DNA form is described in the 2003 NAR paper. More specifically, it is based on Zp and Zp(h); see Figure 5(c) linked below. For example, if Zp > 1.5 Å, then it is taken as A-DNA.



Per your request, listed below is the exact definition for A-, B- and TA-DNA, as excerpted from 3DNA source code. Note the "sanity check" at the beginning; the empirical criteria try to ensure a right-handed duplex consisting of Watson-Crick bps and with reasonable geometry. Also bear in mind that the classification is intended to be indicative rather than conclusive.

Code:
if (dval_in_range(mtwist, 10.0, 60.0)  /* over-all twist average */
    && WC_info[i] && WC_info[i + 1]  /* WC geometry */
    && dval_in_range(twist_rise[i][1], 10.0, 60.0)  /* right-handed */
    && dval_in_range(twist_rise[i][2], 2.5, 5.5)  /* Rise in range */
    && dval_in_range(aveS[i][1], -5.0, -0.5)  /* Xp */
    && dval_in_range(aveS[i][2], 7.5, 10.0)  /* Yp */
    && dval_in_range(aveS[i][3], -2.0, 3.5)  /* Zp */
    && dval_in_range(aveH[i][1], -11.5, 2.5)  /* XpH */
    && dval_in_range(aveH[i][2], 1.5, 10.0)  /* YpH */
    && dval_in_range(aveH[i][3], -3.0, 9.0)) {  /* ZpH */
    if (aveS[i][3] >= 1.5)  /* A-form */
        strABT[i] = 1;
    else if (aveH[i][3] >= 4.0)  /* TA-form */
        strABT[i] = 3;
    else if (aveS[i][3] <= 0.5 && aveH[i][1] < 0.5)  /* B-form */
        strABT[i] = 2;  /* aveS[i][3] < 0.5 for C-DNA #47 */
}
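
For anyone who wants to experiment with these criteria outside the 3DNA source tree, here is a minimal, self-contained sketch of the same decision logic. The names 'StepParams', 'FORM_*', 'in_range' and 'classify_step' are hypothetical and introduced only for illustration; they are not part of 3DNA. Two assumptions are made: 'in_range' uses an inclusive range test (the real dval_in_range may differ), and an unassigned step is reported as 0, since the excerpt does not show how strABT is initialized. The thresholds are copied from the excerpt as posted.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical container for the per-step values used in the excerpt. */
typedef struct {
    double twist, rise;        /* twist_rise[i][1], twist_rise[i][2] */
    double xp, yp, zp;         /* aveS[i][1..3]: Xp, Yp, Zp */
    double xph, yph, zph;      /* aveH[i][1..3]: XpH, YpH, ZpH */
    bool wc_first, wc_second;  /* WC_info[i], WC_info[i + 1] */
} StepParams;

/* 1, 2, 3 follow strABT in the excerpt; 0 (unassigned) is an assumption. */
enum { FORM_NONE = 0, FORM_A = 1, FORM_B = 2, FORM_TA = 3 };

/* Assumed behavior: inclusive range test. */
bool in_range(double v, double lo, double hi)
{
    assert(lo <= hi);
    return v >= lo && v <= hi;
}

/* Classify one dinucleotide step; mean_twist plays the role of mtwist. */
int classify_step(double mean_twist, const StepParams *s)
{
    /* Sanity check: right-handed WC duplex with reasonable geometry. */
    bool sane = in_range(mean_twist, 10.0, 60.0)
        && s->wc_first && s->wc_second
        && in_range(s->twist, 10.0, 60.0)
        && in_range(s->rise, 2.5, 5.5)
        && in_range(s->xp, -5.0, -0.5)
        && in_range(s->yp, 7.5, 10.0)
        && in_range(s->zp, -2.0, 3.5)
        && in_range(s->xph, -11.5, 2.5)
        && in_range(s->yph, 1.5, 10.0)
        && in_range(s->zph, -3.0, 9.0);

    if (!sane)
        return FORM_NONE;
    if (s->zp >= 1.5)                     /* A-form */
        return FORM_A;
    if (s->zph >= 4.0)                    /* TA-form */
        return FORM_TA;
    if (s->zp <= 0.5 && s->xph < 0.5)     /* B-form, as in the excerpt */
        return FORM_B;
    return FORM_NONE;                     /* passes the check but fits no form */
}
```

Calling classify_step(mean_twist, &step) once per step gives the same 1/2/3 codes as strABT. Note that the order of the tests matters: a step with Zp >= 1.5 is reported as A-form even if its ZpH is also above 4.0.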

HTH,

Xiang-Jun

I'd like to ask about the DNA set used for the analysis presented in Fig. 5 of the NAR 2003 paper. Were those structures previously classified as A-, B- and TA-DNA by other means before the Zp and Zp(h) calculations were done to confirm their differences? Where can I find the structures that were used? (I guess they are somewhere in reference 81, Patikoglou, G.A. et al. (1999).)

Thanks for the comments

Ignacio
« Last Edit: August 31, 2012, 05:08:33 pm by ilibarra »

Offline xiangjun

Re: Datasets and scripts for reproducing Figure 5 of the 3DNA NAR03 paper
« Reply #1 on: August 31, 2012, 05:38:14 pm »
Quote
I'd like to ask about the DNA set used for the analysis presented in Fig. 5 of the NAR 2003 paper. Were those structures previously classified as A-, B- and TA-DNA by other means before the Zp and Zp(h) calculations were done to confirm their differences? Where can I find the structures that were used? (I guess they are somewhere in reference 81, Patikoglou, G.A. et al. (1999).)

Thanks for asking about the DNA datasets used in Figure 5 of the 3DNA 2003 NAR paper. Yes, the structures were assigned as A-, B- and TA-DNA by other means before we introduced Zp and Zp(h) to classify the three types of dinucleotide steps automatically. The A- and B-DNA assignments are based on conventional parameters (Slide/Roll, sugar puckers, etc.), as in the NDB, and the TA-DNA assignment is mainly inspired by the work of Guzikevich‐Guerstein and Shakked (ref. 80):

Quote from: 3DNA 2003 NAR paper
A detailed structural analysis of two early examples of the TATA‐box DNA bound to the TATA‐box binding protein (TBP) (10,79) led Guzikevich‐Guerstein and Shakked (80) to propose that the 8 bp TATA‐box adopts a novel TA‐DNA conformation, different from either A or B DNA. The structures of many more such complexes have since been determined (81) and, as shown in Table 2 and Figure 5, all TATA‐box regions share similar conformational features.

So the complete list was not taken directly from ref. 81, but was compiled specifically for this work. The structure list used to produce the TA-DNA steps in Figure 5 can be found in the thread "DNA standards/statistics using 3DNA", dated August 2006. For the A-DNA and B-DNA structures used in Figure 5 of the 2003 paper, I need to locate my original records from (nearly) a decade ago -- I will write a post about my findings on the 3DNA homepage, possibly by next week.

HTH,

Xiang-Jun
« Last Edit: September 03, 2012, 10:55:11 pm by xiangjun »

Offline ilibarra

« Reply #2 on: September 03, 2012, 10:44:45 pm »
Thank you very much for the fast answers to the questions of this forum :D

Offline xiangjun

« Reply #3 on: September 07, 2012, 02:15:17 pm »
Thanks for your patience -- it took me quite some time to dig out the files I used for the 2003 3DNA paper! Luckily, I found them, and the time was well spent :D.

Here are the details -- the complete datasets and scripts can be downloaded by following the link: 3DNA-NAR03-Fig5.tar.gz. Figure 5(a)-(c), as regenerated with the scripts and data files, is attached.

  • Content of the README file:
    This folder (3DNA-NAR03-Fig5) contains all the data files and scripts
    to reproduce Figure 5 of the 2003 3DNA paper in Nucleic Acids Research
    (NAR03). The contents are taken from the original materials I used to
    create Figure 5 of NAR03, with slight editing. Specifically, I revised
    the Matlab scripts to work in GNU Octave v3.2.4 for verification.

    If you have any questions or comments, please do post them on the 3DNA
    Forum.

    2012-09-06 -- Xiang-Jun Lu (http://x3dna.org)

    ========================================================================

    Data selections:
        'note-AB-datasets' -- datasets of selected A- and B-DNA structures
        'note-TA-dataset'  -- dataset of selected TA-DNA structures

    Data files:
        'A-heli-pars.dat' -- six helical parameters
        'A-step-pars.dat' -- six step parameters
        'A-zp-zph.dat'    -- Zp and ZpH parameters
            Selected parameters of the A-DNA dataset. Note that the order
            of the parameters is as in the .out file from running 'analyze'

        'B-heli-pars.dat',  'B-step-pars.dat',  'B-zp-zph.dat' for B-DNA
        'TA-heli-pars.dat', 'TA-step-pars.dat', 'TA-zp-zph.dat' for TA-DNA

    Scripts:
        'incl_xdsp.m' -- script to generate Figure 5(a), Inclination vs Tip
                      'incl_xdsp.png' -- output file from running the script
        'roll_slide.m' -- script to generate Figure 5(b), Roll vs Slide
                      'roll_slide.png' -- output file from running the script
        'zph_zp.m' -- script to generate Figure 5(c), Zp(h) vs Zp
                      'zph_zp.png' -- output file from running the script
        'draw_ellipse.m', 'get_pars.m', 'open_file.m' -- supporting scripts
  • Content of file note-AB-datasets
    Selection Criteria:
       NDB ID: ad OR bd
       Classification: DNA
       Structure Description: Double Helix
       Conformation Type: A OR B
       No Drug, No Mismatch
       No Modifiers (Base/Sugar/Phosphate)
       Resolution better than 2.0 A
       =======================
       34 A-DNA and 27 B-DNA

    For B-DNA, delete bd0012, bd0013 & bdf068 (following HMB)
       bd0001 bd0006_A
       bd0014: coordinates from PDB 463D
       bd0005 bd0016_A (with repeated atoms!)
       bd0018 bd0019 bdj017 bdj019 bdj025 bdj031 bdj036 bdj037 bdj051
       bdj052 bdj060 bdj061
       bdj081 (Uses helix #1 with strands A and B. The other two are
               disordered)
       bdl001 bdl005 bdl020 bdl084
       bd0023_A  bd0029
       -------------------------- 27-3=24 structures

    For A-DNA
       ad0002 ==> (ad0002_AB + ad0002_CD)
       ad0003 ad0004 adh008 adh010 adh0102 adh0103 adh0104 adh0105
       adh014 adh026 adh027 adh029 adh033 adh034 adh038 adh039 adh047
       adh070 adh078 adj0102 adj0103 adj0112 adj0113 adj022 adj049
       adj050 adj051 adj065 adj066 adj067 adj075
       adl025 (suspicious! big Buckle, alternating Propeller)
       adl047 (with B-steps, not good either!)
       -------------------------- 34+1-2=33 structures

    Outliers:
      A-DNA: ad0002_CD, steps 3-4,   bps 3-4-5
             ad0004,    steps 3-4-5, bps 3-4-5-6
      B-DNA: bdj025,    step 3,      bps 3-4
             bdj031,    step 3,      bps 3-4
             bdj037,    step 3,      bps 3-4
  • Content of file note-TA-dataset
    pd0070, pd0112, pd0154, pd0155, pd0156, pd0157, pd0158, pd0159, pd0160,
    pd0161, pd0162, pd0163, pd0164, pdr031, pdt009, pdt012, pdt024, pdt025,
    pdt032, pdt034, pdt036

    This directory contains TATA box segments. Each is normally 8 bp long and
    has the sequence T-A-T-A-@-A-@-N. There are two kinks at the terminal
    steps.

    * marks a non-WC base pair, which is excluded from further analysis

    NDB ID  ##     Sequence      Res(A)  R-fac(%) chainID and residue range
    --------------------------------------------------------------------
    pd0070  01  T-T-T-A-A-A-T-A   2.4     20.0   C 1410 1417 D 1432 1439
                               
    pd0112  02  T-A-T-A-A-A-A-G   2.65    23.1   K 8 15 L 105 112
            03  T-A-T-A-A-A-A-G                  C 8 15 D 105 112
            04  T-A-T-A-A-A-A-G                  G 8 15 H 105 112
            05  T-A-T-A-A-A-A-G                  O 8 15 P 105 112
            06  T-A-T-A-A-A-A-G                  S 8 15 T 105 112
                                         
    pd0154  07  T-A-T-A-A-A-A-T   1.86    21.0   C 203 210 D 219 226
            08  T-A-T-A-A-A-A-T                  E 203 210 F 219 226
                                       
    pd0155  09  T-A-T-A-A-G-A-G*  1.93    19.6   C 203 209 D 220 226
            10  T-A-T-A-A-G-A-G*                 E 203 209 F 220 226
       
    pd0156  11  T-A-T-A-A-T-A-G*  2.1     19.3   C 203 209 D 220 226
            12  T-A-T-A-A-T-A-G*                 E 203 209 F 220 226
                                       
    pd0157  13  T-A-T-A-T-A-A-G*  2.3     19.4   C 203 209 D 220 226
            14  T-A-T-A-T-A-A-G*                 E 203 209 F 220 226
                                       
    pd0158  15  T-A-T-T-A-A-A-G*  2.1     19.4   C 203 209 D 220 226
            16  T-A-T-T-A-A-A-G*                 E 203 209 F 220 226
                                       
    pd0159  17  T-A-C-A-A-A-A-G*  1.9     20.9   C 203 209 D 220 226
            18  T-A-C-A-A-A-A-G*                 E 203 209 F 220 226
       
    pd0160  19  T-T-T-A-A-A-A-G*  1.8     19.3   C 203 209 D 220 226
            20  T-T-T-A-A-A-A-G*                 E 203 209 F 220 226
                                         
    pd0161  21  T-A-T-A-A-A-T-G*  2.23    19.1   C 203 209 D 220 226
            22  T-A-T-A-A-A-T-G*                 E 203 209 F 220 226
                                         
    pd0162  23  A-A-T-A-A-A-A-G*  2.3     18.2   C 203 209 D 220 226
            24  A-A-T-A-A-A-A-G*                 E 203 209 F 220 226
                                         
    pd0163  25  T-A-T-A-A-A-A-G   1.9     19.7   C 203 210 D 219 226
            26  T-A-T-A-A-A-A-G                  E 203 210 F 219 226
                                         
    pd0164  27  T-A-T-A-A-A-C*G*  1.95    19.9   C 203 208 D 221 226
            28  T-A-T-A-A-A-C*G*                 E 203 208 F 221 226
                                         
    pdr031  29  T-T-T-t-t-A-A-A   2.1     21.2   C 1408 1415 E 1420 1427
                                         
    pdt009  30  T-A-T-A-A-A-A-G   2.25    20.2   A 203 210 B 305 312
            31  T-A-T-A-A-A-A-G                  C 403 410 D 505 512
                                         
    pdt012  32  T-A-T-A-T-A-A-A   1.8     20.1   C 2 9 C 21 28
            33  T-A-T-A-T-A-A-A                  D 2 9 D 21 28
                                         
    pdt024  34  T-A-T-A-T-A-T-A   2.9     21.4   B 103 110 C 115 122
                                         
    pdt025  35  T-A-T-A-A-A-A-G   1.9     19.4   C 203 210 D 219 226
            36  T-A-T-A-A-A-A-G                  E 303 310 F 319 326
                                         
    pdt032  37  T-A-T-A-A-A-A-G   2.7     21.5   C 4 11 D 106 113
                                         
    pdt034  38  T-A-T-A-A-A-A-G   1.9     18.9   B 5 12 C 105 112
                                         
    pdt036  39  T-A-T-A-A-A-A-C   2.5     23.5   E 9 16 F 1 8
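
The last column of the table can be checked with a little arithmetic. The sketch below assumes that each entry reads as (chain ID, first residue, last residue) for strand I followed by the same for strand II, and that residue numbering is contiguous within each range; this is my reading of the column header, not something stated in the file. 'StrandRange', 'range_length', 'segment_bp_count' and 'step_count' are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Hypothetical record for one strand entry in the last column. */
typedef struct {
    char chain;       /* chain ID, e.g. 'C' */
    int first, last;  /* inclusive residue range */
} StrandRange;

/* Residues covered by an inclusive range (assumes contiguous numbering). */
int range_length(StrandRange r)
{
    assert(r.last >= r.first);
    return r.last - r.first + 1;
}

/* Base pairs in a segment, or -1 if the two strands differ in length. */
int segment_bp_count(StrandRange s1, StrandRange s2)
{
    int n1 = range_length(s1);
    int n2 = range_length(s2);
    return (n1 == n2) ? n1 : -1;
}

/* n base pairs define n - 1 dinucleotide steps. */
int step_count(int n_bp)
{
    return (n_bp > 0) ? n_bp - 1 : 0;
}
```

For example, segment 07 of pd0154 (C 203 210 D 219 226) covers 8 residues on each strand, i.e. 8 bp and 7 steps. Segment 09 of pd0155 (C 203 209 D 220 226) covers only 7 bp, which is consistent with the terminal pair marked * being left out.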

HTH,

Xiang-Jun


PS. As a matter of fact, the A- and B-DNA datasets are those used in Table 3 of the report on standard base reference.
« Last Edit: September 18, 2012, 12:10:03 pm by xiangjun »

 

Funded by the NIH R24GM153869 grant on X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University