Datasets and scripts for reproducing Figure 5 of the 3DNA NAR03 paper

Questions and answers => General discussions (Q&As) => Topic started by: ilibarra on August 31, 2012, 05:05:06 pm


Title: Datasets and scripts for reproducing Figure 5 of the 3DNA NAR03 paper
Post by: ilibarra on August 31, 2012, 05:05:06 pm

Quote from: xiangjun on April 01, 2012, 10:31:26 am

The scheme of classifying a dinucleotide step into A-, B- or TA-DNA form is described in the 2003 NAR paper (http://nar.oxfordjournals.org/content/31/17/5108.full). More specifically, it is based on Zp and Zp(h); see Figure 5(c) linked below. For example, if Zp > 1.5 Å, then it is taken as A-DNA.

(http://nar.oxfordjournals.org/content/31/17/5108/F5.large.jpg)

Per your request, listed below is the exact definition for A-, B- and TA-DNA, as excerpted from 3DNA source code. Note the "sanity check" at the beginning; the empirical criteria try to ensure a right-handed duplex consisting of Watson-Crick bps and with reasonable geometry. Also bear in mind that the classification is intended to be indicative rather than conclusive.

Code:
if (dval_in_range(mtwist, 10.0, 60.0)                 /* over-all twist average */
    && WC_info[i] && WC_info[i + 1]                   /* WC geometry */
    && dval_in_range(twist_rise[i][1], 10.0, 60.0)    /* right-handed */
    && dval_in_range(twist_rise[i][2], 2.5, 5.5)      /* Rise in range */
    && dval_in_range(aveS[i][1], -5.0, -0.5)          /* Xp */
    && dval_in_range(aveS[i][2], 7.5, 10.0)           /* Yp */
    && dval_in_range(aveS[i][3], -2.0, 3.5)           /* Zp */
    && dval_in_range(aveH[i][1], -11.5, 2.5)          /* XpH */
    && dval_in_range(aveH[i][2], 1.5, 10.0)           /* YpH */
    && dval_in_range(aveH[i][3], -3.0, 9.0)) {        /* ZpH */
    if (aveS[i][3] >= 1.5)                            /* A-form */
        strABT[i] = 1;
    else if (aveH[i][3] >= 4.0)                       /* TA-form */
        strABT[i] = 3;
    else if (aveS[i][3] <= 0.5 && aveH[i][1] < 0.5)   /* B-form */
        strABT[i] = 2;                                /* aveS[i][3] < 0.5 for C-DNA #47 */
}
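For readers who want to experiment with these criteria outside of 3DNA, the excerpt can be transcribed roughly as follows. This is only an illustrative Python sketch and not part of 3DNA: the names `Step`, `in_range` and `classify_step` are made up here, `in_range` assumes that `dval_in_range` treats both bounds as inclusive, and the sample parameter values are invented solely to exercise each branch.

```python
from dataclasses import dataclass
from typing import Optional


def in_range(value: float, lower: float, upper: float) -> bool:
    # Stand-in for 3DNA's dval_in_range(); inclusive bounds are an assumption.
    return lower <= value <= upper


@dataclass
class Step:
    """Parameters of one dinucleotide step (field names are hypothetical)."""
    twist: float
    rise: float
    xp: float
    yp: float
    zp: float
    xph: float
    yph: float
    zph: float
    wc_both: bool = True  # True when both base pairs are Watson-Crick


def classify_step(step: Step, mean_twist: float) -> Optional[str]:
    """Return 'A', 'TA' or 'B', or None when the step is left unclassified."""
    sane = (
        in_range(mean_twist, 10.0, 60.0)        # over-all twist average
        and step.wc_both                        # WC geometry
        and in_range(step.twist, 10.0, 60.0)    # right-handed
        and in_range(step.rise, 2.5, 5.5)
        and in_range(step.xp, -5.0, -0.5)
        and in_range(step.yp, 7.5, 10.0)
        and in_range(step.zp, -2.0, 3.5)
        and in_range(step.xph, -11.5, 2.5)
        and in_range(step.yph, 1.5, 10.0)
        and in_range(step.zph, -3.0, 9.0)
    )
    if not sane:
        return None
    if step.zp >= 1.5:
        return "A"
    if step.zph >= 4.0:
        return "TA"
    if step.zp <= 0.5 and step.xph < 0.5:
        return "B"
    return None


# Invented parameter values, chosen only to reach each branch.
a_like = Step(twist=31.0, rise=3.3, xp=-1.5, yp=8.5, zp=2.2,
              xph=-4.0, yph=8.0, zph=-0.5)
b_like = Step(twist=36.0, rise=3.4, xp=-3.0, yp=8.9, zp=-0.4,
              xph=-0.7, yph=9.0, zph=2.5)
ta_like = Step(twist=20.0, rise=3.5, xp=-2.5, yp=8.8, zp=0.8,
               xph=-1.0, yph=7.0, zph=6.0)
left_handed = Step(twist=-10.0, rise=3.7, xp=-3.0, yp=8.9, zp=-0.4,
                   xph=-0.7, yph=9.0, zph=2.5)

for name, step in [("a_like", a_like), ("b_like", b_like),
                   ("ta_like", ta_like), ("left_handed", left_handed)]:
    print(name, classify_step(step, mean_twist=32.0))
```

Returning `None` plays the role of leaving `strABT[i]` unset in the C excerpt: the step either fails the sanity check or falls between the three sets of criteria. The order of the tests matters, since a step with Zp >= 1.5 is labelled A-form before ZpH is ever examined.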
HTH,

Xiang-Jun

I'd like to ask about the DNA set used for the analysis presented in Fig. 5 of the NAR 2003 paper. Were those structures classified as A-, B- and TA-DNA by other means before the Zp and Zp(h) calculations were done to confirm their differences? Where can I find the structures that were used? (I guess they are somewhere in reference 81, Patikoglou, G.A. et al. (1999).)

Thanks for the comments

Ignacio

Title: Re: Datasets and scripts for reproducing Figure 5 of the 3DNA NAR03 paper
Post by: xiangjun on August 31, 2012, 05:38:14 pm

Quote

I'd like to ask about the DNA set used for the analysis presented in Fig. 5 of the NAR 2003 paper. Were those structures classified as A-, B- and TA-DNA by other means before the Zp and Zp(h) calculations were done to confirm their differences? Where can I find the structures that were used? (I guess they are somewhere in reference 81, Patikoglou, G.A. et al. (1999).)

Thanks for asking about the DNA datasets used in Figure 5 of the 3DNA 2003 NAR paper (http://nar.oxfordjournals.org/content/31/17/5108.full). Yes, the structures had been assigned as A-, B- and TA-DNA by other means before we introduced Zp and Zp(h) to classify the three types of dinucleotide steps automatically. The A- and B-DNA assignments are based on conventional parameters (Slide/Roll, sugar puckers, etc.), as in the NDB, and the TA-DNA assignment is mainly inspired by the work of Guzikevich‐Guerstein and Shakked (ref. 80 (http://www.ncbi.nlm.nih.gov/pubmed/8548452)):

Quote from: 3DNA 2003 NAR paper

A detailed structural analysis of two early examples of the TATA‐box DNA bound to the TATA‐box binding protein (TBP) (10,79) led Guzikevich‐Guerstein and Shakked (80) to propose that the 8 bp TATA‐box adopts a novel TA‐DNA conformation, different from either A or B DNA. The structures of many more such complexes have since been determined (81) and, as shown in Table 2 and Figure 5, all TATA‐box regions share similar conformational features.

So the complete list was not taken directly from somewhere in ref. 81 (http://www.ncbi.nlm.nih.gov/pubmed/10617571), but compiled specifically for the work. The actual structure list used in producing Figure 5 for the TA-DNA steps can be found in the thread "DNA standards/statistics using 3DNA (http://forum.x3dna.org/general-discussions/dna-standardsstatistics-using-3dna)", dated August 2006. For A-DNA and B-DNA structures used in Figure 5 of the 2003 paper, I need to locate my original record from (nearly) a decade ago -- I will write a post about my findings on the 3DNA homepage, possibly by next week.

HTH,

Xiang-Jun

Title: Re: Datasets and scripts for reproducing Figure 5 of the 3DNA NAR03 paper
Post by: ilibarra on September 03, 2012, 10:44:45 pm

Thank you very much for the fast answers to the questions on this forum :D

Title: Re: Datasets and scripts for reproducing Figure 5 of the 3DNA NAR03 paper
Post by: xiangjun on September 07, 2012, 02:15:17 pm

Thanks for your patience -- it took me quite some time to dig into the files I used for the 2003 3DNA paper! Luckily, I found them, and the time was well spent :D.

Here are the details -- the complete datasets and scripts can be downloaded from the following link: 3DNA-NAR03-Fig5.tar.gz (http://x3dna.bio.columbia.edu/data/3DNA-NAR03-Fig5.tar.gz). Figures 5(a)-(c), as generated with the scripts and data files, are attached.

Content of the README file:

This folder (3DNA-NAR03-Fig5) contains all the data files and scripts
to reproduce Figure 5 of the 2003 3DNA paper in Nucleic Acids Research
(NAR03). The contents are taken from the original materials I used to
create Figure 5 of NAR03, with slight editing. Specifically, I revised
the Matlab scripts to work in GNU Octave v3.2.4 for verification.

If you have any questions or comments, please do post them on the 3DNA
Forum.

2012-09-06 -- Xiang-Jun Lu (http://x3dna.org)

========================================================================

Data selections:
    'note-AB-datasets' -- datasets of selected A- and B-DNA structures
    'note-TA-dataset'  -- dataset of selected TA-DNA structures

Data files:
    'A-heli-pars.dat' -- six helical parameters 
    'A-step-pars.dat' -- six step parameters
    'A-zp-zph.dat'    -- Zp and ZpH parameters
        Selected parameters of the A-DNA dataset. Note that the order of
        the parameters is as in the .out file from running 'analyze'

    'B-heli-pars.dat',  'B-step-pars.dat',  'B-zp-zph.dat' for B-DNA
    'TA-heli-pars.dat', 'TA-step-pars.dat', 'TA-zp-zph.dat' for TA-DNA

Scripts:
    'incl_xdsp.m' -- script to generate Figure 5(a), Inclination vs x-displacement
                  'incl_xdsp.png' -- output file from running the script
    'roll_slide.m' -- script to generate Figure 5(b), Roll vs Slide
                  'roll_slide.png' -- output file from running the script
    'zph_zp.m' -- script to generate Figure 5(c), Zp(h) vs Zp
                  'zph_zp.png' -- output file from running the script
    'draw_ellipse.m', 'get_pars.m', 'open_file.m' -- supporting scripts

Content of file note-AB-datasets

Selection Criteria:
   NDB ID: ad OR bd
   Classification: DNA
   Structure Description: Double Helix
   Conformation Type: A OR B
   No Drug, No Mismatch
   No Modifiers (Base/Sugar/Phosphate)
   Resolution better than 2.0 A
   =======================
   34 A-DNA and 27 B-DNA

For B-DNA, delete bd0012, bd0013 & bdf068 (following HMB)
   bd0001 bd0006_A
   bd0014: coordinates from PDB 463D
   bd0005 bd0016_A (with repeated atoms!)
   bd0018 bd0019 bdj017 bdj019 bdj025 bdj031 bdj036 bdj037 bdj051
   bdj052 bdj060 bdj061 
   bdj081 (Uses helix #1 with strands A and B. The other two are
           disordered)
   bdl001 bdl005 bdl020 bdl084
   bd0023_A  bd0029
   -------------------------- 27-3=24 structures

For A-DNA
   ad0002 ==> (ad0002_AB + ad0002_CD)
   ad0003 ad0004 adh008 adh010 adh0102 adh0103 adh0104 adh0105
   adh014 adh026 adh027 adh029 adh033 adh034 adh038 adh039 adh047
   adh070 adh078 adj0102 adj0103 adj0112 adj0113 adj022 adj049
   adj050 adj051 adj065 adj066 adj067 adj075 
   adl025 (suspicious! big Buckle, alternating Propeller)
   adl047 (with B-steps, not good either!)
   -------------------------- 34+1-2=33 structures

Outliers:
  A-DNA: ad0002_CD, steps 3-4,   bps 3-4-5
         ad0004,    steps 3-4-5, bps 3-4-5-6
  B-DNA: bdj025,    step 3,      bps 3-4
         bdj031,    step 3,      bps 3-4
         bdj037,    step 3,      bps 3-4
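The bookkeeping in this note ("27-3=24" and "34+1-2=33") can be re-checked with a few lines of Python. This is just a sketch built from the IDs listed above. It rests on one assumption about the note: that the "-2" for A-DNA refers to the two flagged entries, adl025 and adl047, and that the "+1" comes from splitting ad0002 into its AB and CD duplexes.

```python
# IDs copied from the note-AB-datasets listing above.
b_kept = """
bd0001 bd0006_A bd0014 bd0005 bd0016_A bd0018 bd0019 bdj017 bdj019
bdj025 bdj031 bdj036 bdj037 bdj051 bdj052 bdj060 bdj061 bdj081
bdl001 bdl005 bdl020 bdl084 bd0023_A bd0029
""".split()
b_deleted = ["bd0012", "bd0013", "bdf068"]

a_entries = """
ad0002 ad0003 ad0004 adh008 adh010 adh0102 adh0103 adh0104 adh0105
adh014 adh026 adh027 adh029 adh033 adh034 adh038 adh039 adh047
adh070 adh078 adj0102 adj0103 adj0112 adj0113 adj022 adj049
adj050 adj051 adj065 adj066 adj067 adj075 adl025 adl047
""".split()

# Assumption: the "-2" in the note refers to these two flagged entries.
a_flagged = {"adl025", "adl047"}

a_duplexes = []
for entry in a_entries:
    if entry in a_flagged:
        continue
    if entry == "ad0002":
        # The note splits this entry into two duplexes.
        a_duplexes.extend(["ad0002_AB", "ad0002_CD"])
    else:
        a_duplexes.append(entry)

print("B-DNA:", len(b_kept) + len(b_deleted), "selected,", len(b_kept), "kept")
print("A-DNA:", len(a_entries), "selected,", len(a_duplexes), "duplexes kept")
```

Under that reading, the counts agree with the totals given in the note.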

Content of file note-TA-dataset

pd0070, pd0112, pd0154, pd0155, pd0156, pd0157, pd0158, pd0159, pd0160,
pd0161, pd0162, pd0163, pd0164, pdr031, pdt009, pdt012, pdt024, pdt025,
pdt032, pdt034, pdt036

This directory contains TATA-box segments. Each segment is normally 8 bp
long and has the sequence T-A-T-A-@-A-@-N. There are two kinks at the
terminal steps.

* marks a non-WC base pair, which is eliminated from further analysis

NDB ID  ##     Sequence      Res(A)  R-fac(%) chainID and residue range
--------------------------------------------------------------------
pd0070  01  T-T-T-A-A-A-T-A   2.4     20.0   C 1410 1417 D 1432 1439
                           
pd0112  02  T-A-T-A-A-A-A-G   2.65    23.1   K 8 15 L 105 112
        03  T-A-T-A-A-A-A-G                  C 8 15 D 105 112
        04  T-A-T-A-A-A-A-G                  G 8 15 H 105 112
        05  T-A-T-A-A-A-A-G                  O 8 15 P 105 112
        06  T-A-T-A-A-A-A-G                  S 8 15 T 105 112
                                      
pd0154  07  T-A-T-A-A-A-A-T   1.86    21.0   C 203 210 D 219 226
        08  T-A-T-A-A-A-A-T                  E 203 210 F 219 226
                                   
pd0155  09  T-A-T-A-A-G-A-G*  1.93    19.6   C 203 209 D 220 226
        10  T-A-T-A-A-G-A-G*                 E 203 209 F 220 226
   
pd0156  11  T-A-T-A-A-T-A-G*  2.1     19.3   C 203 209 D 220 226
        12  T-A-T-A-A-T-A-G*                 E 203 209 F 220 226
                                   
pd0157  13  T-A-T-A-T-A-A-G*  2.3     19.4   C 203 209 D 220 226
        14  T-A-T-A-T-A-A-G*                 E 203 209 F 220 226
                                   
pd0158  15  T-A-T-T-A-A-A-G*  2.1     19.4   C 203 209 D 220 226
        16  T-A-T-T-A-A-A-G*                 E 203 209 F 220 226
                                   
pd0159  17  T-A-C-A-A-A-A-G*  1.9     20.9   C 203 209 D 220 226
        18  T-A-C-A-A-A-A-G*                 E 203 209 F 220 226
   
pd0160  19  T-T-T-A-A-A-A-G*  1.8     19.3   C 203 209 D 220 226
        20  T-T-T-A-A-A-A-G*                 E 203 209 F 220 226
                                      
pd0161  21  T-A-T-A-A-A-T-G*  2.23    19.1   C 203 209 D 220 226
        22  T-A-T-A-A-A-T-G*                 E 203 209 F 220 226
                                      
pd0162  23  A-A-T-A-A-A-A-G*  2.3     18.2   C 203 209 D 220 226
        24  A-A-T-A-A-A-A-G*                 E 203 209 F 220 226
                                      
pd0163  25  T-A-T-A-A-A-A-G   1.9     19.7   C 203 210 D 219 226
        26  T-A-T-A-A-A-A-G                  E 203 210 F 219 226
                                      
pd0164  27  T-A-T-A-A-A-C*G*  1.95    19.9   C 203 208 D 221 226
        28  T-A-T-A-A-A-C*G*                 E 203 208 F 221 226
                                      
pdr031  29  T-T-T-t-t-A-A-A   2.1     21.2   C 1408 1415 E 1420 1427
                                      
pdt009  30  T-A-T-A-A-A-A-G   2.25    20.2   A 203 210 B 305 312
        31  T-A-T-A-A-A-A-G                  C 403 410 D 505 512
                                      
pdt012  32  T-A-T-A-T-A-A-A   1.8     20.1   C 2 9 C 21 28
        33  T-A-T-A-T-A-A-A                  D 2 9 D 21 28
                                      
pdt024  34  T-A-T-A-T-A-T-A   2.9     21.4   B 103 110 C 115 122
                                      
pdt025  35  T-A-T-A-A-A-A-G   1.9     19.4   C 203 210 D 219 226
        36  T-A-T-A-A-A-A-G                  E 303 310 F 319 326
                                      
pdt032  37  T-A-T-A-A-A-A-G   2.7     21.5   C 4 11 D 106 113
                                      
pdt034  38  T-A-T-A-A-A-A-G   1.9     18.9   B 5 12 C 105 112
                                      
pdt036  39  T-A-T-A-A-A-A-C   2.5     23.5   E 9 16 F 1 8
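For anyone who wants to process this table programmatically, a small parser along the following lines may help. It is a sketch only: `parse_segments` and the dictionary keys are names invented for this example, and the code assumes the whitespace-separated layout shown above, with continuation rows (those lacking an NDB ID) inheriting the ID and resolution of the preceding entry. Only four rows are included to keep the example short.

```python
# Four rows copied from the table above (an excerpt, not the full dataset).
EXCERPT = """\
pd0112  02  T-A-T-A-A-A-A-G   2.65    23.1   K 8 15 L 105 112
        03  T-A-T-A-A-A-A-G                  C 8 15 D 105 112
pd0155  09  T-A-T-A-A-G-A-G*  1.93    19.6   C 203 209 D 220 226
pd0164  27  T-A-T-A-A-A-C*G*  1.95    19.9   C 203 208 D 221 226
"""


def parse_segments(text):
    """Parse table rows into dicts; continuation rows inherit the NDB ID."""
    segments = []
    ndb_id = resolution = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if not tokens[0].isdigit():
            # First row of an entry: ID, number, sequence, Res, R-fac, ranges.
            ndb_id, resolution = tokens[0], float(tokens[3])
            tokens = tokens[1:3] + tokens[5:]
        number, sequence = int(tokens[0]), tokens[1]
        chain1, start1, end1, chain2, start2, end2 = tokens[2:8]
        segments.append({
            "ndb_id": ndb_id,
            "number": number,
            "sequence": sequence,
            "resolution": resolution,
            "non_wc": sequence.count("*"),  # pairs flagged with '*'
            "strand1": (chain1, int(start1), int(end1)),
            "strand2": (chain2, int(start2), int(end2)),
        })
    return segments


segments = parse_segments(EXCERPT)
for seg in segments:
    chain, start, end = seg["strand1"]
    print(seg["ndb_id"], seg["number"], seg["sequence"],
          "residues on strand 1:", end - start + 1)
```

In these four rows the residue range on each strand is shortened by one residue for every pair marked with `*`, which is consistent with the flagged non-WC pairs being left out of the analysed segment.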

HTH,

Xiang-Jun

PS. As a matter of fact, the A- and B-DNA datasets are those used in Table 3 of the report on the standard base reference frame (http://ndbserver.rutgers.edu/standards/standard_reference.html).
