
Topic: analyzing longer DNA sequences

shynomat
analyzing longer DNA sequences
« on: November 20, 2012, 02:01:37 pm »
Hi all,

I am having difficulties analyzing a 62-base-pair DNA. I am using find_pair along with Curves+.
The 3DNA version I use is x3dna-v2.1beta.
What is the default maximum number of base pairs find_pair can analyze? Is there a way to increase this value?

Thanks
Shyno

xiangjun (Administrator)
Re: analyzing longer DNA sequences
« Reply #1 on: November 20, 2012, 02:22:59 pm »
Hi Shyno,

DNAs with 62 base pairs should not be a problem for 3DNA to analyze; specifically, find_pair and other 3DNA components have no default bp limit other than your computer's memory.

As always, please be specific by providing a reproducible example; that will help solve your problem.

Xiang-Jun

shynomat
Re: analyzing longer DNA sequences
« Reply #2 on: November 20, 2012, 03:13:06 pm »
Hi Xiang-Jun,

Thanks for your quick response.
Please see the attached files; this is for a 27-bp double-stranded DNA.
Following are the commands I use:
find_pair -c+ sel.pdb curves.inp
~/Curves/cur+ < curves.inp
I am attaching sel.pdb, curves.inp, and the Curves+ output file, sel.lis.
For Curves+, the maximum number of nucleotides allowed is 1000, and mine is well below that.
So I thought the issue might be with find_pair.

Thanks
Shyno

xiangjun (Administrator)
Re: analyzing longer DNA sequences
« Reply #3 on: November 20, 2012, 04:29:58 pm »
Hi Shyno,

Thanks for providing details of the commands you used and attaching the three relevant files. However, I fail to see what's wrong here; as far as I can tell, things are working as expected.

With the -c+ option, you get what's desired, as in your attached curves.inp file. Running find_pair with its default settings on your PDB file sel.pdb gives the expected results:
find_pair sel.pdb stdout
sel.pdb
sel.out
    2         # duplex
   23         # number of base-pairs
    1    1    # explicit bp numbering/hetero atoms
    2   47  0 #    1 | ....>A:...2_:[..G]G-----C[..C]:..48_:B<....  0.77  0.76 28.01  8.85 -1.31
    3   46  0 #    2 | ....>A:...3_:[..T]T-----A[..A]:..47_:B<....  0.32  0.02 11.50  9.26 -4.07
    4   45  0 #    3 | ....>A:...4_:[..G]G-----C[..C]:..46_:B<....  0.83  0.64  4.42  9.11 -2.68
    5   44  0 #    4 | ....>A:...5_:[..T]T-----A[..A]:..45_:B<....  0.54  0.32 21.12  8.91 -2.76
    6   43  0 #    5 | ....>A:...6_:[..G]G-----C[..C]:..44_:B<....  0.45  0.27 27.61  8.88 -2.64
    7   42  0 #    6 | ....>A:...7_:[..A]A-**+-T[..T]:..43_:B<....  3.71  0.96 28.75  7.04  4.07
    8   41  0 #    7 | ....>A:...8_:[..G]G-----C[..C]:..42_:B<....  0.45  0.32  3.39  8.97 -3.75
    9   40  0 #    8 | ....>A:...9_:[..C]C-----G[..G]:..41_:B<....  1.05  0.47  8.17  8.82 -2.61
   10   39  0 #    9 | ....>A:..10_:[..G]G-----C[..C]:..40_:B<....  0.76  0.50 21.19  8.86 -2.17
   11   38  0 #   10 | ....>A:..11_:[..T]T-----A[..A]:..39_:B<....  0.34  0.27  5.44  9.01 -3.84
   12   37  0 #   11 | ....>A:..12_:[..G]G-----C[..C]:..38_:B<....  0.58  0.48 12.17  9.02 -2.85
   13   36  0 #   12 | ....>A:..13_:[..G]G-----C[..C]:..37_:B<....  0.24  0.03 16.54  9.08 -3.88
   14   35  0 #   13 | ....>A:..14_:[..G]G-----C[..C]:..36_:B<....  0.69  0.69 14.12  9.14 -2.22
   15   34  0 #   14 | ....>A:..15_:[..C]C-----G[..G]:..35_:B<....  0.41  0.09 14.35  9.02 -3.70
   16   33  0 #   15 | ....>A:..16_:[..G]G-----C[..C]:..34_:B<....  0.40  0.39 26.56  9.03 -2.49
   17   32  0 #   16 | ....>A:..17_:[..T]T-----A[..A]:..33_:B<....  0.70  0.62  8.01  9.49 -2.65
   18   31  0 #   17 | ....>A:..18_:[..A]A-----T[..T]:..32_:B<....  0.26  0.21 18.44  8.96 -3.41
   19   30  0 #   18 | ....>A:..19_:[..C]C-----G[..G]:..31_:B<....  0.26  0.09 25.81  9.27 -3.27
   20   29  0 #   19 | ....>A:..20_:[..A]A-----T[..T]:..30_:B<....  0.63  0.21 19.54  8.96 -2.98
   21   28  0 #   20 | ....>A:..21_:[..C]C-----G[..G]:..29_:B<....  0.57  0.33  6.64  9.06 -3.43
   22   27  0 #   21 | ....>A:..22_:[..A]A-----T[..T]:..28_:B<....  1.09  1.01 10.89  8.63 -1.34
   23   26  0 #   22 | ....>A:..23_:[..C]C-----G[..G]:..27_:B<....  0.66  0.37 28.93  9.04 -2.16
   24   25  0 #   23 | ....>A:..24_:[..A]A-----T[..T]:..26_:B<....  2.72  2.29 25.44  7.75  5.57
##### Base-pair criteria used:     4.00     0.00    15.00     2.50    65.00     4.50     7.50 [ O N]
##### 1 non-Watson-Crick base-pair, and 1 helix (0 isolated bps)
##### Helix #1 (23): 1 - 23

Certainly, find_pair is working properly, as designed. Is there anything I am missing here?
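For anyone who wants to run the same sanity check on their own structures, here is a minimal Python sketch (not part of 3DNA) that reads find_pair output like the listing above and compares the declared base-pair count with the number of pair lines. The regular expression is an assumption inferred from the layout shown here, not from a documented format, and `parse_find_pair` is a hypothetical helper name.

```python
import re

# A base-pair line: two nucleotide indices, a flag, then '#'.
# This pattern is an assumption based on the find_pair output shown above.
PAIR_LINE = re.compile(r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+#")


def parse_find_pair(text):
    """Return (declared bp count or None, list of (i, j) index pairs)."""
    declared = None
    pairs = []
    for line in text.splitlines():
        if "# number of base-pairs" in line:
            # int() tolerates the surrounding whitespace before '#'
            declared = int(line.split("#", 1)[0])
        else:
            m = PAIR_LINE.match(line)
            if m:
                pairs.append((int(m.group(1)), int(m.group(2))))
    return declared, pairs


# Truncated excerpt of the output above (only the first two pairs kept).
EXCERPT = """\
    2         # duplex
   23         # number of base-pairs
    1    1    # explicit bp numbering/hetero atoms
    2   47  0 #    1 | ....>A:...2_:[..G]G-----C[..C]:..48_:B<....  0.77  0.76 28.01  8.85 -1.31
    3   46  0 #    2 | ....>A:...3_:[..T]T-----A[..A]:..47_:B<....  0.32  0.02 11.50  9.26 -4.07
"""

if __name__ == "__main__":
    declared, pairs = parse_find_pair(EXCERPT)
    print(declared, pairs)  # 23 [(2, 47), (3, 46)]
```

With the full output above, `declared` and `len(pairs)` should both be 23; a mismatch would point at a truncated or edited file.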

Xiang-Jun


shynomat
Re: analyzing longer DNA sequences
« Reply #4 on: November 20, 2012, 05:58:55 pm »
Thanks again for your detailed message.
I am not sure if this is too much to ask you.
But if you feed the output of find_pair, curves.inp, to Curves+ as shown below:
~/Curves/cur+ < curves.inp
The output file (sel.lis) says
Strands =    2 Atoms =   978 Units =    48
which is correct, as I need to analyze only 24 bp. But the next line,
Combined strands have   15 levels ...

  Strand  1 has  15 bases (5'-3'): GTGTGAGCGTGGGCG
  Strand  2 has  15 bases (3'-5'): CACACTCGCACCCGC

doesn't seem correct, as I would expect ~24 levels (or 22-23 at least). For Curves+, the maximum number of nucleotides it can analyze is 1000.
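To make this kind of mismatch easy to spot, a small Python sketch can pull the `Units` and `Combined strands have ... levels` numbers out of a Curves+ .lis file and compare the level count with the number of pairs find_pair reported. The regular expressions assume the header wording quoted above; other Curves+ versions may word it differently. `lis_summary` and `levels_look_short` are hypothetical helpers, not part of either program.

```python
import re


def lis_summary(text):
    """Return (units, levels) from a Curves+ .lis header; None for any value not found."""
    units = re.search(r"Units\s*=\s*(\d+)", text)
    levels = re.search(r"Combined strands have\s+(\d+)\s+levels", text)
    return (
        int(units.group(1)) if units else None,
        int(levels.group(1)) if levels else None,
    )


def levels_look_short(text, expected_bp):
    """True if Curves+ reports fewer levels than the expected number of base pairs."""
    _, levels = lis_summary(text)
    return levels is not None and levels < expected_bp


# The two header lines quoted above from sel.lis.
LIS_EXCERPT = """\
Strands =    2 Atoms =   978 Units =    48
Combined strands have   15 levels ...
"""

if __name__ == "__main__":
    print(lis_summary(LIS_EXCERPT))            # (48, 15)
    # find_pair reported 23 pairs for this structure in the earlier reply.
    print(levels_look_short(LIS_EXCERPT, 23))  # True
```

A `True` result only says the two counts disagree; it does not say why.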

Thanks
Shyno

xiangjun (Administrator)
Re: analyzing longer DNA sequences
« Reply #5 on: November 21, 2012, 11:05:56 am »
Quote
I am not sure if this is too much to ask you.
No, it's not; I always welcome user questions such as this one, and I strive to be as helpful as I can.

Now I see the problem you are experiencing. Strictly speaking, and as I mentioned in my previous reply, it's not a 3DNA problem but one at the interface between 3DNA and Curves+. Since the purpose of the find_pair -c+ option is to build a bridge between these two commonly used programs for analyzing nucleic acid structures, I'd like to dig into the issue further to see if anything can be done from 3DNA's perspective.

Your Curves+ input file curves.inp, as generated with find_pair -c+, has the following content:
&inp file=sel.pdb,
     lis=sel,
     fit=.t.,
     lib=./standard,
     isym=1,
&end
    2    1   -1    0    0
    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24
   47   46   45   44   43   42   41   40   39   38   37   36   35   34   33   32   31   30   29   28   27   26   25


which has 23 base pairs (note that bases 1 and 48 are not paired). Yet when the file is fed to Curves+, only the first 15 bps are recognized.
Quote
Combined strands have   15 levels ...

  Strand  1 has  15 bases (5'-3'): GTGTGAGCGTGGGCG
  Strand  2 has  15 bases (3'-5'): CACACTCGCACCCGC
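Counting the entries by hand is error-prone, so here is a minimal Python sketch (hypothetical helpers, not part of 3DNA or Curves+) that reads the strand lists following `&end` in a curves.inp file. It assumes the layout find_pair -c+ writes here: one header line (`2 1 -1 0 0`, skipped without interpretation) followed by one line per strand. It also expands `a:b` ranges, assuming they run downward when a > b, as in the `47:25` shorthand used in this thread.

```python
def expand_field(field):
    """Expand one field: '12' -> [12]; 'a:b' -> a..b inclusive (descending if a > b)."""
    if ":" in field:
        start, end = (int(x) for x in field.split(":"))
        step = 1 if end >= start else -1
        return list(range(start, end + step, step))
    return [int(field)]


def strand_lists(inp_text):
    """Return one list of nucleotide indices per strand line after '&end'.

    Assumes the find_pair -c+ layout: a single header line right after
    '&end' (skipped, not interpreted), then one line per strand.
    """
    after_end = inp_text.split("&end", 1)[1]
    lines = [ln for ln in after_end.splitlines() if ln.strip()]
    strands = []
    for line in lines[1:]:  # lines[0] is the "2 1 -1 0 0" header
        nts = []
        for field in line.split():
            nts.extend(expand_field(field))
        strands.append(nts)
    return strands


# The curves.inp content quoted above, verbatim.
INP = """\
&inp file=sel.pdb,
     lis=sel,
     fit=.t.,
     lib=./standard,
     isym=1,
&end
    2    1   -1    0    0
    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19   20   21   22   23   24
   47   46   45   44   43   42   41   40   39   38   37   36   35   34   33   32   31   30   29   28   27   26   25
"""

if __name__ == "__main__":
    strands = strand_lists(INP)
    print([len(s) for s in strands])  # [23, 23]
```

Both strands come out with 23 entries, matching find_pair, so the input file itself does list all 23 pairs.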

To help solve the problem, could you try the following and report back (in detail) what you get?
  • Instead of 23 bps, shorten the list to < 15, say 10, as below:
        2    3    4    5    6    7    8    9   10   11
       47   46   45   44   43   42   41   40   39   38
    Run Curves+ on it again. Do you get what you expect?
  • Since the nucleotide numbers are continuous, you can use the shorthand form to specify paired bases:
        2:24
       47:25
    Run Curves+ on it. What do you get?
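Both test inputs can be generated from the full lists rather than typed by hand. The Python sketch below uses hypothetical helpers; the 5-character field width is an assumption copied from the curves.inp shown above. It truncates the pair lists to the first n entries and produces the `first:last` shorthand only when the indices change by exactly 1 at each step.

```python
def format_list(nums, width=5):
    """Right-align indices in fixed-width fields, as in the curves.inp above."""
    return "".join(f"{n:>{width}d}" for n in nums)


def truncated(strand1, strand2, n):
    """Keep only the first n pairs of each strand list, formatted as text lines."""
    return format_list(strand1[:n]), format_list(strand2[:n])


def shorthand(nums):
    """Return 'first:last' if nums step by exactly +1 or -1 throughout, else None."""
    if len(nums) < 2:
        return None
    step = nums[1] - nums[0]
    if step not in (1, -1):
        return None
    if any(b - a != step for a, b in zip(nums, nums[1:])):
        return None
    return f"{nums[0]}:{nums[-1]}"


if __name__ == "__main__":
    s1 = list(range(2, 25))       # strand 1 indices from curves.inp: 2..24
    s2 = list(range(47, 24, -1))  # strand 2 indices: 47 down to 25
    top, bottom = truncated(s1, s2, 10)
    print(top)
    print(bottom)
    print(shorthand(s1), shorthand(s2))  # 2:24 47:25
```

Returning `None` for non-contiguous lists avoids silently writing a range that would pull in unpaired nucleotides.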

Xiang-Jun
« Last Edit: November 21, 2012, 11:09:14 am by xiangjun »

 

Funded by the NIH R24GM153869 grant on X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University