
Topic: Analysis crashes: MD trajectory of abasic site in RNA

Marcel Heinz
Analysis crashes: MD trajectory of abasic site in RNA
« on: August 22, 2018, 11:42:01 am »
Dear Xiang-Jun,

Thanks a lot for your work and support here.
I currently have two issues with DSSR:

1.
Do you have an idea why the backbone parameters of nucleic acids are calculated so much faster with do_x3dna than with DSSR? Analyzing a 100k-frame trajectory of a native structure takes approx. 2 hours with do_x3dna. The same analysis of a native RNA structure with DSSR would take approx. 10 days (10k frames/day). I need to run DSSR because my system contains an abasic site.

I used this command line:

Quote
x3dna-dssr -i=mod.pdb --nmr --json --abasic -o=mod.json

2.
I also ran into problems with DSSR while trying to analyze an MD trajectory (100k structures) of a duplex of ~40 nucleotides containing an abasic site. DSSR crashes at frame 1630 with the following error message:

Quote
model 1630 [1630 of 100000]
  1630.X.U.7          0.121
  1630.X.U.42         0.143
Uncaught exception 'Assertion failed' raised at [fncs_hbond.c:1762]
aborting...

The frame seems to be OK in the PDB file (see attachment). Do you have an idea of what might be going wrong?

Best,

Marcel

xiangjun (Administrator)
Re: Analysis crashes: MD trajectory of abasic site in RNA
« Reply #1 on: August 22, 2018, 12:47:05 pm »
Hi Marcel,

Thanks for using DSSR and for your feedback on the Forum.

On your first question, the relative speed of DSSR vs. do_x3dna for calculating backbone parameters: in principle, DSSR would be slower than the 3DNA 'analyze' program, since DSSR calculates many more (housekeeping) features in the 'background'. However, the 2 hours vs. 10 days difference you noticed is well beyond my expectation. To investigate this issue further, could you elaborate on how you calculated the backbone parameters using do_x3dna? Does do_x3dna call the 'analyze -t' (for torsion angles) option?

For your second question, what version of DSSR were you using? This bug should have been fixed in the later releases of DSSR, as shown below:

Code:
Processing file 'frame1630.pdb'
  X.U.7               0.121
  X.U.42              0.143
    total number of nucleotides: 44
    total number of base pairs: 20
    total number of helices: 1
    total number of stems: 3
    total number of internal loops: 2

Also, note that you do not need to specify the --abasic option anymore. This feature is taken into consideration by default in recent releases of DSSR.

Best regards,

Xiang-Jun

Marcel Heinz
Re: Analysis crashes: MD trajectory of abasic site in RNA
« Reply #2 on: August 23, 2018, 05:23:29 am »
Thanks a lot for your very fast reply!

Quote
To investigate this issue further, could you elaborate on how you calculated the backbone parameters using do_x3dna? Does do_x3dna call the 'analyze -t' (for torsion angles) option?

I run do_x3dna on a native RNA (no abasic site) trajectory with

Code:
do_x3dna -f ../mod.pdb -s ../mod.pdb -o test -hbond
and extract the dihedral information from "BackBoneCHiDihedrals_g.dat"

To investigate this further, I started the do_x3dna analysis an hour ago, and 53,000 frames have been analyzed so far.

In contrast, I am now running the latest version of DSSR (downloaded today) on the same PDB trajectory with

Code:
x3dna-dssr -i=mod.pdb -o=output.json --more --json --nmr
and it is currently at frame 2,100.

Based on my impression, DSSR and do_x3dna are approx. similar in analyzing the first ~1000 structures and DSSR gets slower and slower with every following structure.
As a workaround, I could cut my trajectory into 1000 structure fragments and analyze them independently, but I think the performance issue is important especially for the MD community to work with DSSR.
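The chunking workaround could be sketched as follows. This is only an illustration, assuming the trajectory is a multi-model PDB file in which every frame is delimited by MODEL and ENDMDL records; the function names and the `chunk_0001.pdb` naming scheme are hypothetical and not part of DSSR or do_x3dna.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def iter_models(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one list of lines per MODEL ... ENDMDL block.

    Lines outside such blocks (headers, END, etc.) are skipped.
    """
    block: List[str] = []
    inside = False
    for line in lines:
        if line.startswith("MODEL"):
            inside, block = True, [line]
        elif inside:
            block.append(line)
            if line.startswith("ENDMDL"):
                yield block
                inside = False


def iter_chunks(lines: Iterable[str], frames_per_chunk: int = 1000) -> Iterator[List[str]]:
    """Group consecutive models into chunks of at most frames_per_chunk frames."""
    models = iter_models(lines)
    while True:
        chunk = list(islice(models, frames_per_chunk))
        if not chunk:
            return
        # Flatten the models of this chunk back into a single list of lines.
        yield [line for model in chunk for line in model]


def split_trajectory(path: str, prefix: str = "chunk", frames_per_chunk: int = 1000) -> List[str]:
    """Write files <prefix>_0001.pdb, <prefix>_0002.pdb, ... and return their names."""
    names: List[str] = []
    with open(path) as handle:
        for index, chunk in enumerate(iter_chunks(handle, frames_per_chunk), start=1):
            name = f"{prefix}_{index:04d}.pdb"
            with open(name, "w") as out:
                out.writelines(chunk)
            names.append(name)
    return names
```

Each chunk file could then be analyzed independently with the same command as above, e.g. `x3dna-dssr -i=chunk_0001.pdb -o=chunk_0001.json --more --json --nmr`. The original model numbers are kept inside each chunk, so frames can still be traced back to the full trajectory.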


Quote
For your second question, what version of DSSR were you using? This bug should have been fixed in the later releases of DSSR, as shown below:

Code:
Processing file 'frame1630.pdb'
  X.U.7               0.121
  X.U.42              0.143
    total number of nucleotides: 44
    total number of base pairs: 20
    total number of helices: 1
    total number of stems: 3
    total number of internal loops: 2

Also, note that you do not need to specify the --abasic option anymore. This feature is taken into consideration by default in recent releases of DSSR.

Thank you very much. Indeed, I was using an older version of DSSR, and it works with your latest version (frame 1630 does not crash). Thank you also for the hint that --abasic is now the default. I was simply following this article: http://x3dna.org/articles/handling-of-abasic-sites-in-dssr

Best,

Marcel
« Last Edit: August 23, 2018, 05:27:49 am by Marcel Heinz »

xiangjun (Administrator)
Re: Analysis crashes: MD trajectory of abasic site in RNA
« Reply #3 on: August 23, 2018, 11:13:13 am »
Hi Marcel,

Thanks for your follow-up.

Quote
Code:
do_x3dna -f ../mod.pdb -s ../mod.pdb -o test -hbond
and extract the dihedral information from "BackBoneCHiDihedrals_g.dat"

Could you dig further to see what 3DNA command is being called with the above do_x3dna run? What is the -hbond option? What does the do_x3dna output look like? Do you only need backbone torsion angles?

Quote
Based on my impression, DSSR and do_x3dna are approx. similar in analyzing the first ~1000 structures and DSSR gets slower and slower with every following structure.

This is an important piece of information. I'll check whether the slower performance after the first 1K structures (as you noticed) is related to increased memory allocation. In principle, DSSR should run each model at roughly constant speed.

Quote
As a workaround, I could cut my trajectory into 1000 structure fragments and analyze them independently, but I think the performance issue is important especially for the MD community to work with DSSR.

As noted above, I'll surely look into this issue. Would it be possible for you to provide me with a sample MD trajectory file?

Best regards,

Xiang-Jun
« Last Edit: August 23, 2018, 11:14:57 am by xiangjun »

Marcel Heinz
Re: Analysis crashes: MD trajectory of abasic site in RNA
« Reply #4 on: August 24, 2018, 06:04:35 am »
Hi Xiang-Jun,

Quote
Code:
do_x3dna -f ../mod.pdb -s ../mod.pdb -o test -hbond
and extract the dihedral information from "BackBoneCHiDihedrals_g.dat"

Quote
Could you dig further to see what 3DNA command is being called with the above do_x3dna run? What is the -hbond option? What does the do_x3dna output look like? Do you only need backbone torsion angles?

1. I honestly don't know exactly which 3DNA commands are being called, but I get all the structural information in separate output files. The official documentation lists all the outputs: https://do-x3dna.readthedocs.io/en/latest/do_x3dna_usage.html
2. The -hbond flag calculates the number of H-bonds per base pair and time frame. The ASCII output file (h-bond_g.dat) looks like this for three base pairs:
Quote
# Time =         1.00000
3
3
2

# Time =         2.00000
3
3
2
3. The output files are column-based ASCII files. For the dihedrals, they look like this:
Quote
#Strand I                                                    Strand II
#alpha    beta   gamma   delta  epsilon   zeta    chi   |||  alpha    beta   gamma   delta  epsilon   zeta    chi

# Time =         1.00000
---   ---   54.5   85.7   -156.4   -76.3  -152.6  -88.4   -175.1   69.5   86.2   ---   ---  -114.5
-71.8   162.2   59.7   82.7   -164.5   -64.9  -163.0  -86.6   -175.1   54.5   72.1   178.6   -70.4  -136.6
-76.8   -178.7   53.6   74.4   -160.4   -64.9  -149.9  -71.7   169.9   60.6   78.4   -157.3   -57.9  -140.9
-70.2   172.9   68.1   66.9   -159.9   -59.8  -150.7  -73.8   171.8   63.5   69.0   -157.2   -61.2  -145.9
-73.0   178.7   54.8   81.2   -175.2   -59.1  -146.4  -71.9   166.1   76.6   76.4   -154.5   -64.2  -167.3

4. The overall structural parameters are of general interest to me. But for now, I'm especially interested in the backbone dihedral angles.
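For reference, the time-blocked layout shown in items 2 and 3 could be read with a small parser like the one below. It is a minimal sketch that assumes only what is visible in the excerpts: lines starting with `#` are comments, a `# Time = ...` comment opens a new frame, data rows are whitespace-separated numbers, and `---` marks a missing value. The function name is hypothetical and not part of do_x3dna.

```python
import math
from typing import Dict, Iterable, List


def parse_time_blocks(lines: Iterable[str]) -> Dict[float, List[List[float]]]:
    """Map each time stamp to its list of numeric rows.

    Missing values written as '---' become NaN.
    """
    frames: Dict[float, List[List[float]]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            body = line.lstrip("#").strip()
            if body.startswith("Time"):
                # '# Time = 1.00000' starts a new frame.
                current = []
                frames[float(body.split("=", 1)[1])] = current
            continue  # other comment lines are column headers
        if current is None:
            continue  # ignore data that appears before the first time stamp
        current.append([math.nan if tok == "---" else float(tok) for tok in line.split()])
    return frames
```

The same function would also read the h-bond file from item 2, where each row holds a single number per base pair. Mapping the 14 dihedral columns to named angles is left out here, because it depends on the column order given in the file header.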



Quote
Would it be possible for you to provide me with a sample MD trajectory file?

The data are unpublished, so if you could provide a non-public way of sharing them (e.g. an email address), I can provide a native trajectory.

Just as a note: the DSSR analysis has been running since yesterday and is currently at frame 14762, so I agree that it looks like a memory issue.

Best,

Marcel

xiangjun (Administrator)
Re: Analysis crashes: MD trajectory of abasic site in RNA
« Reply #5 on: August 24, 2018, 11:45:32 am »
Thanks for your feedback.

Quote
The data are unpublished, so if you could provide a non-public way of sharing them (e.g. an email address), I can provide a native trajectory.

I just need a sample dataset for checking purposes. It could be from a public data repository or a deliberately revised version of your unpublished data. You could send such a minimal dataset (~1K frames) to my 3dna.lu gmail account. Of course, I will keep the data for internal testing purposes only and will not share it with anyone else.

Quote
Just as a note: the DSSR analysis has been running since yesterday and is currently at frame 14762, so I agree that it looks like a memory issue.

I've checked via valgrind that the slowness with the DSSR --nmr option for a large dataset is not due to a memory issue -- "no leaks are possible". Otherwise, the program would eventually run out of memory. I now have a general idea of the issue and a possible fix.

Best regards,

Xiang-Jun



xiangjun (Administrator)
Re: Analysis crashes: MD trajectory of abasic site in RNA
« Reply #6 on: August 31, 2018, 12:52:45 am »
I've updated DSSR to v1.7.8-2018sep01. Among other improvements, the new version should have fixed the speed issue when DSSR is applied to the analysis of large MD trajectories. Not having received a sample dataset from you, I used some NMR ensembles with 10-20 models for testing purposes.

Based on your observation that "DSSR and do_x3dna are approx. similar in analyzing the first ~1000 structures", I'd assume the updated DSSR should run as fast as do_x3dna for your whole dataset. Analyzing your 100k-frame trajectory with DSSR should now take ~2 hours instead of 10 days.

Please check and report back if the new version is working as expected.

Best regards,

Xiang-Jun


PS: While not an MD practitioner myself, I follow research articles in this active and increasingly important field. Dedicated tools (such as MDAnalysis and CPPTRAJ) exist for the analysis of MD simulation trajectories. Nevertheless, 3DNA still has something to offer, as shown by do_x3dna.

DSSR has an unmatched set of features, and wherever possible, I'm keen on expanding its applicability in MD analyses. With the speed improvement of this release, DSSR is likely to play a significant role in MD simulations.
« Last Edit: September 01, 2018, 12:16:29 pm by xiangjun »

Marcel Heinz
Re: Analysis crashes: MD trajectory of abasic site in RNA
« Reply #7 on: September 06, 2018, 09:46:54 am »
Hi Xiang-Jun,

Thank you so much for the DSSR update. I apologize for not having sent a trajectory; I was traveling with an unexpectedly unstable internet connection.
I just got back and tested your latest DSSR version with my 100k-structure trajectory.
Your update brings a dramatic speed-up in analyzing the trajectory. Great improvement!
 
Quote
Time used: 00:00:35:59

Are you still interested in a 10k structure trajectory?

Best regards,

Marcel

xiangjun (Administrator)
Re: Analysis crashes: MD trajectory of abasic site in RNA
« Reply #8 on: September 06, 2018, 10:06:59 am »
Hi Marcel,

Thanks for your encouraging feedback. I'm glad to hear that DSSR works for large NMR ensembles or MD trajectories -- 35 min vs. 10 days for 100k structures! It is even faster than do_x3dna (2 h) for the same dataset -- this is expected, based on your previous feedback that "DSSR and do_x3dna are approx. similar in analyzing the first ~1000 structures".
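The timings quoted in this thread can be put side by side with a few lines of arithmetic. This is only a back-of-the-envelope sketch: it assumes the reported "Time used: 00:00:35:59" reads as days:hours:minutes:seconds, and it takes the 10 days (old DSSR, an extrapolation) and 2 hours (do_x3dna) figures at face value.

```python
def parse_elapsed(text: str) -> int:
    """Convert 'dd:hh:mm:ss' to seconds (the field order is an assumption)."""
    days, hours, minutes, seconds = (int(part) for part in text.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


FRAMES = 100_000                          # frames in the trajectory
new_dssr = parse_elapsed("00:00:35:59")   # reported run time after the update
old_dssr = parse_elapsed("10:00:00:00")   # approx. 10 days, estimated earlier
do_x3dna = parse_elapsed("00:02:00:00")   # approx. 2 hours

print(f"updated DSSR throughput: {FRAMES / new_dssr:.1f} frames/s")
print(f"speed-up over the old DSSR estimate: {old_dssr / new_dssr:.0f}x")
print(f"speed-up over do_x3dna: {do_x3dna / new_dssr:.1f}x")
```

With these inputs, the updated DSSR processes roughly 46 frames per second, about 400 times faster than the earlier 10-day estimate and a little over 3 times faster than the do_x3dna run.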

Quote
Are you still interested in a 10k structure trajectory?

It helps to have such a dataset for future tests. Please send me a ~1k (instead of 10k) sample structure trajectory.

Working together, we will make DSSR directly relevant in the active field of MD simulations.

Best regards,

Xiang-Jun
« Last Edit: September 06, 2018, 10:09:27 am by xiangjun »

Marcel Heinz
Re: Analysis crashes: MD trajectory of abasic site in RNA
« Reply #9 on: September 06, 2018, 11:42:33 am »
Indeed, it is much faster than do_x3dna. Great work!

I've sent an email with a link to download the trajectory to xiangjun@x3dna.org

Just as a short note: I have to look deeper into it, but the output.json file has a size of 6.8G, while the do_x3dna files combined are only 992M. Is the --json flag mandatory for trajectories, or do the 'normal' output files also contain frame information?

Best,

Marcel

xiangjun (Administrator)
Re: Analysis crashes: MD trajectory of abasic site in RNA
« Reply #10 on: September 06, 2018, 01:48:00 pm »
Quote
I've sent an email with a link to download the trajectory to xiangjun@x3dna.org

Got it -- thanks!

Quote
Just as a short note: I have to look deeper into it, but the output.json file has a size of 6.8G, while the do_x3dna files combined are only 992M. Is the --json flag mandatory for trajectories, or do the 'normal' output files also contain frame information?

The much larger output file from DSSR than that from do_x3dna is expected. DSSR produces far more RNA structural parameters than other tools I'm aware of. So the JSON output contains far more than you need (torsion angles) right now. However, other DSSR-derived features are of potential use in other projects, as DSSR is used more widely in the MD field.

The --json flag is not mandatory for trajectories, but only the JSON output contains a complete compilation of DSSR parameters. Without it, you get only the 'normal' output file, which does not list the torsion angles. Check the DSSR manual for more info.

In practice, you could use a tool like jq to filter the stream of JSON data produced by DSSR. This could produce a smaller output file than that from do_x3dna.
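The same kind of filtering can be sketched in Python. The JSON layout used here is an assumption to verify against your own output: a top-level `models` list whose entries carry a `model` number and a `parameters` object with an `nts` list, and per-nucleotide keys named `nt_id`, `alpha`, `beta`, `gamma`, `delta`, `epsilon`, `zeta` and `chi`. The function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, List, Sequence

# Assumed key names for the backbone torsions and chi; adjust to the real output.
TORSIONS = ("alpha", "beta", "gamma", "delta", "epsilon", "zeta", "chi")


def pick_fields(records: Iterable[Dict[str, Any]], keys: Sequence[str]) -> List[Dict[str, Any]]:
    """Keep only the requested keys of each record (missing keys become None)."""
    return [{key: record.get(key) for key in keys} for record in records]


def torsions_per_model(doc: Dict[str, Any],
                       id_key: str = "nt_id",
                       torsion_keys: Sequence[str] = TORSIONS) -> List[Dict[str, Any]]:
    """Reduce a parsed multi-model document to per-nucleotide torsion angles."""
    reduced = []
    for entry in doc.get("models", []):
        nts = entry.get("parameters", {}).get("nts", [])
        reduced.append({
            "model": entry.get("model"),
            "nts": pick_fields(nts, (id_key, *torsion_keys)),
        })
    return reduced


if __name__ == "__main__":
    # Tiny synthetic document in the assumed layout, not real DSSR output.
    demo = {"models": [{"model": 1, "parameters": {"nts": [
        {"nt_id": "X.U.7", "alpha": -71.8, "chi": -163.0, "extra": "dropped"}]}}]}
    print(json.dumps(torsions_per_model(demo)))
```

Note that `json.load` reads a whole file into memory, so for a multi-gigabyte output it may be more practical to analyze the trajectory in chunks or to use a streaming JSON parser.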

Best regards,

Xiang-Jun




« Last Edit: September 06, 2018, 01:56:23 pm by xiangjun »

 

Funded by X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids (R24GM153869)

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University