
Messages - xiangjun

1
RNA structures (DSSR) / Re: Web 3DNA Error
« on: October 18, 2018, 11:58:43 am »
Hi,

Thanks for using 3DNA and for asking questions on the Forum. I've informed Rutgers of the w3DNA-specific issues, and hopefully the problem can be solved soon.

For the analysis of RNA structures, especially those from MD simulations, I'd strongly suggest that you try DSSR (http://docs.x3dna.org/dssr-manual.pdf). DSSR has options that facilitate MD analyses. See my recent blog post "DSSR is fast for MD analysis".

For your attached structure 805.pdb, running DSSR is as simple as: x3dna-dssr -i=805.pdb

From the DSSR-derived secondary structure in .ct notation, you can use VARNA to generate the attached figure.

HTH,

Xiang-Jun


2
Hi Nil,

Thanks for your follow-up! It is a good example of what I've always encouraged 3DNA/DSSR users to do. Ideally, a non-trivial thread would include clarifications along the way and a summary of the topic (from the user's perspective).

To make the point of the thread clear, I've created an image of the G2--C23 Watson-Crick pair (PDB id 355d) with base reference frames. The G2 base is filled with green, and its z-axis is pointing upwards. The C23 base is not filled, and its z-axis is pointing downwards. A rotation of the C23 base frame around its x-axis by 180 degrees would bring its y- and z-axes (yellow) roughly parallel with the G2 y- and z-axes (green). As is clear from the figure, the rotated C23 frame is not perfectly aligned with the G2 base frame. The base-pair parameters (shear, buckle, propeller etc.) are used to characterize the deformation. An ideal, perfectly planar WC pair would have all six parameters equal to zero; that is how the standard base reference frame is defined.
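
The 180-degree rotation about the x-axis can be sketched numerically. The following is a minimal illustration in plain Python, not 3DNA/DSSR code: the frame values are idealized and made up (not taken from 355d), and the helper names flip_about_x and dot are hypothetical. A frame is stored as three axis vectors (x, y, z); rotating it by 180 degrees about its own x-axis keeps x and reverses y and z.

```python
def flip_about_x(frame):
    """Rotate a frame by 180 degrees about its own x-axis.

    `frame` is a tuple of three axis vectors (x, y, z), each a 3-tuple
    expressed in a common global coordinate system. The rotation keeps
    the x-axis and reverses the y- and z-axes.
    """
    x, y, z = frame
    negate = lambda v: tuple(-c for c in v)
    return (tuple(x), negate(y), negate(z))


def dot(u, v):
    """Dot product of two 3-vectors."""
    return sum(a * b for a, b in zip(u, v))


# Hypothetical, idealized frames: the G frame is the identity, and the
# complementary C frame has its y- and z-axes pointing the opposite way.
g_frame = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
c_frame = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))

c_flipped = flip_about_x(c_frame)

# z-axis alignment before and after the flip: anti-parallel, then parallel.
print(dot(g_frame[2], c_frame[2]), dot(g_frame[2], c_flipped[2]))
```

In this idealized case the flipped C frame coincides with the G frame. With frames from a real structure, the two would differ slightly, and that residual difference is what the six base-pair parameters describe.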

HTH,

Xiang-Jun

3
You're welcome to ask any DSSR-related questions on the Forum. On the other hand, please respond to my follow-up questions, for the benefit of all parties involved.

Quote
But I cannot be able to get the meaning of c.H, t.H, c.S, t.S, c.W, t.W - what is the . (dot) here? Can you please explain me.

Please read the DSSR User Manual (specifically section 3.2.2 "Base pairs"). Let me know how the corresponding description can be improved.

HTH,

Xiang-Jun



4
Hi Nil,

You've touched a subtle yet important point in 3DNA/DSSR. The 'difference' you noticed is as expected: it simply represents a 180-degree rotation around the x-axis of the reference frame of the complementary base for an M–N type pair (as is the case for WC pairs). Please read the 2003 3DNA paper (also the 2015 DSSR paper) for further information. As you know, the 3DNA source code is available for your examination.

To better understand the point, you could try the following two things:

  • Attach reference frame F2 (from DSSR) to the complementary base, and show an image in 3D. Note that you also need the origin in order to show its position and orientation.
  • Derive the base-pair parameters from F1c and F2c (from your calculations). You should get the same result mentioned at the top of your post.

Working through those steps would be a great exercise for you. Please post back your results on the Forum -- that would surely benefit other 3DNA/DSSR users who care about such (essential) details.

HTH,

Xiang-Jun

5
Quote
Can you please help me to compare the 'adjacent and non-adjacent stacking' of MC-Annotate output with something in DSSR output?

In DSSR output, the fact that stacked bases are covalently connected via a phosphodiester bond is noted by the keyword "connected". For example, for 1ehz (a tRNA), you'd see the following in DSSR output. For entry #68, the stacking between G51 and U52 is marked as connected, while the other two entries are not.

 # x3dna-dssr -i=1ehz.pdb --non-pair
List of 93 non-pairing interactions
......
36 A.A21    A.C48    stacking: 5.9(2.9)--mm(<>,outward) interBase-angle=5 min_baseDist=3.28
68 A.G51    A.U52    stacking: 6.8(4.0)--pm(>>,forward) interBase-angle=6 connected min_baseDist=3.22
71 A.G53    A.A62    stacking: 4.2(2.0)--mm(<>,outward) interBase-angle=8 min_baseDist=3.15


It is up to you to compare the DSSR output with that from MC-Annotate. You're welcome to post your finding here for the benefit of other viewers of the thread.
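
As a possible starting point for such a comparison, here is a minimal Python sketch that parses stacking lines like the three shown above and checks for the "connected" keyword. The regular expression is my own and has only been checked against those three lines; other DSSR output lines may need adjustments. Treating "connected" as the counterpart of MC-Annotate's "adjacent" stacking is an assumption that you would need to verify.

```python
import re

# Pattern for lines such as:
# 68 A.G51    A.U52    stacking: 6.8(4.0)--pm(>>,forward) interBase-angle=6 connected min_baseDist=3.22
STACK_RE = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<nt1>\S+)\s+(?P<nt2>\S+)\s+"
    r"stacking:\s+(?P<area>[\d.]+)\((?P<ring_area>[\d.]+)\)--"
    r"(?P<faces>pm|mp|mm|pp)\((?P<symbol>[<>]{2}),(?P<direction>\w+)\)"
    r"(?P<rest>.*)$"
)


def parse_stacking_line(line):
    """Return a dict for one stacking line, or None if the line does not match."""
    m = STACK_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["index"] = int(rec["index"])
    rec["area"] = float(rec["area"])            # overlap including exocyclic atoms
    rec["ring_area"] = float(rec["ring_area"])  # overlap of ring atoms only
    # "connected" appears as a separate word when the two nts are covalently linked
    rec["connected"] = "connected" in rec.pop("rest").split()
    return rec


sample = """\
36 A.A21    A.C48    stacking: 5.9(2.9)--mm(<>,outward) interBase-angle=5 min_baseDist=3.28
68 A.G51    A.U52    stacking: 6.8(4.0)--pm(>>,forward) interBase-angle=6 connected min_baseDist=3.22
71 A.G53    A.A62    stacking: 4.2(2.0)--mm(<>,outward) interBase-angle=8 min_baseDist=3.15
"""

records = [r for r in map(parse_stacking_line, sample.splitlines()) if r]
for r in records:
    kind = "connected" if r["connected"] else "not connected"
    print(r["index"], r["nt1"], r["nt2"], r["direction"], kind)
```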

HTH,

Xiang-Jun

6
Hi Mahfuz,

Quote
The canonical and non-canonical stackings are not clear to me.

Where did you get the concept of canonical vs non-canonical stackings from? What do you mean specifically? It is not a term used in DSSR.

In DSSR, pairwise stacking interactions are available via the --non-pair option, as documented in the -h output and the manual. Stacking interactions are detected and quantified by the overlap area of bases.

Quote
I'm also trying to get the direction of the stackings (upword/downword/inward/outward).

Check section "3.8 The --non-pair option" of the DSSR manual:

Quote
As in 3DNA [8], base-stacking is quantified as the area (in Å²) of the overlapped polygon defined by the two bases of the interacting nts, where the base atoms are projected onto the mean base plane. In the output file, values in parentheses measure the overlap of base ring atoms only, and those outside parentheses include exocyclic atoms on the ring. Base-stacking interactions are classified into one of the following four categories: pm(>>,forward), mp(<<,backward), mm(<>,outward), and pp(><,inward). Here p and m represent the plus and minus faces of the base ring, as defined by the direction of the z-axis of the standard base reference frame. The symbols (>>, <<, <>, and ><) follow Major et al, except pm(>>) is called forward instead of upward, and mp(<<) backward instead of downward [32]. Moreover, the inter-base angle is reported; closer to zero means the two bases are nearly parallel.

Basically, each base has two faces (just like a coin). See Fig. 1 of the DSSR paper, attached below (also the 2003 3DNA paper). Each z-axis has a minus--->plus directionality, and there are 2x2=4 possible combinations of two bases in a stack, as noted above.
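
The 2x2 combinations can be written out as a small lookup table. This is only a sketch in Python that restates the four category labels from the manual excerpt quoted above; it says nothing about how DSSR itself decides which faces are involved, and the function name classify is hypothetical.

```python
from itertools import product

# (face of first base, face of second base) -> (code, symbol, name),
# using the four category labels quoted from the DSSR manual.
STACKING_CATEGORIES = {
    ("p", "m"): ("pm", ">>", "forward"),
    ("m", "p"): ("mp", "<<", "backward"),
    ("m", "m"): ("mm", "<>", "outward"),
    ("p", "p"): ("pp", "><", "inward"),
}


def classify(face1, face2):
    """Format the stacking category for a pair of faces ('p' or 'm')."""
    code, symbol, name = STACKING_CATEGORIES[(face1, face2)]
    return f"{code}({symbol},{name})"


# Enumerate all 2 x 2 = 4 face combinations.
for f1, f2 in product("pm", repeat=2):
    print(f1, f2, "->", classify(f1, f2))
```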

[Figure: summary of methods to identify RNA structural components]

HTH,

Xiang-Jun

7
RNA structures (DSSR) / Re: DSSR Download and Installation
« on: September 26, 2018, 06:48:51 pm »
Hi,

Please check the following two things:

  • The .dms extension (thus x3dna-dssr.dms instead of the intended x3dna-dssr) is added by Safari on macOS. The Google Chrome browser does not have this problem. To avoid confusion, you could simply remove the extra extension by: mv x3dna-dssr.dms x3dna-dssr
  • Change the access permissions of the downloaded x3dna-dssr file to make it executable: chmod u+x x3dna-dssr
    Then DSSR should work -- check it out: x3dna-dssr -h
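
If you prefer to script these two steps, the following Python sketch does the equivalent of the mv and chmod u+x commands above on a POSIX system. The function name make_runnable is hypothetical, and the demo runs on an empty placeholder file in a temporary directory, not on the real DSSR binary.

```python
import os
import stat
import tempfile
from pathlib import Path


def make_runnable(downloaded, target_name="x3dna-dssr"):
    """Rename a downloaded file to `target_name` and set the user-execute bit."""
    src = Path(downloaded)
    dst = src.with_name(target_name)
    if src != dst:
        src.rename(dst)                  # like: mv x3dna-dssr.dms x3dna-dssr
    mode = dst.stat().st_mode
    dst.chmod(mode | stat.S_IXUSR)       # like: chmod u+x x3dna-dssr
    return dst


# Demo on an empty placeholder file (not the real binary).
with tempfile.TemporaryDirectory() as tmp:
    placeholder = Path(tmp) / "x3dna-dssr.dms"
    placeholder.write_bytes(b"")
    fixed = make_runnable(placeholder)
    print(fixed.name, os.access(fixed, os.X_OK))
```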

I've updated the DSSR User Manual with a note on the extra .dms on macOS. Please read it and let me know if anything is still unclear.

HTH,

Xiang-Jun

8
Quote
I've send you an email with a link to download the trajectory to xiangjun@x3dna.org

Got it -- thanks!

Quote
Just as a short note: I have to look deeper into it, but the output.json file has a size of 6.8G, while the do_x3dna files have combined only 992M.  Is the --json flag mandatory for trajectories or do the 'normal' output-files also contain frame informations?

The much larger output file from DSSR than that from do_x3dna is expected. DSSR produces far more RNA structural parameters than other tools I'm aware of. So the JSON output contains far more than you need (torsion angles) right now. However, other DSSR-derived features are of potential use in other projects, as DSSR is used more widely in the MD field.

The --json flag is not mandatory for trajectories, but only the JSON output contains a complete compilation of DSSR parameters. Without it, the torsion angles are not listed in the 'normal' output file. Check the DSSR manual for more info.

In practice, you could use a tool like jq to filter the stream of JSON data produced by DSSR. This could produce a smaller output file than that from do_x3dna.
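
The same kind of filtering can be sketched in Python. The example below assumes that the JSON has a top-level "nts" list whose entries carry an "nt_id" plus torsion angles under the keys "alpha" to "zeta" and "chi"; please check these names against your own output.json, since I am writing them here from memory as an assumption. The input record is made up, and extract_torsions is a hypothetical helper.

```python
import json

# Assumed key names for the backbone and glycosidic torsion angles.
TORSION_KEYS = ("alpha", "beta", "gamma", "delta", "epsilon", "zeta", "chi")


def extract_torsions(dssr_result):
    """Keep only the nt identifier and torsion angles for each nucleotide."""
    rows = []
    for nt in dssr_result.get("nts", []):
        row = {"nt_id": nt.get("nt_id")}
        row.update({key: nt.get(key) for key in TORSION_KEYS})
        rows.append(row)
    return rows


# Tiny made-up record standing in for real DSSR JSON output.
example = json.loads("""
{"nts": [{"nt_id": "A.G1", "alpha": null, "beta": -150.2, "gamma": 55.1,
          "delta": 82.0, "epsilon": -152.3, "zeta": -70.4, "chi": -165.0,
          "other_field": "dropped by the filter"}]}
""")

print(json.dumps(extract_torsions(example)))
```

For a multi-gigabyte file, loading everything into memory at once may not be practical; processing the trajectory frame by frame, or using a streaming tool such as jq, would be the safer route.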

Best regards,

Xiang-Jun





9
Hi Marcel,

Thanks for your encouraging feedback. I'm glad to hear that DSSR works for large NMR ensembles or MD trajectories -- 35 minutes vs. 10 days for 100k structures! It is even faster than do_x3dna (2 hours) for the same dataset -- this is expected, based on your previous feedback that "DSSR and do_x3dna are approx. similar in analyzing the first ~1000 structures".

Quote
Are you still interested in a 10k structure trajectory?

It helps to have such a dataset for future tests. Please send me a ~1k (instead of 10k) sample structure trajectory.

Working together, we will make DSSR directly relevant in the active field of MD simulations.

Best regards,

Xiang-Jun

10
Thanks for your follow-ups.

It is always a good idea to adhere to a standard format, as much as possible. That'd make downstream analysis simple.

I've revised the mmCIF parser in the DSSR v1.7.9-2018sep06 release. It is more tolerant of input mmCIF files than previous versions. As a result, DSSR now works with the Biopython-produced 333D_biopython.cif file you originally attached, without any modifications.

Please download DSSR v1.7.9-2018sep06. Have a try and report back how it works.

Best regards,

Xiang-Jun

11
I've performed a bit more investigation of your attached 333D_biopython.cif file. My findings are as follows:

In 333D_biopython.cif, the header together with the first atom reads:

loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.auth_seq_id
_atom_site.auth_asym_id
_atom_site.pdbx_PDB_model_num
ATOM   1   O  'O5'' . C   A ? 1 ? 23.308 21.309 18.480 1.0 17.2  1  A ' '
### the above should be changed to the following:
ATOM   1   O  "O5'" . C   A ? 1 ? 23.308 21.309 18.480 1.0 17.2  1  A 1
......


The sugar atom name 'O5'' is malformed: it should be written as "O5'". The last item is pdbx_PDB_model_num, an integer, but the ATOM record gives ' ' instead. The space character should be replaced with a number (e.g., 1). The revised version is the second ATOM line shown above.

After these two fixes for all the ATOM/HETATM records, DSSR works as expected. I've attached the revised mmCIF file for your reference.
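
For reference, the two fixes could be applied line by line with a short script along these lines. This is a sketch in Python based only on the single ATOM record shown above, not the procedure I actually used; the function name fix_atom_line and the default model number of 1 are assumptions, and other malformed files may need different patterns.

```python
import re

QUOTED_PRIME = re.compile(r"'([^'\s]+)''")  # e.g. 'O5'' (atom name ending in a prime)
BLANK_MODEL = re.compile(r"' '\s*$")        # trailing ' ' in the model-number column


def fix_atom_line(line, model_num=1):
    """Apply the two fixes to one ATOM/HETATM line; other lines pass through."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    # 'O5''  ->  "O5'"
    line = QUOTED_PRIME.sub(lambda m: '"' + m.group(1) + "'" + '"', line)
    # trailing ' '  ->  model number (assumed to be 1 for a single-model file)
    line = BLANK_MODEL.sub(str(model_num), line)
    return line


bad = "ATOM   1   O  'O5'' . C   A ? 1 ? 23.308 21.309 18.480 1.0 17.2  1  A ' '"
print(fix_atom_line(bad))
```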

Best regards,

Xiang-Jun

12
Hi,

Thanks for using DSSR and for posting your questions on the Forum.

For parsing mmCIF, DSSR needs only a minimal set of required keys. In principle, the Biopython output you have should suffice. Your attached example mmCIF file, however, cannot be read by Jmol or PyMOL. With Jmol 14.27.1 (2017-12-11 09:38), the error message is: "Error reading file at end of file -1". After loading it into PyMOL open-source version 1.8.7.0, I cannot see any atoms showing up at all.

Are you sure your example file is valid in mmCIF format? Please verify.

Best regards,

Xiang-Jun

13
Bug reports / Re: 5-hydroxy methyl cytosine
« on: September 05, 2018, 10:43:10 am »
Hi Kareem,

Thanks for using 3DNA and for posting your questions on the Forum.

Quote
I used web 3DNA for my structure which has 5- hydroxy methyl cytosine (5HM). When I used web 3 DNA and i got the error message "error in pdb.inp". Later, I used 4pw5 pdb file, which has 5HM (5hc different annotation) and I got the same message.

The web 3DNA error message for 5HM was due to unrecognized nucleotides (i.e., not in the list of baselist.dat) in early versions of 3DNA, up to v2.0 released 10 years ago. The limitation was fixed a long time ago, as of 3DNA v2.1, so an unrecognized nt such as 5HM is automatically matched to its canonical counterpart (cytosine for 5HM).

As far as I know, the web 3DNA server hosted by Rutgers is still using v2.0. As a result, PDB entries containing 5HM cannot be automatically processed. In principle, updating 3DNA v2.0 to the latest v2.3 release should fix the issue. In practice, there may be some technical implications. I've no idea when (or if) that update would happen. See the Section w3DNA -- web interface to 3DNA.

Assuming you have 3DNA v2.3 installed on your computer, and the 5HM-containing PDB structure is named 5hm-str.pdb, the following commands will solve your problem right away:

Code:
find_pair 5hm-str.pdb 5hm-str.inp
analyze 5hm-str.inp

The result is in file 5hm-str.out.

Quote
I used SNAP to identify DNA-protein interactions and I learned that we don't have reference to cite.

Can I mention like this in my paper citing SNAP.  ( "SNAP from 3DNA suite").

The suggested citation for SNAP is available by running x3dna-snap -h, and it reads as follows:

Quote
    CITATION: before a paper dedicated to SNAP is published, please
  cite the software as follows: *your specific usage* of SNAP, a new
  component of the 3DNA suite of programs [citing the 2003 NAR or the
  2008 Nature Protocols paper].

Inspired by your above question, I've added the --citation option to SNAP to make this point explicit.

Hope that helps.

Xiang-Jun

14
I've updated DSSR to v1.7.8-2018sep01. Among other improvements, the new version should have fixed the speed issue when DSSR is applied to the analysis of large MD trajectories. Not having any sample dataset from you, I used some NMR ensembles with 10-20 models for testing purposes.

Based on your observation that "DSSR and do_x3dna are approx. similar in analyzing the first ~1000 structures", I'd assume the updated DSSR should run as fast as do_x3dna for your whole dataset. Now analyzing your trajectory of 100k frames with DSSR should take ~2 hours instead of 10 days.

Please check and report back if the new version is working as expected.

Best regards,

Xiang-Jun


PS: While not being an MD practitioner, I follow research articles in this active and increasingly important field. Dedicated tools (such as MDAnalysis and CPPTRAJ) exist for the analysis of MD simulation trajectories. Nevertheless, 3DNA still has something to offer, as shown by do_x3dna.

DSSR has an unmatched set of features, and wherever possible, I'm keen on expanding its applicability in MD analyses. With the speed improvement of this release, DSSR is likely to play a significant role in MD simulations.

15
RNA structures (DSSR) / Re: significance of character "P"
« on: August 28, 2018, 11:26:33 am »
Hi,

Thanks for using DSSR and for posting your questions on the Forum. I appreciate your kind words about the DSSR program. I take DSSR as an example of what scientific software should/could be in my understanding. I will continuously do my best to extend and polish the program to make it a handy tool for the RNA structure community.

Now back to your question on what "P" means in DSSR output. The "3.2.1 Summary section" of the DSSR User Manual has the following note:

Quote
Note that pseudouridine, the most prevalent modified nt in RNA, is denoted ‘P’ [1] in DSSR since the small case ‘p’ is reserved for potential modified pseudouridines.

[1] Not to be confused with the phosphorus atom in the backbone phosphate group. In fact, the distinction should be clear in context.

I picked "P" for pseudouridine specifically because it is not listed in the IUPAC nomenclature for nucleotide ambiguity codes.

See also my blog posts "Modified pseudouridines" and "Definition of the chi (χ) torsion angle for pseudouridine"

The lower case "t" means modified T which can also exist in RNA structures. For example, the classic yeast phenylalanine tRNA (1ehz) has the T-loop named after a T base.

Also "tP" means t and P are directly connected, as in other cases. Otherwise, a "&" would be inserted, as also shown in your example.



As a side note, DSSR has far more features than a typical user may care about. See the following note in the DSSR User Manual.

Quote
There is actually more to DSSR than meets the eye. To target the widest possible user base, I’ve deliberately omitted advanced/technical features in the manual (which is already over 90 pages). Moreover, many documented options have additional variations that may be of interests to some applications. If you feel that a relevant functionality should be there but missing from the manual, Simply ask on the 3DNA Forum.


Created and maintained by Dr. Xiang-Jun Lu [律祥俊], Principal Investigator of the NIH grant R01GM096889
Dr. Lu is currently affiliated with the Bussemaker Laboratory at the Department of Biological Sciences, Columbia University.