Messages - xiangjun

1
Quote
I've send you an email with a link to download the trajectory to xiangjun@x3dna.org

Got it -- thanks!

Quote
Just as a short note: I have to look deeper into it, but the output.json file has a size of 6.8G, while the do_x3dna files have combined only 992M.  Is the --json flag mandatory for trajectories or do the 'normal' output-files also contain frame informations?

It is expected that the DSSR output file is much larger than the one from do_x3dna. DSSR produces far more RNA structural parameters than any other tool I'm aware of, so the JSON output contains far more than the torsion angles you need right now. The other DSSR-derived features could nevertheless be useful in other projects as DSSR becomes more widely used in the MD field.

The --json flag is not mandatory for trajectories, but only the JSON output contains a complete compilation of DSSR parameters. The torsion angles, for instance, are not listed in the 'normal' output file. Check the DSSR manual for more info.

In practice, you could use a tool like jq to filter the stream of JSON data produced by DSSR. This could produce a smaller output file than that from do_x3dna.
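As a rough illustration of the filtering idea (in Python rather than jq), here is a minimal sketch that keeps only an identifier and the torsion angles for each nucleotide. The key names ("nts", "nt_id", "alpha" ... "chi") and the example record are assumptions made for illustration; check them against your own DSSR JSON output before relying on them.

```python
import json

# Assumed field names; verify against the actual DSSR JSON output.
TORSION_KEYS = ("alpha", "beta", "gamma", "delta", "epsilon", "zeta", "chi")

def slim_nucleotides(record, keys=TORSION_KEYS):
    """Keep only an identifier and the torsion angles of each nucleotide."""
    slim = []
    for nt in record.get("nts", []):           # "nts" is an assumed key
        entry = {"nt_id": nt.get("nt_id")}     # "nt_id" is an assumed key
        entry.update({k: nt.get(k) for k in keys})
        slim.append(entry)
    return slim

# Hypothetical record for one frame, made up for illustration only.
example = {
    "nts": [
        {"nt_id": "A.G1", "alpha": None, "chi": -165.2, "form": "A"},
        {"nt_id": "A.C2", "alpha": -68.1, "chi": -160.4, "form": "A"},
    ],
    "pairs": [{"nt1": "A.G1", "nt2": "A.C2"}],
}

slimmed = slim_nucleotides(example)
print(json.dumps(slimmed))
```

Everything other than the selected keys (here "form" and "pairs") is dropped, which is where the reduction in file size would come from. For a multi-gigabyte file, a streaming approach would be needed instead of loading the whole document at once.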

Best regards,

Xiang-Jun





2
Hi Marcel,

Thanks for your encouraging feedback. I'm glad to hear that DSSR now works for large NMR ensembles or MD trajectories: 35 minutes vs. 10 days for 100k structures! It is even faster than do_x3dna (2 hours) for the same dataset. This is expected, based on your previous feedback that "DSSR and do_x3dna are approx. similar in analyzing the first ~1000 structures".

Quote
Are you still interested in a 10k structure trajectory?

It helps to have such a dataset for future tests. Please send me a ~1k (instead of 10k) sample structure trajectory.

Working together, we will make DSSR directly relevant in the active field of MD simulations.

Best regards,

Xiang-Jun

3
Thanks for your followups.

It is always a good idea to adhere to a standard format, as much as possible. That'd make downstream analysis simple.

I've revised the mmCIF parser in the DSSR v1.7.9-2018sep06 release. It is more tolerant of input mmCIF files than previous versions. As a result, DSSR now works with the Biopython-produced 333D_biopython.cif file you originally attached, without any modifications.

Please download DSSR v1.7.9-2018sep06. Give it a try and report back on how it works.

Best regards,

Xiang-Jun

4
I've performed a bit more investigation of your attached 333D_biopython.cif file. My findings are as follows:

In 333D_biopython.cif, the header together with the first atom reads:

loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.auth_seq_id
_atom_site.auth_asym_id
_atom_site.pdbx_PDB_model_num
ATOM   1   O  'O5'' . C   A ? 1 ? 23.308 21.309 18.480 1.0 17.2  1  A ' '
### the above should be changed to the following:
ATOM   1   O  "O5'" . C   A ? 1 ? 23.308 21.309 18.480 1.0 17.2  1  A 1
......


The sugar atom name 'O5'' is incorrectly quoted: it should be written as "O5'". The last item is pdbx_PDB_model_num, an integer, yet the ATOM record gives ' ' (a quoted space) instead. The space character should be replaced with a number (e.g., 1). The revised version of the record is the line shown above, right after the "###" comment.

After these two fixes for all the ATOM/HETATM records, DSSR works as expected. I've attached the revised mmCIF file for your reference.
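The two fixes can be sketched in Python as below. This is a minimal sketch that only handles the two patterns described above (a single-quoted atom name that itself ends with a prime, and a quoted blank in the last column); it is not a general mmCIF rewriter, and the function name and default model number are hypothetical choices for illustration.

```python
import re

# A single-quoted atom name that ends with a prime, e.g. 'O5''
BAD_ATOM_NAME = re.compile(r"(?<=\s)'([^\s']+')'(?=\s)")
# A quoted blank in the last column, where the model number belongs
BLANK_MODEL = re.compile(r"' '\s*$")

def fix_atom_line(line, model_num=1):
    """Re-quote primed atom names and fill in a blank model number."""
    if not line.startswith(("ATOM", "HETATM")):
        return line                                  # leave other lines untouched
    line = BAD_ATOM_NAME.sub(r'"\1"', line)          # 'O5'' becomes "O5'"
    return BLANK_MODEL.sub(str(model_num), line)     # ' ' becomes 1

bad = "ATOM   1   O  'O5'' . C   A ? 1 ? 23.308 21.309 18.480 1.0 17.2  1  A ' '"
fixed = fix_atom_line(bad)
print(fixed)
```

To process a whole file, the function could be applied to each line of `text.splitlines()` and the results joined with newlines again.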

Best regards,

Xiang-Jun

5
Hi,

Thanks for using DSSR and for posting your questions on the Forum.

For parsing mmCIF, DSSR needs only a minimal set of keys. In principle, the Biopython output you have should suffice. Your attached example mmCIF file, however, cannot be read by Jmol or PyMOL. With Jmol 14.27.1 (2017-12-11 09:38), the error message is: "Error reading file at end of file -1". After loading the file into the open-source version of PyMOL (1.8.7.0), no atoms show up at all.

Are you sure your example file is valid in mmCIF format? Please verify.

Best regards,

Xiang-Jun

6
Bug reports / Re: 5-hydroxy methyl cytosine
« on: September 05, 2018, 10:43:10 am »
Hi Kareem,

Thanks for using 3DNA and for posting your questions on the Forum.

Quote
I used web 3DNA for my structure which has 5- hydroxy methyl cytosine (5HM). When I used web 3 DNA and i got the error message "error in pdb.inp". Later, I used 4pw5 pdb file, which has 5HM (5hc different annotation) and I got the same message.

The web 3DNA error message for 5HM was due to unrecognized nucleotides (i.e., not in the list of baselist.dat) in early versions of 3DNA, up to v2.0 released 10 years ago. The limitation was fixed a long time ago, as of 3DNA v2.1, so an unrecognized nt such as 5HM is automatically matched to its canonical counterpart (cytosine for 5HM).

As far as I know, the web 3DNA server hosted by Rutgers is still using v2.0. As a result, PDB entries containing 5HM cannot be processed automatically. In principle, updating 3DNA v2.0 to the latest v2.3 release should fix the issue. In practice, there may be some technical implications, and I have no idea when (or if) that update will happen. See the section "w3DNA -- web interface to 3DNA".

Assuming you have 3DNA v2.3 installed on your computer, and the 5HM-containing PDB structure is named 5hm-str.pdb, the following commands will solve your problem right away:

Code:
find_pair 5hm-str.pdb 5hm-str.inp
analyze 5hm-str.inp

The result is in file 5hm-str.out.

Quote
I used SNAP to identify DNA-protein interactions and I learned that we don't have reference to cite.

Can I mention like this in my paper citing SNAP.  ( "SNAP from 3DNA suite").

The suggested citation for SNAP is available by running x3dna-snap -h, and it reads as follows:

Quote
    CITATION: before a paper dedicated to SNAP is published, please
  cite the software as follows: *your specific usage* of SNAP, a new
  component of the 3DNA suite of programs [citing the 2003 NAR or the
  2008 Nature Protocols paper].

Inspired by your above question, I've added the --citation option to SNAP to make this point explicit.

Hope that helps.

Xiang-Jun

7
I've updated DSSR to v1.7.8-2018sep01. Among other improvements, the new version should have fixed the speed issue when DSSR is applied to the analysis of large MD trajectories. Not having any sample dataset from you, I used some NMR ensembles with 10-20 models for testing purposes.

Based on your observation that "DSSR and do_x3dna are approx. similar in analyzing the first ~1000 structures", I'd assume the updated DSSR should run as fast as do_x3dna for your whole dataset. Analyzing your 100k-frame trajectory with DSSR should now take ~2 hours instead of 10 days.

Please check and report back if the new version is working as expected.

Best regards,

Xiang-Jun


PS: While not being an MD practitioner, I follow research articles in this active and increasingly important field. Dedicated tools (such as MDAnalysis and CPPTRAJ) exist for the analysis of MD simulation trajectories. Nevertheless, 3DNA still has something to offer, as shown by do_x3dna.

DSSR has an unmatched set of features, and wherever possible, I'm keen on expanding its applicability in MD analyses. With the speed improvement of this release, DSSR is likely to play a significant role in MD simulations.

8
RNA structures (DSSR) / Re: significance of character "P"
« on: August 28, 2018, 11:26:33 am »
Hi,

Thanks for using DSSR and for posting your questions on the Forum. I appreciate your kind words about the DSSR program. I take DSSR as an example of what scientific software should/could be in my understanding. I will continuously do my best to extend and polish the program to make it a handy tool for the RNA structure community.

Now back to your question on what "P" means in DSSR output. The "3.2.1 Summary section" of the DSSR User Manual has the following note:

Quote
Note that pseudouridine, the most prevalent modified nt in RNA, is denoted ‘P’ [1] in DSSR since the small case ‘p’ is reserved for potential modified pseudouridines.

[1] Not to be confused with the phosphorus atom in the backbone phosphate group. In fact, the distinction should be clear in context.

I picked "P" for pseudouridine specifically because it is not among the IUPAC nucleotide ambiguity codes.

See also my blog posts "Modified pseudouridines" and "Definition of the chi (χ) torsion angle for pseudouridine".

The lower case "t" means modified T which can also exist in RNA structures. For example, the classic yeast phenylalanine tRNA (1ehz) has the T-loop named after a T base.

Also, "tP" means that t and P are directly connected, as in other cases. Otherwise, an "&" would be inserted, as also shown in your example.
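To make these conventions concrete, here is a small Python sketch that reads a sequence string of this kind. It assumes one letter per nucleotide, "&" as the separator between fragments that are not directly connected, "P" for pseudouridine, and lowercase letters for modified nucleotides (as with "t" and "p" above). The input string is made up for illustration and is not taken from any DSSR output.

```python
def describe_sequence(seq):
    """Summarize each connected fragment of a one-letter sequence string."""
    report = []
    for frag in seq.split("&"):        # "&" separates unconnected fragments
        report.append({
            "fragment": frag,
            "length": len(frag),
            "pseudouridines": frag.count("P"),                  # uppercase P only
            "modified": sum(1 for ch in frag if ch.islower()),  # e.g. t, p
        })
    return report

# Made-up sequence: "tP" are adjacent, and "&" marks a break.
summary = describe_sequence("GGAtPCG&CGAUCC")
for row in summary:
    print(row)
```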



As a side note, DSSR has far more features than a typical user may care about. See the following note in the DSSR User Manual.

Quote
There is actually more to DSSR than meets the eye. To target the widest possible user base, I’ve deliberately omitted advanced/technical features in the manual (which is already over 90 pages). Moreover, many documented options have additional variations that may be of interests to some applications. If you feel that a relevant functionality should be there but missing from the manual, Simply ask on the 3DNA Forum.

9
Thanks for your feedback.

Quote
The data are unpublished, so if you could provide me a way of sharing it in a non-public way (e.g. email address), I'm able to provide a native trajectory.

I just need a sample dataset for checking purposes. It could be from a public data repository or a deliberately revised version of your unpublished data. You could send such a minimal dataset (~1K frames) to my 3dna.lu gmail account. Of course, I will use the data for internal testing only and will not share it with anyone else.

Quote
Just as a note: I continued the DSSR analysis since yesterday and frame 14762 is currently loaded, so I do agree to the memory issue.

I've checked via valgrind that the slowness of the DSSR --nmr option on a large dataset is not due to a memory leak: the report says "no leaks are possible". Otherwise, the program would eventually run out of memory. I now have a general idea of the issue and a possible fix.

Best regards,

Xiang-Jun



10
Hi Marcel,

Thanks for your follow-up.

Code:
do_x3dna -f ../mod.pdb -s ../mod.pdb -o test -hbond
and extract the dihedral information from "BackBoneCHiDihedrals_g.dat"

Could you dig further to see which 3DNA command is being called by the above do_x3dna run? What is the -hbond option? What does the do_x3dna output look like? Do you only need backbone torsion angles?

Quote
Based on my impression, DSSR and do_x3dna are approx. similar in analyzing the first ~1000 structures and DSSR gets slower and slower with every following structure.

This is an important piece of information. I'll check whether the slowdown after the first 1K structures (as you noticed) is related to increased memory allocation. In principle, DSSR should process each model at roughly constant speed.

Quote
As a workaround, I could cut my trajectory into 1000 structure fragments and analyze them independently, but I think the performance issue is important especially for the MD community to work with DSSR.

As noted above, I'll surely look into this issue. Would it be possible for you to provide me with a sample MD trajectory file?
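In the meantime, the workaround you describe could be sketched in Python as follows. This is a minimal sketch that assumes the trajectory is stored as a multi-model PDB file with MODEL/ENDMDL records; the function name and the tiny demo data are hypothetical. Lines outside MODEL/ENDMDL blocks (headers, the final END) are dropped.

```python
def chunk_models(lines, chunk_size=1000):
    """Yield lists of lines, each holding up to chunk_size MODEL/ENDMDL blocks."""
    chunk, n_models, in_model = [], 0, False
    for line in lines:
        if line.startswith("MODEL"):
            in_model = True
        if in_model:
            chunk.append(line)
        if line.startswith("ENDMDL"):
            in_model = False
            n_models += 1
            if n_models == chunk_size:     # chunk is full: hand it out
                yield chunk
                chunk, n_models = [], 0
    if chunk:                              # leftover models at the end
        yield chunk

# Tiny demo: five placeholder models, split into chunks of two.
demo = []
for i in range(1, 6):
    demo += [f"MODEL     {i:4d}\n", "REMARK placeholder for atom records\n", "ENDMDL\n"]

chunks = list(chunk_models(demo, chunk_size=2))
print([len(c) for c in chunks])
```

Each chunk could then be written to its own file and analyzed independently. For a real trajectory, passing an open file object as `lines` avoids loading everything into memory at once.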

Best regards,

Xiang-Jun

11
Hi Marcel,

Thanks for using DSSR and for your feedback on the Forum.

Regarding your first question, on the relative speed of DSSR vs. do_x3dna for calculating backbone parameters: in principle, DSSR would be slower than the 3DNA 'analyze' program, since DSSR calculates many more (housekeeping) features in the 'background'. However, the difference you noticed, 2 hours vs. 10 days, is well beyond my expectation. To investigate this issue further, could you elaborate on how you calculated the backbone parameters using do_x3dna? Does do_x3dna call 'analyze' with the '-t' option (for torsion angles)?

For your second question, what version of DSSR were you using? This bug should have been fixed in the later releases of DSSR, as shown below:

Code:
Processing file 'frame1630.pdb'
  X.U.7               0.121
  X.U.42              0.143
    total number of nucleotides: 44
    total number of base pairs: 20
    total number of helices: 1
    total number of stems: 3
    total number of internal loops: 2

Also, note that you do not need to specify the --abasic option anymore. This feature is taken into consideration by default in recent releases of DSSR.

Best regards,

Xiang-Jun

12
Hi Shuxiang,

You've touched on a subtle point in the labeling of residues (nucleotides) in mmCIF vs. PDB. Taking PDB entry 3mgp (the one you used) as an example, excerpts for I.DA.-73 from the corresponding PDB and mmCIF files (both downloaded from the RCSB PDB) are shown below:

# PDB format
ATOM   6169  O5'  DA I -73       2.638   0.163  93.308  1.00166.52           O 
ATOM   6170  C5'  DA I -73       3.279   0.178  94.579  1.00166.78           C 
ATOM   6171  C4'  DA I -73       3.645  -1.223  95.042  1.00167.01           C 
ATOM   6172  O4'  DA I -73       2.489  -2.096  95.012  1.00167.37           O 
ATOM   6173  C3'  DA I -73       4.650  -1.969  94.180  1.00166.94           C 
ATOM   6174  O3'  DA I -73       5.972  -1.523  94.462  1.00166.58           O 
ATOM   6175  C2'  DA I -73       4.428  -3.410  94.635  1.00167.20           C 
ATOM   6176  C1'  DA I -73       2.941  -3.442  94.998  1.00167.53           C 
ATOM   6177  N9   DA I -73       2.097  -4.257  94.106  1.00167.70           N 
ATOM   6178  C8   DA I -73       0.995  -3.832  93.410  1.00167.66           C 

# mmCIF format
ATOM   6161  O  "O5'" . DA  I  5 1   ? 2.638   0.163   93.308 1.00 166.52 ? -73  DA  I "O5'" 1
ATOM   6162  C  "C5'" . DA  I  5 1   ? 3.279   0.178   94.579 1.00 166.78 ? -73  DA  I "C5'" 1
ATOM   6163  C  "C4'" . DA  I  5 1   ? 3.645   -1.223  95.042 1.00 167.01 ? -73  DA  I "C4'" 1
ATOM   6164  O  "O4'" . DA  I  5 1   ? 2.489   -2.096  95.012 1.00 167.37 ? -73  DA  I "O4'" 1
ATOM   6165  C  "C3'" . DA  I  5 1   ? 4.650   -1.969  94.180 1.00 166.94 ? -73  DA  I "C3'" 1
ATOM   6166  O  "O3'" . DA  I  5 1   ? 5.972   -1.523  94.462 1.00 166.58 ? -73  DA  I "O3'" 1
ATOM   6167  C  "C2'" . DA  I  5 1   ? 4.428   -3.410  94.635 1.00 167.20 ? -73  DA  I "C2'" 1
ATOM   6168  C  "C1'" . DA  I  5 1   ? 2.941   -3.442  94.998 1.00 167.53 ? -73  DA  I "C1'" 1
ATOM   6169  N  N9    . DA  I  5 1   ? 2.097   -4.257  94.106 1.00 167.70 ? -73  DA  I N9    1
ATOM   6170  C  C8    . DA  I  5 1   ? 0.995   -3.832  93.410 1.00 167.66 ? -73  DA  I C8    1

As noted in the mmCIF header, the sequence number "-73" matches "_atom_site.auth_seq_id", and number "1" matches "_atom_site.label_seq_id". Since the corresponding PDB entry uses _atom_site.auth_seq_id (-73), DSSR follows that convention.

DSSR currently has no option to use the "_atom_site.label_seq_id" numbering when "_atom_site.auth_seq_id" is present.
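The difference between the two numbering schemes can be seen by pairing each value in an mmCIF ATOM record with its column name. The Python sketch below does that for the first mmCIF record above. The column order is an assumption (the header is not reproduced in this post; it is the order commonly found in RCSB-distributed files), and `shlex` is used only as a stand-in tokenizer, not as a real mmCIF parser.

```python
import shlex

# Assumed _atom_site column order; check it against the loop_ header of your file.
ATOM_SITE_COLUMNS = [
    "group_PDB", "id", "type_symbol", "label_atom_id", "label_alt_id",
    "label_comp_id", "label_asym_id", "label_entity_id", "label_seq_id",
    "pdbx_PDB_ins_code", "Cartn_x", "Cartn_y", "Cartn_z", "occupancy",
    "B_iso_or_equiv", "pdbx_formal_charge", "auth_seq_id", "auth_comp_id",
    "auth_asym_id", "auth_atom_id", "pdbx_PDB_model_num",
]

row = '''ATOM   6161  O  "O5'" . DA  I  5 1   ? 2.638   0.163   93.308 1.00 166.52 ? -73  DA  I "O5'" 1'''

tokens = shlex.split(row)                  # handles the double-quoted "O5'"
assert len(tokens) == len(ATOM_SITE_COLUMNS)
fields = dict(zip(ATOM_SITE_COLUMNS, tokens))

print("label_seq_id:", fields["label_seq_id"])   # mmCIF-internal numbering
print("auth_seq_id: ", fields["auth_seq_id"])    # numbering as in the PDB file

# Identifier built from the auth_* items, in the chain.residue.number style used above
nt_id = ".".join([fields["auth_asym_id"], fields["auth_comp_id"], fields["auth_seq_id"]])
print(nt_id)
```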

Best regards,

Xiang-Jun

13
Thanks for the encouragement for writing a paper on SNAP. It's been on my to-do list for a while, but delayed for various reasons. Overall, I take it as a positive thing that a method paper is written after the corresponding program has been in active use for a while. My goal here is not to write a paper but to solve a set of related problems so that the community can build upon my work.

You are right in that pseudo-pairing/stacking interactions are between planar moieties in proteins and the standard base reference frame. The planar moieties include the amino-acids { "arg", "phe", "tyr", "trp", "his", "asn", "asp", "gln", "glu" } and the peptide bond. A reference frame is defined for each of them. The pseudo-pairing/stacking interactions of these planar moieties with nucleobases are identified and quantified using exactly the same algorithms as in 3DNA/DSSR. In addition to the pair-wise interactions, 'multiplets' and stacks (as in DSSR) involving both amino-acids and bases will be reported in future releases of SNAP.

HTH,

Xiang-Jun

14
General discussions (Q&As) / Re: x3dna-v2.3 no backbone ribbon
« on: July 31, 2018, 03:02:10 pm »
Quote
I did find a relatively easy "workaround" though with pymol-- If I load the file "pmiview1" and have pymol create a cartoon with it, it overlays nicely with the r3d file.

That's true. That's also the trick the DSSR --blocview option relies on. When 3DNA was initially released around the beginning of the century, MolScript/Raster3D (and RasMol) were very popular. Nowadays, these earlier programs are virtually gone and PyMOL (among others) has become the dominant player. DSSR takes advantage of what PyMOL has to offer, which greatly simplifies the creation of the characteristic cartoon-block images.

Best regards,

Xiang-Jun


 

15
Bug reports / MOVED: x3dna-v2.3 no backbone ribbon
« on: July 31, 2018, 12:54:39 pm »

Created and maintained by Dr. Xiang-Jun Lu [律祥俊], Principal Investigator of the NIH grant R01GM096889
Dr. Lu is currently affiliated with the Bussemaker Laboratory at the Department of Biological Sciences, Columbia University.