discarded nucleotides in json output

ICdB:
Hi,

First, many thanks for your great software suite. It is a real pleasure to work with such complete and convenient outputs for my parsing purposes. I am a strong believer in collaboration in methods development through connectable building blocks, and it is nice to have others with such a view in the RNA-modeling field  :)

I am building specific structural fragment libraries for RNA docking, and I am trying to parse DSSR JSON outputs to automatically keep track of my fragments' characteristics (2D structure, interactions, ...). For this, I need to know which nucleotides were discarded because of, e.g., weird geometry. As I see it, a "&" is inserted in the "bseq" entry for chain breaks. But I can't find any indication of whether a nucleotide was discarded or the break was already present in the input PDB file, and retrieving it by comparing bseq to the input sequence can be ambiguous.
I am considering using the "Summary of structural features of xx nucleotides" in the text output, by running DSSR with and without the "--json" option. Is there a DSSR option I could use to get the info directly from the JSON output and avoid double work (as I am analysing thousands of PDB files)?

Thanks in advance for your help,
Isaure C. de Beauchene

PS: I'm using version v1.7.2-2017nov20

xiangjun:
Hi,

Thanks for a thoughtful post.


--- Quote ---I am a strong believer in collaboration in methods development through connectable building blocks, and it is nice to have others with such a view in the RNA-modeling field  :)
--- End quote ---

I cannot agree with you more. I am glad to know of another one of like mind across the Atlantic.


--- Quote ---As I see it, a "&" is inserted in the "bseq" entry for chain breaks. But I can't find any indication of whether a nucleotide was discarded or the break was already present in the input PDB file, and retrieving it by comparing bseq to the input sequence can be ambiguous.
--- End quote ---

Yes, the meaning of the symbol "&" in "bseq" with the DBN output is ambiguous. It could be due to: (1) a switch between chains for "all_chains", as in a DNA duplex (e.g., 355d); (2) missing atomic coordinates of nucleotides within a DNA/RNA chain, as in some X-ray crystal structures due to local disorder (e.g., 2fk6); (3) abasic sites, which were not considered by default before DSSR 1.7.3-2017dec26; (4) highly distorted bases, as from some MD simulations, that fall outside the default cutoff.

Since you mention you're using v1.7.2-2017nov20, please update to the latest DSSR v1.7.4-2018jan30, which accounts for case (3) above.


--- Quote ---I am considering using the "Summary of structural features of xx nucleotides" in the text output, by running DSSR with and without the "--json" option. Is there a DSSR option I could use to get the info directly from the JSON output and avoid double work (as I am analysing thousands of PDB files)?
--- End quote ---

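The output of "Summary of structural features of xx nucleotides" lists all the detected nucleotides, so any nucleotide present in the input file but absent from that list was not recognized by DSSR. This information (plus more) is also available from the JSON output, as shown below: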

--- Code: ---x3dna-dssr -i=1ehz.pdb --json | jq .nts
--- End code ---
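
For batch processing, the comparison between the input file and the "nts" list could be scripted along the following lines. This is only a minimal Python sketch, not part of DSSR: the function names `pdb_residues` and `discarded_nucleotides` are made up for illustration, and it assumes that each entry of "nts" carries "chain_name" and "nt_resnum" fields (please check against your own JSON output). Input nucleotides are guessed heuristically as residues that have a C1' atom, and insertion codes are ignored.

```python
import json


def pdb_residues(pdb_text):
    """Return (chain, resnum, resname) for each residue that has a C1' atom.

    Heuristic assumption: a residue with a C1' (or old-style C1*) atom is
    treated as a nucleotide. Insertion codes are ignored in this sketch.
    """
    residues = []
    known = set()
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 26:
            continue
        if line[12:16].strip() not in ("C1'", "C1*"):
            continue
        key = (line[21], int(line[22:26]))  # chain ID, residue number
        if key not in known:
            known.add(key)
            residues.append(key + (line[17:20].strip(),))
    return residues


def discarded_nucleotides(pdb_text, dssr):
    """List input nucleotides that do not appear in the DSSR "nts" array.

    `dssr` is the parsed JSON output; the field names "chain_name" and
    "nt_resnum" are assumptions to verify against your DSSR version.
    """
    detected = {(nt["chain_name"], int(nt["nt_resnum"])) for nt in dssr["nts"]}
    return [r for r in pdb_residues(pdb_text) if (r[0], r[1]) not in detected]


if __name__ == "__main__":
    # Hypothetical file names; the JSON file would come from something like
    #   x3dna-dssr -i=1ehz.pdb --json > 1ehz.json
    with open("1ehz.pdb") as fh:
        pdb_text = fh.read()
    with open("1ehz.json") as fh:
        dssr = json.load(fh)
    for chain, resnum, resname in discarded_nucleotides(pdb_text, dssr):
        print(chain, resnum, resname)
```

A nucleotide reported by this sketch is present in the input coordinates but was not detected, which corresponds to cases (3) and (4) above. Breaks from cases (1) and (2) produce no entry, since no coordinates exist for a missing residue.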

HTH,

Xiang-Jun

ICdB:
Hi Xiang-Jun,

Thanks a lot for your detailed reply, and for reminding me of the "nts" entry of the json output. That was very useful.

Best,
Isaure

Created and maintained by Dr. Xiang-Jun Lu [律祥俊] (xiangjun@x3dna.org)
The Bussemaker Laboratory at the Department of Biological Sciences, Columbia University.