Author Topic: discarded nucleotides in json output (Read 67633 times)

ICdB · « **on:** March 04, 2018, 08:36:01 am »

Hi,

First, many thanks for your great software suite. It is a real pleasure to work with such complete and convenient outputs for my parsing purposes. I am a strong believer in collaboration in methods development through connectible building blocks, and it is nice to have others with such a view in the RNA-modeling field.

I am building specific structural fragment libraries for RNA docking, and I am trying to parse DSSR JSON outputs to automatically keep track of my fragments' characteristics (2D structure, interactions ...). For this, I need to know which nucleotides were discarded because of e.g. weird geometry. As I see it, a "&" is inserted in the "bseq" entry for chain breaks. But I can't find any indication of whether a nucleotide was discarded or the break was already present in the input PDB, and retrieving it by comparing bseq to the input sequence can be ambiguous.
I am considering using the "Summary of structural features of xx nucleotides" in the text output, by running DSSR with and without the "--json" option. Is there a DSSR option I could use to get the info directly from the JSON output and avoid double work (as I am analysing thousands of PDB files)?

Thanks in advance for your help,
Isaure C. de Beauchene

PS: I'm using version v1.7.2-2017nov20

xiangjun · « **Reply #1 on:** March 04, 2018, 10:46:09 am »

Hi,

Thanks for a thoughtful post.

Quote

I am a strong believer in collaboration in methods development through connectible building blocks, and it is nice to have others with such a view in the RNA-modeling field

I cannot agree with you more. I am glad to know of another one of like mind across the Atlantic.

Quote

As I see it, a "&" is inserted in the "bseq" entry for chain breaks. But I can't find any indication of whether a nucleotide was discarded or the break was already present in the input PDB, and retrieving it by comparing bseq to the input sequence can be ambiguous.

Yes, the meaning of the symbol "&" in "bseq" with the DBN output is ambiguous. It could be due to: (1) switch of chains for "all_chains" as in a DNA duplex (e.g., 355d), (2) missing atomic coordinates of nucleotides within a DNA/RNA chain, as in some X-ray crystal structures due to local disorder (e.g., 2fk6), (3) abasic sites before DSSR 1.7.3-2017dec26 (which were not considered by default), (4) highly distorted bases, as from some MD simulations, that are out of the default cutoff.
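To illustrate why "&" alone cannot separate these cases, here is a minimal Python sketch that walks the "nts" list and labels each boundary as either a chain switch (case 1) or a gap in residue numbering within one chain (cases 2 to 4). The field names `nt_id`, `chain_name` and `nt_resnum` are assumptions about the per-nucleotide JSON entries, and `classify_breaks` is a hypothetical helper, not part of DSSR. A numbering gap is only a heuristic: it need not coincide with where DSSR places "&", and it ignores insertion codes.

```python
def classify_breaks(nts):
    """Label boundaries between consecutive nucleotides that are not a plain i -> i+1 step.

    `nts` is a list of dicts; the keys used below are assumed field names.
    """
    breaks = []
    for prev, cur in zip(nts, nts[1:]):
        if prev["chain_name"] != cur["chain_name"]:
            # Case (1): the next nucleotide belongs to a different chain.
            breaks.append((prev["nt_id"], cur["nt_id"], "chain switch"))
        elif cur["nt_resnum"] - prev["nt_resnum"] != 1:
            # Cases (2)-(4): same chain, but residue numbers are not consecutive.
            breaks.append((prev["nt_id"], cur["nt_id"], "numbering gap"))
    return breaks
```

A "numbering gap" still does not say whether the residues were absent from the input or present but skipped; telling those apart requires looking at the input coordinates as well.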

Since you mention you're using v1.7.2-2017nov20, please update to the latest DSSR release, v1.7.4-2018jan30, which accounts for case (3) above.

Quote

I am considering using the "Summary of structural features of xx nucleotides" in the text output, by running DSSR with and without the "--json" option. Is there a DSSR option I could use to get the info directly from the JSON output and avoid double work (as I am analysing thousands of PDB files)?

The output of "Summary of structural features of xx nucleotides" matches all the detected nucleotides. This information (plus more) is also available from JSON output, as shown below:

Code:

x3dna-dssr -i=1ehz.pdb --json | jq .nts
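For batch processing, the same "nts" list can be compared against the input file to list residues that are present in the PDB but were not reported. The following is a rough Python sketch under stated assumptions: the `chain_name` and `nt_resnum` fields are assumed names in each "nts" entry, the helper functions are hypothetical, only the eight standard residue names are treated as nucleotides, and insertion codes are ignored.

```python
import json

# Assumption: standard residue names only; modified nucleotides would need adding.
NUCLEOTIDE_NAMES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT"}


def input_residues(pdb_lines):
    """Collect (chain, resnum) for nucleotide residues in ATOM/HETATM records."""
    found = set()
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and len(line) >= 26:
            # Fixed PDB columns: resName 18-20, chainID 22, resSeq 23-26.
            if line[17:20].strip() in NUCLEOTIDE_NAMES:
                found.add((line[21], int(line[22:26])))
    return found


def detected_residues(dssr_json_text):
    """Collect (chain, resnum) for every entry of the "nts" list in the JSON text."""
    data = json.loads(dssr_json_text)
    return {(nt["chain_name"], int(nt["nt_resnum"])) for nt in data.get("nts", [])}


def discarded_residues(pdb_lines, dssr_json_text):
    """Residues seen in the input but absent from the "nts" list."""
    return sorted(input_residues(pdb_lines) - detected_residues(dssr_json_text))
```

With this, a single `--json` run per file should be enough: the JSON text printed by the command above and the lines of the input PDB are the only two inputs.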

HTH,

Xiang-Jun

ICdB · « **Reply #2 on:** March 08, 2018, 05:23:57 pm »

Hi Xiang-Jun,

Thanks a lot for your detailed reply, and for reminding me of the "nts" entry of the json output. That was very useful.

Best,
Isaure
