discarded nucleotides in json output

ICdB:
Hi,

First, many thanks for your great software suite. It is a real pleasure to work with such complete and convenient outputs for my parsing purposes. I am a strong believer in collaboration in methods development through connectable building blocks, and it is nice to have others with such a view in the RNA-modeling field :)

I am building specific structural fragment libraries for RNA docking, and I am trying to parse DSSR JSON outputs to automatically keep track of my fragments' characteristics (2D structure, interactions, ...). For this, I need to know which nucleotides were discarded because of, e.g., weird geometry. As I see it, a "&" is inserted in the "bseq" entry at chain breaks. But I can't find any indication of whether a nucleotide was discarded or the break was already present in the input PDB file, and retrieving this by comparing bseq to the input sequence can be ambiguous.
I am considering using the "Summary of structural features of xx nucleotides" section of the text output, by running DSSR with and without the "--json" option. Is there a DSSR option I could use to get this information directly from the JSON output and avoid double work (as I am analysing thousands of PDB files)?

Thanks in advance for your help,
Isaure C. de Beauchene

PS: I'm using version v1.7.2-2017nov20

xiangjun:
Hi,

Thanks for a thoughtful post.

--- Quote ---I am a strong believer in collaboration in methods development through connectable building blocks, and it is nice to have others with such a view in the RNA-modeling field :)
--- End quote ---

I cannot agree with you more. I am glad to know of another one of like mind across the Atlantic.

--- Quote ---As I see it, a "&" is inserted in the "bseq" entry at chain breaks. But I can't find any indication of whether a nucleotide was discarded or the break was already present in the input PDB file, and retrieving this by comparing bseq to the input sequence can be ambiguous.
--- End quote ---

Yes, the meaning of the symbol "&" in "bseq" of the DBN output is ambiguous. It could be due to:
(1) a switch of chains in "all_chains", as in a DNA duplex (e.g., 355d);
(2) missing atomic coordinates of nucleotides within a DNA/RNA chain, as in some X-ray crystal structures due to local disorder (e.g., 2fk6);
(3) abasic sites, which were not considered by default before DSSR 1.7.3-2017dec26;
(4) highly distorted bases, e.g., from some MD simulations, that fall outside the default cutoff.
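One way to sort these cases programmatically is to read each "&" in "bseq" against the neighbouring entries of "nts". The Python sketch below is a hypothetical helper, not part of DSSR, and it rests on assumptions you should check against your own "jq .nts" and "jq .dbn" output: that "bseq" sits under dbn.all_chains, that "nts" lists the nucleotides in the same order as the bases in "bseq", and that each entry has "nt_id", "chain_name" and an integer "nt_resnum". A change of chain across the "&" matches case (1); a gap in residue numbers within a chain points to cases (2) to (4), which still need the input file to be told apart.

```python
def classify_breaks(dssr):
    """Label each '&' in dbn.all_chains.bseq by the nucleotides flanking it.

    Hypothetical helper. Assumes (verify with jq) that "nts" lists the
    nucleotides in bseq order and that each entry has "nt_id",
    "chain_name" and an integer "nt_resnum".
    """
    bseq = dssr["dbn"]["all_chains"]["bseq"]
    nts = dssr["nts"]
    if len(bseq.replace("&", "")) != len(nts):
        raise ValueError("bseq and nts do not describe the same nucleotides")

    breaks = []
    k = 0  # nucleotides consumed so far, i.e. index of the next one in nts
    for ch in bseq:
        if ch != "&":
            k += 1
            continue
        if k == 0 or k == len(nts):
            continue  # a leading/trailing '&' has no neighbour on one side
        left, right = nts[k - 1], nts[k]
        if left["chain_name"] != right["chain_name"]:
            kind = "chain switch"  # case (1)
        elif right["nt_resnum"] - left["nt_resnum"] > 1:
            kind = "residue-number gap"  # cases (2)-(4): check the input file
        else:
            kind = "no numbering gap"  # numbering alone does not explain it
        breaks.append((left["nt_id"], right["nt_id"], kind))
    return breaks
```

Running DSSR once per structure with "x3dna-dssr -i=file.pdb --json > file.json" and loading the result with Python's json.load is enough to feed this function, so no second run is needed.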

Since you mention you're using v1.7.2-2017nov20, please update to the latest DSSR v1.7.4-2018jan30, which accounts for case (3) above.

--- Quote ---I am considering using the "Summary of structural features of xx nucleotides" section of the text output, by running DSSR with and without the "--json" option. Is there a DSSR option I could use to get this information directly from the JSON output and avoid double work (as I am analysing thousands of PDB files)?
--- End quote ---

The "Summary of structural features of xx nucleotides" section lists all the detected nucleotides. This information (plus more) is also available from the JSON output, as shown below:

--- Code: ---x3dna-dssr -i=1ehz.pdb --json | jq .nts
--- End code ---
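To find which input nucleotides did not make it into "nts" (for example, distorted bases as in case (4)), a rough approach is to collect the residues from the ATOM/HETATM records of the input file and subtract those listed in "nts". Residues with missing coordinates (case (2)) never appear in ATOM/HETATM records, so anything reported this way was present in the input but not detected by DSSR. The Python sketch below is a hypothetical helper, not part of DSSR; it assumes fixed-column PDB (not mmCIF) input, assumes each "nts" entry has "chain_name" and an integer "nt_resnum" (please verify with "jq .nts"), ignores insertion codes and alternate locations, and only considers the residue names in the hypothetical NUCLEOTIDE_NAMES set, so modified nucleotides must be added by hand.

```python
# Hypothetical helper, not part of DSSR. Standard residue names only;
# extend this set with modified nucleotides found in your data set.
NUCLEOTIDE_NAMES = {"A", "C", "G", "U", "DA", "DC", "DG", "DT", "DU"}


def input_residues(pdb_text, names=NUCLEOTIDE_NAMES):
    """Return (chain, resnum, resname) per residue in ATOM/HETATM records, in file order.

    Assumes fixed-column PDB format; insertion codes and altLocs are ignored,
    and repeated residues (e.g. in later MODELs) are reported once.
    """
    residues, seen = [], set()
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 26:
            continue
        resname = line[17:20].strip()  # columns 18-20
        if resname not in names:
            continue  # skips water, ions, amino acids, unlisted modifications
        key = (line[21].strip(), int(line[22:26]))  # chain ID, residue number
        if key not in seen:
            seen.add(key)
            residues.append(key + (resname,))
    return residues


def undetected_residues(pdb_text, dssr, names=NUCLEOTIDE_NAMES):
    """Input nucleotides with no matching (chain_name, nt_resnum) in DSSR's "nts"."""
    found = {(nt["chain_name"], nt["nt_resnum"]) for nt in dssr["nts"]}
    return [res for res in input_residues(pdb_text, names) if res[:2] not in found]
```

Keying on chain and residue number keeps the sketch independent of how "nt_id" is formatted, at the cost of missing residues that differ only by insertion code.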

HTH,

Xiang-Jun

ICdB:
Hi Xiang-Jun,

Thanks a lot for your detailed reply, and for reminding me of the "nts" entry of the json output. That was very useful.

Best,
Isaure

Funded by the NIH R24GM153869 grant on X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University