Hi,
Thanks for a thoughtful post.
I am a strong believer in collaboration in methods development through connectable building blocks, and it is nice to have others with such a view in the RNA-modeling field.
I cannot agree with you more. I am glad to know of another one of like mind across the Atlantic.
As I see it, a "&" is inserted in the "bseq" entry for chain breaks. But I can't find any indication of whether a nucleotide was discarded or the break was already present in the input PDB, and recovering this by comparing bseq to the input sequence can be ambiguous.
Yes, the meaning of the symbol "&" in "bseq" in the DBN output is ambiguous. It could be due to: (1) a switch of chains in "all_chains", as in a DNA duplex (e.g., 355d); (2) missing atomic coordinates of nucleotides within a DNA/RNA chain, as in some X-ray crystal structures with local disorder (e.g., 2fk6); (3) abasic sites, in DSSR versions before 1.7.3-2017dec26 (which did not consider them by default); or (4) highly distorted bases, as from some MD simulations, that fall outside the default cutoff.
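To tell these cases apart in batch, one option is to look at the nucleotides flanking each "&". The Python sketch below is illustrative only: `classify_breaks` is a hypothetical helper, and it assumes that the (chain, residue number) pairs in `nts` appear in the same order as the non-"&" characters of "bseq", that residue numbers carry no insertion codes, and that `input_residues` is the set of residues you parsed from the input coordinate file yourself. A gap whose skipped residues exist in the input points to case (3) or (4); a gap whose residues are absent from the input points to case (2).

```python
from typing import List, Set, Tuple

# (chain ID, residue number); insertion codes are ignored in this sketch
Residue = Tuple[str, int]


def classify_breaks(bseq: str, nts: List[Residue],
                    input_residues: Set[Residue]) -> List[str]:
    """Label each '&' in bseq by inspecting the nucleotides on either side.

    Hypothetical helper. Assumes nts[i] is the residue for the i-th
    non-'&' character of bseq, and input_residues holds every residue
    found in the input coordinate file.
    """
    labels: List[str] = []
    pos = 0  # nucleotides consumed so far, i.e. index of the next one in nts
    for ch in bseq:
        if ch != "&":
            pos += 1
            continue
        (chain1, num1), (chain2, num2) = nts[pos - 1], nts[pos]
        if chain1 != chain2:
            labels.append("chain switch")  # case (1)
            continue
        # Residue numbers between the two flanking nucleotides
        skipped = [(chain1, n) for n in range(num1 + 1, num2)]
        if not skipped:
            # Numbering alone does not explain this break
            labels.append("no numbering gap")
        elif any(r in input_residues for r in skipped):
            # Present in the input but not kept as a nucleotide: case (3) or (4)
            labels.append("discarded by DSSR")
        else:
            # Absent from the input file itself: case (2)
            labels.append("missing from input")
    return labels
```

A "no numbering gap" label only means the residue numbers cannot explain the break, so such entries would need a closer look at the coordinates.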
Since you mention you're using v1.7.2-2017nov20, please update to the latest DSSR, v1.7.4-2018jan30, which accounts for case (3) above.
I am considering using the "Summary of structural features of xx nucleotides" in the text output, by running DSSR with and without the "--json" option. Is there a DSSR option I could use to get this info directly from the JSON output and avoid double work (as I am analysing thousands of PDB files)?
The "Summary of structural features of xx nucleotides" output lists all the detected nucleotides. This information (plus more) is also available from the JSON output, as shown below:
x3dna-dssr -i=1ehz.pdb --json | jq .nts
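For batch processing, a single `--json` run per structure should then be enough. The Python sketch below assumes that each entry in the "nts" array carries "nt_id" and "summary" keys (check this against your DSSR version with `jq '.nts[0]'`), and that the JSON was saved by redirecting standard output, e.g. `x3dna-dssr -i=1ehz.pdb --json > 1ehz.json`. The names `nt_summaries` and `summaries_for_dir` are hypothetical helpers, not part of DSSR.

```python
import json
from pathlib import Path
from typing import Dict, Iterator, Tuple


def nt_summaries(dssr: dict) -> Dict[str, str]:
    """Map each nucleotide ID to its per-nucleotide summary string.

    Assumes a top-level "nts" list whose entries have "nt_id" and
    "summary" keys; a missing summary becomes an empty string.
    """
    return {nt["nt_id"]: nt.get("summary", "") for nt in dssr.get("nts", [])}


def summaries_for_dir(json_dir: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (file stem, summaries) for each *.json file in json_dir."""
    for path in sorted(Path(json_dir).glob("*.json")):
        with path.open() as fh:
            yield path.stem, nt_summaries(json.load(fh))
```

Parsing the saved JSON files this way avoids a second, text-output run of DSSR over the same thousands of structures.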
HTH,
Xiang-Jun