Mainly prompted by questions from Pascal (who has contributed the most posts among 3DNA users), here is a further note on DSSR.
It [DSSR] looks like a combined version of find_pair and analyze. Is that correct?
Of course, it seems that it is not possible to (re)construct NA structures with DSSR.
Yes, to a certain extent, you can think of DSSR as a combination of find_pair and analyze. The post "DSSR, what's it and why bother?" provides more background information. You are right: DSSR does not construct nucleic acid structures.
DSSR represents my (opinionated) view of what a program for the structural analysis of nucleic acids (RNA in particular) should or could be, based on my extensive experience supporting 3DNA, my increased knowledge of RNA structures, and my refined skills in C programming.
So first, why call it DSSR and not DSSNA, since it also works for DNA?
I think one should avoid RNA domination; it is possible to learn from both kinds of structures.
Thus, does DSSR really work for DNA?
Again, please read the post "DSSR, what's it and why bother?" carefully for my rationale. You may also notice that I put the word secondary in parentheses in the title of the software, "DSSR: Software for Defining the (Secondary) Structures of RNA". DSSR surely works for DNA and for DNA-protein complexes in the same way as it does for RNA. As mentioned in the release note, I tested DSSR against every nucleic-acid-containing structure in the PDB. Overall, the acronym DSSR captures the essential message I'd like to get across; it is short, and it parallels the well-respected DSSP program for proteins (among other things).
Then, as for formats: as I think I mentioned somewhere earlier, since I am processing the output files for a large number of structures, I appreciate it when there are spaces between fields (see the excerpt below).
base_id alpha beta gamma delta epsilon zeta e-z chi phase-angle sugar-type Zp Dp
1 A.C2649 --- 167.1 47.6 84.1 -146.6 -77.1 -69(BI) -160.5(anti) 12.9(C3'-endo) ~C3'-endo 4.41 4.66
2 A.U2650 -64.2 164.2 60.3 79.8 -154.5 -73.1 -81(BI) -167.2(anti) 21.3(C3'-endo) ~C3'-endo 4.40 4.55
I see your point, but the output file is mainly meant for visual examination by non-expert users, and the compact notation keeps the message succinct. Your parser should be flexible enough to handle such cases. Also see my reply to your initial thread.
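Although the labels are attached directly to the numbers, each one sits inside parentheses, so the fields are still easy to split. Below is a minimal Python sketch, assuming the layout of the excerpt above (a serial number, a nucleotide id, then twelve whitespace-separated fields, with --- for undefined values); the column list and function names are hypothetical, chosen for illustration, and not part of DSSR:

```python
import re
from typing import Dict, Optional, Tuple

# Assumed column order, taken from the header of the excerpt (after index and nt id).
COLUMNS = ["alpha", "beta", "gamma", "delta", "epsilon", "zeta",
           "e-z", "chi", "phase-angle", "sugar-type", "Zp", "Dp"]

# A number optionally followed by a parenthesized label, e.g. "-69(BI)" or "12.9(C3'-endo)".
FIELD_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(?:\((.+)\))?$")


def parse_field(token: str) -> Tuple[Optional[float], Optional[str]]:
    """Split one token into (value, label); either part may be None."""
    if token == "---":  # undefined value, e.g. alpha of the first nucleotide
        return None, None
    m = FIELD_RE.match(token)
    if m:
        return float(m.group(1)), m.group(2)  # group(2) is None without parentheses
    return None, token  # purely textual field such as "~C3'-endo"


def parse_torsion_line(line: str) -> Dict[str, object]:
    """Parse one data line into {"index", "nt", column -> (value, label)}."""
    tokens = line.split()
    if len(tokens) != 2 + len(COLUMNS):
        raise ValueError(f"unexpected number of fields: {len(tokens)}")
    record: Dict[str, object] = {"index": int(tokens[0]), "nt": tokens[1]}
    for name, token in zip(COLUMNS, tokens[2:]):
        record[name] = parse_field(token)
    return record
```

Keeping each label (BI, anti, C3'-endo) paired with its number lets the same parser serve both the numeric and the categorical information.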
And is there a need to write the sugar pucker twice in this file?
In my experience, the phase angle and the pucker classification are the most useful information for the sugar moiety. I repeated the sugar pucker alongside the commonly used backbone parameters for convenience; one can now see the backbone conformation at a glance.
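The phase angle P condenses the five endocyclic sugar torsions v0..v4 into a single number, and the pucker label is a binning of P. A minimal sketch, assuming the Altona-Sundaralingam definition and ten 36-degree bins starting with C3'-endo at P = 0; this is not DSSR's source code, and the broader sugar-type column (e.g. ~C3'-endo) is not modeled:

```python
import math
from typing import List, Tuple

# Ten 36-degree bins of the pseudorotation wheel, starting at P = 0
# (assumed to match the labels used in the torsion file).
PUCKERS = ["C3'-endo", "C4'-exo", "O4'-endo", "C1'-exo", "C2'-endo",
           "C3'-exo", "C4'-endo", "O4'-exo", "C1'-endo", "C2'-exo"]


def pseudorotation(v0: float, v1: float, v2: float, v3: float, v4: float) -> Tuple[float, float]:
    """Return (phase angle P, amplitude tm), in degrees, from sugar torsions v0..v4."""
    num = (v4 + v1) - (v3 + v0)
    den = 2.0 * v2 * (math.sin(math.radians(36.0)) + math.sin(math.radians(72.0)))
    p = math.degrees(math.atan2(num, den)) % 360.0  # map into [0, 360)
    tm = v2 / math.cos(math.radians(p))
    return p, tm


def pucker_name(p: float) -> str:
    """Classify a phase angle into one of the ten 36-degree pucker bins."""
    return PUCKERS[int(p // 36.0) % 10]


def ideal_torsions(p: float, tm: float) -> List[float]:
    """v_j = tm * cos(P + 144 * (j - 2)) for j = 0..4; useful for checking the code."""
    return [tm * math.cos(math.radians(p + 144.0 * (j - 2))) for j in range(5)]
```

With this binning, the phase angles 12.9 and 21.3 in the excerpt both fall into the C3'-endo bin, matching their labels.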
You named this file torsion, although there are sugar puckers in it. Thus it might be called torsion_puckers.dat or something else.
I see your point, but the file also contains Zp, Dp, and pseudo-torsion angles. I'd keep the name as is; it is just a convention to get used to.
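Every angle in this file, including the pseudo-torsions, comes down to the dihedral angle defined by four atoms. A minimal dependency-free sketch of that computation (IUPAC sign convention, result in degrees); the helper names are hypothetical, and the pseudo-torsion atom choices in the note below follow the common eta/theta definitions, assumed here rather than taken from this post:

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in [-180, 180] (IUPAC sign convention)."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))
```

For nucleotide i, eta would be dihedral(C4'[i-1], P[i], C4'[i], P[i+1]) and theta would be dihedral(P[i], C4'[i], P[i+1], C4'[i+1]), given atom coordinates as (x, y, z) tuples.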
For the non-pairing interactions: that is just a great feature. Previously you had two values for base overlap, one calculated using just ring atoms and the other using all base atoms. You could add this.
DSSR checks base-stacking interactions using all base atoms, and the reported base-overlap-area value is computed the same way. I will consider adding overlap areas based on ring atoms only.
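To make the "ring atoms" versus "all base atoms" distinction concrete, here is a minimal sketch of the atom-selection step only, assuming standard PDB atom names; computing the overlap area itself (intersecting the projected outlines of two stacked bases) is not shown, and the table and function name are hypothetical:

```python
from typing import List

# Base atoms of the five common nucleotides, assuming standard PDB names
# (older files may name thymine's methyl carbon C5M instead of C7).
BASE_ATOMS = {
    "A": ["N9", "C8", "N7", "C5", "C6", "N1", "C2", "N3", "C4", "N6"],
    "G": ["N9", "C8", "N7", "C5", "C6", "N1", "C2", "N3", "C4", "O6", "N2"],
    "C": ["N1", "C2", "N3", "C4", "C5", "C6", "O2", "N4"],
    "U": ["N1", "C2", "N3", "C4", "C5", "C6", "O2", "O4"],
    "T": ["N1", "C2", "N3", "C4", "C5", "C6", "O2", "O4", "C7"],
}
PURINE_RING = {"N9", "C8", "N7", "C5", "C6", "N1", "C2", "N3", "C4"}
PYRIMIDINE_RING = {"N1", "C2", "N3", "C4", "C5", "C6"}


def atoms_for_overlap(base: str, ring_only: bool) -> List[str]:
    """Atom names that would define a base's outline for an overlap calculation."""
    atoms = BASE_ATOMS[base]
    if not ring_only:
        return atoms  # ring atoms plus exocyclic atoms (N6, O6, N2, O2, ...)
    ring = PURINE_RING if base in ("A", "G") else PYRIMIDINE_RING
    return [a for a in atoms if a in ring]
```

The two choices differ only by the exocyclic atoms, which is why the two overlap values can differ noticeably for bases with bulky substituents.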
Why add the names of the chemical groups (hydroxyl, amino, imino, ...)? Again, this complicates reading, since some groups are named and others, like OP2, are not.
I would appreciate another presentation here.
I added the names of chemical groups (hydroxyl/amino/imino) for the convenience of those who are not that familiar with the chemistry of H-bonds. I have first-hand experience with such people (mostly physicists, mathematicians, and computer scientists turned bioinformaticians). I can add an option to turn the chemical group names off, but honestly, I really think you should revise your parser to handle them properly.
Take the following case as an example:
H-bonds[2]: "N3(imino)-N1[2.81]; O4(carbonyl)-N6(amino)[3.13]"
If your parser can extract the distance and the PDB atom names, it is not much more work to check for the parentheses () and strip out the chemical group names.
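For the H-bond string above, a minimal Python sketch of that approach: strip every parenthesized group name, then split the entries on ";" and read atom1-atom2[distance]. It assumes the format exactly as shown (no hyphens inside atom names, distances in square brackets); the function names are hypothetical:

```python
import re
from typing import List, Tuple

GROUP_RE = re.compile(r"\([^)]*\)")  # chemical-group names such as "(imino)"
HBOND_RE = re.compile(r"^([^-\s]+)-([^\[\s]+)\[([0-9.]+)\]$")  # atom1-atom2[dist]


def parse_hbonds(text: str) -> List[Tuple[str, str, float]]:
    """Parse 'A1-B1[d1]; A2-B2[d2]' into (atom1, atom2, distance) tuples,
    dropping any parenthesized chemical-group names first."""
    text = GROUP_RE.sub("", text)
    result = []
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        m = HBOND_RE.match(item)
        if m is None:
            raise ValueError(f"cannot parse H-bond entry: {item!r}")
        result.append((m.group(1), m.group(2), float(m.group(3))))
    return result


def hbonds_from_line(line: str) -> List[Tuple[str, str, float]]:
    """Extract the quoted part of an 'H-bonds[n]: "..."' line and parse it."""
    m = re.search(r'"([^"]*)"', line)
    return parse_hbonds(m.group(1)) if m else []
```

Applied to the example line, this yields [("N3", "N1", 2.81), ("O4", "N6", 3.13)].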
I haven't really checked, but is your base-pair numbering scheme consistent with the one you use in find_pair? It would be really nice if that were the case.
What do you mean by "base pair numbering scheme"? The serial numbers should not matter; a base pair is specified by its two constituent nucleotides (chain id, residue name and number, etc.).
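Comparing base pairs from two programs therefore comes down to matching them by their nucleotides rather than by serial numbers. A minimal sketch, assuming both lists have already been converted to the same nucleotide-id strings (DSSR-style ids such as A.C2649; converting find_pair output is not shown); the function names are hypothetical:

```python
from typing import Iterable, Tuple


def pair_key(nt1: str, nt2: str) -> Tuple[str, ...]:
    """Order-independent key for a base pair built from its two nucleotide ids."""
    return tuple(sorted((nt1, nt2)))


def same_pairs(pairs_a: Iterable[Tuple[str, str]],
               pairs_b: Iterable[Tuple[str, str]]) -> bool:
    """True if both lists contain the same base pairs, regardless of listing
    order, serial numbers, or which nucleotide of a pair is written first."""
    return {pair_key(*p) for p in pairs_a} == {pair_key(*p) for p in pairs_b}
```

Because the key ignores serial numbers and order, two outputs that list the same pairs in a different sequence still compare as equal.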
Also, I wanted to ask you about this, but now it seems to be done: you add various names to each base pair. That's great. Just a pointer to the various nomenclatures (Leontis-Westhof, Saenger, ...) would be helpful in the *.out files.
Advice taken: I will add a note in DSSR-beta-r10 (coming soon).
Is there a configuration file that would allow one to specify hydrogen-bond and other parameters, as in 3DNA?
I would really appreciate that.
To make DSSR self-contained, I've eliminated the configuration file. Overall, DSSR has refined algorithms for finding H-bonds, base pairs, helices, etc., and the defaults should work for the vast majority of cases. So regular users can treat DSSR as a black box and check the results against their domain knowledge and application needs.
DSSR also accepts command-line options to alter its default behavior. For example, you can use --hbond_d2=3.6 to set the upper limit of H-bond length to 3.6 Å instead of the default 4.0 Å. I am working on a manuscript that describes the details of the software.
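If you only want a quick look at what a tighter limit would remove from results already computed with the default, a post hoc filter on the reported distances gives a rough approximation. This is a hypothetical sketch, not DSSR's H-bond algorithm: it can only drop H-bonds, it cannot recover any beyond the 4.0 Å default, and base pairs identified under the looser cutoff are not re-evaluated, so rerunning DSSR with --hbond_d2 remains the reliable route:

```python
from typing import List, Tuple

HBond = Tuple[str, str, float]  # (atom1, atom2, distance in angstroms)


def filter_hbonds(hbonds: List[HBond], d2: float) -> List[HBond]:
    """Keep only H-bonds whose reported distance is <= d2 (an upper limit,
    analogous in spirit to --hbond_d2; applied after the fact)."""
    return [hb for hb in hbonds if hb[2] <= d2]
```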
HTH,
Xiang-Jun