-
Hi again,
Great program; I like all the improvements you've made so far.
Now I'm trying to parse the DSSR output using Python, and so far it is quite easy. However, I'm running into a bit of trouble when trying to parse the base identifiers.
Take "0.C309" from 1S72, for example: it is easy enough to split the strand from the base type/residue number, but separating C from 309 is more difficult.
Separating by chars vs. integers would be okay, but some alt. residues have numbers in them which makes it more difficult.
Would you consider adding another separator between the base type and the base number?
For example, "0.C_309" or the like?
Thanks
-
Thanks for your kind words about DSSR.
You wrote: "Separating by chars vs. integers would be okay, but some alt. residues have numbers in them which makes it more difficult."
Could you provide some specific cases to make your point clearer?
Indeed, the nt identifier involves more complications than the very simple case you mentioned. For example, DSSR also needs to account for the model number and insertion code.
You wrote: "Is there any way you would want to add another separator for base type from base number? Ex. "0.C_309" or the like?"
I'd like to keep the default settings for DSSR simple and succinct, aimed more at human readability than computer parsing. That said, I may consider adding an option to make the id string software-friendly.
Xiang-Jun
-
Okay, a separate option would be nice, but I understand if that's not what you intend for the output.
Xiang-Jun wrote: "Could you provide some specific cases to make your point clearer?"
Here is an example from PDB 1D9H: a modified base, U31, with residue number 16 on chain B.
DSSR displays it as "B.U31/16", with a slash separating the digits in the base name from the residue number. If all identifiers took that format, it would be easier for those who wish to parse the DSSR data.
-
Hi,
I am glad that you noticed this subtle point. Since the nucleotide is named U31, which ends in digits, the name would run together with the residue number 16 and become ambiguous. That's why I decided to add a slash (/) in between. I will write a post on the details of the nt id string in DSSR.
HTH,
Xiang-Jun
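For readers parsing the default (short) id in Python, here is a minimal sketch covering only the two shapes that appear in this thread, "0.C309" and "B.U31/16". It assumes the chain is everything before the first dot and that a slash, when present, marks the boundary between name and number. The function name split_short_id is hypothetical, and model numbers and insertion codes are not handled.

```python
import re

# Assumed shapes (taken only from the examples in this thread):
#   "chain.name/number"  e.g. "B.U31/16"  (name ends in digits)
#   "chain.nameNumber"   e.g. "0.C309"    (name ends in a non-digit)
# Model numbers and insertion codes are NOT handled by this sketch.
_SLASH_FORM = re.compile(r"^(?P<chain>[^.]*)\.(?P<name>.+)/(?P<number>-?\d+)$")
_PLAIN_FORM = re.compile(r"^(?P<chain>[^.]*)\.(?P<name>.*?\D)(?P<number>-?\d+)$")


def split_short_id(nt_id):
    """Return (chain, name, number) for a short nucleotide id string."""
    # Try the slash form first, since the slash removes any ambiguity.
    match = _SLASH_FORM.match(nt_id) or _PLAIN_FORM.match(nt_id)
    if match is None:
        raise ValueError(f"unrecognized nucleotide id: {nt_id!r}")
    return match["chain"], match["name"], int(match["number"])


print(split_short_id("0.C309"))
print(split_short_id("B.U31/16"))
```

Without the slash, a name ending in digits cannot be told apart from the residue number, which is exactly the ambiguity described above.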
-
I've updated DSSR to beta-r11-on-20130603, which adds a new option, --long-idstr, to delimit the fields of the nucleotide id string. The format is:
model-number.chain-id.nucleotide-name.nt-sequence-number.insertion-code
It has five fields, and some of them (model number, insertion code) can be empty. For example, with the new option, B.U31/16 in 1d9h would become ".B.U31.16." (the leading and trailing dots mark the empty model-number and insertion-code fields).
I believe this DSSR update would fulfill your needs -- please verify and report back how it goes.
Xiang-Jun
Updated on 2013-06-18: the new format is:
model-number.seqid.chain-id.nt-name.nt-number.insertion-code
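With dot-delimited fields, parsing in Python reduces to a split. The sketch below is one possible approach, not part of DSSR: the function name split_long_id and the dictionary keys are hypothetical labels for the fields listed above, and it assumes that no field itself contains a dot.

```python
# Hypothetical key names for the fields of the --long-idstr formats above.
FIVE_FIELDS = ("model", "chain", "name", "number", "icode")           # 2013-06-03
SIX_FIELDS = ("model", "seqid", "chain", "name", "number", "icode")   # 2013-06-18


def split_long_id(nt_id, fields=SIX_FIELDS):
    """Split a dot-delimited id into a dict; empty fields become ''."""
    # Assumption: no individual field contains a '.' character.
    parts = nt_id.split(".")
    if len(parts) != len(fields):
        raise ValueError(
            f"expected {len(fields)} fields, got {len(parts)}: {nt_id!r}"
        )
    return dict(zip(fields, parts))


# The five-field example from this thread (empty model number and insertion code):
print(split_long_id(".B.U31.16.", FIVE_FIELDS))
```

Values are kept as strings so that empty fields stay visible; convert the "number" entry with int() where a numeric residue number is needed.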
-
Works great, thanks!
Funded by the NIH R24GM153869 grant on X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids
Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University