
Topic: Parsing the base pair identifiers - separating base type from base number

Offline jyvdf3asdg2

  • non-commercial
  • with-posts
  • *
  • Posts: 24
    • View Profile
Hi again,

Great program, like all the improvements you've made so far.

I'm now trying to parse the DSSR output using Python, and so far it has been quite easy. However, I'm running into a bit of trouble when trying to parse the base identifiers.

For example, take "0.C309" from 1S72: it is easy enough to split the chain from the base type/residue number, but separating C from 309 is more difficult.

Separating by chars vs. integers would be okay, but some alt. residues have numbers in them which makes it more difficult.

Is there any way you would want to add another separator for base type from base number?

Ex. "0.C_309" or the like?

Thanks

Offline xiangjun

  • Administrator
  • with-posts
  • *****
  • Posts: 1646
    • View Profile
    • 3DNA homepage
Thanks for your kind words about DSSR.

Quote
Separating by chars vs. integers would be okay, but some alt. residues have numbers in them which makes it more difficult.
Could you provide some specific cases to make your point clearer?

Indeed, the nt identifier has more complications than the very simple case you mentioned. For example, DSSR also needs to take the model number and the insertion code into account.

Quote
Is there any way you would want to add another separator for base type from base number?
Ex. "0.C_309" or the like?
I'd like to keep the default DSSR settings simple and succinct, aimed more at human readability than at computer parsing. That said, I may consider adding an option to make the id string software-friendly.

Xiang-Jun

Offline jyvdf3asdg2

Okay, a separate option would be nice, but I understand if that's not what you intend for the output.

Quote
Could you provide some specific cases to make your point clearer?

Here is an example from PDB 1D9H: there is a modified base U31 at residue number 16 on chain A.

DSSR displays it as "B.U31/16", separating the numeral in the base type from the numeral of the residue number. If they all took that format, it would be nice for those who wish to parse the DSSR data.
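
For anyone parsing the default id strings, here is a minimal Python sketch covering only the two forms quoted in this thread ("0.C309" and "B.U31/16"). It assumes the chain comes before the first dot, that a slash (when present) separates the name from the number, and that otherwise the trailing run of digits is the residue number. The names `ShortNtId` and `parse_short_id` are made up for illustration, and model numbers and insertion codes are deliberately not handled.

```python
import re
from typing import NamedTuple


class ShortNtId(NamedTuple):
    chain: str
    name: str
    number: int


# Without a slash, treat the trailing digits as the residue number.
# The optional minus sign is an assumption; no such case appears in the thread.
_TAIL_NUMBER = re.compile(r"^(?P<name>.*?)(?P<number>-?\d+)$")


def parse_short_id(idstr: str) -> ShortNtId:
    """Split an id such as '0.C309' or 'B.U31/16' into chain, name, number."""
    chain, _, rest = idstr.partition(".")
    if "/" in rest:
        # Names ending in a digit are written as name/number.
        name, _, number = rest.partition("/")
        return ShortNtId(chain, name, int(number))
    match = _TAIL_NUMBER.match(rest)
    if match is None:
        raise ValueError(f"cannot parse nucleotide id: {idstr!r}")
    return ShortNtId(chain, match["name"], int(match["number"]))


print(parse_short_id("0.C309"))    # ShortNtId(chain='0', name='C', number=309)
print(parse_short_id("B.U31/16"))  # ShortNtId(chain='B', name='U31', number=16)
```

The non-greedy name group lets a name contain digits in the middle; only the final run of digits is taken as the number.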

Offline xiangjun

Hi,

I am glad that you noticed this subtle point. Since the nucleotide is named U31, which ends with digits, the name would obviously run into the residue number 16 if the two were written together. That's why I decided to add a slash (/) in between. I will write a post on the details of the nt id string in DSSR.

HTH,

Xiang-Jun

Offline xiangjun

I've updated DSSR to beta-r11-on-20130603, which contains a new option, --long-idstr, to delimit the fields of the nucleotide id string. The format is:

model-number.chain-id.nucleotide-name.nt-sequence-number.insertion-code
It has five fields, some of which (model number, insertion code) can be empty. For example, with the new option, "B.U31/16" in 1d9h becomes ".B.U31.16.".
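
As a rough illustration of how the five-field form could be consumed from Python, the sketch below splits on the dot and maps empty fields to None. It assumes that no field itself contains a dot and that the sequence number is an integer. `LongNtId` and `parse_long_id` are hypothetical names, not part of DSSR, and the sketch does not cover the revised six-field format given in the update further down.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LongNtId:
    model: Optional[str]   # None when the model-number field is empty
    chain: str
    name: str
    number: int
    icode: Optional[str]   # None when the insertion-code field is empty


def parse_long_id(idstr: str) -> LongNtId:
    """Parse the five-field form, e.g. '.B.U31.16.'."""
    parts = idstr.split(".")
    if len(parts) != 5:
        raise ValueError(
            f"expected 5 dot-separated fields, got {len(parts)}: {idstr!r}"
        )
    model, chain, name, number, icode = parts
    return LongNtId(model or None, chain, name, int(number), icode or None)


print(parse_long_id(".B.U31.16."))
# LongNtId(model=None, chain='B', name='U31', number=16, icode=None)
```

Because every field has a fixed position, a name that ends in digits (U31) no longer needs special handling.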

I believe this DSSR update would fulfill your needs -- please verify and report back how it goes.

Xiang-Jun


Updated on 2013-06-18: the new format is:
model-number.seqid.chain-id.nt-name.nt-number.insertion-code
« Last Edit: June 18, 2013, 06:26:19 pm by xiangjun »
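
Since the field list changed between the two versions above, a parser keyed on a tuple of field names may be easier to keep in step than one with hard-coded positions. The sketch below takes the six names from the 2013-06-18 line and assumes that fields are dot-separated and may be empty. `split_long_id` is a hypothetical helper, and the example string is assembled from placeholder values purely for illustration; it is not real DSSR output.

```python
from typing import Dict, Optional, Sequence

# Field names and order copied from the 2013-06-18 update line.
LONG_ID_FIELDS = (
    "model-number",
    "seqid",
    "chain-id",
    "nt-name",
    "nt-number",
    "insertion-code",
)


def split_long_id(
    idstr: str, fields: Sequence[str] = LONG_ID_FIELDS
) -> Dict[str, Optional[str]]:
    """Map each dot-separated field to its name; empty fields become None."""
    parts = idstr.split(".")
    if len(parts) != len(fields):
        raise ValueError(
            f"expected {len(fields)} fields, got {len(parts)}: {idstr!r}"
        )
    return {name: (value or None) for name, value in zip(fields, parts)}


# Placeholder values only: empty model number, seqid "1", chain "B",
# name "U31", number "16", empty insertion code.
example = ".".join(["", "1", "B", "U31", "16", ""])
print(split_long_id(example))
```

Passing a different `fields` tuple adapts the same function to the earlier five-field layout.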

Offline jyvdf3asdg2

Works great, thanks!

 

Created and maintained by Dr. Xiang-Jun Lu [律祥俊] (xiangjun@x3dna.org)
The Bussemaker Laboratory at the Department of Biological Sciences, Columbia University.