
Author Topic: PDBML output in 3DNA  (Read 53081 times)

Offline sdixit

  • Posts: 5
PDBML output in 3DNA
« on: October 04, 2006, 09:59:12 am »
Hello, Xiang-Jun,
As you are aware, I use your 3DNA program to construct DNA models based on MD data at my website
        http://humphry.chem.wesleyan.edu:8080/M ... Pages.html
       
When I construct an interesting but very long DNA structure (~15 KB), 3DNA gives me a PDB-format or Alchemy-format output file, but there is usually a coordinate overflow. This happens because of the PDB format limitation of F8.3 (or F9.4? in Alchemy) on the coordinate values in these files.
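To make the overflow concrete, here is a minimal Python sketch (not 3DNA code) of what an F8.3 field can hold. It assumes only that F8.3 corresponds to the printf-style `%8.3f` conversion; the helper name `fits_f8_3` is made up for illustration.

```python
def fits_f8_3(value):
    """Return True if value still fits in an 8-column field with 3 decimals."""
    return len("%8.3f" % value) <= 8

# The usable range is -999.999 to 9999.999; anything beyond it spills
# into the neighbouring column of the fixed-width record.
for coord in (9999.999, -999.999, 10000.0, -1000.0):
    print("%12.3f -> fits: %s" % (coord, fits_f8_3(coord)))
```

A structure that is thousands of base pairs long easily extends past 10000 Å along its helical axis, which is why the overflow shows up for long models.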
       
One alternative that might overcome this problem is to output the file in PDBML format instead of PDB.
       
Is there any way we could work this out? I will be glad to incorporate this extension in 3DNA.
       
Thanks.
Surjit

Offline xiangjun

  • Administrator
  • Posts: 1650
(No subject)
« Reply #1 on: October 04, 2006, 11:04:57 pm »
Hi Surjit,

Thanks for using 3DNA in your MD web server.

I am aware of the PDB F8.3 coordinate limitation. In the current version of 3DNA the xyz coordinates are reset if they are possibly out of range. However, as you see, the problem can't be solved within the PDB format.

In the upcoming release of 3DNA, v1.6, I've added a new command-line option, -w, for wide output in a pseudo-PDB file, as follows:
"%6s%8ld %4s %3s %c %6ld %15.5f%15.5f%15.5f
This is a simplified solution that requires little parsing work for those who need large structures. Of course, I will also consider a more standard, general approach, e.g., adopting PDBML as an output option, if feasible. Could you please provide me with more details on it? What is the minimum that needs to be done to convert PDB to PDBML? Let's work this out before the new 3DNA release.
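As an illustration of how little parsing such a wide record needs, here is a hedged Python sketch that reuses the format string above. The meaning of each field (record name, atom serial, atom name, residue name, chain ID, residue number, x, y, z) is an assumption inferred from the usual PDB ATOM layout, and the function names are hypothetical; Python's `%` operator accepts the C `l` length modifier and ignores it.

```python
# Format string quoted above; the field meanings are assumed, not confirmed.
WIDE_FMT = "%6s%8ld %4s %3s %c %6ld %15.5f%15.5f%15.5f"

def format_wide_atom(record, serial, atom, resname, chain, resnum, x, y, z):
    """Write one pseudo-PDB line with 15-column coordinate fields."""
    return WIDE_FMT % (record, serial, atom, resname, chain, resnum, x, y, z)

def parse_wide_xyz(line):
    """Read x, y, z back by fixed column slices (they start at column 33)."""
    return tuple(float(line[i:i + 15]) for i in (33, 48, 63))

line = format_wide_atom("ATOM", 1, "P", "A", "A", 1,
                        123456.78901, -98765.4321, 0.5)
print(line)
print(parse_wide_xyz(line))
```

Because every field has a fixed width, the coordinates can be read by column position even when large values leave no whitespace between neighbouring fields.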

3DNA is only available in binary form, per Rutgers University policy.

Best regards,

Xiang-Jun

Offline sdixit

(No subject)
« Reply #2 on: October 06, 2006, 01:25:05 pm »
Hi Xiang-Jun
Using the pseudo-PDB format sounds like one simple option. Maybe you could just use %15.3f for the floats; I am not sure the extra precision in %15.5f would help. But the general problem with this format is that most structure-viewing programs are not going to read it.

The central portion of the PDBML format that you would be interested in is like the following for each atom.
   
      <PDBx:atom_site id="1">
         <PDBx:group_PDB>ATOM</PDBx:group_PDB>
         <PDBx:type_symbol>N</PDBx:type_symbol>
         <PDBx:label_atom_id>N</PDBx:label_atom_id>
         <PDBx:label_alt_id xsi:nil="true" />
         <PDBx:label_comp_id>GLY</PDBx:label_comp_id>
         <PDBx:label_asym_id>A</PDBx:label_asym_id>
         <PDBx:label_entity_id>1</PDBx:label_entity_id>
         <PDBx:label_seq_id>1</PDBx:label_seq_id>
         <PDBx:Cartn_x>13.603</PDBx:Cartn_x>
         <PDBx:Cartn_y>47.057</PDBx:Cartn_y>
         <PDBx:Cartn_z>32.218</PDBx:Cartn_z>
         <PDBx:occupancy>1.00</PDBx:occupancy>
         <PDBx:B_iso_or_equiv>24.40</PDBx:B_iso_or_equiv>
         <PDBx:auth_seq_id>1</PDBx:auth_seq_id>
         <PDBx:auth_comp_id>GLY</PDBx:auth_comp_id>
         <PDBx:auth_asym_id>A</PDBx:auth_asym_id>
         <PDBx:auth_atom_id>N</PDBx:auth_atom_id>
         <PDBx:pdbx_PDB_model_num>1</PDBx:pdbx_PDB_model_num>
      </PDBx:atom_site>

Essentially it is the familiar PDB line with each atom property between those tags. Each atom is an atom_site, and all the atoms sit between
<PDBx:atom_siteCategory> and </PDBx:atom_siteCategory>
There are a lot of header details in the XML schema that you would not need to address.
As you see, there would be no format restrictions on the coordinate values.
The complete schema is available at
http://pdbml.rcsb.org/schema/pdbx.xsd
You can download any PDB file from RCSB in XML (PDBML) format for more examples. I am sure that with time all the graphics programs will start reading this format. At present there is at least one visualization program that I am aware of (jV, released by PDB Japan) that reads these PDBML files.
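For what it is worth, here is a minimal Python sketch of writing one atom_site element inside an atom_siteCategory with the standard library. It is only an illustration, not what 3DNA does: the namespace URI simply reuses the schema URL given above as a stand-in (an assumption; a real PDBML file declares its own namespace and header), and the helper names `q` and `add_atom_site` are made up.

```python
import xml.etree.ElementTree as ET

# Assumption: stand-in namespace URI; check a real PDBML file for the actual one.
PDBX_NS = "http://pdbml.rcsb.org/schema/pdbx.xsd"
ET.register_namespace("PDBx", PDBX_NS)

def q(tag):
    """Qualify a tag name with the PDBx namespace."""
    return "{%s}%s" % (PDBX_NS, tag)

def add_atom_site(category, atom_id, fields):
    """Append one atom_site whose children are (tag, value) pairs."""
    site = ET.SubElement(category, q("atom_site"), {"id": str(atom_id)})
    for name, value in fields:
        ET.SubElement(site, q(name)).text = str(value)
    return site

category = ET.Element(q("atom_siteCategory"))
add_atom_site(category, 1, [
    ("group_PDB", "ATOM"),
    ("type_symbol", "P"),
    ("label_atom_id", "P"),
    ("label_comp_id", "A"),
    ("label_asym_id", "A"),
    ("label_seq_id", 1),
    ("Cartn_x", "%.3f" % 123456.789),   # no fixed column width to overflow
    ("Cartn_y", "%.3f" % -98765.432),
    ("Cartn_z", "%.3f" % 0.5),
])
xml_text = ET.tostring(category, encoding="unicode")
print(xml_text)
```

The coordinate values are plain text between tags, so a value such as 123456.789 needs no special handling.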
Thanks.
Surjit

Offline xiangjun

(No subject)
« Reply #3 on: October 06, 2006, 10:45:36 pm »
Dear Surjit,

Thanks for your feedback. PDBML is certainly more general and standard, and is the way to go. I will look into this matter in more detail and make changes in 3DNA accordingly. The -wide option for PDB structure output will be changed to -xml. Hopefully, I can find time to get this done by next week.

Best regards,

Xiang-Jun

Offline sdixit

(No subject)
« Reply #4 on: October 09, 2006, 10:44:28 am »
Xiang-Jun,
That would be very valuable. Thanks a lot.
Surjit

Offline xiangjun

(No subject)
« Reply #5 on: October 10, 2006, 12:13:20 am »
Hi Surjit,

Hopefully you are still checking this forum, since I am getting back to you earlier than the end of the week, when I had hoped to have this done...

I now have the first version of the -xml option ready, currently only in the "fiber" utility program. Here are two sample files: fb_atcg.xml for a fiber B-DNA model of sequence ATCG, using the default PDBML format; and fb_atcg_simplified.xml for the simplified representation of the ATOM records. Please check whether they are valid, since I do not have software to display PDBML files. Of course, please let me know if you feel anything can be improved. I will then add this option to "rebuild", which is the program you are currently using in your DNA MD server.

Best regards,

Xiang-Jun

Offline sdixit

pdbml
« Reply #6 on: October 23, 2006, 10:14:14 am »
Hi, Xiang-Jun
That is great. I have checked the files with jV and they are read without any problem.
Thank you so much.
Surjit

Offline xiangjun

(No subject)
« Reply #7 on: October 23, 2006, 11:38:51 pm »
Hi Surjit,

Thanks for your help. I have added the -xml option to 'rebuild', which is used to build arbitrary nucleic acid structures based on a user's input file. Do you know how to add CONECT records in PDBML? I can't find the schema for it. The program 'rebuild' also gives complete linkage info in PDB CONECT records for structures with backbone. This is to avoid erroneous connections for distorted structures in some visualization programs like RasMol.

Adding the -xml option for PDBML output is a good example of how valuable it is when users like you get involved in 3DNA's further improvement and development. It is also the reason that, whenever possible, I try hard to answer users' questions as quickly and concretely as I can.

Best regards,

Xiang-Jun

Offline sdixit

(No subject)
« Reply #8 on: November 06, 2006, 02:34:31 pm »
Hi, Xiang-Jun
Here is my understanding of this issue. The original PDB format did not require CONECT records for the standard amino acids and nucleic acids. CONECT was mainly required for non-standard residues and HETATMs.
PDBML does not have any CONECT records. Instead it uses a Chemical Component Dictionary, which records all the non-standard residues (and small molecules) available in the PDB. This dictionary contains the connectivity information. A description of this dictionary is available at
http://deposit.pdb.org/cc_dict_tut.html
So, as I understand it, the reader would use the residue name in the PDBML file (or mmCIF) to cross-reference the Chemical Component Dictionary and obtain the connectivity information from there. The CONECT record will therefore not be present in the PDB file in either the XML or mmCIF format.
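A rough Python sketch of that cross-referencing step, as I picture it, follows. The `COMPONENT_BONDS` table is a hand-written stand-in for a real dictionary entry (heavy-atom bonds of glycine only), and `intra_residue_bonds` is a hypothetical helper; an actual reader would load the bond list from the dictionary itself.

```python
# Hand-written stand-in for one Chemical Component Dictionary entry
# (glycine, heavy atoms only); a real reader would parse the dictionary.
COMPONENT_BONDS = {
    "GLY": [("N", "CA"), ("CA", "C"), ("C", "O"), ("C", "OXT")],
}

def intra_residue_bonds(comp_id, atoms_present):
    """Map template bonds onto atom serials for one residue.

    atoms_present maps atom name -> serial number for the atoms that
    actually occur in the coordinate file.
    """
    bonds = []
    for a, b in COMPONENT_BONDS.get(comp_id, []):
        if a in atoms_present and b in atoms_present:
            bonds.append((atoms_present[a], atoms_present[b]))
    return bonds

print(intra_residue_bonds("GLY", {"N": 1, "CA": 2, "C": 3, "O": 4}))
```

Template bonds whose atoms are absent from the file (OXT here) are simply skipped, and a lookup of this kind only covers bonds within a residue.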
Thanks.
Surjit

Offline xiangjun

(No subject)
« Reply #9 on: November 13, 2006, 10:27:15 pm »
Hi Surjit,

Thanks for clarifying the CONECT record issue in PDBML. I am aware of the PDB HET Group Dictionary, and that is not the issue here. 3DNA-generated fiber models and arbitrary structures mostly contain only standard residues.

The geometry that links residues i and i+1 in "rebuild" structures may not be within the standard range, so RasMol, for example, will draw extraneous bonds and miss real ones. Here is an arbitrary example just to illustrate my point:

[1] to generate the input file in tst.inp
regular_dna tst.inp

Six base-pair parameters (Dft: 0s) in the order of:
Shear  Stretch  Stagger  Buckle  Propeller  Opening
0 0 0 34 45 0

Six step parameters in the order of:
Shift  Slide  Rise  Tilt  Roll  Twist
0 0 3.4 0 25 30

Input your base sequence with only A,C,G & T:
1. From a data file (complete sequence)
2. From keyboard (enter only the repeating sequence)
Your choice (1 or 2, Dft: 2):

Repeating unit (Dft: A):
Repeating unit: A
Number of repeats (Dft: 10):

[2] to generate the PDB file with standard B-DNA backbone conformation
cp_std BDNA
rebuild -atomic tst.inp tst.pdb

[3] display in RasMol, with correct connections:
rasmol tst.pdb

[4] manually delete the CONECT records in the above tst.pdb file (tst_nocnt.pdb), and use RasMol to display it again to see what you get.


Here are the links to the three files: tst.inp, tst.pdb, tst_nocnt.pdb for you to check against/play with.

In such cases, 3DNA makes an effort to generate the correct linkages both within and between residues. With full CONECT info, RasMol will display the structure properly; otherwise, its internally generated linkages may not be desirable. I am not sure how such an initially distorted structure would affect MM/MD calculations, though. This is one example of the details that have been taken into consideration in 3DNA.
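For readers who want to write explicit linkages themselves, here is a small Python sketch (not the 3DNA implementation) that turns a list of bonded atom-serial pairs into PDB CONECT lines. It assumes the standard CONECT layout: the record name, then the atom serial and up to four bonded serials, each in a 5-column field. The function name `conect_records` is hypothetical.

```python
def conect_records(bonds):
    """Build CONECT lines from (serial_a, serial_b) pairs.

    Each atom gets one line per group of up to four bonded partners,
    with every serial right-justified in a 5-column field.
    """
    neighbours = {}
    for a, b in bonds:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)

    lines = []
    for serial in sorted(neighbours):
        partners = sorted(neighbours[serial])
        for i in range(0, len(partners), 4):
            chunk = partners[i:i + 4]
            lines.append("CONECT%5d" % serial +
                         "".join("%5d" % p for p in chunk))
    return lines

for record in conect_records([(1, 2), (2, 3)]):
    print(record)
```

Since the bonds are listed explicitly, the viewer does not have to guess them from interatomic distances, which is where distorted linkages go wrong.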

Anyway, I have also modified "rebuild" to output PDBML, and I am testing the code and the new 3DNA homepage for a new release, four years after v1.5 ... I will keep you informed.

Best regards,

Xiang-Jun

PS. This thread is the most-visited in the 3DNA forum so far ... thanks for your involvement!

 

Funded by the NIH R24GM153869 grant on X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University