
Topic: Minimal required cif keys (cif files created with biopython not working)

Offline Bernhard10

Hello,

I noticed that DSSR does not support the CIF files created with Biopython.
Biopython's 'Bio.PDB.MMCIFIO' writes rather minimal CIF files that contain ONLY the following loop:

Quote
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.auth_seq_id
_atom_site.auth_asym_id
_atom_site.pdbx_PDB_model_num
ATOM   1   O  'O5'' . C   A ? 1 ? 23.308 21.309 18.480 1.0 17.2  1  A ' '

When I analyze these biopython-generated cif files with DSSR, I get the following error: "no nucleotides found."

Could you tell me which dictionary keys of the CIF format are essential for DSSR, so that I can decide whether I can extend Biopython's CIF writer to make it compatible with DSSR?

(Attached is an example file: 333D.cif read with Biopython and written to a new file. Tested with DSSR version v1.7.1-2017nov01.)
« Last Edit: September 05, 2018, 10:30:14 am by Bernhard10 »

Offline xiangjun

Re: Minimal required cif keys (cif files created with biopython not working)
« Reply #1 on: September 05, 2018, 11:18:05 am »
Hi,

Thanks for using DSSR and for posting your questions on the Forum.

For parsing mmCIF, DSSR needs only a minimal set of keys. In principle, the Biopython output you have should suffice. Your attached example mmCIF file, however, cannot be read by Jmol or PyMOL. With Jmol 14.27.1 (2017-12-11 09:38), the error message is: "Error reading file at end of file -1". When I load the file into PyMOL open-source version 1.8.7.0, no atoms show up at all.

Are you sure your example file is valid in mmCIF format? Please verify.

Best regards,

Xiang-Jun

Offline xiangjun

Re: Minimal required cif keys (cif files created with biopython not working)
« Reply #2 on: September 05, 2018, 11:39:14 am »
I've performed a bit more investigation of your attached 333D_biopython.cif file. My findings are as follows:

In 333D_biopython.cif, the header together with the first atom reads:

loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.auth_seq_id
_atom_site.auth_asym_id
_atom_site.pdbx_PDB_model_num
ATOM   1   O  'O5'' . C   A ? 1 ? 23.308 21.309 18.480 1.0 17.2  1  A ' '
### the above should be changed to the following:
ATOM   1   O  "O5'" . C   A ? 1 ? 23.308 21.309 18.480 1.0 17.2  1  A 1
......


The sugar atom name 'O5'' looks weird: it should be replaced with "O5'". The last item is pdbx_PDB_model_num, an integer. However, the ATOM record gives ' ' (a quoted space). The space character should be replaced with a number (e.g., 1). The revised version is the second ATOM line shown above.

After these two fixes for all the ATOM/HETATM records, DSSR works as expected. I've attached the revised mmCIF file for your reference.
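As a rough illustration of the two changes described above, here is a minimal Python sketch that patches one ATOM/HETATM row at a time. The helper name `patch_atom_line` and both regular expressions are assumptions made for this example; they are not part of DSSR or Biopython, and they only target the exact patterns seen in this file (a single-quoted atom name ending in a prime, and a quoted blank as the last field). A later reply in this thread notes that the original atom-name quoting is already valid CIF, so the first substitution is cosmetic.

```python
import re

# Single-quoted token whose content ends in an apostrophe, e.g. 'O5''
_PRIMED_NAME = re.compile(r"""(?<=\s)'([^\s"']*')'(?=\s)""")
# A quoted blank at the very end of the row, e.g. ' '
_BLANK_LAST = re.compile(r"""(['"])\s+\1\s*$""")


def patch_atom_line(line, model_num=1):
    """Return the row with the atom name re-quoted and a blank
    trailing model number replaced by model_num.
    Lines that are not ATOM/HETATM rows are returned unchanged."""
    if not line.startswith(("ATOM", "HETATM")):
        return line
    body = line.rstrip("\r\n")
    ending = line[len(body):]          # keep the original line ending
    body = _PRIMED_NAME.sub(r'"\1"', body)
    body = _BLANK_LAST.sub(str(model_num), body)
    return body + ending


if __name__ == "__main__":
    row = "ATOM   1   O  'O5'' . C   A ? 1 ? 23.308 21.309 18.480 1.0 17.2  1  A ' '"
    print(patch_atom_line(row))
```

Applying the function to every line of the file and writing the result back out would give a patched copy; rows whose last field is already an integer are left alone.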

Best regards,

Xiang-Jun

Offline Bernhard10

Re: Minimal required cif keys (cif files created with biopython not working)
« Reply #3 on: September 06, 2018, 06:04:38 am »
Thanks for the help.

It seems there is a bug in Biopython. I have submitted a bug report and I'm testing the fix: https://github.com/biopython/biopython/issues/1784

Offline Bernhard10

Re: Minimal required cif keys (cif files created with biopython not working)
« Reply #4 on: September 06, 2018, 07:50:15 am »
It seems that 'O5'' is actually correct.

The error was caused by using a non-integer model number.
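For readers wondering why 'O5'' is acceptable: in CIF, a quote character closes a quoted value only when it is followed by whitespace (or the end of the line), so the inner apostrophe is part of the value. The sketch below is a simplified, assumed tokenizer for a single data row; `split_cif_row` and `model_num_ok` are hypothetical helper names, and the code ignores multi-line text fields and comments that a real mmCIF parser would handle.

```python
def split_cif_row(line):
    """Split one CIF data row into values, honoring quoted values.
    A quote ends a value only if followed by whitespace or end of line."""
    tokens = []
    i, n = 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1
            continue
        if ch in "'\"":
            j = i + 1
            while j < n:
                at_end = (j + 1 == n) or line[j + 1].isspace()
                if line[j] == ch and at_end:
                    break
                j += 1
            tokens.append(line[i + 1:j])   # value without its quotes
            i = j + 1
        else:
            j = i
            while j < n and not line[j].isspace():
                j += 1
            tokens.append(line[i:j])
            i = j
    return tokens


def model_num_ok(value):
    """True if the value is a plain non-negative integer."""
    return value.isdigit()


if __name__ == "__main__":
    row = "ATOM   1   O  'O5'' . C   A ? 1 ? 23.308 21.309 18.480 1.0 17.2  1  A ' '"
    values = split_cif_row(row)
    print(repr(values[3]), repr(values[-1]), model_num_ok(values[-1]))
```

Under these assumptions the row splits into 18 values, matching the 18 keys in the loop header; the atom name comes out as O5' and the last value is a single space, which fails the integer check.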

Offline xiangjun

Re: Minimal required cif keys (cif files created with biopython not working)
« Reply #5 on: September 06, 2018, 09:54:01 am »
Thanks for your follow-ups.

It is always a good idea to adhere to a standard format as much as possible. That makes downstream analysis simpler.

I've revised the mmCIF parser in the DSSR v1.7.9-2018sep06 release. It is more tolerant of input mmCIF files than previous versions. As a result, DSSR now works with the Biopython-produced 333D_biopython.cif file you originally attached, without any modifications.

Please download DSSR v1.7.9-2018sep06. Have a try and report back how it works.

Best regards,

Xiang-Jun

 

Funded by X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids (R24GM153869)

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University