Minimal required cif keys (cif files created with biopython not working)

Bernhard10:
Hello,

I noticed that the cif files created with biopython are not supported by DSSR.
Biopython's `Bio.PDB.MMCIFIO` writes rather minimal cif files, which contain ONLY the following loop:

--- Quote ---
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.auth_seq_id
_atom_site.auth_asym_id
_atom_site.pdbx_PDB_model_num
ATOM 1 O 'O5'' . C A ? 1 ? 23.308 21.309 18.480 1.0 17.2 1 A ' '

--- End quote ---

When I analyze these biopython-generated cif files with DSSR, I get the following error: "no nucleotides found."

Could you tell me which dictionary keys of the cif format are essential for DSSR, so that I can decide whether to extend biopython's cif writer to make it compatible with DSSR?

(Attached is an example file: 333D.cif read with biopython and written to a new file. Tested with DSSR version v1.7.1-2017nov01.)

xiangjun:
Hi,

Thanks for using DSSR and for posting your questions on the Forum.

To parse mmCIF, DSSR needs only a minimal set of keys. In principle, the Biopython output you have should suffice. Your attached example mmCIF file, however, cannot be read by Jmol or PyMOL. With Jmol 14.27.1 (2017-12-11 09:38), the error message is: "Error reading file at end of file -1". When I load the file into PyMOL open-source version 1.8.7.0, no atoms show up at all.

Are you sure your example file is valid in mmCIF format? Please verify.

Best regards,

Xiang-Jun

xiangjun:
I've looked further into your attached 333D_biopython.cif file. My findings are as follows:

In 333D_biopython.cif, the header together with the first atom reads:

loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.auth_seq_id
_atom_site.auth_asym_id
_atom_site.pdbx_PDB_model_num
ATOM 1 O 'O5'' . C A ? 1 ? 23.308 21.309 18.480 1.0 17.2 1 A ' '
### the above should be changed to the following:
ATOM 1 O "O5'" . C A ? 1 ? 23.308 21.309 18.480 1.0 17.2 1 A 1
......

The sugar atom name 'O5'' looks weird: it should be replaced with "O5'". The last item is pdbx_PDB_model_num, which must be an integer. However, the ATOM record gives ' ' (a quoted space). The space character should be replaced with a number (e.g., 1). The revised record is shown above, after the ### line.

After these two fixes for all the ATOM/HETATM records, DSSR works as expected. I've attached the revised mmCIF file for your reference.
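The model-number fix can be sketched in a few lines of Python. This is only an illustration, not the procedure used to produce the attached file: the helper name `fix_model_num` is hypothetical, the default model number 1 assumes a single-model structure, and the code assumes that pdbx_PDB_model_num is the last column of each record, as in the loop shown above.

```python
def fix_model_num(line: str, model: int = 1) -> str:
    """Replace a blank, quoted model number at the end of an atom record.

    Assumes pdbx_PDB_model_num is the LAST column (as in the loop above)
    and that every atom belongs to the same model (assumed default: 1).
    """
    stripped = line.rstrip()
    ending = line[len(stripped):]  # keep the original trailing newline, if any
    if not stripped.startswith(("ATOM", "HETATM")):
        return line  # loop_, key names and other lines pass through unchanged
    for blank in ("' '", '" "'):
        if stripped.endswith(blank):
            return stripped[: -len(blank)] + str(model) + ending
    return line  # the record already ends in something other than a blank


record = "ATOM 1 O 'O5'' . C A ? 1 ? 23.308 21.309 18.480 1.0 17.2 1 A ' '"
print(fix_model_num(record))
# ATOM 1 O 'O5'' . C A ? 1 ? 23.308 21.309 18.480 1.0 17.2 1 A 1
```

Applied to every line of a file, this leaves the header untouched and rewrites only those ATOM/HETATM records whose final value is a quoted blank.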

Best regards,

Xiang-Jun

Bernhard10:
Thanks for the help.

It seems there is a bug in biopython. I have submitted a bug report and am testing the fix: https://github.com/biopython/biopython/issues/1784

Bernhard10:
It seems that 'O5'' is actually correct.

The error was caused by using a non-integer model number.
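The reason 'O5'' is acceptable is the CIF quoting rule: a quote character closes a quoted value only when it is followed by whitespace or the end of the line. The sketch below illustrates that rule with a small, hypothetical single-line tokenizer (`split_cif_line` is not part of biopython or DSSR, and it ignores multi-line text fields and comments). It shows that the atom name parses cleanly while the model number is not an integer.

```python
def split_cif_line(line: str) -> list:
    """Split one CIF data line into values, honouring quoted strings."""
    values, i, n = [], 0, len(line)
    while i < n:
        ch = line[i]
        if ch.isspace():
            i += 1
        elif ch in "'\"":
            # A quote closes the value only if whitespace or end of line follows.
            j = i + 1
            while j < n and not (
                line[j] == ch and (j + 1 == n or line[j + 1].isspace())
            ):
                j += 1
            values.append(line[i + 1 : j])
            i = j + 1
        else:
            # Unquoted value: runs up to the next whitespace character.
            j = i
            while j < n and not line[j].isspace():
                j += 1
            values.append(line[i:j])
            i = j
    return values


record = "ATOM 1 O 'O5'' . C A ? 1 ? 23.308 21.309 18.480 1.0 17.2 1 A ' '"
fields = split_cif_line(record)
atom_name, model_num = fields[3], fields[-1]
print(len(fields))                  # 18, one value per _atom_site key
print(repr(atom_name))              # "O5'"
print(model_num.strip().isdigit())  # False: the model number is blank
```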

Created and maintained by Dr. Xiang-Jun Lu [律祥俊] (xiangjun@x3dna.org)
The Bussemaker Laboratory at the Department of Biological Sciences, Columbia University.