Questions and answers => RNA structures (DSSR) => Topic started by: mauricio esguerra on May 07, 2015, 10:11:05 am

Title: cif file compatibility?
Post by: mauricio esguerra on May 07, 2015, 10:11:05 am

Hi Xiang-Jun,

I'm wondering if you have any tips or tricks on what to do now that the cif format is the official PDB format and is giving a lot of us a hard time adjusting our analysis protocols.
I wanted to use get_part to easily split the protein from the RNA in the latest structure of the mitochondrial ribosome from Ramakrishnan's lab, and I'm having a hard time finding an efficient way to do it.

Any recommendations?

Thank you,

Mauricio

Title: Re: cif file compatibility?
Post by: xiangjun on May 07, 2015, 10:47:46 am

Hi Mauricio,

Nice to hear from you! I feel your pain on dealing with PDBx/mmCIF files. For 3DNA v2.2, I did try but decided not to add a parser for PDBx/mmCIF -- too many changes in the v2.x codebase. In my current thinking, v2.2 is the last release of the 3DNA version 2 series. I will keep maintaining v2.2 and fix bugs as reported. An up-to-date User Manual is coming soon, but no more new features.

To give you another incentive to use DSSR more, I can add a new option that extracts RNA from an input PDB or PDBx/mmCIF file and outputs a PDB file for 3DNA v2.2. Will that make sense to you?

Best regards,

Xiang-Jun

Title: Re: cif file compatibility?
Post by: mauricio esguerra on May 07, 2015, 11:36:58 am

Hi Xiang-Jun,

That would be great!

I have tried using Open Babel, CIFTr (cif translator), and cif2pdb, and have tried reading the cif format with mdtraj in the hope of translating it to pdb, but everything has failed so far.

Something very simple, such as aligning the bacterial ribosome (4v63.cif) and the human mitochondrial one (3j9m), becomes very difficult.

Once more, having this additional feature in dssr would be great!

Thank you,

Mauricio

Title: Re: cif file compatibility?
Post by: xiangjun on May 07, 2015, 11:43:51 am

Quote

I have tried using Open Babel, CIFTr (cif translator), and cif2pdb, and have tried reading the cif format with mdtraj in the hope of translating it to pdb, but everything has failed so far.

That's tough to hear, but quite real, isn't it?

Quote

Something very simple, such as aligning the bacterial ribosome (4v63.cif) and the human mitochondrial one (3j9m), becomes very difficult.

Simple things should not become that difficult. It should be the other way around! When did the tools that should serve us become the master?

Quote

Once more, having this additional feature in dssr would be great!

Will get it done by tomorrow, and report back along this thread.

Best regards,

Xiang-Jun

Title: Re: cif file compatibility?
Post by: xiangjun on May 08, 2015, 02:25:36 pm

Hi Mauricio,

As promised yesterday, I have added a new (currently undocumented) option --select to DSSR that extracts nucleotides in an input PDB or PDBx/mmCIF file. The output file is in PDB format, by default. There are quite a few variations built into the new --select option. I will elaborate after your verification that it works in your case.

Please download DSSR again from the 3DNA Forum (right now, the download site still says v1.2.6-2015mar28, and it includes updates only for Linux and Mac OS X, not the Windows version). Please report back on how it goes!

Xiang-Jun

Title: Re: cif file compatibility?
Post by: mauricio esguerra on May 11, 2015, 05:25:11 am

Hi Xiang-Jun,

I have tried using:

Code:

dssr --select=nt -i=3j9m.cif -o=3j9m_rna.pdb

for the recent structure of the full human mitochondrial ribosome.
Even though it's quite a large structure it finishes in 31 seconds.

It almost gets it right but it clumps parts of the residues into a column, or at least that is what pymol shows (see attached image at the end).

It would also be useful to get the protein part, sort of like with get_part -p instead of -n.

Thanks,

Mauricio

Title: Re: cif file compatibility?
Post by: xiangjun on May 11, 2015, 08:51:23 am

Hi Mauricio,

Thanks for trying out the new --select option in DSSR and reporting back.

Quote

For the recent structure of the full human mitochondrial ribosome.
Even though it's quite a large structure it finishes in 31 seconds.

It almost gets it right but it clumps parts of the residues into a column, or at least that is what pymol shows (see attached image at the end).

Is it due to the limitation of the PDB format for such a large structure? Please also play around with some typical small structures to see if it works as expected (it should).

Quote

It would also be useful to get the protein part, sort of like with get_part -p instead of -n.

Try --select=protein or --select=aa. Other options are --select=dna or --select=rna. The default is --select=nt which can be shortened to --select. As noted in the User Manual, UPPER or MixED cases are also accepted (e.g., --select=Protein).
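For splitting one structure into several parts, the selections above can be scripted. Here is a minimal sketch in Python, assuming the -i= and -o= forms used earlier in this thread. The helper name build_select_command and the output file naming are made up for illustration and are not part of DSSR.

```python
import os

# Selections mentioned in this thread for the DSSR --select option.
SELECTIONS = ("nt", "protein", "aa", "dna", "rna")


def build_select_command(infile, selection="nt", outfile=None):
    """Return the argument list for one `dssr --select=...` run.

    Hypothetical helper: it only assembles the command, it does not run DSSR.
    """
    sel = selection.lower()  # normalize here; upper or mixed case is fine on input
    if sel not in SELECTIONS:
        raise ValueError(f"unknown selection: {selection!r}")
    if outfile is None:
        # Assumed naming scheme: 3j9m.cif + rna -> 3j9m_rna.pdb
        stem = os.path.splitext(infile)[0]
        outfile = f"{stem}_{sel}.pdb"
    return ["dssr", f"--select={sel}", f"-i={infile}", f"-o={outfile}"]


if __name__ == "__main__":
    # Print the two commands needed to split 3j9m into RNA and protein.
    for sel in ("rna", "protein"):
        print(" ".join(build_select_command("3j9m.cif", sel)))
```

Each returned list can be passed to subprocess.run(cmd, check=True) once the dssr executable is on your PATH.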

I will take a look at 3j9m later when I have more time.

Best regards,

Xiang-Jun

Title: Re: cif file compatibility?
Post by: xiangjun on May 12, 2015, 10:34:03 pm

Hi Mauricio,

Quote

It almost gets it right but it clumps parts of the residues into a column, or at least that is what pymol shows (see attached image at the end).

I have looked into 3j9m and did not find anything obviously wrong with DSSR-extracted RNA components in PDB format. Did you check the atom ids of the residues clumped into the (right) column as shown in PyMOL? Do they really correspond to any ATOM/HETATM records in the PDB file that DSSR generated?

Note that in the 3j9m_rna.pdb file you produced, the "Atom serial number" in columns 7-11 is based on CIF "_atom_site.id", and is not necessarily consecutive. You could write a short script to make the "Atom serial number" field consecutive from 1 to n. Please have a try and report back if that does the trick.
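Such a script could look roughly like the sketch below, in Python. It assumes that only ATOM/HETATM records need new serial numbers and that the file has at most 99999 atoms; the function name renumber_serials is just for illustration.

```python
import sys


def renumber_serials(lines):
    """Yield PDB lines with ATOM/HETATM serial numbers (columns 7-11) set to 1..n."""
    serial = 0
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            serial += 1
            if serial > 99999:
                # A 5-character field cannot hold a larger number.
                raise ValueError("more than 99999 atoms: serial does not fit in columns 7-11")
            # Columns 1-6 hold the record name, columns 7-11 the serial number.
            line = f"{line[:6]}{serial:>5d}{line[11:]}"
        yield line  # every other record is passed through unchanged


def main(argv):
    if len(argv) != 3:
        print("usage: renumber.py input.pdb output.pdb")
        return
    with open(argv[1]) as src, open(argv[2], "w") as dst:
        dst.writelines(renumber_serials(src))


if __name__ == "__main__":
    main(sys.argv)
```

This sketch leaves TER and CONECT records alone; if a file contains them, they would need matching updates.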

Xiang-Jun

Title: Re: cif file compatibility?
Post by: mauricio esguerra on May 28, 2015, 04:47:20 am

Hi Xiang-Jun,

Quote

Note that in the 3j9m_rna.pdb file you produced, the "Atom serial number" in columns 7-11 is based on CIF "_atom_site.id", and is not necessarily consecutive. You could write a short script to make the "Atom serial number" field consecutive from 1 to n. Please have a try and report back if that does the trick.

Yes, that solves the problem for 3j9m. But for structures with more than 99999 atoms, for example the whole 70S ribosome of Thermus thermophilus (pdbid=4v63), which has 200836 atoms, the same problem comes back because atom numbers repeat in columns 7-11.

Although much harder and messier to implement, I think the ideal for users would be the same functionality as get_part, but producing .cif output.

Thanks,

Mauricio

Title: Re: cif file compatibility?
Post by: xiangjun on May 28, 2015, 07:29:19 am

Quote

Yes, that solves the problem for 3j9m. But for structures with more than 99999 atoms, for example the whole 70S ribosome of Thermus thermophilus (pdbid=4v63), which has 200836 atoms, the same problem comes back because atom numbers repeat in columns 7-11.

I am glad you confirmed that, for 3j9m, the non-consecutive atom serial numbers were what caused the visualization problem in PyMOL. Cases with > 99999 atoms are another story: the serial number no longer fits in the five-column field, so the PDB format is clearly no longer applicable. As I mentioned previously, 3DNA v2.x is not PDBx/mmCIF compatible.
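To see up front whether an extracted PDB file runs into this limit, one can count the ATOM/HETATM records and look for repeated serial strings. The following is only a sketch in Python; the function name serial_report and its return layout are my own choices for illustration.

```python
from collections import Counter

# Largest integer that fits in the 5-character serial field (columns 7-11).
MAX_PDB_SERIAL = 99999


def serial_report(lines):
    """Return (atom count, repeated serial strings, fits-in-PDB flag) for PDB lines."""
    serials = Counter(
        line[6:11] for line in lines if line.startswith(("ATOM  ", "HETATM"))
    )
    n_atoms = sum(serials.values())
    # Serial strings that appear on more than one ATOM/HETATM record.
    repeated = sorted(s for s, count in serials.items() if count > 1)
    return n_atoms, repeated, n_atoms <= MAX_PDB_SERIAL


if __name__ == "__main__":
    demo = [
        "ATOM      1  P     G A   1\n",
        "ATOM      2  OP1   G A   1\n",
        "ATOM      1  P     G B   1\n",  # serial 1 used a second time
    ]
    print(serial_report(demo))
```

With the 200836 atoms of 4v63, the last value would be False, since 200836 > 99999.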

Quote

Although much harder and messier to implement, I think the ideal for users would be the same functionality as get_part, but producing .cif output.

Are you asking for .cif output from the DSSR --select option? That's not a problem. But then you would still need to parse the .cif files yourself, and as I understand it, that's not what this thread is about.

Xiang-Jun

Funded by the NIH R24GM153869 grant on X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University