
Topic: cif file compatibility?

Offline mauricio esguerra

  • with-posts
  • *
  • Posts: 48
    • View Profile
    • http://mesguerra.org
cif file compatibility?
« on: May 07, 2015, 10:11:05 am »
Hi Xiang-Jun,

I'm wondering if you have any tips or tricks on what to do now that CIF is the official PDB format and is giving a lot of us a hard time adjusting our analysis protocols.
I wanted to use get_part to easily split protein from RNA in the latest structure of the mitochondrial ribosome from Ramakrishnan's lab, and I'm having a hard time finding an efficient way to do it.

Any recommendations?

Thank you,

Mauricio

Offline xiangjun

  • Administrator
  • with-posts
  • *****
  • Posts: 1650
    • View Profile
    • 3DNA homepage
Re: cif file compatibility?
« Reply #1 on: May 07, 2015, 10:47:46 am »
Hi Mauricio,

Nice to hear from you! I feel your pain on dealing with PDBx/mmCIF files. For 3DNA v2.2, I did try but decided not to add a parser for PDBx/mmCIF -- too many changes in the v2.x codebase. In my current thinking, v2.2 is the last release of the 3DNA version 2 series. I will keep maintaining v2.2 and fix bugs as reported. An up-to-date User Manual is coming soon, but no more new features.

To give you another incentive to use DSSR more, I can add a new option that extracts RNA from an input PDB or PDBx/mmCIF file and outputs a PDB file for 3DNA v2.2. Will that make sense to you?

Best regards,

Xiang-Jun

Offline mauricio esguerra

Re: cif file compatibility?
« Reply #2 on: May 07, 2015, 11:36:58 am »
Hi Xiang-Jun,

That would be great!

I have tried using Open Babel, CIFTr (CIF translator), and cif2pdb, and have also tried reading the CIF format with mdtraj in the hope of translating it to PDB, but all have failed so far.

Something as simple as aligning the bacterial ribosome (4v63.cif) with the human mitochondrial one (3j9m) becomes very difficult.

Once more, having this additional feature in dssr would be great!

Thank you,

Mauricio




Offline xiangjun

Re: cif file compatibility?
« Reply #3 on: May 07, 2015, 11:43:51 am »
Quote
I have tried using Open Babel, CIFTr (CIF translator), and cif2pdb, and have also tried reading the CIF format with mdtraj in the hope of translating it to PDB, but all have failed so far.
That's tough to hear, but quite real, isn't it?

Quote
Something as simple as aligning the bacterial ribosome (4v63.cif) with the human mitochondrial one (3j9m) becomes very difficult.
Simple things should not become that difficult. It should be the other way around! When did the tools that should serve us become the master?

Quote
Once more, having this additional feature in dssr would be great!
Will get it done by tomorrow, and report back in this thread.

Best regards,

Xiang-Jun

« Last Edit: May 07, 2015, 11:46:40 am by xiangjun »

Offline xiangjun

Re: cif file compatibility?
« Reply #4 on: May 08, 2015, 02:25:36 pm »
Hi Mauricio,

As promised yesterday, I have added a new (currently undocumented) option --select to DSSR that extracts nucleotides from an input PDB or PDBx/mmCIF file. The output file is in PDB format by default. There are quite a few variations built into the new --select option. I will elaborate once you have verified that it works in your case.

Please download DSSR again from the 3DNA Forum (right now, the download site still says v1.2.6-2015mar28, and it includes updates only for Linux and Mac OS X, not the Windows version). Please report back on how it goes!

Xiang-Jun

Offline mauricio esguerra

Re: cif file compatibility?
« Reply #5 on: May 11, 2015, 05:25:11 am »
Hi Xiang-Jun,

I have tried using:

Code:
dssr --select=nt -i=3j9m.cif -o=3j9m_rna.pdb
For the recent structure of the full human mitochondrial ribosome.
Even though it's quite a large structure it finishes in 31 seconds.

It almost gets it right but it clumps parts of the residues into a column, or at least that is what pymol shows (see attached image at the end).

It would also be useful to get the protein part, sort of like with get_part -p instead of -n.

Thanks,

Mauricio

Offline xiangjun

Re: cif file compatibility?
« Reply #6 on: May 11, 2015, 08:51:23 am »
Hi Mauricio,

Thanks for trying out the new --select option in DSSR and reporting back.

Quote
For the recent structure of the full human mitochondrial ribosome.
Even though it's quite a large structure it finishes in 31 seconds.

It almost gets it right but it clumps parts of the residues into a column, or at least that is what pymol shows (see attached image at the end).
Is it due to the limitation of the PDB format for such a large structure? Please also play around with some typical small structures to see if it works as expected (it should).

Quote
It would also be useful to get the protein part, sort of like with get_part -p instead of -n.
Try --select=protein or --select=aa. Other options are --select=dna or --select=rna. The default is --select=nt which can be shortened to --select. As noted in the User Manual, UPPER or MixED cases are also accepted (e.g., --select=Protein).
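For scripting, the selections listed above could be wrapped in a small helper. This is only a sketch: the names `build_select_command` and `split_structure` and the output file naming are made up for illustration, and the command form simply follows the `dssr --select=nt -i=... -o=...` call used earlier in this thread. It assumes a `dssr` executable is on the PATH.

```python
import subprocess

# Selections mentioned in this thread for the --select option.
KNOWN_SELECTIONS = {"nt", "protein", "aa", "dna", "rna"}


def build_select_command(selection, infile, outfile, exe="dssr"):
    """Return the argument list for one 'dssr --select=...' run (hypothetical helper)."""
    if selection.lower() not in KNOWN_SELECTIONS:
        raise ValueError(f"unknown selection: {selection!r}")
    return [exe, f"--select={selection}", f"-i={infile}", f"-o={outfile}"]


def split_structure(infile, stem, selections=("rna", "protein")):
    """Run DSSR once per selection, writing <stem>_<selection>.pdb each time."""
    for sel in selections:
        cmd = build_select_command(sel, infile, f"{stem}_{sel}.pdb")
        subprocess.run(cmd, check=True)  # raises CalledProcessError if DSSR fails
```

With this, `split_structure("3j9m.cif", "3j9m")` would run DSSR twice, once for the RNA and once for the protein part.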

I will have a look at 3j9m later when I have more time.

Best regards,

Xiang-Jun
« Last Edit: May 11, 2015, 08:53:15 am by xiangjun »

Offline xiangjun

Re: cif file compatibility?
« Reply #7 on: May 12, 2015, 10:34:03 pm »
Hi Mauricio,

Quote
It almost gets it right but it clumps parts of the residues into a column, or at least that is what pymol shows (see attached image at the end).
I have looked into 3j9m and did not find anything obviously wrong with DSSR-extracted RNA components in PDB format. Did you check the atom ids of the residues clumped into the (right) column as shown in PyMOL? Do they really correspond to any ATOM/HETATM records in the PDB file that DSSR generated?

Note that in the 3j9m_rna.pdb file you produced, the "Atom serial number" in columns 7-11 is based on the CIF "_atom_site.id", and is not necessarily consecutive. You could write a short script to make the "Atom serial number" field run consecutively from 1 to n. Please have a try and report back if that does the trick.
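A minimal sketch of such a renumbering script is given here, in Python. The function names are made up for illustration. It rewrites only ATOM and HETATM records and leaves TER and CONECT records untouched, which is an assumption that may not suit every file.

```python
def renumber_serials(lines, start=1):
    """Give ATOM/HETATM records consecutive serial numbers (columns 7-11)."""
    renumbered = []
    serial = start
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            if serial > 99999:
                # A five-character field cannot hold a larger number.
                raise ValueError("serial number exceeds 99999; columns 7-11 overflow")
            # Keep the record name (columns 1-6) and everything from column 12 on.
            line = f"{line[:6]}{serial:5d}{line[11:]}"
            serial += 1
        renumbered.append(line)
    return renumbered


def renumber_file(src, dst):
    """Read a PDB file, renumber its atoms, and write the result to dst."""
    with open(src) as fin:
        lines = fin.readlines()
    with open(dst, "w") as fout:
        fout.writelines(renumber_serials(lines))
```

For example, `renumber_file("3j9m_rna.pdb", "3j9m_rna_renum.pdb")` would write a copy with serial numbers running from 1 upward (the output file name is arbitrary).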

Xiang-Jun
« Last Edit: May 13, 2015, 09:59:34 am by xiangjun »

Offline mauricio esguerra

Re: cif file compatibility?
« Reply #8 on: May 28, 2015, 04:47:20 am »
Hi Xiang-Jun,

Quote
Note that in the 3j9m_rna.pdb file you produced, the "Atom serial number" in columns 7-11 is based on the CIF "_atom_site.id", and is not necessarily consecutive. You could write a short script to make the "Atom serial number" field run consecutively from 1 to n. Please have a try and report back if that does the trick.

Yes, that solves the problem for 3j9m. But for molecules with more than 99999 atoms, for example the whole 70S ribosome of Thermus thermophilus (PDB id 4v63), which has 200836 atoms, the same problem comes back because there are repeated atom numbers in columns 7-11.

Although it would be much harder and messier to implement, I think the ideal for users would be the same functionality as get_part, but producing .cif output.

Thanks,

Mauricio
« Last Edit: May 28, 2015, 04:49:05 am by mauricio esguerra »

Offline xiangjun

Re: cif file compatibility?
« Reply #9 on: May 28, 2015, 07:29:19 am »
Quote
Yes, that solves the problem for 3j9m. But for molecules with more than 99999 atoms, for example the whole 70S ribosome of Thermus thermophilus (PDB id 4v63), which has 200836 atoms, the same problem comes back because there are repeated atom numbers in columns 7-11.
I am glad you confirmed that, for 3j9m, the non-consecutive atom serial numbers were what caused the visualization problem in PyMOL. Cases with > 99999 atoms are yet another story -- here the PDB format is clearly no longer applicable. As I mentioned previously, 3DNA v2.x is not PDBx/mmCIF compatible.
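Whether a structure can get unique serial numbers in PDB format at all can be checked up front. The sketch below uses hypothetical helper names and assumes that each atom row starts with ATOM or HETATM. That holds for PDB files and is typical of the atom_site rows in PDBx/mmCIF files, but it is not guaranteed for every CIF writer.

```python
PDB_MAX_SERIAL = 99999  # largest value that fits in PDB columns 7-11


def count_atom_rows(lines):
    """Count rows that start with ATOM or HETATM."""
    return sum(1 for line in lines if line.startswith(("ATOM", "HETATM")))


def fits_pdb_serials(n_atoms):
    """True if n_atoms can each get a unique serial number in a PDB file."""
    return n_atoms <= PDB_MAX_SERIAL
```

With the 200836 atoms reported above for 4v63, `fits_pdb_serials(200836)` is False, so renumbering alone cannot avoid repeated serial numbers there.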

Quote
Although it would be much harder and messier to implement, I think the ideal for users would be the same functionality as get_part, but producing .cif output.
Are you asking for .cif output from the DSSR --select option? That's not a problem. But then you would still need to parse the .cif files yourself, and that's not what this thread is about, in my understanding.

Xiang-Jun
« Last Edit: May 28, 2015, 10:41:26 am by xiangjun »

 

Funded by X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids (R24GM153869)

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University