Topics - chemikeris

I am using DSSR for large-scale analysis of nucleic acid structures taken from Protein Data Bank (PDB) Biological Assemblies. For this, I use the '--symmetry' option, which I find very useful.

Unfortunately, I have noticed an inconsistency in how chains are named in the DSSR JSON output, which causes some trouble when parsing the results.

When chains in a Biological Assembly file come from different asymmetric units of the crystal structure, their names usually include both the MODEL number and the chain name from the PDB file. For example, PDB entry 4ZSF has two chains named 'B', one from MODEL 1 and the other from MODEL 2:

Code:
[justas@catfish tmp]$ x3dna-dssr -i=4zsf.pdb1 --symm --json | jq .chains
{
  "m1_chain_B": {
    "num_nts": 14,
    "bseq": "CTCGACCGGTCGAG",
    "sstr": "((((((((((((((",
    "form": "ABBBB...BBBB.-",
    "helical_rise": 3.489,
    "helical_rise_std": 0.789,
    "helical_axis": [
      0.828,
      0.076,
      0.555
    ],
    "point1": [
      -18.24,
      -31.778,
      5.458
    ],
    "point2": [
      19.36,
      -28.329,
      30.661
    ],
    "num_chars": 40,
    "suite": "C1bT!!C4bG!!A!!C1bC!!G4bG!!T!!C!!G!!A!!G"
  },
  "m2_chain_B": {
    "num_nts": 14,
    "bseq": "CTCGACCGGTCGAG",
    "sstr": "))))))))))))))",
    "form": "ABBBB...BBBB.-",
    "helical_rise": 3.489,
    "helical_rise_std": 0.789,
    "helical_axis": [
      -0.828,
      0.076,
      -0.555
    ],
    "point1": [
      18.24,
      -31.778,
      31.283
    ],
    "point2": [
      -19.36,
      -28.329,
      6.08
    ],
    "num_chars": 40,
    "suite": "C1bT!!C4bG!!A!!C1bC!!G4bG!!T!!C!!G!!A!!G"
  }
}

However, when the chains come from two asymmetric units (MODEL 1 and MODEL 2 in the input file) but have different names, no model numbers appear in the "chains" section of the output.
For example, for PDB entry 4ILM Biological Assembly 2, we see only chains E and I:

Code:
[justas@catfish tmp]$ x3dna-dssr -i=4ilm.pdb2 --symm --json | jq .chains
{
  "chain_E": {
    "num_nts": 16,
    "bseq": "GCUAAUCUACUAUAGA",
    "sstr": "......((.....)).",
    "form": "A.....A......AA-",
    "helical_rise": 0.115,
    "helical_rise_std": 3.392,
    "helical_axis": [
      -0.734,
      -0.496,
      -0.464
    ],
    "point1": [
      64.548,
      -24.513,
      89.342
    ],
    "point2": [
      62.86,
      -25.653,
      88.275
    ],
    "num_chars": 46,
    "suite": "G!!C!!U!!A!!A4bU4nC1aU!!A!!C4pU2[A6pU!!A1aG1aA"
  },
  "chain_I": {
    "num_nts": 16,
    "bseq": "GCUAAUCUACUAUAGA",
    "sstr": "......((.....)).",
    "form": "A....BA......A.-",
    "helical_rise": 0.305,
    "helical_rise_std": 3.547,
    "helical_axis": [
      0.584,
      0.616,
      0.528
    ],
    "point1": [
      79.844,
      -52.783,
      70.422
    ],
    "point2": [
      83.132,
      -49.316,
      73.395
    ],
    "num_chars": 46,
    "suite": "G!!C!!U!!A!!A!!U4nC1aU!!A!!C4pU2[A6pU2aA1aG!!A"
  }
}
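To make the problem concrete from a parser's point of view, here is a minimal Python sketch that splits a key from the "chains" section into a (model, chain) pair. The two key patterns (m<N>_chain_<ID> and chain_<ID>) are inferred only from the two outputs shown in this post, not from the DSSR documentation, and parse_chain_key is a hypothetical helper. For keys without a model prefix it can only return None for the model, which is exactly the information that is missing.

```python
import re

# Key patterns as observed in the two outputs above (an assumption, not documented):
#   "m1_chain_B" -> model 1, chain "B"
#   "chain_E"    -> no model number, chain "E"
_CHAIN_KEY = re.compile(r"^(?:m(?P<model>\d+)_)?chain_(?P<chain>.+)$")


def parse_chain_key(key):
    """Return (model, chain) for a DSSR 'chains' key; model is None if absent."""
    m = _CHAIN_KEY.match(key)
    if m is None:
        raise ValueError(f"unrecognized chain key: {key!r}")
    model = m.group("model")
    return (int(model) if model is not None else None, m.group("chain"))


if __name__ == "__main__":
    # Keys taken from the 4ZSF and 4ILM outputs above.
    for key in ["m1_chain_B", "m2_chain_B", "chain_E", "chain_I"]:
        print(key, "->", parse_chain_key(key))
```

With keys like chain_E and chain_I, a script that merges results from many assemblies cannot tell from this section alone which model each chain belongs to.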


A closer look at the other sections (pairs, helices, multiplets, etc.) shows that chain E comes from MODEL 1 of the Biological Assembly file and chain I from MODEL 2:

Code:
[justas@catfish tmp]$ x3dna-dssr -i=4ilm.pdb2 --symm --json | jq .pairs
[
  {
    "index": 1,
    "nt1": "1:E.C7",
    "nt2": "1:E.G15",
    "bp": "C-G",
    "name": "WC",
    "Saenger": "19-XIX",
    "LW": "cWW",
    "DSSR": "cW-W"
  },
  {
    "index": 2,
    "nt1": "1:E.U8",
    "nt2": "1:E.A14",
    "bp": "U-A",
    "name": "WC",
    "Saenger": "20-XX",
    "LW": "cWW",
    "DSSR": "cW-W"
  },
  {
    "index": 3,
    "nt1": "2:I.U6",
    "nt2": "2:I.A16",
    "bp": "U+A",
    "name": "--",
    "Saenger": "n/a",
    "LW": "cWH",
    "DSSR": "cW+M"
  },
  {
    "index": 4,
    "nt1": "2:I.C7",
    "nt2": "2:I.G15",
    "bp": "C-G",
    "name": "WC",
    "Saenger": "19-XIX",
    "LW": "cWW",
    "DSSR": "cW-W"
  },
  {
    "index": 5,
    "nt1": "2:I.U8",
    "nt2": "2:I.A14",
    "bp": "U-A",
    "name": "WC",
    "Saenger": "20-XX",
    "LW": "cWW",
    "DSSR": "cW-W"
  }
]
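As a stopgap, the model number can be recovered from the nucleotide identifiers in the "pairs" section, which in this output have the form model:chain.residue (e.g. 1:E.C7). The following is a minimal Python sketch under two assumptions: that this identifier format holds generally when '--symmetry' is used (it is inferred from this single example), and that chain IDs contain no dot. The helpers parse_nt_id and chain_models are hypothetical, and chains that take part in no pair are simply not covered.

```python
import re
from collections import defaultdict

# Nucleotide ID as seen in the 'pairs' output above, e.g. "1:E.C7":
# optional "model:" prefix, then "chain.residue" (assumed format).
_NT_ID = re.compile(r"^(?:(?P<model>\d+):)?(?P<chain>[^.]+)\.(?P<res>.+)$")


def parse_nt_id(nt_id):
    """Return (model, chain, residue); model is None if there is no prefix."""
    m = _NT_ID.match(nt_id)
    if m is None:
        raise ValueError(f"unrecognized nucleotide id: {nt_id!r}")
    model = m.group("model")
    return (int(model) if model is not None else None,
            m.group("chain"), m.group("res"))


def chain_models(pairs):
    """Map each chain name to the set of model numbers found in nt1/nt2."""
    models = defaultdict(set)
    for pair in pairs:
        for field in ("nt1", "nt2"):
            model, chain, _ = parse_nt_id(pair[field])
            models[chain].add(model)
    return dict(models)


if __name__ == "__main__":
    # Two of the pairs from the 4ILM assembly 2 output above.
    pairs = [
        {"nt1": "1:E.C7", "nt2": "1:E.G15"},
        {"nt1": "2:I.U6", "nt2": "2:I.A16"},
    ]
    print(chain_models(pairs))  # {'E': {1}, 'I': {2}}
```

Collecting a set per chain name (rather than a single value) also flags the 4ZSF-style case, where one chain name such as B occurs in more than one model.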

This inconsistency causes trouble when parsing many DSSR output files generated for PDB Biological Assemblies. Would it be possible to always include the model number with the chain name throughout the DSSR output when the '--symmetry' option is used?

Thank you very much in advance for your feedback.


Funded by the NIH R24GM153869 grant on X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University