Author Topic: General questions of H-bond section in DSSR (Read 79416 times)

lvelve0901 · « **on:** November 01, 2017, 11:37:03 am »

Hi Xiangjun,

Sorry I gave a long list of questions yesterday. Here, I just post a few questions in terms of the H-bond in DSSR json.

For the H-bond between protein/peptide/ligand to nucleic acid, my target structure is 1PFE, which is a DNA bound to an antibiotic, echinomycin. I downloaded the biological assembly file and used the following command:

x3dna-dssr -i=1PFE.pdb -o=1PFE.json --json --more --symm

In the "hbonds" section of the output JSON file, I did find all the DNA-drug interactions. For example,

{u'index': 31, u'atom2_serNum': 212, u'residue_pair': u'nt:aa', u'distance': 3.09, u'atom_pair': u'N:N', u'atom2_id': u'N@2:B.ALA6', u'donAcc_type': u'standard', u'atom1_id': u'N3@2:A.DG3', u'atom1_serNum': 69}

However, I have a few questions in terms of the hbonds output.

(1) How do I know which atom is the H-bond donor and which is the acceptor? For example, do you always put the acceptor first (atom1)?
(2) If the 'donAcc_type' is questionable, what does it mean? Does it mean that DSSR probably doesn't guess the valence properly?
(3) What does 'serNum' mean here?

Here, I attached all my files.

Thank you.

Best,
Honglue

xiangjun · « **Reply #1 on:** November 02, 2017, 11:41:55 am »

Quote

Sorry I gave a long list of questions yesterday. Here, I just post a few questions in terms of the H-bond in DSSR json.

Related questions are always welcome on the Forum. For ease of communication, just remember to keep each thread focused on a single topic, as you did here.

Quote

{u'index': 31, u'atom2_serNum': 212, u'residue_pair': u'nt:aa', u'distance': 3.09, u'atom_pair': u'N:N', u'atom2_id': u'N@2:B.ALA6', u'donAcc_type': u'standard', u'atom1_id': u'N3@2:A.DG3', u'atom1_serNum': 69}

How did you get the above output for PDB id: 1PFE? Specifically, where does the 'u' before each tag name come from?

Using the following command, with jq (v1.5), the result seems clearer.

Code: [Select]

# x3dna-dssr -i=1pfe.pdb --symm --get-hbond --json | jq .hbonds[30]

{
  "index": 31,
  "atom1_serNum": 69,
  "atom2_serNum": 212,
  "donAcc_type": "standard",
  "distance": 3.09,
  "atom1_id": "N3@2:A.DG3",
  "atom2_id": "N@2:B.ALA6",
  "atom_pair": "N:N",
  "residue_pair": "nt:aa"
}

Quote

(1) How do I know which atom is the H-bond donor and which is the acceptor? For example, do you always put the acceptor first (atom1)?
(3) What does 'serNum' mean here?

The list of H-bonds is ordered by the atom serial numbers of the two H-bonding atoms. The atom serial number is taken from the corresponding PDB file. See the Coordinate Section, especially the ATOM/HETATM records, of the PDB format documentation for details. The "toggle H-bonds" button in the DSSR-Jmol webpage takes advantage of this feature.
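To make the ordering concrete, here is a minimal Python sketch (not part of DSSR) that checks whether atom1 carries the smaller serial number in records quoted in this thread. The helper name `is_serial_ordered` is hypothetical, and the two records come from different structures (1PFE and 3BNQ). On this reading, the atom1/atom2 order reflects serial numbers rather than donor/acceptor roles.

```python
# Illustrative sketch only: check the atom1/atom2 ordering in DSSR "hbonds" records.
# The records are copied from this thread (1PFE and 3BNQ); the helper name is hypothetical.

hbonds = [
    {"index": 31, "atom1_serNum": 69, "atom2_serNum": 212,
     "atom1_id": "N3@2:A.DG3", "atom2_id": "N@2:B.ALA6"},
    {"index": 113, "atom1_serNum": 1406, "atom2_serNum": 1937,
     "atom1_id": "OP2@C.C21", "atom2_id": "N32@C.PAR101"},
]


def is_serial_ordered(hbond):
    """Return True if atom1 has the smaller PDB atom serial number."""
    return hbond["atom1_serNum"] < hbond["atom2_serNum"]


for hb in hbonds:
    print(hb["index"], hb["atom1_id"], hb["atom2_id"], is_serial_ordered(hb))
```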

Quote

(2) If the 'donAcc_type' is questionable, what does it mean? Does it mean that DSSR probably doesn't guess the valence properly?

It simply means DSSR cannot decide whether this is a donor-acceptor compatible H-bond, even though it fulfills the geometric criteria. It is up to the user to decide if this H-bond is feasible.

If you provide a concrete example, I may be able to give you more details on this topic.

HTH,

Xiang-Jun

lvelve0901 · « **Reply #2 on:** November 10, 2017, 10:35:42 pm »

Hi Xiangjun,

I have follow-up questions about donAcc_type in the H-bond section.

Here I attach the JSON output file for PDB 3BNQ. In the H-bond section, I see three values of donAcc_type: standard, acceptable and questionable. You have already explained what questionable means, but could you please explain the difference between standard and acceptable?

Also, is there any way to tell which atom is the donor and which atom is the acceptor?

Thank you.

Best,
Honglue

xiangjun · « **Reply #3 on:** November 10, 2017, 11:04:23 pm »

Could you please respond to my queries in answering your previous questions?

For your new questions, could you please post concrete examples to illustrate unambiguously what you mean? This helps not only me and others to understand you better, but also clarifies your own thinking.

This Forum works best in a bidirectional conversation style instead of one-way Q&As.

Best regards,

Xiang-Jun

lvelve0901 · « **Reply #4 on:** November 11, 2017, 05:54:22 pm »

Hi Xiangjun,

Sorry I have been busy with other stuff in lab but I do remember your question last time.

----------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------
For your last question:

Quote from: xiangjun on November 02, 2017, 11:41:55 am

Quote
{u'index': 31, u'atom2_serNum': 212, u'residue_pair': u'nt:aa', u'distance': 3.09, u'atom_pair': u'N:N', u'atom2_id': u'N@2:B.ALA6', u'donAcc_type': u'standard', u'atom1_id': u'N3@2:A.DG3', u'atom1_serNum': 69}

How did you get the above output for PDB id: 1PFE? Specifically, where does the 'u' before each tag name come from?

Basically, I load the JSON file my own way (with my own JSON parser) and print out the 'hbonds' section. In my Python, when I load the JSON file (using the json module), strings are loaded as unicode. I think that's why those strings have the 'u' prefix; it is just a matter of Python string encoding. Here is more explanation of unicode strings (https://stackoverflow.com/questions/21808657/what-is-a-unicode-string). I also tried your way as you suggested (using jq), but I couldn't make it work. Do I need to install jq on my computer? I downloaded jq from the website https://stedolan.github.io/jq/, put the file in my working folder, and then typed:

x3dna-dssr -i=3bnq.pdb --symm --get-hbond --json | jq . hbonds[1]

However, it outputs

Processing file '3bnq.pdb'
jq: error: Could not open file hbonds[1]: No such file or directory

I don't know if I did it the right way.

----------------------------------------------------------------------------------------------------------------------------------------------------------
----------------------------------------------------------------------------------------------------------------------------------------------------------
My new questions:

My new target structure is the Mitochondrial Ribosomal Decoding Site (PDB ID: 3BNQ). I downloaded the PDB file (not the biological assembly file) from RCSB. Then I generated the JSON file by typing:

x3dna-dssr -i=3bnq.pdb -o=3bnq.json --json --more --symm

I use my own json parser to look for the hydrogen bond between the RNA and the ligand PAR.

There are three examples with different donAcc_type values here. All the hydrogen bonds mentioned below are labeled in 3bnq.pse. Measure01 is the first example, Measure02 is the second, and Measure03 is the third.

Example 1: Hbond index 117. donAcc_type acceptable.
{u'index': 117, u'atom2_serNum': 1928, u'residue_pair': u'nt:ligand', u'distance': 2.612, u'atom_pair': u'O:O', u'atom2_id': u'O41@C.PAR101', u'donAcc_type': u'acceptable', u'atom1_id': u'OP2@C.G22', u'atom1_serNum': 1426}

This is a hydrogen bond between a hydroxyl group in the ligand PAR and the OP2 atom in rG22.

Example 2: Hbond index 113. donAcc_type standard.
{u'index': 113, u'atom2_serNum': 1937, u'residue_pair': u'nt:ligand', u'distance': 2.63, u'atom_pair': u'O:N', u'atom2_id': u'N32@C.PAR101', u'donAcc_type': u'standard', u'atom1_id': u'OP2@C.C21', u'atom1_serNum': 1406}

This is a hydrogen bond between an amino group in the ligand PAR and the OP2 atom in rC21.

In both cases, the hydrogen bond geometries seem very similar, so why does DSSR assign them different donAcc_type values?

Example 3: Hbond index 107. donAcc_type questionable.
{u'index': 107, u'atom2_serNum': 1367, u'residue_pair': u'nt:nt', u'distance': 3.358, u'atom_pair': u'O:O', u'atom2_id': u"O4'@C.G19", u'donAcc_type': u'questionable', u'atom1_id': u"O4'@C.A17", u'atom1_serNum': 1323}

In this case, DSSR identifies an H-bond between two O4' atoms, but we know that in ribose the O4' is unlikely to be protonated. Is this the reason why DSSR considers the donAcc_type questionable?

I really appreciate your help.

Best,
Honglue

xiangjun · « **Reply #5 on:** November 12, 2017, 12:46:15 pm »

Hi Honglue,

Thanks for clarifying the Unicode issue of your Python-based JSON parser. DSSR-generated JSON output is in the ASCII charset, so Unicode is overkill in this case. The detailed examples you provided are very helpful.
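As a side note on the u'...' prefixes, here is a minimal sketch using only Python's standard json module. Printing a loaded record directly shows Python's own representation of the dict (with u'' prefixes under Python 2), whereas json.dumps() writes it back as JSON text, much like the jq output. The variable names are illustrative.

```python
import json

# One record from the 1PFE "hbonds" list, as quoted earlier in this thread.
text = (
    '{"index": 31, "atom1_serNum": 69, "atom2_serNum": 212, '
    '"donAcc_type": "standard", "distance": 3.09, '
    '"atom1_id": "N3@2:A.DG3", "atom2_id": "N@2:B.ALA6", '
    '"atom_pair": "N:N", "residue_pair": "nt:aa"}'
)

record = json.loads(text)

# print(record) would show Python's repr of the dict;
# json.dumps() serializes the record back to JSON text instead.
pretty = json.dumps(record, indent=2)
print(pretty)
```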

Quote

x3dna-dssr -i=3bnq.pdb --symm --get-hbond --json | jq . hbonds[1]

However, it outputs

Processing file '3bnq.pdb'
jq: error: Could not open file hbonds[1]: No such file or directory

The error is due to the extra space between the dot and hbonds[1], which makes jq take hbonds[1] as the JSON file to process.
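The effect of that space can be sketched with Python's standard shlex module, which approximates how a POSIX shell splits a command line into arguments (it does not perform globbing). With the space, jq receives two arguments: the filter "." and a file name.

```python
import shlex

# Approximate the shell's word splitting for the two command lines.
with_space = shlex.split("jq . hbonds[1]")
without_space = shlex.split("jq .hbonds[1]")

print(with_space)     # filter "." followed by a separate argument "hbonds[1]"
print(without_space)  # a single filter argument ".hbonds[1]"
```

Since square brackets are also shell glob characters, quoting the filter, as in jq '.hbonds[1]', is the safer habit.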

The following command (jq .hbonds[1]) works, as expected. Also note that there is no need to use the --symm option in this case.

Code: [Select]

#x3dna-dssr -i=3bnq.pdb --get-hbond --json | jq .hbonds[1]
{
  "index": 2,
  "atom1_serNum": 59,
  "atom2_serNum": 975,
  "donAcc_type": "standard",
  "distance": 2.532,
  "atom1_id": "O6@A.G3",
  "atom2_id": "N4@B.C23",
  "atom_pair": "O:N",
  "residue_pair": "nt:nt"
}

Quote

Example 1: Hbond index 117. donAcc_type acceptable.

Code: [Select]

#x3dna-dssr -i=3bnq.pdb --get-hbond --json | jq .hbonds[116]
{
  "index": 117,
  "atom1_serNum": 1426,
  "atom2_serNum": 1928,
  "donAcc_type": "acceptable",
  "distance": 2.612,
  "atom1_id": "OP2@C.G22",
  "atom2_id": "O41@C.PAR101",
  "atom_pair": "O:O",
  "residue_pair": "nt:ligand"
}

Quote

Example 2: Hbond index 113. donAcc_type standard.

Code: [Select]

#x3dna-dssr -i=3bnq.pdb --get-hbond --json | jq .hbonds[112]
{
  "index": 113,
  "atom1_serNum": 1406,
  "atom2_serNum": 1937,
  "donAcc_type": "standard",
  "distance": 2.63,
  "atom1_id": "OP2@C.C21",
  "atom2_id": "N32@C.PAR101",
  "atom_pair": "O:N",
  "residue_pair": "nt:ligand"
}

Quote

In both cases, the hydrogen bond geometries seem very similar, so why does DSSR assign them different donAcc_type values?

In DSSR, the donAcc_type is based on known or heuristically derived donor/acceptor properties of the two atoms in an H-bond.

In the case of H-bond #113, OP2@C.C21 is a known acceptor, and N32@C.PAR101 is judged as a donor. So this H-bond is between an acceptor and a donor, which is 'standard'.

In the case of H-bond #117, OP2@C.G22 is a known acceptor, but O41@C.PAR101 is judged as a hydroxyl group. Like the 2'-hydroxyl group of the RNA ribose sugar, it can be either an acceptor or a donor. So this H-bond is classified as 'acceptable'.

Quote

Example 3: Hbond index 107. donAcc_type questionable.

Code: [Select]

#x3dna-dssr -i=3bnq.pdb --get-hbond --json | jq .hbonds[106]
{
  "index": 107,
  "atom1_serNum": 1323,
  "atom2_serNum": 1367,
  "donAcc_type": "questionable",
  "distance": 3.358,
  "atom1_id": "O4'@C.A17",
  "atom2_id": "O4'@C.G19",
  "atom_pair": "O:O",
  "residue_pair": "nt:nt"
}

Quote

In this case, DSSR identifies an H-bond between two O4' atoms, but we know that in ribose the O4' is unlikely to be protonated. Is this the reason why DSSR considers the donAcc_type questionable?

That's right. DSSR does not know (or care about) the protonation state of the ribose sugars. It only knows that the O4' atoms are H-bond acceptors. Yet they are close together in 3D space and fulfill DSSR's geometric definition of an H-bond. So it is reported as a 'questionable' H-bond.

This is a feature of DSSR, not a bug: it allows DSSR to detect all three H-bonds in C+C pairs in an i-motif, for example. In other cases, it may point to certain types of errors that users should pay attention to.
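The three categories, as described in this thread, can be summarized with a toy Python model. This is emphatically not DSSR's actual algorithm: the function `classify`, the role sets, and the treatment of two ambiguous atoms (returned here as 'acceptable') are all assumptions made for illustration.

```python
# Toy model only: NOT DSSR's algorithm, just the rule as described in this thread.
# Each atom gets a set of possible roles: "D" (donor) and/or "A" (acceptor).

ACCEPTOR = {"A"}
DONOR = {"D"}
EITHER = {"D", "A"}  # e.g. a hydroxyl oxygen


def classify(roles1, roles2):
    """Classify a geometrically valid H-bond from the possible roles of its atoms."""
    # Is any donor/acceptor assignment possible across the two atoms?
    compatible = ("D" in roles1 and "A" in roles2) or ("A" in roles1 and "D" in roles2)
    if not compatible:
        return "questionable"
    # Both roles fixed: an unambiguous donor-acceptor pair.
    if len(roles1) == 1 and len(roles2) == 1:
        return "standard"
    # At least one atom could play either role (assumption for the both-ambiguous case).
    return "acceptable"


print(classify(ACCEPTOR, DONOR))     # OP2 and amino N32, as in H-bond #113
print(classify(ACCEPTOR, EITHER))    # OP2 and hydroxyl O41, as in H-bond #117
print(classify(ACCEPTOR, ACCEPTOR))  # O4' and O4', as in H-bond #107
```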

Since you're interested in H-bonds between nucleotides and the ligands, you could run the following command. DSSR detects three H-bonds between RNA and the PAR ligand. Are they what you’d expect? Have you tried other well-known software tools for H-bonding identification?

Code: [Select]

#x3dna-dssr -i=3bnq.pdb --get-hbond --json | jq '.hbonds[] | select(.residue_pair=="nt:ligand")'
{
  "index": 113,
  "atom1_serNum": 1406,
  "atom2_serNum": 1937,
  "donAcc_type": "standard",
  "distance": 2.63,
  "atom1_id": "OP2@C.C21",
  "atom2_id": "N32@C.PAR101",
  "atom_pair": "O:N",
  "residue_pair": "nt:ligand"
}
{
  "index": 117,
  "atom1_serNum": 1426,
  "atom2_serNum": 1928,
  "donAcc_type": "acceptable",
  "distance": 2.612,
  "atom1_id": "OP2@C.G22",
  "atom2_id": "O41@C.PAR101",
  "atom_pair": "O:O",
  "residue_pair": "nt:ligand"
}
{
  "index": 118,
  "atom1_serNum": 1438,
  "atom2_serNum": 1926,
  "donAcc_type": "acceptable",
  "distance": 2.644,
  "atom1_id": "N7@C.G22",
  "atom2_id": "O31@C.PAR101",
  "atom_pair": "N:O",
  "residue_pair": "nt:ligand"
}
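For those parsing the JSON in Python rather than jq, the same selection can be sketched with the standard json module. The embedded string is a trimmed-down stand-in for DSSR's output (only some fields of four records from this thread), and the helper name `select_hbonds` is hypothetical.

```python
import json

# Trimmed-down stand-in for DSSR's JSON output; records are copied from this thread.
dssr_json = """
{"hbonds": [
  {"index": 107, "donAcc_type": "questionable", "distance": 3.358,
   "atom1_id": "O4'@C.A17", "atom2_id": "O4'@C.G19", "residue_pair": "nt:nt"},
  {"index": 113, "donAcc_type": "standard", "distance": 2.63,
   "atom1_id": "OP2@C.C21", "atom2_id": "N32@C.PAR101", "residue_pair": "nt:ligand"},
  {"index": 117, "donAcc_type": "acceptable", "distance": 2.612,
   "atom1_id": "OP2@C.G22", "atom2_id": "O41@C.PAR101", "residue_pair": "nt:ligand"},
  {"index": 118, "donAcc_type": "acceptable", "distance": 2.644,
   "atom1_id": "N7@C.G22", "atom2_id": "O31@C.PAR101", "residue_pair": "nt:ligand"}
]}
"""


def select_hbonds(data, residue_pair):
    """Return the H-bond records whose residue_pair matches, like jq's select()."""
    return [hb for hb in data.get("hbonds", [])
            if hb.get("residue_pair") == residue_pair]


data = json.loads(dssr_json)
ligand_hbonds = select_hbonds(data, "nt:ligand")
for hb in ligand_hbonds:
    print(hb["index"], hb["atom1_id"], hb["atom2_id"], hb["distance"], hb["donAcc_type"])
```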

I hope this clears up your confusion about H-bond identification in DSSR.

Xiang-Jun

PS. Please remember to be concrete in asking questions. Be generous in summarizing what you've learned for the benefit of yourself, and other viewers of a thread. Let's work together to make the Forum more informative.

lvelve0901 · « **Reply #6 on:** November 13, 2017, 10:05:22 am »

Hi Xiangjun,

Your answer is very clear and concrete. I think I now understand how 3DNA identifies H-bonds in general. I will keep posting other PDB examples in the future if we find something weird in the H-bond section, since my rotation students are manually inspecting them now. Do you think we should post here or start a new topic in this forum for other PDB entries?

Also, I tried your way of parsing the JSON with jq, and it works. But there is an issue with doing it this way:

Quote from: lvelve0901 on November 11, 2017, 05:54:22 pm

x3dna-dssr -i=3bnq.pdb --symm --get-hbond --json | jq .hbonds[1]

This first runs DSSR to generate the JSON output, which can take a long time for a large PDB file. Is there any command to parse json if we have already generated the json file?

Thanks.

Best,
Honglue

xiangjun · « **Reply #7 on:** November 13, 2017, 02:01:21 pm »

Quote

Is there any command to parse json if we have already generated the json file?

Sure. The pipe form is just a shorthand to avoid an intermediate file. You can certainly generate the JSON file first, and then parse it using jq -- see the excellent documentation of jq for examples.

Code: [Select]

x3dna-dssr -i=3bnq.pdb --get-hbond --json | jq .hbonds[1]
 
# can be decomposed into the following two steps:
x3dna-dssr -i=3bnq.pdb --get-hbond --json -o=3bnq-hbonds.json
jq .hbonds[1] 3bnq-hbonds.json
 
# all with the following results:
{
  "index": 2,
  "atom1_serNum": 59,
  "atom2_serNum": 975,
  "donAcc_type": "standard",
  "distance": 2.532,
  "atom1_id": "O6@A.G3",
  "atom2_id": "N4@B.C23",
  "atom_pair": "O:N",
  "residue_pair": "nt:nt"
}
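The same two-step approach works from Python: run DSSR once, then load the saved file as often as needed. The sketch below assumes only the standard library. So that it runs on its own, it first writes a tiny stand-in file (a placeholder first record plus part of the record shown above); in practice the path would point at the 3bnq-hbonds.json written by DSSR. The helper name `load_hbonds` is hypothetical.

```python
import json
import os
import tempfile

# Stand-in for a file written by DSSR with --json -o=3bnq-hbonds.json.
sample = {"hbonds": [
    {"index": 1},  # placeholder only; the real first record is not shown in this thread
    {"index": 2, "donAcc_type": "standard", "distance": 2.532,
     "atom1_id": "O6@A.G3", "atom2_id": "N4@B.C23", "residue_pair": "nt:nt"},
]}
path = os.path.join(tempfile.mkdtemp(), "3bnq-hbonds.json")
with open(path, "w") as fh:
    json.dump(sample, fh)


def load_hbonds(json_path):
    """Read a saved DSSR JSON file and return its list of H-bond records."""
    with open(json_path) as fh:
        return json.load(fh)["hbonds"]


hbonds = load_hbonds(path)
second = hbonds[1]  # zero-based, so the same element as jq's .hbonds[1]
print(second["index"], second["atom1_id"], second["atom2_id"])
```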
 

Xiang-Jun
