Author Topic: list nucleotide/nucleotide contacts involving a phosphate group. (Read 84299 times)

auffinger · « **on:** April 05, 2013, 10:38:46 am »

Hi Xiang-Jun,

I was wondering whether 3DNA provides (perhaps through options unknown to me) a listing of the nucleotides that are linked by hydrogen bonds other than base-base ones.
For example, if two nucleotides interact through a single base/phosphate, base/sugar, or sugar/phosphate contact, is there a way to get a list of them?
(of course this might involve specific hydrogen bond parameters).
Hope this is clear.

Best,

Pascal

xiangjun · « **Reply #1 on:** April 05, 2013, 11:12:40 am »

Hi Pascal,

Thanks for your request for adding non-pairing nucleotide-nucleotide (nt-nt) contacts to the 3DNA output. Currently, there is no such option in 3DNA, but I have had this topic in mind for quite a while. It won't be hard to implement this functionality in DSSR/3DNA, so I will probably be able to get something done next week.

While we are at it, in addition to non-pairing nt-nt interactions via H-bonds, how about those with only base stacking? What output format would you prefer?

Xiang-Jun

auffinger · « **Reply #2 on:** April 05, 2013, 11:35:46 am »

Hi Xiang-Jun,

Thanks for this quick reply.
I would be happy to try this out first for the non-base-base contacts.
Then of course stacking info would be great.

I suggest adding a stacking output file that would list stacking info in the usual way.
You will certainly find a good way to do that. I guess you would add info related to
the stacking of non-connected nucleotides. I am eager to see that.

Thanks,

Pascal

xiangjun · « **Reply #3 on:** April 21, 2013, 06:23:55 pm »

Hi Pascal,

I've just released DSSR beta-r09-on-20130421 which contains a new option -non-pair (or --non-pair) to detect/output non-pairing interactions, including H-bonds and base stacking. As an example, see below for PDB entry 1msy which contains a GNRA tetra-loop.

x3dna-dssr --non-pair -i=1msy -o=1msy.out

The output file '1msy.out' contains the following:

****************************************************************************
List of 12 non-pairing interaction(s)
1 A.G2648 A.G2673 base-overlap-area=2.0 H-bonds[0]: ""
2 A.U2650 A.G2671 base-overlap-area=0.1 H-bonds[0]: ""
3 A.C2652 A.G2669 base-overlap-area=0.2 H-bonds[0]: ""
4 A.A2654 A.U2656 base-overlap-area=3.7 H-bonds[1]: "O4'*O4'[3.05]"
5 A.G2655 A.G2664 base-overlap-area=4.4 H-bonds[1]: "O2'(hydroxyl)-O6(carbonyl)[3.09]"
6 A.G2655 A.A2665 base-overlap-area=0.0 H-bonds[3]: "N1(imino)-OP2[2.77]; N2(amino)-OP2[3.34]; N2(amino)-O5'[2.89]"
7 A.U2656 A.G2664 base-overlap-area=0.0 H-bonds[2]: "OP2-N1(imino)[3.04]; OP2-N2(amino)[2.94]"
8 A.A2657 A.A2665 base-overlap-area=3.7 H-bonds[0]: ""
9 A.G2659 A.A2661 base-overlap-area=0.0 H-bonds[1]: "O2'(hydroxyl)-N7[2.60]"
10 A.G2659 A.G2663 base-overlap-area=3.9 H-bonds[0]: ""
11 A.U2660 A.A2661 base-overlap-area=7.5 H-bonds[0]: ""
12 A.A2661 A.A2662 base-overlap-area=6.3 H-bonds[0]: ""
****************************************************************************
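For readers who want to process this listing, here is a minimal Python sketch of a reader for the non-pairing section. It is not part of DSSR: the function name `parse_non_pairing` and the record keys are hypothetical, and the regular expressions are inferred only from the 1msy example in this post (plus the optional second overlap value in parentheses that appears in a later post), so they may need adjusting for other releases.

```python
import re

# One entry of the non-pairing list, inferred from the example output only.
# An optional second overlap value in parentheses, e.g. 0.0(0.0), is tolerated.
ENTRY = re.compile(
    r"(?<!\S)(\d+)\s+(\S+)\s+(\S+)\s+"
    r"base-overlap-area=([\d.]+)(?:\([\d.]+\))?\s+"
    r'H-bonds\[(\d+)\]:\s+"([^"]*)"'
)

# One H-bond, e.g. N1(imino)-OP2[2.77] or O4'*O4'[3.05].
# The chemical-group label in parentheses is optional and is discarded.
HBOND = re.compile(
    r"([A-Z0-9']+)(?:\([^)]*\))?[-*]([A-Z0-9']+)(?:\([^)]*\))?\[([\d.]+)\]"
)


def parse_non_pairing(text):
    """Return one dict per non-pairing interaction found in `text`."""
    records = []
    for m in ENTRY.finditer(text):
        hbonds = [(a1, a2, float(d)) for a1, a2, d in HBOND.findall(m.group(6))]
        records.append({
            "index": int(m.group(1)),
            "nt1": m.group(2),
            "nt2": m.group(3),
            "overlap": float(m.group(4)),
            "hbonds": hbonds,  # list of (atom1, atom2, distance)
        })
    return records
```

Because the pattern relies on whitespace between fields rather than on fixed columns, it works whether each entry sits on its own line or the H-bonds part is wrapped onto a following line.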

Please let me know how you'd like to revise the content/format. As always, concrete examples work the best.

Xiang-Jun

auffinger · « **Reply #4 on:** April 24, 2013, 10:26:34 am »

Hi Xiang-Jun,

That's just great. It looks like a combined version of find_pair and analyze. Is that correct?
Of course, it does not seem possible to (re)construct NA structures with DSSR.

So first, why call it DSSR and not DSSNA, since it also works for DNA?
I think one should avoid RNA domination; it is possible to learn from both types of structure.
Thus, does DSSR really work for DNA?

____

Then, as for formats:
As I think I mentioned somewhere earlier, since I am processing the output files
for a large number of structures, I appreciate it when there are spaces between fields (see below).

For example, in the dssr-torsion.dat file:

base_id alpha beta gamma delta epsilon zeta e-z chi phase-angle sugar-type Zp Dp
1 A.C2649 --- 167.1 47.6 84.1 -146.6 -77.1 -69(BI) -160.5(anti) 12.9(C3'-endo) ~C3'-endo 4.41 4.66
2 A.U2650 -64.2 164.2 60.3 79.8 -154.5 -73.1 -81(BI) -167.2(anti) 21.3(C3'-endo) ~C3'-endo 4.40 4.55

is easier to process if you write:
2 A.U2650 -64.2 164.2 60.3 79.8 -154.5 -73.1 -81 (BI) -167.2 (anti) 21.3 (C3'-endo) ~C3'-endo 4.40 4.55

Also, is there a need to write the sugar pucker twice in this file?
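As an illustration of the parsing issue raised here, the following Python sketch reads a torsion line in either layout, with the label fused to the value (`-81(BI)`) or separated by a space (`-81 (BI)`). The function names are hypothetical, and the field layout is assumed from the two sample lines above rather than from any DSSR specification.

```python
import re

# A number optionally followed by a label in parentheses,
# e.g. -81(BI), 12.9(C3'-endo), or plain 4.40.
FIELD = re.compile(r"^(-?\d+(?:\.\d+)?)(?:\((.+)\))?$")


def parse_field(token):
    """Return (value, label) for one token; either part may be None."""
    if token == "---":  # missing torsion, e.g. alpha of the first residue
        return None, None
    m = FIELD.match(token)
    if m:
        return float(m.group(1)), m.group(2)
    return None, token.lstrip("~")  # label-only token such as ~C3'-endo


def parse_torsion_line(line):
    """Return (index, nucleotide id, list of (value, label) fields)."""
    tokens = line.split()
    index, nt_id = int(tokens[0]), tokens[1]
    merged = []
    for tok in tokens[2:]:
        # Re-attach a label that was written after a space: "-81 (BI)".
        if tok.startswith("(") and merged:
            merged[-1] += tok
        else:
            merged.append(tok)
    return index, nt_id, [parse_field(t) for t in merged]
```

Splitting on arbitrary whitespace also makes the reader insensitive to extra spaces used for visual alignment.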

---
You name this file "torsion" although there are sugar puckers in it.
Thus it might be called torsion_puckers.dat or something similar.

---
For the non-pairing interactions, that is just a great feature.

You previously had two values for base overlap:
one calculated using only ring atoms, the other using all base atoms.

You could add this.

---

Why add the names of the chemical groups (hydroxyl, amino, imino, ...)?
Again, this complicates reading, since some groups are named and others, like OP2, are not.

I would appreciate another presentation here.

___

I haven't really checked, but is your base-pair numbering scheme consistent with the one
you use in find_pair? It would be really nice if that were the case.

___

Also, I wanted to ask you about this, but now it seems to be done: you add various names
to each base pair. That's great. Just a pointer to the various nomenclatures (Leontis-Westhof, Saenger, ...)
would be helpful in the *.out files.

---

Is there a configuration file that would allow one to specify hydrogen-bond and other parameters, as in 3DNA?
I would really appreciate that.

---

More later,

Thanks for the great work,

Pascal

auffinger · « **Reply #5 on:** April 24, 2013, 12:41:54 pm »

Hi Xiang-Jun,

Just found your explanation about the pucker values at the top of the torsions file :

phase-angle: the phase angle of pseudorotation and puckering
sugar-type: sugar classification into C3'-like or C2'-like

I think it's great to have such comments in the data files; it helps a lot to understand them and refresh one's memory.
One case where redundancy is useful.

Yet, I noted a misalignment at some places with the ~C2'-endo value.

6 A.A2654 165.2 133.8 56.7 149.4 -98.3 161.4 100(BII) -145.3(anti) 151.0(C2'-endo) ~C2'-endo 0.91 0.92
7 A.G2655 -96.2 91.7 179.3 149.3 -167.2 141.4 51 -93.5(anti) 151.5(C2'-endo) ~C2'-endo 2.15 2.14
8 A.U2656 -72.5 157.8 37.9 91.6 -141.0 -65.6 -75(BI) -173.9(anti) 0.4(C3'-endo) ~C3'-endo 4.35 4.42
9 A.A2657 -68.7 174.5 50.2 83.7 -145.1 -61.1 -84(BI) -171.1(anti) 26.5(C3'-endo) ~C3'-endo 4.49 4.52

Also, what's the reason for the "~"?

Pascal

xiangjun · « **Reply #6 on:** April 24, 2013, 12:47:35 pm »

Hi Pascal,

Quote

sugar-type: sugar classification into C3'-like or C2'-like

I should have updated the phrase as "C3'-endo like or C2'-endo like"

Quote

Yet, I noted a misalignment at some places with the ~C2'-endo value.

The misalignment is on purpose, to make the two broad classes more visually separated. I chose '~' for similar to (close to), as in math.

Does that make sense?

Xiang-Jun

auffinger · « **Reply #7 on:** April 25, 2013, 06:49:19 am »

Hi Xiang-Jun,

I understand your point of view, yet if it's not explained it looks a little bit weird.
Thus, I think that writing as much explanation as possible in the files themselves would really be useful.

My point of view is that if you shift the pucker by one space, no one will really understand why unless it is explained.
Even then, it might be difficult for people trying to parse the files using a strict format.

Same for the ~. Believe me, I can spend a lot of time wondering why this is there and come up sometimes,
but not always, with a correct answer.

So it is best not to have to guess or to wonder.

Do you agree?

Pascal

xiangjun · « **Reply #8 on:** April 25, 2013, 09:27:18 am »

Hi Pascal,

I see your points, and agree with most of them. In the next DSSR-beta release (r10), there will be a section explaining specification of nucleotide id string, base pair classifications, helix vs stem definition etc. The goal is not necessarily maximum documentation in the output files, but a minimum that helps avoid confusion for DSSR's common usages. DSSR aims to be self-contained, self-explanatory, and simple to use for a typical non-expert user.

I will consider removing the ~ in the sugar classification into C2'-endo like or C3'-endo like. Nevertheless, I still want to keep the one-space offset between the two types, since it makes a C2'-endo like sugar conformation in a mostly C3'-endo like RNA structure stand out on visual inspection. This works for me, and that's why I decided on an extra space there. I believe this trick is helpful to other structural biologists/chemists as well. As noted above, I will add some explanation in r10 to make my intention explicit.

Regarding parsing the DSSR output files, I would suggest that you make your program/script flexible. At this stage at least, I want to reserve as much freedom as possible in making DSSR most logical/sensible to me. However, I do have a clear idea of allowing for an easy web/database interface to DSSR. In the future (not necessarily that near), I will add a SQLite-based SQL output of the DSSR parameters that would make parsing straightforward. I am convinced that is the way to go.

As always, constructive interactions with users like you move 3DNA (DSSR) forward.
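To give a feel for why a SQL output would make downstream processing straightforward, here is a small Python sketch using the standard `sqlite3` module. The table name `hbond`, its columns, and both function names are hypothetical choices for illustration; they say nothing about the schema DSSR may eventually adopt.

```python
import sqlite3


def store_hbonds(conn, rows):
    """Insert (nt1, nt2, atom1, atom2, distance) tuples into a hypothetical table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hbond ("
        " nt1 TEXT, nt2 TEXT, atom1 TEXT, atom2 TEXT, distance REAL)"
    )
    conn.executemany("INSERT INTO hbond VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def contacts_with_atom(conn, atom):
    """Return all stored H-bonds involving `atom`, shortest distance first."""
    cur = conn.execute(
        "SELECT nt1, nt2, atom1, atom2, distance FROM hbond"
        " WHERE atom1 = ? OR atom2 = ? ORDER BY distance",
        (atom, atom),
    )
    return cur.fetchall()
```

With H-bonds stored this way, a question such as "which nucleotide pairs are linked through OP2?" becomes a single query (`contacts_with_atom(conn, "OP2")`) instead of a format-dependent text parser.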

Xiang-Jun

auffinger · « **Reply #9 on:** May 07, 2013, 11:21:20 am »

Dear Xiang-Jun,

To follow up on this line of questions, it seems to us that some information is not found where we would expect it.
For example, if we take the 2Z75 structure from the PDB:

We find this base-pairing info for residue 114 chain B

68 B.G114 B.A117 [G-A] 00-n/a tSH tm-M
-162.1(anti) C3'-endo lambda=85.5; -127.3(anti) C3'-endo lambda=20.6
d(C1'-C1')=10.02 d(N1-N9)=8.75 d(C6-C8)=8.73 tor(N1-C1'-C1'-N9)=-147.9
H-bonds[3]: "N1(imino)-OP2[2.96]; N2(amino)-OP2[2.56]; N2(amino)-N7[3.08]"

These lines list not only base-base interactions, but also base-phosphate interactions (and, in other instances, base-sugar interactions).
I believe that here we should only find base-base interactions
and that base-backbone information should be placed in the "non-pairing interaction" list.

Furthermore, the H-bond count of 3 does not seem appropriate for the base-base interaction.
Maybe you could write

H-bonds[1]: "N2(amino)-N7[3.08]" followed by
H-bonds_backbone[2]: "N1(imino)-OP2[2.96]; N2(amino)-OP2[2.56]"

and also list this last line in the "non-pairing interaction" list since you already have lines like
5 A.G1 B.G39 base-overlap-area=0.0(0.0)
H-bonds[2]: "OP1-N1(imino)[2.73]; OP1-N2(amino)[3.39]"
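The split proposed above can be sketched in a few lines of Python applied to an H-bonds string. This is an illustration only, not DSSR behavior: the function names are hypothetical, and the atom classification is an assumption (OP1/OP2 and their older O1P/O2P names count as phosphate, any primed atom such as O2' or O5' counts as sugar, everything else as base).

```python
import re

# One H-bond, e.g. N1(imino)-OP2[2.96]; the label in parentheses is optional.
HBOND = re.compile(
    r"([A-Z0-9']+)(?:\([^)]*\))?[-*]([A-Z0-9']+)(?:\([^)]*\))?\[([\d.]+)\]"
)

# Assumed phosphate atom names (both current and older PDB conventions).
PHOSPHATE = {"P", "OP1", "OP2", "OP3", "O1P", "O2P", "O3P"}


def atom_part(atom):
    """Classify an atom name as 'phosphate', 'sugar', or 'base' (assumed rules)."""
    if atom in PHOSPHATE:
        return "phosphate"
    if atom.endswith("'"):  # primed names, including the linking O5' and O3'
        return "sugar"
    return "base"


def split_hbonds(hbond_string):
    """Split an H-bonds string into (base-base, backbone-involving) lists."""
    base_base, backbone = [], []
    for a1, a2, dist in HBOND.findall(hbond_string):
        item = (a1, a2, float(dist))
        if atom_part(a1) == "base" and atom_part(a2) == "base":
            base_base.append(item)
        else:
            backbone.append(item)
    return base_base, backbone
```

Applied to the G114/A117 entry above, this keeps only N2-N7 in the base-base list and moves the two OP2 contacts to the backbone list, which matches the H-bonds[1] / H-bonds_backbone[2] presentation suggested in this post.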

Hope you agree to that,

Best Pascal

xiangjun · « **Reply #10 on:** May 07, 2013, 12:25:57 pm »

Hi Pascal,

Thanks for your followup. Overall, I do not intend to make changes as you suggested.

The newly added non-pairing interactions (H-bonds and base stacking) are for the cases where the two bases are not paired (as defined by 3DNA/DSSR). The base-pair section contains information related to the two nucleotides, and thus lists all existing H-bonds, be they base to base, base to sugar, base to phosphate, or sugar to phosphate. The GpU story started from the identification of the sugar-phosphate O2'(G)...O2P(U) H-bond, which until then had been ignored by the community: see "What's special about the GpU dinucleotide platform?" and "Is the O2′(G)...O2P(U) H-bond in GpU platforms real?".

To get what you want, please consider writing a parser that combines the two DSSR sections. Also note that the H-bond identification algorithm in 3DNA/DSSR may not be that sophisticated (it is unpublished/undocumented) -- I've added this functionality mainly to make 3DNA/DSSR self-contained, i.e., without relying on third-party tools. For your purpose, you may well find dedicated tools more appropriate.

Alternatively, as a collaborative project, I could add a special option or write a parser, if you provide me with a detailed specification with examples.

HTH,

Xiang-Jun

auffinger · « **Reply #11 on:** May 07, 2013, 01:40:06 pm »

Hi Xiang-Jun,

Thanks again for the quick reply. I understand your point of view.
It is not about putting things aside, just ordering them differently.
I am sure that some other things will have to be detected in RNA structures
and for that we really need the maximum of info and your software is a great tool for this.

In the present case it's more a philosophical point, and I suppose that some
people out there will run into the same questions as we did.

Thus, providing another output type seems a great option and I suggest
a roughly identical file format where in the first section only base base interactions are listed
and where base-backbone and backbone-backbone interactions appear in the "non-pairing interaction" list
which could be labeled more appropriately "Base-base and base-backbone interactions".

Thanks for proposing to work on this and thanks for the work you have already done.

Pascal

xiangjun · « **Reply #12 on:** May 07, 2013, 01:56:40 pm »

Quote

I am sure that some other things will have to be detected in RNA structures
and for that we really need the maximum of info and your software is a great tool for this.

Yes, other things can be detected in RNA structures, and DSSR has laid the basis (a framework) for more features to be built upon.

Quote

Thus, providing another output type seems a great option and I suggest
a roughly identical file format where in the first section only base base interactions are listed
and where base-backbone and backbone-backbone interactions appear in the "non-pairing interaction" list
which could be labeled more appropriately "Base-base and base-backbone interactions".

Okay, I will add a new option in DSSR beta r11 that puts all pair-wise nucleotide interactions (pairing or non-pairing, with various types of H-bond) in one file. Any suggestions for the name of the option? How about --pair-wise-nt-interactions? Note that I am busy right now meeting a deadline, but I should be able to get the proposed work done by next week.

Your proposed format is not that clear to me. As always, please use concrete examples to illustrate your point.

Xiang-Jun
