Author Topic: H-bonding information in MD analysis output (Read 11642 times)

Andre · « **on:** September 28, 2011, 12:35:23 am »

Hi All,

I have used the x3dna_md.rb script and it has worked perfectly. However, is there a H-bond parameter that it will analyse? I cannot see any H-bond information in the x3dna_md.out file.

Kind regards,

Andre

xiangjun · « **Reply #1 on:** September 28, 2011, 07:17:37 am »

Hi Andre,

First, thanks for posting your question in the forum! The more user feedback, the merrier.

Second, you may notice that I have split your post from the thread "Ruby scripts for the analysis of MD simulation trajectories" which is too long (18 posts now), and spans two pages. I have added a new subject line "Re: H-bonding information in MD analysis output" for the new thread.

Third, and more relevant to your question, are you referring to adding a new section for the H-bonding information? An example would be:

<hbond>
   1 C-----G  [2]  O2 - N1  2.99  N4 - O6  2.98
   2 C-----G  [3]  O2 - N2  2.74  N3 - N1  2.84  N4 - O6  3.17
   3 T-----A  [2]  N3 - N1  3.09  O4 - N6  2.86
   4 A-----T  [2]  N1 - N3  2.78  N6 - O4  3.05
   5 A-----T  [2]  N1 - N3  2.81  N6 - O4  2.76
   6 T-----A  [2]  N3 - N1  2.71  O4 - N6  3.06
   7 A-----T  [2]  N1 - N3  3.12  N6 - O4  3.12
   8 G-----C  [2]  N1 - N3  3.00  O6 - N4  2.75
   9 A-----T  [2]  N1 - N3  2.93  N6 - O4  3.12
  10 A-----T  [2]  N1 - N3  2.90  N6 - O4  2.93
  11 A-----T  [2]  N1 - N3  3.01  N6 - O4  3.05
  12 T-----A  [2]  N3 - N1  2.80  O4 - N6  3.12
</hbond>

If that's the case, then it won't be much a problem to parse and add it into the output file. Otherwise, please provide a concrete example to show exactly what you mean.

Please confirm.

Xiang-Jun

Andre · « **Reply #2 on:** September 29, 2011, 05:07:19 am »

Dear Xiang-Jun,

Thankyou for the quick reply.

Yes, I would like to have a new section with the H-bonding information in the output file, I am not familiar with ruby, and do not know how to do this using the x3DNA_md.rb script. There are 100 md*.out files to process for a given structure and the script has been very helpful in collecting the DNA parameters.

Kind regards,

Andre

xiangjun · « **Reply #3 on:** September 29, 2011, 07:03:12 am »

Hi Andre,

Thanks for your feedback. I will add a section of H-bond parameters (hopefully) by tomorrow, and then release an updated version of the Ruby scripts. So stay tuned, and check back the forum.

Xiang-Jun

Andre · « **Reply #4 on:** September 29, 2011, 07:43:33 am »

Dear Xiang-Jun,

Thankyou very much for that. Greatly appreciated.

Kind regards,

Andre

xiangjun · « **Reply #5 on:** September 30, 2011, 10:17:44 am »

Hi Andre,

Please check the updated post "Ruby scripts for the analysis of MD simulation trajectories". Now in v0.7, the scripts parse and extract H-bonding information you asked for, and also the overlap areas for quantifying stacking interactions.

Now I realized that I had a (solid) reason to exclude H-bond parameters in the scripts -- being variable-length text instead of float numbers (see my first reply in this thread for an example), they do not fit the pattern with other previously extracted parameters. Specifically, with N models (structures) and each M base-pairs, we have a N-by-M data matrix for each of the base-pair parameters. For buckle, see the attached file 'buckle.tsv' (based on the distributed dataset [mono:kldkpnjg]sample_md0.pdb[/mono:kldkpnjg]. Here '[mono:kldkpnjg]tsv[/mono:kldkpnjg]' stands for tab-separated values). Now for H-bonding, the extracted data file 'hbond.tsv' looks a bit wildly -- each item has different text length and number of H-bonds. I cannot see how such information can be processed directly as with other parameters, and I'd be interested in knowing how you proceed.

Have a try, and see how it goes.

Xiang-Jun

Andre · « **Reply #6 on:** October 01, 2011, 11:16:58 pm »

Dear Xian-jun,

Thankyou for that, it works fine and it's exactly what I wanted. Since the H-bond information for each base pair is arranged in columns, is there a way for the script to define each column for a specific H-bond and if the bond does not exist then it will leave it blank?
Also if other 'non-standard' H-bonds form (shown by the asterisk ' * '), is it possible for the script to put these in seperate columns?

Here is a file to show you what I mean;

I realise this is just formatting however it will make life easier to have the script format the output rather than do it manually.

Kind regards,

Andre

xiangjun · « **Reply #7 on:** October 02, 2011, 09:40:40 am »

Quote from: Andre

Since the H-bond information for each base pair is arranged in columns, is there a way for the script to define each column for a specific H-bond and if the bond does not exist then it will leave it blank?

The parsing script has no knowledge of any 'specific H-bond': it just takes H-bonds for each base-pair as a single unit of text string. The H-bond info is calculated with 'analyze', based on purely geometric criteria. Thus for example there can be different A-T pairs: e.g., Watson-Crick pair and Hoogsteen pair, and many other possibilities.

In the example shown in my first reply, note the first two C-G pairs:

1 C-----G  [2]  O2 - N1  2.99  N4 - O6  2.98
2 C-----G  [3]  O2 - N2  2.74  N3 - N1  2.84  N4 - O6  3.17

They are based on model #20 of the distributed PDB ensemble 'sample_md0.pdb'. Note specifically, while the 2nd C-G pair is normal with 3 H-bonds, the first C-G pair has 2 H-bonds including the unconventional 'O2 - N1 2.99'.

Overall, I cannot see a sensible, generally applicable way to "define each column for a specific H-bond".

Quote from: Andre

Here is a file to show you what I mean

I realise this is just formatting however it will make life easier to have the script format the output rather than do it manually.

See above. For your example, the first column does not align. Think about all the possible base sequence and pairing patterns, what you are asking for is certainly not just a formatting matter.

0   [3]   O2 - N2   2.96   N3 - N1   3.03   N4 - O6   3.49
1   [3]   O2 - N2   3.53   N3 - N1   2.97   N4 - O6   2.75
2   [3]   O2 - N2   3.01   N3 - N1   2.78   N4 - O6   3.19
3   [3]   O2 - N2   3.17   N3 - N1   3.25   N4 - O6   3.53
4   [3]   O2 - N2   2.81   N3 - N1   2.97   N4 - O6   3.22
5   [3]   O2 - N2   2.86   N3 - N1   2.87   N4 - O6   2.86
6   [1]         N3 - N1   3.08      
7   [2]         N3 - N1   3.25   N4 - O6   2.78
8   [2]   O2 - N1   3.07         N4 - O6   3.35
9   [3]   O2 - N2   2.96   N3 - N1   2.71   N4 - O6   2.85
10   [3]   O2 - N2   3.32   N3 - N1   2.87   N4 - O6   3.09
11   [3]   O2 - N2   3   N3 - N1   2.86   N4 - O6   3.18
12   [2]         N3 - N1   3.14   N4 - O6   3.04
13   [2]   O2 - N2   2.75   N3 - N1   3.29      
14   [3]   O2 - N2   3.04   N3 - N1   3.06   N4 - O6   3.38
15   [3]   O2 - N2   2.85   N3 - N1   2.99   N4 - O6   3.21
16   [3]   O2 - N2   3.17   N3 - N1   2.93   N4 - O6   2.98      
17   [3]   O2 - N2   2.79   N3 - N1   2.74   N4 - O6   2.6      
18   [3]   O2 - N2   3.03   N3 - N1   2.78   N4 - O6   2.66      
19   [2]   O2 - N1   2.75               N3 * O6   3.32
20   [2]   O2 - N1   2.99         N4 - O6   2.98

I understand you may have a clearly defined goal in hand, and that's where application-specific scripting/programming comes into play. I may be able to help in some way if you make your case more specific.

Xiang-Jun

Andre · « **Reply #8 on:** October 03, 2011, 05:19:29 pm »

Dear Xian-jun,

Ok, essentially I'm trying to group the H-bond information. Yes, there may be a lot of H-bond possibilities in a DNA structure including hoogsteen and reverse waston crick pairing.

This may be a process that the extract script will have to handle after 'analyze'.

Let's use wastson crick pairing as our conventional H-bond information. For your GC pair example, can we extract and organsie the information so the first column is O2 - N2, second is N3 - N1, and the third is N4 - O6. If there are non-waston crick H-bonds they can be placed in the fourth and subsequent columns.

Do we have to specify all the non-waston crick H-bonds or can the script look for the waston crick H-bonds and place the non-waston crick H-bonds after the standard waston crick information?

Here is a file of common H-bonding atoms between bases:

xiangjun · « **Reply #9 on:** October 03, 2011, 10:19:00 pm »

Hi Andre,

Quote

This may be a process that the extract script will have to handle after 'analyze'.

As mentioned previously, I am not convinced to add such a functionality directly in the current two Ruby scripts. If you want to proceed, it is preferable (in my understanding) to start from the extracted [mono:13mhwi7u]<hbond>[/mono:13mhwi7u] parameters. A purpose-specific script is the way to go.

Quote

Let's use wastson crick pairing as our conventional H-bond information. For your GC pair example, can we extract and organsie the information so the first column is O2 - N2, second is N3 - N1, and the third is N4 - O6. If there are non-waston crick H-bonds they can be placed in the fourth and subsequent columns.

In addition to G-C pair, you also need to consider C-G pair, which by default swaps the order of atoms in H-bond output.

Quote

Do we have to specify all the non-waston crick H-bonds or can the script look for the waston crick H-bonds and place the non-waston crick H-bonds after the standard waston crick information?

It is entirely up to you, with a purpose-specific script.

HTH,

Xiang-Jun

Andre · « **Reply #10 on:** October 04, 2011, 12:41:55 am »

Dear Xian-jun,

Thankyou for your help, I will try to work with the extract output and organise the data as you suggested. I guess I'm trying to make it easier to see the type of H-bond events that are occuring.

I am not experienced in programing/scripting so it will be an interesting challenge. I'll post the script here if I'm successful as others may have a use for it.

Much appreciated,

Andre

xiangjun · « **Reply #11 on:** October 04, 2011, 07:25:42 am »

Hi Andre,

Quote

I will try to work with the extract output and organise the data as you suggested. I guess I'm trying to make it easier to see the type of H-bond events that are occuring.

It makes sense. Over the years, 3DNA has been used in ways that are out of my original intention. As a principle, I won't add a 'feature' into 3DNA unless I fully understand it and can back it up when asked. Put another ways, feature-rich is not what 3DNA aims for, but quality is. This is where purpose-specific scripting comes into play, and such a skill is essential for a 'bioinformatican'.

Quote

I am not experienced in programing/scripting so it will be an interesting challenge. I'll post the script here if I'm successful as others may have a use for it.

We all learn from practice, and by making mistakes. I am willing to help you in the exercise: just put your script with a reproducible example here, then I (hopefully other viewers of the forum) can make suggestions. With several iterations, the problem will become clearer, and your script may well function for your needs. In retrospect, this is the most effective way to learn things, and I wish to have been guided this way. I aim to turn this forum into a 'educational' resource so that people with similar mind can share and benefit from each other's experience and contributions. I am glad that you are willing to share, which is precisely the incentive that I'd like to help you more.

Have a try, and let's see how it goes.

Xiang-Jun

News:

Author Topic: H-bonding information in MD analysis output (Read 11642 times)