
Topic: modified nucleotides incorrect.

tctcab
modified nucleotides incorrect.
« on: February 04, 2020, 01:26:15 am »
Hi, Dr. Li,

I've read your post regarding the modified nucleotides issue. https://x3dna.org/highlights/modified-nucleotides-in-the-pdb

However, during my usage, I noticed some modified nucleotides are still incorrect:

PDB: 1ASY_R

output of x3dna-dssr:

UCCGUGAUAGUUPAAuGGuCAGAAUGGGCGCPUGUCgCGUGCCAGAUcGGGGtPCAAUUCCCCGUCGCGGAGCCA

  50  G ( R.G651   0.012  anti,~C3'-endo,BI,canonical,non-pair-contact,helix,stem,coaxial-stack
  51  G ( R.G652   0.011  anti,~C3'-endo,BI,canonical,non-pair-contact,helix,stem,coaxial-stack
  52  G ( R.G653   0.013  anti,~C3'-endo,BI,canonical,non-pair-contact,helix,stem-end,coaxial-stack,hairpin-loop,kissing-loop
  53  t . R.5MU654 0.017  modified,anti,~C3'-endo,BI,non-canonical,non-pair-contact,helix,hairpin-loop,kissing-loop
  54  P . R.PSU655 0.011  modified,u-turn,anti,~C3'-endo,BI,non-canonical,non-pair-contact,helix,hairpin-loop,kissing-loop,cap-acceptor
  55  C ] R.C656   0.019  pseudoknotted,turn,u-turn,anti,~C3'-endo,BI,isolated-canonical,non-pair-contact,helix-end,hairpin-loop,kissing-loop
  56  A . R.A657   0.010  u-turn,anti,~C3'-endo,non-pair-contact,hairpin-loop,kissing-loop,cap-donor,phosphate




The FASTA sequence from the PDB:

>1ASY_R
UCCGUGAUAGUUUAAUGGUCAGAAUGGGCGCUUGUCGCGUGCCAGAUCGGGGUUCAAUUCCCCGUCGCGGAGCCA

According to the list you provided at https://x3dna.org/luxfiles/modified-bases-2013oct18.txt,

5MU should be U instead of t, and PSU should be u instead of P.

Hope this helps.

TC




xiangjun (Administrator)
Re: modified nucleotides incorrect.
« Reply #1 on: February 04, 2020, 11:14:49 am »
Hi TC,

Thanks for using 3DNA/DSSR and for posting your questions on the 3DNA Forum.

Quote
However, during my usage, I noticed some modified nucleotides are still incorrect:

I checked PDB entry 1ASY_R, and can reproduce your reported results regarding the DSSR auto-assigned modified nucleotides. However, I disagree with you that DSSR has made a mistake here, especially with regard to the pseudouridine (PSU).

As noted in the DSSR paper, in the section on "Identification of nucleotides":

Quote
In the derived base sequence, DSSR uses a one-letter shorthand for each identified nucleotide: upper case A, C, G, U and T for standard RNA and DNA bases, and lower case letters for modified nucleotides mapped to their canonical counterparts (e.g. ‘c’ for 5-methylcytidine, 5MC; Figure 2 and Supplementary Sample Output). Note that pseudouridine (PSU) is shortened to ‘P’, due to its special C1′–C5 glycosidic linkage (Figure 2).

Taking PSU as a modified U, i.e., using the standard base reference frame of U, would lead to wrong base-pair parameters. Thus 3DNA/DSSR specifically adds the P symbol -- this is a deliberate choice, a feature, not a bug.

As for taking 5MU as a modified T ('t'), that's because of the 5-methyl group. I agree that the choice between assigning 5MU as a modified U or a modified T is arbitrary. Users can take it as a modified U explicitly in 3DNA via the 'baselist.dat' file, as documented in that blogpost. For the purpose of 3DNA/DSSR, however, taking 5MU as a modified T or U does not have a noticeable effect on the derived parameters. So DSSR always uses an implicit assignment, for simplicity.

For users who want to compare DSSR-derived sequences with other resources, they need to pay attention to the lower-case letters and P and take proper actions. By design, the DSSR-derived base sequences from 3D atomic coordinates would be different from those listed in the PDB when pseudouridine (PSU) is involved. I could add a new DSSR option so that users can explicitly set the mapping in cases like 5MU. In my support of DSSR for more than 6 years, however, this ambiguity has not been a concern in practice. Do you think such a feature would be useful to you?

Best regards,

Xiang-Jun


« Last Edit: February 04, 2020, 11:53:11 am by xiangjun »

tctcab
Re: modified nucleotides incorrect.
« Reply #2 on: February 04, 2020, 07:26:53 pm »
Hi, Xiang-Jun,

Thanks for your explanation; now I understand your choice.

However, for 1ASY_R and PSU, DSSR classifies the base pairs as follows:

command: x3dna-dssr -i=1ASY_R.pdb --json -o=1ASY_R.dssr.json

pairs:
...
       index      nt1      nt2  bp       name   Saenger  LW DSSR
28    28   R.C631   R.G639 C-G         WC    19-XIX cWW cW-W
29    29 R.PSU632   R.C638 P-C         --       n/a cWW cW-W
30    30 R.PSU632   R.G639 P-G         --       n/a cWW cW-W
...

Note the inconsistency in the base-pair classification. Briefly, the name and Saenger columns do not recognize the WC pair of 28,29, while LW and DSSR annotate them as canonical. So if I want to get canonical pairs, it seems that I can't use the former two columns, right? What's your advice for retrieving canonical base pairs when the sequence contains PSU?


Quote
For users who want to compare DSSR-derived sequences with other resources, they need to pay attention to the lower-case letters and P and take proper actions. By design, the DSSR-derived base sequences from 3D atomic coordinates would be different from those listed in the PDB when pseudouridine (PSU) is involved. I could add a new DSSR option so that users can explicitly set the mapping in cases like 5MU. In my support of DSSR for more than 6 years, however, this ambiguity has not been a concern in practice. Do you think such a feature would be useful to you?

This would definitely help, and I believe it would be useful for other users too.

In my workflow, I use your list https://x3dna.org/luxfiles/modified-bases-2013oct18.txt to convert the sequence back to a standard RNA sequence (AUGCNX) and then run a sequence search. A letter P in the DSSR output will be treated as proline. My suggestion is to keep the output sequence in line with the IUPAC codes in order to reduce ambiguity.
https://www.bioinformatics.org/sms/iupac.html
« Last Edit: February 04, 2020, 07:49:59 pm by tctcab »

xiangjun (Administrator)
Re: modified nucleotides incorrect.
« Reply #3 on: February 04, 2020, 08:26:42 pm »
Quote
       index      nt1      nt2  bp       name   Saenger  LW DSSR
28    28   R.C631   R.G639 C-G         WC    19-XIX cWW cW-W
29    29 R.PSU632   R.C638 P-C         --       n/a cWW cW-W
30    30 R.PSU632   R.G639 P-G         --       n/a cWW cW-W

Note the inconsistency in the base-pair classification. Briefly, the name and Saenger columns do not recognize the WC pair of 28,29, while LW and DSSR annotate them as canonical. So if I want to get canonical pairs, it seems that I can't use the former two columns, right? What's your advice for retrieving canonical base pairs when the sequence contains PSU?

I am confused by the first two columns here. Also, you mentioned "the WC pair of 28,29". What is it? What about 30? Please clarify.

DSSR follows the convention that "canonical pairs" include only WC and G-U wobble pairs. The DSSR pair annotations (and its implementation of the LW ones) are geometry based. If you want to retrieve WC-like pairs involving PSU, you can check the 'cW-W' DSSR notation. You may check the "RNA Structure Atlas" website for 'authentic' LW annotation of base pairs.
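For readers who post-process the JSON output, the 'cW-W' check could be sketched in Python roughly as follows. This is only an illustration: the helper name wc_like_pairs is hypothetical, and the JSON keys ("pairs", "name", "DSSR", and so on) are assumed to mirror the column headers of the table quoted above. The sample records are copied from this thread rather than read from a real file.

```python
import json

def wc_like_pairs(dssr_output: dict) -> list:
    """Return pairs whose geometry-based DSSR notation is 'cW-W'.

    Assumes each pair record carries a "DSSR" key, mirroring the
    DSSR column of the text table (an assumption, not checked here).
    """
    return [p for p in dssr_output.get("pairs", []) if p.get("DSSR") == "cW-W"]

# Sample records taken from the tables in this thread.
sample = {
    "pairs": [
        {"index": 28, "nt1": "R.C631", "nt2": "R.G639", "bp": "C-G",
         "name": "WC", "Saenger": "19-XIX", "LW": "cWW", "DSSR": "cW-W"},
        {"index": 29, "nt1": "R.PSU632", "nt2": "R.C638", "bp": "P-C",
         "name": "--", "Saenger": "n/a", "LW": "cWW", "DSSR": "cW-W"},
        {"index": 30, "nt1": "R.PSU632", "nt2": "R.G639", "bp": "P-G",
         "name": "--", "Saenger": "n/a", "LW": "cWW", "DSSR": "cW-W"},
        {"index": 36, "nt1": "R.5MU654", "nt2": "R.A658", "bp": "t-A",
         "name": "rHoogsteen", "Saenger": "24-XXIV", "LW": "tWH", "DSSR": "tW-M"},
    ]
}

wc_like = wc_like_pairs(sample)
# Pairs that the name column itself labels as Watson-Crick.
named_wc = [p for p in sample["pairs"] if p["name"] == "WC"]

for p in wc_like:
    print(p["index"], p["nt1"], p["nt2"], p["bp"], p["name"])

# With a real file, the dict would come from something like:
#   with open("1ASY_R.dssr.json") as fh:
#       dssr_output = json.load(fh)
```

Filtering on the DSSR column keeps the PSU-containing pairs, while filtering on the name column keeps only the pair labelled WC.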
 
As mentioned in my previous response, the mapping of PSU to the symbol P is a deliberate decision in 3DNA/DSSR. The P symbol makes the most common modified nucleotide, pseudouridine, stand out. Users can easily replace P with U in a DSSR-derived base sequence, as they wish. So it should not be an issue in practice.
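As a rough illustration of that replacement, the Python sketch below collapses a DSSR-derived RNA sequence to plain A/C/G/U letters so it can be compared with the PDB FASTA entry. The function name normalize_dssr_sequence is hypothetical, and mapping 't' to U is an assumption that only makes sense for an RNA chain such as 1ASY_R. Both sequences are the ones quoted earlier in this thread.

```python
def normalize_dssr_sequence(seq: str) -> str:
    """Collapse a DSSR-derived RNA sequence to plain A/C/G/U letters."""
    # Lower-case letters mark modified nucleotides; fold them to upper case.
    plain = seq.upper()
    # 'P' is the DSSR symbol for pseudouridine (PSU); the PDB lists it as U.
    plain = plain.replace("P", "U")
    # Assumption for RNA chains: 't' (e.g. 5MU) corresponds to U in the PDB.
    return plain.replace("T", "U")


dssr_seq = "UCCGUGAUAGUUPAAuGGuCAGAAUGGGCGCPUGUCgCGUGCCAGAUcGGGGtPCAAUUCCCCGUCGCGGAGCCA"
pdb_seq = "UCCGUGAUAGUUUAAUGGUCAGAAUGGGCGCUUGUCGCGUGCCAGAUCGGGGUUCAAUUCCCCGUCGCGGAGCCA"

normalized = normalize_dssr_sequence(dssr_seq)
print(normalized == pdb_seq)
```

The information carried by the lower-case letters and P is lost in this step, so it is best applied to a copy used only for sequence searches.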

I will add a new option to DSSR so users can have control over the mapping of 5MU to U, for example. However, PSU to U mapping will not be allowed: PSU is topologically different from U in terms of sugar-base connectivity.

Best regards,

Xiang-Jun



« Last Edit: February 04, 2020, 09:56:40 pm by xiangjun »

xiangjun (Administrator)
Re: modified nucleotides incorrect.
« Reply #4 on: February 05, 2020, 12:41:02 pm »
As a follow-up, I've updated DSSR to v1.9.9-2020feb06 on the download page. The update introduces a new option, --nt-mapping, that takes a comma-separated list of modified nucleotides in the form 3-letter-id:1-letter-symbol. For example, to map 5MU to u, one can use --nt-mapping='5MU:u'. Multiple modified nucleotides can be given, separated by commas. The one-letter symbol must be among ACGTUP (or acgtup). By design, PSU is assigned to P by default, and this cannot be changed via the option.
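For scripts that assemble this option programmatically, the rules just described can be checked up front. The Python sketch below is a hypothetical helper (parse_nt_mapping is not part of DSSR) that mirrors only what is stated above: comma-separated id:symbol items, a symbol among ACGTUP or acgtup, and no remapping of PSU. Trimming whitespace around items is a convenience of this sketch, not documented DSSR behavior.

```python
from typing import Dict

ALLOWED_SYMBOLS = set("ACGTUPacgtup")

def parse_nt_mapping(spec: str) -> Dict[str, str]:
    """Validate a string such as '5MU:u,5MC:c' and return it as a dict."""
    mapping: Dict[str, str] = {}
    for item in spec.split(","):
        res_id, sep, symbol = item.strip().partition(":")
        res_id, symbol = res_id.strip(), symbol.strip()
        if not sep or not res_id or len(symbol) != 1:
            raise ValueError(f"expected id:symbol, got {item!r}")
        if symbol not in ALLOWED_SYMBOLS:
            raise ValueError(f"symbol {symbol!r} is not among ACGTUP/acgtup")
        if res_id.upper() == "PSU":
            raise ValueError("PSU is always assigned to P and cannot be remapped")
        mapping[res_id] = symbol
    return mapping


mapping = parse_nt_mapping("5MU:u,5MC:c")
# Rebuild the option value in the id:symbol form described above.
option = "--nt-mapping=" + ",".join(f"{k}:{v}" for k, v in mapping.items())
print(option)
```

Rejecting a bad mapping before calling x3dna-dssr keeps the error close to the script that produced it.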

Using 1ASY_R as an example, here are the detailed steps (viewers can follow):
Code: Text
  curl https://files.rcsb.org/download/1ASY.pdb -o 1ASY.pdb
  x3dna-dssr -i=1ASY.pdb --select-chain=R -o=1ASY-R.pdb
  x3dna-dssr -i=1ASY-R.pdb
  x3dna-dssr -i=1ASY-R.pdb --nt-mapping='5MU:u'

The DSSR-derived sequences are listed below:
  UCCGUGAUAGUUPAAuGGuCAGAAUGGGCGCPUGUCgCGUGCCAGAUcGGGGtPCAAUUCCCCGUCGCGGAGCCA
  UCCGUGAUAGUUPAAuGGuCAGAAUGGGCGCPUGUCgCGUGCCAGAUcGGGGuPCAAUUCCCCGUCGCGGAGCCA
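To see exactly where the two sequences differ, a position-by-position comparison is enough. The Python sketch below uses a hypothetical helper, sequence_differences, and the two sequences listed above; positions are 1-based so they line up with the serial numbers in the DSSR nucleotide listing.

```python
def sequence_differences(a: str, b: str) -> list:
    """Return (position, symbol_in_a, symbol_in_b) for each mismatch, 1-based."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


default_seq = "UCCGUGAUAGUUPAAuGGuCAGAAUGGGCGCPUGUCgCGUGCCAGAUcGGGGtPCAAUUCCCCGUCGCGGAGCCA"
mapped_seq = "UCCGUGAUAGUUPAAuGGuCAGAAUGGGCGCPUGUCgCGUGCCAGAUcGGGGuPCAAUUCCCCGUCGCGGAGCCA"

differences = sequence_differences(default_seq, mapped_seq)
print(differences)
```

The only mismatch is at position 53, which is nucleotide R.5MU654 in the listing from the first post.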

Note that mapping 5MU to 't' or 'u' has minimal influence on DSSR-derived base-pair parameters, as shown below. 3DNA/DSSR is robust against the (potential) ambiguity in designating a modified nucleotide to its nearest canonical counterpart.
Code: Text
  36 R.5MU654       R.A658         t-A rHoogsteen  24-XXIV   tWH  tW-M
       bp-pars: [4.19    -2.18   -0.08   -4.49   6.74    -93.68]
  # with 5MU:u
  36 R.5MU654       R.A658         u-A rHoogsteen  24-XXIV   tWH  tW-M
       bp-pars: [4.20    -2.20   -0.08   -4.48   6.74    -93.47]
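The size of the effect can be read off by subtracting the two bp-pars lists element by element. The short Python sketch below does that arithmetic with the values shown above. The parameter names follow the usual 3DNA order (Shear, Stretch, Stagger, Buckle, Propeller, Opening), which is an assumption here since the listing itself does not label them.

```python
# Usual 3DNA base-pair parameter order (assumed; not labelled in the listing).
names = ["Shear", "Stretch", "Stagger", "Buckle", "Propeller", "Opening"]

default_pars = [4.19, -2.18, -0.08, -4.49, 6.74, -93.68]  # 5MU taken as 't'
mapped_pars = [4.20, -2.20, -0.08, -4.48, 6.74, -93.47]   # with 5MU:u

# Absolute difference per parameter, rounded to the precision of the output.
diffs = [round(abs(a - b), 2) for a, b in zip(default_pars, mapped_pars)]
largest = max(zip(diffs, names))

for name, diff in zip(names, diffs):
    print(f"{name:10s} {diff:.2f}")
print("largest change:", largest)
```

The translational parameters change by at most 0.02 and the largest angular change is 0.21, which matches the remark that the choice of 't' or 'u' has minimal influence.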

As a side note, v1.9.9-2020feb06 also contains many refinements to the DSSR-PyMOL interface for producing the characteristic block schematics. See http://skmatic.x3dna.org.

Xiang-Jun
« Last Edit: February 05, 2020, 12:44:01 pm by xiangjun »

tctcab
Re: modified nucleotides incorrect.
« Reply #5 on: February 05, 2020, 10:13:05 pm »

Many thanks for your time!

I find DSSR to be handy and really like your work and your devotion to maintaining it for so long.

 

Funded by X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids (R24GM153869)

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University