
Author Topic: How the length of the sequence depends on _pdbx_unobs_or_zero_occ_atoms ???  (Read 11058 times)

Offline sk

Hello,

I'm trying to understand why DSSR reports that 4AL5 has 16 nucleotides.

The CIF file lists 20 nucleotides, UUCACUGCCGUAUAGGCAGC, as _entity_poly.pdbx_seq_one_letter_code,
but DSSR gives only 16: ACUGCCGUAUAGGCAG.

The sequence is described in the CIF file as follows:
loop_
_pdbx_poly_seq_scheme.asym_id
_pdbx_poly_seq_scheme.entity_id
_pdbx_poly_seq_scheme.seq_id
_pdbx_poly_seq_scheme.mon_id
_pdbx_poly_seq_scheme.ndb_seq_num
_pdbx_poly_seq_scheme.pdb_seq_num
_pdbx_poly_seq_scheme.auth_seq_num
_pdbx_poly_seq_scheme.pdb_mon_id
_pdbx_poly_seq_scheme.auth_mon_id
_pdbx_poly_seq_scheme.pdb_strand_id
_pdbx_poly_seq_scheme.pdb_ins_code
_pdbx_poly_seq_scheme.hetero

B 2 1   U   1   2   ?   ?   ?   B . n
B 2 2   U   2   3   ?   ?   ?   B . n
B 2 3   C   3   4   4   C   C   B . n
B 2 4   A   4   5   5   A   A   B . n
B 2 5   C   5   6   6   C   C   B . n
B 2 6   U   6   7   7   U   U   B . n
B 2 7   G   7   8   8   G   G   B . n
B 2 8   C   8   9   9   C   C   B . n
B 2 9   C   9   10  10  C   C   B . n
B 2 10  G   10  11  11  G   G   B . n
B 2 11  U   11  12  12  U   U   B . n
B 2 12  A   12  13  13  A   A   B . n
B 2 13  U   13  14  14  U   U   B . n
B 2 14  A   14  15  15  A   A   B . n
B 2 15  G   15  16  16  G   G   B . n
B 2 16  G   16  17  17  G   G   B . n
B 2 17  C   17  18  18  C   C   B . n
B 2 18  A   18  19  19  A   A   B . n
B 2 19  G   19  20  20  G   G   B . n
B 2 20  C   20  21  21  C   C   B . n

If I understand correctly, you removed the first two lines because _pdbx_poly_seq_scheme.pdb_mon_id = ? (a consequence of _pdbx_unobs_or_zero_occ_residues?).
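A minimal sketch of that reading, in Python: it treats a row of the loop above as unobserved when pdb_mon_id is "?" and as observed otherwise. The column order is taken from the loop header above; the function name split_observed and the residue labels (mon_id followed by pdb_seq_num) are illustrative choices, and this is not how DSSR itself reads the file.

```python
# Rows copied from the _pdbx_poly_seq_scheme loop quoted above.
ROWS = """\
B 2 1   U   1   2   ?   ?   ?   B . n
B 2 2   U   2   3   ?   ?   ?   B . n
B 2 3   C   3   4   4   C   C   B . n
B 2 4   A   4   5   5   A   A   B . n
B 2 5   C   5   6   6   C   C   B . n
B 2 6   U   6   7   7   U   U   B . n
B 2 7   G   7   8   8   G   G   B . n
B 2 8   C   8   9   9   C   C   B . n
B 2 9   C   9   10  10  C   C   B . n
B 2 10  G   10  11  11  G   G   B . n
B 2 11  U   11  12  12  U   U   B . n
B 2 12  A   12  13  13  A   A   B . n
B 2 13  U   13  14  14  U   U   B . n
B 2 14  A   14  15  15  A   A   B . n
B 2 15  G   15  16  16  G   G   B . n
B 2 16  G   16  17  17  G   G   B . n
B 2 17  C   17  18  18  C   C   B . n
B 2 18  A   18  19  19  A   A   B . n
B 2 19  G   19  20  20  G   G   B . n
B 2 20  C   20  21  21  C   C   B . n
"""

# Column names in the order given by the loop header.
COLUMNS = [
    "asym_id", "entity_id", "seq_id", "mon_id", "ndb_seq_num", "pdb_seq_num",
    "auth_seq_num", "pdb_mon_id", "auth_mon_id", "pdb_strand_id",
    "pdb_ins_code", "hetero",
]


def split_observed(rows_text):
    """Return (observed, unobserved) residue labels such as 'C4'."""
    observed, unobserved = [], []
    for line in rows_text.splitlines():
        if not line.strip():
            continue
        record = dict(zip(COLUMNS, line.split()))
        label = record["mon_id"] + record["pdb_seq_num"]
        # "?" in pdb_mon_id means no coordinates were deposited for the residue.
        if record["pdb_mon_id"] == "?":
            unobserved.append(label)
        else:
            observed.append(label)
    return observed, unobserved


observed, unobserved = split_observed(ROWS)
print(len(observed), "observed;", "unobserved:", unobserved)
```

By this test alone only U2 and U3 drop out, which leaves 18 residues (C4 through C21) and does not account for the 16 reported.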

But why are C21 and C4 also removed?
It looks like this has something to do with _pdbx_unobs_or_zero_occ_atoms.

Could you please clarify the situation?
Thanks in advance.
« Last Edit: September 04, 2024, 09:43:30 am by sk »

Offline xiangjun

Re: How the length of the sequence depends on _pdbx_unobs_or_zero_occ_atoms ???
« Reply #1 on: September 05, 2024, 10:58:42 pm »
Hi,

DSSR works from the 3D structures of DNA/RNA, deriving features such as base-pairing and stacking interactions. Later releases also take abasic sites into consideration, requiring only P or at least 5 out of the 6 main-chain backbone atoms (P, O5', C5', C4', C3', and O3'). In PDB entry 4AL5, nucleotide C4 has only one backbone atom (O3'), and C21 has 4 backbone atoms (P, OP1, OP2, and O5'), as shown below.
ATOM   2826 O "O3'"  . C   B 2 3   ? 14.682 -18.630 19.841  1.00 152.11 ? 4    C   B "O3'"  1
......
ATOM   3343 P P      . C   B 2 20  ? 2.515  -3.243  14.608  1.00 43.27  ? 21   C   B P      1
ATOM   3344 O OP1    . C   B 2 20  ? 1.257  -3.732  14.022  1.00 60.70  ? 21   C   B OP1    1
ATOM   3345 O OP2    . C   B 2 20  ? 2.599  -1.863  15.133  1.00 37.31  ? 21   C   B OP2    1
ATOM   3346 O "O5'"  . C   B 2 20  ? 2.975  -4.175  15.812  1.00 40.82  ? 21   C   B "O5'"  1

So in previous DSSR versions, both nucleotides were ignored.
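As a rough illustration of the counts quoted above, the following Python sketch tallies the backbone atoms present for each residue in those ATOM records. It assumes the usual PDB-style _atom_site column order (atom name in the 4th field; auth_seq_id, auth_comp_id and auth_asym_id in the 17th to 19th fields); the function name backbone_atoms_by_residue is hypothetical, and the code only counts atoms rather than reproducing DSSR's internal criteria.

```python
import shlex
from collections import defaultdict

# ATOM records copied from the excerpt above (the elided rows are omitted).
ATOM_ROWS = """\
ATOM   2826 O "O3'"  . C   B 2 3   ? 14.682 -18.630 19.841  1.00 152.11 ? 4    C   B "O3'"  1
ATOM   3343 P P      . C   B 2 20  ? 2.515  -3.243  14.608  1.00 43.27  ? 21   C   B P      1
ATOM   3344 O OP1    . C   B 2 20  ? 1.257  -3.732  14.022  1.00 60.70  ? 21   C   B OP1    1
ATOM   3345 O OP2    . C   B 2 20  ? 2.599  -1.863  15.133  1.00 37.31  ? 21   C   B OP2    1
ATOM   3346 O "O5'"  . C   B 2 20  ? 2.975  -4.175  15.812  1.00 40.82  ? 21   C   B "O5'"  1
"""

# The six main-chain atoms named in the post, plus the two phosphate oxygens.
MAIN_CHAIN = {"P", "O5'", "C5'", "C4'", "C3'", "O3'"}
PHOSPHATE_OXYGENS = {"OP1", "OP2"}


def backbone_atoms_by_residue(rows_text):
    """Map a residue id such as 'B.C21' to the backbone atoms found for it."""
    found = defaultdict(list)
    for line in rows_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        # shlex strips the double quotes mmCIF puts around names like "O3'".
        fields = shlex.split(line)
        atom_name = fields[3]
        residue_id = f"{fields[18]}.{fields[17]}{fields[16]}"
        if atom_name in MAIN_CHAIN or atom_name in PHOSPHATE_OXYGENS:
            found[residue_id].append(atom_name)
    return dict(found)


for residue_id, atoms in backbone_atoms_by_residue(ATOM_ROWS).items():
    main_chain_count = sum(1 for name in atoms if name in MAIN_CHAIN)
    print(residue_id, atoms, "main-chain atoms:", main_chain_count)
```

The residue ids are built in the same chain.name-number style that DSSR prints (B.C4, B.C21), so the tallies can be compared directly with its output.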

Following your question, I've revised DSSR (now v2.4.4-2024sep06) so that it recognizes these two nucleotides. See below:

# x3dna-dssr -i=4AL5.cif
Secondary structures in dot-bracket notation (dbn) as a whole and per chain
>4AL5 nts=18 [whole]
CACUGCCGUAUAGGCAGC
..(((((.....))))).
-.AAAA..A...AAAA--

****************************************************************************
Summary of structural features of 18 nucleotides
  Note: the first five columns are: (1) serial number, (2) one-letter
    shorthand name, (3) dbn, (4) id string, (5) rmsd (~zero) of base
    ring atoms fitted against those in a standard base reference
    frame. The sixth (last) column contains a comma-separated list of
    features whose meanings are mostly self-explanatory, except for:
      turn: angle C1'(i-1)--C1'(i)--C1'(i+1) < 90 degrees
      break: no backbone linkage between O3'(i-1) and P(i)
   1  C . B.C4      ---    non-stack,ss-non-loop
   2  A . B.A5      0.013  anti,~C2'-endo,non-pair-contact,ss-non-loop,splayed-apart
   3  C ( B.C6      0.007  anti,~C3'-endo,BI,canonical,non-pair-contact,helix-end,stem-end,phosphate,splayed-apart
   4  U ( B.U7      0.009  anti,~C3'-endo,BI,canonical,non-pair-contact,helix,stem,phosphate
   5  G ( B.G8      0.015  anti,~C3'-endo,BI,canonical,non-pair-contact,helix,stem,phosphate
   6  C ( B.C9      0.011  anti,~C3'-endo,BI,canonical,non-pair-contact,helix,stem,phosphate
   7  C ( B.C10     0.011  anti,~C3'-endo,BI,canonical,non-pair-contact,helix,stem-end,hairpin-loop,phosphate
   8  G . B.G11     0.043  u-turn,anti,~C3'-endo,BI,non-canonical,non-pair-contact,helix-end,hairpin-loop,cap-acceptor,phosphate
   9  U . B.U12     0.019  turn,u-turn,anti,~C3'-endo,non-pair-contact,hairpin-loop
  10  A . B.A13     0.022  u-turn,anti,~C3'-endo,non-pair-contact,hairpin-loop,cap-donor,phosphate
  11  U . B.U14     0.006  turn,u-turn,anti,~C2'-endo,non-pair-contact,hairpin-loop,phosphate,splayed-apart
  12  A . B.A15     0.007  anti,~C3'-endo,BI,non-canonical,non-pair-contact,helix-end,hairpin-loop,splayed-apart
  13  G ) B.G16     0.017  anti,~C3'-endo,BI,canonical,non-pair-contact,helix,stem-end,hairpin-loop
  14  G ) B.G17     0.011  anti,~C3'-endo,BI,canonical,non-pair-contact,helix,stem
  15  C ) B.C18     0.011  anti,~C3'-endo,BI,canonical,non-pair-contact,helix,stem
  16  A ) B.A19     0.014  anti,~C3'-endo,BI,canonical,non-pair-contact,helix,stem
  17  G ) B.G20     0.018  anti,~C2'-endo,BI,canonical,non-pair-contact,helix-end,stem-end
  18  C . B.C21     ---    non-stack,ss-non-loop,phosphate
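For readers who want to check the dot-bracket line against the table, here is a small Python sketch that matches each "(" with its ")" and reports the paired positions. The function name pairs_from_dbn is an illustrative helper rather than part of DSSR, and it handles only the plain "(", ")" and "." symbols seen in this output.

```python
SEQUENCE = "CACUGCCGUAUAGGCAGC"
DBN = "..(((((.....)))))."


def pairs_from_dbn(dbn):
    """Return 1-based (i, j) index pairs for matching parentheses."""
    open_positions = []
    pairs = []
    for position, symbol in enumerate(dbn, start=1):
        if symbol == "(":
            open_positions.append(position)
        elif symbol == ")":
            if not open_positions:
                raise ValueError("unbalanced ')' at position %d" % position)
            # The most recent unmatched '(' pairs with this ')'.
            pairs.append((open_positions.pop(), position))
    if open_positions:
        raise ValueError("unbalanced '(' in dot-bracket string")
    return sorted(pairs)


for i, j in pairs_from_dbn(DBN):
    print(f"{SEQUENCE[i - 1]}{i} pairs with {SEQUENCE[j - 1]}{j}")
```

The positions are the serial numbers in the first column of the table, so serial 3 (B.C6) through serial 7 (B.C10) form the 5' strand of the stem.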


Best regards,

Xiang-Jun
« Last Edit: September 05, 2024, 11:41:40 pm by xiangjun »

Offline sk

Re: How the length of the sequence depends on _pdbx_unobs_or_zero_occ_atoms ???
« Reply #2 on: September 06, 2024, 10:23:46 am »
Thank you, Xiang-Jun.

 

Funded by the NIH R24GM153869 grant on X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University