Author Topic: [Solved] Batch-processing structures with analyze -t (Read 106233 times)

persalteas · « **on:** January 06, 2020, 09:42:06 am »

Hi there,

First, thank you a lot to each and every one of you developing DSSR, this is an awesome piece of software that perfectly suits my needs. I have been using it for some time without any problems. Today, I have a technical question to enhance my productivity.

I am trying to run "analyze -t" on a few thousand RNA chains to get their eta' and theta' pseudotorsions, which works very well. To speed things up, I use several threads that run analyze in parallel on a multicore CPU.

The problem is that "analyze -t" creates a file named Borg_P_C1_C4.dat containing the Cartesian coordinates of the atoms used. As we cannot control this file's name, I am afraid of conflicts between my threads, each of them trying to read/write different content to the same file.

Therefore, two questions:

Is the .dat file necessary to compute the .tor file, or the opposite?
Is there a way to control that file's name?

Example command I run:

Code: [Select]

analyze -t=/path/to/pseudotorsions/folder/4v8p[1]-B3.tor /path/to/chains/folder/4v8p[1]-B3.pdb

Thanks a lot for your help,

Louis Becquey
PhD Student @Univ Evry, Université Paris-Saclay

xiangjun · « **Reply #1 on:** January 06, 2020, 10:26:06 am »

Hi Louis Becquey,

Thanks for using 3DNA/DSSR and for your kind words about it.

You mentioned "analyze -t", which is a program in the classic 3DNA 2.x suite that is currently in maintenance mode. You also referred to DSSR, which is a brand-new program I've been actively developing and continuously refining.

For the calculation of eta' and theta' pseudo-torsions, among other things, I'd suggest that you switch to DSSR. Using the --json option, you can easily parse DSSR output. Compared to the "analyze -t" approach, your pipeline would be significantly simplified. See the following two threads:

Hope this helps.

Xiang-Jun

PS: The fixed-name file "Borg_P_C1_C4.dat" is for information only. It is a by-product of running the "analyze -t" program. With the source code, you could name the file as you see fit.
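
For readers who need to stay with "analyze -t", one possible workaround that does not require touching the source code is to give every job its own scratch directory. The sketch below is not something proposed in this thread; it assumes that the fixed-name by-product is written to the current working directory of the process. `run_isolated` and its `keep` parameter are hypothetical names chosen for illustration.

```python
import subprocess
import tempfile
from pathlib import Path


def run_isolated(cmd, keep=()):
    """Run `cmd` with a fresh temporary directory as its working directory.

    Assumption: the program drops its fixed-name by-products (for example
    Borg_P_C1_C4.dat) into the current working directory, so giving each
    job a private directory keeps parallel jobs from overwriting each other.
    Files listed in `keep` are read back (as bytes) before the directory
    is deleted; missing files are silently skipped.
    """
    with tempfile.TemporaryDirectory() as scratch:
        # cwd= only affects the child process, so this is safe to call
        # from several threads at once (unlike os.chdir).
        subprocess.run(cmd, cwd=scratch, check=True)
        return {
            name: (Path(scratch) / name).read_bytes()
            for name in keep
            if (Path(scratch) / name).exists()
        }


# Hypothetical usage; paths must be absolute because the working
# directory of the child process changes:
# run_isolated(["analyze", "-t=/abs/path/out.tor", "/abs/path/chain.pdb"])
```

Because the scratch directory is removed when the call returns, anything worth keeping has to be written to an absolute path or listed in `keep`.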

persalteas · « **Reply #2 on:** January 06, 2020, 05:06:43 pm »

Thanks, this is actually much simpler, faster and more convenient.

I never noticed the eta_prime and theta_prime fields in the json output until now.
Thanks Markus for the R script too, but I won't need it, it is actually three lines of Python:

Code: [Select]

import json, subprocess

filename = "/path/to/your/RNA/chain.pdb"

output = subprocess.run(["x3dna-dssr", f"-i={filename}", "--json", "--auxfile=no"], stdout=subprocess.PIPE, stderr=subprocess.DEVNULL)
nts = json.loads(output.stdout.decode('utf-8'))["nts"]
for nt in nts:
    print(nt["nt_resnum"], '\t', nt["eta_prime"], '\t', nt["theta_prime"])  # or whatever you want to do with them.

My issue is solved, thanks a lot, Xiang-Jun!
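
Since the original question was about processing a few thousand chains in parallel, here is a minimal sketch of how the snippet above might be extended to a batch. It reuses exactly the same x3dna-dssr flags and assumes, as that snippet does, that the JSON arrives on stdout. The function names (`run_dssr`, `pseudotorsions`, `batch`) and the `runner` parameter are hypothetical, introduced only for illustration.

```python
import json
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_dssr(path):
    """Return the raw JSON text for one structure (same flags as above)."""
    done = subprocess.run(
        ["x3dna-dssr", f"-i={path}", "--json", "--auxfile=no"],
        stdout=subprocess.PIPE, stderr=subprocess.DEVNULL, check=True,
    )
    return done.stdout.decode("utf-8")


def pseudotorsions(json_text):
    """Extract (nt_resnum, eta_prime, theta_prime) for every nucleotide.

    Missing values (e.g. at chain ends) come back as None.
    """
    nts = json.loads(json_text).get("nts", [])
    return [(nt["nt_resnum"], nt.get("eta_prime"), nt.get("theta_prime"))
            for nt in nts]


def batch(paths, runner=run_dssr, workers=4):
    """Process many structures concurrently; returns {path: rows}."""
    paths = list(paths)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Threads are enough here: each one just waits on a child process.
        texts = pool.map(runner, paths)
        return {p: pseudotorsions(t) for p, t in zip(paths, texts)}
```

The `runner` argument exists so that the parsing logic can be exercised without DSSR installed, for example by passing a function that returns canned JSON.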

xiangjun · « **Reply #3 on:** January 06, 2020, 08:23:13 pm »

Thanks for your feedback, and for sharing your code with the community. Yes, JSON is the preferred means to connect 3DNA/DSSR to the outside world.

Best regards,

Xiang-Jun

PS. As a side note, DSSR also has the option --torsion-file (undocumented) which outputs commonly used backbone parameters, including the pseudo-torsions you referred to. It serves as a replacement for "analyze -t", and may be useful for quick visual examination of the output. Here is the DSSR command and output for PDB id 355d (the classic B-DNA dodecamer).

x3dna-dssr -i=355d.pdb --torsion-file -o=355d-tor.txt

Code: [Select]

         Output of DNA/RNA backbone conformational parameters
             DSSR v1.9.8-2019oct16 by xiangjun@x3dna.org
******************************************************************************************
Main chain conformational parameters:

  alpha:   O3'(i-1)-P-O5'-C5'
  beta:    P-O5'-C5'-C4'
  gamma:   O5'-C5'-C4'-C3'
  delta:   C5'-C4'-C3'-O3'
  epsilon: C4'-C3'-O3'-P(i+1)
  zeta:    C3'-O3'-P(i+1)-O5'(i+1)
  e-z:     epsilon-zeta (BI/BII backbone classification)

  chi for pyrimidines(Y): O4'-C1'-N1-C2; purines(R): O4'-C1'-N9-C4
    Range [170, -50(310)] is assigned to anti, and [50, 90] to syn

  phase-angle: the phase angle of pseudorotation and puckering
  sugar-type: ~C2'-endo for C2'-endo like conformation, or
               ~C3'-endo for C3'-endo like conformation
              Note the ONE column offset (for easy visual distinction)

ssZp: single-stranded Zp, defined as the z-coordinate of the 3' phosphorus atom
      (P) expressed in the standard reference frame of the 5' base; the value is
      POSITIVE when P lies on the +z-axis side (base in anti conformation);
      NEGATIVE if P is on the -z-axis side (base in syn conformation)
  Dp: perpendicular distance of the 3' P atom to the glycosidic bond
      [Ref: Chen et al. (2010): "MolProbity: all-atom structure
            validation for macromolecular crystallography."
            Acta Crystallogr D Biol Crystallogr, 66(1):12-21]
splay: angle between the bridging P to the two base-origins of a dinucleotide.

          nt               alpha    beta   gamma   delta  epsilon   zeta     e-z        chi            phase-angle   sugar-type    ssZp     Dp    splay
 1     C A.DC1               ---     ---   -70.0   144.7  -171.8   -98.4    -73(BI)   -105.9(anti)   163.5(C2'-endo) ~C2'-endo     1.60    1.87   20.42
 2     G A.DG2             -69.8  -172.2    43.0   148.1  -151.3  -157.0      6(..)    -85.4(anti)   160.0(C2'-endo) ~C2'-endo     1.46    1.59   21.98
 3     C A.DC3             -39.4   130.5    50.1    93.3  -165.4   -81.2    -84(BI)   -132.4(anti)    65.3(C4'-exo)     ....       2.36    3.05   17.11
 4     G A.DG4             -64.8   174.4    50.1   145.0  -167.3  -144.6    -23(..)    -93.7(anti)   156.2(C2'-endo) ~C2'-endo     2.19    2.01   20.93
 5     A A.DA5             -47.1   156.2    53.7   133.0  -177.5   -90.7    -87(BI)   -118.6(anti)   158.0(C2'-endo) ~C2'-endo     1.92    2.02   19.68
 6     A A.DA6             -64.2  -173.0    46.3   128.0   176.9   -95.8    -87(BI)   -109.2(anti)   152.6(C2'-endo) ~C2'-endo     1.87    2.02   18.84
 7     T A.DT7             -49.0   172.9    48.6   112.7  -176.6   -95.6    -81(BI)   -119.6(anti)   125.2(C1'-exo)  ~C2'-endo     2.03    2.27   18.01
 8     T A.DT8             -54.1   168.3    53.4   114.0   171.5   -95.0    -94(BI)   -119.9(anti)   126.9(C1'-exo)  ~C2'-endo     2.09    2.31   20.88
 9     C A.DC9             -53.3  -173.2    50.8   138.3  -155.9   -96.3    -60(BI)   -112.3(anti)   157.9(C2'-endo) ~C2'-endo     1.17    1.63   19.40
 10    G A.DG10            -60.3   163.2    39.5   143.2  -100.0   146.3    114(BII)   -83.6(anti)   145.6(C2'-endo) ~C2'-endo     1.41    1.25   21.62
 11    C A.DC11            -73.1   144.3    50.8   143.5  -164.4  -126.1    -38(BI)   -112.8(anti)   163.8(C2'-endo) ~C2'-endo     1.11    1.67   19.40
 12    G A.DG12             52.9   144.1   -65.6   147.9     ---     ---     ---       -79.3(anti)   207.4(C3'-exo)     ....        ---     ---     ---

 1     C B.DC13              ---     ---    58.5   138.1  -169.0  -104.8    -64(BI)   -106.9(anti)   163.5(C2'-endo) ~C2'-endo     1.67    1.97   18.36
 2     G B.DG14            -58.3   166.6    49.5   115.0   178.2   -94.3    -88(BI)   -110.2(anti)   129.6(C1'-exo)  ~C2'-endo     2.27    2.38   21.74
 3     C B.DC15            -56.8   160.1    56.8    82.1  -174.4   -84.7    -90(BI)   -137.2(anti)    42.0(C4'-exo)   ~C3'-endo    3.17    3.82   18.25
 4     G B.DG16            -61.6   175.5    62.1   140.9   150.8   -89.6   -120(BI)   -101.7(anti)   168.8(C2'-endo) ~C2'-endo     2.20    2.01   22.29
 5     A B.DA17            -56.3  -160.2    54.1   145.8  -178.3   -91.9    -86(BI)   -108.3(anti)   173.3(C2'-endo) ~C2'-endo     1.36    1.79   18.87
 6     A B.DA18            -61.5   175.7    45.3   113.1   166.4   -91.2   -102(BI)   -112.4(anti)   127.5(C1'-exo)  ~C2'-endo     2.20    2.33   18.64
 7     T B.DT19            -49.1   178.5    51.6   127.6  -172.8  -106.8    -66(BI)   -115.9(anti)   143.0(C1'-exo)  ~C2'-endo     1.86    2.11   19.09
 8     T B.DT20            -44.5   171.8    43.4   135.3  -164.1  -106.8    -57(BI)   -110.1(anti)   151.4(C2'-endo) ~C2'-endo     1.45    1.84   19.32
 9     C B.DC21            -58.1   159.5    51.3    95.5  -170.1   -81.3    -89(BI)   -124.4(anti)    93.3(O4'-endo)    ....       1.97    2.48   17.95
 10    G B.DG22            -67.1   177.0    45.3   143.5  -148.6  -173.0     24(..)    -86.2(anti)   151.2(C2'-endo) ~C2'-endo     1.96    1.90   23.90
 11    C B.DC23            -57.2   129.7    51.4    83.8  -161.3   -77.6    -84(BI)   -150.3(anti)    17.5(C3'-endo)  ~C3'-endo    3.83    4.34   18.70
 12    G B.DG24            -62.1   169.8    52.8    87.7     ---     ---     ---      -141.3(anti)    13.7(C3'-endo)  ~C3'-endo     ---     ---     ---

******************************************************************************************
Virtual eta/theta torsion angles:

  eta:    C4'(i-1)-P(i)-C4'(i)-P(i+1)
  theta:  P(i)-C4'(i)-P(i+1)-C4'(i+1)
    [Ref: Olson (1980): "Configurational statistics of polynucleotide chains.
          An updated virtual bond model to treat effects of base stacking."
          Macromolecules, 13(3):721-728]

  eta':   C1'(i-1)-P(i)-C1'(i)-P(i+1)
  theta': P(i)-C1'(i)-P(i+1)-C1'(i+1)
    [Ref: Keating et al. (2011): "A new way to see RNA." Quarterly Reviews
          of Biophysics, 44(4):433-466]

  eta":   base(i-1)-P(i)-base(i)-P(i+1)
  theta": P(i)-base(i)-P(i+1)-base(i+1)

          nt                eta   theta     eta'  theta'    eta"  theta"
 1     C A.DC1               ---     ---     ---     ---     ---     ---
 2     G A.DG2             165.5  -161.7  -165.6  -177.4  -126.7  -112.3
 3     C A.DC3             179.5  -145.2  -171.9  -147.9  -105.8  -123.9
 4     G A.DG4             158.9  -162.8  -164.8  -179.1  -135.8  -131.6
 5     A A.DA5             174.5  -128.4  -163.1  -149.5  -114.7  -104.0
 6     A A.DA6             168.1  -148.3  -165.5  -160.7  -112.5  -121.4
 7     T A.DT7             179.3  -155.6  -162.3  -163.0  -119.1  -121.9
 8     T A.DT8             172.1  -161.9  -165.4  -168.5  -120.3  -121.7
 9     C A.DC9            -173.0  -123.0  -152.9  -132.0  -106.6   -93.2
 10    G A.DG10            174.2   158.6  -153.5   154.1  -115.5  -152.3
 11    C A.DC11            174.4  -105.9  -158.7  -137.8  -113.5  -106.3
 12    G A.DG12              ---     ---     ---     ---     ---     ---

 1     C B.DC13              ---     ---     ---     ---     ---     ---
 2     G B.DG14            166.8  -152.9  -169.8  -166.3  -123.0  -106.1
 3     C B.DC15            174.1  -170.1  -172.6  -164.6  -104.0  -144.2
 4     G B.DG16            155.8  -134.3  -157.9  -167.2  -135.1  -111.4
 5     A B.DA17           -177.5  -121.1  -153.7  -142.5   -94.4  -102.8
 6     A B.DA18            164.2  -162.9  -170.6  -170.2  -122.5  -128.1
 7     T B.DT19           -178.9  -149.1  -158.1  -158.7  -114.1  -116.5
 8     T B.DT20            177.8  -138.9  -158.3  -153.9  -113.9  -104.5
 9     C B.DC21            173.0  -152.9  -171.8  -151.0  -118.0  -122.5
 10    G B.DG22            163.1   173.9  -167.2   156.7  -137.5  -141.3
 11    C B.DC23            167.8  -144.6   169.4  -150.8  -124.1  -124.9
 12    G B.DG24              ---     ---     ---     ---     ---     ---

******************************************************************************************
Sugar conformational parameters:

  v0: C4'-O4'-C1'-C2'
  v1: O4'-C1'-C2'-C3'
  v2: C1'-C2'-C3'-C4'
  v3: C2'-C3'-C4'-O4'
  v4: C3'-C4'-O4'-C1'

  tm: the amplitude of pucker
  P:  the phase angle of pseudorotation
    [Ref: Altona & Sundaralingam (1972): "Conformational analysis
          of the sugar ring in nucleosides and nucleotides. A new
          description using the concept of pseudorotation."
          J Am Chem Soc, 94(23):8205-8212]

          nt                 v0      v1      v2      v3      v4      tm      P   Puckering
 1     C A.DC1             -20.3    33.1   -33.1    22.1    -1.2    34.5   163.5  C2'-endo
 2     G A.DG2             -23.9    36.6   -34.9    22.3     0.8    37.1   160.0  C2'-endo
 3     C A.DC3             -24.8     5.6    13.9   -28.7    33.8    33.3    65.3   C4'-exo
 4     G A.DG4             -25.7    37.1   -33.9    20.1     3.4    37.1   156.2  C2'-endo
 5     A A.DA5             -20.9    31.2   -29.3    17.7     2.0    31.6   158.0  C2'-endo
 6     A A.DA6             -22.6    30.8   -27.2    14.8     4.8    30.6   152.6  C2'-endo
 7     T A.DT7             -34.2    33.1   -20.1     0.8    20.9    34.8   125.2   C1'-exo
 8     T A.DT8             -35.8    35.4   -22.0     1.8    21.1    36.7   126.9   C1'-exo
 9     C A.DC9             -21.5    31.7   -29.8    18.0     2.0    32.2   157.9  C2'-endo
 10    G A.DG10            -36.1    45.1   -36.0    16.8    11.4    43.6   145.6  C2'-endo
 11    C A.DC11            -21.3    35.1   -35.2    23.4    -1.5    36.6   163.8  C2'-endo
 12    G A.DG12              4.1    12.3   -22.5    25.4   -18.8    25.4   207.4   C3'-exo

 1     C B.DC13            -18.0    29.3   -29.4    19.4    -1.0    30.7   163.5  C2'-endo
 2     G B.DG14            -31.7    32.2   -21.1     3.2    17.9    33.1   129.6   C1'-exo
 3     C B.DC15            -15.4    -9.0    28.1   -37.8    33.6    37.8    42.0   C4'-exo
 4     G B.DG16            -15.6    28.7   -30.2    21.8    -4.0    30.8   168.8  C2'-endo
 5     A B.DA17            -13.5    27.8   -31.1    23.6    -6.5    31.4   173.3  C2'-endo
 6     A B.DA18            -31.3    31.1   -19.7     1.9    18.6    32.4   127.5   C1'-exo
 7     T B.DT19            -28.9    34.5   -27.4    11.1    11.1    34.2   143.0   C1'-exo
 8     T B.DT20            -28.0    37.2   -32.5    17.2     6.6    37.1   151.4  C2'-endo
 9     C B.DC21            -40.0    25.4    -2.4   -21.1    38.2    40.6    93.3  O4'-endo
 10    G B.DG22            -30.7    41.1   -35.3    19.0     6.9    40.3   151.2  C2'-endo
 11    C B.DC23              0.6   -22.5    34.5   -35.1    21.6    36.2    17.5  C3'-endo
 12    G B.DG24              2.9   -23.2    33.3   -32.7    18.6    34.3    13.7  C3'-endo

******************************************************************************************
Assignment of sugar-phosphate backbone suites

  bin: name of the 12 bins based on [delta(i-1), delta, gamma], where
       delta(i-1) and delta can be either 3 (for C3'-endo sugar) or 2
       (for C2'-endo) and gamma can be p/t/m (for gauche+/trans/gauche-
       conformations, respectively) (2x2x3=12 combinations: 33p, 33t,
       ... 22m); 'inc' refers to incomplete cases (i.e., with missing
       torsions), and 'trig' to triages (i.e., with torsion angle
       outliers)
  cluster: 2-char suite name, for one of 53 reported clusters (46
           certain and 7 wannabes), '__' for incomplete cases, and
           '!!' for outliers
  suiteness: measure of conformer-match quality (low to high in range 0 to 1)

    [Ref: Richardson et al. (2008): "RNA backbone: consensus all-angle
          conformers and modular string nomenclature (an RNA Ontology
          Consortium contribution)." RNA, 14(3):465-481]

          nt             bin    cluster   suiteness
 1     C A.DC1           inc      __       0
 2     G A.DG2           22p      !!       0
 3     C A.DC3           23p      !!       0
 4     G A.DG4           32p      1b       0.573
 5     A A.DA5           22p      !!       0
 6     A A.DA6           22p      !!       0
 7     T A.DT7           trig     !!       0
 8     T A.DT8           trig     !!       0
 9     C A.DC9           trig     !!       0
 10    G A.DG10          22p      !!       0
 11    C A.DC11          22p      4b       0.533
 12    G A.DG12          22m      !!       0

 1     C B.DC13          inc      __       0
 2     G B.DG14          trig     !!       0
 3     C B.DC15          trig     !!       0
 4     G B.DG16          32p      1b       0.476
 5     A B.DA17          trig     !!       0
 6     A B.DA18          trig     !!       0
 7     T B.DT19          trig     !!       0
 8     T B.DT20          22p      !!       0
 9     C B.DC21          23p      !!       0
 10    G B.DG22          32p      1b       0.369
 11    C B.DC23          23p      0a       0.015
 12    G B.DG24          33p      1a       0.790


Concatenated suite string per chain. To avoid confusion of lower case
modified nucleotide name (e.g., 'a') with suite cluster (e.g., '1a'),
use --suite-delimiter to add delimiters (matched '()' by default).

1   A DNA nts=12  C!!G!!C1bG!!A!!A!!T!!T!!C!!G4bC!!G
2   B DNA nts=12  C!!G!!C1bG!!A!!A!!T!!T!!C1bG0aC1aG
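
As a cross-check on two of the derived columns in the listing above, the following sketch recomputes the pseudorotation phase angle P and amplitude tm from v0..v4 using the commonly quoted Altona & Sundaralingam relation, and the e-z difference wrapped into the range [-180, 180). Whether DSSR computes these values in exactly this way is an assumption; the wrapping rule in particular is inferred from the table rather than stated in the thread. The function names are hypothetical.

```python
import math


def pseudorotation(v0, v1, v2, v3, v4):
    """Return (P, tm) in degrees from the five sugar torsions.

    tan P = ((v4 + v1) - (v3 + v0)) / (2 * v2 * (sin 36 + sin 72))
    tm    = v2 / cos P
    The result is ill-conditioned when v2 is close to 0 (P near 90 or 270).
    """
    s = math.sin(math.radians(36.0)) + math.sin(math.radians(72.0))
    num = (v4 + v1) - (v3 + v0)
    den = 2.0 * v2 * s
    # atan2 picks the correct quadrant; % 360 maps the result to [0, 360).
    phase = math.degrees(math.atan2(num, den)) % 360.0
    tm = v2 / math.cos(math.radians(phase))
    return phase, tm


def wrap180(angle):
    """Map an angle in degrees into [-180, 180)."""
    return (angle + 180.0) % 360.0 - 180.0


# Row A.DC1 of the sugar table: listed as tm = 34.5, P = 163.5
print(pseudorotation(-20.3, 33.1, -33.1, 22.1, -1.2))
# Row A.DG10 of the main-chain table: epsilon = -100.0, zeta = 146.3,
# listed as e-z = 114(BII)
print(wrap180(-100.0 - 146.3))
```

Small differences in the last digit are expected, because the inputs taken from the listing are already rounded to one decimal place.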
