Figure S8: The linear (arc) secondary structure diagram of the RNA-DNA hybrid structure in the CRISPR Cas9-sgRNA-DNA ternary complex (PDB id: 4oo8), annotated with DSSR-derived dot-bracket notation and key structural elements. The target DNA base sequence is colored red, and the chain switch from sgRNA to DNA is marked by the dotted vertical line. DSSR detects no junction loops in this hybrid structure because of the chain break.
Starting from "4oo8.pdb" downloaded from the RCSB PDB, here is the script to get the secondary structure files to be rendered using VARNA.
pdb_frag B 1:97 C 1:20 4oo8.pdb 4oo8-BC.pdb
x3dna-dssr -i=4oo8-BC.pdb -o=4oo8-BC.out --prefix=4oo8-BC
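For readers who want to see what the fragment selection in the first command amounts to, the following is a minimal Python sketch that keeps only the coordinate records for given chains and residue ranges. It is not how pdb_frag is implemented; the function name `extract_fragments` and the selection dictionary are hypothetical, and the sketch ignores insertion codes, alternate models, and TER records. It relies only on the fixed-column PDB layout (chain ID in column 22, residue number in columns 23-26).

```python
def extract_fragments(pdb_text, selections):
    """Keep ATOM/HETATM records whose chain and residue number match.

    selections maps a chain ID to an inclusive (first, last) residue range,
    e.g. {"B": (1, 97), "C": (1, 20)}. Hypothetical helper, not part of 3DNA.
    """
    kept = []
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM", "HETATM")) or len(line) < 26:
            continue
        chain = line[21]                  # column 22: chain identifier
        try:
            resnum = int(line[22:26])     # columns 23-26: residue number
        except ValueError:
            continue                      # skip records with a malformed number
        bounds = selections.get(chain)
        if bounds is not None and bounds[0] <= resnum <= bounds[1]:
            kept.append(line)
    kept.append("END")
    return "\n".join(kept) + "\n"
```

Under these assumptions, calling `extract_fragments(open("4oo8.pdb").read(), {"B": (1, 97), "C": (1, 20)})` would return text comparable in scope to 4oo8-BC.pdb, with records kept in their original file order.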
The DSSR-derived 4oo8-BC-2ndstrs.ct file retains the residue numbers of the original PDB file in its last column. Here the RNA chain is numbered from 1 to 97, whilst the DNA chain is numbered from 1 to 20. The .ct file, as shown below, is used for rendering with VARNA.
117 ENERGY = 0.0 [4oo8-BC] -- secondary structure derived by DSSR
1 G 0 1 117 1
2 G 1 2 116 2
3 A 2 3 115 3
4 A 3 4 114 4
5 A 4 5 113 5
6 U 5 6 112 6
7 U 6 7 111 7
8 A 7 8 110 8
9 G 8 9 109 9
10 G 9 10 108 10
11 U 10 11 107 11
12 G 11 12 106 12
13 C 12 13 105 13
14 G 13 14 104 14
15 C 14 15 103 15
16 U 15 16 102 16
17 U 16 17 101 17
18 G 17 18 100 18
19 G 18 19 99 19
20 C 19 20 98 20
21 G 20 21 50 21
22 U 21 22 49 22
23 U 22 23 48 23
24 U 23 24 47 24
25 U 24 25 46 25
26 A 25 26 45 26
27 G 26 27 0 27
28 A 27 28 0 28
29 G 28 29 40 29
30 C 29 30 39 30
31 U 30 31 38 31
32 A 31 32 37 32
33 G 32 33 0 33
34 A 33 34 0 34
35 A 34 35 0 35
36 A 35 36 0 36
37 U 36 37 32 37
38 A 37 38 31 38
39 G 38 39 30 39
40 C 39 40 29 40
41 A 40 41 0 41
42 A 41 42 0 42
43 G 42 43 0 43
44 U 43 44 0 44
45 U 44 45 26 45
46 A 45 46 25 46
47 A 46 47 24 47
48 A 47 48 23 48
49 A 48 49 22 49
50 U 49 50 21 50
51 A 50 51 0 51
52 A 51 52 0 52
53 G 52 53 61 53
54 G 53 54 60 54
55 C 54 55 58 55
56 U 55 56 0 56
57 A 56 57 0 57
58 G 57 58 55 58
59 U 58 59 0 59
60 C 59 60 54 60
61 C 60 61 53 61
62 G 61 62 0 62
63 U 62 63 0 63
64 U 63 64 0 64
65 A 64 65 0 65
66 U 65 66 0 66
67 C 66 67 0 67
68 A 67 68 0 68
69 A 68 69 80 69
70 C 69 70 79 70
71 U 70 71 78 71
72 U 71 72 77 72
73 G 72 73 0 73
74 A 73 74 0 74
75 A 74 75 0 75
76 A 75 76 0 76
77 A 76 77 72 77
78 A 77 78 71 78
79 G 78 79 70 79
80 U 79 80 69 80
81 G 80 81 0 81
82 G 81 82 96 82
83 C 82 83 95 83
84 A 83 84 94 84
85 C 84 85 93 85
86 C 85 86 92 86
87 G 86 87 91 87
88 A 87 88 0 88
89 G 88 89 0 89
90 U 89 90 0 90
91 C 90 91 87 91
92 G 91 92 86 92
93 G 92 93 85 93
94 U 93 94 84 94
95 G 94 95 83 95
96 C 95 96 82 96
97 U 96 97 0 97
98 G 0 98 20 1
99 C 98 99 19 2
100 C 99 100 18 3
101 A 100 101 17 4
102 A 101 102 16 5
103 G 102 103 15 6
104 C 103 104 14 7
105 G 104 105 13 8
106 C 105 106 12 9
107 A 106 107 11 10
108 C 107 108 10 11
109 C 108 109 9 12
110 T 109 110 8 13
111 A 110 111 7 14
112 A 111 112 6 15
113 T 112 113 5 16
114 T 113 114 4 17
115 T 114 115 3 18
116 C 115 116 2 19
117 C 116 0 1 20
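As a cross-check on a listing like the one above, the following minimal Python sketch parses .ct text, verifies that every base pair is recorded symmetrically, and derives a dot-bracket string. The function names, the toy input `TOY_CT`, and the use of '&' to mark the chain switch are assumptions made for illustration. The chain switch is inferred from a zero in the third column on any row after the first, as seen at row 98 of the listing. Only plain parentheses are emitted, so the sketch applies to pseudoknot-free structures such as this one.

```python
def parse_ct(text):
    """Parse .ct text into (index, base, prev, pair, resnum) tuples."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    count = int(lines[0].split()[0])      # header starts with the base count
    records = []
    for ln in lines[1:count + 1]:
        f = ln.split()
        # fields used: index, base, previous index, pairing partner, residue number
        records.append((int(f[0]), f[1], int(f[2]), int(f[4]), int(f[5])))
    if len(records) != count:
        raise ValueError("header count does not match the number of rows")
    return records


def ct_to_dot_bracket(records):
    """Return (sequence, dot-bracket), with '&' inserted at each chain switch."""
    partner = {idx: pair for idx, _, _, pair, _ in records}
    seq, dbn = [], []
    for idx, base, prev, pair, _ in records:
        if prev == 0 and idx > 1:         # assumed signature of a chain switch
            seq.append("&")
            dbn.append("&")
        if pair == 0:
            symbol = "."
        elif partner.get(pair) != idx:    # i pairs with j, so j must pair with i
            raise ValueError(f"asymmetric pair at index {idx}")
        else:
            symbol = "(" if pair > idx else ")"
        seq.append(base)
        dbn.append(symbol)
    return "".join(seq), "".join(dbn)


# Toy two-chain input in the same column layout as the listing (illustrative only).
TOY_CT = """6 ENERGY = 0.0 [toy]
1 G 0 1 6 1
2 C 1 2 5 2
3 A 2 3 0 3
4 A 3 4 0 4
5 G 0 5 2 1
6 C 5 0 1 2
"""

sequence, structure = ct_to_dot_bracket(parse_ct(TOY_CT))
print(sequence)   # GCAA&GC
print(structure)  # ((..&))
```

The same two calls can be applied to the contents of 4oo8-BC-2ndstrs.ct; the result could then be compared by eye against the dot-bracket annotation in the figure.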
Note:
- The structure contains two copies of the ternary complex; RNA chain B and DNA chain C from one copy are extracted to file 4oo8-BC.pdb for analysis.
- Of the three files for secondary structure representations (4oo8-BC-2ndstrs.bpseq, 4oo8-BC-2ndstrs.ct and 4oo8-BC-2ndstrs.dbn), the .ct format is the most informative.
- There are many options for rendering a secondary structure in VARNA. Here the linear form is used, with a number period of three and the simple 'line' base-pair style, among other settings.
- The VARNA-exported .svg file is then read into Inkscape for further revision and annotation, including aligning the dot-bracket notation with the base sequence and labeling the six stems and the CUAG diloop.
- For completeness, here is the tarball file containing all the data files and the script ("tasks"): supp-fig8-crispr-4oo8.tar.gz