3DNA Forum

Topic started by: jaswinder.singh on December 01, 2018, 01:40:23 am


Title: Non canonical RNA pairs in bpseq format
Post by: jaswinder.singh on December 01, 2018, 01:40:23 am
Hi,
I am working on an RNA secondary structure prediction problem using deep learning. For that, I need an RNA contact map that includes all kinds of possible pairs. I am using x3dna-dssr to get the RNA secondary structure. Is there any way to get both canonical and non-canonical pairs of an RNA in bpseq format using DSSR? The .bpseq and .ct output files do not contain non-canonical pairs.

I am a bit confused about the notation used in the output file that lists both canonical and non-canonical pairs. The manual mentions that 'A.G19' means guanosine #19 on chain 'A', but in the output file that I got for the '1hr2' RNA, the nucleotide index seems to be offset by some number. In this output file, the first pair is between A.U106 and A.G215. The first two parts of the notation (chain A and nucleotides U and G) are fine, but the third part, 106 and 215, appears to be offset by some number. I have attached the output file that I got for '1hr2'.

Thanks

Title: Re: Non canonical RNA pairs in bpseq format
Post by: xiangjun on December 01, 2018, 09:59:40 am
Quote
Is there any way to get both canonical and non-canonical pairs of an RNA in bpseq format using DSSR, as the .bpseq and .ct output files do not contain non-canonical pairs?

The .bpseq and .ct file formats are for RNA secondary structure, which is defined by canonical pairs. I'm not convinced that it is a good idea to extend the DSSR-derived .bpseq and .ct output files to include non-canonical pairs. With the pairing information from DSSR, you could (easily) write a utility program/script for your particular needs. You're welcome to contribute your software back so other users may benefit from your effort.
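As an illustration of such a utility, here is a minimal Python sketch, not part of DSSR itself. It assumes DSSR was run with its `--json` option and that the JSON output has an `nts` list (entries with `nt_id` and `nt_code`) and a `pairs` list (entries with `nt1` and `nt2`); these key names are assumptions to check against your DSSR version. The function names `load_dssr_pairs`, `contact_map` and `to_bpseq` are hypothetical. Every pair DSSR lists, canonical or not, becomes a contact; the bpseq writer sets aside pairs whose nucleotides already have a partner, since the format allows only one partner per nucleotide.

```python
import json
from typing import List, Tuple

Nt = Tuple[str, str]    # (nt_id such as "A.U106", one-letter code such as "U")
Pair = Tuple[str, str]  # (nt_id of nt1, nt_id of nt2)


def load_dssr_pairs(json_path: str) -> Tuple[List[Nt], List[Pair]]:
    """Read nucleotides and pairs from DSSR --json output.

    ASSUMPTION: top-level keys "nts" (with "nt_id", "nt_code") and
    "pairs" (with "nt1", "nt2"); verify against your DSSR version.
    """
    with open(json_path) as fh:
        data = json.load(fh)
    nts = [(nt["nt_id"], nt["nt_code"]) for nt in data["nts"]]
    pairs = [(bp["nt1"], bp["nt2"]) for bp in data.get("pairs", [])]
    return nts, pairs


def contact_map(nts: List[Nt], pairs: List[Pair]) -> List[List[int]]:
    """Symmetric N x N 0/1 matrix (0-based positions in file order);
    every listed pair, canonical or non-canonical, is marked as a contact."""
    pos = {nt_id: i for i, (nt_id, _) in enumerate(nts)}
    n = len(nts)
    cmap = [[0] * n for _ in range(n)]
    for a, b in pairs:
        i, j = pos[a], pos[b]
        cmap[i][j] = cmap[j][i] = 1
    return cmap


def to_bpseq(nts: List[Nt], pairs: List[Pair]) -> Tuple[List[str], List[Pair]]:
    """Return BPSEQ lines ("index base partner", 1-based, 0 = unpaired)
    plus the pairs that could not be written because one of their
    nucleotides already has a partner (BPSEQ allows only one)."""
    pos = {nt_id: i for i, (nt_id, _) in enumerate(nts, start=1)}
    partner = {}
    skipped = []
    for a, b in pairs:
        i, j = pos[a], pos[b]
        if i in partner or j in partner:
            skipped.append((a, b))  # e.g. the extra pair of a base triple
            continue
        partner[i], partner[j] = j, i
    lines = [f"{i} {code} {partner.get(i, 0)}"
             for i, (_, code) in enumerate(nts, start=1)]
    return lines, skipped


if __name__ == "__main__":
    # Hypothetical toy data, not taken from any real structure.
    nts = [("A.G1", "G"), ("A.G2", "G"), ("A.A3", "A"), ("A.C4", "C"), ("A.C5", "C")]
    pairs = [("A.G1", "A.C5"), ("A.G2", "A.C4"), ("A.A3", "A.G2")]
    lines, skipped = to_bpseq(nts, pairs)
    print("\n".join(lines))
    print("skipped:", skipped)
```

For a deep-learning target, the contact map is probably the better fit because it has no one-partner limit. The bpseq writer keeps whichever pair comes first in the list and reports the rest, which is an arbitrary choice you may want to change.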

Quote
but in the output file that I got for the '1hr2' RNA, the nucleotide index seems to be offset by some number. In this output file, the first pair is between A.U106 and A.G215. The first two parts of the notation are fine, but the third part, 106 and 215, appears to be offset by some number.

I am confused by this part of your question. The shorthand A.U106 means U106 on chain A. Here 106 is the sequence number as defined in the input PDB or mmCIF file. The first pair in your attached file is between A.U106 and A.G215 (see the attached image). There is no ambiguity here, as far as I'm concerned.
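If you are scripting against DSSR output, the notation can be split mechanically. Below is a minimal Python sketch, assuming ids of the simple form `chain.nameNUMBER`, optionally followed by `^` and an insertion code; model prefixes and other variants are not handled. `NtId`, `parse_nt_id` and `sequential_index` are hypothetical names. The number extracted is the file's own residue number; a 1-based position (as used in .bpseq/.ct files or a contact map) comes instead from the nucleotide's order in the structure.

```python
import re
from typing import Dict, Iterable, NamedTuple


class NtId(NamedTuple):
    chain: str   # chain identifier, e.g. "A"
    name: str    # residue name, e.g. "U" or a modified name such as "PSU"
    number: int  # sequence number as given in the PDB/mmCIF file
    icode: str   # insertion code, "" if none


# ASSUMED layout: CHAIN.NAME<number>[^ICODE]. The lazy name group plus the
# end anchor make the number the run of digits at the end, so names that
# contain digits (e.g. "5MC") keep them.
_NT_RE = re.compile(r"^(?P<chain>[^.]+)\.(?P<name>.+?)(?P<number>-?\d+)(?:\^(?P<icode>\w))?$")


def parse_nt_id(nt_id: str) -> NtId:
    m = _NT_RE.match(nt_id)
    if m is None:
        raise ValueError(f"unrecognized nucleotide id: {nt_id!r}")
    return NtId(m["chain"], m["name"], int(m["number"]), m["icode"] or "")


def sequential_index(nt_ids_in_order: Iterable[str]) -> Dict[str, int]:
    """Map each nucleotide id to its 1-based position in file order."""
    return {nt_id: i for i, nt_id in enumerate(nt_ids_in_order, start=1)}


print(parse_nt_id("A.U106"))  # NtId(chain='A', name='U', number=106, icode='')
```

With hypothetical ids listed in file order as `["A.G104", "A.A105", "A.U106"]`, `sequential_index` gives A.U106 position 3, while `parse_nt_id` keeps its number 106. The two differ whenever the file's numbering does not start at 1 or has gaps or insertion codes.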

Best regards,

Xiang-Jun

Created and maintained by Dr. Xiang-Jun Lu [律祥俊] (xiangjun@x3dna.org)
The Bussemaker Laboratory at the Department of Biological Sciences, Columbia University.