Author Topic: Non canonical RNA pairs in bpseq format (Read 44678 times)

jaswinder.singh · « **on:** December 01, 2018, 01:40:23 am »

Hi,
I am working on an RNA secondary structure prediction problem using deep learning. For that I need an RNA contact map that includes all possible kinds of pairs. I am using x3dna-dssr to get the RNA secondary structure. Is there any way to get both canonical and non-canonical pairs in bpseq format using DSSR, given that the .bpseq and .ct output files do not contain non-canonical pairs?

I am a bit confused about the notation used in the output file that lists both canonical and non-canonical pairs. The manual says that 'A.G19' means guanosine #19 on chain 'A', but in the output file I got for the '1hr2' RNA, the nt index is offset by some number. The first pair in that output file is between A.U106 and A.G215. The first two parts of the notation (chain and base) are fine, but the third part, 106 and 215, is offset by some number. I have attached the output file that I got for the '1hr2' RNA.

Thanks

xiangjun · « **Reply #1 on:** December 01, 2018, 09:59:40 am »

Quote

Is there any way to get both canonical and non-canonical pairs in bpseq format using DSSR, given that the .bpseq and .ct output files do not contain non-canonical pairs?

The .bpseq and .ct file formats are for RNA secondary structure, which is defined by canonical pairs. I'm not convinced that it is a good idea to extend the DSSR-derived .bpseq and .ct output files to include non-canonical pairs. With the pairing information from DSSR, you could (easily) write a utility program/script for your particular needs. You're welcome to contribute your software back so other users may benefit from your effort.
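As a rough illustration of such a utility script, here is a minimal sketch in Python that turns a list of pairs into a contact map. It assumes you have already extracted two things from the DSSR output yourself: the nucleotide ids in sequence order and the pairs as tuples of ids. The function name `build_contact_map` and the toy data are hypothetical and not part of DSSR. A matrix is used rather than bpseq-style rows because a nucleotide can take part in more than one pair once non-canonical pairs are included.

```python
def build_contact_map(nt_ids, pairs):
    """Return a symmetric 0/1 matrix with one row and column per nucleotide.

    nt_ids: nucleotide ids in sequence order, e.g. ["A.G1", "A.C2", ...]
    pairs:  iterable of (nt1, nt2) id tuples, canonical or not
    """
    # Map each id to its 0-based position along the sequence.
    position = {nt_id: i for i, nt_id in enumerate(nt_ids)}
    n = len(nt_ids)
    contact = [[0] * n for _ in range(n)]
    for nt1, nt2 in pairs:
        i, j = position[nt1], position[nt2]
        contact[i][j] = 1
        contact[j][i] = 1  # pairing is symmetric
    return contact


# Toy input (made up for illustration, not taken from a real structure).
toy_ids = ["A.G1", "A.C2", "A.A3", "A.G4"]
toy_pairs = [("A.G1", "A.G4"), ("A.C2", "A.A3"), ("A.G1", "A.A3")]

for row in build_contact_map(toy_ids, toy_pairs):
    print(" ".join(str(v) for v in row))
```

In the toy data A.G1 has two partners, which is exactly the case a one-partner-per-line format cannot hold.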

Quote

but in the output file I got for the '1hr2' RNA, the nt index is offset by some number. The first pair in that output file is between A.U106 and A.G215. The first two parts of the notation (chain and base) are fine, but the third part, 106 and 215, is offset by some number.

I am confused by this part of your question. The shorthand A.U106 notation means U 106 on chain A. Here 106 is the sequence number as defined in the input PDB or mmCIF file. The first pair in your attached file is between A.U106 and A.G215 (see the attached image). There is no ambiguity here, as far as I'm concerned.
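If what you need is a 1-based position along the sequence rather than the number from the coordinate file, the renumbering can be done in a script. The sketch below, in Python, splits the shorthand into chain, base and sequence number, and builds a lookup from each id to its sequential position. The helper names `parse_nt_id` and `sequential_index` are hypothetical, and the regular expression only covers the simple form discussed here (chain, a dot, a base name made of letters, then an integer); ids with insertion codes or residue names containing digits would need extra handling. The example ids are made up for illustration.

```python
import re

# chain, ".", base name (letters only), then an integer sequence number
NT_ID = re.compile(r"^(?P<chain>[^.]+)\.(?P<base>[A-Za-z]+)(?P<resnum>-?\d+)$")


def parse_nt_id(nt_id):
    """Split an id such as 'A.U106' into ('A', 'U', 106)."""
    match = NT_ID.match(nt_id)
    if match is None:
        raise ValueError(f"unsupported nucleotide id: {nt_id!r}")
    return match.group("chain"), match.group("base"), int(match.group("resnum"))


def sequential_index(nt_ids):
    """Map each id to its 1-based position in the given sequence order."""
    return {nt_id: i for i, nt_id in enumerate(nt_ids, start=1)}


# Hypothetical fragment whose file numbering does not start at 1.
example_ids = ["A.G103", "A.A104", "A.A105", "A.U106"]
index_of = sequential_index(example_ids)

print(parse_nt_id("A.U106"))
print(index_of["A.U106"])
```

With this fragment, the residue numbered 106 in the file sits at position 4 in the list, which is the kind of offset described in the question.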

Best regards,

Xiang-Jun
