Author Topic: mutate_bases (Read 79851 times)

xiangjun · « **on:** February 11, 2012, 06:47:28 pm »

Note added on 2020-05-12: mutate_bases is now obsoleted by DSSR 2.0.

See also:

Blog post "Context-aware in silico base mutations enabled by DSSR 2.0"
The thread "Mutate_Bases: Option to mutate all residues of the same type to another"

The utility program mutate_bases can be used to mutate bases in nucleic-acid-containing structures (DNA, RNA, and their complexes with ligands and proteins). It has two key and unique features: (1) the sugar-phosphate backbone conformation is untouched; (2) the base reference frame (position and orientation) is conserved, i.e., the mutated structure shares the same base-pair/step parameters as those of the native structure.

The mutate_bases program was created in response to repeated requests from 3DNA users over the years. Written as a standalone ANSI C program, it is on a par with other major 3DNA components (e.g., find_pair, analyze, rebuild and fiber). The program was first released as a supplement to 3DNA v2.0, and then became an essential part of the v2.1 release.

Overall, mutate_bases has been designed to solve the in silico base mutation problem in a practical sense: robust and efficient, getting its job done and then out of the way. The program has many possible applications: in addition to performing base-pair mutations in DNA-protein complexes, it should also prove handy in DNA/RNA modeling studies and in providing initial structures for QM/MM/MD energy calculations.

The standard command-line help (mutate_bases -h) is shown below:

NAME
        mutate_bases -- mutate bases, with backbone conformation unchanged
SYNOPSIS
        mutate_bases [OPTIONS] mutinfo pdbfile outfile
DESCRIPTION
        perform in silico base mutations of 3-dimensional nucleic acid
        structures, with two key and unique features: (1) the sugar-
        phosphate backbone conformation is untouched; (2) the base
        reference frame (position and orientation) is reserved, i.e.,
        the mutated structure shares the same base-pair/step
        parameters as the original one.
        -e    enumeration of all bases in the structure
        -l    name of file, containing list of mutations
        'mutinfo' can contain upto 5 fields for each mutation
                  [name=residue_name] [icode=insertion_code]
                  chain=chain_id seqnum=residue_number
                  mutation=residue_name
            The five fields per mutation can be in any order or CaSe.
            Each field can be abbreviated to its first character.
            Multiple mutations specified per line are separated by ';'.
            Fields in [] (i.e., name and icode) are optional.
            Mutation info should be QUOTED to be taken as one entry.
INPUT
        Nucleic-acid-containing structure file in PDB format
EXAMPLES
            # mutate G2 in chain A of B-DNA 355d to Adenine
        mutate_bases "c=a s=2 m=DA" 355d.pdb 355d_G2A.pdb
            # mutate the second base-pair G-C to A-T in 355d
        mutate_bases "c=a s=2 m=DA; c=B s=23 m=DT" 355d.pdb 355d_GC2AT.pdb
            # the above also generates file 'mutations.dat'
            # and the following command gives the same results
        mutate_bases -l mutations.dat 355d.pdb 355d_GC2AT_v2.pdb
            # mutate C74 in chain A of tRNA 1evv to U       
        mutate_bases "c=A s=74 m=U" 1evv.pdb 1evv_C74U.pdb
            # list all bases to be tailored for mutation
        mutate_bases -e 355d.pdb stdout
OUTPUT
        mutated structure in PDB format, sharing the same backbone
        conformation and base pair parameters as the original one.
SEE ALSO
        analyze, find_pair, rebuild
AUTHOR
        3DNA v2.1 (c) 2012 Dr. Xiang-Jun Lu (http://x3dna.org)
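
The field rules in the help message (any order, any case, abbreviation to the first character, ';' between mutations) can be illustrated with a small parser. This is only a Python sketch of those stated rules, not the parser used inside mutate_bases; the names `parse_mutinfo`, `FIELDS` and `REQUIRED` are hypothetical, and splitting fields on white space or commas follows the wording of the later help message quoted in this thread.

```python
import re

# First letter of a field name -> full field name, as listed in the help message.
FIELDS = {"n": "name", "i": "icode", "c": "chain", "s": "seqnum", "m": "mutation"}
# Fields not shown in [] in the help message are treated as mandatory here.
REQUIRED = ("chain", "seqnum", "mutation")


def parse_mutinfo(mutinfo):
    """Split a mutinfo string into one dict of fields per mutation.

    Values are kept exactly as typed; no case conversion is attempted.
    """
    mutations = []
    for entry in mutinfo.split(";"):  # ';' separates mutations
        tokens = [t for t in re.split(r"[\s,]+", entry.strip()) if t]
        if not tokens:
            continue  # tolerate a trailing ';'
        fields = {}
        for token in tokens:
            key, sep, value = token.partition("=")
            field = FIELDS.get(key[:1].lower())  # only the first character matters
            if not sep or field is None or not value:
                raise ValueError(f"cannot interpret field {token!r}")
            fields[field] = value
        missing = [f for f in REQUIRED if f not in fields]
        if missing:
            raise ValueError(f"entry {entry.strip()!r} lacks {missing}")
        mutations.append(fields)
    return mutations


if __name__ == "__main__":
    for mutation in parse_mutinfo("c=a s=2 m=DA; c=B s=23 m=DT"):
        print(mutation)
```

Running such a check on a mutinfo string before calling mutate_bases can catch a missing field early, but whether the program itself accepts or rejects a given string is decided by mutate_bases alone.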

Now let's take advantage of the web to illustrate the key features of mutate_bases using a set of worked examples. The scripts and corresponding data files & images are attached, so you can repeat the procedures in order to have a better understanding of how the program works.

In our GpU dinucleotide platform paper, we reported a previously unnoticed intra-dinucleotide sugar-phosphate H-bond that is unique to the GpU platform. This O2′(G)···O2P(U) H-bond readily rationalizes the over 60% occurrence of GpU over other platforms (e.g., ApA and UpC). Moreover, this H-bond has recently been validated by state-of-the-art quantum-chemical techniques.

In this section, we will use mutate_bases to answer two questions: (1) Why GpU and not GpT, i.e., why is the GpU platform RNA-specific? (2) Why are no UpG platforms observed, i.e., why is the GpU platform directional? The GpU platform (1msy_gu.pdb) is derived from PDB entry 1msy. The figure below shows the identity of the two nucleotides (G2655 and U2656 on chain A) and the names of the base atoms.

[Figure: GpU platform]

Why no GpT platform?
mutate_bases "c=a s=2656 m=t" 1msy_gu.pdb 1msy_gt.pdb
With the above command, we mutate U (which is residue #2656 on chain A) to T (see figure below). Clearly, the methyl group of T protrudes into the pocket, causing a steric clash. Thus GpT is incompatible with the platform conformation.
Why no UpG platform?
mutate_bases "c=a s=2655 m=u; c=a s=2656 m=g" 1msy_gu.pdb 1msy_ug.pdb
Using the above command, we mutate G (which is residue #2655 on chain A) to U, and U to G simultaneously. That's what the plural 's' in mutate_bases stands for. From the figure below, one can see clearly that no inter-base H-bond is now possible, consistent with the fact that no UpG platform has been observed.

Note that the above command also generates a file named 'mutations.dat', which has the following content:
```
c=a s=2655 m=u
c=a s=2656 m=g
```
You can then use the -l option of mutate_bases as follows:
mutate_bases -l mutations.dat 1msy_gu.pdb 1msy_ug2.pdb
The two mutated PDB files, 1msy_ug.pdb and 1msy_ug2.pdb, are identical.
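
For longer mutation lists, the file passed to -l can be written by a script instead of by hand. The following is a minimal Python sketch, assuming only the one-entry-per-line `c=... s=... m=...` layout shown for 'mutations.dat' above; the function names `format_mutations` and `write_mutation_file` are hypothetical and not part of 3DNA.

```python
from pathlib import Path


def format_mutations(mutations):
    """Return one 'c=<chain> s=<seqnum> m=<base>' line per mutation.

    `mutations` is an iterable of (chain_id, residue_number, new_base) tuples.
    The abbreviated field names c/s/m follow the mutate_bases help message.
    """
    return [f"c={chain} s={seqnum} m={base}" for chain, seqnum, base in mutations]


def write_mutation_file(mutations, path):
    """Write the formatted mutations to `path`, one entry per line."""
    Path(path).write_text("\n".join(format_mutations(mutations)) + "\n")


if __name__ == "__main__":
    # The UpG swap from this post: G2655 -> U and U2656 -> G on chain A.
    swap = [("a", 2655, "u"), ("a", 2656, "g")]
    print("\n".join(format_mutations(swap)))
```

A file written with `write_mutation_file` would then be passed to mutate_bases through the -l option in the same way as 'mutations.dat'.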

You can run find_pair and analyze on the original and mutated PDB files to verify that they indeed have the same base-pair parameters and backbone conformation.
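
As a quick complement to find_pair and analyze, the claim that the backbone is untouched can also be checked directly from the coordinates. The sketch below is an illustration under stated assumptions, not part of 3DNA: it reads the fixed-column ATOM/HETATM records of the PDB format, treats atoms whose names contain a prime (or an asterisk in older files) plus the phosphate atoms as backbone, and reports the largest coordinate difference between two files. The function names and the small alias table for old-style phosphate oxygen names are hypothetical choices.

```python
# Old-style phosphate oxygen names mapped to their current equivalents.
ALIASES = {"O1P": "OP1", "O2P": "OP2", "O3P": "OP3"}
PHOSPHATE = {"P", "OP1", "OP2", "OP3"}


def backbone_coords(pdb_lines):
    """Map (chain, seqnum, icode, atom_name) -> (x, y, z) for backbone atoms."""
    coords = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip().replace("*", "'")
        name = ALIASES.get(name, name)
        # Sugar atoms carry a prime; the phosphate group is listed explicitly.
        if "'" in name or name in PHOSPHATE:
            key = (line[21], line[22:26].strip(), line[26], name)
            coords[key] = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
    return coords


def max_backbone_shift(lines_a, lines_b):
    """Largest per-coordinate difference over the backbone atoms of both files.

    Raises ValueError if the two files do not contain the same backbone atoms.
    """
    a, b = backbone_coords(lines_a), backbone_coords(lines_b)
    if a.keys() != b.keys():
        raise ValueError("backbone atom sets differ")
    return max(
        (abs(p - q) for key in a for p, q in zip(a[key], b[key])),
        default=0.0,
    )


if __name__ == "__main__":
    import sys

    # Example: python compare_backbone.py 1msy_gu.pdb 1msy_ug.pdb
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as fa, open(sys.argv[2]) as fb:
            print(max_backbone_shift(fa.readlines(), fb.readlines()))
```

The residue name is deliberately left out of the lookup key, since it changes with the mutation while the chain ID, residue number and backbone atom names are expected to stay the same.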

To summarize, here is the command-script:

mutate_bases "c=a s=2656 m=t" 1msy_gu.pdb 1msy_gt.pdb
mutate_bases "c=a s=2655 m=u; c=a s=2656 m=g" 1msy_gu.pdb 1msy_ug.pdb
mutate_bases -l mutations.dat 1msy_gu.pdb 1msy_ug2.pdb

The PDB files referred:

Note that all the images used in this post were generated using Jmol. As much as I like RasMol (v2.6.4), I am now gradually switching to Jmol and PyMOL.

Note added on Monday, July 17, 2017:

Single quotes in the mutate_bases command-line examples have been replaced by double quotes so that the commands also work in native Windows. See follow-up messages below.

Hari Seldon · « **Reply #1 on:** July 17, 2017, 04:52:52 pm »

I cannot get mutate_bases to work. I am running Windows 10 ConEmu (x64) with 3DNA installed using this:
http://forum.x3dna.org/faqs/how-to-set-up-3dna-on-windows/

I think I installed my 3DNA correctly because find_pair seems to work:

linux@DESKTOP-JS6SA18 D:\Documents\NYU\Projects\Gunsalus\Code\QKI5dimer
> find_pair 1tc3.pdb 1tc3.inp

handling file <1tc3.pdb>

Time used: 00:00:00:00

linux@DESKTOP-JS6SA18 D:\Documents\NYU\Projects\Gunsalus\Code\QKI5dimer
> analyze 1tc3.inp

......Processing structure #1: <1tc3.inp>......
missing ' P ' atom : residue name ' DA', chain B, number [ 101 ]
missing ' OP1' atom : residue name ' DA', chain B, number [ 101 ]
missing ' OP2' atom : residue name ' DA', chain B, number [ 101 ]
missing ' P ' atom : residue name ' DA', chain A, number [ 1 ]
missing ' P ' atom : residue name ' DA', chain B, number [ 101 ]

Time used: 00:00:00:01

linux@DESKTOP-JS6SA18 D:\Documents\NYU\Projects\Gunsalus\Code\QKI5dimer
> mutate_bases 'c=a s=2656 m=t' 1msy_gu.pdb 1msy_gt.pdb
===========================================================================
NAME
mutate_bases -- mutate bases, with backbone conformation unchanged
SYNOPSIS
mutate_bases [OPTIONS] mutinfo pdbfile outfile
DESCRIPTION
perform in silico base mutations of 3-dimensional nucleic acid
structures, with two key and unique features: (1) the sugar-
phosphate backbone conformation is untouched; (2) the base
reference frame (position and orientation) is reserved, i.e.,
the mutated structure shares the same base-pair/step
parameters as the original one.
-e enumeration of all bases in the structure
-l name of file which contains a list of mutations
'mutinfo' can contain upto 5 fields for each mutation
[name=residue_name] [icode=insertion_code]
chain=chain_id seqnum=residue_number
mutation=residue_name
The five fields per mutation can be in any order or CaSe,
but must be separated by white space(s) or comma.
Each field can be abbreviated to its first character.
Multiple mutations on command line are separated by ';'.
Fields in [] (i.e., name and icode) are optional.
Mutation info should be QUOTED to be taken as one entry.
INPUT
Nucleic-acid-containing structure file in PDB format
EXAMPLES
# mutate G2 in chain A of B-DNA 355d to Adenine
mutate_bases 'c=a s=2 m=DA' 355d.pdb 355d_G2A.pdb
# mutate the second base-pair G-C to A-T in 355d
mutate_bases 'c=a s=2 m=DA; c=B s=23 m=DT' 355d.pdb 355d_GC2AT.pdb
# the above also generates file 'mutations.dat'
# and the following command gives the same results
mutate_bases -l mutations.dat 355d.pdb 355d_GC2AT_v2.pdb
# mutate C74 in chain A of tRNA 1evv to U
mutate_bases 'c=A s=74 m=U' 1evv.pdb 1evv_C74U.pdb
# list all bases to be tailored for mutation
mutate_bases -e 355d.pdb stdout
# enumerate all bases contained in 355d.pdb
OUTPUT
mutated structure in PDB format, sharing the same backbone
conformation and base pair parameters as the original one.
SEE ALSO
analyze, find_pair, rebuild
AUTHOR
3DNA v2.3.1-2017jun24, created and maintained by Xiang-Jun Lu (PhD)

Please post questions/comments on the 3DNA Forum: http://forum.x3dna.org/
Please check 'http://x3dna.org/citations' on how to cite 3DNA --- THANKS!
===========================================================================

linux@DESKTOP-JS6SA18 D:\Documents\NYU\Projects\Gunsalus\Code\QKI5dimer
>

xiangjun · « **Reply #2 on:** July 17, 2017, 06:49:07 pm »

It is a Windows peculiarity again!

Try replacing the single quotes with double quotes in the mutate_bases command, and it should work. I've just tested the following example.

> mutate_bases 'c=a s=2 m=DA' 355d.pdb 355d_G2A.pdb
REM as you saw ... print the command-line help message

REM -- the following command works as expected
> mutate_bases "c=a s=2 m=DA" 355d.pdb 355d_G2A.pdb
   1    A:...2@:[@@@]    ===> [ DA.A]   ---done---
    Number of mutations: 1

Time used: 00:00:00:00

Since double quotes also work in Linux/Mac OS, I'm revising the help message to use double quotes instead of single quotes.

Thanks for using 3DNA on Windows.

Xiang-Jun

Hari Seldon · « **Reply #3 on:** July 18, 2017, 03:24:03 am »

Thank you so much for the quick reply. The double quotes get mutate_bases to run and produce 4jvh_250749_RNA_mut.pdb, but the mutation fails; the two PDB files are identical:

# trying to mutate an A to a U

linux@DESKTOP-JS6SA18 D:\Documents\NYU\Projects\Gunsalus\Code\QKI5dimer
> mutate_bases "c=D s=2 m=U" 4jvh_250749_RNA.pdb 4jvh_250749_RNA_mut.pdb
Mutation entry 1 D:...2@:[@@@] has no PDB residue match
1 D:...2@:[@@@] ===> [ U.U] residue to be mutated not in the PDB file
Number of mutations: 0

Time used: 00:00:00:00

xiangjun · « **Reply #4 on:** July 18, 2017, 07:44:24 am »

Did you read the diagnostic message? The -e option may help.

Hari Seldon · « **Reply #5 on:** July 18, 2017, 03:37:37 pm »

How do I do -e?

linux@DESKTOP-JS6SA18 D:\Documents\NYU\Projects\Gunsalus\Code\QKI5dimer
> mutate_bases -e "c=D s=2 m=U" 4jvh_250749_RNA.pdb 4jvh_250749_RNA_mut.pdb
[mutate_bases prints the same command-line help message as shown in Reply #1]

xiangjun · « **Reply #6 on:** July 18, 2017, 05:06:19 pm »

Just follow the example. Report back how it goes.

Hari Seldon · « **Reply #7 on:** July 18, 2017, 05:10:47 pm »

linux@DESKTOP-JS6SA18 D:\Documents\NYU\Projects\Gunsalus\Code\QKI5dimer
> mutate_bases -e 4jvh_250749_RNA.pdb stdout
# add m=BASE_NAME (up-to three letters) to an entry for mutation
# e.g., change the line
# chain=A snum=2 name=C
# to
# chain=A snum=2 name=C mutation=G
# to mutate base C (on chain A and with residue number 2) to G

# Empty or comment (starting with #s) lines are ignored

chain=D snum=4 name=A # D:...4_:[..A]A A
chain=D snum=5 name=C # D:...5_:[..C]C C
chain=D snum=6 name=U # D:...6_:[..U]U U
chain=D snum=7 name=A # D:...7_:[..A]A A
chain=D snum=8 name=A # D:...8_:[..A]A A
chain=D snum=9 name=C # D:...9_:[..C]C C
chain=D snum=10 name=A # D:..10_:[..A]A A
chain=D snum=11 name=A # D:..11_:[..A]A A

Time used: 00:00:00:00

linux@DESKTOP-JS6SA18 D:\Documents\NYU\Projects\Gunsalus\Code\QKI5dimer
> mutate_bases -e 4jvh_250749_RNA_mut.pdb stdout
# add m=BASE_NAME (up-to three letters) to an entry for mutation
# e.g., change the line
# chain=A snum=2 name=C
# to
# chain=A snum=2 name=C mutation=G
# to mutate base C (on chain A and with residue number 2) to G

# Empty or comment (starting with #s) lines are ignored

chain=D snum=4 name=A # D:...4_:[..A]A A
chain=D snum=5 name=C # D:...5_:[..C]C C
chain=D snum=6 name=U # D:...6_:[..U]U U
chain=D snum=7 name=A # D:...7_:[..A]A A
chain=D snum=8 name=A # D:...8_:[..A]A A
chain=D snum=9 name=C # D:...9_:[..C]C C
chain=D snum=10 name=A # D:..10_:[..A]A A
chain=D snum=11 name=A # D:..11_:[..A]A A

Time used: 00:00:00:00

xiangjun · « **Reply #8 on:** July 18, 2017, 05:23:23 pm »

Does the list contain a nucleotide with chain id 'D' and residue number '2', as you originally specified?

Quote

> mutate_bases "c=D s=2 m=U" 4jvh_250749_RNA.pdb 4jvh_250749_RNA_mut.pdb
Mutation entry 1 D:...2@:[@@@] has no PDB residue match
1 D:...2@:[@@@] ===> [ U.U] residue to be mutated not in the PDB file
Number of mutations: 0

Now does the above message make sense to you? As a new user of the program, what would you suggest to make the message clearer?

Hari Seldon · « **Reply #9 on:** July 19, 2017, 01:47:21 am »

I am surprised that snum started at 4 instead of 1.

I think the error message you already have is adequate, because it says "residue to be mutated not in the PDB file" when I pick an snum that is not in the pdb file.
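
The check discussed in this exchange (is the requested chain and residue number actually in the file?) can be automated by reading the listing printed by `mutate_bases -e`. The following Python sketch assumes the `chain=... snum=... name=...` entry layout shown in Reply #7; the names `parse_listing` and `check_residue` are hypothetical, and the listing text is expected to have been saved or captured beforehand.

```python
import re

# One entry line of the -e listing, e.g. "chain=D snum=4 name=A # D:...4_:[..A]A A"
ENTRY = re.compile(r"^chain=(\S+)\s+snum=(-?\d+)\s+name=(\S+)")


def parse_listing(text):
    """Return (chain, residue_number, name) for each entry line of the listing."""
    entries = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if match:  # comment lines (starting with '#') and blank lines are skipped
            chain, snum, name = match.groups()
            entries.append((chain, int(snum), name))
    return entries


def check_residue(entries, chain, seqnum):
    """Return the base name at chain/seqnum, or raise with the numbers available."""
    for c, s, name in entries:
        if c == chain and s == seqnum:
            return name
    available = sorted(s for c, s, _ in entries if c == chain)
    raise LookupError(f"chain {chain} has no residue {seqnum}; available: {available}")


if __name__ == "__main__":
    listing = """\
# chain=A snum=2 name=C
chain=D snum=4 name=A # D:...4_:[..A]A A
chain=D snum=5 name=C # D:...5_:[..C]C C
"""
    entries = parse_listing(listing)
    print(check_residue(entries, "D", 4))
```

With the listing from Reply #7, residue 2 on chain D is reported as absent and the first A is found at residue number 4, so a mutinfo of "c=D s=4 m=U" would refer to a residue that exists in 4jvh_250749_RNA.pdb.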
