Messages - xiangjun

1301
MD simulations / Re: How to properly use x3dna_ensemble?
« on: March 27, 2012, 01:15:31 pm »
Glad to know that you've made some progress. However, the message as shown below still bothers me:
Quote
ruby 5276 child_info_fork::abort: address space needed by 'etc.so' (0x370000) is already occupied
A quick Google search turns up quite a few hits concerning Ruby and Cygwin on Windows. Will following Cygwin FAQ #4.44 "How do I fix fork() failures?" solve your problem? Specifically, the following sentence seems to address this problem:
Quote
Read the 'rebase' package README in /usr/share/doc/rebase/, and follow the instructions there to run 'rebaseall'

Additionally, what version of Ruby are you using?
Code: [Select]
ruby -v
I am switching from Perl to Ruby as the scripting language for 3DNA; hopefully this won't cause practical issues for Windows users.

Xiang-Jun

1302
MD simulations / Re: How to properly use x3dna_ensemble?
« on: March 27, 2012, 11:02:16 am »
Hi Nikolay,

The first two steps are fine. The third step is weird -- are you using the MinGW or the Cygwin version? This problem is not necessarily specific to 3DNA, but obviously I'd like to see a solution to it.

Xiang-Jun

1303
MD simulations / Re: How to properly use x3dna_ensemble?
« on: March 27, 2012, 10:11:29 am »
Hi Nikolay,

Glad to hear that you've made progress. However, from my understanding of what you described, it seems something is still not quite right. The fixed-name file bp_step.par contains only the parameters for a single structure (snapshot), not the whole ensemble. It is a bit more difficult to explain the details in text, so I would suggest you repeat the examples in the directory $X3DNA/examples/ensemble/md/:
Code: [Select]
cd $X3DNA/examples/ensemble/md/
x3dna_ensemble analyze -h
x3dna_ensemble extract -h
Once you understand how the examples work, you should be able to apply the same idea to the analysis of your MD trajectories. Of course, if you have any questions, please do not hesitate to post back at the forum.

Xiang-Jun

PS: command-line help
The help page for x3dna_ensemble analyze
Code: [Select]
------------------------------------------------------------------------
Analyze a MODEL/ENDMDL delineated ensemble of NMR structures or MD
trajectories. All models must correspond to different conformations of
the same molecule. A template base-pair input file, generated with
'find_pair' and corrected manually as necessary, must be provided.

Usage:
        x3dna_ensemble analyze options
Examples:
        x3dna_ensemble analyze -b bpfile.dat -e sample_md0.pdb
             # 21 models (0-20); output (default): 'ensemble_example.out'
             # also generate 'model_list.dat', see example below
        x3dna_ensemble analyze -b bpfile.dat -m model_list.dat -o ensemble_example2.out
             # diff ensemble_example.out ensemble_example2.out

        x3dna_ensemble analyze -b bpfile.dat -p 'pdbdir/model_*.pdb' -o ensemble_example3.out
             # note to quote the -p option; 20 models (1-20)
             # also generate 'pdb_list.dat', see example below
        x3dna_ensemble analyze -b bpfile.dat -l pdb_list.dat -o ensemble_example4.out
             # diff ensemble_example3.out ensemble_example4.out
             # note the order of the models: 1, 10..19, 2, 20, 3..9
Options:
------------------------------------------------------------------------
    --bpfile, -b <s>:   Name of file containing base-pairing info
   --outfile, -o <s>:   Output file (default: ensemble_example.out)
  --ensemble, -e <s>:   Ensemble delineated with MODEL/ENDMDL pairs
    --models, -m <s>:   File containing an explicit list of model numbers
   --pattern, -p <s>:   Pattern of model files to process (e.g., *.pdb)
      --list, -l <s>:   File containing an explicit list of models
          --info, -i:   Show only model info in the ensemble [with -e]
          --help, -h:   Show this message
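
The model order noted in the last example (1, 10..19, 2, 20, 3..9) is what plain lexical sorting of the file names produces. The Python sketch below reproduces that order and shows a numeric-aware sort for comparison. The `model_N.pdb` names and the `natural_key` helper are illustrative assumptions; how x3dna_ensemble itself orders the matched files has not been checked here.

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so that 2 sorts before 10."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]

def model_number(name):
    """Extract the first run of digits from a file name."""
    return int(re.search(r"(\d+)", name).group(1))

# Hypothetical file names of the kind matched by 'pdbdir/model_*.pdb'
files = [f"model_{i}.pdb" for i in range(1, 21)]

lexical_order = [model_number(f) for f in sorted(files)]
natural_order = [model_number(f) for f in sorted(files, key=natural_key)]

print(lexical_order)   # 1, 10..19, 2, 20, 3..9
print(natural_order)   # 1..20
```

If numeric order matters, one option is to sort the names this way and pass an explicit list with -l; the layout of that list should be copied from the pdb_list.dat file that the -p run generates.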

The help page for x3dna_ensemble extract
Code: [Select]
------------------------------------------------------------------------
Extract 3DNA structural parameters of an ensemble of NMR structures or
MD trajectories, after running 'x3dna_ensemble analyze'. The extracted
parameters are intended to be exported into Excel, Matlab and R etc for
further data analysis/visualization.

Usage:
        x3dna_ensemble extract options
Examples:
        x3dna_ensemble extract -l
             # to see a list of all parameters
        x3dna_ensemble extract -p prop
             # for propeller, no need to specify full: -p pr suffices
             # -p 36 also fine (see above); use 'ensemble_example.out'
        x3dna_ensemble extract -p slide -s , -f ensemble_example3.out
             # comma separated, from file 'ensemble_example3.out'
        x3dna_ensemble extract -p roll -s ' ' -n -o roll.dat
             # space separated, no row-label, to file 'roll.dat'
        x3dna_ensemble extract -e 1 -p chi1
             # extract the chi torsion angle of strand I, but exclude
             # those from the two terminal base pairs. For comparison,
             # run also: x3dna_ensemble extract -p chi1
        x3dna_ensemble extract -a
             # extract all parameters, each in a separate file
Options:
------------------------------------------------------------------------
  --separator, -s <s>:   Separator for fields [\t] (default: )
   --par-name, -p <s>:   Name of parameter to extract
   --fromfile, -f <s>:   Parameters file (default: ensemble_example.out)
    --outfile, -o <s>:   File of selected parameter (default: stdout)
   --end-bps, -e <i+>:   Number of end pairs to ignore (default: 0, 0)
            --all, -a:   Extract all parameters into separate files
          --clean, -c:   Clean up parameter files by the -a option
           --list, -l:   List all parameters
        --no-1col, -n:   Delete the first (label) column
           --help, -h:   Show this message
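
For downstream analysis outside Excel, Matlab or R, the extracted file can also be read with a few lines of Python. This is only a sketch under stated assumptions: each row is taken to start with a label (the column that -n removes) followed by numeric fields separated by the -s separator (tab here). The function name and the sample numbers are made up for illustration and are not real 3DNA output.

```python
import csv
import io
import statistics

def read_parameter_table(handle, separator="\t"):
    """Read rows of 'label<sep>value<sep>value...' into {label: [floats]}."""
    table = {}
    for row in csv.reader(handle, delimiter=separator):
        if not row:
            continue
        label, values = row[0], row[1:]
        numbers = []
        for value in values:
            try:
                numbers.append(float(value))
            except ValueError:
                pass  # skip fields that are not numeric
        table[label] = numbers
    return table

# Made-up numbers for illustration only; not real 3DNA output.
sample = "row1\t-10.5\t-12.0\t-8.5\nrow2\t-9.0\t-11.0\t-13.0\n"
table = read_parameter_table(io.StringIO(sample))
means = {label: statistics.mean(vals) for label, vals in table.items() if vals}
print(means)
```

For a real file, replace the `io.StringIO(sample)` handle with `open("roll.dat", newline="")` or similar, and check the actual row/column layout first.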


1304
MD simulations / Re: How to properly use x3dna_ensemble?
« on: March 26, 2012, 10:29:30 am »
Hi Nikolay,

I am glad to hear that Gromacs provides the facility to output an ensemble of multiple models delineated by MODEL/ENDMDL. As of 3DNA v2.1beta (currently distributed), the x3dna_ensemble Ruby script can handle multiple structures in a PDB MODEL/ENDMDL ensemble:

Code: [Select]
x3dna_ensemble -h
------------------------------------------------------------------------
Utilities for the analysis and visualization of an ensemble
    Usage: x3dna_ensemble [-h|-v] sub-command [-h] [options]
    where sub-command must be one of:
        analyze -- analyze MODEL/ENDMDL delineated ensemble (NMR or MD)
        block_image -- generate a base block schematic image
        extract -- extract structural parameters after running 'analyze'
        reorient -- reorient models to a particular frame/orientation
------------------------------------------------------------------------
  --version, -v:   Print version and exit
     --help, -h:   Show this message

Check the $X3DNA/examples/ensemble/ directory for examples, and report back if you have any problems. Note that I still need to add detailed documentation for the new x3dna_ensemble utility. However, it should be straightforward to play with, and I am always quicker in responding to users' questions than in writing documentation ...
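
To check that a trajectory exported from an MD package really is a MODEL/ENDMDL delineated ensemble before running x3dna_ensemble, a quick look at the model records is often enough. The Python sketch below groups coordinate lines by model number; it is an independent illustration, not how x3dna_ensemble parses the file, and the two-model sample is hand-made.

```python
def split_models(pdb_text):
    """Group ATOM/HETATM lines by model number for a MODEL/ENDMDL delineated PDB string."""
    models = {}
    current = None
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "MODEL":
            # the model serial number follows the record name
            current = models.setdefault(int(line[6:].split()[0]), [])
        elif record == "ENDMDL":
            current = None
        elif record in ("ATOM", "HETATM") and current is not None:
            current.append(line)
    return models

# Tiny hand-made ensemble (two models, one atom each) for illustration.
ensemble = """\
MODEL        1
ATOM      1  P     A A   1       0.000   0.000   0.000
ENDMDL
MODEL        2
ATOM      1  P     A A   1       0.100   0.200   0.300
ENDMDL
END
"""

models = split_models(ensemble)
print(sorted(models), [len(atoms) for atoms in models.values()])
```

All models should report the same number of atoms, since they must be conformations of the same molecule.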

Xiang-Jun

1305
Hi Hugh,

Thank you so much for contributing back your Perl script that solves your problem, and for providing new sample Gaussian-Babel-generated PDB data files. As it turns out, the three PDB files you attached -- AT.pdb, AU.pdb, and GC.pdb -- are all fine with 3DNA. You can verify this point by running find_pair with the -s option:
Code: [Select]
find_pair -s GC.pdb stdout
# and it will output the following:
GC.pdb
GC.outs
    1      # single helix
    2      # number of bases
    1    1 # explicit bp numbering/hetero atoms
    1      # ....>A:...1_:[..G]G
    2      # ....>B:...1_:[..C]C
However, none of the three PDB files contains a base pair, per the default parameters -- check using a molecular graphics viewer like Jmol or PyMOL.

The atom naming issue related to PDB files from computational chemistry packages (e.g. Gaussian) and Babel has appeared a few times in the 3DNA forum. As far as 3DNA is concerned, your effort has led to the first solution to this problem that I am aware of. Your question has prompted me to read the article "Open Babel: An open chemical toolbox" and download the latest Open Babel v2.3.1.

Best regards,

Xiang-Jun

1306
Thanks for using 3DNA and for your detailed post -- your attached PDB file helped in uncovering where the problem is.

Quote
1. Is the attached file properly formatted for use with 3DNA?
No, it is not. Specifically, the atom names do not conform to the PDB convention. Using one of the U residues as an example, see the following two images:
Gaussian-Babel PDB | Standard PDB
On the left is the U from the Gaussian-Babel generated PDB file, and on the right is the same residue from a standard PDB file. Notice how the standard PDB file has names like " N1 " instead of " N  ", and " O2 " instead of " O  ", etc. Proper atom names are important for 3DNA to identify which atom is which.
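
The atom name occupies columns 13-16 of each ATOM/HETATM record in the PDB format, so a file can be screened for non-standard names before it is given to 3DNA. The Python sketch below does this for the uracil base atoms only. The helper names are hypothetical, the coordinates are placeholders, and the check is far simpler than whatever 3DNA does internally to recognize a nucleotide.

```python
# Standard PDB names of the uracil base atoms
URACIL_BASE_ATOMS = {"N1", "C2", "O2", "N3", "C4", "O4", "C5", "C6"}

def atom_name(line):
    """Return the raw 4-character atom-name field (columns 13-16, 1-based)."""
    return line[12:16]

def unrecognized_names(lines, expected=URACIL_BASE_ATOMS):
    """List atom-name fields that do not match any expected base-atom name."""
    return [atom_name(line) for line in lines
            if line.startswith(("ATOM", "HETATM"))
            and atom_name(line).strip() not in expected]

def atom_line(serial, name, resname="  U", chain="A", resseq=1, xyz=(0.0, 0.0, 0.0)):
    """Format a minimal fixed-column ATOM record (coordinates are placeholders)."""
    x, y, z = xyz
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

standard = [atom_line(1, " N1 "), atom_line(2, " O2 ")]
babel_like = [atom_line(1, " N  "), atom_line(2, " O  ")]

print(unrecognized_names(standard))    # nothing flagged
print(unrecognized_names(babel_like))  # the element-only names are flagged
```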

Quote
2. If find_pair does not find a base pair, will it still output the base pair geometry parameters that were calculated?
The problem is not that find_pair misses a pair due to parameter cutoffs, but that the residues are not recognized as nucleotides at all. Your best bet is simply to make your PDB file standard-compliant; then both problems will be gone.

In your attached test.pdb file, there are two uracils, which follow the same atom ordering and naming convention. Could you provide me with example files containing A, C, G, and T? It may be worthwhile to have a utility program in 3DNA that can convert a Gaussian-Babel generated PDB file to the standard format.

Xiang-Jun


1307
Hi Kumutha,

Thanks for posting the DNA sequence and its beautiful secondary structure predicted with mfold. 3DNA per se, and the 3DNA server based on it, do not have the magic (yet) to directly convert such a secondary structure into a tertiary structure. However, I do believe 3DNA has functionality to help solve some components of the puzzle. In your case, the long double-helical stem can be approximated with a fiber B-DNA model; then you need to model the loop part at the top and the two terminal nucleotides at the bottom end. You need to assemble the three parts together, and possibly perform some energy minimization to achieve good stereochemistry.

You may find the article "RNA-Puzzles: A CASP-like evaluation of RNA three-dimensional structure prediction" published in the April 2012 [18 (4)] issue of the RNA journal helpful.

Xiang-Jun

1308
MD simulations / Re: How to properly use x3dna_ensemble?
« on: March 22, 2012, 08:31:51 am »
Hi Nikolay,

Thanks for using 3DNA, and especially for trying the new Ruby x3dna_ensemble utility. Sorry to hear that you are "completely lost". Regarding the convert sub-command, you are a step ahead of the implementation:
Code: [Select]
x3dna_ensemble convert -h
...to be added for Amber/Gromacs/CHARMM etc
  --package, -p <s>:   Name of MD simulation package (default: amber)
         --help, -h:   Show this message
i.e., this functionality is not implemented yet. However, I'd be interested in adding such a converter for GROMACS if you could provide an example (shortened) trajectory file and a detailed description of the xtc/trr file formats. On the other hand, I believe (or guess) that GROMACS has a facility to convert its native trajectory files to the standard PDB MODEL/ENDMDL format.

Please let me know.

Xiang-Jun


1309
FAQs / How to calculate DNA bending angle?
« on: March 21, 2012, 02:10:48 pm »
DNA bending angle is a frequently used parameter in the literature, often associated with DNA-protein complexes. Nevertheless, 3DNA does not provide a direct measure of the "bending angle" in its output file of structural parameters. The topic is more subtle and complicated than it appears.

On its face, an angle is defined by two vectors; let's call them a and b, and if each is normalized, then the angle (in degrees) between them is: acos(dot(a, b)) * 180/pi. Geometrically, after moving the tails of the two vectors into the same position (e.g., origin), the heads would normally define a plane, unless a and b are strictly parallel (0°) or anti-parallel (180°).
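
As a quick numerical check of the formula, here is a minimal Python version. It normalizes the vectors itself and clamps the cosine to [-1, 1] to guard against round-off; the function name is just for illustration.

```python
import math

def angle_deg(a, b):
    """Angle in degrees between two 3D vectors (they need not be normalized)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    cosine = max(-1.0, min(1.0, dot / (norm_a * norm_b)))  # guard against round-off
    return math.degrees(math.acos(cosine))

print(angle_deg((1, 0, 0), (0, 1, 0)))   # perpendicular vectors
print(angle_deg((1, 0, 0), (2, 0, 0)))   # parallel vectors
print(angle_deg((1, 0, 0), (-3, 0, 0)))  # anti-parallel vectors
```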

DNA structures are three-dimensional, and normally far more complicated than a single number can quantify. The concept of DNA bending angle, as I understand it, is only applicable to DNA structures with two relatively straight fragments (as in CAP-DNA complexes). In such situations, one can fit a least-squares (LS) linear helical axis to each of the two fragments, and calculate the angle between them. Towards this end, 3DNA outputs the following section when it judges that the input structure is not strongly curved. Using 355d/bdl084, which is distributed with 3DNA, as an example:
Code: [Select]
Global linear helical axis defined by equivalent C1' and RN9/YN1 atom pairs
Deviation from regular linear helix: 3.30(0.52)
Helix:    -0.127  -0.275  -0.953
HETATM 9998  XS    X X 999      17.536  25.713  25.665
HETATM 9999  XE    X X 999      12.911  15.677  -9.080
Average and standard deviation of helix radius:
P: 9.42(0.82), O4': 6.37(0.85),  C1': 5.85(0.86)

Here the Helix: line gives the normalized vector along the "best-fit" helical axis. The two HETATM records provide the two end points of the helix, and they are directly related to the Helix: line by a simple equation. Following the above example, we have (Octave/Matlab code):

Code: [Select]
XE = [12.911  15.677  -9.080];
XS = [17.536  25.713  25.665];

dd = XE - XS
%   -4.6250  -10.0360  -34.7450

Helix = dd / norm(dd)
%  -0.12685  -0.27526  -0.95296  ==> [-0.127  -0.275  -0.953]

With the two HETATM records, one can easily add them into the original PDB file to display the helical axis using a molecular graphics program (e.g., RasMol, Jmol or PyMOL). Moreover, the two helix vectors can be used to reorient the original PDB structure into a view so that one helical fragment lies along the x-axis, and the other in the xy-plane. As documented in detail in recipe #4 on "Automatic identification of double-helical regions in a DNA–RNA junction" of the 2008 3DNA Nature Protocols paper, "The chosen view allows for easy visualization and protractor measurement of the overall bending angle between the two relatively straight helices."
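
One way to construct such a view is sketched below in Python: the x-axis is taken along the first helix vector, the z-axis along the cross product of the two vectors, and the y-axis completes the right-handed frame. This is a generic construction, not necessarily the one 3DNA uses. The first vector is the Helix: line from the example above; the second is a hypothetical vector standing in for another fragment. Both vectors are assumed to point consistently along the molecule, and must not be parallel.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(v):
    norm = math.sqrt(dot(v, v))
    return tuple(c / norm for c in v)

def bending_frame(axis1, axis2):
    """Rows of a rotation matrix: x along axis1, with axis2 lying in the xy-plane."""
    x = unit(axis1)
    z = unit(cross(x, axis2))  # undefined if the two axes are parallel
    y = cross(z, x)
    return (x, y, z)

def rotate(frame, v):
    """Express vector v in the new frame."""
    return tuple(dot(row, v) for row in frame)

helix1 = (-0.127, -0.275, -0.953)  # Helix: line of the example above
helix2 = (0.300, -0.200, -0.933)   # hypothetical second fragment, for illustration

frame = bending_frame(helix1, helix2)
h1 = rotate(frame, unit(helix1))   # lies along +x
h2 = rotate(frame, unit(helix2))   # z-component vanishes
bend = math.degrees(math.atan2(h2[1], h2[0]))
print(h1, h2, bend)
```

The same `rotate` call, applied to every atom position (after moving a common point to the origin), reorients the whole structure; `bend` is then the angle one would read off with a protractor.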

The following points are well worth noting:
  • The LS fitting procedure used in 3DNA follows SCHNAaP, which was based on the algorithm in the well-known NewHelix program, maintained by Dr. Richard Dickerson up to the 1990s. While fitting a global linear helical axis to strongly curved DNA structures makes no sense with derived parameters (NewHelix itself has been replaced by FreeHelix, also from Dickerson), I do believe it is meaningful to fit a linear helix to a relatively straight DNA fragment. That's why I have kept this functionality in SCHNAaP and 3DNA; the 3DNA bending angle calculation serves as an example illustrating the point – it provides an "intuitive" way for biologists to understand how the bending angle is calculated; it can actually be measured directly.
  • Instead of directly LS-fitting a linear helical axis with 3DNA, one can alternatively superimpose a regular fiber model onto the DNA fragment, and then derive the straight helical axis from the fitted coordinates. The two approaches normally give slightly different numerical values, as would be expected.
  • Overall, bending angle is (at most) an approximate measure of DNA curvature. In my opinion, the concept is only applicable for comparing a set of structures, each with two relatively straight helical fragments. Even in such cases, the relative spatial relationship between two segments is more complicated than a simple (bending) angle could quantify. Be watchful – do not exaggerate the significance of small variations in bending angle.

1310
General discussions (Q&As) / Re: Re: 3DNA download
« on: March 21, 2012, 09:03:05 am »
Check the documentation in the $X3DNA/doc directory; work through the recipes of the 2008 3DNA Nature Protocols paper; ask questions in the forum. I will update the brief tutorial shortly.

Xiang-Jun

1311
Yes, it is possible to install 3DNA on Windows XP -- the distributed v2.1beta-cygwin-win and v2.1beta-mingw-win were actually compiled on Windows XP. As to MinGW vs Cygwin, it is up to you: in my limited experience playing with 3DNA on Windows, Cygwin seems more Linux-like, whilst MinGW is more Windows-like. If you are an experienced Windows user, you may try MinGW first.

Of course, if you have any technical problems, please post them here.

Xiang-Jun

1312
FAQs / How to handle modified (uncommon) bases?
« on: March 20, 2012, 09:40:19 pm »
In 3DNA, modified bases are mapped to their standard counterparts, e.g. 5‐iodouracil (5IU) to uracil (U) and 1‐methyladenine (1MA) to adenine (A), and are designated with lower case letters (as u and a respectively for the examples cited above). Technically, the mapping is stored in file $X3DNA/config/baselist.dat, and looks like this:
Code: [Select]
  A     A
 DA     A
ADE     A
....
5IU     u      # I connected to C5
....
1MA     a      # C connected to N1

Each mapped one-letter base (X = A/C/G/T/U for the standard nucleotides and x = a/c/g/t/u for the modified ones) has a corresponding Atomic_X.pdb (or Atomic.x.pdb) file oriented in the standard base reference frame. By default, the two sets (X and x) are identical, i.e., Atomic_A.pdb has the same content as Atomic.a.pdb. The mapping information is used in a least-squares fitting procedure to define the base reference frame for each nucleotide in a PDB file, and allows for easy analysis of unusual DNA and RNA structures.
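
To illustrate the mapping, here is a small Python sketch that reads lines of the form shown above into a dictionary and tells standard from modified bases by the case of the one-letter code. It assumes only what the excerpt shows (two whitespace-separated fields, with an optional '#' comment); the real baselist.dat may contain other kinds of lines, and the function names are made up.

```python
def parse_baselist(text):
    """Map residue names to one-letter codes from 'NAME  code  # comment' lines."""
    mapping = {}
    for raw in text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop the trailing comment, if any
        if len(fields) != 2:
            continue  # skip blank lines and anything not in 'NAME code' form
        name, code = fields
        mapping[name] = code
    return mapping

def is_modified(code):
    """Modified bases are designated with lower case letters."""
    return code.islower()

# The lines quoted in the excerpt above
sample = """\
  A     A
 DA     A
ADE     A
5IU     u      # I connected to C5
1MA     a      # C connected to N1
"""

baselist = parse_baselist(sample)
print(baselist)
print([name for name, code in baselist.items() if is_modified(code)])
```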

As of v2.1, when encountering a new modified base, 3DNA will automatically perform the mapping, and output the following message (using a contrived example):
Code: [Select]
Match '2MG' to 'g' for residue 2MG   10  on chain A [#1]
    check it & consider to add line '2MG     g' to file <baselist.dat>

Simply add a line containing 2MG     g to file baselist.dat, and the above info message will be gone. This is a contrived example because I deliberately deleted that line from baselist.dat for this illustration.

I implemented this auto-mapping as an experimental feature at least as far back as v1.5, but did not document it for public use. My experience over the years has shown that the auto-mapping functions as designed. Now with this feature on by default, processing of large datasets can be fully automated. Moreover, using find_pair, it is easy to get a complete list of modified bases in a dataset, e.g., in all the NDB entries.

1313
Structural analysis of nucleic acids used to be a rather tedious process, especially for irregular, complicated RNA structures and nucleic acid-protein complexes (e.g., the large ribosomal subunit 1jj2/rr0033). Without valid base-pairing information as input, the various analysis programs will produce meaningless results. The program find_pair was originally created to solve this specific problem, by generating an input file for the 3DNA analysis routines (analyze/cehs) directly from a PDB file.

At its core, find_pair uses a purely geometric approach to identify all pairs (Watson-Crick or non-canonical) that actually exist in a structure, together with their H-bonding patterns and helix context. Specifically, the major criteria used are as follows:
  • The distance between the origins of the two bases (as defined by their standard reference frames) must be less than a certain limit (15.0 Å by default) - otherwise, they would be too far away to be called a pair.
  • The vertical separation (i.e., stagger) between the two base planes must be less than a certain limit (2.5 Å by default) - otherwise, they would be stacking instead of pairing.
  • The angle between the two base z-axes (i.e., their normal vectors) must be less than a cutoff (65.0° by default).
  • There must be at least one pair of nitrogen/oxygen base atoms that are within an H-bonding cutoff distance (4.0 Å by default).
If two bases fulfill these geometric requirements, they are defined to be a pair, without regard to their chemical constituents. Thus our method allows for identification of unconventional pairs as easily as the canonical ones. The program then checks for possible H-bonding patterns, whether the normal donor-acceptor (noted by '-' as in O6 - N4 for a G·C pair) or the unusual donor-donor, acceptor-acceptor (noted by '*' as in O2 * N3 for a C·C pair in urx057). The non-canonical pairs, especially those with unusual H-bonding patterns, should be checked more carefully - they could be due to errors in structure determination, or they could have some special meaning/significance unnoticed previously.
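
The four criteria can be put together as a short Python sketch. This is an illustration of the idea, not the find_pair source: the `Base` container, the function name and the test coordinates are all hypothetical. Two details are assumptions on my part: the angle between the z-axes is folded into the 0-90° range (paired bases may have anti-parallel normals), and the stagger is taken as the projection of the origin-to-origin vector onto the mean base normal.

```python
import math
from dataclasses import dataclass

@dataclass
class Base:
    origin: tuple  # origin of the base reference frame (Å)
    z_axis: tuple  # unit normal of the base plane
    atoms: dict    # base atom name -> (x, y, z)

# Default cutoffs quoted in the text
LIMITS = {"origin_dist": 15.0, "stagger": 2.5, "z_angle": 65.0, "hbond_dist": 4.0}

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def is_pair(b1, b2, limits=LIMITS):
    """Apply the four geometric criteria to two bases."""
    d = tuple(q - p for p, q in zip(b1.origin, b2.origin))
    if math.sqrt(_dot(d, d)) >= limits["origin_dist"]:
        return False
    c = _dot(b1.z_axis, b2.z_axis)
    # Assumption: fold the inter-normal angle into 0..90 degrees
    if math.degrees(math.acos(min(1.0, abs(c)))) >= limits["z_angle"]:
        return False
    # Assumption: stagger = projection of d onto the mean normal
    sign = 1.0 if c >= 0 else -1.0
    mean_z = tuple(p + sign * q for p, q in zip(b1.z_axis, b2.z_axis))
    stagger = abs(_dot(d, mean_z)) / math.sqrt(_dot(mean_z, mean_z))
    if stagger >= limits["stagger"]:
        return False
    # At least one N/O ... N/O contact within the H-bonding cutoff
    polar1 = [xyz for name, xyz in b1.atoms.items() if name[0] in "NO"]
    polar2 = [xyz for name, xyz in b2.atoms.items() if name[0] in "NO"]
    return any(math.dist(p, q) <= limits["hbond_dist"] for p in polar1 for q in polar2)

# Placeholder geometry: 'partner' is roughly coplanar with 'base', 'stacked' sits 3.4 Å above it
base = Base((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), {"N3": (0.0, -1.4, 0.0)})
partner = Base((0.2, 0.1, 0.1), (0.0, 0.0, -1.0), {"N1": (0.0, 1.5, 0.1)})
stacked = Base((0.0, 0.0, 3.4), (0.0, 0.0, 1.0), {"N3": (0.0, -1.4, 3.4)})

print(is_pair(base, partner), is_pair(base, stacked))
```

Note that nothing in `is_pair` looks at base identity, which is why non-canonical pairs come out as easily as Watson-Crick ones.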

The default criteria mentioned above are based on a survey of the NDB structures. Generally speaking, they are pretty generous and work quite well in the most common cases we've encountered. However, we are aware of special cases where some of them might be too restrictive or too generous, thus leading find_pair to miss base pairs or to produce superfluous ones. The default settings are stored in a text file named misc_3dna.par under the directory $X3DNA/config/, where users can modify them as they see fit. Changes in that directory will have a global effect - wherever you run find_pair on your system, the modified values will be used. Alternatively, users could make a copy of misc_3dna.par in their current working directory and change it there for local effect. Note that the local setting has precedence over the global one.
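
The local-over-global precedence amounts to a two-step lookup, sketched here in Python. The function name and its arguments are hypothetical; the demo uses throwaway directories standing in for $X3DNA and a working directory, so it does not touch a real installation.

```python
import os
import tempfile
from pathlib import Path

def locate_parameter_file(name="misc_3dna.par", workdir=None, x3dna_home=None):
    """Return the local copy if present, else the one under $X3DNA/config, else None."""
    workdir = Path(workdir) if workdir else Path.cwd()
    local = workdir / name
    if local.is_file():
        return local
    home = x3dna_home or os.environ.get("X3DNA")
    if home:
        candidate = Path(home) / "config" / name
        if candidate.is_file():
            return candidate
    return None

# Demo with temporary directories (stand-ins, not a real 3DNA installation)
with tempfile.TemporaryDirectory() as tmp:
    home = Path(tmp) / "x3dna"
    work = Path(tmp) / "work"
    (home / "config").mkdir(parents=True)
    work.mkdir()
    (home / "config" / "misc_3dna.par").write_text("global settings\n")
    found_global = locate_parameter_file(workdir=work, x3dna_home=home)
    (work / "misc_3dna.par").write_text("local settings\n")
    found_local = locate_parameter_file(workdir=work, x3dna_home=home)
    used_global = found_global.parent.name == "config"
    used_local = found_local.parent == work

print(used_global, used_local)
```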

As an example, find_pair will miss the 127th base-pair I:..53_:[.DT]T-----A[.DA]:.-53_:J in structure 1kx5/pd0287 in its default settings. This is because the H-bonding distance between T:N3 - A:N1 is 4.20 Å and that for T:O4 - A:N6 is 4.85 Å; both of them are larger than the default 4.0 Å cutoff. Increasing the H-bonding criterion in file misc_3dna.par from 4.0 Å to 5.0 Å will solve this problem. Please note that in 3DNA, users can start directly from an uncompressed PDB file, without having to extract the DNA fragment first:
  • find_pair 1kx5.pdb 1kx5.inp to get input file for analyze
  • analyze 1kx5.inp to get detailed structural parameters in file 1kx5.out
  • The above two steps can be combined into one: find_pair 1kx5.pdb stdout | analyze stdin
In addition to (or instead of) manipulating parameters in misc_3dna.par, oftentimes it may be preferable to manually edit find_pair-generated base-pair files before feeding them into analyze/cehs. This allows for maximum flexibility as to which pairs to consider in calculating 3DNA structural parameters.

Also worth noting is the -p option of find_pair: without this option, find_pair locates base pairs in double-helical regions; thus the Watson-Crick pairs take precedence over the wobble and other non-canonical pairs. With -p, all pairs and higher-order base associations (i.e., triplets and above) are detected.

 

1314
The easiest way to build a nucleic acid structure with the sugar-phosphate backbone, other than predefined fiber models, is to use the rebuild program. The backbone building scheme uses exactly the same protocol as the default for base-only model. The user needs to add the -atomic option to rebuild, and to choose the desired rigid sugar-phosphate backbone to be attached to the standard base geometry.

The four types of currently available backbone conformations are listed in the directory $X3DNA/config/atomic. To use any of these backbones, it is necessary to copy the standard nucleotide files associated with each type of backbone to $X3DNA/config or your current working directory, and to name each nucleotide as follows: Atomic_X.pdb (where X = A, C, G, T, U; or Atomic.x.pdb where x = a, c, g, t, u for modified bases). The default Atomic_X.pdb files contain only the C1' backbone atom, and the base geometry is independent of the backbone conformation.

To build a DNA structure with B-DNA backbone conformation, for example, one uses the BDNA_X.pdb set to replace Atomic_X.pdb. There is a sub-command cp_std of the Ruby utility program x3dna_utils to help with this: x3dna_utils cp_std BDNA. This will copy BDNA_X.pdb to the current working directory and rename it Atomic_X.pdb. Please note that rebuild searches for Atomic_X.pdb files first in the current working directory, and then in $X3DNA/config.

To make the above description clear, here is an example. Go to the directory $X3DNA/examples/analyze_rebuild, and try to reproduce the following:
  • use the command, x3dna_utils cp_std BDNA, so that you will have Atomic_X.pdb files
  • use find_pair bdl084.pdb | analyze, to analyze the structure bdl084 (355d) and to generate a file named bp_step.par
  • use rebuild -atomic bp_step.par bdl084_3dna.pdb, to generate the PDB file bdl084_3dna.pdb with a standard B-backbone
The RMSD between all atoms of the original bdl084.pdb file and the generated bdl084_3dna.pdb file is only 0.73 Å. Please note that in the rebuilt bdl084_3dna.pdb file, some O3'(i-1) to P(i) linkages can be quite long (broken). This structure, however, serves well as a starting point for further energy minimization. See post "Restraint optimization of DNA backbone geometry using PHENIX" for how to regularize the overlong bonds.
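
To see which linkages are affected in a rebuilt file, one can list the O3'(i-1) to P(i) distances along each chain. The Python sketch below does this for fixed-column ATOM records. It is an independent illustration: the function names, the placeholder coordinates and the 2.0 Å threshold are my assumptions (the ideal O3'-P bond length is about 1.6 Å), and residues lacking both atoms are simply skipped.

```python
import math

def backbone_linkages(pdb_lines):
    """Return (chain, residue i-1, residue i, distance) for each O3'(i-1)...P(i) linkage."""
    residues = []  # [((chain, residue id), {"O3'": xyz, "P": xyz})] in file order
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        name = line[12:16].strip().replace("*", "'")  # accept the older O3* spelling
        if name not in ("O3'", "P"):
            continue
        key = (line[21], line[22:27].strip())
        xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
        if not residues or residues[-1][0] != key:
            residues.append((key, {}))
        residues[-1][1][name] = xyz
    links = []
    for (k1, a1), (k2, a2) in zip(residues, residues[1:]):
        if k1[0] == k2[0] and "O3'" in a1 and "P" in a2:
            links.append((k1[0], k1[1], k2[1], math.dist(a1["O3'"], a2["P"])))
    return links

def atom_line(serial, name, resname, chain, resseq, x, y, z):
    """Format a minimal fixed-column ATOM record."""
    return (f"ATOM  {serial:>5} {name:<4} {resname:>3} {chain}{resseq:>4}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")

# Placeholder coordinates: one normal linkage and one stretched linkage
lines = [
    atom_line(1, " O3'", " DC", "A", 1, 0.0, 0.0, 0.0),
    atom_line(2, " P  ", " DG", "A", 2, 1.6, 0.0, 0.0),
    atom_line(3, " O3'", " DG", "A", 2, 5.0, 0.0, 0.0),
    atom_line(4, " P  ", " DC", "A", 3, 7.5, 0.0, 0.0),
]

MAX_O3P_P = 2.0  # Å; illustrative threshold only
overlong = [(c, i, j, round(d, 2))
            for c, i, j, d in backbone_linkages(lines) if d > MAX_O3P_P]
print(overlong)
```

For a real file, pass `open("bdl084_3dna.pdb")` instead of the hand-made `lines`.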

1315
As of v2.1, the fiber utility has several new options that make building single-stranded RNA structures of arbitrary sequence a snap:
  • -s for single-strand
  • -r for RNA structure
  • -seq for specifying arbitrary sequence directly on the command line
For example, to generate a single-stranded RNA structure of sequence 'AUUGGUUC', do the following:
Code: [Select]
fiber -s -r -seq=AUUGGUUC ss-RNA.pdb
# the -s and -r options can be combined as -sr or -rs
fiber -sr -seq=AUUGGUUC ss-RNA.pdb

Technically, the RNA model is based on the A-DNA model (#1 in the list), with the O2' atom attached to the sugar, and T replaced by U. The -s option simply extracts the leading strand from the default duplex.

1316
The easiest way to build a canonical double helical structure of specific sequence is to use the fiber program. The default option is structure #4, which corresponds to the calf thymus B-DNA double helix due to Struther Arnott. To build the A-form, the user should choose structure #1. Coordinates of these two structures are taken from Struther Arnott: "Polynucleotide secondary structures: an historical perspective", pp. 1-38 in "Oxford Handbook of Nucleic Acid Structure", edited by Stephen Neidle (Oxford Press, 1999).

For example, to build an A-DNA structure of sequence 'AAGCTTTC', one can do the following:
Code: [Select]
fiber -a -seq=AAGCTTTC fa.pdb
Note the -seq option is added as of v2.1.

1317
FAQs / What is the correct name of the package: 3DNA or X3DNA?
« on: March 20, 2012, 01:15:55 pm »
The "official" name of the package is 3DNA for 3-Dimensional Nucleic Acids. See my post titled "Does 3DNA work for RNA?" for how the name came about, and read more on "What can 3DNA do for RNA structures?".

As to the term X3DNA: in setting up the system, we need an environment variable to specify the directory where 3DNA is installed. Since 3DNA per se is not a valid identifier, I added an "X" before 3DNA, thus X3DNA. Over the years, I have noticed outside websites linking to, and literature references citing, 3DNA as X3DNA. Recently, while registering a domain name for 3DNA, I first tried the obvious choice of 3dna.org. However, that name had already been taken, so I decided to resort to x3dna.org.

The term 3DNA is not unique to our software package on nucleic acids structures. A Google search reveals at least two other products that use this name. Interestingly, there is even a PDB entry of 3DNA, which is actually a protein structure. X3DNA, on the other hand, is a name special to our package. Moreover, the X3DNA distribution package means eXecuting 3DNA. X3DNA also implies eXtreme or eXtended 3DNA -- we are working to move the software to the next level.

1318
w3DNA -- web interface to 3DNA / Re: Missing atoms
« on: March 13, 2012, 06:12:02 pm »
Hi Xiao-Ping,

Thanks for reporting back, and I am glad to know that "Swiss Pdb Viewer (4.0.4) is happy with the file fb-pdbv3.pdb." Your question prompted me to add a new option "-pdbv3" to make fiber/rebuild-generated structure files compliant with PDB format v3.x, which is apparently what PdbViewer likes. Together with the option -three_letter_nts recently added to accommodate HADDOCK, 3DNA can now be directly connected with more third-party applications.

The "3DNA web server" is an interface to commonly used 3DNA functionality, and it is hosted and supported by the Olson laboratory at Rutgers University. Currently, w3DNA is based on 3DNA v2.0, and thus does not yet have the new features in v2.1beta. Your best bet is to download the v2.1beta I just compiled (2012mar13); then you can easily generate a fiber model that PdbViewer is happy with as follows (e.g. bDNA-pdbv3.pdb with sequence 'ACGTTTAA' -- case does not matter):

Code: [Select]
fiber -pdbv3 -seq=acgtttaa bDNA-pdbv3.pdb
Xiang-Jun

1319
w3DNA -- web interface to 3DNA / Re: Missing atoms
« on: March 08, 2012, 05:27:37 pm »
Okay, I guess we were both online at the same time, and you accessed the draft version while I was revising my post.

Anyway, have a try of the following fb-pdbv3.pdb, generated with the quickly written Ruby script cvt2pdbviewer.rb:

Code: [Select]
#!/usr/bin/env ruby
# Convert a fiber-generated PDB file to PDB format v3.x naming.

raise "Usage: #{$0} inppdb outpdb" unless ARGV.size == 2

File.open(ARGV.last, "w") do |aFile|
    File.foreach(ARGV.first) do |line|
        if line =~ /^ATOM  /
            # phosphate oxygens and the thymine methyl carbon
            line.sub!(/ O1P/, " OP1")
            line.sub!(/ O2P/, " OP2")
            line.sub!(/ C5M/, " C7 ")
            # residue names: A/C/G/T to DA/DC/DG/DT
            line.sub!(/  A (\w)/, ' DA \1')
            line.sub!(/  C (\w)/, ' DC \1')
            line.sub!(/  G (\w)/, ' DG \1')
            line.sub!(/  T (\w)/, ' DT \1')
            # temperature factor: 0.00 to 1.00 (occupancy stays 1.00)
            line.sub!(/1\.00  0\.00/, "1.00  1.00")
        end
        aFile.puts line
    end
end

In addition to changing the atom names for O1P/O2P/C5M, the script converts A/C/G/T to DA/DC/DG/DT, and sets the temperature factor to 1.00 instead of 0.00, which PdbViewer is not happy with. The complaints about missing atom O2' or O2* are likely due to the fact that PdbViewer takes A/C/G/T as RNA nucleotides, which the above conversion also takes care of. It seems PdbViewer is quite strict in following PDB format v3.x.

Unless I have missed something obvious, I believe fb-pdbv3.pdb should make PdbViewer happy. It helps if you could report back how it goes.

Xiang-Jun

1320
w3DNA -- web interface to 3DNA / Re: Missing atoms
« on: March 08, 2012, 04:49:28 pm »
Hi Xiao-Ping,

Thanks for using 3DNA. Your attached log file shows the atom-naming issue with w3DNA-generated PDB file in Swiss-PdbViewer. I can sort of see where the problem is: currently 3DNA-generated PDB files use O1P/O2P instead of OP1/OP2, and C5M instead of C7 for thymine. I do not know if PdbViewer insists on using DA/DC/DG/DT instead of A/C/G/T.

To verify, could you please test the following PDB files using PdbViewer? What message do you get? The raw PDB file generated directly with fiber is named fb-raw.pdb; the version with O1P/O2P/C5M renamed to OP1/OP2/C7 is fb-new.pdb.

Here is how the files are generated using 3DNA v2.1beta:
fiber -seq=acgt fb-raw.pdb
cp -f fb-raw.pdb fb-new.pdb
ruby -i -pe 'sub(/O1P/, "OP1"); sub(/O2P/, "OP2"); sub(/C5M/, "C7 ")' fb-new.pdb

Xiang-Jun

1321
RNA structures (DSSR) / What can 3DNA do for RNA structures?
« on: March 06, 2012, 08:33:34 pm »
See DSSR -- a new component in 3DNA for Defining the (Secondary) Structures of RNA, and more... note added on Saturday, March 16, 2013.


While 3DNA stands for 3d-NA instead of 3-DNA, my experience tells me that there is noticeable confusion in the structural bioinformatics community as to its capabilities for RNA structures. The favicon and logo of the new 3DNA homepage and forum have been designed to help clarify the issue. To make the message even clearer, I have set up this specific section titled "RNA structures".

This starting post summarizes some of 3DNA's facilities for the analysis and modeling of RNA structures, using concrete examples. Note PDB entry 6tna (NDB id trna04) is for the crystal structure of yeast phenylalanine transfer RNA. All the data files and images are generated directly from the commands given, and the results should be exactly reproducible.

  • Generate regular double-stranded or single-stranded RNA models of arbitrary sequence:
    fiber -r -seq=accugggga dsRNA.pdb
    fiber -r -seq=accugggga -s ssRNA.pdb
    As of v2.1, fiber has three new options: -r for RNA, -seq for specifying sequence (of lower, UPPER or Mixed case) directly on the command-line, and -s for single-stranded structure. Download ssRNA.pdb of sequence accugggga.
  • Calculate all backbone torsion angles:
    analyze -tor=6tna.tor 6tna.pdb   # as of 3DNA v2.1
    find_pair -s 6tna.pdb stdout | analyze stdin
    Note the -s option; as of v2.1, analyze requires an explicit input file name (here stdin). The output is saved in the file 6tna.outs.
  • Detect coaxially stacked helices:
    find_pair 6tna.pdb 6tna.inp
    ex_str -1 hel_regions.pdb h1.pdb
    ex_str -2 hel_regions.pdb h2.pdb
    The two helical regions, saved in files h1.pdb and h2.pdb, correspond to the two arms of the L-shaped tRNA structure (see the 6tna.jpg image below).
  • Find all (non-canonical) base-pairs, and higher-order base associations. Here only one of the three base triplets is shown:
    find_pair -p 6tna.pdb 6tna.mbp
    ex_str -1 multiplets.pdb triplet1.pdb
    r3d_atom -od -r=0.1 -b=0.2 triplet1.pdb stdout | render -jpeg > triplet1.jpg
    Note the -p option. Higher-order base associations (i.e., with 3+ bases) are saved in the file multiplets.pdb, where each multiplet is automatically oriented in its most extended view with respect to the mean plane of the bases. Here render from Raster3D is used to convert the 3DNA-generated .r3d file into a JPEG image.

  • Generate the schematic base-block image in the overall "best" view:
    blocview -i=6tna.jpg 6tna.pdb
    The image is named 6tna.jpg, and is shown below:


In my understanding, 3DNA's RNA functionalities outlined above are distinct from the well-known FR3D program developed by the Leontis-Zirbel team. 3DNA and FR3D are somewhat complementary in purpose and running style. As a specific example, as noted in the 2003 3DNA NAR paper excerpted below, 3DNA has unique features for base-pair classification that are complementary to the 3-edge (Watson-Crick edge, Hoogsteen edge and sugar edge) based Leontis-Westhof scheme.
Quote
Since the six base pair parameters uniquely define the relative position and orientation of two bases, they can be used to reconstruct the base pair. Moreover, the parameters provide a simple mechanism for classification of structures (55) and database searching (X.-J. Lu, Y. Xin and W.K. Olson, unpublished data). Among the six base pair parameters, only Shear, Stretch and Opening are critical in characterizing key hydrogen bonding features, i.e. base pair type: Shear and Stretch define the relative offset of the two base origins in the mean base pair plane and Opening is the angle between the two x-axes with respect to the average normal to the base pair plane (see upper left panel in Fig. 1). For the Hoogsteen A+U base pair shown in Figure 2b, Shear is 0.5 Å , Stretch –3.5 Å and Opening 70°. Buckle, Propeller and Stagger, in contrast, are secondary parameters, which simply describe the imperfections, i.e. non-planarity, of a given base pair.
…………………………………………………………………………………………………………………………………………
Further details of the base pair search algorithm and the composition and geometries of all base pairing interactions observed to date in well-resolved RNA structures will be reported elsewhere (X.-J. Lu, Y. Xin and W.K. Olson, manuscript in preparation).

I am interested in extending 3DNA further into the RNA (structural) world. If you would like to collaborate on some specific project in this broad and important area, please drop me a message. Now that this section is open, I am hoping to see more applications of 3DNA in RNA structures. Questions, opinions, or comments? Please do not hesitate to post them here!

Xiang-Jun

1322
General discussions (Q&As) / Re: haddock compatible pdb
« on: March 05, 2012, 11:43:20 am »
Hi Sumedha,

Glad to hear that adapting x3dna2charmm_pdb has helped solve your problem!

Now I have better news: on second thought following our previous discussions, I reasoned that adding an option for 3-letter nucleotide names to fiber/rebuild-generated PDB files may not be a bad idea -- at least CHARMM and HADDOCK require them. So, I have revised the fiber program per se by adding two more options:
  • -three_letter_nts (can be abbreviated to -three) to generate a PDB file with three-letter names for nucleotides. Note this only applies to the standard nucleotides, ADE/CYT/GUA/THY/URA.
  • -connect (can be abbreviated to -co) to add CONECT records in the generated PDB file. I noticed the HADDOCK-compatible PDB file you attached, AAACCCAAA_fixed.pdb, contains such CONECT records.
Using the sequence AAACCCAAA as in your attached example, I have generated two PDB files as below:
    fiber -a -three -seq=AAACCCAAA AAACCCAAA-three.pdb
    fiber -a -co -three -seq=AAACCCAAA AAACCCAAA-three-connect.pdb
Both PDB files, AAACCCAAA-three.pdb and AAACCCAAA-three-connect.pdb, are attached for your verification. If I understand the issue correctly, both should be compatible with HADDOCK.

As always, please let me know how it goes.

Xiang-Jun

1323
General discussions (Q&As) / Re: haddock compatible pdb
« on: March 02, 2012, 11:31:32 am »
Quote
except that haddock expects the new PDB format for nucleotides which has three letter notations like ADE, GUA, CYT, THY.
Well, if that's indeed the case, HADDOCK is not using the new PDB format as documented in "Remediation of the protein data bank archive". Specifically, DNA residues now should be named DA, DC, DG, and DT, as shown in an example (355d) below:
ATOM     26  C2'  DG A   2      22.447  27.195  19.590  1.00 10.31           C  
ATOM     27  C1'  DG A   2      21.722  26.527  20.744  1.00  8.31           C 
ATOM     28  N9   DG A   2      20.293  26.737  20.884  1.00  6.86           N 
ATOM     29  C8   DG A   2      19.536  27.799  20.464  1.00  7.02           C 
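
For anyone writing a converter, it helps to remember that PDB ATOM records are fixed-column rather than whitespace-delimited. As a rough sketch (the AtomRecord struct and parse_atom name are made up for illustration, and only the fields visible in the 355d lines above are extracted), the first record can be read like this:

```ruby
# Illustrative sketch only; AtomRecord and parse_atom are hypothetical names.
AtomRecord = Struct.new(:serial, :name, :res_name, :chain, :res_seq,
                        :x, :y, :z, :occupancy, :b_factor)

# Slice one ATOM line by its fixed columns (0-based offset, width).
def parse_atom(line)
  AtomRecord.new(
    line[6, 5].to_i,      # serial number, columns 7-11
    line[12, 4].strip,    # atom name, columns 13-16
    line[17, 3].strip,    # residue name, columns 18-20
    line[21, 1],          # chain ID, column 22
    line[22, 4].to_i,     # residue number, columns 23-26
    line[30, 8].to_f,     # x, columns 31-38
    line[38, 8].to_f,     # y, columns 39-46
    line[46, 8].to_f,     # z, columns 47-54
    line[54, 6].to_f,     # occupancy, columns 55-60
    line[60, 6].to_f      # temperature factor, columns 61-66
  )
end

line = "ATOM     26  C2'  DG A   2      22.447  27.195  19.590  1.00 10.31           C"
atom = parse_atom(line)
puts "#{atom.res_name} #{atom.chain}#{atom.res_seq} #{atom.name}"
```

The residue name occupies columns 18-20, so a one-letter A, a two-letter DA and a three-letter ADE all fit the same field; the variants differ only in what is written there.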

3DNA-generated nucleic acid structures (including those from fiber) do not strictly follow the PDB guidelines, which does not seem to have caused any practical problems. As a rule, I do not tailor core 3DNA to any specific third-party tool. For background information, please see my blog post "PDB format, how many variants are there?".

That said, you may easily write up a format-converting script to fit your needs. In 3DNA v2.1beta, the perl_scripts/ directory contains a simple script named x3dna2charmm_pdb that converts DNA residue names from one-letter to three. It is enclosed below for reference, and you are welcome to customize it (especially the two lines in red).

Let us know how it goes.

Xiang-Jun

------------------------------------------------------------------------------------------------
#!/usr/bin/env perl
use strict;
use warnings;

## This is a utility Perl script for converting a 3DNA-generated PDB file
## to the format accepted by CHARMM. Initially written in response to a
## request from a 3DNA user.

## Please note that this script may not be that sophisticated, since I
## know little about the specifications of the CHARMM PDB format.
## Please let me know if you find any bug in it.

die "Usage: $0  3DNA_generated_PDB  converted_PDB\n" unless @ARGV == 2;
my $x3dna_pdb  = $ARGV[0];
my $charmm_pdb = $ARGV[1];

# expand this list as necessary ...
my %one2three = (
                  '  A' => 'ADE',
                  '  C' => 'CYT',
                  '  G' => 'GUA',
                  '  T' => 'THY'
                );

# three-argument open with lexical filehandles, so that unusual file
# names cannot be misread as open modes
open( my $fh, '<', $x3dna_pdb )  || die "Can't open <$x3dna_pdb> for reading: $!\n";
open( my $fo, '>', $charmm_pdb ) || die "Can't open <$charmm_pdb> for writing: $!\n";

while (<$fh>) {
    if (/^ATOM/) {
        chomp;

        # residue name: columns 18-20
        my $residue = substr( $_, 17, 3 );
        substr( $_, 17, 3 ) = $one2three{$residue}
            if ( exists $one2three{$residue} );

        # move the chain ID from column 22 to column 73, and set
        # columns 55-72 (occupancy, B-factor, padding) to fixed values
        my $chainID = substr( $_, 21, 1 );
        substr( $_, 21, 1 )  = ' ';
        substr( $_, 54, 18 ) = '  0.00  0.00      ';
        substr( $_, 72, 1 )  = $chainID;

        print $fo "$_\n";
    } else {
        print $fo $_;
    }
}
close($fh);
close($fo);

=for example

Sample CHARMM PDB file:
ATOM     27  H2' ADE     1      -3.643   6.134   3.867  0.00  0.00      A
ATOM     28  C3' ADE     1      -4.155   7.022   5.806  0.00  0.00      A
ATOM     29  H3' ADE     1      -5.203   7.382   5.738  0.00  0.00      A
ATOM     30  O3' ADE     1      -3.256   8.063   5.448  0.00  0.00      A
ATOM     31  P   THY     2      -3.070   8.432   3.902  0.00  0.00      A
ATOM     32  O1P THY     2      -4.195   7.871   3.120  0.00  0.00      A
ATOM     33  O2P THY     2      -2.853   9.889   3.761  0.00  0.00      A
ATOM     34  O5' THY     2      -1.722   7.646   3.549  0.00  0.00      A

Sample 3DNA generated PDB file:
ATOM     50  O3'   C A   3      -7.224  -1.903   7.585
ATOM     51  C2'   C A   3      -6.408   0.308   8.023
ATOM     52  C1'   C A   3      -5.322   0.045   6.986
ATOM     53  N1    C A   3      -4.202   0.995   7.051

Transformed file:
ATOM     50  O3' CYT     3      -7.224  -1.903   7.585  0.00  0.00      A
ATOM     51  C2' CYT     3      -6.408   0.308   8.023  0.00  0.00      A
ATOM     52  C1' CYT     3      -5.322   0.045   6.986  0.00  0.00      A
ATOM     53  N1  CYT     3      -4.202   0.995   7.051  0.00  0.00      A

=cut
------------------------------------------------------------------------------------------------

1324
Thanks, that's helpful -- not only to your own understanding, but also to those who want to know the details.

Xiang-Jun

1325
For reference, here is a note about .par file from SNAP.
>> Following our Monday meeting, I have updated the SNAP program to do what we 
>> have discussed. Specifically,
>>
>> [1] I have added a command line option -frame=NUMBER. By default, it now
>>    uses the CA-CB-N based reference frame for AA as used by Pabo and
>>    Siggers/Honig. The ls-fitting scheme applies equally well to GLYCINE
>>    since FOUR atoms (N/CA/C/CB) are used by default. For glycine, where
>>    CB is missing, the other three are used.
>>
>>    With -frame=1, the previous peptide based reference frame is used.
>>
>> [2] When deciding the contacts, only heavy atoms (i.e., non-Hs) are
>>    considered.
>>
>> [3] SNAP now outputs 4 types of AA-bp interactions, controlled by the new
>>    command line option -type=NUMBER, as follows:
>>        0: with any atom -- EITHER base OR backbone atom
>>        1: with base atom (could also contact backbone, default)
>>        2: with backbone atom (could also contact base)
>>        3: must contact BOTH base AND backbone atom
>>    The output files are now named like
>>        AT-ALA_1.par, AT-ALA_1.pdb etc for type=1, i.e., contacting base
>>        for -type=2, it would be AT-ALA_2.par etc, and so on.
>>    Note the interaction type info is now in the file name, and is NOT
>>        included in the .par file (see below).
>>
>> [4] The format of the .par file is now as follows:
>>
>> C4.A-D19.T:A55.ALA 1a73.pdb +     8.6310  -99.8367
>> # identifier, PDB file name, '+' or '-' as defined by dot(dz1, dz2)
>> # followed by "translational distance" and "rotational distance", as in
>> # Pabo, except that "translational distance" is NEGATIVE if it is below
>> # the base-pair mean plane. "rotational distance" is within [-180 to +180]
>>   -3.6380    7.5388    2.1037    0.2592   38.5504  -93.9814
>> # six rigid body parameters: tx, ty, tz, rx, ry, rz
>>
>>    3.6415    7.1395    3.2034  # AA frame origin
>>   -0.1708   -0.8874   -0.4282  # x-axis
>>    0.8903    0.0471   -0.4529  # y-axis
>>    0.4220   -0.4586    0.7821  # z-axis
>> # The above 4 lines define the AA reference frame w.r.t. the bp frame, and
>> # can be *rigorously* deduced from the 6 rigid parameters. They are
>> # redundant, and can be safely ignored. They are included here for info
>> # purpose only.
>>
>>    3.6277    7.1391    3.1907  # CA atom coordinates
>>
>> [5] In processing all the PDB files in batch mode, you need to first run
>>    "snap -c" to initialize all the files. Then each run will "append" to
>>    the corresponding files to get the compilation of the whole set.
>>
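
The record layout in item [4] is simple enough to read with a few lines of Ruby. The sketch below is only an illustration under stated assumptions: it handles a single record with the seven data lines in exactly the order shown, treats anything after '#' as a comment, and uses made-up names (ParRecord, parse_par_record, dot) that are not part of SNAP. A file compiled in batch mode would first need to be split into individual records.

```ruby
# Illustrative sketch only; ParRecord, parse_par_record and dot are hypothetical.
ParRecord = Struct.new(:id, :pdb_file, :sign, :trans_dist, :rot_dist,
                       :rigid, :origin, :axes, :ca)

# Parse one .par record: header line, 6 rigid-body parameters,
# frame origin, three axes, and the CA atom coordinates.
def parse_par_record(text)
  rows = text.lines.map { |l| l.sub(/#.*/, "").strip }.reject(&:empty?)
  id, pdb_file, sign, trans, rot = rows[0].split
  nums = rows.drop(1).map { |row| row.split.map(&:to_f) }
  ParRecord.new(id, pdb_file, sign, trans.to_f, rot.to_f,
                nums[0], nums[1], nums[2, 3], nums[5])
end

# Dot product of two 3-vectors, used below as a sanity check on the frame.
def dot(a, b)
  a.zip(b).sum { |p, q| p * q }
end

sample = <<~PAR
  C4.A-D19.T:A55.ALA 1a73.pdb +     8.6310  -99.8367
    -3.6380    7.5388    2.1037    0.2592   38.5504  -93.9814
     3.6415    7.1395    3.2034  # AA frame origin
    -0.1708   -0.8874   -0.4282  # x-axis
     0.8903    0.0471   -0.4529  # y-axis
     0.4220   -0.4586    0.7821  # z-axis
     3.6277    7.1391    3.1907  # CA atom coordinates
PAR

rec = parse_par_record(sample)
x_axis, y_axis, _z_axis = rec.axes
puts "#{rec.id}: tx=#{rec.rigid[0]}, x.y=#{dot(x_axis, y_axis).round(4)}"
```

Checking that the three axes are mutually orthogonal is a cheap way to confirm that the lines were read in the expected order.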


Funded by the NIH R24GM153869 grant on X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University