Author Topic: Ruby scripts for the analysis of MD simulation trajectories (Read 36976 times)

xiangjun · « **on:** January 19, 2011, 12:11:44 am »

Hi MD practitioners,

Here is the updated release (v0.7) of two Ruby scripts that aim to streamline the analysis of MD simulation trajectories with 3DNA. There is also a blog post with more background information, but here are the most relevant points:

Where to download: http://3dna.rutgers.edu:8080/data/x3dna_md_v0.7.tar.gz
How to install (see README file for more information):
```
tar zxvf x3dna_md_v0.7.tar.gz
```
and you will get a directory named x3dna_md_v0.7/. In it you will find two Ruby scripts, x3dna_md.rb and extract_par.rb, along with associated data files for testing and verification purposes.

How to run x3dna_md.rb: this script needs to be run first. Detailed help message (with -h) is shown below:

----------------------------------------------------------------------
Usage:
        x3dna_md.rb options
Examples:
        x3dna_md.rb -b bpfile.dat -e sample_md0.pdb
             # 21 models (0-20); output (default): 'x3dna_md.out'
             # also generate 'model_list.dat', see below
        x3dna_md.rb -b bpfile.dat -m model_list.dat -o x3dna_md2.out
             # diff x3dna_md.out x3dna_md2.out

        x3dna_md.rb -b bpfile.dat -p 'pdbdir/model_*.pdb' -o x3dna_md3.out
             # note the quote for -p option; 20 models (1-20)
             # also generate 'pdb_list.dat', see below
        x3dna_md.rb -b bpfile.dat -l pdb_list.dat -o x3dna_md4.out
             # diff x3dna_md3.out x3dna_md4.out
             # note the order of PDB files: 1, 10..19, 2, 20, 3..9
Options:
----------------------------------------------------------------------
    --bpfile, -b <s>:   File containing base-pairing info (as generated
                        from find_pair, and EDITED as appropriate)
                        
   --outfile, -o <s>:   Output file name (default: x3dna_md.out)
  --ensemble, -e <s>:   Model ensemble in MODEL/ENDMDL pairs
    --models, -m <s>:   Explicit list of model numbers
   --pattern, -p <s>:   Pattern of PDB files to process (e.g., *.pdb)
      --list, -l <s>:   Explicit list of individual PDB files
       --version, -v:   Print version and exit
          --help, -h:   Show this message
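
The file order mentioned in the help message (1, 10..19, 2, 20, 3..9) is what plain lexicographic sorting of names such as model_1.pdb ... model_20.pdb produces. If numeric order is preferred, a natural sort along the following lines could be used to prepare the file list. This is only a sketch: `natural_key` and `natural_sort` are hypothetical helper names, not part of the distributed scripts.

```ruby
# Build a sort key in which digit runs compare numerically and
# text runs compare as strings (hypothetical helper, not from x3dna_md.rb).
def natural_key(name)
  name.scan(/\d+|\D+/).map do |part|
    part =~ /\A\d+\z/ ? [0, part.to_i] : [1, part]
  end
end

# Sort file names so that model_2.pdb comes before model_10.pdb.
def natural_sort(names)
  names.sort_by { |n| natural_key(n) }
end

files = (1..20).map { |i| "pdbdir/model_#{i}.pdb" }
lexicographic = files.sort          # the order described in the help message
numeric = natural_sort(files)       # 1, 2, 3, ..., 20

puts lexicographic.first(3).inspect
puts numeric.first(3).inspect
```

The sorted names could then be written to a file and passed with -l, assuming the list file holds one PDB file name per line; check the generated 'pdb_list.dat' to confirm the expected layout.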

Note specifically that an input file with base-pairing information (-b) must be provided. It can be easily generated using find_pair and then manually edited as necessary. Needless to say, the base-pair file specified with -b must match the pairing configuration in each model of the ensemble. The input structures can be supplied with any one of four options (-e, -m, -p, -l), allowing for great flexibility. Importantly, for the -e and -m options, each model in the ensemble must be delimited by a MODEL/ENDMDL pair, as documented in the Coordinate Section of the PDB format.
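
Before running with -e or -m, it may help to confirm that the ensemble file really is delimited by MODEL/ENDMDL pairs. Here is a minimal sketch of such a check; `check_model_pairs` is a hypothetical name, the sample coordinates are made up, and the scripts' own validation (added in v0.4) may work differently.

```ruby
# Scan PDB-format text and verify that every MODEL record is closed by an
# ENDMDL record before the next MODEL begins.
# Returns [balanced?, number_of_complete_models].
def check_model_pairs(pdb_text)
  inside = false
  count = 0
  pdb_text.each_line do |line|
    record = line[0, 6].to_s.strip   # record name occupies columns 1-6
    case record
    when 'MODEL'
      return [false, count] if inside      # nested MODEL without ENDMDL
      inside = true
    when 'ENDMDL'
      return [false, count] unless inside  # ENDMDL without MODEL
      inside = false
      count += 1
    end
  end
  [!inside && count > 0, count]
end

# Made-up two-model sample, for illustration only.
sample = [
  'MODEL        1',
  'ATOM      1  P     A A   1       0.000   0.000   0.000',
  'ENDMDL',
  'MODEL        2',
  'ATOM      1  P     A A   1       1.000   0.000   0.000',
  'ENDMDL'
].join("\n") + "\n"

ok, n = check_model_pairs(sample)
puts "balanced: #{ok}, models: #{n}"
```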

The output file contains a comprehensive set of 3DNA-calculated parameters, each enclosed in an XML-style tag pair; e.g., <propeller>...</propeller>. Each parameter is arranged in a tab-delimited m-by-n matrix, where m is the number of models and n is the number of base pairs or steps. The default file name is x3dna_md.out, and an example is attached.
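
For those who want to read the output file directly rather than through extract_par.rb, the tagged layout described above could be parsed roughly as follows. This is a sketch under stated assumptions: the opening and closing tags are assumed to sit on their own lines, `parse_block` is a hypothetical name, and the sample text is made up. Any label columns in the real file would need to be handled separately, so compare against the attached example first.

```ruby
# Collect the tab-delimited rows found between <tag> and </tag>.
# Assumes each tag is on a line of its own (an assumption, not verified
# against the actual x3dna_md.out layout).
def parse_block(text, tag)
  rows = []
  inside = false
  text.each_line do |line|
    stripped = line.strip
    if stripped == "<#{tag}>"
      inside = true
    elsif stripped == "</#{tag}>"
      break
    elsif inside && !stripped.empty?
      rows << line.chomp.split("\t")
    end
  end
  rows
end

# Made-up sample: 2 models by 3 base pairs for propeller, 1 by 3 for roll.
sample = "<propeller>\n-10.5\t-12.1\t-8.3\n-11.0\t-13.4\t-7.9\n</propeller>\n" \
         "<roll>\n1.2\t3.4\t5.6\n</roll>\n"

rows = parse_block(sample, 'propeller')
matrix = rows.map { |r| r.map(&:to_f) }
puts "#{matrix.size} models x #{matrix.first.size} columns"
```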

How to run extract_par.rb: this script needs to be run after x3dna_md.rb. Detailed help message (with -h) is shown below:

----------------------------------------------------------------------
Usage:
        extract_par.rb options
Examples:
        extract_par.rb -l
             # to see a list of all parameters
        extract_par.rb -p prop
             # -p 36 also fine (see above); from file 'x3dna_md.out'
             # for propeller, no need to specify full: -p pr suffices
        extract_par.rb -p slide -s , -f x3dna_md3.out
             # comma separated, from file 'x3dna_md3.out', to screen
        extract_par.rb -p roll -s ' ' -n > roll.dat
             # space separated, no row-label, to file 'roll.dat'
        extract_par.rb -a
             # extract all parameters, each in a separate file
             # prefixed with 'x3dna_md_': e.g., 'x3dna_md_chi1.out'
             # run 'extract_par.rb -c' to clean up all generated files
        extract_par.rb -e 1 -p chi1
             # extract the chi torsion angle of strand I, but exclude
             # those from the two terminal base pairs. For comparison,
             # run also: extract_par.rb -p chi1
Options:
----------------------------------------------------------------------
           --no-1col, -n:   Delete the first annotation column
     --separator, -s <s>:   Separator for fields [tab] (default: 	)
              --list, -l:   List all parameters
               --all, -a:   Extract all parameters into separate files
             --clean, -c:   Clean up parameter files by the above -a option
      --par-name, -p <s>:   Name of parameter
      --fromfile, -f <s>:   File name with parameters (default:
                            x3dna_md.out)
  --end-effects, -e :   No. of end pairs to ignore (default: 0, 0)
           --version, -v:   Print version and exit
              --help, -h:   Show this message
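
To illustrate what ignoring end pairs means for a row of values, here is a small sketch. It shows the idea only and is not the code used in extract_par.rb: `trim_ends` is a hypothetical helper, and its two counts (leading and trailing) are an assumption loosely based on the "0, 0" default shown above.

```ruby
# Drop `head` values from the start and `tail` values from the end of a row.
# With a single count, the same number is removed from both ends.
def trim_ends(values, head, tail = head)
  return [] if head + tail >= values.size
  values[head...(values.size - tail)]
end

row = [1.0, 2.0, 3.0, 4.0, 5.0]   # made-up values for five base pairs
puts trim_ends(row, 1).inspect     # both terminal values removed
puts trim_ends(row, 0).inspect     # nothing removed
```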

Three sample output files are attached below for reference: propeller.tsv contains propeller of 21 models of a 12-mer in the default tab-delimited format; slide.csv contains slide in comma-separated format; and roll.dat contains roll in space-separated format, without the leading label column. The output parameter table is intended to be fed into R/Matlab/Octave/Excel etc. for statistical analysis or visualization.
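
If you prefer to stay in Ruby for a quick look at the numbers, a table without the label column (as in roll.dat) could be summarized per column roughly like this. It is a sketch only: `read_table` and `column_stats` are hypothetical names, the sample values are made up, and the standard deviation is the sample (n-1) form, so at least two models are needed.

```ruby
# Parse a separator-delimited numeric table (one model per row).
def read_table(text, separator = ' ')
  text.each_line.reject { |l| l.strip.empty? }.map do |l|
    l.chomp.split(separator).map { |field| Float(field) }
  end
end

# Mean and sample standard deviation for each column (base pair or step).
def column_stats(rows)
  rows.transpose.map do |col|
    mean = col.inject(0.0) { |acc, v| acc + v } / col.size
    var = col.inject(0.0) { |acc, v| acc + (v - mean)**2 } / (col.size - 1)
    { :mean => mean, :sd => Math.sqrt(var) }
  end
end

# Made-up sample: 3 models by 2 steps, space separated.
sample = "1.0 2.0\n3.0 4.0\n5.0 6.0\n"
stats = column_stats(read_table(sample))
stats.each_with_index do |s, i|
  puts "column #{i + 1}: mean=#{s[:mean]} sd=#{s[:sd]}"
end
```

For a comma-separated file such as slide.csv, pass ',' as the separator; a file that still has its label column would need that column dropped first (or extracted with -n).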

Acknowledgments: thanks to Aneesh for the final "push"; Alpay for sharing his Python script, and providing an example data set on which the Ruby scripts were tested.

The Ruby scripts take advantage of William Morgan's Trollop (v1.16.2) (http://trollop.rubyforge.org/) for command-line option parsing. To make the scripts self-contained, the single file trollop.rb is included with the distribution.

The scripts were tested with Ruby 1.9.2p0 on Ubuntu Linux (10.04), and 1.8.7 on Mac OS X Snow Leopard.

Enjoy, and do not forget to report back any problems you experience!

Version history

2011-01-18: v0.1, first release.
2011-02-12: v0.2, fixed a bug with `each': no block given -- thanks to shahabshariati!
2011-03-05: v0.3, removed the model_ prefix at the first column of extracted parameter file; added the -e option to delete parameters associated with terminal base-pairs -- thanks to Alpay's suggestions.
2011-03-16: v0.4, significant refinement of the scripts (in line with defensive programming) to check for various possible erroneous inputs (e.g., mismatched base-pair file, ensemble not delimited by MODEL/ENDMDL pairs, etc.); added -d option to make error messages more obvious; added a comprehensive README file.
2011-04-02: v0.5, added return value checking of system() calls, plus other refinements.
2011-05-29: v0.6, refined system-call and pair checking with more informative message.
2011-09-30: v0.7, added H-bond and overlap area parameters, and the -c option in extract_par.rb.

Xiang-Jun
