
Author Topic: Ruby scripts for the analysis of MD simulation trajectories  (Read 21957 times)

Offline xiangjun

Ruby scripts for the analysis of MD simulation trajectories
« on: January 19, 2011, 12:11:44 am »
Hi MD practitioners,

Here is the updated release v0.7 of two Ruby scripts that aim to streamline the analysis of MD simulation trajectories with 3DNA. There is also a blog post with more background information; the most relevant points are summarized here:
  • Where to download: http://3dna.rutgers.edu:8080/data/x3dna_md_v0.7.tar.gz
  • How to install (see README file for more information):
    tar zxvf x3dna_md_v0.7.tar.gz
    This creates a directory named x3dna_md_v0.7/, which contains two Ruby scripts, x3dna_md.rb and extract_par.rb, along with associated data files for testing and verification purposes.
  • How to run x3dna_md.rb: this script needs to be run first. Detailed help message (with -h) is shown below:
    ----------------------------------------------------------------------
    Usage:
            x3dna_md.rb options
    Examples:
            x3dna_md.rb -b bpfile.dat -e sample_md0.pdb
                 # 21 models (0-20); output (default): 'x3dna_md.out'
                 # also generate 'model_list.dat', see below
            x3dna_md.rb -b bpfile.dat -m model_list.dat -o x3dna_md2.out
                 # diff x3dna_md.out x3dna_md2.out
    
            x3dna_md.rb -b bpfile.dat -p 'pdbdir/model_*.pdb' -o x3dna_md3.out
                 # note the quote for -p option; 20 models (1-20)
                 # also generate 'pdb_list.dat', see below
            x3dna_md.rb -b bpfile.dat -l pdb_list.dat -o x3dna_md4.out
                 # diff x3dna_md3.out x3dna_md4.out
                 # note the order of PDB files: 1, 10..19, 2, 20, 3..9
    Options:
    ----------------------------------------------------------------------
        --bpfile, -b <s>:   File containing base-pairing info (as generated
                            from find_pair, and EDITED as appropriate)
                            
       --outfile, -o <s>:   Output file name (default: x3dna_md.out)
      --ensemble, -e <s>:   Model ensemble in MODEL/ENDMDL pairs
        --models, -m <s>:   Explicit list of model numbers
       --pattern, -p <s>:   Pattern of PDB files to process (e.g., *.pdb)
          --list, -l <s>:   Explicit list of individual PDB files
           --version, -v:   Print version and exit
              --help, -h:   Show this message
    
    Note specifically that an input file with base-pairing information (-b) must be provided; it can be generated using find_pair and then manually edited as necessary. The base-pair file specified with -b must match the pairing configuration in each model of the ensemble. The input structures can be supplied with any one of four options (-e, -m, -p, -l), allowing for great flexibility. Importantly, for the -e and -m options, each model in the ensemble must be delimited by a MODEL/ENDMDL pair, as documented in the Coordinate Section of the PDB format.

    The output file contains a comprehensive set of 3DNA calculated parameters, each enclosed in an xml-style tag pair; e.g., <propeller>...</propeller>. Each parameter is arranged in a tab-delimited m-by-n matrix, where m is the number of models, and n is the number of base-pairs or steps. The default file name is x3dna_md.out and an example is attached.
  • How to run extract_par.rb: this script needs to be run after x3dna_md.rb. Detailed help message (with -h) is shown below:
    ----------------------------------------------------------------------
    Usage:
            extract_par.rb options
    Examples:
            extract_par.rb -l
                 # to see a list of all parameters
            extract_par.rb -p prop
                 # -p 36 also fine (see above); from file 'x3dna_md.out'
                 # for propeller, no need to specify full: -p pr suffices
            extract_par.rb -p slide -s , -f x3dna_md3.out
                 # comma separated, from file 'x3dna_md3.out', to screen
            extract_par.rb -p roll -s ' ' -n > roll.dat
                 # space separated, no row-label, to file 'roll.dat'
            extract_par.rb -a
                 # extract all parameters, each in a separate file
                 # prefixed with 'x3dna_md_': e.g., 'x3dna_md_chi1.out'
                 # run 'extract_par.rb -c' to clean up all generated files
            extract_par.rb -e 1 -p chi1
                 # extract the chi torsion angle of strand I, but exclude
                 # those from the two terminal base pairs. For comparison,
                 # run also: extract_par.rb -p chi1
    Options:
    ----------------------------------------------------------------------
               --no-1col, -n:   Delete the first annotation column
         --separator, -s <s>:   Separator for fields [tab] (default: 	)
                  --list, -l:   List all parameters
                   --all, -a:   Extract all parameters into separate files
                 --clean, -c:   Clean up parameter files by the above -a option
          --par-name, -p <s>:   Name of parameter
          --fromfile, -f <s>:   File name with parameters (default:
                                x3dna_md.out)
      --end-effects, -e :   No. of end pairs to ignore (default: 0, 0)
               --version, -v:   Print version and exit
                  --help, -h:   Show this message
    
    Three sample output files are attached below for reference: propeller.tsv contains propeller of 21 models of a 12-mer in the default tab-delimited format; slide.csv contains slide in comma-separated format; and roll.dat contains roll in space-separated format, without the leading label column. The output parameter table is intended to be fed into R/Matlab/Octave/Excel etc. for statistical analysis or visualization.
  • Acknowledgments: thanks to Aneesh for the final "push", and to Alpay for sharing his Python script and providing an example data set on which the Ruby scripts were tested.

    The Ruby scripts take advantage of William Morgan's Trollop (v1.16.2) (http://trollop.rubyforge.org/) for command line option parsing. To make the scripts self-contained, the single file trollop.rb is included with the distribution.

    The scripts were tested with Ruby 1.9.2p0 on Ubuntu Linux (10.04), and 1.8.7 on Mac OS X Snow Leopard.
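
For readers who would rather post-process x3dna_md.out directly in Ruby, here is a minimal sketch of reading one tagged parameter block into a matrix and averaging each column over the models. It is not part of the distribution: the helper names read_parameter and column_means are hypothetical, and the layout is an assumption based on the description above (an opening and closing tag around tab-delimited rows, one row per model, with a leading label column that can be switched off). Check it against your own output file before relying on it.

```ruby
# Hypothetical helpers, NOT part of the x3dna_md distribution.
# Assumed layout of each block in the output file:
#   <propeller>
#   label<TAB>value<TAB>value ...
#   </propeller>

# Return the rows of the <name>...</name> block as arrays of Floats.
# has_label: drop the first field of each row (assumed to be a model label).
def read_parameter(text, name, has_label = true)
  tag = Regexp.escape(name)
  block = text[%r{<#{tag}>(.*?)</#{tag}>}m, 1]
  raise ArgumentError, "no <#{name}> block found" if block.nil?

  rows = block.split("\n").reject { |line| line.strip.empty? }
  rows.map do |line|
    fields = line.split("\t")
    fields.shift if has_label
    fields.map { |f| Float(f) } # raises on non-numeric entries
  end
end

# Mean of each column, i.e. the average over all models for every
# base pair or step. Assumes all rows have the same length.
def column_means(matrix)
  return [] if matrix.empty?
  matrix.transpose.map { |col| col.inject(0.0) { |sum, v| sum + v } / col.size }
end

# Made-up two-model, two-base-pair example (values are illustrative only).
sample = "<propeller>\n0\t-10.0\t-12.0\n1\t-14.0\t-8.0\n</propeller>\n"
propeller = read_parameter(sample, "propeller")
puts column_means(propeller).inspect # prints [-12.0, -10.0]
```

For a real run, replace the sample string with File.read('x3dna_md.out'). Entries that are not plain numbers make Float() raise, which is deliberate here so that an unexpected layout is noticed rather than silently skipped.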

Enjoy, and do not forget to report back any problems you experience!

Version history

  • 2011-01-18: v0.1, first release.
  • 2011-02-12: v0.2, fixed a bug with `each': no block given -- thanks to shahabshariati!
  • 2011-03-05: v0.3, removed the model_ prefix at the first column of extracted parameter file; added the -e option to delete parameters associated with terminal base-pairs -- thanks to Alpay's suggestions.
  • 2011-03-16: v0.4, significant refinement of the scripts (in line with defensive programming) to check for various possible erroneous inputs (e.g., mismatched base-pair file, ensemble not delimited by MODEL/ENDMDL pairs etc.); added -d option to make error messages more obvious; added a comprehensive README file.
  • 2011-04-02: v0.5, added return value checking of system() calls, plus other refinements.
  • 2011-05-29: v0.6, refined system-call and pair checking with more informative messages.
  • 2011-09-30: v0.7, added H-bond and overlap-area parameters, and the -c option in extract_par.rb.

Xiang-Jun
« Last Edit: January 24, 2012, 11:28:40 am by xiangjun »

 

Funded by X3DNA-DSSR, an NIGMS National Resource for Structural Bioinformatics of Nucleic Acids (R24GM153869)

Created and maintained by Dr. Xiang-Jun Lu, Department of Biological Sciences, Columbia University