rk.preprocess
ROCKET Preprocessing Command
rk.preprocess performs the preprocessing of predicted protein structures for ROCKET. It runs OpenFold inference, processes structures using Phenix, and performs Molecular Replacement or Cryo-EM Docking.
TL;DR.
A typical preprocessing command for X-ray datasets (for our human serpin case):
rk.preprocess \
--file_id 1lj5 \
--method xray \
--output_dir ./1lj5_processed \
--max_recycling_iters 20 \
--use_deepspeed_evoformer_attention
Run it from a directory that contains the preexisting data files, organized as:
.
├── 1lj5_data
│   └── 1lj5-tng_withrfree.mtz
├── 1lj5_fasta
│   └── 1lj5.fasta
└── alignments
    └── 1lj5
        ├── bfd_uniclust_hits.a3m
        ├── mgnify_hits.a3m
        ├── pdb70_hits.hhr
        └── uniref90_hits.a3m
Note: If the MTZ file provided contains more than one set of relevant columns (e.g. intensities and errors + structure factor amplitudes and errors), Phaser will pick the best set for you. It is likely better to work from the intensities if available! If you absolutely want to use a specific set of columns, provide an MTZ that contains only that data.
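The layout above can be checked before launching a long preprocessing run. The following is a minimal sketch, not part of ROCKET: `check_rocket_inputs` is a hypothetical helper, and it assumes the 1lj5 example generalizes to `<file_id>_data/`, `<file_id>_fasta/<file_id>.fasta`, and `alignments/<file_id>/`. Because the MTZ file name is not fixed by the example, any `.mtz` file in the data directory is accepted.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of ROCKET: verifies that the input layout
# shown above exists for a given file_id.
# Assumption: the 1lj5 example generalizes to <file_id>_data,
# <file_id>_fasta/<file_id>.fasta and alignments/<file_id>.
check_rocket_inputs() {
    local file_id="$1"
    local root="${2:-.}"   # directory to check, defaults to the current one
    local missing=0

    # The FASTA file and the alignment directory follow the file_id.
    if [ ! -f "$root/${file_id}_fasta/${file_id}.fasta" ]; then
        echo "missing: ${file_id}_fasta/${file_id}.fasta" >&2
        missing=1
    fi
    if [ ! -d "$root/alignments/${file_id}" ]; then
        echo "missing: alignments/${file_id}/" >&2
        missing=1
    fi

    # The MTZ name is not fixed, so accept any .mtz file in the data directory.
    if ! ls "$root/${file_id}_data"/*.mtz >/dev/null 2>&1; then
        echo "missing: an .mtz file in ${file_id}_data/" >&2
        missing=1
    fi

    # 0 when everything was found, 1 otherwise.
    return "$missing"
}
```

One way to use it is to gate the command on the check, for example `check_rocket_inputs 1lj5 && rk.preprocess --file_id 1lj5 --method xray --output_dir ./1lj5_processed`. The check only looks for the files named in the example layout and does not inspect their contents.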
A typical preprocessing command for Cryo-EM map datasets (for our groEL case):
It requires preexisting data files organized as:
Input Parameters
--file_id
Identifier for input files.
--resolution
Best resolution of the cryo-EM map. Not used for the X-ray case.
--method
Choose "xray" (calls Phaser) or "cryoem" (calls EMPlacement).
--output_dir
Directory to store results (default: "preprocessing_output").
--precomputed_alignment_dir
Path to OpenFold precomputed alignments (default: "alignments/").
--max_recycling_iters
Number of recycling iterations for the initial predictions (default: 4).
--use_deepspeed_evoformer_attention
Flag that enables the DeepSpeed evoformer attention layer. Requires deepspeed to be installed in the environment.
--jax_params_path
Path to the JAX parameter file ("params_model_1_ptm.npz"). Default: None, in which case the system environment variable $OPENFOLD_RESOURCES is used.
The script expects input files organized as follows:
Additional Parameters for Cryo-EM (--method cryoem)
--map1
Path to Half-map 1.
--map2
Path to Half-map 2.
--full_composition
FASTA file containing the sequences of everything expected to be in the reconstruction, whether there is a model for it or not.
Optional Arguments
--predocked_model
Path to an already docked model (default: None).
--fixed_model
Optional fixed model contribution (default: None).
Outputs
After execution, results will be structured in the --output_dir directory: