rk.preprocess
ROCKET Preprocessing Command
rk.preprocess preprocesses predicted protein structures for ROCKET. It runs OpenFold inference, processes the predicted structures with Phenix, and performs molecular replacement (X-ray) or docking (cryo-EM).
TL;DR.
A typical preprocessing command for X-ray datasets (our human serpin case):
rk.preprocess \
--file_id 1lj5 \
--method xray \
--output_dir ./1lj5_processed \
--xray_data_label FP,SIGFP \
--max_recycling_iters 20 \
--use_deepspeed_evoformer_attention
Run it from a working directory that already contains the data files, organized as:
.
├── 1lj5_data
│   └── 1lj5-tng_withrfree.mtz
├── 1lj5_fasta
│   └── 1lj5.fasta
└── alignments
    └── 1lj5
        ├── bfd_uniclust_hits.a3m
        ├── mgnify_hits.a3m
        ├── pdb70_hits.hhr
        └── uniref90_hits.a3m
A typical preprocessing command for Cryo-EM map datasets (for our groEL case):
rk.preprocess \
--file_id 8p4pA \
--resolution 9.6 \
--method cryoem \
--output_dir 8p4pA_processed \
--predocked_model 8p4pA_data/8p4pA_docked.pdb \
--fixed_model 8p4pA_data/8p4pA-fixed-model-forchainA.pdb \
--map1 8p4pA_data/emd_17425_half_map_1.map \
--map2 8p4pA_data/emd_17425_half_map_2.map \
--max_recycling_iters 20 \
--use_deepspeed_evoformer_attention
It requires preexisting data files organized as:
.
├── 8p4pA_fasta
│   └── 8p4pA.fasta
├── alignments
│   └── 8p4pA
│       ├── bfd_uniclust_hits.a3m
│       ├── pdb70_hits.hhr
│       └── uniref90_hits.a3m
└── 8p4pA_data
    ├── emd_17425_half_map_1.map
    ├── emd_17425_half_map_2.map
    ├── 8p4pA-fixed-model-forchainA.pdb
    └── 8p4pA_docked.pdb
Input Parameters
--file_id
Identifier for input files.
--resolution
Best resolution of the cryo-EM map. Not used for X-ray data.
--method
Choose "xray" (calls Phaser) or "cryoem" (calls EMPlacement).
--output_dir
Directory to store results (default: "preprocessing_output").
--precomputed_alignment_dir
Path to OpenFold precomputed alignments (default: "alignments/").
--max_recycling_iters
Number of recycling iterations for the initial predictions (default: 4).
--use_deepspeed_evoformer_attention
Flag. Enables the DeepSpeed evoformer attention layer. Requires deepspeed to be installed in the environment.
--jax_params_path
Path to the JAX parameter file ("params_model_1_ptm.npz"). Default: None, in which case the system environment variable $OPENFOLD_RESOURCES is used.
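Before launching a run, it can help to confirm which parameter source will apply. The sketch below relies only on the rule stated for --jax_params_path: an explicit path if given, otherwise $OPENFOLD_RESOURCES. The function name check_param_source is hypothetical and not part of ROCKET. The layout inside $OPENFOLD_RESOURCES is not described here, so it is not inspected.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ROCKET): report where the JAX parameters
# would come from, based only on the --jax_params_path rule described above.
# Usage: check_param_source [path_you_plan_to_pass_as_--jax_params_path]
check_param_source() {
    local jax_params_path="${1:-}"
    if [ -n "$jax_params_path" ]; then
        # An explicit path was chosen: it must point to an existing file.
        if [ -f "$jax_params_path" ]; then
            echo "explicit: $jax_params_path"
        else
            echo "error: parameter file not found: $jax_params_path" >&2
            return 1
        fi
    elif [ -n "${OPENFOLD_RESOURCES:-}" ]; then
        # No explicit path: the environment variable is the fallback.
        echo "env: $OPENFOLD_RESOURCES"
    else
        echo "error: neither --jax_params_path nor \$OPENFOLD_RESOURCES is set" >&2
        return 1
    fi
}
```

Calling check_param_source with no argument reports the fallback, and a non-zero return status signals that neither source is available.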
The script expects input files organized as follows:
<working_directory>/
├── {file_id}_fasta/
│   └── {file_id}.fasta       # FASTA file containing the chain to refine
│                             # Header should be "> {file_id}"
│
├── {file_id}_data/
│   ├── *.mtz                 # For X-ray data
│   ├── *_half_map*.mrc       # For Cryo-EM data
│   └── <optional files>      # e.g., predicted or docked models
│
└── alignments/               # (default: --precomputed_alignment_dir)
    └── {file_id}
        └── *.a3m / *.hhr
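A quick pre-flight check of this layout can catch a misplaced file before OpenFold inference starts. The following is a minimal sketch, assuming it is run from the working directory. check_inputs is a hypothetical helper, not a ROCKET command. It treats the space after ">" in the FASTA header as optional, which is an assumption about how strictly the header is parsed.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of ROCKET) for the input layout.
# Usage: check_inputs <file_id> [alignment_dir]
check_inputs() {
    local file_id="$1"
    local align_dir="${2:-alignments}"   # mirrors the --precomputed_alignment_dir default
    local fasta="${file_id}_fasta/${file_id}.fasta"
    local status=0

    if [ ! -f "$fasta" ]; then
        echo "missing: $fasta" >&2
        status=1
    else
        # Extract the header text after ">" (empty if the first line is not a header).
        local header
        header=$(head -n 1 "$fasta" | sed -n -e 's/^>[[:space:]]*//p' | sed -e 's/[[:space:]]*$//')
        if [ "$header" != "$file_id" ]; then
            echo "unexpected FASTA header: '$header' (expected '$file_id')" >&2
            status=1
        fi
    fi

    # The data and alignment directories must exist; their contents depend on the method.
    [ -d "${file_id}_data" ] || { echo "missing: ${file_id}_data/" >&2; status=1; }
    [ -d "$align_dir/$file_id" ] || { echo "missing: $align_dir/$file_id/" >&2; status=1; }

    return "$status"
}
```

For example, `check_inputs 1lj5 && echo "layout looks complete"` prints the message only when the FASTA file, data directory, and alignment directory are all in place.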
Additional Parameters for X-ray (--method xray)
--xray_data_label
Reflection data labels (e.g., "FP,SIGFP").
Additional Parameters for Cryo-EM (--method cryoem)
--map1
Path to Half-map 1.
--map2
Path to Half-map 2.
--full_composition
FASTA file containing the sequences of everything expected to be in the reconstruction, whether or not a model exists for it.
Optional Arguments
--predocked_model
Path to an already docked model (default: None).
--fixed_model
Optional fixed model contribution (default: None).
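Because --predocked_model and --fixed_model are optional, a wrapper can add them only when a path is supplied. The sketch below assembles the argument list for a cryo-EM run using only the flags documented on this page. cryoem_args is a hypothetical name, and the function prints the arguments instead of running anything. Other flags such as --output_dir or --full_composition are left out for brevity.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of ROCKET): print rk.preprocess arguments for
# a cryo-EM run, one per line, adding optional model flags only when given.
# Usage: cryoem_args <file_id> <resolution> <map1> <map2> [predocked_model] [fixed_model]
cryoem_args() {
    local file_id="$1" resolution="$2" map1="$3" map2="$4"
    local predocked="${5:-}" fixed="${6:-}"

    # Reuse the positional parameters as the argument list being built.
    set -- --file_id "$file_id" --method cryoem --resolution "$resolution" \
           --map1 "$map1" --map2 "$map2"

    if [ -n "$predocked" ]; then
        set -- "$@" --predocked_model "$predocked"
    fi
    if [ -n "$fixed" ]; then
        set -- "$@" --fixed_model "$fixed"
    fi

    printf '%s\n' "$@"
}
```

To launch the run instead of printing, the final printf line could be replaced with `rk.preprocess "$@"`.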
Outputs
After execution, results are organized in the --output_dir directory as follows:
output_dir/
├── {file_id}.fasta                    # FASTA file containing the chain to refine, copied from input
├── alignments/                        # MSA files for the input sequence, copied from input
│   └── *.a3m / *.hhr
├── predictions/                       # OpenFold structure predictions and pkl files
│   └── xxx_processed_feats.pickle     # Processed feature dict with cluster profiles
├── ROCKET_inputs/                     # Final outputs for ROCKET main trunk
│   ├── {file_id}-pred-aligned.pdb     # Aligned prediction with pseudo-Bs
│   └── {file_id}-Edata.mtz            # Experimental data in LLG convention
├── ROCKET_config_phase1.yaml          # Automatically generated config file for rk.refine phase1 run
├── ROCKET_config_phase2.yaml          # Automatically generated config file for rk.refine phase2 run
├── processed_predicted_files/         # Processed predictions from Phenix (including trimmed confidence loops)
├── docking_outputs/                   # Cryo-EM docking results (if present)
└── phaser_files/                      # X-ray molecular replacement results (if present)
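After a run, the files that rk.refine consumes can be checked in one pass. This is a minimal sketch that looks for the ROCKET_inputs files and the two generated configs listed in the tree. check_outputs is a hypothetical helper, not part of ROCKET. Whether every listed file is written for both methods is not stated here, so a reported miss is a prompt to inspect the run, not proof of failure.

```shell
#!/usr/bin/env bash
# Hypothetical post-run check (not part of ROCKET): verify that the files
# listed for the ROCKET main trunk exist in the output directory.
# Usage: check_outputs <output_dir> <file_id>
check_outputs() {
    local out_dir="$1" file_id="$2"
    local status=0 f

    for f in \
        "ROCKET_inputs/${file_id}-pred-aligned.pdb" \
        "ROCKET_inputs/${file_id}-Edata.mtz" \
        "ROCKET_config_phase1.yaml" \
        "ROCKET_config_phase2.yaml"
    do
        if [ ! -f "$out_dir/$f" ]; then
            echo "missing: $out_dir/$f" >&2
            status=1
        fi
    done

    return "$status"
}
```

For the X-ray example, this would be called as `check_outputs ./1lj5_processed 1lj5`.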