rk.preprocess

ROCKET Preprocessing Command

rk.preprocess prepares predicted protein structures for ROCKET. It runs OpenFold inference and processes the predicted structures with Phenix. It then performs either molecular replacement (X-ray) or docking into the map (cryo-EM).

TL;DR

A typical preprocessing command for X-ray datasets (our human serpin case, 1lj5):

rk.preprocess \
  --file_id 1lj5 \
  --method xray \
  --output_dir ./1lj5_processed \
  --xray_data_label FP,SIGFP \
  --max_recycling_iters 20 \
  --use_deepspeed_evoformer_attention

It must be run from a working directory that already contains the input data, organized as:

.
├── 1lj5_data
│   └── 1lj5-tng_withrfree.mtz
├── 1lj5_fasta
│   └── 1lj5.fasta
├── alignments
│   └── 1lj5
│       ├── bfd_uniclust_hits.a3m
│       ├── mgnify_hits.a3m
│       ├── pdb70_hits.hhr
│       └── uniref90_hits.a3m
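Before launching the X-ray run, it can help to confirm that this layout is in place. The block below is a minimal sketch that assumes Python 3 and uses only the paths shown above. `missing_xray_inputs` is a hypothetical helper written for illustration; it is not part of ROCKET. It only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path


def missing_xray_inputs(workdir, file_id, alignment_dir="alignments"):
    """Hypothetical pre-flight check (not part of ROCKET).

    Returns the expected X-ray inputs that could not be found, following the
    {file_id}_fasta / {file_id}_data / alignments/{file_id} layout.
    """
    root = Path(workdir)
    missing = []

    # FASTA file containing the chain to refine
    fasta = root / f"{file_id}_fasta" / f"{file_id}.fasta"
    if not fasta.is_file():
        missing.append(str(fasta))

    # At least one MTZ file with the reflection data
    data_dir = root / f"{file_id}_data"
    if not (data_dir.is_dir() and any(data_dir.glob("*.mtz"))):
        missing.append(str(data_dir / "*.mtz"))

    # Precomputed OpenFold alignments (*.a3m and/or *.hhr)
    msa_dir = root / alignment_dir / file_id
    has_msa = msa_dir.is_dir() and (
        any(msa_dir.glob("*.a3m")) or any(msa_dir.glob("*.hhr"))
    )
    if not has_msa:
        missing.append(str(msa_dir / "*.a3m"))
    return missing
```

Run from the working directory, `missing_xray_inputs(".", "1lj5")` should return an empty list when the layout matches. Any entry it returns is a path or glob pattern that was not found.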

A typical preprocessing command for cryo-EM map datasets (our GroEL case, 8p4pA):

rk.preprocess \
  --file_id 8p4pA \
  --resolution 9.6 \
  --method cryoem \
  --output_dir 8p4pA_processed \
  --predocked_model 8p4pA_data/8p4pA_docked.pdb \
  --fixed_model 8p4pA_data/8p4pA-fixed-model-forchainA.pdb \
  --map1 8p4pA_data/emd_17425_half_map_1.map \
  --map2 8p4pA_data/emd_17425_half_map_2.map \
  --max_recycling_iters 20 \
  --use_deepspeed_evoformer_attention

It likewise expects the following input files in the working directory:

.
├── 8p4pA_fasta
│   └── 8p4pA.fasta
├── alignments
│   └── 8p4pA
│       ├── bfd_uniclust_hits.a3m
│       ├── pdb70_hits.hhr
│       └── uniref90_hits.a3m
├── 8p4pA_data
│   ├── emd_17425_half_map_1.map
│   ├── emd_17425_half_map_2.map
│   ├── 8p4pA-fixed-model-forchainA.pdb
│   └── 8p4pA_docked.pdb
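The cryo-EM command passes several files explicitly. A quick existence check on those paths can catch typos before OpenFold inference starts. This is a minimal sketch: `CRYOEM_PATH_OPTIONS` and `missing_paths` are hypothetical names, and the paths are copied from the GroEL example command.

```python
from pathlib import Path

# Hypothetical mapping (copied from the GroEL example command) of the
# file-valued options passed to rk.preprocess for the cryo-EM case.
CRYOEM_PATH_OPTIONS = {
    "--predocked_model": "8p4pA_data/8p4pA_docked.pdb",
    "--fixed_model": "8p4pA_data/8p4pA-fixed-model-forchainA.pdb",
    "--map1": "8p4pA_data/emd_17425_half_map_1.map",
    "--map2": "8p4pA_data/emd_17425_half_map_2.map",
}


def missing_paths(options, workdir="."):
    """Hypothetical helper: return {flag: path} for files not found under workdir."""
    root = Path(workdir)
    return {
        flag: path
        for flag, path in options.items()
        if not (root / path).is_file()
    }
```

An empty dict from `missing_paths(CRYOEM_PATH_OPTIONS)` means every file referenced by the command exists relative to the current directory.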

Input Parameters

Argument                             Description
--file_id                            Identifier for the input files; used to locate {file_id}_fasta/, {file_id}_data/ and alignments/{file_id}/.
--resolution                         Best (highest) resolution of the cryo-EM map, in Å. Not used for X-ray data.
--method                             Either "xray" (runs Phaser for molecular replacement) or "cryoem" (runs EMPlacement for docking).
--output_dir                         Directory to store results (default: "preprocessing_output").
--precomputed_alignment_dir          Path to the OpenFold precomputed alignments (default: "alignments/").
--max_recycling_iters                Number of recycling iterations for the initial OpenFold predictions (default: 4).
--use_deepspeed_evoformer_attention  Flag; use the DeepSpeed Evoformer attention layer. Requires DeepSpeed to be installed in the environment.
--jax_params_path                    Path to the JAX parameter file ("params_model_1_ptm.npz"). Default: None, in which case the environment variable $OPENFOLD_RESOURCES is used.
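When scripting several runs, the options in this table can be assembled into an argument list programmatically. The sketch below is an illustration only. `build_preprocess_argv` is a hypothetical helper, and it encodes only the convention visible in the examples above: value-taking options are followed by their value, while `--use_deepspeed_evoformer_attention` is a bare flag. Setting `jax_params_path` to `None` omits the option, so the documented default applies.

```python
import shlex


def build_preprocess_argv(options):
    """Hypothetical helper: turn {option_name: value} into an rk.preprocess argv.

    True  -> pass the bare flag (e.g. --use_deepspeed_evoformer_attention)
    None / False -> omit the option so rk.preprocess uses its default
    anything else -> pass "--name value"
    """
    argv = ["rk.preprocess"]
    for name, value in options.items():
        flag = "--" + name
        if value is True:
            argv.append(flag)
        elif value is None or value is False:
            continue
        else:
            argv.extend([flag, str(value)])
    return argv


xray_argv = build_preprocess_argv({
    "file_id": "1lj5",
    "method": "xray",
    "output_dir": "./1lj5_processed",
    "xray_data_label": "FP,SIGFP",
    "max_recycling_iters": 20,
    "use_deepspeed_evoformer_attention": True,
    "jax_params_path": None,  # omitted; falls back to $OPENFOLD_RESOURCES
})
print(shlex.join(xray_argv))
# prints: rk.preprocess --file_id 1lj5 --method xray --output_dir ./1lj5_processed --xray_data_label FP,SIGFP --max_recycling_iters 20 --use_deepspeed_evoformer_attention
```

Passing `xray_argv` to `subprocess.run(xray_argv, check=True)` would launch the same X-ray command as the TL;DR example, assuming `rk.preprocess` is on `PATH`.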

The script expects input files organized as follows:

<working_directory>/
├── {file_id}_fasta/
│   └── {file_id}.fasta       # FASTA file containing the chain to refine
│                             # Header should be "> {file_id}"
│
├── {file_id}_data/
│   ├── *.mtz                 # For X-ray data
│   ├── *_half_map*.mrc       # For Cryo-EM data
│   └── <optional files>      # e.g., predicted or docked models
│
├── alignments/               # (default: --precomputed_alignment_dir)
│   └── {file_id}
│       └── *.a3m / *.hhr
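If the FASTA file does not exist yet, it can be written with the header format stated above. This is a minimal sketch that assumes a single chain per file. `write_input_fasta` is a hypothetical helper; it keeps the sequence on one line and strips any whitespace from the input sequence.

```python
from pathlib import Path


def write_input_fasta(workdir, file_id, sequence):
    """Hypothetical helper: write {file_id}_fasta/{file_id}.fasta with a "> {file_id}" header."""
    fasta_dir = Path(workdir) / f"{file_id}_fasta"
    fasta_dir.mkdir(parents=True, exist_ok=True)
    # Remove spaces/newlines and use upper-case one-letter codes
    seq = "".join(sequence.split()).upper()
    path = fasta_dir / f"{file_id}.fasta"
    path.write_text(f"> {file_id}\n{seq}\n")
    return path
```

For example, `write_input_fasta(".", "1lj5", seq)` creates `1lj5_fasta/1lj5.fasta` with the header `> 1lj5`, where `seq` is the one-letter sequence of the chain to refine.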

Additional Parameters for X-ray (`--method xray`)

Argument            Description
--xray_data_label   Column labels of the reflection data in the MTZ file (e.g., "FP,SIGFP").

Additional Parameters for Cryo-EM (`--method cryoem`)

Argument             Description
--map1               Path to half-map 1.
--map2               Path to half-map 2.
--full_composition   FASTA file containing the sequences of everything expected to be in the reconstruction, whether or not there is a model for it.
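A `--full_composition` file is an ordinary multi-record FASTA. The sketch below writes one from (name, sequence) pairs. `write_composition_fasta` is a hypothetical helper, and the record names are arbitrary. This page does not say whether identical copies in an oligomer such as GroEL must be listed once per copy, so the helper simply writes exactly the records it is given.

```python
from pathlib import Path


def write_composition_fasta(path, records):
    """Hypothetical helper: write (name, sequence) pairs as a multi-record FASTA.

    One record is written per pair, in the order given; no deduplication.
    """
    lines = []
    for name, sequence in records:
        lines.append(f">{name}")
        lines.append("".join(sequence.split()).upper())
    out = Path(path)
    out.write_text("\n".join(lines) + "\n")
    return out
```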

Optional Arguments

Argument             Description
--predocked_model    Path to an already docked model (default: None).
--fixed_model        Path to an optional fixed-model contribution, e.g. 8p4pA-fixed-model-forchainA.pdb in the GroEL example (default: None).

Outputs

After execution, results are written to --output_dir with the following structure:

output_dir/
├── {file_id}.fasta                 # FASTA file containing the chain to refine, copied from input
├── alignments/                     # MSA files for the input sequence, copied from input
│   └── *.a3m / *.hhr
├── predictions/                    # OpenFold structure predictions and pkl files
│   └── xxx_processed_feats.pickle  # Processed feature dict with cluster profiles
├── ROCKET_inputs/                  # Final outputs for the ROCKET main trunk
│   ├── {file_id}-pred-aligned.pdb  # Aligned prediction with pseudo-B factors
│   └── {file_id}-Edata.mtz         # Experimental data in LLG convention
├── ROCKET_config_phase1.yaml       # Automatically generated config file for the rk.refine phase 1 run
├── ROCKET_config_phase2.yaml       # Automatically generated config file for the rk.refine phase 2 run
├── processed_predicted_files/      # Processed predictions from Phenix (including trimmed low-confidence loops)
├── docking_outputs/                # Cryo-EM docking results (cryo-EM runs only)
└── phaser_files/                   # X-ray molecular replacement results (X-ray runs only)
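After a run, the files needed by rk.refine can be checked against the tree above. This minimal sketch only tests that four paths exist: the two files under ROCKET_inputs/ and the two config files. `expected_rocket_outputs` and `missing_rocket_outputs` are hypothetical helpers and are not part of ROCKET.

```python
from pathlib import Path


def expected_rocket_outputs(output_dir, file_id):
    """Hypothetical helper: paths the output tree lists for the rk.refine stage."""
    out = Path(output_dir)
    return [
        out / "ROCKET_inputs" / f"{file_id}-pred-aligned.pdb",
        out / "ROCKET_inputs" / f"{file_id}-Edata.mtz",
        out / "ROCKET_config_phase1.yaml",
        out / "ROCKET_config_phase2.yaml",
    ]


def missing_rocket_outputs(output_dir, file_id):
    """Return the expected output paths (as strings) that do not exist as files."""
    return [
        str(p)
        for p in expected_rocket_outputs(output_dir, file_id)
        if not p.is_file()
    ]
```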
