# Launch with Your Own Cryo-EM Data

This tutorial shows how to refine a prediction against your own **cryo-EM or cryo-ET** data with ROCKET.

For cryo-EM, the best way to run ROCKET at the moment is to refine one domain or chain at a time. We use chain H of PDB entry 8P4P as the example.

{% hint style="info" %}
This path is best when you want to refine one chain or domain at a time against a cryo-EM or cryo-ET map.
{% endhint %}

**Note:** Precompute your MSA files first. ROCKET currently expects `a3m` or `sto` input from an external server or database. To use OpenFold locally, follow the [sequence database download instructions](https://openfold.readthedocs.io/en/latest/Inference.html). This requires about a terabyte of storage. The `--precomputed_alignment_dir` flag defaults to `alignments/`, and ROCKET will use all alignments found there.

### 1. Collect the required files

The ROCKET preprocessing script expects input files organized as follows:

```
<working_directory>/
├── {file_id}_fasta/
│   └── {file_id}.fasta       # FASTA file containing the chain to refine
│                             # Header should be "> {file_id}"
│
├── {file_id}_data/
│   ├── *_half_map*.mrc       # For Cryo-EM data
│   └── <optional files>/     # e.g., predicted or docked models
│
├── alignments/               # (default: --precomputed_alignment_dir)
│   └── {file_id}/
│       └── *.a3m / *.hhr     # MSA files for the input sequence
```
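Before preprocessing, it can help to confirm that the layout above is in place. The function below is a minimal sketch and is not part of ROCKET: `check_layout` is a hypothetical helper name, and it only checks that the FASTA file exists with a header naming the `file_id`, that the data directory exists, and that the alignment directory is not empty. It does not validate map or MSA contents.

```bash
# Hypothetical helper (not part of ROCKET): verify the input layout
# described above before calling rk.preprocess.
# Usage: check_layout FILE_ID [WORKING_DIRECTORY]
check_layout() {
  local file_id="$1" workdir="${2:-.}" status=0
  local fasta="$workdir/${file_id}_fasta/${file_id}.fasta"
  local data_dir="$workdir/${file_id}_data"
  local msa_dir="$workdir/alignments/${file_id}"

  # The FASTA file must exist and its first line should name the file_id.
  if [ ! -f "$fasta" ]; then
    echo "missing: $fasta"; status=1
  elif ! head -n 1 "$fasta" | grep -q "^> *${file_id}"; then
    echo "unexpected FASTA header in $fasta"; status=1
  fi

  # Maps and optional models live in {file_id}_data/.
  [ -d "$data_dir" ] || { echo "missing: $data_dir/"; status=1; }

  # The alignment directory must contain at least one file.
  if [ ! -d "$msa_dir" ] || [ -z "$(ls -A "$msa_dir")" ]; then
    echo "missing or empty: $msa_dir/"; status=1
  fi

  return "$status"
}
```

For the example system, `check_layout 8p4pH` run from the working directory prints one line per problem and returns a non-zero status if anything is missing.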

### 2. Run preprocessing

Once your data is organized this way, you are ready to run `rk.preprocess`.

A common scenario is having a model where the chains are already docked and even built to a certain degree, but a specific chain remains problematic. In this case, providing the docked chain of interest as `--predocked_model` speeds up preprocessing by skipping the docking search. Providing the rest of the docked chains as `--fixed_model` helps account for other parts of the map during refinement.

For example:

```bash
rk.preprocess \
  --file_id 8p4pH \
  --resolution 9.6 \
  --method cryoem \
  --output_dir 8p4pH_processed \
  --precomputed_alignment_dir alignments/ \
  --predocked_model 8p4pH_data/8p4pH_docked.pdb \
  --fixed_model 8p4pH_data/8p4pH-fixed-model-forchainH.pdb \
  --map1 8p4pH_data/emd_17425_half_map_1.map \
  --map2 8p4pH_data/emd_17425_half_map_2.map \
  --max_recycling_iters 20 \
  --use_deepspeed_evoformer_attention
```

If you already have a predocked model and only a single post-processed map, you can use `--map` alone instead:

```bash
rk.preprocess \
  --file_id 8p4pH \
  --resolution 9.6 \
  --method cryoem \
  --output_dir 8p4pH_processed \
  --precomputed_alignment_dir alignments/ \
  --predocked_model 8p4pH_data/8p4pH_docked.pdb \
  --fixed_model 8p4pH_data/8p4pH-fixed-model-forchainH.pdb \
  --map 8p4pH_data/emd_17425_postprocessed.map \
  --max_recycling_iters 20 \
  --use_deepspeed_evoformer_attention
```

**Note:** This shortcut only works when the docked placement is already known. If ROCKET still needs to search for the placement, you must provide both half maps with `--map1` and `--map2`. In general, if you have low-resolution data and are modeling only a small part of a larger complex, we recommend performing the docking separately with the full complex.

{% hint style="warning" %}
Use `--map` only for the predocked case. Use `--map1` and `--map2` when docking still needs to be searched.
{% endhint %}
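The rule above can be encoded in a wrapper script so that a single map is never passed when a docking search is still needed. This is a sketch under stated assumptions: `map_flags` is a hypothetical helper, not a ROCKET command, and the `predocked` and `search` mode names are invented for illustration. Only the `--map`, `--map1`, and `--map2` flags come from the examples above.

```bash
# Hypothetical helper (not part of ROCKET): print the map flags to pass
# to rk.preprocess for a given situation.
# Usage: map_flags predocked|search MAP [MAP2]
map_flags() {
  local mode="$1"
  shift
  case "$mode:$#" in
    # Placement already known and one post-processed map available.
    predocked:1) echo "--map $1" ;;
    # Two half maps work with or without a predocked model.
    predocked:2|search:2) echo "--map1 $1 --map2 $2" ;;
    # A docking search cannot run from a single map.
    search:1)
      echo "error: a docking search needs both half maps" >&2
      return 1 ;;
    *)
      echo "usage: map_flags predocked|search MAP [MAP2]" >&2
      return 2 ;;
  esac
}
```

For example, `map_flags predocked 8p4pH_data/emd_17425_postprocessed.map` prints `--map 8p4pH_data/emd_17425_postprocessed.map`, while `map_flags search` with a single map fails with an error instead of producing an invalid command.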

### 3. Review the generated configs

`rk.preprocess` generates two YAML files under `--output_dir` for `rk.refine`. Review them before you start refinement.

{% hint style="success" %}
The generated phase 1 and phase 2 configs are usually the best first run before you start tuning.
{% endhint %}

If you want to generate another set of default config files:

```bash
rk.config --mode both --datamode cryoem --working-dir 8p4pH_processed --file-id 8p4pH
```

The `--mode both` flag sets up the default phase 1 and phase 2 workflow. You can edit the saved config files if you want to test a specific condition.

### 4. Run refinement

Run the phase 1 config first:

```bash
rk.refine 8p4pH_processed/ROCKET_config_phase1.yaml
```

This will start the refinement.

If you want live run tracking, see [Track Refinement with Weights & Biases](/rocket-docs/track-refinement-with-weights-and-biases.md).

The standard workflow is phase 1 followed by the lower learning-rate phase 2:

```bash
rk.refine 8p4pH_processed/ROCKET_config_phase2.yaml
```

Phase 2 requires an existing phase 1 folder. If you want to start with a lower learning rate, edit `ROCKET_config_phase1.yaml` directly.
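Because phase 2 depends on the phase 1 output, the two runs can be chained so that phase 2 starts only if phase 1 exits successfully. The wrapper below is a sketch, not a ROCKET feature: `run_phases` and the `RK_REFINE` override variable are hypothetical names introduced here, and it assumes `rk.refine` returns a non-zero exit status on failure. The config filenames are the ones used in the commands above.

```bash
# Hypothetical wrapper (not part of ROCKET): run phase 1, then phase 2.
# RK_REFINE can be overridden, e.g. RK_REFINE=echo for a dry run.
# Usage: run_phases OUTPUT_DIR
run_phases() {
  local outdir="$1" refine="${RK_REFINE:-rk.refine}" cfg

  # Check that both generated configs exist before starting.
  for cfg in ROCKET_config_phase1.yaml ROCKET_config_phase2.yaml; do
    if [ ! -f "$outdir/$cfg" ]; then
      echo "missing config: $outdir/$cfg" >&2
      return 1
    fi
  done

  # Phase 2 only starts if phase 1 exits with status 0.
  "$refine" "$outdir/ROCKET_config_phase1.yaml" &&
    "$refine" "$outdir/ROCKET_config_phase2.yaml"
}
```

Calling `RK_REFINE=echo run_phases 8p4pH_processed` prints the two config paths in order without starting a refinement, which is a quick way to check the paths before a long run.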

### 5. Finalize geometry and B-factors

We recommend a short standard refinement run afterwards. We used `phenix.refine` in the paper. This helps polish geometry and B-factors on the ROCKET output.

### A note from us

We hope to make ROCKET as useful and general as we can. If you run into setup issues, let us know and we will try to help.

[Create an issue](https://github.com/alisiafadini/ROCKET/issues) in our repo

