# rk.msacluster

`rk.msacluster` subsamples sequences from a multiple sequence alignment (MSA) using DBSCAN clustering and uniform random sampling.

This script is adapted from `ClusterMSA.py` in the [AF\_Cluster repository](https://github.com/HWaymentSteele/AF_Cluster/blob/main/scripts/ClusterMSA.py).

It writes one `.a3m` file per subsample. It assumes the first sequence in the FASTA or A3M file is the query.
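
The query-first convention and the `--gap_cutoff` filter can be illustrated with a short sketch. This is not the script's own code: `read_alignment` and `filter_by_gaps` are hypothetical helper names, the cutoff of 0.25 is an arbitrary illustration value, and both the strict `<` comparison and the choice to exempt the query from filtering are assumptions.

```python
def read_alignment(text):
    """Parse FASTA/A3M text into a list of (header, sequence) pairs."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and A3M comment lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            # A3M writes insertions relative to the query in lowercase;
            # dropping them leaves every sequence at the query's length.
            chunks.append("".join(c for c in line if not c.islower()))
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def filter_by_gaps(records, gap_cutoff):
    """Keep the query plus sequences whose gap fraction is below gap_cutoff."""
    query, rest = records[0], records[1:]  # first record is treated as the query
    kept = [(h, s) for h, s in rest if s.count("-") / len(s) < gap_cutoff]
    return [query] + kept


example = """>query
MKTAYIAK
>hit1
MKTgAYIAK
>hit2
M-----AK
"""

records = read_alignment(example)
filtered = filter_by_gaps(records, gap_cutoff=0.25)
print([h for h, _ in filtered])  # ['query', 'hit1']
```

Here `hit2` is dropped because five of its eight columns are gaps, while the lowercase insertion in `hit1` is removed before any gap counting.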

### Options

```
options:
  -h, --help            show this help message and exit
  -i I                  fasta/a3m file of original alignment, or path containing fasta/a3m files
  -o O                  name of output directory to write MSAs to.
  --n_controls N_CONTROLS
                        Number of control msas to generate (Default 10)
  --verbose             Print cluster info as they are generated.
  --scan                Select eps value on 1/4 of data, shuffled.
  --eps_val EPS_VAL     Use single value for eps instead of scanning.
  --resample            If included, will resample the original MSA with replacement before writing.
  --gap_cutoff GAP_CUTOFF
                        Remove sequences with gaps representing more than this frac of seq.
  --min_eps MIN_EPS     Min epsilon value to scan for DBSCAN (Default 3).
  --max_eps MAX_EPS     Max epsilon value to scan for DBSCAN (Default 20).
  --eps_step EPS_STEP   step for epsilon scan for DBSCAN (Default 0.5).
  --min_samples MIN_SAMPLES
                        Default min_samples for DBSCAN (Default 3, recommended no lower than that).
  --run_PCA             Run PCA on one-hot embedding of sequences and store in output_cluster_metadata.tsv
  --run_TSNE            Run TSNE on one-hot embedding of sequences and store in output_cluster_metadata.tsv
```
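
To get a feel for the `--min_eps`, `--max_eps`, and `--eps_step` values, the sketch below assumes that DBSCAN runs on flattened one-hot encodings with Euclidean distance. The help text mentions a one-hot embedding for PCA and t-SNE but does not state the clustering metric, so treat this as an assumption. The names `one_hot`, `euclidean`, and `eps_grid` are hypothetical, as are the alphabet and the choice to include `max_eps` in the scan.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # assumed alphabet: 20 amino acids plus gap


def one_hot(seq):
    """Flatten a sequence into a 0/1 vector of length len(seq) * len(ALPHABET)."""
    vec = []
    for residue in seq:
        vec.extend(1.0 if residue == symbol else 0.0 for symbol in ALPHABET)
    return vec


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def eps_grid(min_eps, max_eps, step):
    """List candidate eps values from min_eps to max_eps inclusive (assumed)."""
    n_steps = int(round((max_eps - min_eps) / step))
    return [min_eps + i * step for i in range(n_steps + 1)]


a = "MKTAYIAK"
b = "MKTAYLAR"  # differs from `a` at two aligned positions
dist = euclidean(one_hot(a), one_hot(b))
print(round(dist, 3))  # 2.0, i.e. sqrt(2 * 2)

grid = eps_grid(3, 20, 0.5)  # the defaults listed above
print(grid[0], grid[-1], len(grid))  # 3.0 20.0 35
```

Under this assumption, two aligned sequences that differ at k positions are sqrt(2k) apart, so an eps of 3 would treat sequences as neighbors only if they differ at four or fewer positions (sqrt(8) ≈ 2.83).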

### Example

```bash
rk.msacluster v1 -i alignments/ -o msaclusters --run_TSNE
```

You can score the resulting subsamples with [rk.score](/rocket-docs/api/rk.score.md).

