OpenSearch 2.x — Cluster Operations
Practical cheatsheet focused on understanding and troubleshooting a running OpenSearch cluster. All commands use the REST API via curl.
Tip
Set a base variable to keep commands short:
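The variable definition is missing from this copy. A minimal sketch, assuming an unsecured cluster on `localhost:9200`; the variable name `OS_URL` and the helper `os_get` are hypothetical, not taken from the original:

```shell
# Assumed endpoint; override by exporting OS_URL before sourcing this.
OS_URL="${OS_URL:-http://localhost:9200}"

# Hypothetical helper: GET a path relative to the base URL.
# With the security plugin enabled, add -u user:password (and -k for self-signed certs).
os_get() {
  curl -s "$OS_URL/$1"
}
```

Usage: `os_get '_cluster/health?pretty'`.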
Cluster health
Quick status check
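A sketch of the status check using the cluster health API. `OS_URL` is an assumed base URL and `cluster_health` a hypothetical function name:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Overall cluster status plus shard counts, as pretty-printed JSON.
cluster_health() {
  curl -s "$OS_URL/_cluster/health?pretty"
}
```

Run `cluster_health` and read the `status` field, which is `green`, `yellow`, or `red`.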
| Status | Meaning |
|---|---|
| green | All primary and replica shards are assigned |
| yellow | All primaries are assigned, but some replicas are unassigned |
| red | Some primary shards are unassigned; their data is unavailable |
Per-index health
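One possible form of the per-index check, assuming `awk` is available. `OS_URL` and `non_green_indices` are hypothetical names, and the column selection is an illustrative choice:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# List indices, keeping the header row and any row whose health is not green.
non_green_indices() {
  curl -s "$OS_URL/_cat/indices?v&h=health,status,index,pri,rep" \
    | awk 'NR == 1 || $1 != "green"'
}
```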
This filters to only show indices that are not green — the ones you care about during an incident.
Nodes overview
List all nodes
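A sketch of a node listing with the columns most useful during an incident. `OS_URL` and `list_nodes` are hypothetical names, and the column set is an assumption about what the original displayed:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# One row per node: role letters, heap, RAM, CPU, load, and disk usage.
list_nodes() {
  curl -s "$OS_URL/_cat/nodes?v&h=name,ip,node.role,heap.percent,ram.percent,cpu,load_1m,disk.used_percent"
}
```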
Warning
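Watch for disk.used_percent > 85%. At the low watermark (default 85%) OpenSearch stops allocating new shards to the node, at the high watermark (default 90%) it starts relocating shards away from it, and at the flood-stage watermark (default 95%) it blocks writes to every index that has a shard on that node.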
Node roles cheatsheet
| Letter | Role |
|---|---|
| m | master-eligible (called cluster-manager-eligible since OpenSearch 2.0) |
| d | data |
| i | ingest |
| r | remote cluster client |
| - | coordinating only (no role letter shown) |
Disk allocation detail
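A sketch using the `_cat/allocation` API, which reports shard count and disk usage per data node. `OS_URL` and `disk_allocation` are hypothetical names:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Shards, disk used, disk available, and disk percent for each data node.
disk_allocation() {
  curl -s "$OS_URL/_cat/allocation?v"
}
```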
Indices
List all indices
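A sketch of the index listing, sorted by name. `OS_URL` and `list_indices` are hypothetical names, and the sort order is an illustrative choice:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# All indices with health, status, shard counts, doc counts, and size.
list_indices() {
  curl -s "$OS_URL/_cat/indices?v&s=index"
}
```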
| Column | Meaning |
|---|---|
| pri | Number of primary shards |
| rep | Number of replica copies per primary |
| health | Worst shard status for this index |
| status | open (serving requests) or close (not loaded) |
Show only unhealthy indices
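The `health` query parameter of `_cat/indices` takes a single value, so a sketch that covers both unhealthy states queries twice. `OS_URL` and `unhealthy_indices` are hypothetical names:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Print red indices first, then yellow ones (each with its own header row).
unhealthy_indices() {
  for h in red yellow; do
    curl -s "$OS_URL/_cat/indices?v&health=$h"
  done
}
```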
Shards
Understanding shards
An index is split into primary shards (the data) and replica shards (copies for HA). When a node goes down, replicas on surviving nodes get promoted to primary.
- Yellow = a replica can't be assigned (often: not enough nodes, or same-node restriction)
- Red = a primary shard has no copy anywhere — that data is unavailable
List all shards
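A sketch of the shard listing via `_cat/shards`. `OS_URL` and `list_shards` are hypothetical names:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# One row per shard copy: index, shard number, p/r, state, docs, size, node.
list_shards() {
  curl -s "$OS_URL/_cat/shards?v&s=index"
}
```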
| State | Meaning |
|---|---|
| STARTED | Shard is active and serving |
| UNASSIGNED | Shard is not allocated to any node; this is the state to investigate |
| INITIALIZING | Shard is being created or recovered |
| RELOCATING | Shard is moving to another node |
Show only unassigned shards
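A sketch that keeps only unassigned shards and adds the `unassigned.reason` column, assuming `awk` is available. `OS_URL` and `unassigned_shards` are hypothetical names:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Keep the header row and rows whose fourth column (state) is UNASSIGNED.
unassigned_shards() {
  curl -s "$OS_URL/_cat/shards?v&h=index,shard,prirep,state,unassigned.reason" \
    | awk 'NR == 1 || $4 == "UNASSIGNED"'
}
```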
Diagnosing unassigned shards
This is the most important section for incident response.
Why is a shard unassigned?
Called without a request body, the allocation explain API returns a detailed explanation for the first unassigned shard it finds. To ask about a specific shard:
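Both forms of the allocation explain call can be sketched as follows. `OS_URL`, `explain_any`, and `explain_shard` are hypothetical names; the request body fields `index`, `shard`, and `primary` are the ones the API accepts:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Explain the first unassigned shard the cluster finds.
explain_any() {
  curl -s "$OS_URL/_cluster/allocation/explain?pretty"
}

# Explain one shard. Arguments: index name, shard number, primary (true|false).
explain_shard() {
  curl -s -X POST "$OS_URL/_cluster/allocation/explain?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"index\": \"$1\", \"shard\": $2, \"primary\": $3}"
}
```

Usage, with a made-up index name: `explain_shard my-index 0 true`. If no shard is unassigned, the body-less call returns an error instead of an explanation.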
Common unassigned reasons
| Reason | Typical cause | Fix |
|---|---|---|
| NODE_LEFT | A node crashed or was removed | Bring the node back, or wait for replica promotion |
| CLUSTER_RECOVERED | Cluster just restarted | Wait; shards are recovering |
| ALLOCATION_FAILED | Disk full, corrupt shard | Check disk space, possibly delete old indices |
| INDEX_CREATED | New index, not enough nodes | Add nodes or reduce number_of_replicas |
Force retry allocation
After fixing the root cause (disk space, node back up), nudge OpenSearch:
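A sketch of the retry call. `OS_URL` and `retry_failed_allocations` are hypothetical names:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Ask the cluster to retry shards whose allocation previously failed.
retry_failed_allocations() {
  curl -s -X POST "$OS_URL/_cluster/reroute?retry_failed=true&pretty"
}
```

This matters mainly for shards that exhausted their automatic retries (`index.allocation.max_retries`, default 5); OpenSearch does not retry those again on its own.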
Common fix actions
Free disk space (delete old indices)
Danger
This permanently deletes data. Make sure you target the right indices.
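A cautious sketch: preview what a name or pattern matches, then delete by exact name. `OS_URL`, `preview_indices`, and `delete_index` are hypothetical names:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Preview which indices a name or wildcard pattern matches.
preview_indices() {
  curl -s "$OS_URL/_cat/indices/$1?v&s=index"
}

# Delete one index. Argument: exact index name.
delete_index() {
  curl -s -X DELETE "$OS_URL/$1?pretty"
}
```

Usage, with made-up names: `preview_indices 'logs-2023*'`, then `delete_index logs-2023.01.01` for each index you have confirmed.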
Reduce replicas on a yellow index
If you only have 1 data node, replicas can never be assigned. Reduce to 0:
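A sketch of the settings update. `OS_URL` and `set_replicas` are hypothetical names; `number_of_replicas` is the real index setting:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Set the replica count. Arguments: index name, number of replicas.
set_replicas() {
  curl -s -X PUT "$OS_URL/$1/_settings?pretty" \
    -H 'Content-Type: application/json' \
    -d "{\"index\": {\"number_of_replicas\": $2}}"
}
```

Usage, with a made-up index name: `set_replicas my-index 0`.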
Close an index to save resources
A closed index uses almost no heap or CPU, but it cannot be searched or written to and it still occupies disk:
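A sketch of closing an index and reopening it later. `OS_URL`, `close_index`, and `open_index` are hypothetical names:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Close an index. Argument: index name.
close_index() {
  curl -s -X POST "$OS_URL/$1/_close?pretty"
}

# Reopen a closed index. Argument: index name.
open_index() {
  curl -s -X POST "$OS_URL/$1/_open?pretty"
}
```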
Cluster-level diagnostics
Pending tasks
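A sketch that lists queued cluster-state updates through the `_cat/pending_tasks` API. `OS_URL` and `pending_tasks` are hypothetical names:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Cluster-level changes waiting to be processed, with queue time and priority.
pending_tasks() {
  curl -s "$OS_URL/_cat/pending_tasks?v"
}
```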
If this list is long or drains slowly, the cluster manager (master) node is likely overloaded.
Hot threads (find CPU-heavy operations)
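A sketch of a hot threads request. `OS_URL` and `hot_threads` are hypothetical names, and the `threads` and `interval` values are illustrative:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Plain-text stack samples of the 3 busiest threads on every node.
hot_threads() {
  curl -s "$OS_URL/_nodes/hot_threads?threads=3&interval=500ms"
}
```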
Recovery progress
During shard recovery (node restart, replica allocation), track progress:
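A sketch of the recovery check, limited to recoveries still in progress. `OS_URL` and `active_recoveries` are hypothetical names:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Ongoing shard recoveries with stage, source and target node, and percent done.
active_recoveries() {
  curl -s "$OS_URL/_cat/recovery?v&active_only=true"
}
```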
Task list (running queries / operations)
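A sketch that lists currently running tasks through the `_cat/tasks` API. `OS_URL` and `running_tasks` are hypothetical names:

```shell
OS_URL="${OS_URL:-http://localhost:9200}"  # assumed base URL

# Running tasks with action, task ID, start time, running time, and description.
running_tasks() {
  curl -s "$OS_URL/_cat/tasks?v&detailed=true"
}
```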
Quick reference card
| What | Command |
|---|---|
| Cluster status | GET _cluster/health |
| Sick indices | GET _cat/indices?v&health=yellow (repeat with health=red) |
| Unassigned shards | GET _cat/shards?v + grep UNASSIGNED |
| Why unassigned? | GET _cluster/allocation/explain |
| Node disk usage | GET _cat/allocation?v |
| Force re-allocate | POST _cluster/reroute?retry_failed=true |
| Recovery progress | GET _cat/recovery?v&active_only=true |
| Delete old data | DELETE /index-pattern-* |
Note
All _cat APIs accept ?format=json if you prefer JSON over the columnar format.