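Troubleshooting and Resolving UNASSIGNED Shards in Elasticsearch

Background

After starting ES, queries failed and some shards were in the Unassigned state.

Analyzing the problem

Check shard status (`_cluster/allocation/explain`)

Error message: cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster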

[user_es@VM_113_96_centos elasticsearch-7.9.3]$ curl -XGET "http://localhost:9200/_cluster/allocation/explain/?pretty"
{
  "index" : ".kibana_task_manager_1",
  "shard" : 0,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "CLUSTER_RECOVERED",
    "at" : "2022-09-07T03:09:28.454Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
  "node_allocation_decisions" : [
    {
      "node_id" : "PVgL4CRlRwWHyCOmhCQLhQ",
      "node_name" : "VM_113_96_centos",
      "transport_address" : "192.168.0.16:9300",
      "node_attributes" : {
        "ml.machine_memory" : "8202780672",
        "xpack.installed" : "true",
        "transform.node" : "true",
        "ml.max_open_jobs" : "20"
      },
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    }
  ]
}

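Find which indices have unassigned shards

curl -s -XGET "http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" | grep UNASSIGNED

Each row of the result lists the index name, the shard number, whether it is a primary (p) or a replica (r), the shard state, and the reason it is unassigned, which makes it quick to locate the problem.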

.kibana_task_manager_1         0 p UNASSIGNED CLUSTER_RECOVERED
.kibana-event-log-7.9.3-000001 0 p UNASSIGNED CLUSTER_RECOVERED
.kibana_1                      0 p UNASSIGNED CLUSTER_RECOVERED
.apm-agent-configuration       0 p UNASSIGNED CLUSTER_RECOVERED
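
When many shards are affected, grouping the rows by reason gives a quicker overview than reading them one by one. The following is a minimal sketch, not part of the original troubleshooting session: the function names are made up for illustration, and the wrapper assumes an unauthenticated node on localhost:9200.

```shell
# Count unassigned shards per unassigned.reason.
# Reads `_cat/shards` rows on stdin in the column order
# index, shard, prirep, state, unassigned.reason.
summarize_unassigned() {
  awk '$4 == "UNASSIGNED" { count[$5]++ }
       END { for (reason in count) print reason, count[reason] }' | sort
}

# Hypothetical wrapper: fetch the rows from a local node and summarize them.
unassigned_summary() {
  curl -s "http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason" \
    | summarize_unassigned
}
```

Calling `unassigned_summary` against the cluster above should report a single reason, CLUSTER_RECOVERED, with a count of 4.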

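Check the ES log

low disk watermark [87%] exceeded on/replicas will not be assigned to this node

The log contains this warning. According to the official ES documentation, once disk usage on a node exceeds the low watermark (85% by default), ES stops allocating new shards to that node, which mainly affects replicas. The low watermark alone does not block writes; indices only become read-only once the flood-stage watermark is exceeded.

Solution

Raise cluster.routing.allocation.disk.watermark.low to 90% and free up disk space at the same time.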

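Summary of common causes

More shard copies than nodes

When nodes join or leave the cluster, the master automatically reallocates shards and never places a primary and a replica of the same shard on the same node. If both copies shared a node, losing that node would lose the data and the replica would be pointless. A replica that has no eligible node therefore stays unassigned, so keep $N \geq R + 1$ (number of nodes ≥ number of replicas + 1).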
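
The rule can be checked with simple arithmetic before touching any settings. The sketch below is illustrative only: both function names are invented here, the node count is assumed to be at least 1, and `set_replicas` assumes an unauthenticated node on localhost:9200.

```shell
# Print how many replica copies per shard cannot be placed,
# given the number of data nodes ($1) and number_of_replicas ($2).
unassignable_replicas() {
  nodes=$1
  replicas=$2
  max=$((nodes - 1))   # one node already holds the primary
  if [ "$replicas" -gt "$max" ]; then
    echo $((replicas - max))
  else
    echo 0
  fi
}

# Hypothetical helper: lower number_of_replicas for one index.
# $1 = index name, $2 = new replica count.
set_replicas() {
  curl -s -XPUT -H 'Content-Type: application/json' \
    "http://localhost:9200/$1/_settings" \
    -d "{\"index\": {\"number_of_replicas\": $2}}"
}
```

On a single-node cluster, `unassignable_replicas 1 1` prints 1, which is why an index with one replica can never get past yellow there.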

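Shard allocation is delayed by default

After a node loses contact with the master, the cluster does not reallocate its shards immediately; it waits a while to see whether the node rejoins. If it does, the existing shard data is kept and no new allocation is triggered.

The delay (delayed_timeout) can be changed globally or per index: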

PUT /_all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}

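Using _all applies the setting to every index; the example changes the wait from the default of 1 minute to 10 minutes. To skip the wait entirely, set delayed_timeout: 0.

Note: delayed allocation does not prevent replicas from being promoted to primaries; the cluster still performs the promotions needed to get back to yellow. Rebuilding the missing replicas is the only thing that is delayed.

If the node rejoins after the timeout and the cluster has not yet finished moving shards, ES checks whether the shard data on that node's disk matches the current primary. If the two are identical (no documents added, modified, or deleted), the master cancels the in-progress rebalancing and reuses the local data, since recovering from local disk is much faster than copying over the network. If the shard has diverged (new documents were written while the node was offline), the rejoining node deletes its stale local data and fetches a fresh copy.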
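
To see whether shards are currently waiting out this delay, the cluster health response includes a delayed_unassigned_shards counter. A minimal sketch for reading it without extra tools follows; the function names are invented for illustration and the wrapper assumes an unauthenticated node on localhost:9200.

```shell
# Extract the delayed_unassigned_shards value from a
# `_cluster/health` JSON response read on stdin.
delayed_shard_count() {
  sed -n 's/.*"delayed_unassigned_shards"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical wrapper: query a local node.
delayed_shards_now() {
  curl -s "http://localhost:9200/_cluster/health?pretty" | delayed_shard_count
}
```

A non-zero value means the unassigned replicas are expected and should be picked up again once the node returns or the timeout expires.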

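Re-enabling shard allocation

Shard allocation is enabled by default, so this is normally not the problem. It may, however, have been disabled at some point, for example during a rolling restart: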

Disable shard allocation. This prevents Elasticsearch from rebalancing
missing shards until you tell it otherwise. If you know the
maintenance window will be short, this is a good idea. You can disable
allocation as follows:
https://www.elastic.co/guide/en/elasticsearch/guide/current/_rolling_restarts.html

In that case allocation has to be re-enabled manually:

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.enable": "all"
  }
}
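
For a rolling restart, the same setting is typically toggled twice: restricted before the node goes down and reset afterwards. The sketch below wraps both steps in curl; the function names are illustrative, localhost:9200 without authentication is assumed, and passing null resets the setting to its default rather than writing "all" explicitly.

```shell
# Build the request body for cluster.routing.allocation.enable.
# $1 = settings scope (transient or persistent)
# $2 = value (all, primaries, new_primaries, none) or null to reset
allocation_body() {
  case "$2" in
    null) value=null ;;
    *)    value="\"$2\"" ;;
  esac
  printf '{"%s": {"cluster.routing.allocation.enable": %s}}\n' "$1" "$value"
}

# Hypothetical helper: apply the setting on a local node.
set_allocation() {
  curl -s -XPUT -H 'Content-Type: application/json' \
    "http://localhost:9200/_cluster/settings" \
    -d "$(allocation_body "$1" "$2")"
}
```

A restart would then be bracketed by `set_allocation transient primaries` and `set_allocation transient null`. Use the same scope for both calls, because a transient value takes precedence over a persistent one.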

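Low disk watermark

Elasticsearch has three disk watermarks:

Setting	Default	Behavior when exceeded
`cluster.routing.allocation.disk.watermark.low`	85%	No more shards are allocated to the node (primaries of newly created indices are unaffected, but their replicas are not allocated)
`cluster.routing.allocation.disk.watermark.high`	90%	ES tries to relocate shards away from the node
`cluster.routing.allocation.disk.watermark.flood_stage`	95%	Every index with a shard on the node is made read-only (`index.blocks.read_only_allow_delete = true`)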
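
To see where each node stands relative to these thresholds, disk usage from `_cat/allocation` can be classified in a few lines. This is a rough sketch assuming the default percentages from the table and integer usage values; the function names are invented for illustration.

```shell
# Map a disk usage percentage to the highest default watermark it exceeds.
watermark_stage() {
  used=$1
  if   [ "$used" -gt 95 ]; then echo flood_stage
  elif [ "$used" -gt 90 ]; then echo high
  elif [ "$used" -gt 85 ]; then echo low
  else echo ok
  fi
}

# Read "node disk.percent" rows on stdin and print "node stage".
# Rows without a percentage (the UNASSIGNED row) are skipped.
report_nodes() {
  while read -r node pct; do
    [ -n "$pct" ] || continue
    echo "$node $(watermark_stage "$pct")"
  done
}
```

The input would come from something like `curl -s "http://localhost:9200/_cat/allocation?h=node,disk.percent" | report_nodes`. If the watermarks have been changed from their defaults, the thresholds in `watermark_stage` need to be adjusted to match.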

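On large nodes (for example 10 TB), 15% free space is still a lot of room, so it is usually safe to raise the low watermark to 90%: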

PUT /_cluster/settings
{
  "transient": {
    "cluster.routing.allocation.disk.watermark.low": "90%"
  }
}

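Restoring writes: the flood-stage block is set per index as index.blocks.read_only_allow_delete, so once disk space has been freed, clear it on the affected indices (ES 7.4 and later also release it automatically when usage drops below the high watermark):

PUT /_all/_settings
{
  "index.blocks.read_only_allow_delete": null
}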

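Multiple ES versions in the cluster

During a rolling upgrade, the master will not assign replicas of a primary shard to nodes running an older version. If this is the cause, upgrading the remaining old nodes resolves it.

Shard data no longer in the cluster

If only a primary shard is unassigned, one of the following is likely:

The shard was created on a node with no replicas (replicas had been turned off to speed up initial indexing), and the node left the cluster before the data could be replicated
When the node reconnected, syncing its shard information to the master failed for some reason

How to handle it: first try to get the original node to rejoin the cluster. If it cannot be recovered, use the Cluster Reroute API to force-allocate an empty shard and then reindex the data.

Warning: after an empty shard has been force-allocated, if the original node rejoins, its copy of the data is treated as the older version and is overwritten by the new empty primary.

If you are sure you want to force the allocation, use allocate_empty_primary: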

POST /_cluster/reroute?pretty
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "test-index",
        "shard": 0,
        "node": "<NODE_NAME>",
        "accept_data_loss": "true"
      }
    }
  ]
}

accept_data_loss must be set to true to confirm that you accept the risk of data loss; otherwise the request fails with:

{
  "error" : {
    "root_cause" : [
      {
        "type" : "remote_transport_exception",
        "reason" : "[NODE_NAME][127.0.0.1:9300][cluster:admin/reroute]"
      }
    ],
    "type" : "illegal_argument_exception",
    "reason" : "[allocate_empty_primary] allocating an empty primary for [test-index][0] can result in data loss. Please confirm by setting the accept_data_loss parameter to true"
  },
  "status" : 400
}

To recover the lost data, restore it from a backup snapshot with the Snapshot and Restore API.
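
Because the operation is destructive, it can be worth simulating it first. The reroute API accepts a dry_run query parameter that computes the resulting state without applying it. The sketch below is an assumption-laden illustration: the function names are invented, and it targets an unauthenticated node on localhost:9200.

```shell
# Build an allocate_empty_primary reroute body.
# $1 = index name, $2 = shard number, $3 = target node name
empty_primary_body() {
  printf '{"commands": [{"allocate_empty_primary": {"index": "%s", "shard": %s, "node": "%s", "accept_data_loss": true}}]}\n' \
    "$1" "$2" "$3"
}

# Hypothetical helper: simulate the reroute without changing the cluster.
reroute_dry_run() {
  curl -s -XPOST -H 'Content-Type: application/json' \
    "http://localhost:9200/_cluster/reroute?dry_run=true&pretty" \
    -d "$(empty_primary_body "$1" "$2" "$3")"
}
```

If the simulated result looks right, the same body can be sent again without dry_run=true to perform the allocation.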

`unable to find any unassigned shards to explain`

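This error is returned when the explain API is called without a request body and no shard is currently unassigned. In day-to-day ES cluster management, the two common causes of unassigned shards are an unexpected node restart (NODE_LEFT) and running out of disk space (No space left on device). If a node was restarted incorrectly and took a primary shard down, log data cannot be written and the whole index slows down; after waiting a while, the logs show the cluster status going from red → green.

Use `explain` to troubleshoot a specific index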

curl -XGET -H 'Content-Type: application/json' \
  <host>:9200/_cluster/allocation/explain \
  -d '{"index": "<index_name>", "shard": 0, "primary": true}'
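
When several shards are unassigned, the same request can be generated for each of them from the `_cat/shards` output. This is a minimal sketch under the assumption of an unauthenticated cluster; the function names and the default host are placeholders chosen for illustration.

```shell
# Build an allocation-explain request body.
# $1 = index name, $2 = shard number, $3 = prirep column (p or r)
explain_body() {
  if [ "$3" = "p" ]; then primary=true; else primary=false; fi
  printf '{"index": "%s", "shard": %s, "primary": %s}\n' "$1" "$2" "$primary"
}

# Hypothetical helper: explain every unassigned shard on the given host.
# $1 = host:port (defaults to localhost:9200)
explain_all_unassigned() {
  host=${1:-localhost:9200}
  curl -s "http://$host/_cat/shards?h=index,shard,prirep,state" \
    | awk '$4 == "UNASSIGNED" { print $1, $2, $3 }' \
    | while read -r index shard prirep; do
        curl -s -XGET -H 'Content-Type: application/json' \
          "http://$host/_cluster/allocation/explain?pretty" \
          -d "$(explain_body "$index" "$shard" "$prirep")"
      done
}
```

Naming the shard explicitly also avoids the `unable to find any unassigned shards to explain` error, since the API then reports on that shard whether or not it is assigned.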