
How to Upgrade Ceph in Proxmox VE

Step-by-step guide for upgrading Ceph in Proxmox VE across release trains from Pacific to Quincy to Reef to Squid, including OSD restarts, health checks, and CRUSH map compatibility.


Understanding the Ceph Release Train

Ceph follows a named release cycle, with each major version bringing new features, performance improvements, and deprecations. Proxmox VE has supported the following Ceph releases across its versions:

  • Pacific (16.2.x) - Shipped with PVE 7.0-7.1
  • Quincy (17.2.x) - Shipped with PVE 7.2+ and early PVE 8
  • Reef (18.2.x) - Shipped with PVE 8.1+
  • Squid (19.2.x) - Expected with PVE 9

Ceph upgrades must follow the release order sequentially. You cannot skip a release -- going from Pacific directly to Reef is not supported. Each hop must be completed and verified before proceeding to the next.
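Because each hop must be planned explicitly, it can help to encode the release order in your upgrade tooling. A minimal sketch (the function name is illustrative, not part of any Ceph or Proxmox tooling):

```shell
# next_release: map a Ceph release name to the next allowed hop in the
# Pacific -> Quincy -> Reef -> Squid upgrade order.
next_release() {
    case "$1" in
        pacific) echo "quincy" ;;
        quincy)  echo "reef" ;;
        reef)    echo "squid" ;;
        *)       echo "unknown release: $1" >&2; return 1 ;;
    esac
}

# next_release pacific  -> quincy  (never pacific -> reef directly)
```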

Pre-Upgrade Checks

Before upgrading Ceph, verify the current state of your cluster and ensure it is healthy:

# Check the Ceph version installed on this node
ceph version

# Check the versions of all running daemons cluster-wide
ceph versions

# Verify cluster health
ceph -s
ceph health detail

# Check OSD status
ceph osd tree
ceph osd df

# Verify all PGs are active+clean
ceph pg stat

The cluster must show HEALTH_OK or at most minor warnings before you begin. If there are degraded PGs, unhealthy OSDs, or ongoing recovery operations, resolve those first. Upgrading a degraded cluster dramatically increases the risk of data loss.
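These checks are easy to wrap in a small guard that refuses to continue unless the cluster is clean. A sketch, assuming the status code is the first word of `ceph health` output:

```shell
# require_health_ok: refuse to proceed unless the cluster reports
# HEALTH_OK. Takes the status code as an argument so the logic can be
# exercised without a live cluster.
require_health_ok() {
    if [ "$1" = "HEALTH_OK" ]; then
        echo "Cluster healthy, safe to proceed"
    else
        echo "Refusing to upgrade: cluster reports $1" >&2
        return 1
    fi
}

# Live usage:
#   require_health_ok "$(ceph health | awk '{print $1}')" || exit 1
```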

Setting the noout Flag

During the upgrade, OSDs will be restarted one at a time. Setting the noout flag tells Ceph not to mark the briefly-down OSDs as out of the cluster, which would otherwise trigger unnecessary data rebalancing:

# Set noout flag before starting
ceph osd set noout

# Verify the flag is active
ceph osd dump | grep noout

# You will remove this flag after ALL upgrades are complete
# ceph osd unset noout

This is a critical step. Without noout, each OSD restart triggers a recovery storm that dramatically slows the upgrade and puts unnecessary load on the cluster.
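Before installing any new packages, the Ceph apt repository on every node must point at the target release. A sketch for the Reef hop, assuming PVE 8 on Debian bookworm with the no-subscription repository (adjust the release and distribution names for your hop):

```shell
# Point the Ceph repository at the target release (run on every node)
echo "deb http://download.proxmox.com/debian/ceph-reef bookworm no-subscription" \
    > /etc/apt/sources.list.d/ceph.list

apt update
```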

Upgrading Monitor and Manager Daemons First

The upgrade order within a Ceph release is: monitors (mon) first, then managers (mgr), then OSDs, and finally any MDS or RGW daemons. Upgrade monitors on each node one at a time:

# On each node, update Ceph packages
apt update
apt install ceph-mon ceph-mgr

# Restart the monitor on this node
systemctl restart ceph-mon@$(hostname -s)

# Wait for the monitor to rejoin quorum
ceph mon stat

# Restart the manager on this node
systemctl restart ceph-mgr@$(hostname -s)

# Verify manager status
ceph mgr dump | grep active

Wait for each monitor to fully rejoin the quorum before moving to the next node. Running ceph mon stat should show all monitors as part of the quorum.
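The quorum wait can be automated by counting the monitors currently in quorum. A sketch assuming `jq` is installed; the expected count of 3 is an example for a three-monitor cluster:

```shell
# quorum_size: count the monitors currently in quorum, given the
# output of `ceph quorum_status -f json` on stdin (requires jq).
quorum_size() {
    jq '.quorum | length'
}

# Against a live cluster with 3 monitors:
#   until [ "$(ceph quorum_status -f json | quorum_size)" -eq 3 ]; do
#       echo "Waiting for monitor quorum..."
#       sleep 5
#   done
```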

OSD-by-OSD Restart Strategy

OSDs must be restarted individually, with a health check between each restart. This is the most time-consuming part but ensures data safety:

# On each node, install updated OSD packages
apt install ceph-osd

# List OSDs on this node
ceph osd ls-tree $(hostname -s)

# Restart each OSD one at a time
systemctl restart ceph-osd@0
sleep 10
ceph -s  # Wait for health to stabilize

systemctl restart ceph-osd@1
sleep 10
ceph -s

# Repeat for each OSD on this node

For large clusters, you can script this process but always include a health check loop between restarts:

#!/bin/bash
# Restart each OSD on this node one at a time, waiting for
# placement groups to settle between restarts.
for osd_id in $(ceph osd ls-tree "$(hostname -s)"); do
    echo "Restarting OSD.$osd_id"
    systemctl restart "ceph-osd@$osd_id"

    # Give the OSD time to start and rejoin
    sleep 15

    # Wait until no PGs are degraded or peering. A plain `ceph health`
    # check is not enough here: the noout flag keeps the cluster in
    # HEALTH_WARN for the entire upgrade, so check PG states instead.
    while ceph pg stat | grep -qE 'degraded|peering|activating'; do
        echo "Waiting for PGs to become active+clean..."
        sleep 10
    done

    echo "OSD.$osd_id restarted successfully"
done

Verifying Versions After Upgrade

After all daemons have been upgraded, verify that every daemon is running the same version:

# Show versions of all running daemons
ceph versions

# Expected output should show all daemons on the new version
# {
#     "mon": { "ceph version 18.2.x (hash) reef (stable)": 3 },
#     "mgr": { "ceph version 18.2.x (hash) reef (stable)": 3 },
#     "osd": { "ceph version 18.2.x (hash) reef (stable)": 12 }
# }

# Remove the noout flag
ceph osd unset noout

# Verify cluster health
ceph -s
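The version check can also be scripted by counting the distinct versions in the overall section of `ceph versions -f json`. A sketch assuming `jq` is installed:

```shell
# check_versions: read `ceph versions -f json` on stdin and fail if
# more than one Ceph version is still running (requires jq).
check_versions() {
    distinct=$(jq '.overall | length')
    if [ "$distinct" -eq 1 ]; then
        echo "OK: all daemons on one version"
    else
        echo "WARNING: $distinct distinct versions still running"
        return 1
    fi
}

# Live usage:
#   ceph versions -f json | check_versions || echo "upgrade incomplete"
```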

CRUSH Map Compatibility and Health Warnings

After upgrading, you may see CRUSH map compatibility warnings. Newer Ceph versions support updated CRUSH features that can be enabled once all daemons are upgraded:

# Require all OSDs to run at least the new release
# (enables release-specific features and on-disk format changes)
ceph osd require-osd-release reef

# Set the minimum client release the cluster will accept
# (luminous or newer is required for the upmap balancer)
ceph osd set-require-min-compat-client luminous

# If you see stretch mode or balancer warnings
ceph balancer status
ceph balancer mode upmap

Common health warnings after upgrade include DAEMON_OLD_VERSION (if any daemon was missed), OSD_FLAGS (if you forgot to unset noout), and AUTH_INSECURE_GLOBAL_ID_RECLAIM (which should be resolved by disabling insecure reclaim after verifying all clients are updated).

# Fix AUTH_INSECURE_GLOBAL_ID_RECLAIM warning
ceph config set mon auth_allow_insecure_global_id_reclaim false

Managing Ceph upgrades across a multi-node Proxmox cluster is one of the more complex maintenance operations you will perform. Having a clear view of your cluster's health during the process is invaluable. ProxmoxR provides monitoring dashboards that can help you track OSD status and cluster health in real time as you work through each node's upgrade sequence.
