KVM: Enable HA heartbeat on ShareMountPoint#12773
KVM: Enable HA heartbeat on ShareMountPoint#12773weizhouapache wants to merge 5 commits intoapache:4.22from
Conversation
296835a to
93b95aa
Compare
Codecov Report✅ All modified and coverable lines are covered by tests.
Additional details and impacted files@@ Coverage Diff @@
## 4.22 #12773 +/- ##
=============================================
- Coverage 17.61% 3.70% -13.91%
=============================================
Files 5917 448 -5469
Lines 531402 38028 -493374
Branches 64971 7036 -57935
=============================================
- Hits 93586 1409 -92177
+ Misses 427262 36432 -390830
+ Partials 10554 187 -10367
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
@blueoranguran package |
There was a problem hiding this comment.
Pull request overview
Enhances KVM HA heartbeat to support SharedMountPoint primary storage by introducing a new heartbeat script and extending HA support checks across KVM storage and monitor components.
Changes:
- Added a new KVM “SharedMountPoint” heartbeat script (
kvmsmpheartbeat.sh) that writes heartbeat files to a mountpoint. - Expanded HA-supported storage pool types to include
SharedMountPointin driver, monitor, and pool logic. - Added a
setTypehook toKVMStoragePooland set the pool type after creation / retrieval to ensure HA detection works.
Reviewed changes
Copilot reviewed 7 out of 7 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| scripts/vm/hypervisor/kvm/kvmsmpheartbeat.sh | New local mountpoint-based heartbeat writer/checker script for SharedMountPoint HA. |
| plugins/storage/volume/default/.../CloudStackPrimaryDataStoreDriverImpl.java | Allows HA support for SharedMountPoint in primary storage capability logic. |
| plugins/hypervisors/kvm/.../LibvirtStoragePool.java | Enables HA support for SharedMountPoint and selects the correct heartbeat script. |
| plugins/hypervisors/kvm/.../KVMStoragePoolManager.java | Sets pool type after creation; exposes getStorageAdaptor. |
| plugins/hypervisors/kvm/.../KVMStoragePool.java | Introduces a default setType method for backward compatibility. |
| plugins/hypervisors/kvm/.../LibvirtCheckVMActivityOnStoragePoolCommandWrapper.java | Ensures retrieved pool has its type set before HA checks. |
| plugins/hypervisors/kvm/.../KVMHAMonitor.java | Expands heartbeat monitoring to include SharedMountPoint pools. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if [ ! -f "$hbFile" ]; then | ||
| # signal large difference if file missing | ||
| return 999999 | ||
| fi | ||
| now=$(date +%s) | ||
| hb=$(cat "$hbFile" 2>/dev/null) | ||
| if [ -z "$hb" ]; then | ||
| return 999998 | ||
| fi | ||
| diff=`expr $now - $hb 2>/dev/null` | ||
| if [ $? -ne 0 ] | ||
| then | ||
| return 999997 | ||
| fi | ||
| if [ -z "$interval" ]; then | ||
| # if no interval provided, consider 0 as success | ||
| if [ $diff -gt 0 ]; then | ||
| return $diff | ||
| else | ||
| return 0 | ||
| fi | ||
| fi | ||
| if [ $diff -gt $interval ] | ||
| then | ||
| return $diff | ||
| fi | ||
| return 0 | ||
| } | ||
|
|
||
| if [ "$rflag" == "1" ] | ||
| then | ||
| check_hbLog | ||
| diff=$? | ||
| if [ $diff == 0 ] |
There was a problem hiding this comment.
return in bash only supports exit codes 0–255; values like 999999 will wrap modulo 256, so diff=$? will not contain the intended “seconds ago” value and can lead to incorrect DEAD/ALIVE reporting. Consider having check_hbLog print the computed diff (e.g., via echo) and use its return code only as success/failure, or store the diff in a global variable and return 0/1.
| if [ ! -f "$hbFile" ]; then | |
| # signal large difference if file missing | |
| return 999999 | |
| fi | |
| now=$(date +%s) | |
| hb=$(cat "$hbFile" 2>/dev/null) | |
| if [ -z "$hb" ]; then | |
| return 999998 | |
| fi | |
| diff=`expr $now - $hb 2>/dev/null` | |
| if [ $? -ne 0 ] | |
| then | |
| return 999997 | |
| fi | |
| if [ -z "$interval" ]; then | |
| # if no interval provided, consider 0 as success | |
| if [ $diff -gt 0 ]; then | |
| return $diff | |
| else | |
| return 0 | |
| fi | |
| fi | |
| if [ $diff -gt $interval ] | |
| then | |
| return $diff | |
| fi | |
| return 0 | |
| } | |
| if [ "$rflag" == "1" ] | |
| then | |
| check_hbLog | |
| diff=$? | |
| if [ $diff == 0 ] | |
| hb_diff=0 | |
| if [ ! -f "$hbFile" ]; then | |
| # signal large difference if file missing | |
| hb_diff=999999 | |
| return 1 | |
| fi | |
| now=$(date +%s) | |
| hb=$(cat "$hbFile" 2>/dev/null) | |
| if [ -z "$hb" ]; then | |
| hb_diff=999998 | |
| return 1 | |
| fi | |
| diff=`expr $now - $hb 2>/dev/null` | |
| if [ $? -ne 0 ] | |
| then | |
| hb_diff=999997 | |
| return 1 | |
| fi | |
| if [ -z "$interval" ]; then | |
| # if no interval provided, consider 0 as success | |
| if [ $diff -gt 0 ]; then | |
| hb_diff=$diff | |
| return 1 | |
| else | |
| hb_diff=0 | |
| return 0 | |
| fi | |
| fi | |
| if [ $diff -gt $interval ] | |
| then | |
| hb_diff=$diff | |
| return 1 | |
| fi | |
| hb_diff=0 | |
| return 0 | |
| } | |
| if [ "$rflag" == "1" ] | |
| then | |
| check_hbLog | |
| status=$? | |
| diff="${hb_diff:-0}" | |
| if [ $status -eq 0 ] |
| if [ $diff == 0 ] | ||
| then | ||
| echo "=====> ALIVE <=====" | ||
| else | ||
| echo "=====> Considering host as DEAD because last write on [$hbFile] was [$diff] seconds ago, but the max interval is [$interval] <======" | ||
| fi |
There was a problem hiding this comment.
This is doing a numeric comparison but uses the string operator == and unquoted $diff. Use numeric operators (-eq, -ne) and quote variables in [ tests (or switch to [[ ... ]]) to avoid mis-comparisons or test errors if the variable is empty/unset.
| if [ $? -gt 0 ] | ||
| then | ||
| return | ||
| fi |
There was a problem hiding this comment.
$? here reflects only the exit status of the last command in the pipeline (awk), not whether ps/grep matched anything. As written, this check won’t behave as intended and is effectively redundant with the later -z "$vmPids" guard. Consider removing this $? block, or use set -o pipefail (with care for script-wide impact) or replace the pipeline with a more direct PID lookup (e.g., pgrep) and check for empty output.
| if [ $? -gt 0 ] | |
| then | |
| return | |
| fi |
| mounts=$(cat /proc/mounts | grep "$MountPoint") | ||
| if [ $? -gt 0 ] | ||
| then | ||
| # mount point not present — we don't remount in local-only script | ||
| # nothing to do here; keep for compatibility with original flow | ||
| : | ||
| else |
There was a problem hiding this comment.
grep "$MountPoint" can produce false positives when $MountPoint is a substring of another mount target (e.g., /mnt/cloudstack1 matching /mnt/cloudstack10). This can incorrectly treat the mount as present and trigger deleteVMs. Consider using a stricter mount check (e.g., matching the mount target field exactly or using findmnt).
There was a problem hiding this comment.
| mounts=$(cat /proc/mounts | grep "$MountPoint") | |
| if [ $? -gt 0 ] | |
| then | |
| # mount point not present — we don't remount in local-only script | |
| # nothing to do here; keep for compatibility with original flow | |
| : | |
| else | |
| mounts=$(cat /proc/mounts | grep "$MountPoint ") | |
| if [ $? -gt 0 ] | |
| then | |
| # mount point not present — we don't remount in local-only script | |
| # nothing to do here; keep for compatibility with original flow | |
| : | |
| else |
...ns/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStoragePoolManager.java
Outdated
Show resolved
Hide resolved
|
|
||
| public class KVMHAMonitor extends KVMHABase implements Runnable { | ||
|
|
||
| public static final List<StoragePoolType> STORAGE_POOL_TYPES_WITH_HA_SUPPORT = List.of(StoragePoolType.NetworkFilesystem, StoragePoolType.SharedMountPoint); |
There was a problem hiding this comment.
HA-supported pool types are now defined in multiple places (e.g., also in CloudStackPrimaryDataStoreDriverImpl), and LibvirtStoragePool depends on KVMHAMonitor just to reuse this constant. Consider centralizing this list in a small shared utility/helper in an appropriate common package for the KVM plugin (or a single authoritative location) to avoid duplication and reduce cross-class coupling.
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✖️ debian ✔️ suse15. SL-JID 17048 |
|
@blueorangutan package |
|
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress. |
|
Packaging result [SF]: ✔️ el8 ✔️ el9 ✔️ el10 ✔️ debian ✔️ suse15. SL-JID 17054 |
|
@blueorangutan test |
|
@weizhouapache a [SL] Trillian-Jenkins test job (ol8 mgmt + kvm-ol8) has been kicked to run smoke tests |
|
@sureshanaparti @rajujith |
|
[SF] Trillian test result (tid-15586)
|
…/storage/KVMStoragePoolManager.java
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/KVMHAMonitor.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (2)
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/storage/KVMStoragePoolManager.java:399
isPoolSupportHA()now includesSharedMountPoint, so this block will add SharedMountPoint primary pools to_haMonitor. However,deleteStoragePool(...)below still removes pools from_haMonitoronly whentype == NetworkFilesystem, which can leave SharedMountPoint pools registered in the HA monitor after deletion/disconnect. Update the removal condition to stay consistent with the expanded HA-support set.
private synchronized KVMStoragePool createStoragePool(String name, String host, int port, String path, String userInfo, StoragePoolType type, Map<String, String> details, boolean primaryStorage) {
StorageAdaptor adaptor = getStorageAdaptor(type);
KVMStoragePool pool = adaptor.createStoragePool(name, host, port, path, userInfo, type, details, primaryStorage);
pool.setType(type);
// LibvirtStorageAdaptor-specific statement
if (pool.isPoolSupportHA() && primaryStorage) {
KVMHABase.HAStoragePool storagePool = new KVMHABase.HAStoragePool(pool, host, path, PoolType.PrimaryStorage);
_haMonitor.addStoragePool(storagePool);
}
plugins/hypervisors/kvm/src/main/java/com/cloud/hypervisor/kvm/resource/KVMHAMonitor.java:145
- The debug log messages in this method are now used for both
NetworkFilesystemandSharedMountPoint, but still say "Found NFS storage pool". Also, this "Found ... continuing" message is logged even after the pool is marked for removal (e.g. whenstorage == null). Adjust the log text/placement to reflect "libvirt storage pool" and only log "found" when the pool is actually present and running.
private void checkForNotExistingLibvirtStoragePools(Set<String> removedPools, String uuid) {
try {
Connect conn = LibvirtConnection.getConnection();
StoragePool storage = conn.storagePoolLookupByUUIDString(uuid);
if (storage == null || storage.getInfo().state != StoragePoolState.VIR_STORAGE_POOL_RUNNING) {
if (storage == null) {
logger.debug(String.format("Libvirt storage pool [%s] not found, removing from HA list.", uuid));
} else {
logger.debug(String.format("Libvirt storage pool [%s] found, but not running, removing from HA list.", uuid));
}
removedPools.add(uuid);
}
logger.debug(String.format("Found NFS storage pool [%s] in libvirt, continuing.", uuid));
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| mounts=$(cat /proc/mounts | grep "$MountPoint") | ||
| if [ $? -gt 0 ] | ||
| then | ||
| # mount point not present — we don't remount in local-only script | ||
| # nothing to do here; keep for compatibility with original flow | ||
| : | ||
| else | ||
| # mount exists; if not in read-check mode, consider deleting VMs similar to original behavior | ||
| if [ "$rflag" == "0" ] | ||
| then | ||
| deleteVMs $MountPoint | ||
| fi |
There was a problem hiding this comment.
The mount-point detection uses grep "$MountPoint" on /proc/mounts, which can yield false positives when the mount point is a prefix of another path (e.g. /mnt/cloudstack1 matching /mnt/cloudstack10). This can cause the script to treat the mount as present/absent incorrectly and trigger follow-on actions. Use an exact match on the mount target field (e.g. findmnt -n -o TARGET --target "$MountPoint" or matching the second column in /proc/mounts).
| mounts=$(cat /proc/mounts | grep "$MountPoint") | |
| if [ $? -gt 0 ] | |
| then | |
| # mount point not present — we don't remount in local-only script | |
| # nothing to do here; keep for compatibility with original flow | |
| : | |
| else | |
| # mount exists; if not in read-check mode, consider deleting VMs similar to original behavior | |
| if [ "$rflag" == "0" ] | |
| then | |
| deleteVMs $MountPoint | |
| fi | |
| if grep -q "^[^ ]\+ $MountPoint " /proc/mounts | |
| then | |
| # mount exists; if not in read-check mode, consider deleting VMs similar to original behavior | |
| if [ "$rflag" == "0" ] | |
| then | |
| deleteVMs $MountPoint | |
| fi | |
| else | |
| # mount point not present — we don't remount in local-only script | |
| # nothing to do here; keep for compatibility with original flow | |
| : |
| # mount exists; if not in read-check mode, consider deleting VMs similar to original behavior | ||
| if [ "$rflag" == "0" ] | ||
| then | ||
| deleteVMs $MountPoint | ||
| fi | ||
| fi | ||
|
|
There was a problem hiding this comment.
deleteVMs is executed when the mount exists (the grep succeeds) and the script is in write mode. In the original kvmheartbeat.sh, VMs are only killed after a remount attempt when the expected mount is missing. As written, this can kill running VMs on every successful heartbeat run whenever the storage is mounted. The condition should be inverted/adjusted so VM killing is only done for the intended failure/remount scenario (or removed entirely for the local-only heartbeat flow).
| # mount exists; if not in read-check mode, consider deleting VMs similar to original behavior | |
| if [ "$rflag" == "0" ] | |
| then | |
| deleteVMs $MountPoint | |
| fi | |
| fi | |
| # mount exists; in local-only heartbeat we do not kill VMs based on mount state | |
| # original kvmheartbeat.sh only deleted VMs after a failed remount; there is | |
| # no such remount logic here, so VM deletion is intentionally disabled | |
| : | |
| fi |
| return 999999 | ||
| fi | ||
| now=$(date +%s) | ||
| hb=$(cat "$hbFile" 2>/dev/null) | ||
| if [ -z "$hb" ]; then | ||
| return 999998 | ||
| fi | ||
| diff=`expr $now - $hb 2>/dev/null` | ||
| if [ $? -ne 0 ] | ||
| then | ||
| return 999997 | ||
| fi | ||
| if [ -z "$interval" ]; then | ||
| # if no interval provided, consider 0 as success | ||
| if [ $diff -gt 0 ]; then | ||
| return $diff | ||
| else | ||
| return 0 | ||
| fi | ||
| fi | ||
| if [ $diff -gt $interval ] | ||
| then | ||
| return $diff | ||
| fi | ||
| return 0 | ||
| } | ||
|
|
||
| if [ "$rflag" == "1" ] | ||
| then | ||
| check_hbLog | ||
| diff=$? | ||
| if [ $diff == 0 ] |
There was a problem hiding this comment.
check_hbLog returns large numeric values (e.g. return 999999 / return $diff) and the caller reads them via $?. In bash, function return/exit codes are limited to 0-255, so values wrap modulo 256 and can incorrectly report ALIVE (e.g. a diff of 512 seconds becomes exit code 0). Use stdout to pass the diff (or store it in a variable) and use the exit status purely as a boolean (alive/dead).
| return 999999 | |
| fi | |
| now=$(date +%s) | |
| hb=$(cat "$hbFile" 2>/dev/null) | |
| if [ -z "$hb" ]; then | |
| return 999998 | |
| fi | |
| diff=`expr $now - $hb 2>/dev/null` | |
| if [ $? -ne 0 ] | |
| then | |
| return 999997 | |
| fi | |
| if [ -z "$interval" ]; then | |
| # if no interval provided, consider 0 as success | |
| if [ $diff -gt 0 ]; then | |
| return $diff | |
| else | |
| return 0 | |
| fi | |
| fi | |
| if [ $diff -gt $interval ] | |
| then | |
| return $diff | |
| fi | |
| return 0 | |
| } | |
| if [ "$rflag" == "1" ] | |
| then | |
| check_hbLog | |
| diff=$? | |
| if [ $diff == 0 ] | |
| echo 999999 | |
| return 1 | |
| fi | |
| now=$(date +%s) | |
| hb=$(cat "$hbFile" 2>/dev/null) | |
| if [ -z "$hb" ]; then | |
| echo 999998 | |
| return 1 | |
| fi | |
| diff=`expr $now - $hb 2>/dev/null` | |
| if [ $? -ne 0 ] | |
| then | |
| echo 999997 | |
| return 1 | |
| fi | |
| if [ -z "$interval" ]; then | |
| # if no interval provided, consider 0 as success | |
| if [ $diff -gt 0 ]; then | |
| echo "$diff" | |
| return 1 | |
| else | |
| echo 0 | |
| return 0 | |
| fi | |
| fi | |
| if [ $diff -gt $interval ] | |
| then | |
| echo "$diff" | |
| return 1 | |
| fi | |
| echo 0 | |
| return 0 | |
| } | |
| if [ "$rflag" == "1" ] | |
| then | |
| diff=$(check_hbLog) | |
| status=$? | |
| if [ "$status" -eq 0 ] |
Description
This PR improves KVM HA heartbeat to support SharedMountPoint
Tested
Example of heartbeat files
Types of changes
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
Bug Severity
Screenshots (if appropriate):
How Has This Been Tested?
How did you try to break this feature and the system with this change?