Skip to content

Comments

Add I/O statistics on Linux#5

Open
MitchLewis930 wants to merge 1 commit intopr_015_beforefrom
pr_015_after
Open

Add I/O statistics on Linux#5
MitchLewis930 wants to merge 1 commit intopr_015_beforefrom
pr_015_after

Conversation

@MitchLewis930
Copy link

@MitchLewis930 MitchLewis930 commented Jan 30, 2026

PR_015


Note

Medium Risk
Touches monitoring stats generation and FsInfo serialization/output, which can affect node stats API compatibility and stability. Logic is Linux-gated and mostly best-effort (returns null on errors), reducing blast radius but still worth careful review.

Overview
Adds Linux-only disk I/O reporting to node fs stats by mapping each data path to its backing block device (major/minor) and sampling /proc/diskstats to compute per-interval read/write ops and KB, plus aggregated totals.

This extends FsInfo wire/XContent formats with an optional io_stats section (new DeviceStats/IoStats), updates FsProbe/FsService to pass prior samples for delta calculations, and caches device identifiers from /proc/self/mountinfo via new lucene:*device_number filestore attributes.

Includes security policy changes to allow reading /proc/diskstats, updates affected tests/fixtures for the new FsInfo constructor/serialization, and adds new unit tests for DeviceStats/FsProbe I/O parsing/deltas.

Written by Cursor Bugbot for commit ecce53f. This will update automatically on new commits. Configure here.

This commit adds a variety of real disk metrics for the block devices
that back Elasticsearch data paths. A collection of statistics are read
from /proc/diskstats and are used to report the raw metrics for
operations and read/write bytes.

Relates elastic#15915
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 5 potential issues.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

builder.field(WRITE_OPERATIONS, totalWriteOperations);
builder.field(READ_KILOBYTES, totalReadKilobytes);
builder.field(WRITE_KILOBYTES, totalWriteKilobytes);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing endObject() call for "total" in IoStats serialization

High Severity

The IoStats.toXContent() method calls builder.startObject("total") on line 399 but never calls the corresponding builder.endObject(). This results in malformed JSON/XContent output where the "total" object is never closed, causing serialization errors when generating filesystem statistics.

Fix in Cursor Fix in Web

} catch (Exception e) {
majorDeviceNumber = -1;
minorDeviceNumber = -1;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Device numbers uninitialized when mount point not found

Medium Severity

The majorDeviceNumber and minorDeviceNumber fields default to 0 when no matching mount point is found in /proc/self/mountinfo. The code uses -1 to indicate "unknown" (as seen in the exception handler), but if the loop completes without finding a match, the values remain at 0. This inconsistency causes FsProbe to treat (0, 0) as a valid device number since it checks for != -1.

Fix in Cursor Fix in Web

} catch (IOException e) {
logger.debug("unexpected exception reading filesystem info", e);
return null;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cache refresh returns null violating SingleObjectCache contract

High Severity

The stats() helper method returns null on IOException, and FsInfoCache.refresh() propagates this null. However, SingleObjectCache requires refresh() to never return null—this is enforced by assertions and documented in its contract. With assertions enabled, this causes AssertionError. With assertions disabled, the null value is cached and returned to callers, causing NullPointerException. The previous code returned an empty FsInfo object instead of null.

Additional Locations (1)

Fix in Cursor Fix in Web

permission java.io.FilePermission "/proc/sys/vm/max_map_count", "read";

// io stats on Linux
permission java.io.FilePermission "/proc/diskstats", "read";
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing security permission for /proc/self/mountinfo file access

High Severity

The new code in ESFileStore reads /proc/self/mountinfo to obtain device numbers, but security.policy only adds permission for /proc/diskstats, not /proc/self/mountinfo. When running with a SecurityManager, this causes a SecurityException that's silently caught, setting device numbers to -1. This prevents any devices from being matched, completely disabling the I/O statistics feature in production.

Additional Locations (1)

Fix in Cursor Fix in Web

private final FsProbe probe;

private final SingleObjectCache<FsInfo> fsStatsCache;
private final TimeValue refreshInterval;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unused field stored after constructor

Low Severity

The refreshInterval field is assigned in the constructor (line 49), logged (line 50), and passed to FsInfoCache (line 51), but never accessed after construction. It should be a local variable in the constructor instead of a class field.

Fix in Cursor Fix in Web

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants