Description
Describe the bug
An NPE occurs in the task pipeline because the key sets of two maps do not match.
_participantActiveTaskCount is populated with all live and enabled instances in resetActiveTaskCount. If it is first populated while node X is already disabled (e.g., during an evacuate), it will not contain node X. However, if node X still has a task current state, currentStateOutput.getPartitionCountWithPendingState returns a map that does contain node X as a key. fillActiveTaskCount(..) iterates over the keys of that current-state map and looks up each key in both the current-state map and _participantActiveTaskCount (live + enabled). For node X, _participantActiveTaskCount.get(participant) returns null, and unboxing that null for the + operator throws an NPE.
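The failure mode can be reproduced in isolation with two plain maps. The sketch below is a simplified, hypothetical model: the class and method names (ActiveTaskCountNpeDemo, initFromLiveEnabled, fillActiveTaskCountUnsafe) and the node names are illustrative and are not Helix code. It only mirrors the pattern described above: a count map keyed by live and enabled instances, and a loop over current-state keys that adds to the value stored for the same key.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the two maps involved; not Helix's actual code.
public class ActiveTaskCountNpeDemo {

  // Builds the count map from live and enabled instances only,
  // analogous to what the issue describes for resetActiveTaskCount.
  static Map<String, Integer> initFromLiveEnabled(List<String> liveEnabledInstances) {
    Map<String, Integer> counts = new HashMap<>();
    for (String instance : liveEnabledInstances) {
      counts.put(instance, 0);
    }
    return counts;
  }

  // Mirrors the problematic pattern: iterate over the current-state keys and
  // add to the value stored under the same key in the live+enabled map.
  static void fillActiveTaskCountUnsafe(Map<String, Integer> activeTaskCount,
      Map<String, Integer> pendingCountByInstance) {
    for (String participant : pendingCountByInstance.keySet()) {
      // activeTaskCount.get(participant) is null for a disabled instance;
      // unboxing it for "+" throws NullPointerException.
      activeTaskCount.put(participant,
          activeTaskCount.get(participant) + pendingCountByInstance.get(participant));
    }
  }

  public static void main(String[] args) {
    Map<String, Integer> activeTaskCount = initFromLiveEnabled(List.of("nodeA", "nodeB"));
    Map<String, Integer> pending = new HashMap<>();
    pending.put("nodeA", 2);
    pending.put("nodeX", 1); // disabled (evacuating) node that still has a task current state
    try {
      fillActiveTaskCountUnsafe(activeTaskCount, pending);
    } catch (NullPointerException e) {
      System.out.println("NPE: " + e.getMessage());
    }
  }
}
```

On JDKs with helpful NullPointerException messages enabled, the printed message should resemble the one in the stack trace below, since the null comes from Map.get being unboxed via Integer.intValue().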
The fix should either handle this mismatch safely or prevent the mismatch entirely.
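Both options can be sketched against the same simplified model. This is a hypothetical illustration, not a proposed patch: the names (ActiveTaskCountFixSketch, fillSkippingUnknown, initFromUnion) are made up, and the sketch does not model how the task scheduler consumes the count map. Option 1 skips participants that are not live and enabled, so tasks on a disabled node are not counted. Option 2 seeds the map with the union of live+enabled instances and instances that still have task current state, so every lookup finds a key but a disabled node gets an entry. Which behavior is correct depends on how the scheduler uses these counts.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the two fix options; names are illustrative, not Helix APIs.
public class ActiveTaskCountFixSketch {

  // Option 1: handle the mismatch safely. computeIfPresent only updates keys that
  // already exist, so participants missing from the live+enabled map are skipped.
  static void fillSkippingUnknown(Map<String, Integer> activeTaskCount,
      Map<String, Integer> pendingCountByInstance) {
    for (Map.Entry<String, Integer> entry : pendingCountByInstance.entrySet()) {
      activeTaskCount.computeIfPresent(entry.getKey(),
          (participant, count) -> count + entry.getValue());
    }
  }

  // Option 2: prevent the mismatch. Seed the count map with every instance that is
  // live+enabled OR still has task current state, so every key looked up exists.
  static Map<String, Integer> initFromUnion(Set<String> liveEnabledInstances,
      Set<String> instancesWithCurrentState) {
    Set<String> all = new HashSet<>(liveEnabledInstances);
    all.addAll(instancesWithCurrentState);
    Map<String, Integer> counts = new HashMap<>();
    for (String instance : all) {
      counts.put(instance, 0);
    }
    return counts;
  }

  public static void main(String[] args) {
    Map<String, Integer> pending = new HashMap<>();
    pending.put("nodeA", 2);
    pending.put("nodeX", 1); // disabled node that still has a task current state

    Map<String, Integer> skipped = new HashMap<>(Map.of("nodeA", 0, "nodeB", 0));
    fillSkippingUnknown(skipped, pending);
    System.out.println(skipped); // nodeX absent, nodeA=2

    Map<String, Integer> union = initFromUnion(Set.of("nodeA", "nodeB"), pending.keySet());
    for (Map.Entry<String, Integer> entry : pending.entrySet()) {
      // Never null here: every pending key was seeded by initFromUnion.
      union.put(entry.getKey(), union.get(entry.getKey()) + entry.getValue());
    }
    System.out.println(union); // nodeX present with 1
  }
}
```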
To Reproduce
Not verified, but likely:
- Initiate an evacuate on a node that has an active task current state in either INIT or RUNNING
- Reset the controller so that it rebuilds _participantActiveTaskCount
- An NPE will then occur, because the current state map and the map generated from live and enabled instances have different keys
Expected behavior
The controller should not throw an NPE when the two key sets do not match.
Additional context
Stack trace:
"message": "Exception while executing TASK pipeline for cluster <cluster_name_here>. Will not continue to next pipeline",
"exceptionChain": [
{
"index": 0,
"message": "Cannot invoke \"java.lang.Integer.intValue()\" because the return value of \"java.util.Map.get(Object)\" is null",
"stackTrace": [
{
"index": 0,
"call": "fillActiveTaskCount",
"columnNumber": null,
"fileName": "WorkflowControllerDataProvider.java",
"lineNumber": 192,
"nativeMethod": false,
"source": "org.apache.helix.controller.dataproviders.WorkflowControllerDataProvider"
},
{
"index": 1,
"call": "resetActiveTaskCount",
"columnNumber": null,
"fileName": "WorkflowControllerDataProvider.java",
"lineNumber": 178,
"nativeMethod": false,
"source": "org.apache.helix.controller.dataproviders.WorkflowControllerDataProvider"
},
{
"index": 2,
"call": "process",
"columnNumber": null,
"fileName": "TaskSchedulingStage.java",
"lineNumber": 81,
"nativeMethod": false,
"source": "org.apache.helix.controller.stages.task.TaskSchedulingStage"
},
{
"index": 3,
"call": "handle",
"columnNumber": null,
"fileName": "Pipeline.java",
"lineNumber": 75,
"nativeMethod": false,
"source": "org.apache.helix.controller.pipeline.Pipeline"
},
{
"index": 4,
"call": "handleEvent",
"columnNumber": null,
"fileName": "GenericHelixController.java",
"lineNumber": 905,
"nativeMethod": false,
"source": "org.apache.helix.controller.GenericHelixController"
},
{
"index": 5,
"call": "run",
"columnNumber": null,
"fileName": "GenericHelixController.java",
"lineNumber": 1556,
"nativeMethod": false,
"source": "org.apache.helix.controller.GenericHelixController$ClusterEventProcessor"
}
],
"type": "java.lang.NullPointerException"
}
],
"level": "ERROR",