NPE occurring in task pipeline if disabled instance has task current state #3079

@GrantPSpencer

Describe the bug

An NPE occurs in the task pipeline when the key sets of two maps differ: _participantActiveTaskCount (live and enabled instances) and the map built from task current states.

_participantActiveTaskCount is populated with all live and enabled instances as part of resetActiveTaskCount. If it is first populated when node X is already disabled (e.g., during an evacuate), it will not contain node X. However, if node X still has a task current state on it, currentStateOutput.getPartitionCountWithPendingState returns a map that does contain that key. In fillActiveTaskCount(..), we iterate over the keys of the current state map and look up each key in both that map and the _participantActiveTaskCount (live + enabled) map. For node X, _participantActiveTaskCount.get(participant) returns null, and unboxing that null for the arithmetic (+ operator) throws an NPE.
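The mismatch described above can be sketched in isolation. This is a minimal illustration, not the actual Helix code: the class name, the method body, and the node names are assumptions; only the pattern (iterate one map's keys, unbox a lookup from the other) follows the description.

```java
import java.util.HashMap;
import java.util.Map;

public class KeyMismatchDemo {
  // Hypothetical stand-in for the loop in fillActiveTaskCount (assumed shape).
  static void fillActiveTaskCount(Map<String, Integer> currentStateCounts,
      Map<String, Integer> participantActiveTaskCount) {
    for (String participant : currentStateCounts.keySet()) {
      // get() returns null for a participant missing from the live+enabled map;
      // auto-unboxing that null for "+" throws NullPointerException.
      participantActiveTaskCount.put(participant,
          participantActiveTaskCount.get(participant) + currentStateCounts.get(participant));
    }
  }

  static boolean throwsNpe() {
    Map<String, Integer> active = new HashMap<>();
    active.put("nodeA", 0);  // live and enabled

    Map<String, Integer> current = new HashMap<>();
    current.put("nodeA", 2);
    current.put("nodeX", 1); // disabled, but still has a task current state

    try {
      fillActiveTaskCount(current, active);
      return false;
    } catch (NullPointerException e) {
      return true;
    }
  }

  public static void main(String[] args) {
    System.out.println("NPE thrown: " + throwsNpe());
  }
}
```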

The fix should either handle this mismatch safely or prevent the mismatch entirely.
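One way to handle the mismatch safely is sketched below, with two variants. The class and method names are hypothetical and the maps are simplified to String-to-Integer; whether a disabled instance should be skipped or counted is a design decision for the real fix, so both are shown as assumptions rather than as the intended behavior.

```java
import java.util.HashMap;
import java.util.Map;

public class SafeActiveTaskCount {
  // Variant A (assumption): ignore participants that are not in the live+enabled map.
  static void fillSkippingUnknown(Map<String, Integer> currentStateCounts,
      Map<String, Integer> participantActiveTaskCount) {
    for (Map.Entry<String, Integer> e : currentStateCounts.entrySet()) {
      // computeIfPresent only updates keys that already exist, so no null is unboxed.
      participantActiveTaskCount.computeIfPresent(e.getKey(), (k, v) -> v + e.getValue());
    }
  }

  // Variant B (assumption): also track participants that only appear in current state.
  static void fillCountingUnknown(Map<String, Integer> currentStateCounts,
      Map<String, Integer> participantActiveTaskCount) {
    for (Map.Entry<String, Integer> e : currentStateCounts.entrySet()) {
      // merge inserts the value for a missing key, otherwise sums with the old value.
      participantActiveTaskCount.merge(e.getKey(), e.getValue(), Integer::sum);
    }
  }

  // Builds the scenario from the report: nodeX is disabled but has a current state.
  static Map<String, Integer> run(boolean countUnknown) {
    Map<String, Integer> active = new HashMap<>();
    active.put("nodeA", 0);
    Map<String, Integer> current = new HashMap<>();
    current.put("nodeA", 2);
    current.put("nodeX", 1);
    if (countUnknown) {
      fillCountingUnknown(current, active);
    } else {
      fillSkippingUnknown(current, active);
    }
    return active;
  }

  public static void main(String[] args) {
    System.out.println(run(false));
    System.out.println(run(true));
  }
}
```

Variant A keeps the map restricted to live and enabled instances; Variant B lets the current state map extend it. Neither throws on a key that exists in only one map.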

To Reproduce

Did not verify, but likely:

  1. Initiate an evacuate on a node that has an active task current state in either init or running
  2. Reset the controller so that it rebuilds _participantActiveTaskCount
  3. An NPE should then occur, because the current state map contains a key that is missing from the map generated from live and enabled instances

Expected behavior

The task pipeline should not throw an NPE when the key sets of the two maps do not match.

Additional context

Stack trace:

"message": Exception while executing TASK pipeline for cluster <cluster_name_here>. Will not continue to next pipeline,
"exceptionChain": [
	{
		"index": 0,
		"message": "Cannot invoke \"java.lang.Integer.intValue()\" because the return value of \"java.util.Map.get(Object)\" is null",
		"stackTrace": [
			{
				"index": 0,
				"call": "fillActiveTaskCount",
				"columnNumber": null,
				"fileName": "WorkflowControllerDataProvider.java",
				"lineNumber": 192,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.dataproviders.WorkflowControllerDataProvider"
			},
			{
				"index": 1,
				"call": "resetActiveTaskCount",
				"columnNumber": null,
				"fileName": "WorkflowControllerDataProvider.java",
				"lineNumber": 178,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.dataproviders.WorkflowControllerDataProvider"
			},
			{
				"index": 2,
				"call": "process",
				"columnNumber": null,
				"fileName": "TaskSchedulingStage.java",
				"lineNumber": 81,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.stages.task.TaskSchedulingStage"
			},
			{
				"index": 3,
				"call": "handle",
				"columnNumber": null,
				"fileName": "Pipeline.java",
				"lineNumber": 75,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.pipeline.Pipeline"
			},
			{
				"index": 4,
				"call": "handleEvent",
				"columnNumber": null,
				"fileName": "GenericHelixController.java",
				"lineNumber": 905,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.GenericHelixController"
			},
			{
				"index": 5,
				"call": "run",
				"columnNumber": null,
				"fileName": "GenericHelixController.java",
				"lineNumber": 1556,
				"nativeMethod": false,
				"source": "org.apache.helix.controller.GenericHelixController$ClusterEventProcessor"
			}
		],
		"type": "java.lang.NullPointerException"
	}
],
"level": ERROR,

Labels: bug
