Conversation
- Overrides in SnappyTaskSchedulerImpl to track per-executor cores used by a job and cap them to the number of physical cores on a node.
- Combined some maps in TaskSchedulerImpl to recover the performance lost due to the above and to improve further compared to the base TaskSchedulerImpl.
- The property "spark.scheduler.limitJobCores=false" can be set to revert to the previous behaviour.
rishitesh
left a comment
Some comments and clarifications sought.
    bid
    case Some(b) => b._blockId = msg.blockManagerId; b
    }
    sc.taskScheduler.asInstanceOf[SnappyTaskSchedulerImpl].addBlockId(executorId, blockId)
The SnappyTaskSchedulerImpl.addBlockId() method has a condition blockId.numProcessors < blockId.executorCores. When called from here, that condition will never be satisfied.
The "case None" is for a corner one where blockManager gets added before executor. For normal cases onExecutorAdded will be invoked first where number of physical cores have been properly initialized so addBlockId will work fine. Will add the handling for that case in onExecutorAdded and invoke addBlockId from the Some() match case there.
Will also add removal in onExecutorRemoved.
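For illustration, a minimal sketch of the handling described in this reply. Only onExecutorAdded, onExecutorRemoved and addBlockId come from the discussion; the pending map, the signatures and everything else are assumptions, not the actual SnappyTaskSchedulerImpl code.

    import scala.collection.concurrent.TrieMap

    // Sketch only: resolve the corner case where the BlockManager registers before
    // the executor, and clean up on executor removal. Names other than
    // onExecutorAdded/onExecutorRemoved/addBlockId are hypothetical.
    class BlockIdTrackingSketch {
      // executorId -> blockId seen before the executor itself was registered
      private val pendingBlockIds = new TrieMap[String, String]()

      def addBlockId(executorId: String, blockId: String): Unit = {
        // the real code compares blockId.numProcessors with blockId.executorCores here
      }

      def removeBlockId(executorId: String): Unit = ()

      def onExecutorAdded(executorId: String): Unit = {
        // physical cores are properly initialized by now, so the cap can be applied
        pendingBlockIds.remove(executorId).foreach(addBlockId(executorId, _))
      }

      def onExecutorRemoved(executorId: String): Unit = {
        // undo the bookkeeping, as promised in the reply above
        removeBlockId(executorId)
        pendingBlockIds.remove(executorId)
      }
    }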
    private val lookupExecutorCores = new ToLongFunction[String] {
      override def applyAsLong(executorId: String): Long = {
        maxExecutorTaskCores.get(executorId) match {
          case null => Int.MaxValue // no restriction
Wouldn't defaultParallelism be a better choice than Int.MaxValue?
A null here means that the cores defined for the executor are less than or equal to the physical cores on the machine, or that job-core limiting has been explicitly disabled. Both cases imply the same thing, namely that no limit should be put on tasks on a node, so this essentially falls back to Spark's TaskSchedulerImpl behaviour.
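To make that reply concrete, a simplified sketch of why a null lookup can safely mean "no restriction". The limitJobCores flag, registerExecutor and the map initialization are assumptions for illustration; only maxExecutorTaskCores appears in the actual diff.

    import java.util.concurrent.ConcurrentHashMap

    // Sketch: an entry is recorded only when a cap actually applies, so a null
    // lookup means "no per-job limit" and scheduling falls back to the stock
    // TaskSchedulerImpl behaviour.
    class CoreLimitRegistrationSketch(limitJobCores: Boolean) {
      private val maxExecutorTaskCores = new ConcurrentHashMap[String, java.lang.Long]()

      def registerExecutor(executorId: String, executorCores: Int, physicalCores: Int): Unit = {
        if (limitJobCores && executorCores > physicalCores) {
          // configured cores exceed physical cores: cap a single job to physical cores
          maxExecutorTaskCores.put(executorId, java.lang.Long.valueOf(physicalCores.toLong))
        }
        // otherwise no entry is added, so lookupExecutorCores sees null => unrestricted
      }
    }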
    val manager = createTaskSetManager(taskSet, maxTaskFailures)
    val stage = taskSet.stageId
    val (stageAvailableCores, stageTaskSets) = stageCoresAndAttempts.computeIfAbsent(
      stage, createNewStageMap)
Should we not be setting the manager for the stage task set? I can see stageTaskSets(taskSet.stageAttemptId) = manager in the original TaskSchedulerImpl.
Yes, that is done below at line 112.
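For illustration, a rough sketch of the combined per-stage map and the manager registration discussed here. "Manager", createNewStageMap's shape and the value types are stand-ins, not the real implementation.

    import java.util.concurrent.ConcurrentHashMap
    import java.util.function.{Function => JFunction}
    import scala.collection.mutable

    // Sketch: a single map keyed by stageId carries both the per-executor cores
    // still available to the stage and its attemptId -> manager entries, replacing
    // the separate maps kept by the base TaskSchedulerImpl.
    object StageMapSketch {
      case class Manager(stageAttemptId: Int) // stand-in for TaskSetManager

      type StageEntry = (mutable.HashMap[String, Int],   // executorId -> available cores
                         mutable.HashMap[Int, Manager])  // attemptId  -> manager

      private val stageCoresAndAttempts = new ConcurrentHashMap[Int, StageEntry]()

      private val createNewStageMap = new JFunction[Int, StageEntry] {
        override def apply(stage: Int): StageEntry =
          (new mutable.HashMap[String, Int](), new mutable.HashMap[Int, Manager]())
      }

      def submitAttempt(stage: Int, manager: Manager): Unit = {
        val (_, stageTaskSets) = stageCoresAndAttempts.computeIfAbsent(stage, createNewStageMap)
        // the registration the review asks about (done at line 112 in the actual change)
        stageTaskSets(manager.stageAttemptId) = manager
      }
    }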
    taskIdExecutorAndManager.justPut(tid, execId -> taskSet)
    executorIdToRunningTaskIds(execId).add(tid)
    if (availableCores ne null) {
      availableCores.addValue(execId, -CPUS_PER_TASK)
Can we put an assertion here similar to assert(availableCpus(i) >= 0)? It might catch some erroneous updates.
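A self-contained sketch of where such an assertion could sit; the availableCores structure here is a plain map for illustration only, the actual one in the PR differs.

    import scala.collection.mutable

    // Sketch: decrement the per-executor available cores for a launched task and
    // assert the count never goes negative, mirroring assert(availableCpus(i) >= 0)
    // in Spark's TaskSchedulerImpl, so erroneous updates surface early.
    object AvailableCoresSketch {
      private val CPUS_PER_TASK = 1
      private val availableCores = mutable.HashMap.empty[String, Int]

      def onExecutorRegistered(execId: String, cores: Int): Unit =
        availableCores(execId) = cores

      def onTaskLaunched(execId: String): Unit = {
        availableCores(execId) = availableCores.getOrElse(execId, 0) - CPUS_PER_TASK
        assert(availableCores(execId) >= 0,
          s"negative available cores for executor $execId")
      }
    }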
Force-pushed from 8b43301 to 2b254d9
Force-pushed from 2c254f0 to 0f2888f
Force-pushed from a466d26 to ea127bd
Force-pushed from 99ec79c to c7b84fa
See some details in the JIRA: https://jira.snappydata.io/browse/SNAP-2231

These changes limit the maximum cores given to a job to the physical cores on a machine. With the default of (2 * physical cores) in the cluster, this leaves the remaining cores free for any other concurrent jobs, which is especially important for short point-lookup queries. The changes also improve performance for disk-intensive queries: for example, a 30-50% improvement was measured in TPCH load and in some queries when cores were limited to physical cores and a lot of data had overflowed to disk.

Question: should the default cores in ExecutorInitiator be increased to (4 * physical cores) to allow for more concurrency?
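As a concrete illustration of the capping arithmetic described above (the node size is made up):

    object CoreCapArithmetic extends App {
      // Illustration only: with the default of 2 * physical cores configured for an
      // executor, a single job is now capped at the physical core count, leaving the
      // remainder free for concurrent jobs such as short point-lookup queries.
      val physicalCores = 16                       // hypothetical node
      val executorCores = 2 * physicalCores        // default configuration = 32
      val maxCoresPerJob = math.min(executorCores, physicalCores)  // capped to 16
      val coresLeftForOthers = executorCores - maxCoresPerJob      // 16 stay free
      println(s"job cap = $maxCoresPerJob, free for other jobs = $coresLeftForOthers")
    }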
Changes proposed in this pull request

- Overrides in SnappyTaskSchedulerImpl to track per-executor cores used by a job and cap them to the number of physical cores on a node.
- Combined some maps in TaskSchedulerImpl to recover the performance lost due to the above and to improve further compared to the base TaskSchedulerImpl.
- The property "spark.scheduler.limitJobCores=false" can be set to revert to the previous behaviour (see the sketch below).
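A minimal sketch of reverting to the previous behaviour via the property listed above. Only the property name comes from this PR; setting it through SparkConf (rather than a properties file) and the app name are just one possible usage.

    import org.apache.spark.SparkConf

    object DisableJobCoreLimit {
      def main(args: Array[String]): Unit = {
        // Disable the per-job core cap added by this PR and fall back to the
        // stock TaskSchedulerImpl scheduling behaviour.
        val conf = new SparkConf()
          .setAppName("limit-job-cores-example")          // hypothetical app name
          .set("spark.scheduler.limitJobCores", "false")
        // pass `conf` when constructing the SparkContext / SnappyContext as usual
      }
    }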
Patch testing
precheckin -Pstore -Pspark
TODO: working on porting Spark's TaskScheduler unit tests
ReleaseNotes.txt changes
document the new property and behaviour
Other PRs
TIBCOSoftware/snappy-spark#96