Skip to content

Databricks cell timeout issue #220

@harshcurious

Description

@harshcurious

Hey @JacekPliszka ,
I have been facing some timeout issue with the plugin recently when I try to run notebooks in the interactive clusters. Essentially if there is a command running for a while (over 4-5 minutes) I get the following error:

Error: com.databricks.backend.common.rpc.CommandCancelledException
	at com.databricks.spark.chauffeur.SequenceExecutionState.$anonfun$cancel$5(SequenceExecutionState.scala:132)
	at scala.Option.getOrElse(Option.scala:189)
	at com.databricks.spark.chauffeur.SequenceExecutionState.$anonfun$cancel$3(SequenceExecutionState.scala:132)
	at com.databricks.spark.chauffeur.SequenceExecutionState.$anonfun$cancel$3$adapted(SequenceExecutionState.scala:129)
	at scala.collection.immutable.Range.foreach(Range.scala:158)
	at com.databricks.spark.chauffeur.SequenceExecutionState.cancel(SequenceExecutionState.scala:129)
	at com.databricks.spark.chauffeur.ExecContextState.abortRunningSequence(ExecContextState.scala:758)
	at com.databricks.spark.chauffeur.ExecContextState.$anonfun$shutdown$5(ExecContextState.scala:461)
	at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
	at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
	at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
	at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
	at scala.collection.TraversableLike.map(TraversableLike.scala:286)
	at scala.collection.TraversableLike.map$(TraversableLike.scala:279)
	at scala.collection.AbstractTraversable.map(Traversable.scala:108)
	at com.databricks.spark.chauffeur.ExecContextState.shutdown(ExecContextState.scala:461)
	at com.databricks.spark.chauffeur.ExecutionContextManagerV1.shutdownContext(ExecutionContextManagerV1.scala:764)
	at com.databricks.spark.chauffeur.ExecutionContextManagerV1.removeExecutionContext(ExecutionContextManagerV1.scala:317)
	at com.databricks.spark.chauffeur.ChauffeurState.$anonfun$process$1(ChauffeurState.scala:738)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:510)
	at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:616)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:643)
	at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:49)
	at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:293)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:289)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:47)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:44)
	at com.databricks.spark.chauffeur.ChauffeurState.withAttributionContext(ChauffeurState.scala:84)
	at com.databricks.logging.AttributionContextTracing.withAttributionTags(AttributionContextTracing.scala:96)
	at com.databricks.logging.AttributionContextTracing.withAttributionTags$(AttributionContextTracing.scala:77)
	at com.databricks.spark.chauffeur.ChauffeurState.withAttributionTags(ChauffeurState.scala:84)
	at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:611)
	at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:519)
	at com.databricks.spark.chauffeur.ChauffeurState.recordOperationWithResultTags(ChauffeurState.scala:84)
	at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:511)
	at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:475)
	at com.databricks.spark.chauffeur.ChauffeurState.recordOperation(ChauffeurState.scala:84)
	at com.databricks.spark.chauffeur.ChauffeurState.process(ChauffeurState.scala:728)
	at com.databricks.spark.chauffeur.Chauffeur$$anon$1$$anonfun$receive$1.handleDriverRequest$1(Chauffeur.scala:911)
	at com.databricks.spark.chauffeur.Chauffeur$$anon$1$$anonfun$receive$1.$anonfun$applyOrElse$4(Chauffeur.scala:937)
	at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:616)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:643)
	at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:49)
	at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:293)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:289)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:47)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:44)
	at com.databricks.rpc.ServerBackend.withAttributionContext(ServerBackend.scala:22)
	at com.databricks.logging.AttributionContextTracing.withAttributionTags(AttributionContextTracing.scala:96)
	at com.databricks.logging.AttributionContextTracing.withAttributionTags$(AttributionContextTracing.scala:77)
	at com.databricks.rpc.ServerBackend.withAttributionTags(ServerBackend.scala:22)
	at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:611)
	at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:519)
	at com.databricks.rpc.ServerBackend.recordOperationWithResultTags(ServerBackend.scala:22)
	at com.databricks.spark.chauffeur.Chauffeur$$anon$1$$anonfun$receive$1.handleDriverRequestWithUsageLogging$1(Chauffeur.scala:936)
	at com.databricks.spark.chauffeur.Chauffeur$$anon$1$$anonfun$receive$1.applyOrElse(Chauffeur.scala:959)
	at com.databricks.spark.chauffeur.Chauffeur$$anon$1$$anonfun$receive$1.applyOrElse(Chauffeur.scala:776)
	at com.databricks.rpc.ServerBackend.$anonfun$internalReceive0$2(ServerBackend.scala:174)
	at com.databricks.rpc.ServerBackend$$anonfun$commonReceive$1.applyOrElse(ServerBackend.scala:200)
	at com.databricks.rpc.ServerBackend$$anonfun$commonReceive$1.applyOrElse(ServerBackend.scala:200)
	at com.databricks.rpc.ServerBackend.internalReceive0(ServerBackend.scala:171)
	at com.databricks.rpc.ServerBackend.$anonfun$internalReceive$1(ServerBackend.scala:147)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperation$1(UsageLogging.scala:510)
	at com.databricks.logging.UsageLogging.executeThunkAndCaptureResultTags$1(UsageLogging.scala:616)
	at com.databricks.logging.UsageLogging.$anonfun$recordOperationWithResultTags$4(UsageLogging.scala:643)
	at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:49)
	at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:293)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:289)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:47)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:44)
	at com.databricks.rpc.ServerBackend.withAttributionContext(ServerBackend.scala:22)
	at com.databricks.logging.AttributionContextTracing.withAttributionTags(AttributionContextTracing.scala:96)
	at com.databricks.logging.AttributionContextTracing.withAttributionTags$(AttributionContextTracing.scala:77)
	at com.databricks.rpc.ServerBackend.withAttributionTags(ServerBackend.scala:22)
	at com.databricks.logging.UsageLogging.recordOperationWithResultTags(UsageLogging.scala:611)
	at com.databricks.logging.UsageLogging.recordOperationWithResultTags$(UsageLogging.scala:519)
	at com.databricks.rpc.ServerBackend.recordOperationWithResultTags(ServerBackend.scala:22)
	at com.databricks.logging.UsageLogging.recordOperation(UsageLogging.scala:511)
	at com.databricks.logging.UsageLogging.recordOperation$(UsageLogging.scala:475)
	at com.databricks.rpc.ServerBackend.recordOperation(ServerBackend.scala:22)
	at com.databricks.rpc.ServerBackend.internalReceive(ServerBackend.scala:146)
	at com.databricks.rpc.JettyServer$RequestManager.handleRPC(JettyServer.scala:1033)
	at com.databricks.rpc.JettyServer$RequestManager.handleRequestAndRespond(JettyServer.scala:953)
	at com.databricks.rpc.JettyServer$RequestManager.$anonfun$handleHttp$5(JettyServer.scala:548)
	at com.databricks.rpc.JettyServer$RequestManager.$anonfun$handleHttp$5$adapted(JettyServer.scala:513)
	at com.databricks.logging.activity.ActivityContextFactory$.$anonfun$withActivityInternal$13(ActivityContextFactory.scala:831)
	at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:49)
	at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:293)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:289)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:47)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:44)
	at com.databricks.logging.activity.ActivityContextFactory$.withAttributionContext(ActivityContextFactory.scala:53)
	at com.databricks.logging.activity.ActivityContextFactory$.$anonfun$withActivityInternal$3(ActivityContextFactory.scala:831)
	at com.databricks.context.integrity.IntegrityCheckContext$ThreadLocalStorage$.withValue(IntegrityCheckContext.scala:73)
	at com.databricks.logging.activity.ActivityContextFactory$.withActivityInternal(ActivityContextFactory.scala:794)
	at com.databricks.logging.activity.ActivityContextFactory$.withActivityInternal(ActivityContextFactory.scala:776)
	at com.databricks.logging.activity.ActivityContextFactory$.$anonfun$withServiceRequestActivity$15(ActivityContextFactory.scala:285)
	at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:49)
	at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:293)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:289)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:47)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:44)
	at com.databricks.logging.activity.ActivityContextFactory$.withAttributionContext(ActivityContextFactory.scala:53)
	at com.databricks.logging.activity.ActivityContextFactory$.withServiceRequestActivity(ActivityContextFactory.scala:285)
	at com.databricks.rpc.JettyServer$RequestManager.handleHttp(JettyServer.scala:513)
	at com.databricks.rpc.JettyServer$RequestManager.doPost(JettyServer.scala:408)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:665)
	at com.databricks.rpc.HttpServletWithPatch.service(HttpServletWithPatch.scala:33)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:750)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:554)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
	at org.eclipse.jetty.server.Server.handle(Server.java:516)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint.onFillable(SslConnection.java:555)
	at org.eclipse.jetty.io.ssl.SslConnection.onFillable(SslConnection.java:410)
	at org.eclipse.jetty.io.ssl.SslConnection$2.succeeded(SslConnection.java:164)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
	at com.databricks.rpc.InstrumentedQueuedThreadPool$$anon$1.$anonfun$run$2(InstrumentedQueuedThreadPool.scala:110)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.logging.AttributionContextTracing.$anonfun$withAttributionContext$1(AttributionContextTracing.scala:49)
	at com.databricks.logging.AttributionContext$.$anonfun$withValue$1(AttributionContext.scala:293)
	at scala.util.DynamicVariable.withValue(DynamicVariable.scala:62)
	at com.databricks.logging.AttributionContext$.withValue(AttributionContext.scala:289)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext(AttributionContextTracing.scala:47)
	at com.databricks.logging.AttributionContextTracing.withAttributionContext$(AttributionContextTracing.scala:44)
	at com.databricks.rpc.InstrumentedQueuedThreadPool.withAttributionContext(InstrumentedQueuedThreadPool.scala:45)
	at com.databricks.rpc.InstrumentedQueuedThreadPool$$anon$1.$anonfun$run$1(InstrumentedQueuedThreadPool.scala:110)
	at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
	at com.databricks.instrumentation.QueuedThreadPoolInstrumenter.trackActiveThreads(QueuedThreadPoolInstrumenter.scala:132)
	at com.databricks.instrumentation.QueuedThreadPoolInstrumenter.trackActiveThreads$(QueuedThreadPoolInstrumenter.scala:129)
	at com.databricks.rpc.InstrumentedQueuedThreadPool.trackActiveThreads(InstrumentedQueuedThreadPool.scala:45)
	at com.databricks.rpc.InstrumentedQueuedThreadPool$$anon$1.run(InstrumentedQueuedThreadPool.scala:92)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
	at java.base/java.lang.Thread.run(Thread.java:840)
	at p._doExecution (/Users/username/.cursor/extensions/paiqo.databricks-vscode-2.8.3/dist/node/extension.js:2:68874)
	at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
	at async p.executeHandler (/Users/username/.cursor/extensions/paiqo.databricks-vscode-2.8.3/dist/node/extension.js:2:56715)
	at async Yfe.$executeCells (file:///Applications/Cursor.app/Contents/Resources/app/out/vs/workbench/api/node/extensionHostProcess.js:198:39241)

When I went through the stderr logs in the cluster settings, I saw the following error:

ERROR:root:KeyboardInterrupt while sending command.
Traceback (most recent call last):
  File "/databricks/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/java_gateway.py", line 1038, in send_command
    response = connection.send_command(command)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/databricks/spark/python/lib/py4j-0.10.9.9-src.zip/py4j/clientserver.py", line 535, in send_command
    answer = smart_decode(self.stream.readline()[:-1])
                          ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/socket.py", line 707, in readinto
    return self._sock.recv_into(b)
           ^^^^^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt

So it seems like there is a KeyboardInterrupt being sent by the IDE. I couldn't figure out if the timeout is because of this extentions or from vscode/cursor itself. I found a setting for jupyter "jupyter.jupyterInterruptTimeout": 600000, and increased it to over 2 hrs. Since then I am stuck. Any clues on what could be going on?

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions