-
Notifications
You must be signed in to change notification settings - Fork 3
fix(docworker): Fix handling deleted document #353
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
|
@@ -19,7 +19,8 @@ | |||||||||||||||||
| CMD_COMPONENT, CMD_CHANNEL, PROG_NAME | ||||||||||||||||||
| from .context import Context | ||||||||||||||||||
| from .documents import DocumentFile, DocumentNameGiver | ||||||||||||||||||
| from .exceptions import create_job_exception, JobException | ||||||||||||||||||
| from .exceptions import create_job_exception, JobException, \ | ||||||||||||||||||
| DocumentNotFoundException | ||||||||||||||||||
| from .limits import LimitsEnforcer | ||||||||||||||||||
| from .templates import TemplateRegistry, Template, Format | ||||||||||||||||||
| from .utils import byte_size_format, check_metamodel_version | ||||||||||||||||||
|
|
@@ -84,6 +85,21 @@ def safe_format(self) -> Format: | |||||||||||||||||
| raise RuntimeError('Format is not set but it should') | ||||||||||||||||||
| return self.format | ||||||||||||||||||
|
|
||||||||||||||||||
| def check_document_exists(self) -> bool: | ||||||||||||||||||
| LOG.debug('Checking if document "%s" exists in DB', self.doc_uuid) | ||||||||||||||||||
| try: | ||||||||||||||||||
| doc = self.ctx.app.db.fetch_document( | ||||||||||||||||||
| document_uuid=self.doc_uuid, | ||||||||||||||||||
| tenant_uuid=self.tenant_uuid, | ||||||||||||||||||
| ) | ||||||||||||||||||
| exists = doc is not None | ||||||||||||||||||
| LOG.debug('Document "%s" exists: %s', self.doc_uuid, exists) | ||||||||||||||||||
| return exists | ||||||||||||||||||
| except Exception as e: | ||||||||||||||||||
| LOG.error('Failed to check if document "%s" exists: %s', | ||||||||||||||||||
| self.doc_uuid, str(e)) | ||||||||||||||||||
| return False | ||||||||||||||||||
|
|
||||||||||||||||||
| @handle_job_step('Failed to get document from DB') | ||||||||||||||||||
| def get_document(self): | ||||||||||||||||||
| SentryReporter.set_tags(phase='fetch') | ||||||||||||||||||
|
|
@@ -102,6 +118,7 @@ def get_document(self): | |||||||||||||||||
| raise create_job_exception( | ||||||||||||||||||
| job_id=self.doc_uuid, | ||||||||||||||||||
| message='Document record not found in database', | ||||||||||||||||||
| document_found=False, | ||||||||||||||||||
| ) | ||||||||||||||||||
| self.doc.retrieved_at = datetime.datetime.now(tz=datetime.UTC) | ||||||||||||||||||
| LOG.info('Job "%s" details received', self.doc_uuid) | ||||||||||||||||||
|
|
@@ -312,6 +329,11 @@ def _run(self): | |||||||||||||||||
| self.finalize() | ||||||||||||||||||
|
|
||||||||||||||||||
| def _set_failed(self, message: str): | ||||||||||||||||||
| document_exists = self.check_document_exists() | ||||||||||||||||||
| if not document_exists: | ||||||||||||||||||
| LOG.warning('Document %s does not exist, cannot set to FAILED', | ||||||||||||||||||
| self.doc_uuid) | ||||||||||||||||||
| return | ||||||||||||||||||
|
Comment on lines
+334
to
+336
|
||||||||||||||||||
| LOG.warning('Document %s does not exist, cannot set to FAILED', | |
| self.doc_uuid) | |
| return | |
| LOG.warning( | |
| 'Document %s may not exist or could not be checked; ' | |
| 'attempting to set state to FAILED anyway', | |
| self.doc_uuid, | |
| ) |
Copilot
AI
Jan 15, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The DocumentNotFoundException is caught and logged, but no error reporting to Sentry occurs. This creates inconsistent error reporting compared to the JobException handler (lines 350-353) which captures the exception in Sentry. Add SentryReporter.capture_exception(e) to ensure these errors are properly tracked.
| LOG.warning('Document not found: %s', e.log_message()) | |
| LOG.warning('Document not found: %s', e.log_message()) | |
| SentryReporter.capture_exception(e) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
check_document_exists()method duplicates the document fetching logic fromget_document()(lines 113-116). Both methods callfetch_documentwith the same parameters. This duplication could lead to maintenance issues if the fetching logic needs to change. Consider refactoring to have a single method that performs the fetch, or have this method leverage the existingget_document()method's fetch logic.