2 changes: 2 additions & 0 deletions classes/robot/crawler.php
@@ -556,6 +556,7 @@ public function process_queue($verbose = false) {

// Get a new zero second timeout lock for the resource.
if (!$lock = $lockfactory->get_lock($resource, 0)) {
$hastime = time() < $cronstop; // Check if we need to break from processing
continue; // Try crawl the next node, this one is already being processed.
}

@@ -567,6 +568,7 @@ public function process_queue($verbose = false) {
$persistent = new url(0, $node);
$persistent->update();
$lock->release();
$hastime = time() < $cronstop; // Check if we need to break from processing
catalystfd (Author) commented:
I suspect the flawed logic might be in this loop specifically, since when stracing, the processes seemed to be spinning: calling update and then checking the lock advisory table.

@catalystfd (Author) commented on Jul 3, 2023:

Perhaps it could be specifically courses that have a lastcrawled of null; they are always eligible for queue selection.

And setting needscrawl to lastcrawled will simply set it to null, which means they will always be eligible for queue selection even if the course is never in recent courses, and this would be an infinite loop.

And if all the available cron processes are tied up running this loop, there will never be time/space to update the needscrawl field.

I think this change is still good, but perhaps we also need to skip this entire if block if the node's lastcrawled is null.
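A minimal sketch of the suspected eligibility logic (the function name, signature, and comparison are assumptions for illustration; the real plugin selects nodes via database queries): with a `lastcrawled` of null the node is always selected, and "resetting" `needscrawl` from `lastcrawled` just copies the null, so the node never leaves the queue.

```php
<?php
// Hypothetical model of queue eligibility (illustrative only; not the
// plugin's actual code). A node is eligible when it has never been
// crawled, or when needscrawl is newer than its last crawl.
function eligible(?int $needscrawl, ?int $lastcrawled): bool {
    if ($lastcrawled === null) {
        return true; // never crawled: always eligible for selection
    }
    return $needscrawl !== null && $needscrawl > $lastcrawled;
}

// Setting needscrawl = lastcrawled on a never-crawled node copies null,
// so the node stays eligible forever: the suspected infinite loop.
$needscrawl = null;
```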

continue;
}
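The patched loop can be modelled roughly as follows (a simplified sketch, not the plugin's code: the real loop uses `$lockfactory->get_lock($resource, 0)` and url persistents, whereas here the held locks are a plain array and each crawl is pretended to cost one second):

```php
<?php
// Simplified model of the patched process_queue() loop. $locked stands
// in for resources whose zero-timeout lock is held by another process,
// and $cronstop is the wall-clock deadline for this cron run.
function process_queue_sketch(array $queue, array $locked, int $now, int $cronstop): array {
    $processed = [];
    foreach ($queue as $resource) {
        $hastime = $now < $cronstop; // re-checked on every path, as in the patch
        if (!$hastime) {
            break; // cron window exhausted: stop instead of spinning
        }
        if (in_array($resource, $locked, true)) {
            continue; // another process holds the lock; try the next node
        }
        $processed[] = $resource; // crawl + update + release elided
        $now += 1; // pretend each crawl costs one second
    }
    return $processed;
}
```

The point of the patch is that `$hastime` is refreshed even on the `continue` paths, so a run that only ever skips locked nodes still notices the deadline rather than spinning until the process is killed.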
