
Crawling from WP CLI - Mangled URL - Uncaught InvalidArgumentException: Unable to parse URI #908

@stellarpower


Before creating an issue / filing a support request

  • try to troubleshoot the issue yourself (see Troubleshooting guide)
  • prepare as much information as possible to help the developer
  • Identify the issue as likely: Theme / Plugin / Environment or WP2Static bug

Determining if it's an issue with Theme, Plugin, Environment or a bug in WP2Static

This is a difference in behaviour between the CLI and web interfaces to WP2Static. So, even if there are issues elsewhere, I believe this is most appropriate as a bug against WP2Static itself for the time being.

Describe the bug
So far, WP2Static has worked without a hitch. I have installed from a ZIP file and only ever processed from the web UI.

This site is hosted on a local machine in a container; after exporting, the static site is sent up into the cloud for live hosting. In the settings, I set the "Deployment URL" to be simply /; this has allowed maximal flexibility with hosting downstream, where the static site can be viewed under multiple subdomains without any problems. The export process from the web UI works fine with this.

I took some time today to try kicking off the process programmatically with the wp CLI tool. If I begin the export this way (wp wp2static crawl), the logs show it fetching all the pages okay; then, at some point afterwards, an invalid URL replacement seems to be performed, or a totally mangled URL is produced in some other manner. This throws an exception, and I get a backtrace in the logs.

If I then proceed to generate the export again from the web UI, I get a 500 error back in the browser, the same as this one.

If I delete the plugin and re-upload it from a zip to nuke my settings (can I do this a faster way, BTW?), then go back and change my settings, we are back to normal. Given the documentation seems to be a little out of date, it's possible I am not using the CLI tool properly. Ideally I'd like it to kick off a job with the exact same settings as currently configured in the UI, but perhaps I need to give it some more options. Otherwise, this seems to suggest that the CLI tool is missing or adding a step that mutates the stored settings, so that subsequent web-based calls fail too.

To Reproduce
Steps to reproduce the behavior:

  • Remove and re-install the WP2Static plugin (7.2 zip upload)
  • Export okay from the web UI.
  • wp wp2static crawl

Environment (please complete the following information):

  • Hosting OS: Linux
  • Web server setup: container (image)
  • Hosting company: local installation.

The website is behind a reverse proxy using a self-signed certificate. The reverse proxy only serves TLS; it communicates with WordPress over unencrypted HTTP only. The instance is externally visible on a non-standard port.
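
One way a setup like this could plausibly yield a doubled URL is sketched below in Python, purely as an illustration. This is not WP2Static's code, and `SITE_URL`, `INTERNAL_URL` and `naive_absolute` are hypothetical names. The assumption is that a URL recorded with the internal `http` scheme gets compared against the public `https` site URL, so a prefix check fails and the base is prepended to an already absolute URL.

```python
# Hypothetical illustration only; none of these names come from WP2Static.
SITE_URL = "https://machine.domain:888"  # public URL served by the reverse proxy

# Assumption: the same asset as WordPress might see it behind the proxy (plain HTTP).
INTERNAL_URL = (
    "http://machine.domain:888"
    "/wp-content/et-cache/1010/et-core-unified-1010.min.css"
)


def naive_absolute(url: str, site_url: str = SITE_URL) -> str:
    """Prepend site_url unless url already starts with it (scheme-sensitive check)."""
    if url.startswith(site_url):
        return url
    # An http:// URL never matches an https:// prefix, so it lands here
    # and gets the base prepended even though it is already absolute.
    return site_url + url


doubled = naive_absolute(INTERNAL_URL)
print(doubled)
```

The logged URI additionally lacks the `:/` after the second `http`, so some further step must be involved that this sketch does not try to reproduce; it only covers the doubling of the host part.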

Log files (please complete the following information):

[04-Nov-2023 02:55:56 UTC] PHP Fatal error:  Uncaught InvalidArgumentException: Unable to parse URI: https://machine.domain:888http/machine.domain:888/wp-content/et-cache/1010/et-core-unified-1010.min.css in /var/www/html/sitename/wp-content/plugins/wp2static/vendor/leonstafford/wp2staticpsr7/src/Uri.php:72
Stack trace:
#0 /var/www/html/sitename/wp-content/plugins/wp2static/vendor/leonstafford/wp2staticpsr7/src/Request.php(42): WP2StaticGuzzleHttp\Psr7\Uri->__construct()
#1 /var/www/html/sitename/wp-content/plugins/wp2static/src/Crawler.php(136): WP2StaticGuzzleHttp\Psr7\Request->__construct()
#2 /var/www/html/sitename/wp-content/plugins/wp2static/vendor/leonstafford/wp2staticguzzle/src/Pool.php(56): WP2Static\Crawler->WP2Static\{closure}()
#3 [internal function]: WP2StaticGuzzleHttp\Pool::WP2StaticGuzzleHttp\{closure}()
#4 /var/www/html/sitename/wp-content/plugins/wp2static/vendor/leonstafford/wp2staticpromises/src/EachPromise.php(212): Generator->next()
#5 / in /var/www/html/sitename/wp-content/plugins/wp2static/vendor/leonstafford/wp2staticpsr7/src/Uri.php on line 72
[2023-11-04T02:12:43+00:00] Starting crawling
[2023-11-04T02:12:43+00:00] Using basic auth credentials to crawl
[2023-11-04T02:12:43+00:00] Starting to crawl detected URLs.
[2023-11-04T02:12:43+00:00] Using CrawlCache.
[2023-11-04T02:13:21+00:00] Crawling progress: 300 crawled, 300 skipped (cached).
[2023-11-04T02:13:25+00:00] Crawling progress: 600 crawled, 600 skipped (cached).
[2023-11-04T02:13:29+00:00] Crawling progress: 900 crawled, 900 skipped (cached).
[2023-11-04T02:13:32+00:00] Crawling progress: 1200 crawled, 1200 skipped (cached).
[2023-11-04T02:13:45+00:00] Crawling progress: 1500 crawled, 1500 skipped (cached).
[2023-11-04T02:13:51+00:00] Crawling progress: 1800 crawled, 1800 skipped (cached).
[2023-11-04T02:13:54+00:00] Crawling progress: 2100 crawled, 2100 skipped (cached).
[2023-11-04T02:13:58+00:00] Crawling progress: 2400 crawled, 2400 skipped (cached).
[2023-11-04T02:14:01+00:00] Crawling progress: 2700 crawled, 2700 skipped (cached).
[2023-11-04T02:14:05+00:00] Crawling progress: 3000 crawled, 3000 skipped (cached).
[2023-11-04T02:14:09+00:00] Crawling progress: 3300 crawled, 3300 skipped (cached).
[2023-11-04T02:14:12+00:00] Crawling progress: 3600 crawled, 3600 skipped (cached).
[2023-11-04T02:14:17+00:00] Crawling progress: 3900 crawled, 3900 skipped (cached).
[2023-11-04T02:14:22+00:00] Crawling progress: 4200 crawled, 4200 skipped (cached).
[2023-11-04T02:14:25+00:00] Crawling progress: 4500 crawled, 4500 skipped (cached).
[2023-11-04T02:14:28+00:00] Crawling progress: 4800 crawled, 4800 skipped (cached).
[2023-11-04T02:14:32+00:00] Crawling progress: 5100 crawled, 5100 skipped (cached).
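
To show why the URI in the fatal error above cannot be parsed, here is a small Python sketch that splits it into components with the standard library. It is only an analogy for what the PHP `Uri` constructor rejects, not a reproduction of it; `MANGLED` is simply the string copied from the log.

```python
from urllib.parse import urlsplit

# URI copied verbatim from the PHP fatal error in the log.
MANGLED = (
    "https://machine.domain:888http/machine.domain:888"
    "/wp-content/et-cache/1010/et-core-unified-1010.min.css"
)

parts = urlsplit(MANGLED)
print("scheme:", parts.scheme)
print("netloc:", parts.netloc)  # everything between "//" and the next "/"
print("path:  ", parts.path)

# The text after the colon in the netloc is treated as the port.
# Here it is not numeric, so reading .port raises ValueError.
port_error = None
try:
    parts.port
except ValueError as exc:
    port_error = str(exc)
print("port error:", port_error)
```

The authority component ends up with `888http` where a numeric port is expected, which suggests a second absolute URL was appended directly after the site's host and port.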
