A simple tool to parse website metadata on the server side using Puppeteer.
To perform metadata scraping in a serverless environment (e.g., Vercel), two key libraries are used:
- `puppeteer-core`: The core API of Puppeteer, distributed without a bundled Chromium binary. This lets us manage and deploy our own Chromium binary, which is essential in constrained environments like serverless functions.
- `@sparticuz/chromium-min`: The minimal variant of `@sparticuz/chromium`, a fork of `chrome-aws-lambda` aimed at serverless environments. It leaves the Chromium binary out of the npm package so that the function bundle can stay within size limits (e.g., the 50 MB compressed-bundle limit on platforms like Vercel).
Some serverless platforms, such as Vercel, cap the size of the deployment package. With `@sparticuz/chromium-min`, the Chromium binary is not shipped inside the package; it can be hosted externally (e.g., via a CDN) and downloaded at runtime, which keeps the deployment package within acceptable limits.
In this setup, the Chromium binary is hosted on GitHub. While it's not ideal, it serves as a free and functional CDN alternative for development and low-traffic use cases.
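The overall flow (launch a browser, load the page, read its HTML, pull out the metadata, close the browser) can be sketched as follows. This is a minimal illustration, not the tool's actual implementation: `BrowserLike`, `PageLike`, `parseMetadata`, and `scrapeMetadata` are hypothetical names, the browser is passed in as a `launch` callback so the sketch has no package dependencies, and the regex-based parsing is a simplification that only handles double-quoted attributes with `name`/`property` placed before `content`.

```typescript
// Structural types covering only the browser methods this sketch calls.
// Assumption: a puppeteer-core Browser and Page provide these methods.
interface PageLike {
  goto(url: string): Promise<unknown>;
  content(): Promise<string>;
}
interface BrowserLike {
  newPage(): Promise<PageLike>;
  close(): Promise<void>;
}

type Metadata = Record<string, string>;

// Collect <title> and <meta name|property="..." content="..."> pairs from HTML.
// Regex-based on purpose to keep the sketch dependency-free.
function parseMetadata(html: string): Metadata {
  const meta: Metadata = {};
  const title = /<title[^>]*>([^<]*)<\/title>/i.exec(html);
  if (title) {
    meta["title"] = title[1].trim();
  }
  const tagPattern = /<meta\s+(?:name|property)="([^"]+)"\s+content="([^"]*)"/gi;
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(html)) !== null) {
    meta[match[1]] = match[2];
  }
  return meta;
}

// Launch a browser via the supplied callback, load the URL, and parse its HTML.
async function scrapeMetadata(
  launch: () => Promise<BrowserLike>,
  url: string
): Promise<Metadata> {
  const browser = await launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return parseMetadata(await page.content());
  } finally {
    // Always release the browser, even if navigation or parsing fails.
    await browser.close();
  }
}
```

With the real packages, `launch` would typically wrap `puppeteer.launch({ args: chromium.args, executablePath: await chromium.executablePath(packUrl), headless: true })`, where `packUrl` stands for the URL of the externally hosted Chromium archive. The exact launch options depend on the package versions in use, so check them against the `@sparticuz/chromium-min` README for your release.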
Note: Ensure that the version of `puppeteer-core` used matches the version of Chromium provided by `@sparticuz/chromium-min` to avoid compatibility issues.
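One way to catch a mismatch early is to compare the major version reported by the launched browser with the major version your `puppeteer-core` release targets. The sketch below assumes the version string has the form returned by Puppeteer's `browser.version()` (for example `HeadlessChrome/61.0.3153.0`); `chromiumMajor` and `assertChromiumMajor` are hypothetical helper names, and the expected major version is something you would look up for your own Puppeteer release.

```typescript
// Extract the major version from a string such as "HeadlessChrome/61.0.3153.0".
function chromiumMajor(versionString: string): number {
  const match = /\/(\d+)/.exec(versionString);
  if (!match) {
    throw new Error(`Unrecognized browser version string: ${versionString}`);
  }
  return Number(match[1]);
}

// Throw if the reported Chromium major version differs from the expected one.
function assertChromiumMajor(reported: string, expectedMajor: number): void {
  const major = chromiumMajor(reported);
  if (major !== expectedMajor) {
    throw new Error(
      `Chromium major version ${major} does not match expected ${expectedMajor}`
    );
  }
}
```

In a real function this could be called right after launch, e.g. `assertChromiumMajor(await browser.version(), expectedMajor)`, so that a version drift fails loudly instead of surfacing as an obscure protocol error.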