Feature/multiple provider updates #2
Open: rupertgermann wants to merge 9 commits into main from feature/multiple-provider-updates
I've introduced the initial implementation for several new image providers and updated existing configurations in your code:
**New Providers:**
1. **Adobe Stock (`adobestock`):**
   * I added `src/providers/adobestock-provider.js` with an API-mode implementation for searching images and retrieving watermarked preview URLs.
   * I created `src/providers/configs/playwright/adobestock.js` with placeholder configurations for a potential scraping fallback mode.
   * I integrated this into `provider-registry.js`, `config.json.example`, and `README.md`.
2. **Getty Images (`gettyimages`):**
   * I added `src/providers/gettyimages-provider.js` with an API-mode implementation that authenticates with `Api-Key`, searches images, and retrieves preview URLs from `display_sizes`.
   * I created `src/providers/configs/playwright/gettyimages.js` with placeholder configurations for a potential scraping fallback mode.
   * I integrated this into `provider-registry.js`, `config.json.example`, and `README.md`.
3. **Dreamstime.com (`dreamstime`):**
   * I added `src/providers/dreamstime-provider.js` with a placeholder structure. The API implementation is pending until you obtain API documentation and keys from Dreamstime.
   * I created `src/providers/configs/playwright/dreamstime.js` with placeholder configurations.
   * I integrated this into `provider-registry.js`, `config.json.example`, and `README.md`.
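The Getty Images item above mentions sending an `Api-Key` header and reading preview URLs from `display_sizes`. A minimal sketch of that idea follows. The function names, the endpoint URL, the `phrase` parameter, and the preferred size names are assumptions for illustration and are not taken from `gettyimages-provider.js`.

```javascript
// Hypothetical helpers; endpoint, parameter, and size names are assumptions.

// Build the request description for an image search.
// The API key travels in the `Api-Key` header, as described in the PR.
function buildGettySearchRequest(query, apiKey) {
  const url = new URL('https://api.gettyimages.com/v3/search/images');
  url.searchParams.set('phrase', query);
  return { url: url.toString(), headers: { 'Api-Key': apiKey } };
}

// Pick a preview URL from an image's `display_sizes` array.
// Tries the preferred size names in order, then falls back to the
// first entry that has a usable `uri`. Returns null if none exists.
function pickPreviewUrl(image, preferredNames = ['preview', 'comp', 'thumb']) {
  const sizes = Array.isArray(image && image.display_sizes)
    ? image.display_sizes
    : [];
  for (const name of preferredNames) {
    const match = sizes.find(
      (s) => s && s.name === name && typeof s.uri === 'string'
    );
    if (match) return match.uri;
  }
  const first = sizes.find((s) => s && typeof s.uri === 'string');
  return first ? first.uri : null;
}
```

Returning `null` instead of throwing lets a caller skip images without a preview rather than abort the whole search.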
**Existing Provider Updates:**
* **PublicDomainPictures.net (`publicdomainpictures`):**
  * I updated `src/providers/configs/playwright/publicdomainpictures.js` through several iterations to refine the CSS selectors and navigation settings. The current configuration finds the detail page links, but full-size image URL extraction remains unreliable and needs further testing by you.
    * Selector for image links: `a[href*="view-image.php?image="]`
    * Detail page navigation: `navigationWaitUntil: 'domcontentloaded'`
    * Full-size image selector (needs verification): `a[href*="/velka/"]`
* **Reshot.com (`reshot`):**
  * I reviewed the configuration and made no code changes. You will need to test this provider.
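The PublicDomainPictures.net settings listed above could be collected into a config object along these lines. This is only a sketch: the selector strings and `navigationWaitUntil` come from the list, while the surrounding key names (`selectors`, `imageLinks`, `fullSizeImage`) and the `resolveFullSizeUrl` helper are hypothetical and may not match what `GenericPlaywrightProvider` expects.

```javascript
// Hypothetical config shape; only the selector strings and
// `navigationWaitUntil` are taken from the PR description.
const publicDomainPicturesConfig = {
  name: 'publicdomainpictures',
  selectors: {
    // Links from the search results to each detail page.
    imageLinks: 'a[href*="view-image.php?image="]',
    // Link to the full-size file on the detail page (needs verification).
    fullSizeImage: 'a[href*="/velka/"]',
  },
  // Do not wait for every subresource before scraping the detail page.
  navigationWaitUntil: 'domcontentloaded',
};

// Resolve an href found on a detail page into an absolute URL.
// Returns null for missing or unparsable input.
function resolveFullSizeUrl(href, detailPageUrl) {
  if (typeof href !== 'string' || href.trim() === '') return null;
  try {
    return new URL(href.trim(), detailPageUrl).toString();
  } catch {
    return null;
  }
}
```

Resolving against the detail page URL matters because scraped `href` values are often relative paths.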
**Shared Code Updates (from `publicdomainpictures` testing):**
* I made modifications to `src/index.js`, `src/modes/playwright-crawler.js`, `src/providers/generic-playwright-provider.js`, and `src/utils/config.js` to enhance logging and improve provider activation for testing. I've kept these changes as they may be beneficial.
**General:**
* All new providers requiring API keys have been added to `config.json.example` with instructions.
* `README.md` has been updated to reflect the new providers.
Further testing by you is required for all mentioned providers to ensure full functionality and to complete API details for Dreamstime.
This commit includes the initial implementation for multiple image providers, focusing on API mode where documentation was available, and setting up placeholder structures where API details need to be acquired by you.

**Implemented Providers (API Mode):**
- Adobe Stock (`adobestock`): API search and preview.
- Getty Images (`gettyimages`): API search and preview.
- iStock (`istock`): Leverages Getty Images API.
- 500px (`500px`): Legacy API search and preview.

**Placeholder Providers (API Documentation Required by You):**
- Dreamstime (`dreamstime`)
- Stocksy (`stocksy`)
- Alamy (`alamy`)
- Bigstock (`bigstock`) (API docs not found, may be scraping only)
- Pond5 (`pond5`)

**Updates to Existing Providers:**
- PublicDomainPictures.net (`publicdomainpictures`): Iteratively updated configuration. Full-size image extraction still needs work.
- Reshot.com (`reshot`): Configuration reviewed.

**General Changes:**
- All new providers registered in `provider-registry.js`.
- `config.json.example` updated for all new providers.
- `README.md` updated to reflect new providers and their status.
- Enhanced logging in some shared files from testing.

This commit covers steps 1-12 of the provider implementation plan. Further work involves implementing the remaining providers and thorough testing of all implementations by you.
- Fetch more detailed image information from the MediaWiki API by adjusting the `iiprop` parameter (requesting url, size, mime, user, timestamp, commonmetadata, extmetadata).
- Request a specific thumbnail width (`iiurlwidth=200`) to get a properly sized `thumbnailUrl`.
- Return an array of structured `imageInfo` objects instead of just image URLs, aligning it with the data structure used by other providers. This includes fields like id, title, thumbnailUrl, detailPageUrl, fullSizeUrl, dimensions, uploader, etc.
- The `getFullSizeImage` method was slightly adapted to correctly extract `fullSizeUrl` from the `imageInfo` object.

This change improves the data quality obtained from Wikimedia Commons and makes its integration more consistent within the application.
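To make the Wikimedia Commons change more concrete, here is a minimal sketch of building the query with the `iiprop` and `iiurlwidth` values named above and mapping the response into `imageInfo`-style objects. The function names, the use of `generator=search`, and the exact output field names beyond those listed in the commit message are assumptions, not the provider's actual code.

```javascript
// Hypothetical helpers; the real provider may build its query differently.

// Build a Commons search URL that asks for rich image metadata
// and a 200px-wide thumbnail.
function buildCommonsSearchUrl(query, limit = 10) {
  const url = new URL('https://commons.wikimedia.org/w/api.php');
  const params = {
    action: 'query',
    format: 'json',
    generator: 'search',
    gsrsearch: query,
    gsrnamespace: '6', // File: namespace
    gsrlimit: String(limit),
    prop: 'imageinfo',
    iiprop: 'url|size|mime|user|timestamp|commonmetadata|extmetadata',
    iiurlwidth: '200',
  };
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, value);
  }
  return url.toString();
}

// Convert one page entry of the API response into a structured object.
function toImageInfo(page) {
  const info =
    page && Array.isArray(page.imageinfo) ? page.imageinfo[0] : null;
  if (!info) return null;
  return {
    id: String(page.pageid),
    title: page.title,
    thumbnailUrl: info.thumburl || null,
    detailPageUrl: info.descriptionurl || null,
    fullSizeUrl: info.url || null,
    dimensions: { width: info.width, height: info.height },
    mimeType: info.mime,
    uploader: info.user,
    uploadedAt: info.timestamp,
  };
}

// Map a whole response body to an array, skipping pages without imageinfo.
function parseCommonsResponse(body) {
  const pages =
    body && body.query && body.query.pages
      ? Object.values(body.query.pages)
      : [];
  return pages.map(toImageInfo).filter(Boolean);
}
```

With this shape, a `getFullSizeImage`-style method only needs to read `fullSizeUrl` from the object it is given.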
…New Old Stock
This commit sets up and updates configurations for several scraping-based image providers, ensuring they align with the GenericPlaywrightProvider model.
- **Freeimages (`freeimages`):**
  - I reviewed the existing config `src/providers/configs/playwright/freeimages.js`.
  - I updated its entry in `config.json.example` to use the nested scrolling object structure for consistency and added `displayName` and `notes`.
- **Little Visuals (`littlevisuals`):**
  - I confirmed the site is an archive.
  - I created `src/providers/configs/playwright/littlevisuals.js` for `GenericPlaywrightProvider` to scrape all images from its main page (your search query will be ignored).
  - I updated `config.json.example` and `README.md`.
- **New Old Stock (`newoldstock`):**
  - My attempt to inspect the site was blocked by `robots.txt`.
  - I created a placeholder config `src/providers/configs/playwright/newoldstock.js` with guessed selectors. This configuration requires manual verification and updates by you.
  - I updated `config.json.example` and `README.md` with notes regarding its placeholder status.
All three providers are intended to be handled by `GenericPlaywrightProvider`.
…e URLs from srcset attribute
- Fix "providerConfig is not defined" error in detail_page function
- Ensure the extracted image URL is returned instead of the detail page URL
- Improve error handling to never download HTML content instead of images
- Add robust URL resolution with multiple fallback sources
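The commit title above refers to extracting image URLs from a `srcset` attribute. One way this can be done is sketched below: parse the candidates, pick the one with the largest descriptor, and resolve it against the page URL. The function names are hypothetical and the parsing is deliberately simple. It assumes candidate URLs contain no commas and that the attribute uses a single descriptor type (all `w` or all `x`).

```javascript
// Hypothetical helpers; a simplified parser, not the PR's implementation.

// Split a srcset string into { url, descriptor, value } candidates.
// A candidate without a descriptor is treated as "1x".
function parseSrcset(srcset) {
  if (typeof srcset !== 'string') return [];
  return srcset
    .split(',')
    .map((candidate) => candidate.trim())
    .filter(Boolean)
    .map((candidate) => {
      const [url, descriptor = '1x'] = candidate.split(/\s+/);
      const value = parseFloat(descriptor);
      return { url, descriptor, value: Number.isFinite(value) ? value : 0 };
    });
}

// Return the absolute URL of the largest candidate, or null when the
// attribute is empty or the URL cannot be resolved.
function pickLargestFromSrcset(srcset, baseUrl) {
  const candidates = parseSrcset(srcset);
  if (candidates.length === 0) return null;
  const best = candidates.reduce((a, b) => (b.value > a.value ? b : a));
  try {
    return new URL(best.url, baseUrl).toString();
  } catch {
    return null;
  }
}
```

A caller can treat a `null` result as a signal to try the next fallback source, such as a plain `src` attribute, instead of downloading the detail page itself.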
- Make URL handlers robust to different parameter types
- Add URL extraction from objects in url_cleaning and url_param_decode functions
- Improve error handling and logging for URL processing failures
- Fix "Invalid URL" errors when cleaning Unsplash image URLs
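As an illustration of handlers that tolerate different parameter types, the sketch below normalizes a string, a `URL` instance, or an object carrying a URL into a string before cleaning it. The names `extractUrlString` and `cleanUrl`, the list of object keys checked, and the query parameters removed in the usage example are all assumptions. The actual `url_cleaning` and `url_param_decode` functions may differ.

```javascript
// Hypothetical helpers illustrating type-tolerant URL handling.

// Accept a string, a URL instance, or an object with a URL-like field.
// Returns a trimmed string, or null when nothing usable is found.
function extractUrlString(input) {
  if (typeof input === 'string') return input.trim() || null;
  if (input instanceof URL) return input.toString();
  if (input && typeof input === 'object') {
    for (const key of ['url', 'src', 'href', 'fullSizeUrl']) {
      const value = input[key];
      if (typeof value === 'string' && value.trim()) return value.trim();
    }
  }
  return null;
}

// Remove the given query parameters from a URL.
// Returns null instead of throwing "Invalid URL" on bad input.
function cleanUrl(input, paramsToRemove = []) {
  const raw = extractUrlString(input);
  if (!raw) return null;
  let parsed;
  try {
    parsed = new URL(raw);
  } catch {
    return null;
  }
  for (const param of paramsToRemove) {
    parsed.searchParams.delete(param);
  }
  return parsed.toString();
}
```

For example, `cleanUrl({ url: 'https://images.unsplash.com/photo-1?w=400&q=80' }, ['w', 'q'])` strips both parameters, while `cleanUrl(42)` returns `null` rather than raising an error.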