-
Notifications
You must be signed in to change notification settings - Fork 120
Downscale images before embedding #4609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
image-embedder-lambda/src/index.ts
Outdated
| // We use sqrt because scaling affects both dimensions | ||
| const sizeRatio = MAX_IMAGE_SIZE_BYTES / imageBytes.length; | ||
| // Be conservative: target 80% of max size to account for compression variance | ||
| let scaleFactor = Math.sqrt(sizeRatio * 0.8); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Need to think through if this square root thing is actually right... 🤔
You have a 4x4 image = 16 pixels = (let's pretend one byte per pixel, uncompressed) = 16 bytes
You want to halve the size to 8 bytes
If you simply halve the width & height to 2x2, you'll get 4 pixels = 4 times smaller than 16 bytes
Whereas if you scale width & height by 1/sqrt(2) you get:
about 2.8 * 2.8 = 7.84 = about 8
OK I get it!
|
Brute force seems the only way to ensure they will fall below certain filesize indeed. Sharp is a good choice (based on Vips) if lambda can haz any. Without either reading about model behaviour with different resolution/compression and other characteristics (eg. does transparency even matter?) or doing a lot of tests, it’s impossible to know what are the best compromises (better larger and more compressed or the other way around? fine to bake transparency into JPEGs or is transparency taken seriously? etc). One thing I can’t think wouldn’t be sensible is to convert everything to sRGB (this, I think). |
|
Thanks @paperboyo, those are great points. Interestingly, Cohere v4 (which we may move to soon), "images > 2,458,624 pixels are downsampled to that size; images < 3,136 pixels are upsampled". This makes me think that I should actually target 2.4 megapixels first, and see what size those typically come out at. If they're usually under 5 MiB, I think that's a reasonable default for downscaling. We can iteratively downscale for edge cases |
|
@paperboyo forgive me image ignorance, why is converting everything to sRGB a good idea? Would this save image bytes? |
This is but a hunch. Our corpus may stray more from sRGB which is the most prevalent on the web. Hunch is that robots weren’t trained to make better decisions on what they are looking at thanks to colourspace. It’s possible, they may just not see as well. Or – barf completely on something exotic colourmodels.
Converting would change pixel values. So that pixel differences that look the same would translate to similar numerical difference. Without conversion, they may give you smaller numerical difference. The smaller, relatively, the wider than sRGB the colourspace is (sRGB is the smallest space). I think. |
ellenmuller
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just some comments - not blockers :) Worked a dream when I tested it!!! 😍
image-embedder-lambda/src/index.ts
Outdated
| let result: Uint8Array = imageBytes; | ||
| let attempts = 0; | ||
| const maxAttempts = 5; | ||
|
|
||
| while (result.length > MAX_IMAGE_SIZE_BYTES && attempts < maxAttempts) { | ||
| attempts++; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I might prefer this as a for loop, eg:
for (
let attempts = 1;
attempts <= maxAttempts && result.length > MAX_IMAGE_SIZE_BYTES;
attempts++
)
Kind of personal preference, but it reads nicer imo!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I prefer the while loop. Or perhaps even a recursive function...
image-embedder-lambda/src/index.ts
Outdated
| const newWidth = Math.round(originalWidth * scaleFactor); | ||
| const newHeight = Math.round(originalHeight * scaleFactor); | ||
|
|
||
| console.log(`Attempt ${attempts}: scaling to ${newWidth}x${newHeight} (factor: ${scaleFactor.toFixed(3)})`); | ||
|
|
||
| let pipeline = sharp(imageBytes).resize(newWidth, newHeight, { | ||
| fit: "inside", | ||
| withoutEnlargement: true, | ||
| }); | ||
|
|
||
| // Output in the same format | ||
| if (mimeType === "image/jpeg") { | ||
| pipeline = pipeline.jpeg({ quality: 85 }); | ||
| } else if (mimeType === "image/png") { | ||
| pipeline = pipeline.png({ compressionLevel: 9 }); | ||
| } | ||
|
|
||
| result = new Uint8Array(await pipeline.toBuffer()); | ||
| console.log(`Result size: ${result.length} bytes`); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could probably be split off into its own function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I agree that this function is a little long, let me see if I can re-structure a bit
…-images-before-embedding
|
OK, I got curious (always the best and worst thing that can happen) and had a look at what was happening in detail, rather than just the test pass/fail. The file size change is much more aggressive than I would expect, coming out e.g. 0.14 times the size despite a scale factor of 0.9: Perhaps this isn't a bad thing if the resulting image still looks OK, but I should (a) understand why this is happening and (b) eyeball the resulting images. |
Depends on #4608, currently diffed to that branch
how to test
npm run test:integrationand see the large image tests now pass