Skip to content

Conversation

@macstevens
Copy link

Hello Mr. Balslev,

I have been using OpenXLSX and am happy with it.

I needed to embed images, so I added code for this.

I am new to the github pull request system. If you are open to new code submissions, please advise me how best to prepare my code so that you can most easily review it.

Regards,
Mac Stevens
Newport, Oregon

@aral-matrix
Copy link
Collaborator

Hi & thank you for your contribution,
These days, due to time constraints on @troldal's side, I am answering most of the issues - and currently we are trying to fix a major change to the CMake configuration, at which time we will merge into master. I need @troldal's support for this, and to avoid making the merge into master more complicated, I have currently paused implementation of new features.

However, embedded image support is on my to-do list (ref: #231) and when I get around to it, I will have a look at your implementation and see if it is close enough to the OpenXLSX "way of doing things" that it can be used.

In case we reuse major parts of your implementation, if attribution (beyond reference to the related pull request or issue number) is important to you, I could suggest a form in which to structure the pull request - but generally I try to avoid merging pull requests that are more than trivial changes, because reviewing other people's code is often more work than implementing something myself (plus I can trust myself to not try and introduce malware ;)

@aral-matrix aral-matrix added duplicate This issue or pull request already exists enhancement New feature or request labels Nov 12, 2025
@macstevens
Copy link
Author

I agree! It is often easier to write code than to understand someone else's code. Thanks for all the hard work. I hope my code can be of some help when you write the embedded image feature.

@macstevens
Copy link
Author

I won't bother you much more. But I felt the need to state some features of my implementation:

  1. Memory and file size efficiency. In particular, I mean that if you add multiple instances of the same image in one document, it keeps only one copy of the image data in memory and stores only one copy to disk when saved. Excel does not even do this -- it just duplicates the data.
  2. Read a file, iterate images.
  3. Add and remove images. You can read a file, add or remove images, and save the resulting document. If you remove all the images in a worksheet, it cleans up the XML just as if no images had been added. If you have multiple copies of the same image in a document, as you remove instances of that image, it keeps track, via shared_ptr, how many are in use and cleans up upon document save, as appropriate.
  4. Demo11.cpp and testXLImage.cpp provide many examples and a fairly thorough unit test.

My one worry about my implementation is reliance on a hash function for image equivalence. It will probably never fail, until it does fail at the worst moment.

@aral-matrix
Copy link
Collaborator

You are absolutely not bothering me - those are some good ideas for an implementation.

Maybe we can move these points to Issue #231 ? That way I have them all in one place with the image support on the to-do-list. If you could copy-paste your last comment there, that'd be appreciated. I will accordingly respond there.

Heads-up: I'll be on vacation from this weekend until 2nd week of December, so I may or may not be responsive here.

Quick note (that I'll copy over to issue #231 if you copy your feature suggestions): Image deduplication is a good idea, I will eventually implement something similar for shared strings (requires replacing the in-memory table of strings with a sorted list, and requires an idea for rich text strings), but my general idea for images would be:

  • for programmatically created XLSX workbooks, the user should be well aware if they are inserting duplicates
  • accordingly, similar to XLStyles elements, the API would let the user create an image object, and then reference it in all the cells that they want to - this way avoiding duplication themselves
  • similar to the XLDocument::cleanupSharedStrings function, I would implement a function to be called manually that will de-duplicate the workbook based on image hashes as you suggest

Depending on the hash function used, duplicate hashes for different images can be avoided (likeliness low enough that in reality they will never happen).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

duplicate This issue or pull request already exists enhancement New feature or request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants