@polyfloyd

Hi!

I am currently working on migrating my accounting to Beancount and am encountering a few things that I think could be improved here and there.

One such thing is the caching provided by beangulp. My use case is to convert PDFs, but also to run them through an LLM to extract some interesting information; the latter can take quite a bit of time.

Initially, I started out with beangulp.cache, but at some point had to make a round-trip to the source code to find out something that was unclear to me. Then I found out that beangulp.cache is actually deprecated and that beangulp.simple_cache succeeded it. But simple_cache was not as ergonomic to use as the former and is also not released yet, so I decided on using cachier instead.

That left me wondering whether caching should be provided by beangulp at all. Not including a cache implementation leaves room to focus on the core domain of interfacing with Beancount.

This MR proposes to use cachier in the included example. A follow-up step would be deprecating all caching interfaces.

Let me know what you think of this :)

@dnicolodi
Collaborator

I don't know why beangulp.simple_cache exists. It seems to me that @blais forgot that the beangulp.cache.cache() decorator exists and added simple_cache because it was easier to have an LLM spit it out than to learn how the existing facility works. The beangulp.cache.cache() decorator is not deprecated; the other stuff in beangulp.cache is. Eventually we should merge the improvements implemented in simple_cache into the cache() decorator and deprecate the former, but I haven't looked in detail at what the latter does. Adding a dependency on cachier for such a simple task seems overkill, but of course you can use it if you like.

@blais
Member

blais commented Jan 8, 2026

totally possible indeed

@polyfloyd
Author

It's a simple task, but it is worth doing well. I think delegating this responsibility to an external package that is dedicated to this task would achieve this. Especially if the included cache implementation is something half-baked by an LLM...

I do not intend to introduce cachier as an actual dependency of beangulp. I only want to show it in the examples so users can decide for themselves. The example that uses a cache extracts text from a PDF file, which is not too demanding, so the caching could maybe even be left out altogether?
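
For illustration only, here is a minimal sketch of the kind of caching discussed in this thread: memoizing an expensive per-file conversion on disk, keyed by a hash of the file's contents. It uses only the Python standard library and is neither beangulp's cache() decorator nor cachier's API. The names `disk_cache`, `extract_text`, and `CACHE_DIR` are hypothetical, and the body of `extract_text` is a stand-in for a real PDF conversion or LLM call.

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path

# Hypothetical cache location; a real setup would pick a persistent directory.
CACHE_DIR = Path(tempfile.gettempdir()) / "importer-cache-sketch"


def disk_cache(func):
    """Cache func(filepath) on disk, keyed by function name and file contents."""

    @functools.wraps(func)
    def wrapper(filepath):
        # Hash the file contents so an edited file never hits a stale entry.
        digest = hashlib.sha256(Path(filepath).read_bytes()).hexdigest()
        entry = CACHE_DIR / f"{func.__name__}-{digest}.pickle"
        if entry.exists():
            return pickle.loads(entry.read_bytes())
        result = func(filepath)
        CACHE_DIR.mkdir(parents=True, exist_ok=True)
        entry.write_bytes(pickle.dumps(result))
        return result

    return wrapper


@disk_cache
def extract_text(filepath):
    # Stand-in for a slow step such as PDF-to-text conversion or an LLM call.
    return Path(filepath).read_bytes().decode("utf-8", errors="replace")
```

Keying on the contents rather than the path means a renamed file still hits the cache, while a modified file is reprocessed. Expiry, locking, and cache cleanup are deliberately left out; those are the parts a dedicated package would take care of.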
