feat: read and write languageDirverId, add LanguageDriverIdToCodepage record #91

paulish · 2024-09-18T12:29:41Z

This commit reads languageDriverId from the header field and uses it to choose the appropriate codepage for character conversion. I added a LanguageDriverIdToCodepage record, which is taken from the MS format explanation page.
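
As a rough illustration of the idea (not the code in this PR), the lookup could be sketched as follows. The byte offset 29 is the standard position of the language driver ID in a DBF header; the table is only a small subset of the Microsoft list, and the names `LANGUAGE_DRIVER_ID_OFFSET`, `languageDriverIdToCodepage` and `codepageFromHeader`, as well as the encoding name strings, are assumptions made for this example.

```typescript
// Offset of the language driver ID (code page mark) within the DBF file header.
const LANGUAGE_DRIVER_ID_OFFSET = 29;

// Illustrative subset only; the full table lives on the MS format page.
// The encoding names on the right are assumed spellings, not the library's.
const languageDriverIdToCodepage: Record<number, string> = {
  0x01: 'cp437',  // U.S. MS-DOS
  0x02: 'cp850',  // International MS-DOS
  0x03: 'cp1252', // Windows ANSI
  0x64: 'cp852',  // Eastern European MS-DOS
  0x65: 'cp866',  // Russian MS-DOS
  0xc8: 'cp1250', // Eastern European Windows
  0xc9: 'cp1251', // Russian Windows
};

// Hypothetical helper: returns the codepage named by the header,
// or undefined when the header is too short or the ID is not in the table.
function codepageFromHeader(header: Uint8Array): string | undefined {
  const id = header[LANGUAGE_DRIVER_ID_OFFSET];
  if (id === undefined) return undefined;
  return languageDriverIdToCodepage[id];
}
```

An ID of 0x00, or any value missing from the table, yields `undefined`, so the caller can fall back to whatever encoding it would have used otherwise.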

yortus · 2024-09-21T09:36:08Z

Hi @paulish, thanks for the PR! This is interesting because there is already support for reading and writing dbf files with different character encodings, including the code pages at the link you have referenced. It's done through the encoding option, which is more general and quite flexible because (a) dbase files don't have that header byte even though they do use various encodings and (b) some files are not conformant - e.g. they use different code pages for different fields, or the field names use a different code page from the field values.

Having said that, one thing you have in this PR that isn't currently implemented in this library is reading/writing the FoxPro "code page mark" / "code page id" (what you have called Language Driver ID). That would be a good way to try to get the encoding from the file itself without having to specify the encoding option separately when opening the file.

I'd be interested in keeping the code to read/write the "code page mark" from files, but with the following changes from the PR as it is currently:

- Move the let/const changes to a separate commit (or remove them), as those changes are unrelated and make it difficult to see the actual changes being proposed in this PR.
- Only read/write the code page mark for FoxPro versions that support it (dbase doesn't, at least in principle).
- If the code page mark is present in the file and no options.encoding was given, use the code page mark to determine the encoding to use for the file.
- When writing a dbffile, determine the code page mark from the encoding rather than having a separate options.languageDriverId option. They do the same thing, so we don't need both options. The encoding option is more general, so that's the one I'd keep.
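
The last three points could be sketched roughly like this. It is only an outline under stated assumptions: `FOXPRO_VERSIONS`, `DEFAULT_ENCODING`, `resolveReadEncoding` and `markForWrite` are hypothetical names, the version bytes and the default encoding are assumed rather than taken from the library, the table is a small subset, and `encoding` is simplified to a plain string even though the real option is more flexible.

```typescript
// Assumed subset of code page marks and encoding names, for illustration only.
const MARK_TO_ENCODING: Record<number, string> = {
  0x03: 'cp1252',
  0x65: 'cp866',
  0xc9: 'cp1251',
};

// Reverse table, derived so that the two directions cannot drift apart.
const ENCODING_TO_MARK: Record<string, number> = {};
for (const key of Object.keys(MARK_TO_ENCODING)) {
  const mark = Number(key);
  const enc = MARK_TO_ENCODING[mark];
  if (enc !== undefined) ENCODING_TO_MARK[enc] = mark;
}

// Assumption: file version bytes treated as FoxPro variants that carry a code page mark.
const FOXPRO_VERSIONS = [0x30, 0xf5];
// Assumption: fallback used when neither the option nor the file names an encoding.
const DEFAULT_ENCODING = 'ISO-8859-1';

function supportsCodePageMark(version: number): boolean {
  return FOXPRO_VERSIONS.indexOf(version) !== -1;
}

// Reading: an explicit options.encoding wins, then the file's code page mark, then the default.
function resolveReadEncoding(version: number, mark: number, optionsEncoding?: string): string {
  if (optionsEncoding !== undefined) return optionsEncoding;
  if (supportsCodePageMark(version)) {
    const fromMark = MARK_TO_ENCODING[mark];
    if (fromMark !== undefined) return fromMark;
  }
  return DEFAULT_ENCODING;
}

// Writing: derive the mark from the encoding; 0 is written when no mark applies.
function markForWrite(version: number, encoding: string): number {
  if (!supportsCodePageMark(version)) return 0;
  const mark = ENCODING_TO_MARK[encoding.toLowerCase()];
  return mark !== undefined ? mark : 0;
}
```

Deriving the mark from the encoding keeps `encoding` as the single option, and an encoding with no matching mark simply results in a header that does not declare one.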

Commit 09c9c2a: feat: read and write languageDirverId, add LanguageDriverIdToCodepage record
