Add Mixed Encoding Detection and Conversion Support - Resolves #25 #91

Egor-OSSRevival · 2025-09-01T20:36:08Z

Pull Request Description

Overview

This PR adds mixed encoding support to enca, resolving issue #25, where files containing multiple encodings (e.g., GB2312 + UTF-8) could not be processed.

Features

Mixed Encoding Detection (-M / --mixed-encodings)
Detects multiple encodings within a single file and reports each segment with its offset and length.
Configurable Buffer Size (-B / --mixed-buffer-size)
Defaults to 1024 bytes; accepted range is 1 to 1048576 bytes. Smaller buffers give finer-grained detection; larger buffers run faster.
Error Handling (-I / --mixed-ignore-errors)
Skips corrupted or unknown segments and falls back to the predominant encoding.
Mixed Encoding Conversion (-x with -M)
Converts each segment individually while preserving file integrity.
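
To illustrate what "converts each segment individually" can mean, here is a minimal Python sketch. It is not the PR's C implementation: the `Segment` tuple layout, the `convert_segments` name, and the `fallback` / `ignore_errors` parameters are hypothetical and only mirror the behaviour described for -x and -I above.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical segment record: (offset, length, encoding or None if unknown).
Segment = Tuple[int, int, Optional[str]]


def convert_segments(
    data: bytes,
    segments: Iterable[Segment],
    target: str = "utf-8",
    fallback: Optional[str] = None,
    ignore_errors: bool = False,
) -> bytes:
    """Decode each segment with its own encoding and re-encode to `target`."""
    out = []
    for offset, length, encoding in segments:
        raw = data[offset:offset + length]
        try:
            if encoding is None:
                raise LookupError(f"no encoding detected at offset {offset}")
            text = raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            # Mirrors the idea behind -I: only recover when asked to,
            # and only if a fallback encoding is available.
            if not ignore_errors or fallback is None:
                raise
            text = raw.decode(fallback, errors="replace")
        out.append(text.encode(target))
    return b"".join(out)
```

Segments are concatenated in the order given, so the output keeps the original byte order of the file. A segment boundary that cuts through a multibyte character would still fail to decode cleanly; this sketch does not try to repair that case.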

Usage

# Detect mixed encodings
enca -L pl -M mixed_file.txt

# Convert to UTF-8
enca -L pl -M -x utf8 mixed_file.txt

# Fine-tuned with buffer and error handling
enca -L pl -M -B 256 -I -x utf8 mixed_file.txt
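
When calling enca from a script, the same invocations can be assembled programmatically. The helper below is a hypothetical sketch (the function name and its parameters are not part of enca); it assumes an enca build that includes this PR's -M, -B and -I options and only builds the argument list, checking the documented buffer range before anything is executed.

```python
from typing import List, Optional

# Buffer size limits as stated in the PR description.
MIN_BUFFER = 1
MAX_BUFFER = 1048576


def build_enca_command(
    path: str,
    language: str = "pl",
    convert_to: Optional[str] = None,
    buffer_size: Optional[int] = None,
    ignore_errors: bool = False,
) -> List[str]:
    """Build an argv list for a mixed-encoding enca run (hypothetical helper)."""
    cmd = ["enca", "-L", language, "-M"]
    if buffer_size is not None:
        if not MIN_BUFFER <= buffer_size <= MAX_BUFFER:
            raise ValueError(
                f"buffer size must be in {MIN_BUFFER}..{MAX_BUFFER}, got {buffer_size}"
            )
        cmd += ["-B", str(buffer_size)]
    if ignore_errors:
        cmd.append("-I")
    if convert_to is not None:
        cmd += ["-x", convert_to]
    cmd.append(path)
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; keeping the arguments as a list avoids shell quoting problems with unusual file names.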

Implementation

Chunk-based analysis with segment merging
Predominant encoding fallback
Integrated with existing conversion system (iconv/recode/internal)
Verbose logging for detailed progress
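
The first two points above can be sketched roughly as follows. This is an illustration in Python, not the code from the PR: `toy_detect` is a stand-in for enca's real analyser (it just tries strict UTF-8, then GB2312), and all function names here are assumptions made for the example.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical segment record: (offset, length, encoding or None if unknown).
Segment = Tuple[int, int, Optional[str]]


def toy_detect(chunk: bytes) -> Optional[str]:
    """Stand-in detector: the first strict decode that succeeds wins."""
    for enc in ("utf-8", "gb2312"):
        try:
            chunk.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None


def merge_adjacent(segments: List[Segment]) -> List[Segment]:
    """Merge neighbouring segments that carry the same encoding label."""
    merged: List[Segment] = []
    for offset, length, enc in segments:
        if merged and merged[-1][2] == enc:
            prev_offset, prev_length, _ = merged[-1]
            merged[-1] = (prev_offset, prev_length + length, enc)
        else:
            merged.append((offset, length, enc))
    return merged


def detect_segments(
    data: bytes,
    buffer_size: int = 1024,
    detect: Callable[[bytes], Optional[str]] = toy_detect,
) -> List[Segment]:
    """Classify fixed-size chunks, then merge runs with the same result."""
    if not 1 <= buffer_size <= 1048576:
        raise ValueError("buffer_size must be in 1..1048576")
    chunks = [
        (offset, len(data[offset:offset + buffer_size]),
         detect(data[offset:offset + buffer_size]))
        for offset in range(0, len(data), buffer_size)
    ]
    return merge_adjacent(chunks)


def apply_predominant_fallback(segments: List[Segment]) -> List[Segment]:
    """Relabel unknown segments with the encoding that covers the most bytes."""
    totals: Counter = Counter()
    for _, length, enc in segments:
        if enc is not None:
            totals[enc] += length
    if not totals:
        return list(segments)
    fallback = totals.most_common(1)[0][0]
    relabeled = [(o, n, e if e is not None else fallback) for o, n, e in segments]
    # Relabeling can make neighbours equal, so merge once more.
    return merge_adjacent(relabeled)
```

This also shows the buffer size trade-off in miniature: a smaller `buffer_size` places segment boundaries more precisely but calls the detector more often, and a chunk boundary that splits a multibyte character may cause a strict detector like `toy_detect` to misclassify that chunk.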

Documentation

Updated man page and CLI help with examples
Resolves issue #25 ("can't convert mixed encode file"): mixed encoding files could not be converted

Egor-OSSRevival added 4 commits August 23, 2025 20:56

Add support for mixed encodings processing in command line options and help documentation

1a3fe40

Implement mixed encoding conversion with iconv support and enhanced verbosity logging

b53a65e

Add mixed encoding support with configurable options and error handling

a23b9a3

Merge branch 'nijel:master' into t25

633f5cd
