@frankslin
Owner

frankslin and others added 9 commits January 3, 2026 07:27
* Add WASM demo scaffold and project notes
* Add OpenCC WASM demo with converter UI and test runner
  - Document how to use the WASM build output from front-end JS
* Polish WASM demo UI and paths, run tests, and streamline converter export
* Add wasm-based OpenCC package and update demo to consume it
* Add wasm-based OpenCC package, static demo bundle, and benchmarking page
* Add copyright notice and LICENSE
…eparation

This commit enhances the opencc-wasm library with TypeScript support and
implements a cleaner build architecture with semantic separation between
intermediate build artifacts and publishable distribution.

TypeScript Support:
- Add comprehensive type definitions (index.d.ts) with full JSDoc documentation
- Define interfaces: ConverterOptions, ConverterFunction, OpenCCNamespace, etc.
- Provide complete type safety for better IDE support and developer experience
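
As a rough illustration of what such definitions could look like, here is a minimal sketch. Only the three type names come from the commit message; every field, the asynchronous signature, and the `fakeOpenCC` stand-in are assumptions made for the example, not the contents of the real `index.d.ts`.

```typescript
// Only the three type names below appear in the commit message; every field
// and signature is an assumption made for illustration.
interface ConverterOptions {
  from?: string; // assumed: source variant, e.g. "cn"
  to?: string; // assumed: target variant, e.g. "tw"
}

// Assumed to be asynchronous, since a WASM module may need to load first.
type ConverterFunction = (input: string) => Promise<string>;

interface OpenCCNamespace {
  Converter(options: ConverterOptions): ConverterFunction;
}

// Hypothetical stand-in so the sketch type-checks and runs without WASM.
const fakeOpenCC: OpenCCNamespace = {
  Converter(options) {
    const table: Record<string, string> =
      options.from === "cn" && options.to === "tw" ? { "汉": "漢" } : {};
    // Map each character through the table, leaving unknown ones unchanged.
    return async (input) =>
      Array.from(input)
        .map((ch) => table[ch] ?? ch)
        .join("");
  },
};

const toTraditional: ConverterFunction = fakeOpenCC.Converter({
  from: "cn",
  to: "tw",
});
```

With declarations of this kind, an editor can flag a misspelled option name or a non-string input at compile time rather than at run time.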

Build Architecture Redesign (semantic separation):
- build/ - Intermediate WASM artifacts (gitignored, for tests/development)
  * build/opencc-wasm.esm.js - ESM WASM glue
  * build/opencc-wasm.cjs - CJS WASM glue
  * build/opencc-wasm.wasm - WASM binary
- dist/ - Publishable distribution (committed, for npm)
  * dist/esm/ - ESM package entry
  * dist/cjs/ - CJS package entry
  * dist/data/ - OpenCC config and dictionary files

Invariants and Semantics:
- Tests import source (index.js) → loads from build/
- Published package exports dist/ only
- build/ = internal intermediate artifacts
- dist/ = publishable artifacts
- Clear separation ensures tests validate actual build output

Enhanced .gitignore:
- Add build/ to gitignore (intermediate artifacts)
- Add node_modules/, logs, OS-specific files (.DS_Store, Thumbs.db)
- Exclude editor configurations (.vscode/, .idea/)
- Add cache and temporary file exclusions

Two-Stage Build Process:
Stage 1 (build.sh):
  - Compiles C++ to WASM using Emscripten
  - Outputs to build/ directory

Stage 2 (build-api.js):
  - Copies WASM artifacts from build/ to dist/
  - Transforms source paths for production
  - Generates API wrappers for ESM and CJS
  - Copies data files

Package Configuration (package.json):
- Add "types" field pointing to index.d.ts
- Update "main" and "module" to point to API wrappers in dist/
- Add comprehensive "exports" map:
  * "." - Main API (ESM/CJS wrappers)
  * "./wasm" - Direct access to WASM glue for advanced users
  * "./dist/*" - Wildcard for flexible file access
- Include LICENSE and NOTICE in published files

Documentation:
- Add comprehensive README section explaining build architecture
- Document project structure with invariants
- Explain semantic separation between build/ and dist/

Benefits:
- Better TypeScript integration and IDE autocomplete
- Cleaner, more maintainable directory structure
- Tests validate actual build output, not stale dist files
- Clear semantic separation between internal and publishable artifacts
- Professional project setup following modern npm best practices
- Long-term maintainability through clear invariants
## Summary
- add a `//data/config:config_dict_validation_test` target that validates dictionaries and configs against a `testcases.json` file
- switch all CLI/Python/Node tests to consume `testcases.json` as the single source of truth; drop `.in/.ans` dependencies and adjust Bazel/CMake wiring
- streamline dictionary build outputs (no standalone `TWPhrases{IT,Name,Other}.ocd2`) and align DictionaryTest with the actual generated dict set
- add maintenance helpers (refresh_assets.sh cleanup and fix, rapidjson dep/path for CLI test) and keep wasm assets in sync via `testcases.json`

## Testing
- bazel test //data/dictionary:dictionary_test
- bazel test //test:command_line_converter_test
- bazel test //python/tests:test_opencc
- node/test.js (sync/async/promise) using updated testcases.json
----

* feature: add a new ConfigDictValidationTest.cpp to be executed in bazel
* Changeover to JSON-based testcases and clean dictionary outputs
  - Switch all tests (C++ CLI, Python, Node) to consume `testcases.json` and drop `.in`/`.ans` dependencies; keep filegroup for the JSON.
  - Prune TWPhrases sub-dictionary artifacts and align DictionaryTest to current generated dict set.
  - Add rapidjson dep/path for CLI test, refresh_assets script fixes, and keep Bazel Python toolchain note.
* Normalize CommandLineConvertTest for CRLF comparisons on Windows
* Address review feedback for tests and Bazel-only validation
  - Rename and guard streams in CommandLineConvertTest; ensure input file opens and normalize CRLF.
  - Fix node test promise handling to propagate errors correctly.
  - Mark ConfigDictValidationTest as Bazel-only to skip CMake builds.
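
The CRLF normalization mentioned above can be sketched as follows. The actual change lives in the C++ `CommandLineConvertTest`; this TypeScript version only illustrates the idea, and the function names are hypothetical.

```typescript
// Hypothetical helper: collapse Windows line endings so that output produced
// on Windows compares equal to expected text stored with "\n" endings.
function normalizeNewlines(text: string): string {
  return text.replace(/\r\n/g, "\n");
}

// Compare converter output with the expected text, ignoring CRLF vs LF.
function sameOutput(actual: string, expected: string): boolean {
  return normalizeNewlines(actual) === normalizeNewlines(expected);
}
```

Normalizing both sides keeps the comparison symmetric, so it does not matter which of the two files was written on Windows.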
…cases.json (#10)

- add refresh_assets.sh to rebuild/copy only config-referenced .ocd2 files and testcases.json
- convert wasm-lib tests to consume the new `{cases:[...]}` JSON format
- update bundled .ocd2 dictionaries and testcases.json fixtures
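
A test runner over the `{cases:[...]}` format might look like the sketch below. Only the top-level `cases` array is stated in the commit message; the per-case fields (`id`, `input`, `expected` keyed by config name), the `runSuite` helper, and the stub converter are assumptions for illustration.

```typescript
// Assumed case shape; only the top-level "cases" array is from the source.
interface TestCase {
  id: string;
  input: string;
  expected: Record<string, string>; // assumed: config name -> expected output
}

interface TestSuite {
  cases: TestCase[];
}

type Convert = (config: string, input: string) => string;

// Run every case against every config it lists and collect failure labels.
function runSuite(suite: TestSuite, convert: Convert): string[] {
  const failures: string[] = [];
  suite.cases.forEach((testCase) => {
    Object.keys(testCase.expected).forEach((config) => {
      const actual = convert(config, testCase.input);
      if (actual !== testCase.expected[config]) {
        failures.push(`${testCase.id} [${config}]`);
      }
    });
  });
  return failures;
}

// Illustrative data and a stub converter, not taken from the real fixtures.
const exampleSuite: TestSuite = JSON.parse(
  '{"cases":[{"id":"case_001","input":"汉字","expected":{"s2t":"漢字"}},' +
    '{"id":"case_002","input":"台湾","expected":{"s2tw":"台灣"}}]}'
) as TestSuite;

const stubConvert: Convert = (_config, input) =>
  input === "汉字" ? "漢字" : input;

const exampleFailures = runSuite(exampleSuite, stubConvert);
```

Here the stub leaves "台湾" unchanged, so `exampleFailures` holds a single entry for `case_002`. Keeping all expectations in one file means each binding (C++, Python, Node, WASM) needs only a small loop of this kind.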

----

* wasm-lib: refresh assets script and switch tests to consolidated testcases.json
  - add refresh_assets.sh to rebuild/copy only config-referenced .ocd2 files and testcases.json
  - convert wasm-lib tests to consume the new `{cases:[...]}` JSON format
  - update bundled .ocd2 dictionaries and testcases.json fixtures
* Rebuild the wasm-lib and update the documentation
claude and others added 4 commits January 2, 2026 23:33
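Add a complete contribution guide, covering:
- How to add dictionary entries (emphasizing that fields are separated by a Tab character)
- How to use the sorting tool to keep dictionaries correctly sorted
- How to install Bazel and run the tests
- How to write test cases (a test-driven development workflow)
- Special considerations for Simplified-to-Traditional conversion (multiple configs must be tested)

The guide is written in Traditional Chinese (Taiwan).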
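
The two dictionary rules in the guide (Tab separation and sorted order) lend themselves to a quick check. The sketch below is hypothetical: `checkDictionary` is not part of OpenCC, and it assumes keys are ordered by plain string comparison, which may differ from what the project's sorting tool actually does.

```typescript
// Hypothetical checker: each line must be "key<TAB>values", and keys are
// assumed to be sorted by plain string comparison.
function checkDictionary(lines: string[]): string[] {
  const problems: string[] = [];
  let previousKey: string | null = null;
  lines.forEach((line, index) => {
    const tab = line.indexOf("\t");
    const key = tab > 0 ? line.slice(0, tab) : "";
    const values = tab > 0 ? line.slice(tab + 1) : "";
    // A space instead of a Tab, or a second Tab, is reported as a format error.
    if (key === "" || values === "" || values.includes("\t")) {
      problems.push(`line ${index + 1}: expected "key<TAB>values"`);
      return;
    }
    if (previousKey !== null && key < previousKey) {
      problems.push(`line ${index + 1}: key out of order`);
    }
    previousKey = key;
  });
  return problems;
}

// Line 2 sorts before line 1, and line 3 uses a space instead of a Tab.
const dictionaryProblems = checkDictionary(["丁\t丁", "一\t一", "三 三"]);
```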
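1. Add a document analyzing the algorithm and its theoretical limitations
   - Explain the forward maximum matching segmentation algorithm in detail
   - Analyze the conversion chain mechanism and the dictionary system
   - Discuss theoretical limitations (one-to-many ambiguity, lack of contextual understanding, maintenance burden)
   - Compare with modern approaches (statistical models, neural networks)

2. Update AGENTS.md
   - Add a "Further Reading" section
   - Link to the technical documents and the contribution guide

3. Add Claude Code configuration
   - .claude/hooks/session_start.sh - shows project information at session start
   - .claude/skills/opencc-dict-edit.md - dictionary editing skill
   - .claude/skills/opencc-algorithm-explain.md - algorithm explanation skill

These configurations help AI agents better understand the OpenCC project architecture and development workflow.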
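
For readers unfamiliar with forward maximum matching, a minimal sketch follows. It folds segmentation and dictionary lookup into one pass over a single in-memory dictionary, which is a simplification; the function name and sample entries are illustrative and are not OpenCC's implementation or data.

```typescript
// Simplified forward maximum matching: at each position, try the longest
// dictionary key first; if nothing matches, copy one character unchanged.
function convertByMaxMatch(text: string, dict: Map<string, string>): string {
  const chars = Array.from(text); // iterate by code point, not UTF-16 unit
  let maxLen = 1;
  dict.forEach((_value, key) => {
    maxLen = Math.max(maxLen, Array.from(key).length);
  });

  const out: string[] = [];
  let i = 0;
  while (i < chars.length) {
    let matched = false;
    for (let len = Math.min(maxLen, chars.length - i); len >= 1; len--) {
      const candidate = chars.slice(i, i + len).join("");
      const mapped = dict.get(candidate);
      if (mapped !== undefined) {
        out.push(mapped);
        i += len;
        matched = true;
        break;
      }
    }
    if (!matched) {
      out.push(chars.slice(i, i + 1).join(""));
      i += 1;
    }
  }
  return out.join("");
}

// Illustrative entries: the phrase entry wins over the single-character entry.
const sampleDict = new Map<string, string>([
  ["头发", "頭髮"],
  ["发", "發"],
]);
const sampleResult = convertByMaxMatch("头发发", sampleDict);
```

In this sample the first two characters match the longer phrase entry and the trailing character falls back to the single-character entry. The same greedy choice is also where the one-to-many ambiguity arises: the algorithm never looks at context beyond the longest key that matches.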
Avoid converting "台湾" to "臺灣" in `s2tw`/`s2twp`, to match the behavior of `s2hk`.
@frankslin frankslin force-pushed the fix/taiwan-variants branch from 954cdb9 to 8e3bf3b Compare January 3, 2026 23:06
@frankslin frankslin changed the title Add 台-> 臺 to TWVariants.txt to make match s2tw/s2twp/s2hk behaviors for this character Add 臺 -> 台 to TWVariants.txt. Avoid converting "台湾" to "臺灣" in s2tw/s2twp to match the behaviors of s2hk Jan 4, 2026
@frankslin frankslin force-pushed the master branch 8 times, most recently from ff66547 to f7dbca0 Compare January 13, 2026 23:26
3 participants