forked from BYVoid/OpenCC
Add 臺 -> 台 to TWVariants.txt. Avoid converting "台湾" to "臺灣" in s2tw/s2twp to match the behaviors of s2hk
#8
Open
frankslin wants to merge 13 commits into master from fix/taiwan-variants
a5be725 to 954cdb9
* Add WASM demo scaffold and project notes
* Add OpenCC WASM demo with converter UI and test runner
  - Document how to use the compiled WASM output from front-end JS
* Polish WASM demo UI and paths, run tests, and streamline converter export
* Add wasm-based OpenCC package and update demo to consume it
* Add wasm-based OpenCC package, static demo bundle, and benchmarking page
* Add copyright notice and LICENSE
…eparation

This commit enhances the opencc-wasm library with TypeScript support and implements a cleaner build architecture with semantic separation between intermediate build artifacts and publishable distribution.

TypeScript Support:
- Add comprehensive type definitions (index.d.ts) with full JSDoc documentation
- Define interfaces: ConverterOptions, ConverterFunction, OpenCCNamespace, etc.
- Provide complete type safety for better IDE support and developer experience

Build Architecture Redesign (semantic separation):
- build/ - Intermediate WASM artifacts (gitignored, for tests/development)
  * build/opencc-wasm.esm.js - ESM WASM glue
  * build/opencc-wasm.cjs - CJS WASM glue
  * build/opencc-wasm.wasm - WASM binary
- dist/ - Publishable distribution (committed, for npm)
  * dist/esm/ - ESM package entry
  * dist/cjs/ - CJS package entry
  * dist/data/ - OpenCC config and dictionary files

Invariants and Semantics:
- Tests import source (index.js) → loads from build/
- Published package exports dist/ only
- build/ = internal intermediate artifacts
- dist/ = publishable artifacts
- Clear separation ensures tests validate actual build output

Enhanced .gitignore:
- Add build/ to gitignore (intermediate artifacts)
- Add node_modules/, logs, OS-specific files (.DS_Store, Thumbs.db)
- Exclude editor configurations (.vscode/, .idea/)
- Add cache and temporary file exclusions

Two-Stage Build Process:

Stage 1 (build.sh):
- Compiles C++ to WASM using Emscripten
- Outputs to build/ directory

Stage 2 (build-api.js):
- Copies WASM artifacts from build/ to dist/
- Transforms source paths for production
- Generates API wrappers for ESM and CJS
- Copies data files

Package Configuration (package.json):
- Add "types" field pointing to index.d.ts
- Update "main" and "module" to point to API wrappers in dist/
- Add comprehensive "exports" map:
  * "." - Main API (ESM/CJS wrappers)
  * "./wasm" - Direct access to WASM glue for advanced users
  * "./dist/*" - Wildcard for flexible file access
- Include LICENSE and NOTICE in published files

Documentation:
- Add comprehensive README section explaining build architecture
- Document project structure with invariants
- Explain semantic separation between build/ and dist/

Benefits:
- Better TypeScript integration and IDE autocomplete
- Cleaner, more maintainable directory structure
- Tests validate actual build output, not stale dist files
- Clear semantic separation between internal and publishable artifacts
- Professional project setup following modern npm best practices
- Long-term maintainability through clear invariants
## Summary
- add a `//data/config:config_dict_validation_test` to test dictionaries and configs against a `testcases.json` file
- switch all CLI/Python/Node tests to consume `testcases.json` as the single source of truth; drop `.in/.ans` dependencies and adjust Bazel/CMake wiring
- streamline dictionary build outputs (no standalone `TWPhrases{IT,Name,Other}.ocd2`) and align DictionaryTest with the actual generated dict set
- add maintenance helpers (refresh_assets.sh cleanup and fix, rapidjson dep/path for CLI test) and keep wasm assets in sync via `testcases.json`
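As a rough illustration of what "single source of truth" means for the test runners, here is a minimal Python sketch of a JSON-driven check. The top-level `cases` array matches the `{cases:[...]}` shape mentioned in this PR; the per-case fields (`id`, `input`, and an `expected` map from config name to output) and the `toy_convert` stand-in are assumptions made for illustration, not the project's actual schema or API.

```python
import json


def load_cases(text):
    """Parse a testcases document and return its list of cases."""
    return json.loads(text)["cases"]


def run_cases(cases, convert):
    """Return (case id, config, expected, actual) for every mismatch."""
    failures = []
    for case in cases:
        # Assumed shape: "expected" maps a config name to the expected output.
        for config, expected in case["expected"].items():
            actual = convert(config, case["input"])
            if actual != expected:
                failures.append((case.get("id"), config, expected, actual))
    return failures


# Toy document in the assumed shape (not taken from the real testcases.json).
SAMPLE = """
{"cases": [
  {"id": "case_001", "input": "abc", "expected": {"upper": "ABC", "same": "abc"}}
]}
"""


def toy_convert(config, text):
    # Hypothetical stand-in so the sketch runs without OpenCC installed.
    return text.upper() if config == "upper" else text


failures = run_cases(load_cases(SAMPLE), toy_convert)
print(failures)
```

Each runner (CLI, Python, Node) would swap in its own converter in place of `toy_convert` while reading the same file.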
## Testing
- bazel test //data/dictionary:dictionary_test
- bazel test //test:command_line_converter_test
- bazel test //python/tests:test_opencc
- node/test.js (sync/async/promise) using updated testcases.json
----
* feature: add a new ConfigDictValidationTest.cpp to be executed in bazel
* Changeover to JSON-based testcases and clean dictionary outputs
- Switch all tests (C++ CLI, Python, Node) to consume `testcases.json` and drop `.in`/`.ans` dependencies; keep filegroup for the JSON.
- Prune TWPhrases sub-dictionary artifacts and align DictionaryTest to current generated dict set.
- Add rapidjson dep/path for CLI test, refresh_assets script fixes, and keep Bazel Python toolchain note.
* Normalize CommandLineConvertTest for CRLF comparisons on Windows
* Address review feedback for tests and Bazel-only validation
- Rename and guard streams in CommandLineConvertTest; ensure input file opens and normalize CRLF.
- Fix node test promise handling to propagate errors correctly.
- Mark ConfigDictValidationTest as Bazel-only to skip CMake builds.
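The CRLF normalization mentioned above can be sketched as follows. This is a Python illustration of the idea only; the actual change lives in the C++ CommandLineConvertTest and may be implemented differently. The helper names are hypothetical.

```python
def normalize_newlines(text):
    """Collapse CRLF (and any stray CR) line endings to LF."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def outputs_match(expected, actual):
    """Compare two outputs while ignoring line-ending differences."""
    return normalize_newlines(expected) == normalize_newlines(actual)
```

With this, output written on Windows (`"a\r\nb\r\n"`) compares equal to a fixture stored with LF endings (`"a\nb\n"`).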
…cases.json (#10)

- add refresh_assets.sh to rebuild/copy only config-referenced .ocd2 files and testcases.json
- convert wasm-lib tests to consume the new `{cases:[...]}` JSON format
- update bundled .ocd2 dictionaries and testcases.json fixtures

----

* wasm-lib: refresh assets script and switch tests to consolidated testcases.json
  - add refresh_assets.sh to rebuild/copy only config-referenced .ocd2 files and testcases.json
  - convert wasm-lib tests to consume the new `{cases:[...]}` JSON format
  - update bundled .ocd2 dictionaries and testcases.json fixtures
* Rebuild the wasm-lib and update the documentations
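Add a complete contribution guide, covering:
- How to add dictionary entries (emphasizing Tab-separated fields)
- How to use the sorting tool to keep dictionaries correctly sorted
- How to install Bazel and run the tests
- How to write test cases (test-driven development workflow)
- Special considerations for Simplified-to-Traditional conversion (multiple configs must be tested)

Written in Taiwan Traditional Chinese.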
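To make the Tab-separation point concrete, here is a small Python sketch that parses one dictionary line of the form `key<TAB>value1 value2` and flags out-of-order keys. The function names are hypothetical. The ordering used by the project's own sorting tool is not stated here, so plain code-point comparison in `find_unsorted` is an assumption; the real tool remains the authority.

```python
def parse_entry(line):
    """Split 'key<TAB>value1 value2' into (key, [values]); raise on bad lines."""
    key, _, values = line.rstrip("\n").partition("\t")
    if not key or not values.split():
        # A line separated by spaces instead of a Tab ends up here.
        raise ValueError(f"expected 'key<TAB>values', got {line!r}")
    return key, values.split()


def find_unsorted(keys):
    """Return keys that sort before their predecessor (assumed code-point order)."""
    return [b for a, b in zip(keys, keys[1:]) if b < a]
```

For example, `parse_entry("臺\t台")` yields the key `臺` with the single value `台`, while the space-separated `"臺 台"` raises `ValueError`.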
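1. Add a document analyzing the algorithm and its theoretical limitations
   - Describe the forward maximum matching segmentation algorithm in detail
   - Analyze the conversion chain mechanism and the dictionary system
   - Discuss theoretical limitations (one-to-many ambiguity, lack of contextual understanding, maintenance burden)
   - Compare with modern approaches (statistical models, neural networks)
2. Update AGENTS.md
   - Add a "Further reading" section
   - Link to the technical documents and the contribution guide
3. Add Claude Code configuration
   - .claude/hooks/session_start.sh - shows project information at session start
   - .claude/skills/opencc-dict-edit.md - dictionary editing skill
   - .claude/skills/opencc-algorithm-explain.md - algorithm explanation skill

These configurations help AI agents better understand the OpenCC project architecture and development workflow.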
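For readers unfamiliar with forward maximum matching, a minimal Python sketch of the general technique follows. It is not OpenCC's C++ implementation: `TOY_DICT` is an illustrative table, each key maps to a single replacement, and the real dictionaries and lookup structures differ.

```python
def segment(text, dictionary):
    """Greedy forward maximum matching: take the longest dictionary key at each position."""
    max_len = max([len(k) for k in dictionary] + [1])
    pos, pieces = 0, []
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            # Fall back to a single character when nothing longer matches.
            if piece in dictionary or length == 1:
                pieces.append(piece)
                pos += length
                break
    return pieces


def convert(text, dictionary):
    """Replace each segment by its dictionary value, leaving unknown segments as-is."""
    return "".join(dictionary.get(p, p) for p in segment(text, dictionary))


# Toy table for illustration only.
TOY_DICT = {"理发": "理髮", "头发": "頭髮", "发": "發"}

print(segment("理发头发", TOY_DICT))  # ['理发', '头发']
print(convert("我理发", TOY_DICT))    # 我理髮
```

Because the match is greedy and purely dictionary-driven, a character such as `发` is resolved only by whichever phrase entry happens to cover it, which is the one-to-many ambiguity the document discusses.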
Avoid converting "台湾" to "臺灣" in `s2tw`/`s2twp` to match the behaviors of `s2hk`.
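The intended effect can be pictured with a toy two-stage chain in Python. This is only a sketch: the tables below are illustrative stand-ins rather than the contents of the real dictionaries, and plain string replacement stands in for OpenCC's segmentation-based conversion.

```python
def apply_table(text, table):
    """Apply every source -> target replacement in the table to the text."""
    for src, dst in table.items():
        text = text.replace(src, dst)
    return text


# Illustrative stand-in for the Simplified -> Traditional stage.
S2T_TOY = {"台湾": "臺灣"}

# Illustrative stand-ins for the Taiwan variant stage before and after this change.
TW_VARIANTS_BEFORE = {}
TW_VARIANTS_AFTER = {"臺": "台"}


def s2tw_toy(text, variants):
    """Run the toy chain: Simplified -> Traditional, then Taiwan variants."""
    return apply_table(apply_table(text, S2T_TOY), variants)


print(s2tw_toy("台湾", TW_VARIANTS_BEFORE))  # 臺灣
print(s2tw_toy("台湾", TW_VARIANTS_AFTER))   # 台灣
```

Adding the `臺 -> 台` variant entry changes only the second stage, so the first stage's output is mapped back to `台` at the end of the chain.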
954cdb9 to 8e3bf3b
s2tw/s2twp to match the behaviors of s2hk
ff66547 to f7dbca0
BYVoid#1001