Chinese Character Parser

Overview

This Python tool reads Chinese characters from a text file, looks up each character's metadata in the Unihan database, and uses hanzipy to decompose it into components. For each character it retrieves the Mandarin pronunciation and definition, breaks the character into its graphical and radical parts, and writes the results to a CSV file for analysis or import into other tools.

Features

Load Chinese characters from a simple text file.
Fetch character information from a local JSON-formatted Unihan database (unihanLite.json).
Decompose characters into radical and graphical components using hanzipy.
Write the results to a structured CSV file with detailed annotations for each component.
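
The lookup and decomposition steps listed above can be sketched roughly as follows. This is an illustration, not the script's actual code: the layout of the JSON file (a mapping from each character to an object with the Unihan fields `kMandarin` and `kDefinition`) is an assumption, and `load_unihan` and `describe` are hypothetical helper names.

```python
import json


def load_unihan(path):
    """Load the character table from a JSON file (layout is an assumption)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def describe(char, unihan, decompose):
    """Combine the metadata lookup and the decomposition for one character.

    `decompose` is any callable that returns a dict with "radical" and
    "graphical" lists for a character. It is passed in so that this sketch
    does not depend on a particular library version.
    """
    entry = unihan.get(char, {})  # unknown characters yield empty fields
    parts = decompose(char)
    return {
        "Character": char,
        "Mandarin": entry.get("kMandarin", ""),
        "Definition": entry.get("kDefinition", ""),
        "HanzipyStrokes": parts.get("graphical", []),
        "HanzipyRadicals": parts.get("radical", []),
    }
```

With hanzipy installed, the `decompose` method of `HanziDecomposer` (from `hanzipy.decomposer`) is one candidate for the `decompose` argument. Its README describes a result with `radical` and `graphical` keys, but this should be checked against the installed version.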

Installation

To get started with the Chinese Character Parser, follow these steps:

Clone the repository:

git clone https://github.com/M3C3I/ChineseCharacterParser.git

Install the required dependencies:
```
pip install hanzipy
```

Usage

To run the program, ensure you have a text file named input.txt in the same directory as the script, with one Chinese character per line. Then execute the script:

python ChineseCharacterParser.py

The output will be a CSV file named output.csv containing detailed character analysis.

Input File Format

The input.txt file should contain one Chinese character per line, like this:

爱
橄
黃
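
A file in this format could be read along the following lines. This is a minimal sketch; `parse_characters` and `load_characters` are hypothetical names, and skipping blank lines and surrounding whitespace is an assumption about how stray input should be handled.

```python
from pathlib import Path


def parse_characters(text):
    """Return the non-empty, stripped lines of `text`, one entry per line."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def load_characters(path="input.txt"):
    """Read the input file as UTF-8; utf-8-sig also drops a leading BOM."""
    return parse_characters(Path(path).read_text(encoding="utf-8-sig"))
```

Reading with an explicit encoding avoids depending on the platform default, which is not always UTF-8.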

Output Format

The output.csv will contain the following columns:

Number: Unique identifier for each character (e.g., 1, 1a, 1b).
Character: The Chinese character being analyzed.
Mandarin: Mandarin pronunciation of the character.
Definition: Definition of the character.
PrimaryRadical: The primary radical of the character.
RadicalMandarin: Mandarin pronunciation of the primary radical.
HanzipyStrokes: Graphical components of the character as determined by hanzipy.
HanzipyRadicals: Detailed radical components, each on a new line with its details.
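
A CSV file with these columns could be produced roughly as shown below. The column names come from the list above; everything else is an assumption made for illustration, including the helper names `row_numbers` and `write_rows` and the reading of the numbering scheme as "1" for the character followed by "1a", "1b" and so on for its components.

```python
import csv
from string import ascii_lowercase

COLUMNS = [
    "Number", "Character", "Mandarin", "Definition",
    "PrimaryRadical", "RadicalMandarin",
    "HanzipyStrokes", "HanzipyRadicals",
]


def row_numbers(index, component_count):
    """Yield "1", then "1a", "1b", ... (at most 26 component suffixes)."""
    yield str(index)
    for letter in ascii_lowercase[:component_count]:
        yield f"{index}{letter}"


def write_rows(rows, stream):
    """Write dict rows to `stream`; missing columns are left empty."""
    writer = csv.DictWriter(stream, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    writer.writerows(rows)
```

When writing to a real file, opening it with `newline=""` (as the `csv` module documentation recommends) and a UTF-8 encoding keeps line endings and Chinese characters intact.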

Contributing

Contributions to the Chinese Character Parser are welcome! Please feel free to fork the repository, make changes, and submit pull requests.

