A metadata extraction and transformation tool to filter and harmonize bibliometric records from Engineering Village, Scopus, and Web of Science based on BibTeX exports

GarGarfie/CIDMET

CIDMET - Citation Index Database Metadata Extraction and Transformation

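This project provides a local GUI tool (Qt6 / PySide6). Given a target list of references exported from Zotero as BibTeX, it selects the matching records from Web of Science / Scopus / Engineering Village export files and writes that subset out in the original database export format.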

Features

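  • Takes the target references as BibTeX input.
  • Supported database inputs:
    • WoS: .xls / .xlsx / .txt
    • Scopus: .csv / .txt
    • EI (Engineering Village): .csv / .txt
  • Matching priority:
    1. Exact DOI match (after normalizing DOI prefixes)
    2. Exact match on the normalized title
    3. Fuzzy title match + year + first author / first two authors (RapidFuzz, adjustable threshold)
  • When fields are missing, matching automatically falls back to a weaker tier and the basis for each match is recorded.
  • When several records hit the same DOI/title, all of them are kept and flagged.
  • Output files keep the original format and style as far as possible:
    • CSV: preserves delimiter, quoting, and encoding (falls back to UTF-8-sig if detection fails)
    • XLS: written as XLS where possible (falls back to XLSX on failure, and the fallback is logged)
    • TXT: filtered by record block and concatenated verbatim, without rearranging the field layout
  • The GUI shows the match report, a progress bar, and a log, and can export the report as CSV/XLSX.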
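
The three-tier matching priority listed above can be illustrated with a minimal sketch. This is not the code in src/cidmet/matcher.py: the function names, the dictionary keys (`doi`, `title`, `year`, `first_author`), and the default threshold of 90 are assumptions made for the example, and the standard-library `difflib.SequenceMatcher` stands in for RapidFuzz so the snippet runs without extra dependencies.

```python
import re
import unicodedata
from difflib import SequenceMatcher

# Common ways a DOI is prefixed in exports (assumed list, not exhaustive).
DOI_PREFIX = re.compile(r"^(?:https?://(?:dx\.)?doi\.org/|doi:\s*)", re.IGNORECASE)


def normalize_doi(raw):
    """Strip URL / 'doi:' prefixes and lower-case the DOI."""
    if not raw:
        return ""
    return DOI_PREFIX.sub("", raw.strip()).lower()


def normalize_title(raw):
    """Fold accents and case, replace punctuation runs with one space."""
    if not raw:
        return ""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def match_record(target, candidate, threshold=90.0):
    """Return the first tier that matches ('doi', 'title_exact',
    'title_fuzzy'), or None. Missing fields skip their check."""
    t_doi = normalize_doi(target.get("doi"))
    if t_doi and t_doi == normalize_doi(candidate.get("doi")):
        return "doi"

    t_title = normalize_title(target.get("title"))
    c_title = normalize_title(candidate.get("title"))
    if not t_title or not c_title:
        return None
    if t_title == c_title:
        return "title_exact"

    # Stand-in similarity score on a 0-100 scale (RapidFuzz in the real tool).
    score = 100.0 * SequenceMatcher(None, t_title, c_title).ratio()
    if score < threshold:
        return None

    # The fuzzy tier must also agree on year and first author when both
    # sides provide them; a missing value is treated as "no evidence".
    t_year, c_year = target.get("year"), candidate.get("year")
    if t_year and c_year and str(t_year) != str(c_year):
        return None
    t_auth = normalize_title(target.get("first_author"))
    c_auth = normalize_title(candidate.get("first_author"))
    if t_auth and c_auth and t_auth != c_auth:
        return None
    return "title_fuzzy"
```

Returning the tier name rather than a boolean is one way to keep the "basis for each match" available for the report; the actual report columns are not described in this README.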

Installation

pip install -r requirements.txt

Running

python -m src.main

Or:

python src/main.py

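Code structure

  • src/cidmet/normalize.py: DOI / title / year / author normalization
  • src/cidmet/bibtex_loader.py: BibTeX loading
  • src/cidmet/io_handlers.py: CSV / XLS(X) / TXT reading and format-preserving write-back
  • src/cidmet/matcher.py: multi-tier matching strategy
  • src/cidmet/processor.py: task orchestration and report export
  • src/cidmet/gui.py: Qt6 graphical interface and multithreaded execution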
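
To make the idea of format-preserving write-back for CSV more concrete, here is a rough sketch using only the standard library. It is not the code in src/cidmet/io_handlers.py: `guess_encoding`, `filter_csv`, and the `keep` callback are hypothetical names, and the detection strategy (BOM check, then `csv.Sniffer`) is an assumption. Note that `csv.Sniffer` recovers the delimiter and quote character but not the original quoting policy, so this sketch quotes only fields that need it.

```python
import codecs
import csv
import io


def guess_encoding(raw: bytes) -> str:
    """Pick an encoding from the BOM, then try plain UTF-8,
    and fall back to UTF-8-sig if detection fails."""
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "utf-8-sig"


def filter_csv(raw: bytes, keep) -> bytes:
    """Return a CSV holding the header plus the rows for which
    keep(row_dict) is true, reusing the detected delimiter,
    quote character, line ending, and encoding."""
    encoding = guess_encoding(raw)
    text = raw.decode(encoding, errors="replace")

    try:
        sniffed = csv.Sniffer().sniff(text[:8192], delimiters=",;\t")
        delimiter, quotechar = sniffed.delimiter, sniffed.quotechar or '"'
    except csv.Error:
        # Sniffing failed: assume a plain comma-separated file.
        delimiter, quotechar = ",", '"'
    newline = "\r\n" if "\r\n" in text else "\n"

    reader = csv.reader(io.StringIO(text, newline=""),
                        delimiter=delimiter, quotechar=quotechar)
    rows = list(reader)
    if not rows:
        return raw  # nothing to filter

    header, body = rows[0], rows[1:]
    out = io.StringIO(newline="")
    writer = csv.writer(out, delimiter=delimiter, quotechar=quotechar,
                        lineterminator=newline)
    writer.writerow(header)
    writer.writerows(r for r in body if keep(dict(zip(header, r))))
    return out.getvalue().encode(encoding)
```

A caller would pass the raw bytes of an export plus a predicate, for example `filter_csv(data, lambda row: row.get("DOI") in wanted_dois)`, where `wanted_dois` is a set built from the BibTeX targets.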
