From 1abcdb9282d4c7ba4b4d77276e9cacaa98ed7cf9 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?=E5=B2=9B=E7=9F=B3?= <54824693+bruce2233@users.noreply.github.com>
Date: Sat, 31 Jan 2026 14:43:37 +0800
Subject: [PATCH] Revert "WAP: generate paper page"

---
 index.html                                     |  19 -
 .../grad-en.html                               |  88 ----
 .../grad-zh.html                               |  88 ----
 .../https-arxiv-org-abs-2511-13719/hs-en.html  |  67 ---
 .../https-arxiv-org-abs-2511-13719/hs-zh.html  |  67 ---
 .../https-arxiv-org-abs-2511-13719/index.html  |  97 ----
 .../https-arxiv-org-abs-2511-13719/script.js   |  77 ---
 .../https-arxiv-org-abs-2511-13719/styles.css  | 457 ------------------
 8 files changed, 960 deletions(-)
 delete mode 100644 papers/https-arxiv-org-abs-2511-13719/grad-en.html
 delete mode 100644 papers/https-arxiv-org-abs-2511-13719/grad-zh.html
 delete mode 100644 papers/https-arxiv-org-abs-2511-13719/hs-en.html
 delete mode 100644 papers/https-arxiv-org-abs-2511-13719/hs-zh.html
 delete mode 100644 papers/https-arxiv-org-abs-2511-13719/index.html
 delete mode 100644 papers/https-arxiv-org-abs-2511-13719/script.js
 delete mode 100644 papers/https-arxiv-org-abs-2511-13719/styles.css

diff --git a/index.html b/index.html
index 56e7fc2..a5082c9 100644
--- a/index.html
+++ b/index.html
@@ -103,25 +103,6 @@
A technical deep-dive into deep multimodal parsing, adaptive retrieval, and agentic evidence synthesis.
-Current "Deep Research" systems (based on LLMs) are largely restricted to text-based web scraping. In professional and scientific domains, knowledge is dense in highly structured multimodal documents (PDFs/Scans). Standard RAG (Retrieval-Augmented Generation) pipelines fail here because they often "flatten" the structure, losing vital visual semantics like the relationship between a chart's axes or the hierarchical context of a table.
-Doc-Researcher employs a parsing engine that preserves multimodal integrity and builds multi-granular representations of each document.
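As a rough illustration of what a multi-granular representation might look like, the Python sketch below models document, page, and chunk levels; every class, field, and granularity choice here is an assumption made for illustration, not the paper's actual data model.

    # Hypothetical multi-granular document representation (illustrative only;
    # the actual Doc-Researcher data model may differ).
    from dataclasses import dataclass, field
    from typing import List, Optional

    @dataclass
    class Chunk:
        """Fine-grained unit: a paragraph, table region, figure crop, or equation."""
        chunk_id: str
        modality: str                     # "text", "table", "figure", or "equation"
        text: str                         # extracted/OCR text; may be empty for figures
        image_path: Optional[str] = None  # rendered crop that keeps visual semantics
        page_no: int = 0

    @dataclass
    class Page:
        """Mid-grained unit: one rendered page plus its layout regions."""
        page_no: int
        page_image_path: str
        chunks: List[Chunk] = field(default_factory=list)

    @dataclass
    class Document:
        """Coarse-grained unit: the whole PDF/scan with per-page structure intact."""
        doc_id: str
        title: str
        pages: List[Page] = field(default_factory=list)

        def all_chunks(self) -> List[Chunk]:
            return [c for p in self.pages for c in p.chunks]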
-The system's retrieval architecture supports three paradigms.
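The paradigms could sit behind a single retrieval interface. In the hedged sketch below, the three paradigm names (text-only, vision-only, hybrid) and all function names are assumptions chosen for illustration rather than the system's documented components.

    # Illustrative dispatcher over three assumed retrieval paradigms
    # (text-only, vision-only, hybrid); the paradigm names and stub retrievers
    # are placeholders, not the system's real components.
    from typing import Callable, Dict, List

    def retrieve_text(query: str, k: int) -> List[str]:
        # Stub: dense retrieval over extracted text chunks would go here.
        return [f"text-chunk-{i} matching '{query}'" for i in range(k)]

    def retrieve_vision(query: str, k: int) -> List[str]:
        # Stub: retrieval over page/figure image embeddings would go here.
        return [f"page-image-{i} matching '{query}'" for i in range(k)]

    def retrieve_hybrid(query: str, k: int) -> List[str]:
        # Naive fusion for illustration: interleave text and vision hits.
        merged: List[str] = []
        for t, v in zip(retrieve_text(query, k), retrieve_vision(query, k)):
            merged.extend([t, v])
        return merged[:k]

    PARADIGMS: Dict[str, Callable[[str, int], List[str]]] = {
        "text": retrieve_text,
        "vision": retrieve_vision,
        "hybrid": retrieve_hybrid,
    }

    def retrieve(query: str, paradigm: str = "hybrid", k: int = 10) -> List[str]:
        return PARADIGMS[paradigm](query, k)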
-Unlike single-pass retrieval, Doc-Researcher uses an agentic loop that iteratively gathers evidence before synthesizing an answer.
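A minimal sketch of such an iterative loop follows, assuming a fixed step budget and LLM-backed helpers that are stubbed out here; the function names and the stopping rule are illustrative, not the system's actual interfaces.

    # Illustrative agentic loop: plan a query, retrieve evidence, check whether it
    # is sufficient, then synthesize. All helpers are stubs standing in for
    # LLM-backed components; the names and the stop rule are assumptions.
    from typing import List

    MAX_STEPS = 5  # assumed iteration budget

    def plan_query(question: str, evidence: List[str]) -> str:
        # Stub: an LLM would rewrite the question given the evidence gathered so far.
        return question if not evidence else f"{question} (refined after {len(evidence)} findings)"

    def retrieve_evidence(query: str) -> List[str]:
        # Stub standing in for any of the retrieval paradigms sketched above.
        return [f"evidence for: {query}"]

    def is_sufficient(question: str, evidence: List[str]) -> bool:
        # Stub: an LLM judge would decide whether the evidence answers the question.
        return len(evidence) >= 3

    def synthesize(question: str, evidence: List[str]) -> str:
        # Stub: an LLM would write the final answer grounded in the cited evidence.
        return f"Answer to '{question}' supported by {len(evidence)} evidence snippets."

    def deep_research(question: str) -> str:
        evidence: List[str] = []
        for _ in range(MAX_STEPS):
            query = plan_query(question, evidence)
            evidence.extend(retrieve_evidence(query))
            if is_sufficient(question, evidence):
                break
        return synthesize(question, evidence)

    if __name__ == "__main__":
        print(deep_research("How does the Q3 revenue in Table 2 relate to the trend in Figure 5?"))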
-To evaluate these capabilities, the authors introduced M4DocBench (Multi-modal, Multi-hop, Multi-document, and Multi-turn). It consists of 158 expert-level questions spanning 304 documents. This benchmark requires the model to "connect the dots" across multiple files and modalities.
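For a concrete sense of what one benchmark item might contain, the sketch below models a single question record; only the four "M4" axes and the 158-question / 304-document scale come from the description above, and every field name is an assumed placeholder.

    # Hypothetical shape of one M4DocBench-style item. Only the four "M4" axes
    # and the 158-question / 304-document scale come from the text above;
    # every field name is an assumed placeholder.
    from dataclasses import dataclass
    from typing import List

    @dataclass
    class M4Item:
        question_id: str
        turns: List[str]       # multi-turn: the sequence of user questions
        doc_ids: List[str]     # multi-document: which of the 304 documents are needed
        modalities: List[str]  # multi-modal: e.g. ["table", "figure", "text"]
        hops: int              # multi-hop: how many evidence pieces must be chained
        reference_answer: str

    def exact_match(prediction: str, item: M4Item) -> bool:
        # Toy metric for illustration; the benchmark's real scoring may differ.
        return prediction.strip().lower() == item.reference_answer.strip().lower()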
-A technical deep dive: deep multimodal parsing, adaptive retrieval, and agentic evidence synthesis.
-Current "Deep Research" systems, such as LLM-based ones, are mostly limited to text-centric web data. In professional domains, core knowledge lives in highly structured multimodal documents (PDFs and scans). Conventional RAG (Retrieval-Augmented Generation) pipelines typically break down in this setting because they "flatten" the document, losing key visual semantics such as chart axes, visual hierarchy, and nested table relationships.
-Doc-Researcher adopts a parsing engine that preserves multimodal integrity and builds a multi-level representation of the content.
-Doc-Researcher supports three retrieval paradigms.
-Unlike single-shot retrieval, Doc-Researcher introduces an agentic loop.
-To evaluate these capabilities in full, the authors propose M4DocBench (Multi-modal, Multi-hop, Multi-document, Multi-turn). It contains 158 expert-annotated, high-difficulty questions over 304 complex documents and requires the model to "connect the clues" across files and modalities.
-Most AI systems only "read" text. Doc-Researcher is a new system that actually understands charts, tables, and layouts like a human expert does.
-Imagine asking an AI to analyze a 50-page financial report or a scientific paper. Most current AIs can grab the text, but they get confused by complex layout diagrams, math equations, or data hidden in tables. They treat everything like a flat block of words, missing the "visual language" of the document.
-The researchers created a three-step "brain" for the AI.
-The team created a new test called M4DocBench. It has 158 very hard questions that require "jumping" between different documents and looking at pictures to find the answer.
-If you want to see the specific technical architecture and deep data science behind this, check out the Graduate version.
 View Graduate Version (EN)
-Most AI systems can only "read the words." Doc-Researcher, like a human expert, can also read charts, tables, and document layout.
-Imagine being asked to analyze a 50-page financial report or a scientific paper. Most AIs can only extract the text; when they run into complex structural diagrams, mathematical formulas, or data buried in tables, they get confused. Because image and layout information is lost, the AI cannot pull deep knowledge out of genuinely professional documents.
-The researchers built three key components for the AI.
-The team created a new test called M4DocBench, with 158 very difficult questions that the AI can only answer by "jumping" between multiple documents and examining the figures.
-If you want to understand the concrete architecture and the deeper data science behind this, see the Graduate version.
 View Graduate Version (ZH)
-A groundbreaking system that solves complex research queries by deeply parsing multimodal documents (figures, tables, charts) and using iterative agent workflows.