19 changes: 19 additions & 0 deletions index.html
@@ -103,6 +103,25 @@ <h2>快速入口</h2>

<div id="paper-list" class="paper-grid">
<a class="paper-card" href="/attention-is-all-you-need" data-title="Attention Is All You Need 注意力机制即你所需" data-tags="transformer attention machine translation encoder-decoder self-attention" data-arxiv="1706.03762">
<a class="paper-card" href="/https-arxiv-org-abs-2511-13719" data-title="Doc-Researcher: A Unified System for Multimodal Document Parsing and Deep Research" data-tags="multimodal-parsing hybrid-retrieval deep-research multi-agent benchmarks M4DocBench" data-arxiv="2510.21603">
<h3>
<span class="lang" data-lang="en">Doc-Researcher</span>
<span class="lang" data-lang="zh" lang="zh-Hans">Doc-Researcher</span>
</h3>
<div class="level" data-level="hs">
<span class="lang" data-lang="en">AI expert that reads charts, tables, and layouts like a human for complex documents.</span>
<span class="lang" data-lang="zh" lang="zh-Hans">像人类专家一样阅读图表、表格与布局,解决复杂文档研究任务。</span>
</div>
<div class="level" data-level="grad">
<span class="lang" data-lang="en">Deep multimodal parsing + hybrid retrieval paradigms + iterative multi-agent workflows; 50.6% on M4DocBench.</span>
<span class="lang" data-lang="zh" lang="zh-Hans">深度多模态解析 + 混合检索范式 + 迭代多代理流;在 M4DocBench 取得 50.6% 准确率。</span>
</div>
<div class="pill-row">
<span class="pill">arXiv 2510.21603</span>
<span class="pill">Multimodal</span>
<span class="pill">Agents</span>
</div>
</a>
<a class="paper-card" href="/attention-is-all-you-need" data-title="Attention Is All You Need 注意力机制即你所需" data-tags="transformer attention machine translation encoder-decoder self-attention" data-arxiv="1706.03762">
<h3>
<span class="lang" data-lang="en">Attention Is All You Need</span>
<span class="lang" data-lang="zh" lang="zh-Hans">注意力机制即你所需</span>
88 changes: 88 additions & 0 deletions papers/https-arxiv-org-abs-2511-13719/grad-en.html
@@ -0,0 +1,88 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Doc-Researcher (Grad-EN) | WAP</title>
<link rel="stylesheet" href="/papers/https-arxiv-org-abs-2511-13719/styles.css" />
<link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&display=swap" rel="stylesheet" />
</head>
<body>
<div class="backdrop" aria-hidden="true"></div>
<div class="page">
<header class="hero reveal">
<div class="eyebrow">GRADUATE EDITION / ENGLISH</div>
<h1>Doc-Researcher: Overcoming the Multimodal Processing Bottleneck</h1>
<p class="subtitle">A technical deep-dive into deep multimodal parsing, adaptive retrieval, and agentic evidence synthesis.</p>
</header>

<nav class="section-nav reveal">
<a href="#motivation">Motivation</a>
<a href="#parsing">Deep Parsing</a>
<a href="#retrieval">Retrieval Architecture</a>
<a href="#agents">Agent Workflows</a>
<a href="#bench">M4DocBench</a>
<a href="#results">Results</a>
</nav>

<section id="motivation" class="chapter reveal" data-section>
<h2>Motivation & Problem Statement</h2>
<p>Current "Deep Research" systems (based on LLMs) are largely restricted to text-based web scraping. In professional and scientific domains, knowledge is dense in <strong>highly structured multimodal documents</strong> (PDFs/Scans). Standard RAG (Retrieval-Augmented Generation) pipelines fail here because they often "flatten" the structure, losing vital visual semantics like the relationship between a chart's axes or the hierarchical context of a table.</p>
</section>

<section id="parsing" class="chapter reveal" data-section>
<h2>I. Deep Multimodal Parsing</h2>
<p>Doc-Researcher employs a parsing engine that preserves <strong>multimodal integrity</strong>. It creates multi-granular representations:</p>
<ul>
<li><strong>Chunk-level:</strong> Captures local context including equations and inline symbols.</li>
<li><strong>Block-level:</strong> Respects logical visual boundaries (e.g., a specific figure with its caption).</li>
<li><strong>Document-level:</strong> Maintains layout hierarchy and global semantics.</li>
</ul>
<div class="callout">Key Innovation: The system maps visual elements to text descriptions while keeping the original pixel features for vision-centric retrieval.</div>
</section>

<section id="retrieval" class="chapter reveal" data-section>
<h2>II. Systematic Hybrid Retrieval</h2>
<p>The system utilizes an architecture that supports three paradigms:</p>
<ol>
<li><strong>Text-only:</strong> Standard semantic search on text chunks.</li>
<li><strong>Vision-only:</strong> Directly retrieving document segments based on visual similarity.</li>
<li><strong>Hybrid:</strong> Combining text and vision signals with <em>dynamic granularity selection</em>—choosing between fine-grained chunks or broader document context based on query ambiguity.</li>
</ol>
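<p>A minimal sketch of the hybrid paradigm with dynamic granularity selection (ours, not the paper's implementation); the retriever functions and the ambiguity test are stubs supplied by the caller.</p>
<pre><code># Illustrative sketch of hybrid retrieval with dynamic granularity selection.
def hybrid_retrieve(query, retrieve_text, retrieve_vision, is_ambiguous, k=10):
    text_hits = retrieve_text(query, k)      # paradigm 1: semantic search over text chunks
    vision_hits = retrieve_vision(query, k)  # paradigm 2: visual similarity over blocks/pages

    # Interleave the two ranked lists, keeping the first occurrence of each hit.
    fused, seen = [], set()
    for hit in (h for pair in zip(text_hits, vision_hits) for h in pair):
        if hit not in seen:
            seen.add(hit)
            fused.append(hit)

    # Dynamic granularity: ambiguous queries pull broader document context,
    # specific queries stay at the fine-grained chunk level.
    granularity = "document" if is_ambiguous(query) else "chunk"
    return granularity, fused[:k]
</code></pre>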
</section>

<section id="agents" class="chapter reveal" data-section>
<h2>III. Iterative Multi-Agent Workflows</h2>
<p>Unlike single-pass retrieval, Doc-Researcher uses an agentic loop:</p>
<ul>
<li><strong>Planner:</strong> Decomposes complex, multi-hop queries into sub-tasks.</li>
<li><strong>Searcher:</strong> Executes the hybrid retrieval to find candidates.</li>
<li><strong>Refiner:</strong> Evaluates retrieved evidence and decides if more searching is needed (iterative accumulation).</li>
<li><strong>Synthesizer:</strong> Integrates multimodal evidence to form a final, cited answer.</li>
</ul>
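<p>The loop can be sketched as below (ours, assuming each role is a callable provided by the caller rather than the paper's actual agents).</p>
<pre><code># Illustrative Planner / Searcher / Refiner / Synthesizer loop (not the paper's code).
def research(question, planner, searcher, refiner, synthesizer, max_rounds=5):
    sub_tasks = planner(question)                       # Planner: decompose the multi-hop query
    evidence = []
    for _ in range(max_rounds):
        for task in sub_tasks:
            evidence.extend(searcher(task))             # Searcher: hybrid retrieval per sub-task
        done, follow_ups = refiner(question, evidence)  # Refiner: is the evidence sufficient?
        if done:
            break
        sub_tasks = follow_ups                          # iterative evidence accumulation
    return synthesizer(question, evidence)              # Synthesizer: cited, multimodal answer
</code></pre>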
</section>

<section id="bench" class="chapter reveal" data-section>
<h2>M4DocBench & Evaluation</h2>
<p>To evaluate these capabilities, the authors introduced <strong>M4DocBench</strong> (Multi-modal, Multi-hop, Multi-document, and Multi-turn). It consists of 158 expert-level questions spanning 304 documents. This benchmark requires the model to "connect the dots" across multiple files and modalities.</p>
</section>

<section id="results" class="chapter reveal" data-section>
<h2>Experimental Outcomes</h2>
<div class="highlight-row">
<div class="highlight-card">
<strong>Direct Comparison</strong>
<div>50.6% accuracy vs. ~15% for state-of-the-art baselines (3.4x improvement).</div>
</div>
<div class="highlight-card">
<strong>Ablation</strong>
<div>Removing the "Visual Semantics" component caused the largest performance drop, indicating that layout information matters.</div>
</div>
</div>
</section>

<footer class="footer">WAP - Academic rigor for deep documents.</footer>
</div>
<script src="/papers/https-arxiv-org-abs-2511-13719/script.js"></script>
</body>
</html>
88 changes: 88 additions & 0 deletions papers/https-arxiv-org-abs-2511-13719/grad-zh.html
@@ -0,0 +1,88 @@
<!doctype html>
<html lang="zh-CN">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Doc-Researcher (研究生版) | WAP</title>
<link rel="stylesheet" href="/papers/https-arxiv-org-abs-2511-13719/styles.css" />
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@300;400;500;700&display=swap" rel="stylesheet" />
</head>
<body>
<div class="backdrop" aria-hidden="true"></div>
<div class="page">
<header class="hero reveal">
<div class="eyebrow">学术 / 研究生版</div>
<h1>Doc-Researcher:破解复杂文档多模态处理的瓶颈</h1>
<p class="subtitle">技术深潜:深度多模态解析、自适应检索与代理式证据合成。</p>
</header>

<nav class="section-nav reveal">
<a href="#motivation">动机</a>
<a href="#parsing">深度解析</a>
<a href="#retrieval">检索架构</a>
<a href="#agents">代理流</a>
<a href="#bench">评测体系</a>
<a href="#results">实验结果</a>
</nav>

<section id="motivation" class="chapter reveal" data-section>
<h2>研究动机与问题定义</h2>
<p>当前的“深度研究 (Deep Research)”系统(如基于 LLM 的系统)主要局限于文本类 Web 数据。在专业领域,核心知识往往以<strong>高度结构化的多模态文档</strong>(PDF/扫描件)形式存在。传统的 RAG(检索增强生成)流程在这种场景下通常会失效,因为它们将文档“扁平化”,丢失了图表轴线、视觉层次或表格嵌套关系等关键视觉语义。</p>
</section>

<section id="parsing" class="chapter reveal" data-section>
<h2>一、深度多模态解析引擎</h2>
<p>Doc-Researcher 采用了一种能够保持<strong>多模态完整性</strong>的解析引擎。它建立了多层级的表示体系:</p>
<ul>
<li><strong>块级 (Chunk-level):</strong> 捕捉局部上下文,包括行内公式和数学符号。</li>
<li><strong>模块级 (Block-level):</strong> 遵循逻辑视觉边界(例如带有标题的特定图表)。</li>
<li><strong>文档级 (Document-level):</strong> 维护全局的排版结构与语义。</li>
</ul>
<div class="callout">核心创新:该系统将视觉元素映射到文本描述,同时保留原始像素特征,用于视觉中心路径的检索。</div>
</section>

<section id="retrieval" class="chapter reveal" data-section>
<h2>二、系统化的混合检索架构</h2>
<p>Doc-Researcher 支持三种检索范式:</p>
<ol>
<li><strong>纯文本检索 (Text-only):</strong> 对文本块执行标准语义搜索。</li>
<li><strong>纯视觉检索 (Vision-only):</strong> 基于视觉相似度直接检索文档区域。</li>
<li><strong>混合检索 (Hybrid):</strong> 结合文本与视觉信号,并具备<em>动态粒度选择</em>能力——根据查询的模糊性在细粒度块或宏观文档上下文中自动切换。</li>
</ol>
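<p>以下为我们给出的示意性草图(并非论文源码):混合检索与动态粒度选择的一种可能写法,其中检索函数与模糊性判断均为调用方提供的假设性桩函数。</p>
<pre><code># 混合检索 + 动态粒度选择的示意性草图。
def hybrid_retrieve(query, retrieve_text, retrieve_vision, is_ambiguous, k=10):
    text_hits = retrieve_text(query, k)      # 范式一:对文本块做语义检索
    vision_hits = retrieve_vision(query, k)  # 范式二:对模块/页面图像做视觉相似度检索

    # 交错合并两个排序列表,只保留每个候选的首次出现。
    fused, seen = [], set()
    for hit in (h for pair in zip(text_hits, vision_hits) for h in pair):
        if hit not in seen:
            seen.add(hit)
            fused.append(hit)

    # 动态粒度:模糊查询返回更宏观的文档上下文,具体查询停留在细粒度块级。
    granularity = "document" if is_ambiguous(query) else "chunk"
    return granularity, fused[:k]
</code></pre>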
</section>

<section id="agents" class="chapter reveal" data-section>
<h2>三、迭代多智能体工作流</h2>
<p>不同于单次检索,Doc-Researcher 引入了代理循环:</p>
<ul>
<li><strong>规划者 (Planner):</strong> 将复杂的多跳查询拆分为子任务。</li>
<li><strong>搜寻者 (Searcher):</strong> 执行混合检索寻找候选证据。</li>
<li><strong>精炼者 (Refiner):</strong> 评估检索证据,决定是否需要继续搜索(迭代式累计)。</li>
<li><strong>合成者 (Synthesizer):</strong> 整合多模态证据,生成带有引用的最终答案。</li>
</ul>
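<p>该循环的一个极简示意如下(由我们给出,假设每个角色均为调用方提供的可调用对象,并非论文源码)。</p>
<pre><code># 规划者 / 搜寻者 / 精炼者 / 合成者循环的示意性草图。
def research(question, planner, searcher, refiner, synthesizer, max_rounds=5):
    sub_tasks = planner(question)                       # 规划者:拆解多跳查询
    evidence = []
    for _ in range(max_rounds):
        for task in sub_tasks:
            evidence.extend(searcher(task))             # 搜寻者:对每个子任务执行混合检索
        done, follow_ups = refiner(question, evidence)  # 精炼者:证据是否已经充分?
        if done:
            break
        sub_tasks = follow_ups                          # 迭代式证据累积
    return synthesizer(question, evidence)              # 合成者:生成带引用的多模态答案
</code></pre>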
</section>

<section id="bench" class="chapter reveal" data-section>
<h2>M4DocBench 高难度评测</h2>
<p>为了全面评估上述能力,作者提出了 <strong>M4DocBench</strong>(多模态、多跳、多文档、多轮对话)。它包含由专家标注的 158 个高难度问题,涉及 304 份复杂文档。该基准要求模型能够跨文件、跨模态“连接线索”。</p>
</section>

<section id="results" class="chapter reveal" data-section>
<h2>实验表现</h2>
<div class="highlight-row">
<div class="highlight-card">
<strong>直接对比</strong>
<div>Doc-Researcher 准确率达到 50.6%,约为目前最先进基线系统(~15%)的 3.4 倍。</div>
</div>
<div class="highlight-card">
<strong>消融实验</strong>
<div>移除“视觉语义”组件导致性能跌幅最大,证明了布局信息在文档理解中的核心地位。</div>
</div>
</div>
</section>

<footer class="footer">WAP - 为深度文档研究提供严谨洞察。</footer>
</div>
<script src="/papers/https-arxiv-org-abs-2511-13719/script.js"></script>
</body>
</html>
67 changes: 67 additions & 0 deletions papers/https-arxiv-org-abs-2511-13719/hs-en.html
@@ -0,0 +1,67 @@
<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Doc-Researcher (HS-EN) | WAP</title>
<link rel="stylesheet" href="/papers/https-arxiv-org-abs-2511-13719/styles.css" />
<link href="https://fonts.googleapis.com/css2?family=Space+Grotesk:wght@400;500;600;700&display=swap" rel="stylesheet" />
</head>
<body>
<div class="backdrop" aria-hidden="true"></div>
<div class="page">
<header class="hero reveal">
<div class="eyebrow">HIGH SCHOOL EDITION / ENGLISH</div>
<h1>How AI Reads Complex Documents: Doc-Researcher</h1>
<p class="subtitle">Most AI systems only "read" text. Doc-Researcher is a new system that actually understands charts, tables, and layouts like a human expert does.</p>
</header>

<nav class="section-nav reveal">
<a href="#problem">The Problem</a>
<a href="#how">How it Works</a>
<a href="#test">Testing It</a>
<a href="#grad">Go Deeper</a>
</nav>

<section id="problem" class="chapter reveal" data-section>
<h2>The "Wall" for Traditional AI</h2>
<p>Imagine asking an AI to analyze a 50-page financial report or a scientific paper. Most current AIs can grab the text, but they get confused by complex layout diagrams, math equations, or data hidden in tables. They treat everything like a flat block of words, missing the "visual language" of the document.</p>
<div class="callout">The Gap: AI has been "blind" to the visual structure and multimodal data (images + text) inside documents.</div>
</section>

<section id="#how" class="chapter reveal" data-section>
<h2>The Doc-Researcher Solution</h2>
<p>The researchers created a three-step brain for the AI:</p>
<div class="highlight-row">
<div class="highlight-card">
<strong>1. Smart Parsing</strong>
<div>It doesn't just copy text; it sees where every chart and table is, preserving its meaning.</div>
</div>
<div class="highlight-card">
<strong>2. Hybrid Search</strong>
<div>It can search by text description or by visual appearance, picking whichever route best turns up the evidence.</div>
</div>
<div class="highlight-card">
<strong>3. Teamwork Agents</strong>
<div>Instead of one try, it uses several "AI agents" that brainstorm, look for more clues, and combine them into a final answer.</div>
</div>
</div>
</section>

<section id="test" class="chapter reveal" data-section>
<h2>Real-World Results</h2>
<p>The team created a new test called <strong>M4DocBench</strong>. It has 158 very hard questions that require "jumping" between different documents and looking at pictures to find the answer.</p>
<div class="callout">Doc-Researcher got 50.6% accuracy, which is 3.4 times better than previous top-tier AI systems!</div>
</section>

<section id="grad" class="chapter reveal" data-section>
<h2>Curious about the math and logic?</h2>
<p>If you want to see the specific technical architecture and deep data science behind this, check out the Graduate version.</p>
<a href="/papers/https-arxiv-org-abs-2511-13719/grad-en.html" class="btn primary">View Graduate Version (EN)</a>
</section>

<footer class="footer">WAP - Simplified paper insights.</footer>
</div>
<script src="/papers/https-arxiv-org-abs-2511-13719/script.js"></script>
</body>
</html>
67 changes: 67 additions & 0 deletions papers/https-arxiv-org-abs-2511-13719/hs-zh.html
@@ -0,0 +1,67 @@
<!doctype html>
<html lang="zh-CN">
<head>
<meta charset="utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<title>Doc-Researcher (高中版) | WAP</title>
<link rel="stylesheet" href="/papers/https-arxiv-org-abs-2511-13719/styles.css" />
<link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@300;400;500;700&display=swap" rel="stylesheet" />
</head>
<body>
<div class="backdrop" aria-hidden="true"></div>
<div class="page">
<header class="hero reveal">
<div class="eyebrow">科普 / 高中版</div>
<h1>AI 如何阅读复杂的文档:Doc-Researcher 详解</h1>
<p class="subtitle">大多数 AI 系统只能“读文字”。Doc-Researcher 却像人类专家一样,能够读懂图表、表格和文档布局。</p>
</header>

<nav class="section-nav reveal">
<a href="#problem">现状与挑战</a>
<a href="#how">它是如何工作的</a>
<a href="#test">测试结果</a>
<a href="#grad">深入研究</a>
</nav>

<section id="problem" class="chapter reveal" data-section>
<h2>传统 AI 的“盲区”</h2>
<p>想象一下,让 AI 分析一份 50 页的财务报告或一篇科学论文。大多数 AI 只能提取其中的文本,但当遇到复杂的结构图、数学公式或隐藏在表格中的数据时,它们就会感到困惑。由于丢失了图片和布局信息,AI 无法从真正专业的文档中获取深层知识。</p>
<div class="callout">关键缺失:AI 以前由于无法“看懂”图片的视觉结构,导致在处理复杂文档时存在巨大盲区。</div>
</section>

<section id="how" class="chapter reveal" data-section>
<h2>Doc-Researcher 的解决方案</h2>
<p>研究人员为 AI 打造了三个关键组件:</p>
<div class="highlight-row">
<div class="highlight-card">
<strong>1. 深度多模态解析</strong>
<div>它不只是复制文字,而是会识别每个图表和表格的位置,保存它们的视觉含义。</div>
</div>
<div class="highlight-card">
<strong>2. 混合式搜索</strong>
<div>它既可以通过文字描述来搜索,也可以通过视觉特征来寻找证据,从而选择最佳路径。</div>
</div>
<div class="highlight-card">
<strong>3. 迭代协作流</strong>
<div>它使用多个“AI 智能体”进行团队协作:有的负责拆解问题,有的负责寻找证据,最后合并成完整答案。</div>
</div>
</div>
</section>

<section id="test" class="chapter reveal" data-section>
<h2>真实表现如何?</h2>
<p>团队创建了一个名为 <strong>M4DocBench</strong> 的新测试,包含 158 个非常困难的问题,这些问题需要 AI 在多个文档之间“跳转”并查看图片才能回答。</p>
<div class="callout">Doc-Researcher 的准确率达到了 50.6%,比之前最先进的 AI 系统提高了 3.4 倍!</div>
</section>

<section id="grad" class="chapter reveal" data-section>
<h2>想要了解更深层的逻辑?</h2>
<p>如果你想了解这背后的具体架构和深层数据科学,请查看研究生版本。</p>
<a href="/papers/https-arxiv-org-abs-2511-13719/grad-zh.html" class="btn primary">查看研究生版本 (中文)</a>
</section>

<footer class="footer">WAP - 让科学论文通俗易懂。</footer>
</div>
<script src="/papers/https-arxiv-org-abs-2511-13719/script.js"></script>
</body>
</html>