Skip to content

Conversation

@prinskumar-tigergraph
Copy link
Collaborator

@prinskumar-tigergraph prinskumar-tigergraph commented Nov 19, 2025

PR Type

Enhancement, Documentation, Other


Description

  • Graph lifecycle and ingestion REST APIs

  • ECC rebuild tracking and status endpoint

  • PDF extraction via pymupdf4llm with tg:// images

  • UI setup, uploads, and cloud downloads


Diagram Walkthrough

flowchart LR
  UI["UI (Setup, Uploads)"]
  APIRouter["UI Router (graph, ingest, uploads, cloud)"]
  ECC["ECC Service (task tracking, status)"]
  PDF["pymupdf4llm PDF → Markdown"]
  MD["MarkdownProcessor (describe/replace to tg://)"]
  Agent["agent_graph (tg:// → /ui/image_vertex)"]

  UI -- "calls" --> APIRouter
  APIRouter -- "rebuild request" --> ECC
  APIRouter -- "create/init/ingest" --> "TigerGraph"
  PDF -- "markdown + images" --> MD
  MD -- "markdown with tg:// refs" --> Agent
  Agent -- "serve images" --> "/ui/image_vertex"
Loading

File Walkthrough

Relevant files
Enhancement
13 files
ui.py
Add graph APIs, ingestion, uploads, and cloud download     
+746/-6 
text_extractors.py
Use pymupdf4llm; emit `tg://` image references                     
+128/-118
image_data_extractor.py
Path-based LLM vision; remove legacy saver                             
+31/-131
markdown_parsing.py
New Markdown image parser and replacer                                     
+63/-0   
main.py
Track rebuild tasks; add rebuild status endpoint                 
+81/-6   
supportai.py
Rename forceupdate handler; add create graph API                 
+45/-2   
agent_graph.py
Convert `tg://` image links to UI endpoint                             
+14/-9   
inquiryai.py
Convert `tg://` image links in responses                                 
+5/-5     
ModeToggle.tsx
Add Setup button and logout confirmation dialog                   
+42/-17 
Bot.tsx
Refresh graph list on focus/navigation                                     
+29/-3   
confirm-dialog.tsx
New reusable confirmation dialog component                             
+64/-0   
useConfirm.tsx
New hook to request user confirmation                                       
+44/-0   
main.tsx
Register `/setup` route and layout integration                     
+5/-0     
Documentation
6 files
workers.py
Update comments for markdown image handling                           
+1/-1     
single_chunker.py
Clarify preserving markdown image references                         
+2/-2     
ecc_util.py
Note single chunker preserves image references                     
+1/-1     
README.md
Update OpenAI/Bedrock config and prompts path                       
+30/-21 
chatbot_response.txt
Instruct preserving markdown image references                       
+1/-1     
pymupdf4llm-AGPL-3.0.txt
Add pymupdf4llm AGPL-3.0 license file                                       
+661/-0 
Configuration changes
2 files
vite.config.ts
Proxy `/{graph}/graphrag/*` to backend                                     
+2/-1     
nginx.conf
Increase upload size and timeouts for `/ui`                           
+7/-2     
Dependencies
1 files
requirements.txt
Replace PyMuPDF with pymupdf4llm; add multipart                   
+3/-1     
Additional files
5 files
chatbot_response.txt +1/-1     
docker-compose.yml +0/-1     
nginx.conf +0/-32   
nginx.conf +0/-32   
Setup.tsx +1862/-0

@tg-pr-agent
Copy link

tg-pr-agent bot commented Nov 19, 2025

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 No relevant tests
🔒 Security concerns

Path traversal:
Upload and deletion endpoints in graphrag/app/routers/ui.py use user-provided filenames directly in filesystem paths (e.g., os.path.join(upload_dir, file.filename) and os.path.join(upload_dir, filename)). This allows attackers to traverse directories or overwrite arbitrary files. Sanitize via os.path.basename, restrict allowed characters, and reject paths containing path separators.

Potential command injection in GSQL: Endpoints constructing GSQL with f"CREATE GRAPH {graphname}()" accept unvalidated path params. Validate graph names against a strict regex (e.g., ^[A-Za-z_][A-Za-z0-9_]*$) or use APIs that avoid string interpolation.

Sensitive information exposure in logs: Cloud download handlers may log exceptions that include credentials (e.g., gcs_credentials_json parsing errors, SDK ClientError). Ensure logs redact secrets and remove full trace logs containing credentials (logger.debug_pii).

Minor info exposure: init_graph returns host_name which might reveal internal topology. Consider omitting or gating behind debug.

⚡ Recommended focus areas for review

Possible Issue

Missing import for httpx in the rebuild status endpoint will cause a NameError at runtime when calling the ECC status URL.

response = httpx.get(
    ecc_status_url,
    headers={"Authorization": f"Basic {auth}"},
    timeout=10.0
)
Security

Unsanitized filenames in upload and delete endpoints allow path traversal (using user-supplied filenames directly in os.path.join). Sanitize with os.path.basename, validate allowed characters, and reject paths containing separators. Review uploads, single-file delete, and bulk delete handlers.

            existing_files = []
            for file in files:
                file_path = os.path.join(upload_dir, file.filename)
                if os.path.exists(file_path):
                    existing_files.append(file.filename)

            if existing_files:
                return {
                    "status": "conflict",
                    "message": "Some files already exist. Set overwrite=true to replace them.",
                    "existing_files": existing_files,
                }

        # Save uploaded files
        uploaded_files = []
        total_size = 0

        for file in files:
            file_path = os.path.join(upload_dir, file.filename)

            # Write file to disk
            with open(file_path, "wb") as f:
                content = await file.read()
                f.write(content)
                file_size = len(content)
                total_size += file_size

            uploaded_files.append({
                "filename": file.filename,
                "size": file_size,
                "path": file_path,
            })

            logger.info(f"Uploaded file {file.filename} ({file_size} bytes) for graph {graphname}")

        return {
            "status": "success",
            "message": f"Successfully uploaded {len(uploaded_files)} file(s)",
            "graphname": graphname,
            "uploaded_files": uploaded_files,
            "total_size": total_size,
        }

    except Exception as e:
        exc = traceback.format_exc()
        logger.error(f"Error uploading files for graph {graphname}: {e}")
        logger.debug_pii(f"Upload error trace:\n{exc}")
        raise HTTPException(status_code=500, detail=f"Error uploading files: {str(e)}")


@router.delete(route_prefix + "/{graphname}/uploads")
async def clear_uploaded_files(
    graphname: str,
    creds: Annotated[tuple[list[str], HTTPBasicCredentials], Depends(ui_basic_auth)],
    filename: str | None = None,
):
    """
    Clear uploaded files for a specific graphname.

    Parameters:
    - graphname: The graph name whose files to clear
    - filename: If provided, only delete this specific file. Otherwise, delete all files.
    """
    try:
        upload_dir = os.path.join("uploads", graphname)

        if not os.path.exists(upload_dir):
            return {
                "status": "success",
                "message": f"No files found for graph {graphname}",
                "deleted_files": [],
            }

        deleted_files = []

        if filename:
            # Delete specific file
            file_path = os.path.join(upload_dir, filename)
            if os.path.exists(file_path) and os.path.isfile(file_path):
                os.remove(file_path)
                deleted_files.append(filename)
                logger.info(f"Deleted file {filename} for graph {graphname}")
            else:
                raise HTTPException(status_code=404, detail=f"File {filename} not found")
        else:
Possible Issue

Small image filter currently does nothing (pass) and continues processing, likely unintended. Either remove the check or explicitly skip/return to avoid processing tiny images.

if pil_image.width < 100 or pil_image.height < 100:
    pass

description = describe_image_with_llm(str(Path(file_path).absolute()))
description_lower = description.lower()

@prinskumar-tigergraph prinskumar-tigergraph changed the title GML-2011: Pymupdf4llm integration GML-2011 Pymupdf4llm integration Dec 1, 2025
<CloudLightning className="h-4 w-4 mr-2" />
Use Amazon BDA
<TabsTrigger value="s3">
<CloudCog className="h-4 w-4 mr-2" />
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@prinskumar-tigergraph can you please double check if your branch is based on latest main?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants