[bin] `gh-repo-info`/`gh-repo-summary`: Explore CLI Helpers for Capturing GitHub Repo Summaries (e.g. for Deepdive Gists)

When creating deepdive gists, it’s often useful to capture a concise snapshot of a GitHub repository: the repo URL, 'About' description, main README title, and opening paragraph. Doing this manually means copying and pasting from multiple places, which is repetitive and time-consuming. The goal is a CLI helper that automates this process and outputs either machine-friendly JSON/CSV or human-friendly Markdown that can be dropped straight into research notes.

Two draft scripts explore this idea:

- **`gh-repo-info` (earlier prototype):** A Bash script that attempts to fetch the description and extract the README’s first heading and paragraph. It includes format options (text/json/csv), but is only partially implemented and limited in scope/robustness.
- **`gh-repo-summary` (later iteration):** A more complete Zsh helper that supports multiple repos, star-sorting, and richer JSON fields. It normalizes topics, extracts intro content more reliably, and can render summaries as JSON or Markdown, making it more flexible for gist usage.

This issue tracks refining these approaches and deciding on the best path toward a reliable helper script to add to the dotfiles.

## Core Extraction Commands (Building Blocks)

These scripts combine a few key `gh` and parsing commands. Even without the full helpers, these base methods can be useful when capturing repo details manually.

- **Repository description**
  ```shell
  gh repo view OWNER/REPO --json description --jq .description
  ```
  Fetches the 'About' description from the repo.

- **Repository metadata (rich fields)**
  ```shell
  gh repo view OWNER/REPO --json nameWithOwner,description,url,homepageUrl,repositoryTopics,stargazerCount,createdAt,updatedAt,pushedAt,isArchived,latestRelease
  ```

  Retrieves richer metadata fields for automation/analysis.

  *Limitation:* raw JSON only; needs further post-processing for clean Markdown or summaries.

- **README content (base64-decoded)**
  ```shell
  gh api repos/OWNER/REPO/readme --jq '.content | @base64d'
  ```

  Pulls the full README in Markdown.

  *Limitation:* raw content can be noisy; needs parsing to extract intro/title.

- **README title (first heading)**
  ```shell
  gh api repos/OWNER/REPO/readme --jq '.content | @base64d' \
    | grep -m 1 "^#" | sed 's/^#* //'
  ```

  Extracts the first Markdown heading as a 'title'.

  *Limitation:* brittle if the README has unusual formatting (HTML headings, decorative banners, etc).

- **README intro paragraph (first meaningful block)**
  ```shell
  gh api repos/OWNER/REPO/readme --jq '.content | @base64d' \
    | awk '
      BEGIN { inpara=0 }
      /^#/   { if (inpara) exit; next }          # heading = stop if already capturing
      /^---/ { if (inpara) exit; next }          # horizontal rule = stop if already capturing
      {
        # Handle blank lines
        if ($0 ~ /^[[:space:]]*$/) {
          if (inpara) { buf = buf "\n"; next }   # keep internal blank lines
          else next                              # skip leading blanks
        }
        inpara=1
        buf = buf $0 "\n"                        # accumulate content
      }
      END {
        sub(/^[[:space:]\n]+/, "", buf)          # trim leading blank lines/spaces
        sub(/[[:space:]\n]+$/, "", buf)          # trim trailing blank lines/spaces
        printf "%s", buf                         # print without adding extra newline
      }
    '
  ```

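  Attempts to extract the first block of body text in the README, skipping any headings and blank lines that precede it and continuing until the next heading or horizontal rule. Blank lines inside the block are kept, so the result can span more than one paragraph. Leading/trailing blank lines are trimmed, and the output has no trailing newline.

  *Limitation:* fragile; anything that precedes the first heading (badges, HTML banners) is captured as the intro, and content can be missed if formatting is non-standard.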
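
The `grep -m 1 "^#"` title extraction above also matches `#` comment lines inside fenced code blocks and shebang-style lines. As a rough sketch of a more defensive variant (the function name `readme_title` is hypothetical and not part of either script), the following skips fenced blocks and only accepts `#` followed by whitespace:

```shell
# readme_title: print the text of the first Markdown "#" heading on stdin,
# ignoring anything inside fenced code blocks (``` or ~~~).
# Hypothetical helper for illustration; HTML headings are still not handled.
readme_title() {
  awk '
    /^[ \t]*(```|~~~)/ { infence = !infence; next }  # toggle on fence lines
    infence { next }                                 # skip code block contents
    /^#+[ \t]/ {
      sub(/^#+[ \t]+/, "")                           # drop the leading hashes
      sub(/[ \t]+#*[ \t]*$/, "")                     # drop trailing spaces / closing hashes
      print
      exit
    }
  '
}

# Usage (requires an authenticated gh; OWNER/REPO is a placeholder):
#   gh api repos/OWNER/REPO/readme --jq '.content | @base64d' | readme_title
```

Requiring whitespace after the hashes means lines such as `#!/bin/bash` or `#hashtag` are not mistaken for headings.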

We can chain these together (or mix in other variations like them) to:

- combine metadata and README extracts in one go,
- normalize topics and clean up text,
- handle multiple repos with flexible output formats (JSON/Markdown),
- produce cleaner, more reliable snippets ready to drop straight into deepdive gists.
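
To make the Markdown end of that list concrete, here is a minimal sketch of a renderer that takes already-extracted values and prints a snippet. The function name `render_summary_md` and the output layout are assumptions made for illustration; the issue does not specify what the final gist format should look like.

```shell
# render_summary_md REPO URL DESCRIPTION README_TITLE
# Reads the README intro text from stdin and prints a Markdown snippet.
# Hypothetical helper; the layout is illustrative only.
render_summary_md() {
  printf '### [%s](%s)\n\n' "$1" "$2"
  printf '%s\n\n' "${3:-No description provided.}"
  printf '**README:** %s\n\n' "${4:-N/A}"
  # Quote every intro line; blank lines become a bare ">" so the
  # blockquote is not broken between paragraphs.
  sed -e 's/^/> /' -e 's/[[:space:]]*$//'
}

# Usage (intro text piped in from the awk extraction shown earlier):
#   ... | render_summary_md OWNER/REPO "https://github.com/OWNER/REPO" \
#           "$description" "$readme_title"
```

Keeping the renderer separate from the fetching commands means the same extracted values could also feed a JSON or CSV writer.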

## `gh-repo-info` (earlier prototype)

A first attempt at solving the problem of extracting repo metadata + README context for use in deepdive gists:

- **Purpose:** Fetch repo description, README title, and intro paragraph.
- **Status:** Early WIP / partially implemented.
- **Strengths:** Simple, demonstrates the core idea; supports `--format` (text/json/csv).
- **Limitations:** Brittle parsing of README, limited metadata, not robust for varied formatting.

```shell
#!/bin/bash

SCRIPT_NAME="$(basename "$0")"

# Default values
verbose=false
output_format="text"

# Function to display usage
show_usage() {
    cat << EOF
Usage: $SCRIPT_NAME [OPTIONS] REPOSITORY

Extract repository description and README information from a GitHub repository.

REPOSITORY can be a full GitHub URL (https://github.com/owner/repo) or
just the repo path (owner/repo).

Options:
  -h, --help           Show this help message and exit
  -v, --verbose        Show more detailed output
  -f, --format FORMAT  Output format: text (default), json, or csv

Examples:
  $SCRIPT_NAME sourcegraph/lsif-protocol
  $SCRIPT_NAME --format json https://github.com/sourcegraph/lsif-protocol
  $SCRIPT_NAME -v kubernetes/kubernetes

This script requires the GitHub CLI (gh) to be installed and authenticated.
EOF
    exit 0
}

error_exit() {
  local msg="$1"
  local footer="${2:-Usage: '$SCRIPT_NAME --help'}"

  echo "Error: $msg" >&2
  echo "  $footer" >&2
  exit 1
}

# Parse arguments
while [[ $# -gt 0 ]]; do
  case "$1" in
    -h|--help)
      show_usage
      ;;
    -v|--verbose)
      verbose=true
      shift
      ;;
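    -f|--format)
      if [[ -z "${2:-}" ]]; then
        error_exit "--format requires an argument"
      fi
      # Reject unsupported formats up front instead of silently falling back to text
      case "$2" in
        text|json|csv) output_format="$2" ;;
        *) error_exit "Unsupported format '$2' (expected text, json, or csv)" ;;
      esac
      shift 2
      ;;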
    --) # end of options
      shift
      break
      ;;
    -*) # unknown flag
      error_exit "Unknown option: $1"
      ;;
    *) # first non-flag arg = repo
      break
      ;;
  esac
done

# Reject options that appear after the repository argument
for arg in "$@"; do
  case "$arg" in
    -*)
      error_exit "Option '$arg' must be placed before the repository argument."
      ;;
  esac
done

# Check that exactly one repository is provided
if [ $# -lt 1 ]; then
    error_exit "REPOSITORY is required"
elif [ $# -gt 1 ]; then
    error_exit "Only a single REPOSITORY may be specified at a time"
fi

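# Extract owner/repo from the input (accepts a GitHub URL or a bare owner/repo).
# Only the first two path segments are kept, so URLs such as .../owner/repo/tree/main still work.
repo_re='github\.com[/:]([^/]+/[^/?#]+)'
if [[ $1 =~ $repo_re ]]; then
    repo=${BASH_REMATCH[1]}
    repo=${repo%.git}   # tolerate clone URLs ending in .git
else
    repo=$1
fi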

# Check if gh is installed
if ! command -v gh &> /dev/null; then
    error_exit \
        "GitHub CLI (gh) is not installed." \
        "Please install it from https://cli.github.com/"
fi

# Check if jq is installed (the script calls it directly)
if ! command -v jq &> /dev/null; then
    error_exit \
        "jq is not installed." \
        "Please install jq and try again."
fi

# Check if gh is authenticated
if ! gh auth status &> /dev/null; then
    error_exit \
        "GitHub CLI is not authenticated." \
        "Please run 'gh auth login' first."
fi

# Get repository information (progress messages go to stderr so they
# never end up inside JSON/CSV output)
if $verbose; then
    echo "Fetching data for $repo..." >&2
fi

# Get repository description
if ! description=$(gh repo view "$repo" --json description --jq .description 2>/dev/null); then
    error_exit \
        "Failed to fetch repository information." \
        "Please check if the repository exists and that you have access."
fi

# Get README content and extract information
readme_title=""
readme_first_para=""

# Decode with gh's built-in jq (same command as the building block above),
# which avoids platform differences in the flags accepted by base64
if ! readme_content=$(gh api "repos/$repo/readme" --jq '.content | @base64d' 2>/dev/null); then
    if $verbose; then
        echo "Warning: README not found or not accessible." >&2
    fi
else
    # Extract title (first heading)
    readme_title=$(printf '%s\n' "$readme_content" | grep -m 1 "^#" | sed 's/^#* //')

    # Extract first paragraph / intro block
    readme_first_para=$(
        printf '%s\n' "$readme_content" | awk '
          BEGIN { inpara=0 }
          /^#/   { if (inpara) exit; next }          # heading = stop if already capturing
          /^---/ { if (inpara) exit; next }          # horizontal rule = stop if already capturing
          {
            # Handle blank lines
            if ($0 ~ /^[[:space:]]*$/) {
              if (inpara) { buf = buf "\n"; next }   # keep internal blank lines
              else next                              # skip leading blanks
            }
            inpara=1
            buf = buf $0 "\n"                        # accumulate content
          }
          END {
            sub(/^[[:space:]\n]+/, "", buf)          # trim leading blank lines/spaces
            sub(/[[:space:]\n]+$/, "", buf)          # trim trailing blank lines/spaces
            printf "%s", buf                         # print without adding extra newline
          }
        '
    )
fi

# The pipelines above exit 0 even when nothing matched (the status comes from
# sed/awk), so detect a miss by checking for an empty result instead
: "${readme_title:=N/A}"
: "${readme_first_para:=N/A}"

# Output based on format
case "$output_format" in
    "json")
        jq -n \
          --arg repository "$repo" \
          --arg description "$description" \
          --arg readme_title "$readme_title" \
          --arg readme_first_paragraph "$readme_first_para" \
          '{
            $repository,
            $description,
            $readme_title,
            $readme_first_paragraph
          }'
        ;;
    "csv")
        echo "Repository,Description,README Title,README First Paragraph"
        printf '"%s","%s","%s","%s"\n' \
          "$repo" \
          "${description//\"/\"\"}" \
          "${readme_title//\"/\"\"}" \
          "${readme_first_para//\"/\"\"}"
        ;;
    "text"|*)
        echo "Repository: $repo"
        echo "Description: $description"
        echo "README Title: $readme_title"
        echo "README First Paragraph:"
        echo "$readme_first_para"
        ;;
esac
```
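
Brittle README parsing is listed as the prototype's main limitation: badges and HTML banners that precede the first heading end up in the intro paragraph. One possible mitigation, sketched here under assumptions, is a pre-filter that drops lines consisting only of images, linked badges, or HTML tags before the text reaches the `awk` extraction. The name `strip_decorations` and its heuristics are hypothetical and not part of either script.

```shell
# strip_decorations: copy stdin to stdout, dropping lines that contain
# nothing but Markdown images/badges and/or HTML tags.
# Lines with any real text are passed through unchanged.
strip_decorations() {
  awk '
    {
      line = $0
      gsub(/\[!\[[^]]*\]\([^)]*\)\]\([^)]*\)/, "", line)  # linked badges: [![alt](img)](url)
      gsub(/!\[[^]]*\]\([^)]*\)/, "", line)               # plain images:  ![alt](img)
      gsub(/<[^>]*>/, "", line)                           # HTML tags
      # Non-blank originally, blank once decorations are removed: drop it
      if ($0 !~ /^[ \t]*$/ && line ~ /^[ \t]*$/) next
      print
    }
  '
}

# Usage (requires an authenticated gh; OWNER/REPO is a placeholder):
#   gh api repos/OWNER/REPO/readme --jq '.content | @base64d' | strip_decorations | awk '...'
```

This is only a heuristic: a line of plain text wrapped entirely in HTML tags is kept, while banner text placed inside an image's alt attribute is lost.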

## `gh-repo-summary` (later iteration)

A more refined helper that builds on the prototype and adds richer features:

- **Purpose:** Provide clean JSON or Markdown summaries of one or more repos, ready for gist inclusion.
- **Status:** Functional, with a more complete implementation.
- **Strengths:**
  - Multiple repos at once
  - Optional star-sorting
  - Customizable JSON fields
  - Cleaner Markdown output
  - Normalizes topics and extracts intro text more reliably
- **Limitations:** More complex; help text and some flags still marked as TODO.

```shell
# TODO: Include the current local WIP script here
```
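
Since the script itself is not included yet, the following is not its implementation. It is only a sketch of how the multi-repo and star-sorting features listed above could be composed from the building-block commands. The function names `collect_stars` and `sort_by_stars`, and the tab-separated intermediate format, are assumptions.

```shell
# collect_stars REPO...: emit one "STARS<TAB>OWNER/REPO" line per repository.
# Requires an authenticated gh; repos that cannot be fetched are reported on stderr.
collect_stars() {
  for repo in "$@"; do
    gh repo view "$repo" --json stargazerCount,nameWithOwner \
      --jq '[.stargazerCount, .nameWithOwner] | @tsv' \
      || echo "Warning: could not fetch $repo" >&2
  done
}

# sort_by_stars: read "STARS<TAB>OWNER/REPO" lines on stdin and print them
# with the most-starred repository first (ties ordered by name).
sort_by_stars() {
  sort -k1,1nr -k2,2
}

# Usage (example repositories only):
#   collect_stars cli/cli jqlang/jq | sort_by_stars
```

Keeping the sort as a separate stage means it can be made optional, and it can be exercised without network access by piping in sample lines.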
