Description
When creating deepdive gists, it's often useful to capture a concise snapshot of a GitHub repository: the repo URL, the 'About' description, the main README title, and the opening paragraph. Doing this manually usually means copying and pasting from multiple places, which is repetitive and time-consuming. The goal is a CLI helper that automates (or at least assists with) this process and outputs either machine-friendly JSON/CSV or human-friendly Markdown that can be dropped straight into research notes.
Two draft scripts explore this idea:
- gh-repo-info (earlier prototype): A Bash script that attempts to fetch the description and extract the README's first heading and paragraph. It includes format options (text/json/csv), but is only partially implemented and limited in scope and robustness.
- gh-repo-summary (later iteration): A more complete Zsh helper that supports multiple repos, star-sorting, and richer JSON fields. It normalizes topics, extracts intro content more reliably, and can render summaries as JSON or Markdown, making it more flexible for gist usage.
This issue will track refining these approaches and deciding on the best path forward for a reliable helper script to add to the dotfiles.
Core Extraction Commands (Building Blocks)
These scripts combine a few key gh and parsing commands. Even without the full helpers, these base methods can be useful when capturing repo details manually.
- Repository description
gh repo view OWNER/REPO --json description --jq .description
Fetches the 'About' description from the repo.
- Repository metadata (rich fields)
gh repo view OWNER/REPO --json nameWithOwner,description,url,homepageUrl,repositoryTopics,stargazerCount,createdAt,updatedAt,pushedAt,isArchived,latestRelease
Retrieves richer metadata fields for automation/analysis.
Limitation: raw JSON only; needs further post-processing for clean Markdown or summaries.
- README content (base64-decoded)
gh api repos/OWNER/REPO/readme --jq '.content | @base64d'
Pulls the full README as Markdown.
Limitation: raw content can be noisy; needs parsing to extract intro/title.
- README title (first heading)
gh api repos/OWNER/REPO/readme --jq '.content | @base64d' \
  | grep -m 1 "^#" | sed 's/^#* //'
Extracts the first Markdown heading as a 'title'.
Limitation: brittle if the README has unusual formatting (HTML headings, decorative banners, etc.).
- README intro paragraph (first meaningful block)
gh api repos/OWNER/REPO/readme --jq '.content | @base64d' \
  | awk '
BEGIN { inpara=0 }
/^#/ { if (inpara) exit; next }     # heading = stop if already capturing
/^---/ { if (inpara) exit; next }   # horizontal rule = stop if already capturing
{
  # Handle blank lines
  if ($0 ~ /^[[:space:]]*$/) {
    if (inpara) { buf = buf "\n"; next }   # keep internal blank lines
    else next                              # skip leading blanks
  }
  inpara=1
  buf = buf $0 "\n"                        # accumulate content
}
END {
  sub(/^[[:space:]\n]+/, "", buf)   # trim leading blank lines/spaces
  sub(/[[:space:]\n]+$/, "", buf)   # trim trailing blank lines/spaces
  printf "%s", buf                  # print without adding extra newline
}
'
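Attempts to extract the first meaningful block of README text (normally the text right after the title), continuing until the next heading or horizontal rule. Leading/trailing blank lines are trimmed, and no trailing newline is left in the output.
Limitation: fragile. It can pick up decorative elements (badges or HTML banners that precede the intro), it stops early at a '#' comment line inside a fenced code block (which looks like a heading), and it can miss content if the formatting is non-standard.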
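The title and intro limitations above (HTML headings and banners for the title; decorative lines and '#' comments inside code blocks for the intro) could be reduced by keeping a little more state in awk. The following is a minimal sketch, not part of either draft script. readme_title and readme_intro are hypothetical helper names. Only ATX headings and single-line HTML headings (<h1>Title</h1>) are recognized. The intro skips leading lines that start with '<', '![' or '[![', which is a rough heuristic for badges and banners. Both helpers read the decoded README on stdin, so they slot into the same gh api pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical fence-aware README helpers (not from gh-repo-info or gh-repo-summary).
# Both read decoded Markdown on stdin, e.g.:
#   gh api repos/OWNER/REPO/readme --jq '.content | @base64d' | readme_title

# Three backticks, built with printf so this snippet can itself sit inside a Markdown fence.
FENCE_MARK=$(printf '\140\140\140')

# readme_title: print the first ATX heading ("# Title") or single-line HTML heading
# (<h1>Title</h1>), ignoring anything inside fenced code blocks.
readme_title() {
  awk -v bt="$FENCE_MARK" '
    index($0, bt) == 1 || index($0, "~~~") == 1 { infence = !infence; next }
    infence { next }
    /^#+[ \t]/ {
      sub(/^#+[ \t]+/, "")          # drop the leading hashes
      sub(/[ \t]+#+[ \t]*$/, "")    # drop an optional closing "##" sequence
      sub(/[ \t]+$/, "")
      print; exit
    }
    /<[hH][1-6][^>]*>/ {
      line = $0
      sub(/.*<[hH][1-6][^>]*>/, "", line)   # keep text after the opening tag
      sub(/<\/[hH][1-6]>.*/, "", line)      # and before the closing tag
      gsub(/<[^>]*>/, "", line)             # strip nested tags (links, images)
      gsub(/^[ \t]+|[ \t]+$/, "", line)
      if (line != "") { print line; exit }
    }
  '
}

# readme_intro: same idea as the awk above, but skips leading HTML/badge lines,
# never treats "#" lines inside code fences as headings, and stops at the first
# code fence once it has started capturing.
readme_intro() {
  awk -v bt="$FENCE_MARK" '
    index($0, bt) == 1 || index($0, "~~~") == 1 {
      if (inpara) exit               # a code block ends the intro
      infence = !infence; next       # skip fenced code before the intro
    }
    infence { next }
    /^#/   { if (inpara) exit; next }                 # heading: stop if already capturing
    /^---/ { if (inpara) exit; next }                 # horizontal rule: same
    /^[ \t]*$/ { if (inpara) buf = buf "\n"; next }   # keep internal blank lines
    !inpara && /^[ \t]*(<|!\[|\[!\[)/ { next }        # skip leading HTML/badge/image lines
    { inpara = 1; buf = buf $0 "\n" }
    END {
      sub(/^[ \t\n]+/, "", buf)      # trim leading blank lines and spaces
      sub(/[ \t\n]+$/, "", buf)      # trim trailing blank lines and spaces
      printf "%s", buf
    }
  '
}
```

Design choice: once capturing has started, a code fence ends the intro instead of pulling install commands into the gist snippet. Multi-line HTML blocks (an opening tag on its own line) are still not handled. Such a heading is missed by readme_title, and its inner text can leak into readme_intro.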
These commands (or variations on them) can be chained together to:
- combine metadata and README extracts in one go,
- normalize topics and clean up text,
- handle multiple repos with flexible output formats (JSON/Markdown),
- produce cleaner, more reliable snippets ready to drop straight into deepdive gists.
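One way to combine metadata and README extracts in one go, and to normalize topics, is a single jq filter over the gh repo view JSON, with the README title and intro passed in as arguments. This is a sketch, not the gh-repo-summary implementation. repo_summary, its argument order, and the Markdown layout are hypothetical. The filter also assumes that gh exports repositoryTopics as an array of {"name": ...} objects, or null when a repo has no topics.

```shell
#!/usr/bin/env bash
# Hypothetical helper: merge `gh repo view --json ...` output (stdin) with README
# extracts and print either normalized JSON or a Markdown snippet.
# Assumption: repositoryTopics arrives as [{"name": "..."}] or null.
# Usage: repo_summary json|markdown "README title" "README intro" < metadata.json
repo_summary() {
  local format="$1" title="$2" intro="$3"
  jq -r --arg format "$format" --arg title "$title" --arg intro "$intro" '
    {
      repo: .nameWithOwner,
      url: .url,
      description: (.description // ""),
      homepage: (.homepageUrl // ""),
      stars: (.stargazerCount // 0),
      topics: [(.repositoryTopics // [])[] | .name],   # flatten to plain names
      readme_title: $title,
      readme_intro: $intro
    }
    | if $format == "markdown" then
        "### [\(.repo)](\(.url))\n\n"
        + (if .description != "" then "> \(.description)\n\n" else "" end)
        + "- Stars: \(.stars)\n"
        + (if (.topics | length) > 0 then "- Topics: \(.topics | join(", "))\n" else "" end)
        + (if .readme_title != "" then "- README title: \(.readme_title)\n" else "" end)
        + (if .readme_intro != "" then "\n\(.readme_intro)\n" else "" end)
      else . end
  '
}

# Example (requires an authenticated gh; title/intro come from any README extractor):
#   gh repo view OWNER/REPO --json nameWithOwner,url,description,homepageUrl,stargazerCount,repositoryTopics \
#     | repo_summary markdown "$title" "$intro"
```

Keeping the JSON shape and the Markdown rendering in one filter means both output formats always agree on field names and defaults.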
gh-repo-info (earlier prototype)
A first attempt at solving the problem of extracting repo metadata + README context for use in deepdive gists:
- Purpose: Fetch repo description, README title, and intro paragraph.
- Status: Early WIP / partially implemented.
- Strengths: Simple and demonstrates the core idea; supports --format (text/json/csv).
- Limitations: Brittle README parsing, limited metadata, and not robust to varied formatting.
#!/bin/bash
SCRIPT_NAME="$(basename "$0")"
# Default values
verbose=false
output_format="text"
# Function to display usage
show_usage() {
cat << EOF
Usage: $SCRIPT_NAME [OPTIONS] REPOSITORY
Extract repository description and README information from a GitHub repository.
REPOSITORY can be a full GitHub URL (https://github.com/owner/repo) or
just the repo path (owner/repo).
Options:
-h, --help Show this help message and exit
-v, --verbose Show more detailed output
-f, --format FORMAT Output format: text (default), json, or csv
Examples:
$SCRIPT_NAME sourcegraph/lsif-protocol
$SCRIPT_NAME --format json https://github.com/sourcegraph/lsif-protocol
$SCRIPT_NAME -v kubernetes/kubernetes
This script requires the GitHub CLI (gh) to be installed and authenticated.
EOF
exit 0
}
error_exit() {
local msg="$1"
local footer="${2:-Usage: '$SCRIPT_NAME --help'}"
echo "Error: $msg" >&2
echo " $footer" >&2
exit 1
}
# Parse arguments
while [[ $# -gt 0 ]]; do
case "$1" in
-h|--help)
show_usage
;;
-v|--verbose)
verbose=true
shift
;;
-f|--format)
if [[ -n "$2" ]]; then
output_format="$2"
shift 2
else
error_exit "--format requires an argument"
fi
;;
--) # end of options
shift
break
;;
-*) # unknown flag
error_exit "Unknown option: $1"
;;
*) # first non-flag arg = repo
break
;;
esac
done
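# Validate the output format before making any network calls
case "$output_format" in
text|json|csv) ;;
*) error_exit "Unsupported format '$output_format' (expected text, json, or csv)" ;;
esac
# Reject options placed after the repository argument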
for arg in "$@"; do
case "$arg" in
-*)
error_exit "Option '$arg' must be placed before the repository argument."
;;
esac
done
# Check that exactly one repository is provided
if [ $# -lt 1 ]; then
error_exit "REPOSITORY is required"
elif [ $# -gt 1 ]; then
error_exit "Only a single REPOSITORY may be specified at a time"
fi
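# Extract owner/repo from a GitHub URL (dropping any ".git" suffix or extra path), else use the input as-is
if [[ $1 =~ github\.com[/:]([^/]+)/([^/?#]+) ]]; then
repo="${BASH_REMATCH[1]}/${BASH_REMATCH[2]%.git}"
else
repo=$1
fi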
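# Check if gh is installed
if ! command -v gh &> /dev/null; then
error_exit \
"GitHub CLI (gh) is not installed." \
"Please install it from https://cli.github.com/"
fi
# Check if jq is installed (used for JSON processing)
if ! command -v jq &> /dev/null; then
error_exit "jq is not installed." "Please install jq and try again."
fi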
# Check if gh is authenticated
if ! gh auth status &> /dev/null; then
error_exit \
"GitHub CLI is not authenticated." \
"Please run 'gh auth login' first."
fi
# Get repository information
if $verbose; then
echo "Fetching data for $repo..." >&2
fi
# Get repository description
if ! description=$(gh repo view "$repo" --json description --jq .description 2>/dev/null); then
error_exit \
"Failed to fetch repository information." \
"Please check if the repository exists and that you have access."
fi
# Get README content (gh's built-in jq decodes the base64 payload) and extract information
if ! readme_content=$(gh api "repos/$repo/readme" --jq '.content | @base64d' 2>/dev/null); then
readme_title="N/A"
readme_first_para="N/A"
if $verbose; then
echo "Warning: README not found or not accessible." >&2
fi
else
# Extract title (first heading). The pipeline's exit status is sed's, so test for empty output instead.
readme_title=$(printf '%s\n' "$readme_content" | grep -m 1 "^#" | sed 's/^#* //')
[[ -n $readme_title ]] || readme_title="N/A"
# Extract first paragraph / intro block
if ! readme_first_para=$(
printf '%s\n' "$readme_content" | awk '
BEGIN { inpara=0 }
/^#/ { if (inpara) exit; next } # heading = stop if already capturing
/^---/ { if (inpara) exit; next } # horizontal rule = stop if already capturing
{
# Handle blank lines
if ($0 ~ /^[[:space:]]*$/) {
if (inpara) { buf = buf "\n"; next } # keep internal blank lines
else next # skip leading blanks
}
inpara=1
buf = buf $0 "\n" # accumulate content
}
END {
sub(/^[[:space:]\n]+/, "", buf) # trim leading blank lines/spaces
sub(/[[:space:]\n]+$/, "", buf) # trim trailing blank lines/spaces
printf "%s", buf # print without adding extra newline
}
'
); then
readme_first_para="N/A"
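fi
# awk exits 0 even when nothing was captured, so also treat empty output as missing
[[ -n $readme_first_para ]] || readme_first_para="N/A"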
fi
# Output based on format
case "$output_format" in
"json")
jq -n \
--arg repository "$repo" \
--arg description "$description" \
--arg readme_title "$readme_title" \
--arg readme_first_paragraph "$readme_first_para" \
'{
$repository,
$description,
$readme_title,
$readme_first_paragraph
}'
;;
"csv")
echo "Repository,Description,README Title,README First Paragraph"
printf '"%s","%s","%s","%s"\n' \
"$repo" \
"${description//\"/\"\"}" \
"${readme_title//\"/\"\"}" \
"${readme_first_para//\"/\"\"}"
;;
"text"|*)
echo "Repository: $repo"
echo "Description: $description"
echo "README Title: $readme_title"
echo "README First Paragraph:"
echo "$readme_first_para"
;;
esac
gh-repo-summary (later iteration)
A more refined helper that builds on the prototype and adds richer features:
- Purpose: Provide clean JSON or Markdown summaries of one or more repos, ready for gist inclusion.
- Status: Functional, with more complete implementation.
- Strengths:
- Multiple repos at once
- Optional star-sorting
- Customizable JSON fields
- Cleaner Markdown output
- Normalizes topics and extracts intro text more reliably
- Limitations: More complex; help text and some flags still marked as TODO.
# TODO: Include the current local WIP script here
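The multi-repo and star-sorting features listed above can be approximated in plain Bash. Each repo is fetched as one JSON object, and jq -s slurps and sorts them. This is a hypothetical sketch, not the local WIP (Zsh) script. fetch_repos, sort_repos_by_stars, repos_to_markdown, and the FIELDS variable (standing in for customizable JSON fields) are illustrative names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch (not the WIP gh-repo-summary script): multi-repo fetch + star sort.

# Fields requested from `gh repo view --json`; override via the environment to customize.
FIELDS="${FIELDS:-nameWithOwner,description,url,stargazerCount}"

# fetch_repos: print one JSON object per repo; failures are reported on stderr and skipped.
fetch_repos() {
  local repo
  for repo in "$@"; do
    gh repo view "$repo" --json "$FIELDS" \
      || echo "warning: could not fetch $repo" >&2
  done
}

# sort_repos_by_stars: slurp a stream of JSON objects into one array, most stars first.
sort_repos_by_stars() {
  jq -s 'sort_by(.stargazerCount // 0) | reverse'
}

# repos_to_markdown: render a JSON array (e.g. from sort_repos_by_stars) as a Markdown list.
repos_to_markdown() {
  jq -r '.[] | "- [\(.nameWithOwner)](\(.url)) (\(.stargazerCount // 0) stars): \(.description // "")"'
}

# Example (requires an authenticated gh):
#   fetch_repos cli/cli junegunn/fzf | sort_repos_by_stars | repos_to_markdown
```

Sorting happens after all fetches complete, so star-sorting stays optional: drop sort_repos_by_stars from the pipeline and use jq -s . instead to keep the input order.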