@@ -8,7 +8,7 @@ on:
# │ │ │ ┌────── restricted to month (1-12)
# │ │ │ │ ┌──── restricted to day of week (0-6, 0=Sunday)
# │ │ │ │ │ * means doesn't restrict anything
- cron: "0 9 * * 1" # Runs once every Monday at 9 AM
- cron: "0 9 * * 1" # Runs once every Monday at 9 AM UTC
workflow_dispatch:
inputs:
repo_type:
@@ -44,4 +44,4 @@ jobs:
GH_TOKEN: ${{ secrets.GH_TOKEN }}
GOOGLE_CREDENTIALS_PATH: service_account.json
REPO_TYPE: ${{ github.event.inputs.repo_type || 'all' }}
run: python export_repos.py
run: python gh_repo_exporter.py
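To make the schedule concrete, here is a minimal Python sketch of when the cron expression `"0 9 * * 1"` next fires, assuming (as the updated comment states) that the schedule is evaluated in UTC. The helper name `next_monday_9am_utc` is hypothetical and is not part of this PR; GitHub may also start scheduled runs later than the nominal time.

```python
from datetime import datetime, timedelta, timezone


def next_monday_9am_utc(now: datetime) -> datetime:
    """Return the next time "0 9 * * 1" would fire strictly after `now`.

    `now` should be timezone-aware; it is converted to UTC first.
    """
    now = now.astimezone(timezone.utc)
    # Python's weekday() uses 0 for Monday, whereas cron uses 1 for Monday.
    days_ahead = (0 - now.weekday()) % 7
    candidate = (now + timedelta(days=days_ahead)).replace(
        hour=9, minute=0, second=0, microsecond=0
    )
    # If today is Monday and 09:00 UTC has already passed, move to next week.
    if candidate <= now:
        candidate += timedelta(days=7)
    return candidate


# A Wednesday at noon UTC resolves to the following Monday.
print(next_monday_9am_utc(datetime(2024, 1, 3, 12, 0, tzinfo=timezone.utc)))
# 2024-01-08 09:00:00+00:00
```

Someone in a UTC-5 timezone would therefore see the run at 4 AM local time, which is the ambiguity the comment change removes.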
37 changes: 37 additions & 0 deletions .github/workflows/hf-repo-exporter.yml
@@ -0,0 +1,37 @@
name: Update Metadata for Hugging Face Repository Sheet

on:
workflow_dispatch:
schedule:
# ┌──────────── restricted to minute (0-59)
# │ ┌────────── restricted to hour (0-23)
# │ │ ┌──────── restricted to day of month (1-31)
# │ │ │ ┌────── restricted to month (1-12)
# │ │ │ │ ┌──── restricted to day of week (0-6, 0=Sunday)
# │ │ │ │ │ * means doesn't restrict anything
- cron: "0 9 * * 1" # Runs once every Monday at 9 AM UTC

jobs:
update-sheet:
runs-on: ubuntu-latest

steps:
- name: Checkout repo
uses: actions/checkout@v4

- name: Set up Python
uses: actions/setup-python@v5
with:
python-version: "3.x"

- name: Install dependencies
run: pip install -r requirements.txt

- name: Write Google credentials
run: printf "%s" '${{ secrets.GOOGLE_SERVICE_ACCOUNT_JSON }}' > service_account.json

- name: Run script
env:
HF_TOKEN: ${{ secrets.HF_TOKEN }}
GOOGLE_CREDENTIALS_PATH: service_account.json
run: python hf_repo_exporter.py
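The "Run script" step hands two settings to the script through environment variables. The body of `hf_repo_exporter.py` is not shown in this diff, so the following is only a sketch of how a script could read them and fail early when one is missing; `load_exporter_config` is a hypothetical helper, not a function from this PR.

```python
import os


def load_exporter_config(env=os.environ):
    """Collect the settings the workflow passes in; raise if any is unset or empty."""
    required = ("HF_TOKEN", "GOOGLE_CREDENTIALS_PATH")
    # An unset secret reaches the job as an empty string, so treat "" as missing.
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in required}


# Example with an explicit mapping instead of the real environment.
config = load_exporter_config(
    {"HF_TOKEN": "hf_example", "GOOGLE_CREDENTIALS_PATH": "service_account.json"}
)
print(config["GOOGLE_CREDENTIALS_PATH"])  # service_account.json
```

Checking up front gives a clear error message in the Actions log instead of an authentication failure partway through the export.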
54 changes: 41 additions & 13 deletions README.md
@@ -6,7 +6,8 @@ A Python script that gathers metadata for all repositories in a GitHub organizat
- [Features](#features)
- [Usage](#usage)
- [Set up your own GitHub Actions workflow](#set-up-your-own-github-actions-workflow)
- [Create a GitHub Personal Access Token](#create-a-github-personal-access-token)
- [Create a GitHub Personal Access Token](#create-a-github-personal-access-token)
- [Create a Hugging Face Token](#create-a-hugging-face-token)
- [Set up Google Cloud Service Account Access](#set-up-google-cloud-service-account-access)
- [Run repo exporter locally](#run-repo-exporter-locally)
- [Important Notes](#important-notes)
@@ -42,17 +43,29 @@ To use this script within your own GitHub organization, first fork this repo, th

To create one with permissions for both private and public repositories (read-only access to public repositories is enabled by default, without administrator approval):

1. Go to [github.com/settings/personal-access-tokens](https://github.com/settings/personal-access-tokens)
2. Click **Generate new token Fine-grained token**
3. Under **Resource owner**, select the **organization** you want to access.
4. Under **Repository access**, choose **All repositories**.
5. Under **Permissions** select **Repositories** and set:
1. Go to [github.com/settings/personal-access-tokens](https://github.com/settings/personal-access-tokens)
2. Click **Generate new token -> Fine-grained token**
3. Under **Resource owner**, select the **organization** you want to access.
4. Under **Repository access**, choose **All repositories**.
5. Under **Permissions** select **Repositories** and set:
- **Metadata** -> Read-only
- **Contents** -> Read-only
- **Administration** -> Read-only
6. Click **Generate token** and **copy it** (make sure to store it somewhere safe for future use).
7. Navigate to `https://github.com/<gh-org-name>/repo-exporter/settings/secrets/actions`, click **New repository secret**, name it **GH_TOKEN**, paste the token into the **Secret** field, and click **Add secret**.
**Note:** The token must be approved by the organization administrator before accessing private repositories.
6. Click **Generate token** and **copy it** (make sure to store it somewhere safe for future use).
7. Navigate to `https://github.com/<gh-org-name>/repo-exporter/settings/secrets/actions`, click **New repository secret**, name it **GH_TOKEN**, paste the token into the **Secret** field, and click **Add secret**.
**Note:** The token must be approved by the organization administrator before accessing private repositories.

### Create a Hugging Face Token

To create one with permissions for both private and public repositories:

1. Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens)
2. Click on **New Token** and name it **repo-exporter**
3. For permissions select **Fine-grained**:
- Specify the desired organization (under **Org permissions**)
- Under **Repositories**, select "Read access to contents of all repos in selected organizations"
4. Click **Generate** and **copy it** (make sure to store it somewhere safe for future use)
5. Navigate to `https://github.com/<gh-org-name>/repo-exporter/settings/secrets/actions`, click **New repository secret**, name it **HF_TOKEN**, paste the token into the **Secret** field, and click **Add secret**.
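
Before storing the token as a secret, it can help to confirm that it authenticates. The sketch below only builds the request. It assumes the Hugging Face Hub accepts the token as a `Bearer` value in the `Authorization` header and that `https://huggingface.co/api/whoami-v2` is the identity endpoint; `build_whoami_request` is a hypothetical helper and is not part of the exporter.

```python
import urllib.request


def build_whoami_request(token: str) -> urllib.request.Request:
    """Build (but do not send) a GET request that identifies the token's owner."""
    # Assumption: this endpoint and the Bearer scheme match the Hub's HTTP API.
    return urllib.request.Request(
        "https://huggingface.co/api/whoami-v2",
        headers={"Authorization": f"Bearer {token}"},
    )


request = build_whoami_request("hf_example_token")
print(request.get_method(), request.full_url)
# GET https://huggingface.co/api/whoami-v2
```

Passing the request to `urllib.request.urlopen` sends it; an HTTP 401 response would suggest the token is invalid or was copied incorrectly.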

### Set up Google Cloud Service Account Access

@@ -97,10 +110,25 @@ Now update the script with [your GitHub Organization name](https://github.com/Im
pip install -r requirements.txt
```

5. Run the program
```
python export_repos.py
```
5. Run the exporters

You can run **either exporter individually** or **both**, depending on your needs:

- **Run only the GitHub repository exporter**
```
python gh_repo_exporter.py
```

- **Run only the Hugging Face repository exporter**
```
python hf_repo_exporter.py
```

- **Run both exporters (wait for one to finish before running the other)**
```
python hf_repo_exporter.py
python gh_repo_exporter.py
```
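
The "run both" case can also be scripted so that the second exporter starts only after the first has finished successfully. This is a minimal sketch, not part of the PR: `run_exporters` is a hypothetical wrapper, and it assumes both scripts sit in the current working directory.

```python
import subprocess
import sys


def run_exporters(
    scripts=("hf_repo_exporter.py", "gh_repo_exporter.py"),
    runner=subprocess.run,
):
    """Run each exporter in order, waiting for it to finish before the next starts."""
    completed = []
    for script in scripts:
        # subprocess.run blocks until the child exits; check=True raises
        # CalledProcessError on a non-zero exit code, so later scripts are skipped.
        runner([sys.executable, script], check=True)
        completed.append(script)
    return completed
```

Calling `run_exporters()` from the repository root runs the Hugging Face exporter and then the GitHub exporter. Using `sys.executable` keeps both runs in the same interpreter (and virtual environment) that launched the wrapper.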

## Important Notes

11 changes: 6 additions & 5 deletions export_repos.py → gh_repo_exporter.py
@@ -1,13 +1,14 @@
import os
import pandas as pd
from github import Github, GithubException, Auth
import pandas as pd
from tqdm import tqdm
from datetime import datetime, timedelta, timezone
from google.oauth2.service_account import Credentials
import gspread
import yaml

from datetime import datetime, timedelta, timezone
import time
import os
import re
import gspread
from google.oauth2.service_account import Credentials

# Config
ORG_NAME = "Imageomics"