Merged
12 changes: 6 additions & 6 deletions docs/README.md
@@ -39,11 +39,11 @@ flowchart LR

| Repository | Description | Technology |
|-------------|-----------|------------|
| [govbrnews-scraper](https://github.com/destaquesgovbr/govbrnews-scraper) | Scraper + data pipeline | Python/Poetry |
| [destaquesgovbr-portal](https://github.com/destaquesgovbr/destaquesgovbr-portal) | Main web portal | Next.js 15 |
| [destaquesgovbr-infra](https://github.com/destaquesgovbr/destaquesgovbr-infra) | Infrastructure as code | Terraform/GCP |
| [destaquesgovbr-typesense](https://github.com/destaquesgovbr/destaquesgovbr-typesense) | Typesense for local dev | Docker |
| [destaquesgovbr-agencies](https://github.com/destaquesgovbr/destaquesgovbr-agencies) | Agency data | YAML |
| [scraper](https://github.com/destaquesgovbr/scraper) | Scraper + data pipeline | Python/Poetry |
| [portal](https://github.com/destaquesgovbr/portal) | Main web portal | Next.js 15 |
| [infra](https://github.com/destaquesgovbr/infra) | Infrastructure as code | Terraform/GCP |
| [typesense](https://github.com/destaquesgovbr/typesense) | Typesense for local dev | Docker |
| [agencies](https://github.com/destaquesgovbr/agencies) | Agency data | YAML |

## Documentation Structure

@@ -60,7 +60,7 @@ docs/

## External Resources

- **Portal (Preview)**: [destaquesgovbr-portal](https://destaquesgovbr-portal-klvx64dufq-rj.a.run.app/) *(temporary URL)*
- **Portal (Preview)**: [portal](https://portal-klvx64dufq-rj.a.run.app/) *(temporary URL)*
- **Main Dataset**: [nitaibezerra/govbrnews](https://huggingface.co/datasets/nitaibezerra/govbrnews)
- **Reduced Dataset**: [nitaibezerra/govbrnews-reduced](https://huggingface.co/datasets/nitaibezerra/govbrnews-reduced)
- **GitHub Organization**: [github.com/destaquesgovbr](https://github.com/destaquesgovbr)
18 changes: 9 additions & 9 deletions docs/arquitetura/componentes-estruturantes.md
@@ -58,8 +58,8 @@ A árvore temática está duplicada em dois repositórios (sincronização manua

| Repository | File | Format |
|-------------|---------|---------|
| govbrnews-scraper | `src/enrichment/themes_tree.yaml` | Flat YAML |
| destaquesgovbr-portal | `src/lib/themes.yaml` | Structured YAML |
| scraper | `src/enrichment/themes_tree.yaml` | Flat YAML |
| portal | `src/lib/themes.yaml` | Structured YAML |

#### Format in the Scraper (`themes_tree.yaml`)
```yaml
@@ -124,11 +124,11 @@ Each agency has:

| Repository | File | Content |
|-------------|---------|----------|
| destaquesgovbr-agencies | `agencies.yaml` | Data for the 156 agencies |
| destaquesgovbr-agencies | `hierarchy.yaml` | Hierarchical tree |
| destaquesgovbr-portal | `src/lib/agencies.yaml` | Copy (manual synchronization) |
| govbrnews-scraper | `src/scraper/agencies.yaml` | ID → Name mapping |
| govbrnews-scraper | `src/scraper/site_urls.yaml` | Scraping URLs |
| agencies | `agencies.yaml` | Data for the 156 agencies |
| agencies | `hierarchy.yaml` | Hierarchical tree |
| portal | `src/lib/agencies.yaml` | Copy (manual synchronization) |
| scraper | `src/scraper/agencies.yaml` | ID → Name mapping |
| scraper | `src/scraper/site_urls.yaml` | Scraping URLs |

### Example Entry

@@ -186,7 +186,7 @@ presidencia:

Automate synchronization:

1. Edit only in `destaquesgovbr-agencies`
1. Edit only in `agencies`
2. A GitHub Action publishes automatically to the scraper and portal
3. Possibly a web interface for management
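
The automation outlined above could be prototyped as a small copy step before it is wired into a GitHub Action. The following is a minimal sketch, not the project's actual workflow: the function name `sync_agencies` and the assumption that all three repositories are checked out side by side under one workspace directory are hypothetical, while the destination paths come from the file table in this document.

```python
import shutil
from pathlib import Path

# Destination paths inside each consumer repository, as listed in the
# file table of this document. The side-by-side checkout layout is an
# assumption made for this sketch.
DESTINATIONS = {
    "scraper": Path("src/scraper/agencies.yaml"),
    "portal": Path("src/lib/agencies.yaml"),
}


def sync_agencies(workspace: Path) -> list[Path]:
    """Copy agencies/agencies.yaml into every consumer repository.

    Returns the list of files that were written.
    """
    source = workspace / "agencies" / "agencies.yaml"
    written = []
    for repo, relative_path in DESTINATIONS.items():
        target = workspace / repo / relative_path
        # Create the destination folder if the checkout does not have it yet.
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copyfile(source, target)
        written.append(target)
    return written
```

A workflow step would then call `sync_agencies(Path("."))` from the directory that contains the three checkouts and commit the changed files or open pull requests.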

@@ -339,6 +339,6 @@ themes_tree.yaml → Manual copy → portal/themes.yaml

### Future (Automated)
```
destaquesgovbr-agencies → GitHub Action → portal + scraper
agencies → GitHub Action → portal + scraper
destaquesgovbr-themes → GitHub Action → portal + scraper
```
6 changes: 3 additions & 3 deletions docs/arquitetura/visao-geral.md
@@ -58,7 +58,7 @@ flowchart TB

## Components by Layer

### 1. Collection (`govbrnews-scraper`)
### 1. Collection (`scraper`)

| Component | File | Responsibility |
|------------|---------|------------------|
@@ -86,7 +86,7 @@ flowchart TB
| `category` | Original category from the site |
| `tags` | Tags/keywords from the site |

### 2. Enrichment (`govbrnews-scraper` + Cogfy)
### 2. Enrichment (`scraper` + Cogfy)

| Component | File | Responsibility |
|------------|---------|------------------|
@@ -125,7 +125,7 @@ Configured for:

| App | Technology | URL |
|-----|------------|-----|
| Portal | Next.js 15 + Typesense | [destaquesgovbr-portal](https://destaquesgovbr-portal-klvx64dufq-rj.a.run.app/) *(temporary)* |
| Portal | Next.js 15 + Typesense | [portal](https://portal-klvx64dufq-rj.a.run.app/) *(temporary)* |
| Streamlit | Python + Altair | [HuggingFace Spaces](https://huggingface.co/spaces/nitaibezerra/govbrnews) |

## Daily Data Flow
14 changes: 7 additions & 7 deletions docs/index.md
@@ -47,7 +47,7 @@ flowchart LR

**To create your Dev VM:**

1. Clone the [destaquesgovbr-infra](https://github.com/destaquesgovbr/destaquesgovbr-infra) repo
1. Clone the [infra](https://github.com/destaquesgovbr/infra) repo
2. Add your configuration to `terraform/terraform.tfvars`
3. Open a PR and wait for it to be merged

@@ -70,15 +70,15 @@ flowchart LR

| Repository | Description | Technology |
|-------------|-----------|------------|
| [govbrnews-scraper](https://github.com/destaquesgovbr/govbrnews-scraper) | Scraper + data pipeline | Python/Poetry |
| [destaquesgovbr-portal](https://github.com/destaquesgovbr/destaquesgovbr-portal) | Main web portal | Next.js 15 |
| [destaquesgovbr-infra](https://github.com/destaquesgovbr/destaquesgovbr-infra) | Infrastructure as code | Terraform/GCP |
| [destaquesgovbr-typesense](https://github.com/destaquesgovbr/destaquesgovbr-typesense) | Typesense for local dev | Docker |
| [destaquesgovbr-agencies](https://github.com/destaquesgovbr/destaquesgovbr-agencies) | Agency data | YAML |
| [scraper](https://github.com/destaquesgovbr/scraper) | Scraper + data pipeline | Python/Poetry |
| [portal](https://github.com/destaquesgovbr/portal) | Main web portal | Next.js 15 |
| [infra](https://github.com/destaquesgovbr/infra) | Infrastructure as code | Terraform/GCP |
| [typesense](https://github.com/destaquesgovbr/typesense) | Typesense for local dev | Docker |
| [agencies](https://github.com/destaquesgovbr/agencies) | Agency data | YAML |

## External Resources

- **Portal (Preview)**: [destaquesgovbr-portal](https://destaquesgovbr-portal-klvx64dufq-rj.a.run.app/) *(temporary URL)*
- **Portal (Preview)**: [portal](https://portal-klvx64dufq-rj.a.run.app/) *(temporary URL)*
- **Main Dataset**: [nitaibezerra/govbrnews](https://huggingface.co/datasets/nitaibezerra/govbrnews)
- **Reduced Dataset**: [nitaibezerra/govbrnews-reduced](https://huggingface.co/datasets/nitaibezerra/govbrnews-reduced)
- **GitHub Organization**: [github.com/destaquesgovbr](https://github.com/destaquesgovbr)
10 changes: 5 additions & 5 deletions docs/infraestrutura/arquitetura-gcp.md
@@ -2,7 +2,7 @@

> DestaquesGovbr infrastructure on Google Cloud Platform.

**Repository**: [github.com/destaquesgovbr/destaquesgovbr-infra](https://github.com/destaquesgovbr/destaquesgovbr-infra) (private)
**Repository**: [github.com/destaquesgovbr/infra](https://github.com/destaquesgovbr/infra) (private)

## Overview

@@ -50,7 +50,7 @@ flowchart TB

| Property | Value |
|-------------|-------|
| Service | `destaquesgovbr-portal` |
| Service | `portal` |
| Region | `us-east1` |
| CPU | 1 |
| Memory | 512Mi |
@@ -204,13 +204,13 @@ curl http://localhost:8108/health

```bash
# Service status
gcloud run services describe destaquesgovbr-portal --region=us-east1
gcloud run services describe portal --region=us-east1

# Logs
gcloud run services logs read destaquesgovbr-portal --region=us-east1
gcloud run services logs read portal --region=us-east1

# Metrics (via Console)
# Console > Cloud Run > destaquesgovbr-portal > Metrics
# Console > Cloud Run > portal > Metrics
```

### Compute Engine
8 changes: 4 additions & 4 deletions docs/infraestrutura/devvm.md
@@ -2,7 +2,7 @@

> Isolated development environments on GCP for the team.

**Repository**: [github.com/destaquesgovbr/destaquesgovbr-infra](https://github.com/destaquesgovbr/destaquesgovbr-infra) (private)
**Repository**: [github.com/destaquesgovbr/infra](https://github.com/destaquesgovbr/infra) (private)

## Overview

@@ -69,8 +69,8 @@ flowchart TB
### Step 1: Clone the Repository

```bash
git clone https://github.com/destaquesgovbr/destaquesgovbr-infra.git
cd destaquesgovbr-infra
git clone https://github.com/destaquesgovbr/infra.git
cd infra
```

### Step 2: Create a Branch
@@ -196,7 +196,7 @@ mkdir -p /mnt/data/projects
cd /mnt/data/projects

# Clone repositories
git clone https://github.com/destaquesgovbr/govbrnews-scraper.git
git clone https://github.com/destaquesgovbr/scraper.git
```

!!! warning "Important"
8 changes: 4 additions & 4 deletions docs/infraestrutura/secrets-iam.md
@@ -224,7 +224,7 @@ resource "google_service_account_iam_binding" "workload_identity" {
role = "roles/iam.workloadIdentityUser"

members = [
"principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/destaquesgovbr/destaquesgovbr-portal"
"principalSet://iam.googleapis.com/${google_iam_workload_identity_pool.github.name}/attribute.repository/destaquesgovbr/portal"
]
}
```
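
Because the binding matches on the `attribute.repository` value, the member string has to follow the repository rename. The sketch below shows how that string is composed. It assumes a hypothetical helper named `workload_identity_member` and a placeholder pool name; in the Terraform configuration the real value comes from `google_iam_workload_identity_pool.github.name`.

```python
def workload_identity_member(pool_name: str, org: str, repo: str) -> str:
    """Build the principalSet member for one GitHub repository.

    The layout mirrors the Terraform string above; pool_name stands in for
    the value of google_iam_workload_identity_pool.github.name.
    """
    return (
        f"principalSet://iam.googleapis.com/{pool_name}"
        f"/attribute.repository/{org}/{repo}"
    )


# Placeholder pool name, used for illustration only.
EXAMPLE_POOL = "projects/123/locations/global/workloadIdentityPools/github"

old_member = workload_identity_member(EXAMPLE_POOL, "destaquesgovbr", "destaquesgovbr-portal")
new_member = workload_identity_member(EXAMPLE_POOL, "destaquesgovbr", "portal")
```

The two values differ only in the final path segment, which is the part this change updates.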
@@ -254,7 +254,7 @@ gh secret set WIF_SERVICE_ACCOUNT --body "github-actions@project.iam.gserviceacc

### Secrets per Repository

#### `destaquesgovbr-portal`
#### `portal`

| Secret | Description |
|--------|-----------|
@@ -265,15 +265,15 @@ gh secret set WIF_SERVICE_ACCOUNT --body "github-actions@project.iam.gserviceacc
| `TYPESENSE_PORT` | Port (8108) |
| `TYPESENSE_API_KEY` | Typesense API key |

#### `govbrnews-scraper`
#### `scraper`

| Secret | Description |
|--------|-----------|
| `HF_TOKEN` | HuggingFace token (write) |
| `COGFY_API_KEY` | Cogfy API key |
| `COGFY_COLLECTION_ID` | Cogfy collection ID |

#### `destaquesgovbr-infra`
#### `infra`

| Secret | Description |
|--------|-----------|
8 changes: 4 additions & 4 deletions docs/infraestrutura/terraform-guide.md
@@ -2,14 +2,14 @@

> How to manage the GCP infrastructure with Terraform.

**Repository**: [github.com/destaquesgovbr/destaquesgovbr-infra](https://github.com/destaquesgovbr/destaquesgovbr-infra) (private)
**Repository**: [github.com/destaquesgovbr/infra](https://github.com/destaquesgovbr/infra) (private)

## Overview

The infrastructure is managed as code (IaC) using Terraform:

```
destaquesgovbr-infra/
infra/
├── terraform/
│ ├── main.tf # Provider and networking
│ ├── variables.tf # Input variables
@@ -68,7 +68,7 @@ zone = "us-east1-b"

# GitHub (for Workload Identity)
github_org = "destaquesgovbr"
github_repo = "destaquesgovbr-portal"
github_repo = "portal"
```

### 3. Initialize Terraform
@@ -171,7 +171,7 @@ resource "google_compute_firewall" "typesense" {

```hcl
resource "google_cloud_run_v2_service" "portal" {
name = "destaquesgovbr-portal"
name = "portal"
location = var.region

template {
28 changes: 14 additions & 14 deletions docs/modulos/agencies.md
@@ -1,21 +1,21 @@
# Module: Agencies (destaquesgovbr-agencies)
# Module: Agencies (agencies)

> Centralized catalog of government agencies.

**Repository**: [github.com/destaquesgovbr/destaquesgovbr-agencies](https://github.com/destaquesgovbr/destaquesgovbr-agencies)
**Repository**: [github.com/destaquesgovbr/agencies](https://github.com/destaquesgovbr/agencies)

## Overview

The `destaquesgovbr-agencies` repository is the **centralized source** of data on Brazilian government agencies, containing:
The `agencies` repository is the **centralized source** of data on Brazilian government agencies, containing:

- **156 agencies** cataloged
- **29 different types** (Ministry, Agency, Institute, etc.)
- Complete **organizational hierarchy**

```mermaid
flowchart TB
AG[destaquesgovbr-agencies] -->|Synchronization| SC[govbrnews-scraper]
AG -->|Synchronization| PO[destaquesgovbr-portal]
AG[agencies] -->|Synchronization| SC[scraper]
AG -->|Synchronization| PO[portal]

subgraph "Files"
AG --> A1[agencies.yaml]
@@ -28,7 +28,7 @@ flowchart TB
## Repository Structure

```
destaquesgovbr-agencies/
agencies/
├── agencies.yaml # Complete agency data
├── hierarchy.yaml # Hierarchical tree
└── README.md
@@ -182,7 +182,7 @@ The scraper uses the data to:
- Map **IDs to full names**

```yaml
# govbrnews-scraper/src/scraper/agencies.yaml
# scraper/src/scraper/agencies.yaml
agencies:
gestao: Ministério da Gestão e da Inovação em Serviços Públicos
```
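
The ID-to-name lookup described above amounts to a dictionary access once the YAML is loaded. This is a minimal sketch that assumes the file has already been parsed into a plain `dict` (the parsing step is omitted). The helper `agency_name` is hypothetical, and its fallback to the raw ID for unknown agencies is an assumption, not documented scraper behavior.

```python
# Mirrors the `agencies:` mapping shown above; only the entry that appears
# in this document is included.
AGENCIES = {
    "gestao": "Ministério da Gestão e da Inovação em Serviços Públicos",
}


def agency_name(agency_id: str, agencies: dict[str, str]) -> str:
    """Return the full agency name, or the ID itself when it is unknown."""
    return agencies.get(agency_id, agency_id)
```

For example, `agency_name("gestao", AGENCIES)` returns the full ministry name, while an ID missing from the mapping is returned unchanged.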
@@ -196,7 +196,7 @@ The portal uses it for:
- Hierarchical **navigation**

```yaml
# destaquesgovbr-portal/src/lib/agencies.yaml
# portal/src/lib/agencies.yaml
sources:
gestao:
name: Ministério da Gestão...
@@ -229,9 +229,9 @@ flowchart LR
```

**Process:**
1. Edit `destaquesgovbr-agencies/agencies.yaml`
2. Manually copy to `govbrnews-scraper`
3. Manually copy to `destaquesgovbr-portal`
1. Edit `agencies/agencies.yaml`
2. Manually copy to `scraper`
3. Manually copy to `portal`
4. Update `site_urls.yaml` if needed
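
While synchronization remains manual, a quick check can reveal copies that have fallen behind the source. The sketch below is an illustration only: it assumes the copies are meant to be byte-identical to `agencies.yaml`, as a plain file copy would produce, and the names `file_digest` and `find_drift` are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_drift(source: Path, copies: list[Path]) -> list[Path]:
    """List the copies that are missing or differ from the source file."""
    expected = file_digest(source)
    return [
        copy
        for copy in copies
        if not copy.exists() or file_digest(copy) != expected
    ]
```

An empty result means every copy matches the source; any returned path still needs the manual copy step.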

### Future State (Automatic)
@@ -243,7 +243,7 @@ flowchart LR
```

**Goal:**
- A push to `destaquesgovbr-agencies` triggers a workflow
- A push to `agencies` triggers a workflow
- The workflow automatically updates the scraper and portal
- Automatic PRs or direct commits

@@ -275,10 +275,10 @@ orgao-pai:

```bash
# Copy to the scraper
cp agencies.yaml ../govbrnews-scraper/src/scraper/agencies.yaml
cp agencies.yaml ../scraper/src/scraper/agencies.yaml

# Copy to the portal
cp agencies.yaml ../destaquesgovbr-portal/src/lib/agencies.yaml
cp agencies.yaml ../portal/src/lib/agencies.yaml

# Update site_urls.yaml in the scraper
```
12 changes: 6 additions & 6 deletions docs/modulos/arvore-tematica.md
@@ -93,13 +93,13 @@ Level 1 (Theme) → Level 2 (Subtheme) → Level 3 (Topic)

| Repository | File | Format |
|-------------|---------|---------|
| govbrnews-scraper | `src/enrichment/themes_tree.yaml` | Flat YAML |
| destaquesgovbr-portal | `src/lib/themes.yaml` | Structured YAML |
| scraper | `src/enrichment/themes_tree.yaml` | Flat YAML |
| portal | `src/lib/themes.yaml` | Structured YAML |

### Format in the Scraper

```yaml
# govbrnews-scraper/src/enrichment/themes_tree.yaml
# scraper/src/enrichment/themes_tree.yaml
01 - Economia e Finanças:
01.01 - Política Econômica:
- 01.01.01 - Política Fiscal
@@ -118,7 +118,7 @@ Level 1 (Theme) → Level 2 (Subtheme) → Level 3 (Topic)
### Format in the Portal

```yaml
# destaquesgovbr-portal/src/lib/themes.yaml
# portal/src/lib/themes.yaml
themes:
- label: Economia e Finanças
code: "01"
@@ -251,8 +251,8 @@ flowchart LR

### 3. Update files

1. Edit `govbrnews-scraper/src/enrichment/themes_tree.yaml`
2. Edit `destaquesgovbr-portal/src/lib/themes.yaml`
1. Edit `scraper/src/enrichment/themes_tree.yaml`
2. Edit `portal/src/lib/themes.yaml`
3. Update the configuration in Cogfy (via the web interface)
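
Keeping the two files consistent by hand means translating each flat entry (`code - label`) into the portal's `label`/`code` pair. The following is a minimal sketch of that translation with hypothetical helper names `split_theme_entry` and `theme_level`. Only the `label` and `code` keys are taken from the portal format shown in this document; any nesting key for children is left out because it is not visible here.

```python
def split_theme_entry(entry: str) -> dict[str, str]:
    """Split a flat entry such as '01.01.01 - Política Fiscal'.

    The text before the first ' - ' is the code, the rest is the label.
    """
    code, _, label = entry.partition(" - ")
    return {"label": label.strip(), "code": code.strip()}


def theme_level(code: str) -> int:
    """Return 1 for a theme, 2 for a subtheme, 3 for a topic."""
    return code.count(".") + 1
```

Applied to `01 - Economia e Finanças`, the first helper yields the `label` and `code` values that appear in the portal file, and `theme_level("01")` places the entry at level 1.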

### 4. Test